Repository: JMoonr/LATR Branch: main Commit: 1c9e62646274 Files: 43 Total size: 330.6 KB Directory structure: gitextract_q_wu7p65/ ├── .gitignore ├── LICENSE ├── README.md ├── config/ │ ├── _base_/ │ │ ├── base_res101_bs16xep100.py │ │ ├── base_res101_bs16xep100_apollo.py │ │ ├── once_eval_config.json │ │ └── optimizer.py │ └── release_iccv/ │ ├── apollo_illu.py │ ├── apollo_rare.py │ ├── apollo_standard.py │ ├── latr_1000_baseline.py │ ├── latr_1000_baseline_lite.py │ └── once.py ├── data/ │ ├── Load_Data.py │ ├── __init__.py │ ├── apollo_dataset.py │ └── transform.py ├── docs/ │ ├── data_preparation.md │ ├── install.md │ └── train_eval.md ├── experiments/ │ ├── __init__.py │ ├── ddp.py │ ├── gpu_utils.py │ └── runner.py ├── main.py ├── models/ │ ├── __init__.py │ ├── latr.py │ ├── latr_head.py │ ├── ms2one.py │ ├── scatter_utils.py │ ├── sparse_ins.py │ ├── sparse_inst_loss.py │ ├── transformer_bricks.py │ └── utils.py ├── pretrained_models/ │ └── .gitkeep ├── requirements.txt ├── utils/ │ ├── MinCostFlow.py │ ├── __init__.py │ ├── eval_3D_lane.py │ ├── eval_3D_lane_apollo.py │ ├── eval_3D_once.py │ └── utils.py └── work_dirs/ └── .gitkeep ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ *pyc *pth ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2023 JMoonr Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================

# LATR: 3D Lane Detection from Monocular Images with Transformer

This is the official PyTorch implementation of [LATR: 3D Lane Detection from Monocular Images with Transformer](https://arxiv.org/abs/2308.04583).

![fig2](/assets/fig2.png)

## News

- **2024-01-15** :confetti_ball: Our new work [DV-3DLane: End-to-end Multi-modal 3D Lane Detection with Dual-view Representation](https://github.com/JMoonr/dv-3dlane) is accepted by ICLR2024.
- **2023-08-12** :tada: LATR is accepted as an Oral presentation at ICCV2023! :sparkles:

## Environments

To set up the required packages, please refer to the [installation guide](./docs/install.md).

## Data

Please follow [data preparation](./docs/data_preparation.md) to download the datasets.

## Pretrained Models

Note that the performance of the pretrained models is higher than reported in our paper due to code refactoring and optimization. All models are uploaded to [google drive](https://drive.google.com/drive/folders/1AhvLvE84vayzFxa0teRHYRdXz34ulzjB?usp=sharing).

| Dataset | Pretrained | Metrics | md5 |
| - | - | - | - |
| OpenLane-1000 | [Google Drive](https://drive.google.com/file/d/1jThvqnJ2cUaAuKdlTuRKjhLCH0Zq62A1/view?usp=sharing) | F1=0.6297 | d8ecb900c34fd23a9e7af840aff00843 |
| OpenLane-1000 (Lite version) | [Google Drive](https://drive.google.com/file/d/1WD5dxa6SI2oR9popw3kO2-7eGM2z-IHY/view?usp=sharing) | F1=0.6212 | 918de41d0d31dbfbecff3001c49dc296 |
| ONCE | [Google Drive](https://drive.google.com/file/d/12kXkJ9tDxm13CyFbB1ddt82lJZkYEicd/view?usp=sharing) | F1=0.8125 | 65a6958c162e3c7be0960bceb3f54650 |
| Apollo-balance | [Google Drive](https://drive.google.com/file/d/1hGyNrYi3wAQaKbC1mD_18NG35gdmMUiM/view?usp=sharing) | F1=0.9697 | 551967e8654a8a522bdb0756d74dd1a2 |
| Apollo-rare | [Google Drive](https://drive.google.com/file/d/19VVBaWBnWiEqGx1zJaeXF_1CKn88G5v0/view?usp=sharing) | F1=0.9641 | 184cfff1d3097a9009011f79f4594138 |
| Apollo-visual | [Google Drive](https://drive.google.com/file/d/1ZzaUODYK2dyiG_2bDXe5tiutxNvc71M2/view?usp=sharing) | F1=0.9611 | cec4aa567c264c84808f3c32f5aace82 |

## Evaluation

You can download the [pretrained models](#pretrained-models) to the `./pretrained_models` directory and refer to the [eval guide](./docs/train_eval.md#evaluation) for evaluation.

## Train

Please follow the steps in [training](./docs/train_eval.md#train) to train the model.
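Before evaluating, it can be useful to verify a downloaded checkpoint against the md5 listed in the pretrained-models table above. A minimal sketch (not part of this repo; the local checkpoint filename is a placeholder for whatever name you saved the file under):

```python
# Sketch: verify a downloaded checkpoint against its listed md5.
import hashlib

def md5sum(path, chunk_size=1 << 20):
    h = hashlib.md5()
    with open(path, 'rb') as f:
        # hash the file in chunks so large checkpoints don't need to fit in memory
        for block in iter(lambda: f.read(chunk_size), b''):
            h.update(block)
    return h.hexdigest()

# placeholder filename -- replace with your downloaded checkpoint
assert md5sum('./pretrained_models/openlane_1000.pth') == \
    'd8ecb900c34fd23a9e7af840aff00843'
```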
## Benchmark

### OpenLane

| Models | F1 | Accuracy | X error (m) <br> near \| far | Z error (m) <br> near \| far |
| ----- | -- | -------- | ------- | ------- |
| 3DLaneNet | 44.1 | - | 0.479 \| 0.572 | 0.367 \| 0.443 |
| GenLaneNet | 32.3 | - | 0.593 \| 0.494 | 0.140 \| 0.195 |
| Cond-IPM | 36.3 | - | 0.563 \| 1.080 | 0.421 \| 0.892 |
| PersFormer | 50.5 | 89.5 | 0.319 \| 0.325 | 0.112 \| 0.141 |
| CurveFormer | 50.5 | - | 0.340 \| 0.772 | 0.207 \| 0.651 |
| PersFormer-Res50 | 53.0 | 89.2 | 0.321 \| 0.303 | 0.085 \| 0.118 |
| **LATR-Lite** | 61.5 | 91.9 | 0.225 \| 0.249 | 0.073 \| 0.106 |
| **LATR** | 61.9 | 92.0 | 0.219 \| 0.259 | 0.075 \| 0.104 |
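For reference, the X/Z errors above follow the Gen-LaneNet-style protocol implemented in `utils/eval_3D_lane.py`: matched predicted and ground-truth lanes are resampled at fixed y positions and their x/z offsets are averaged separately over a near and a far range. A minimal sketch of the idea (the function name and the 40 m near/far split here are illustrative assumptions, not the repo's exact code):

```python
import numpy as np

def xz_error_near_far(x_pred, z_pred, x_gt, z_gt, y_steps, split=40.0):
    """Mean absolute X/Z error for one matched lane pair,
    split into near (<= `split` meters) and far ranges along y."""
    near = y_steps <= split
    far = ~near
    x_err, z_err = np.abs(x_pred - x_gt), np.abs(z_pred - z_gt)
    return (x_err[near].mean(), x_err[far].mean(),
            z_err[near].mean(), z_err[far].mean())

# e.g. with this repo's anchors: y_steps = np.linspace(3, 103, 20)
```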
### Apollo

Please kindly refer to our paper for the performance on other scenes.

| Scene | Models | F1 | AP | X error (m) <br> near \| far | Z error (m) <br> near \| far |
| :- | :- | :- | :- | :- | :- |
| Balanced Scene | 3DLaneNet | 86.4 | 89.3 | 0.068 \| 0.477 | 0.015 \| 0.202 |
| | GenLaneNet | 88.1 | 90.1 | 0.061 \| 0.496 | 0.012 \| 0.214 |
| | CLGo | 91.9 | 94.2 | 0.061 \| 0.361 | 0.029 \| 0.250 |
| | PersFormer | 92.9 | - | 0.054 \| 0.356 | 0.010 \| 0.234 |
| | GP | 91.9 | 93.8 | 0.049 \| 0.387 | 0.008 \| 0.213 |
| | CurveFormer | 95.8 | 97.3 | 0.078 \| 0.326 | 0.018 \| 0.219 |
| | **LATR-Lite** | 96.5 | 97.8 | 0.035 \| 0.283 | 0.012 \| 0.209 |
| | **LATR** | 96.8 | 97.9 | 0.022 \| 0.253 | 0.007 \| 0.202 |
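The ONCE benchmark below reports a CD error, a chamfer-distance-style measure in meters between matched lane point sets. As a rough illustration (a sketch assuming a standard symmetric chamfer distance; see `utils/eval_3D_once.py` for the benchmark's exact matching and unit conventions):

```python
import numpy as np

def chamfer_distance(pred_pts, gt_pts):
    """Symmetric chamfer distance between (N, 3) and (M, 3) lane point sets."""
    # pairwise (N, M) Euclidean distances between the two point sets
    d = np.linalg.norm(pred_pts[:, None, :] - gt_pts[None, :, :], axis=-1)
    # average the nearest-neighbor distance in both directions
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())
```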
### ONCE

| Method | F1 | Precision(%) | Recall(%) | CD error(m) |
| :- | :- | :- | :- | :- |
| 3DLaneNet | 44.73 | 61.46 | 35.16 | 0.127 |
| GenLaneNet | 45.59 | 63.95 | 35.42 | 0.121 |
| SALAD | 64.07 | 75.90 | 55.42 | 0.098 |
| PersFormer | 72.07 | 77.82 | 67.11 | 0.086 |
| **LATR** | 80.59 | 86.12 | 75.73 | 0.052 |

## Acknowledgment

This library is inspired by [OpenLane](https://github.com/OpenDriveLab/PersFormer_3DLane), [GenLaneNet](https://github.com/yuliangguo/Pytorch_Generalized_3D_Lane_Detection), [mmdetection3d](https://github.com/open-mmlab/mmdetection3d), [SparseInst](https://github.com/hustvl/SparseInst), [ONCE](https://github.com/once-3dlanes/once_3dlanes_benchmark) and many other related works; we thank them for sharing their code and datasets.

## Citation

If you find LATR useful for your research, please consider citing the paper:

```tex
@article{luo2023latr,
  title={LATR: 3D Lane Detection from Monocular Images with Transformer},
  author={Luo, Yueru and Zheng, Chaoda and Yan, Xu and Kun, Tang and Zheng, Chao and Cui, Shuguang and Li, Zhen},
  journal={arXiv preprint arXiv:2308.04583},
  year={2023}
}
```

================================================
FILE: config/_base_/base_res101_bs16xep100.py
================================================
import os
import os.path as osp
import numpy as np

dataset_name = 'openlane'
dataset = '300'  # '300' | '1000'

# The path of dataset json files (annotations)
data_dir = './data/openlane/lane3d_300/'
# The path of dataset image files (images)
dataset_dir = './data/openlane/images/'
output_dir = dataset_name

org_h = 1280
org_w = 1920
crop_y = 0
ipm_h = 208
ipm_w = 128
resize_h = 360
resize_w = 480

mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

cam_height = 1.55
pitch = 3
fix_cam = False
pred_cam = False

model_name = 'LATR'
weight_init = 'normal'
mod = None
position_embedding = 'learned'

max_lanes = 20
num_category = 21
prob_th = 0.5
num_class = 21  # 1 bgd | 20 lanes

# top view
top_view_region = np.array([[-10, 103], [10, 103], [-10, 3], [10, 3]])
anchor_y_steps = np.linspace(3, 103, 25)
num_y_steps = len(anchor_y_steps)

# placeholder, not used
K = np.array([[1000., 0., 960.],
              [0., 1000., 640.],
              [0., 0., 1.]])

# persformer anchor
use_default_anchor = False

batch_size = 16
nepochs = 100
no_cuda = False
nworkers = 16
start_epoch = 0
channels_in = 3

# args input
test_mode = False  # 'store_true'  # TODO
evaluate = False  # TODO
resume = ''  # resume latest saved run.
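# ---------------------------------------------------------------------------
# Editor's note (illustration only, not part of the original config):
# data/Load_Data.py packs each GT lane into a flat anchor holding x, z and
# visibility values at every y step above, followed by a one-hot category:
#   anchor_dim = 3 * num_y_steps + num_category  # = 3 * 25 + 21 = 96
# ---------------------------------------------------------------------------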
# tensorboard no_tb = False # print & save print_freq = 50 save_freq = 50 # ddp setting dist = True sync_bn = True cudnn = True distributed = True local_rank = None #TODO gpu = 0 world_size = 1 nodes = 1 # for reload ckpt eval_ckpt = '' resume_from = '' output_dir = 'openlane' evaluate_case = '' eval_freq = 8 # eval freq during training save_json_path = None save_root = 'work_dirs' save_prefix = osp.join(os.getcwd(), save_root) save_path = osp.join(save_prefix, output_dir) ================================================ FILE: config/_base_/base_res101_bs16xep100_apollo.py ================================================ import os import os.path as osp import numpy as np # ========DATA SETTING======== # dataset_name = 'apollo' dataset = 'standard' data_dir = osp.join('./data/apollosyn_gen-lanenet/data_splits', dataset) dataset_dir = './data/apollosyn_gen-lanenet/Apollo_Sim_3D_Lane_Release' output_dir = 'apollo' rewrite_pred = True save_best = False output_dir = dataset_name org_h = 1080 org_w = 1920 crop_y = 0 cam_height = 1.55 pitch = 3 fix_cam = False pred_cam = False model_name = 'LATR' mod = None ipm_h = 208 ipm_w = 128 resize_h = 360 resize_w = 480 mean = [0.485, 0.456, 0.406] std = [0.229, 0.224, 0.225] K = np.array([[2015., 0., 960.], [0., 2015., 540.], [0., 0., 1.]]) position_embedding = 'learned' max_lanes = 6 num_category = 2 prob_th = 0.5 num_class = 2 # 1 bgd | 1 lanes batch_size = 16 nepochs = 210 nworkers = 16 # ddp setting dist = True sync_bn = True cudnn = True distributed = True local_rank = None #TODO gpu = 0 world_size = 1 nodes = 1 # for reload ckpt eval_ckpt = '' resume = '' # ckpt number as input resume_from = '' # ckpt path as input no_cuda = False # tensorboard no_tb = False start_epoch = 0 channels_in = 3 # args input test_mode = False # 'store_true' # TODO evaluate = False # TODO evaluate_case = '' # print & save print_freq = 50 save_freq = 50 eval_freq = 20 # eval freq during training # top view top_view_region = np.array([[-10, 103], [10, 103], [-10, 3], [10, 3]]) anchor_y_steps = np.linspace(3, 103, 25) num_y_steps = len(anchor_y_steps) save_path = None save_json_path = None ================================================ FILE: config/_base_/once_eval_config.json ================================================ {"side_range_l": -10, "side_range_h": 10, "fwd_range_l": 0, "fwd_range_h": 50, "height_range_l": 0, "height_range_h": 5, "res": 0.05, "lane_width_x": 30, "lane_width_y": 10, "iou_thresh": 0.3, "distance_thresh": 0.3, "process_num": 10, "score_l": 0.10, "score_h": 1, "score_step": 0.05, "exp_name": "evaluation"} ================================================ FILE: config/_base_/optimizer.py ================================================ # opt setting optimizer = 'adam' learning_rate = 2e-4 weight_decay = 0.001 lr_decay = False # TODO 'store_true' niter = 900 # num of iter at starting learning rate niter_decay = 400 # '# of iter to linearly decay learning rate to zero' lr_policy = 'cosine' gamma = 0.1 # multiplicative factor of learning rate decay lr_decay_iters = 10 # multiply by a gamma every lr_decay_iters iterations T_max = 8 # maximum number of iterations T_0 = 8 T_mult = 2 eta_min = 1e-5 # minimum learning rate clip_grad_norm = 35.0 # grad clipping loss_threshold = 1e5 ================================================ FILE: config/release_iccv/apollo_illu.py ================================================ import numpy as np from mmcv.utils import Config import os.path as osp _base_ = [ '../_base_/base_res101_bs16xep100_apollo.py', 
'../_base_/optimizer.py', ] mod = 'release_iccv/apollo_illu' mean = [0.485, 0.456, 0.406] std = [0.229, 0.224, 0.225] dataset_name = 'apollo' dataset = 'illus_chg' data_dir = osp.join('./data/apollosyn_gen-lanenet/data_splits', dataset) dataset_dir = './data/apollosyn_gen-lanenet/Apollo_Sim_3D_Lane_Release' output_dir = 'apollo' num_category = 2 max_lanes = 6 T_max = 30 eta_min = 1e-6 clip_grad_norm = 20 nepochs = 210 eval_freq = 1 h_org, w_org = 1080, 1920 batch_size = 8 nworkers = 10 pos_threshold = 0.5 top_view_region = np.array([ [-10, 103], [10, 103], [-10, 3], [10, 3]]) enlarge_length = 20 position_range = [ top_view_region[0][0] - enlarge_length, top_view_region[2][1] - enlarge_length, -5, top_view_region[1][0] + enlarge_length, top_view_region[0][1] + enlarge_length, 5.] anchor_y_steps = np.linspace(3, 103, 20) num_y_steps = len(anchor_y_steps) photo_aug = dict( brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18) _dim_ = 256 num_query = 12 num_pt_per_line = 20 latr_cfg = dict( fpn_dim = _dim_, num_query = num_query, num_group = 1, sparse_num_group = 4, encoder = dict( type='ResNet', depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=-1, norm_cfg=dict(type='BN2d', requires_grad=False), norm_eval=True, style='caffe', dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, False, True, True), init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50') ), neck = dict( type='FPN', in_channels=[512, 1024, 2048], out_channels=_dim_, start_level=0, add_extra_convs='on_output', num_outs=4, relu_before_extra_convs=True ), head=dict( xs_loss_weight=2.0, zs_loss_weight=10.0, vis_loss_weight=1.0, cls_loss_weight=10, project_loss_weight=1.0, pt_as_query=True, num_pt_per_line=num_pt_per_line, ), trans_params=dict(init_z=0, bev_h=150, bev_w=70), ) ms2one=dict( type='DilateNaive', inc=_dim_, outc=_dim_, num_scales=4, dilations=(1, 2, 5, 9)) transformer=dict( type='LATRTransformer', decoder=dict( type='LATRTransformerDecoder', embed_dims=_dim_, num_layers=6, enlarge_length=enlarge_length, M_decay_ratio=1, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, transformerlayers=dict( type='LATRDecoderLayer', attn_cfgs=[ dict( type='MultiheadAttention', embed_dims=_dim_, num_heads=4, dropout=0.1), dict( type='MSDeformableAttention3D', embed_dims=_dim_, num_heads=4, num_levels=1, num_points=8, batch_first=False, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, dropout=0.1), ], ffn_cfgs=dict( type='FFN', embed_dims=_dim_, feedforward_channels=_dim_*8, num_fcs=2, ffn_drop=0.1, act_cfg=dict(type='ReLU', inplace=True), ), feedforward_channels=_dim_ * 8, operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')), )) sparse_ins_decoder=Config( dict( encoder=dict( out_dims=_dim_), decoder=dict( num_query=latr_cfg['num_query'], num_group=latr_cfg['num_group'], sparse_num_group=latr_cfg['sparse_num_group'], hidden_dim=_dim_, kernel_dim=_dim_, num_classes=num_category, num_convs=4, output_iam=True, scale_factor=1., ce_weight=2.0, mask_weight=5.0, dice_weight=2.0, objectness_weight=1.0, ), sparse_decoder_weight=5.0, )) resize_h = 720 resize_w = 960 optimizer_cfg = dict( type='AdamW', lr=2e-4, weight_decay=0.01) ================================================ FILE: config/release_iccv/apollo_rare.py ================================================ import numpy as np from mmcv.utils import Config import os.path as osp _base_ = [ 
'../_base_/base_res101_bs16xep100_apollo.py', '../_base_/optimizer.py', ] mod = 'release_iccv/apollo_rare' mean = [0.485, 0.456, 0.406] std = [0.229, 0.224, 0.225] dataset_name = 'apollo' dataset = 'rare_subset' data_dir = osp.join('./data/apollosyn_gen-lanenet/data_splits', dataset) dataset_dir = './data/apollosyn_gen-lanenet/Apollo_Sim_3D_Lane_Release' output_dir = 'apollo' num_category = 2 max_lanes = 6 T_max = 30 eta_min = 1e-8 clip_grad_norm = 20 nepochs = 210 eval_freq = 1 h_org, w_org = 1080, 1920 batch_size = 8 nworkers = 10 pos_threshold = 0.5 top_view_region = np.array([ [-10, 103], [10, 103], [-10, 3], [10, 3]]) enlarge_length = 20 position_range = [ top_view_region[0][0] - enlarge_length, top_view_region[2][1] - enlarge_length, -5, top_view_region[1][0] + enlarge_length, top_view_region[0][1] + enlarge_length, 5.] anchor_y_steps = np.linspace(3, 103, 20) num_y_steps = len(anchor_y_steps) photo_aug = dict( brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18) _dim_ = 256 num_query = 12 num_pt_per_line = 20 latr_cfg = dict( fpn_dim = _dim_, num_query = num_query, num_group = 1, sparse_num_group = 4, encoder = dict( type='ResNet', depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=-1, norm_cfg=dict(type='BN2d', requires_grad=False), norm_eval=True, style='caffe', dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, False, True, True), init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50') ), neck = dict( type='FPN', in_channels=[512, 1024, 2048], out_channels=_dim_, start_level=0, add_extra_convs='on_output', num_outs=4, relu_before_extra_convs=True ), head=dict( xs_loss_weight=2.0, zs_loss_weight=10.0, vis_loss_weight=1.0, cls_loss_weight=10, project_loss_weight=1.0, pt_as_query=True, num_pt_per_line=num_pt_per_line, ), trans_params=dict(init_z=0, bev_h=150, bev_w=70), ) ms2one=dict( type='DilateNaive', inc=_dim_, outc=_dim_, num_scales=4, dilations=(1, 2, 5, 9)) transformer=dict( type='LATRTransformer', decoder=dict( type='LATRTransformerDecoder', embed_dims=_dim_, num_layers=6, enlarge_length=enlarge_length, M_decay_ratio=1, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, transformerlayers=dict( type='LATRDecoderLayer', attn_cfgs=[ dict( type='MultiheadAttention', embed_dims=_dim_, num_heads=4, dropout=0.1), dict( type='MSDeformableAttention3D', embed_dims=_dim_, num_heads=4, num_levels=1, num_points=8, batch_first=False, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, dropout=0.1), ], ffn_cfgs=dict( type='FFN', embed_dims=_dim_, feedforward_channels=_dim_*8, num_fcs=2, ffn_drop=0.1, act_cfg=dict(type='ReLU', inplace=True), ), feedforward_channels=_dim_ * 8, operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')), )) sparse_ins_decoder=Config( dict( encoder=dict( out_dims=_dim_), decoder=dict( num_query=latr_cfg['num_query'], num_group=latr_cfg['num_group'], sparse_num_group=latr_cfg['sparse_num_group'], hidden_dim=_dim_, kernel_dim=_dim_, num_classes=num_category, num_convs=4, output_iam=True, scale_factor=1., ce_weight=2.0, mask_weight=5.0, dice_weight=2.0, objectness_weight=1.0, ), sparse_decoder_weight=5.0, )) resize_h = 720 resize_w = 960 optimizer_cfg = dict( type='AdamW', lr=2e-4, weight_decay=0.01) ================================================ FILE: config/release_iccv/apollo_standard.py ================================================ import numpy as np from 
mmcv.utils import Config import os.path as osp _base_ = [ '../_base_/base_res101_bs16xep100_apollo.py', '../_base_/optimizer.py', ] mod = 'release_iccv/apollo_standard' mean = [0.485, 0.456, 0.406] std = [0.229, 0.224, 0.225] dataset_name = 'apollo' dataset = 'standard' data_dir = osp.join('./data/apollosyn_gen-lanenet/data_splits', dataset) dataset_dir = './data/apollosyn_gen-lanenet/Apollo_Sim_3D_Lane_Release' output_dir = 'apollo' num_category = 2 max_lanes = 6 T_max = 30 eta_min = 1e-6 clip_grad_norm = 20 nepochs = 210 eval_freq = 1 h_org, w_org = 1080, 1920 batch_size = 8 nworkers = 10 pos_threshold = 0.3 top_view_region = np.array([ [-10, 103], [10, 103], [-10, 3], [10, 3]]) enlarge_length = 20 position_range = [ top_view_region[0][0] - enlarge_length, top_view_region[2][1] - enlarge_length, -5, top_view_region[1][0] + enlarge_length, top_view_region[0][1] + enlarge_length, 5.] anchor_y_steps = np.linspace(3, 103, 20) num_y_steps = len(anchor_y_steps) _dim_ = 256 num_query = 12 num_pt_per_line = 20 latr_cfg = dict( fpn_dim = _dim_, num_query = num_query, num_group = 1, sparse_num_group = 4, encoder = dict( type='ResNet', depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=-1, norm_cfg=dict(type='BN2d', requires_grad=False), norm_eval=True, style='caffe', dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, False, True, True), init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50') ), neck = dict( type='FPN', in_channels=[512, 1024, 2048], out_channels=_dim_, start_level=0, add_extra_convs='on_output', num_outs=4, relu_before_extra_convs=True ), head=dict( xs_loss_weight=2.0, zs_loss_weight=10.0, vis_loss_weight=1.0, cls_loss_weight=10, project_loss_weight=1.0, pt_as_query=True, num_pt_per_line=num_pt_per_line, ), trans_params=dict(init_z=0, bev_h=150, bev_w=70), ) ms2one=dict( type='DilateNaive', inc=_dim_, outc=_dim_, num_scales=4, dilations=(1, 2, 5, 9)) transformer=dict( type='LATRTransformer', decoder=dict( type='LATRTransformerDecoder', embed_dims=_dim_, num_layers=6, enlarge_length=enlarge_length, M_decay_ratio=1, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, transformerlayers=dict( type='LATRDecoderLayer', attn_cfgs=[ dict( type='MultiheadAttention', embed_dims=_dim_, num_heads=4, dropout=0.1), dict( type='MSDeformableAttention3D', embed_dims=_dim_, num_heads=4, num_levels=1, num_points=8, batch_first=False, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, dropout=0.1), ], ffn_cfgs=dict( type='FFN', embed_dims=_dim_, feedforward_channels=_dim_*8, num_fcs=2, ffn_drop=0.1, act_cfg=dict(type='ReLU', inplace=True), ), feedforward_channels=_dim_ * 8, operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')), )) sparse_ins_decoder=Config( dict( encoder=dict( out_dims=_dim_), decoder=dict( num_query=latr_cfg['num_query'], num_group=latr_cfg['num_group'], sparse_num_group=latr_cfg['sparse_num_group'], hidden_dim=_dim_, kernel_dim=_dim_, num_classes=num_category, num_convs=4, output_iam=True, scale_factor=1., ce_weight=2.0, mask_weight=5.0, dice_weight=2.0, objectness_weight=1.0, ), sparse_decoder_weight=5.0, )) resize_h = 720 resize_w = 960 optimizer_cfg = dict( type='AdamW', lr=2e-4, weight_decay=0.01) ================================================ FILE: config/release_iccv/latr_1000_baseline.py ================================================ import numpy as np from mmcv.utils import Config _base_ = [ 
'../_base_/base_res101_bs16xep100.py', '../_base_/optimizer.py' ] mod = 'release_iccv/latr_1000_baseline' mean = [0.485, 0.456, 0.406] std = [0.229, 0.224, 0.225] dataset = '1000' dataset_dir = './data/openlane/images/' data_dir = './data/openlane/lane3d_1000/' batch_size = 8 nworkers = 10 num_category = 21 pos_threshold = 0.3 top_view_region = np.array([ [-10, 103], [10, 103], [-10, 3], [10, 3]]) enlarge_length = 20 position_range = [ top_view_region[0][0] - enlarge_length, top_view_region[2][1] - enlarge_length, -5, top_view_region[1][0] + enlarge_length, top_view_region[0][1] + enlarge_length, 5.] anchor_y_steps = np.linspace(3, 103, 20) num_y_steps = len(anchor_y_steps) # extra aug photo_aug = dict( brightness_delta=32 // 2, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=9) clip_grad_norm = 20.0 _dim_ = 256 num_query = 40 num_pt_per_line = 20 latr_cfg = dict( fpn_dim = _dim_, num_query = num_query, num_group = 1, sparse_num_group = 4, encoder = dict( type='ResNet', depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=1, norm_cfg=dict(type='BN2d', requires_grad=False), norm_eval=True, style='caffe', dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, False, True, True), init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50') ), neck = dict( type='FPN', in_channels=[512, 1024, 2048], out_channels=_dim_, start_level=0, add_extra_convs='on_output', num_outs=4, relu_before_extra_convs=True ), head=dict( pt_as_query=True, num_pt_per_line=num_pt_per_line, xs_loss_weight=2.0, zs_loss_weight=10.0, vis_loss_weight=1.0, cls_loss_weight=10, project_loss_weight=1.0, ), trans_params=dict(init_z=0, bev_h=150, bev_w=70), ) ms2one=dict( type='DilateNaive', inc=_dim_, outc=_dim_, num_scales=4, dilations=(1, 2, 5, 9)) transformer=dict( type='LATRTransformer', decoder=dict( type='LATRTransformerDecoder', embed_dims=_dim_, num_layers=6, enlarge_length=enlarge_length, M_decay_ratio=1, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, transformerlayers=dict( type='LATRDecoderLayer', attn_cfgs=[ dict( type='MultiheadAttention', embed_dims=_dim_, num_heads=4, dropout=0.1), dict( type='MSDeformableAttention3D', embed_dims=_dim_, num_heads=4, num_levels=1, num_points=8, batch_first=False, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, dropout=0.1), ], ffn_cfgs=dict( type='FFN', embed_dims=_dim_, feedforward_channels=_dim_*8, num_fcs=2, ffn_drop=0.1, act_cfg=dict(type='ReLU', inplace=True), ), feedforward_channels=_dim_ * 8, operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')), )) sparse_ins_decoder=Config( dict( encoder=dict( out_dims=_dim_), decoder=dict( num_query=latr_cfg['num_query'], num_group=latr_cfg['num_group'], sparse_num_group=latr_cfg['sparse_num_group'], hidden_dim=_dim_, kernel_dim=_dim_, num_classes=num_category, num_convs=4, output_iam=True, scale_factor=1., ce_weight=2.0, mask_weight=5.0, dice_weight=2.0, objectness_weight=1.0, ), sparse_decoder_weight=5.0, )) nepochs = 24 resize_h = 720 resize_w = 960 eval_freq = 8 optimizer_cfg = dict( type='AdamW', lr=2e-4, paramwise_cfg=dict( custom_keys={ 'sampling_offsets': dict(lr_mult=0.1), }), weight_decay=0.01) ================================================ FILE: config/release_iccv/latr_1000_baseline_lite.py ================================================ import numpy as np from mmcv.utils import Config _base_ = [ '../_base_/base_res101_bs16xep100.py', 
'../_base_/optimizer.py' ] mod = 'release_iccv/latr_1000_baseline_lite' mean = [0.485, 0.456, 0.406] std = [0.229, 0.224, 0.225] dataset = '1000' dataset_dir = './data/openlane/images/' data_dir = './data/openlane/lane3d_1000/' batch_size = 8 nworkers = 10 num_category = 21 pos_threshold = 0.3 clip_grad_norm = 20 top_view_region = np.array([ [-10, 103], [10, 103], [-10, 3], [10, 3]]) enlarge_length = 20 position_range = [ top_view_region[0][0] - enlarge_length, top_view_region[2][1] - enlarge_length, -5, top_view_region[1][0] + enlarge_length, top_view_region[0][1] + enlarge_length, 5.] anchor_y_steps = np.linspace(3, 103, 20) num_y_steps = len(anchor_y_steps) # extra aug photo_aug = dict( brightness_delta=32 // 2, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=9) _dim_ = 256 num_query = 40 num_pt_per_line = 20 latr_cfg = dict( fpn_dim = _dim_, num_query = num_query, num_group = 1, sparse_num_group = 4, encoder = dict( type='ResNet', depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=1, norm_cfg=dict(type='BN2d', requires_grad=False), norm_eval=True, style='caffe', dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, False, True, True), init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50') ), neck = dict( type='FPN', in_channels=[512, 1024, 2048], out_channels=_dim_, start_level=0, add_extra_convs='on_output', num_outs=4, relu_before_extra_convs=True ), head=dict( pt_as_query=True, num_pt_per_line=num_pt_per_line, xs_loss_weight=2.0, zs_loss_weight=10.0, vis_loss_weight=1.0, cls_loss_weight=10, project_loss_weight=1.0, ), trans_params=dict(init_z=0, bev_h=150, bev_w=70), ) ms2one=dict( type='DilateNaive', inc=_dim_, outc=_dim_, num_scales=4, dilations=(1, 2, 5, 9)) transformer=dict( type='LATRTransformer', decoder=dict( type='LATRTransformerDecoder', embed_dims=_dim_, num_layers=2, enlarge_length=enlarge_length, M_decay_ratio=1, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, transformerlayers=dict( type='LATRDecoderLayer', attn_cfgs=[ dict( type='MultiheadAttention', embed_dims=_dim_, num_heads=4, dropout=0.1), dict( type='MSDeformableAttention3D', embed_dims=_dim_, num_heads=4, num_levels=1, num_points=8, batch_first=False, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, dropout=0.1), ], ffn_cfgs=dict( type='FFN', embed_dims=_dim_, feedforward_channels=_dim_*8, num_fcs=2, ffn_drop=0.1, act_cfg=dict(type='ReLU', inplace=True), ), feedforward_channels=_dim_ * 8, operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')), )) sparse_ins_decoder=Config( dict( encoder=dict( out_dims=_dim_), decoder=dict( num_query=latr_cfg['num_query'], num_group=latr_cfg['num_group'], sparse_num_group=latr_cfg['sparse_num_group'], hidden_dim=_dim_, kernel_dim=_dim_, num_classes=num_category, num_convs=4, output_iam=True, scale_factor=1., ce_weight=2.0, mask_weight=5.0, dice_weight=2.0, objectness_weight=1.0, ), sparse_decoder_weight=5.0, )) resize_h = 720 resize_w = 960 nepochs = 24 eval_freq = 8 optimizer_cfg = dict( type='AdamW', lr=2e-4, paramwise_cfg=dict( custom_keys={ 'sampling_offsets': dict(lr_mult=0.1), }), weight_decay=0.01) ================================================ FILE: config/release_iccv/once.py ================================================ import numpy as np from mmcv.utils import Config import os.path as osp _base_ = [ '../_base_/base_res101_bs16xep100.py', '../_base_/optimizer.py' ] mod = 
'release_iccv/once' mean = [0.485, 0.456, 0.406] std = [0.229, 0.224, 0.225] dataset = 'once' dataset_name = 'once' data_dir = 'data/once/' dataset_dir = 'data/once/data/' eval_config_dir = 'config/_base_/once_eval_config.json' save_path = osp.join('./work_dirs', dataset) max_lanes = 8 num_pt_per_line = 20 eta_min = 1e-6 clip_grad_norm = 20 batch_size = 8 nworkers = 10 num_category = 2 pos_threshold = 0.3 top_view_region = np.array([ [-10, 65], [10, 65], [-10, 0.5], [10, 0.5]]) enlarge_length = 10 position_range = [ top_view_region[0][0] - enlarge_length, top_view_region[2][1] - enlarge_length, -5, top_view_region[1][0] + enlarge_length, top_view_region[0][1] + enlarge_length, 5.] anchor_y_steps = np.linspace(0.5, 65, num_pt_per_line) num_y_steps = len(anchor_y_steps) _dim_ = 256 num_query = 12 num_pt_per_line = 20 latr_cfg = dict( fpn_dim = _dim_, num_query = num_query, num_group = 1, sparse_num_group = 4, encoder = dict( type='ResNet', depth=50, num_stages=4, out_indices=(1, 2, 3), frozen_stages=1, norm_cfg=dict(type='BN2d', requires_grad=False), norm_eval=True, style='caffe', # with_cp=True, dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False), stage_with_dcn=(False, False, True, True), init_cfg=dict(type='Pretrained', checkpoint='torchvision://resnet50') ), neck = dict( type='FPN', in_channels=[512, 1024, 2048], out_channels=_dim_, start_level=0, add_extra_convs='on_output', num_outs=4, relu_before_extra_convs=True ), head=dict( xs_loss_weight=2.0, zs_loss_weight=10.0, vis_loss_weight=1.0, cls_loss_weight=10, project_loss_weight=1.0, pt_as_query=True, num_pt_per_line=num_pt_per_line, ), trans_params=dict(init_z=0, bev_h=150, bev_w=70), ) ms2one=dict( type='DilateNaive', inc=_dim_, outc=_dim_, num_scales=4, dilations=(1, 2, 5, 9)) transformer=dict( type='LATRTransformer', decoder=dict( type='LATRTransformerDecoder', embed_dims=_dim_, num_layers=6, enlarge_length=enlarge_length, M_decay_ratio=1, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, transformerlayers=dict( type='LATRDecoderLayer', attn_cfgs=[ dict( type='MultiheadAttention', embed_dims=_dim_, num_heads=4, dropout=0.1), dict( type='MSDeformableAttention3D', embed_dims=_dim_, num_heads=4, num_levels=1, num_points=8, batch_first=False, num_query=num_query, num_anchor_per_query=num_pt_per_line, anchor_y_steps=anchor_y_steps, dropout=0.1), ], ffn_cfgs=dict( type='FFN', embed_dims=_dim_, feedforward_channels=_dim_*8, num_fcs=2, ffn_drop=0.1, act_cfg=dict(type='ReLU', inplace=True), ), feedforward_channels=_dim_ * 8, operation_order=('self_attn', 'norm', 'cross_attn', 'norm', 'ffn', 'norm')), )) sparse_ins_decoder=Config( dict( encoder=dict( out_dims=_dim_), decoder=dict( num_query=latr_cfg['num_query'], num_group=latr_cfg['num_group'], sparse_num_group=latr_cfg['sparse_num_group'], hidden_dim=_dim_, kernel_dim=_dim_, num_classes=num_category, num_convs=4, output_iam=True, scale_factor=1., ce_weight=2.0, mask_weight=5.0, dice_weight=2.0, objectness_weight=1.0, ), sparse_decoder_weight=5.0, )) resize_h = 720 resize_w = 960 nepochs = 24 eval_freq = 8 optimizer_cfg = dict( type='AdamW', lr=2e-4, paramwise_cfg=dict( custom_keys={ 'sampling_offsets': dict(lr_mult=0.1), }), weight_decay=0.01) ================================================ FILE: data/Load_Data.py ================================================ import re import os import sys import copy import json import glob import random import warnings import numpy as np import cv2 from PIL import Image from torch.utils.data import 
Dataset, DataLoader from torchvision import transforms import torchvision.transforms.functional as F from torchvision.transforms import InterpolationMode from utils.utils import * from experiments.gpu_utils import is_main_process from .transform import PhotoMetricDistortionMultiViewImage sys.path.append('./') warnings.simplefilter('ignore', np.RankWarning) matplotlib.use('Agg') import yaml class LaneDataset(Dataset): """ Dataset with labeled lanes This implementation considers: w/o laneline 3D attributes w/o centerline annotations default considers 3D laneline, including centerlines This new version of data loader prepare ground-truth anchor tensor in flat ground space. It is assumed the dataset provides accurate visibility labels. Preparing ground-truth tensor depends on it. """ # dataset_base_dir is image path, json_file_path is json file path, def __init__(self, dataset_base_dir, json_file_path, args, data_aug=False): """ :param dataset_info_file: json file list """ self.totensor = transforms.ToTensor() mean = [0.485, 0.456, 0.406] if args.mean is None else args.mean std = [0.229, 0.224, 0.225] if args.std is None else args.std self.normalize = transforms.Normalize(mean, std) self.data_aug = data_aug if data_aug: if hasattr(args, 'photo_aug'): self.photo_aug = PhotoMetricDistortionMultiViewImage(**args.photo_aug) else: self.photo_aug = False self.dataset_base_dir = dataset_base_dir self.json_file_path = json_file_path # dataset parameters self.dataset_name = args.dataset_name self.num_category = args.num_category self.h_org = args.org_h self.w_org = args.org_w self.h_crop = args.crop_y # parameters related to service network self.h_net = args.resize_h self.w_net = args.resize_w self.u_ratio = float(self.w_net) / float(self.w_org) self.v_ratio = float(self.h_net) / float(self.h_org - self.h_crop) self.top_view_region = args.top_view_region self.max_lanes = args.max_lanes self.K = args.K self.H_crop = homography_crop_resize([args.org_h, args.org_w], args.crop_y, [args.resize_h, args.resize_w]) if args.fix_cam: self.fix_cam = True # compute the homography between image and IPM, and crop transformation self.cam_height = args.cam_height self.cam_pitch = np.pi / 180 * args.pitch self.P_g2im = projection_g2im(self.cam_pitch, self.cam_height, args.K) else: self.fix_cam = False # compute anchor steps self.use_default_anchor = args.use_default_anchor self.x_min, self.x_max = self.top_view_region[0, 0], self.top_view_region[1, 0] self.y_min, self.y_max = self.top_view_region[2, 1], self.top_view_region[0, 1] self.anchor_y_steps = args.anchor_y_steps self.num_y_steps = len(self.anchor_y_steps) self.anchor_y_steps_dense = args.get( 'anchor_y_steps_dense', np.linspace(3, 103, 200)) args.anchor_y_steps_dense = self.anchor_y_steps_dense self.num_y_steps_dense = len(self.anchor_y_steps_dense) self.anchor_dim = 3 * self.num_y_steps + args.num_category self.save_json_path = args.save_json_path # parse ground-truth file if 'openlane' in self.dataset_name: label_list = glob.glob(json_file_path + '**/*.json', recursive=True) self._label_list = label_list elif 'once' in self.dataset_name: label_list = glob.glob(json_file_path + '*/*/*.json', recursive=True) self._label_list = [] for js_label_file in label_list: if not os.path.getsize(js_label_file): continue image_path = map_once_json2img(js_label_file) if not os.path.exists(image_path): continue self._label_list.append(js_label_file) else: raise ValueError("to use ApolloDataset for apollo") if hasattr(self, '_label_list'): self.n_samples = 
len(self._label_list) else: self.n_samples = self._label_image_path.shape[0] def preprocess_data_from_json_once(self, idx_json_file): _label_image_path = None _label_cam_height = None _label_cam_pitch = None cam_extrinsics = None cam_intrinsics = None _label_laneline_org = None _gt_laneline_category_org = None image_path = map_once_json2img(idx_json_file) assert ops.exists(image_path), '{:s} not exist'.format(image_path) _label_image_path = image_path with open(idx_json_file, 'r') as file: file_lines = [line for line in file] if len(file_lines) != 0: info_dict = json.loads(file_lines[0]) else: print('Empty label_file:', idx_json_file) return if not self.fix_cam: cam_pitch = 0.3/180*np.pi cam_height = 1.5 cam_extrinsics = np.array([[np.cos(cam_pitch), 0, -np.sin(cam_pitch), 0], [0, 1, 0, 0], [np.sin(cam_pitch), 0, np.cos(cam_pitch), cam_height], [0, 0, 0, 1]], dtype=float) R_vg = np.array([[0, 1, 0], [-1, 0, 0], [0, 0, 1]], dtype=float) R_gc = np.array([[1, 0, 0], [0, 0, 1], [0, -1, 0]], dtype=float) cam_extrinsics[:3, :3] = np.matmul(np.matmul( np.matmul(np.linalg.inv(R_vg), cam_extrinsics[:3, :3]), R_vg), R_gc) cam_extrinsics[0:2, 3] = 0.0 gt_cam_height = cam_extrinsics[2, 3] gt_cam_pitch = 0 if 'calibration' in info_dict: cam_intrinsics = info_dict['calibration'] cam_intrinsics = np.array(cam_intrinsics) cam_intrinsics = cam_intrinsics[:, :3] else: cam_intrinsics = self.K _label_cam_height = gt_cam_height _label_cam_pitch = gt_cam_pitch gt_lanes_packed = info_dict['lanes'] gt_lane_pts, gt_lane_visibility, gt_laneline_category = [], [], [] for i, gt_lane_packed in enumerate(gt_lanes_packed): lane = np.array(gt_lane_packed).T # Coordinate convertion for openlane_300 data lane = np.vstack((lane, np.ones((1, lane.shape[1])))) lane = np.matmul(cam_extrinsics, lane) lane = lane[0:3, :].T lane = lane[lane[:,1].argsort()] #TODO:make y mono increase gt_lane_pts.append(lane) gt_lane_visibility.append(1.0) gt_laneline_category.append(1) _gt_laneline_category_org = copy.deepcopy(np.array(gt_laneline_category)) if not self.fix_cam: cam_K = cam_intrinsics if 'openlane' in self.dataset_name or 'once' in self.dataset_name: cam_E = cam_extrinsics P_g2im = projection_g2im_extrinsic(cam_E, cam_K) H_g2im = homograpthy_g2im_extrinsic(cam_E, cam_K) else: gt_cam_height = _label_cam_height gt_cam_pitch = _label_cam_pitch P_g2im = projection_g2im(gt_cam_pitch, gt_cam_height, cam_K) H_g2im = homograpthy_g2im(gt_cam_pitch, gt_cam_height, cam_K) H_im2g = np.linalg.inv(H_g2im) else: P_g2im = self.P_g2im H_im2g = self.H_im2g P_g2gflat = np.matmul(H_im2g, P_g2im) gt_lanes = gt_lane_pts gt_visibility = gt_lane_visibility gt_category = gt_laneline_category # prune gt lanes by visibility labels gt_lanes = [prune_3d_lane_by_visibility(gt_lane, gt_visibility[k]).squeeze(0) for k, gt_lane in enumerate(gt_lanes)] _label_laneline_org = copy.deepcopy(gt_lanes) return _label_image_path, _label_cam_height, _label_cam_pitch, \ cam_extrinsics, cam_intrinsics, \ _label_laneline_org, \ _gt_laneline_category_org, info_dict # _label_laneline, \ # _gt_laneline_visibility, _gt_laneline_category, \ def preprocess_data_from_json_openlane(self, idx_json_file): _label_image_path = None _label_cam_height = None _label_cam_pitch = None cam_extrinsics = None cam_intrinsics = None # _label_laneline = None _label_laneline_org = None # _gt_laneline_visibility = None # _gt_laneline_category = None _gt_laneline_category_org = None # _laneline_ass_id = None with open(idx_json_file, 'r') as file: file_lines = [line for line in file] info_dict = 
json.loads(file_lines[0]) image_path = ops.join(self.dataset_base_dir, info_dict['file_path']) assert ops.exists(image_path), '{:s} not exist'.format(image_path) _label_image_path = image_path if not self.fix_cam: cam_extrinsics = np.array(info_dict['extrinsic']) # Re-calculate extrinsic matrix based on ground coordinate R_vg = np.array([[0, 1, 0], [-1, 0, 0], [0, 0, 1]], dtype=float) R_gc = np.array([[1, 0, 0], [0, 0, 1], [0, -1, 0]], dtype=float) cam_extrinsics[:3, :3] = np.matmul(np.matmul( np.matmul(np.linalg.inv(R_vg), cam_extrinsics[:3, :3]), R_vg), R_gc) cam_extrinsics[0:2, 3] = 0.0 # gt_cam_height = info_dict['cam_height'] gt_cam_height = cam_extrinsics[2, 3] if 'cam_pitch' in info_dict: gt_cam_pitch = info_dict['cam_pitch'] else: gt_cam_pitch = 0 if 'intrinsic' in info_dict: cam_intrinsics = info_dict['intrinsic'] cam_intrinsics = np.array(cam_intrinsics) else: cam_intrinsics = self.K _label_cam_height = gt_cam_height _label_cam_pitch = gt_cam_pitch gt_lanes_packed = info_dict['lane_lines'] gt_lane_pts, gt_lane_visibility, gt_laneline_category = [], [], [] for i, gt_lane_packed in enumerate(gt_lanes_packed): # A GT lane can be either 2D or 3D # if a GT lane is 3D, the height is intact from 3D GT, so keep it intact here too lane = np.array(gt_lane_packed['xyz']) lane_visibility = np.array(gt_lane_packed['visibility']) # Coordinate convertion for openlane_300 data lane = np.vstack((lane, np.ones((1, lane.shape[1])))) cam_representation = np.linalg.inv( np.array([[0, 0, 1, 0], [-1, 0, 0, 0], [0, -1, 0, 0], [0, 0, 0, 1]], dtype=float)) # transformation from apollo camera to openlane camera lane = np.matmul(cam_extrinsics, np.matmul(cam_representation, lane)) lane = lane[0:3, :].T gt_lane_pts.append(lane) gt_lane_visibility.append(lane_visibility) if 'category' in gt_lane_packed: lane_cate = gt_lane_packed['category'] if lane_cate == 21: # merge left and right road edge into road edge lane_cate = 20 gt_laneline_category.append(lane_cate) else: gt_laneline_category.append(1) # _label_laneline_org = copy.deepcopy(gt_lane_pts) _gt_laneline_category_org = copy.deepcopy(np.array(gt_laneline_category)) gt_lanes = gt_lane_pts gt_visibility = gt_lane_visibility gt_category = gt_laneline_category # prune gt lanes by visibility labels gt_lanes = [prune_3d_lane_by_visibility(gt_lane, gt_visibility[k]) for k, gt_lane in enumerate(gt_lanes)] _label_laneline_org = copy.deepcopy(gt_lanes) return _label_image_path, _label_cam_height, _label_cam_pitch, \ cam_extrinsics, cam_intrinsics, \ _label_laneline_org, \ _gt_laneline_category_org, info_dict def __len__(self): """ Conventional len method """ return self.n_samples # new getitem, WIP def WIP__getitem__(self, idx): """ Args: idx (int): Index in list to load image """ extra_dict = {} idx_json_file = self._label_list[idx] # preprocess data from json file if 'openlane' in self.dataset_name: _label_image_path, _label_cam_height, _label_cam_pitch, \ cam_extrinsics, cam_intrinsics, \ _label_laneline_org, \ _gt_laneline_category_org, info_dict = self.preprocess_data_from_json_openlane(idx_json_file) elif 'once' in self.dataset_name: _label_image_path, _label_cam_height, _label_cam_pitch, \ cam_extrinsics, cam_intrinsics, \ _label_laneline_org, \ _gt_laneline_category_org, info_dict = self.preprocess_data_from_json_once(idx_json_file) # fetch camera height and pitch if not self.fix_cam: gt_cam_height = _label_cam_height gt_cam_pitch = _label_cam_pitch if 'openlane' in self.dataset_name or 'once' in self.dataset_name: intrinsics = cam_intrinsics extrinsics = 
cam_extrinsics else: # should not be used intrinsics = self.K extrinsics = np.zeros((3,4)) extrinsics[2,3] = gt_cam_height else: gt_cam_height = self.cam_height gt_cam_pitch = self.cam_pitch # should not be used intrinsics = self.K extrinsics = np.zeros((3,4)) extrinsics[2,3] = gt_cam_height img_name = _label_image_path with open(img_name, 'rb') as f: image = (Image.open(f).convert('RGB')) # image preprocess with crop and resize image = F.crop(image, self.h_crop, 0, self.h_org-self.h_crop, self.w_org) image = F.resize(image, size=(self.h_net, self.w_net), interpolation=InterpolationMode.BILINEAR) gt_category_2d = _gt_laneline_category_org if self.data_aug: img_rot, aug_mat = data_aug_rotate(image) if self.photo_aug: img_rot = self.photo_aug( dict(img=img_rot.copy().astype(np.float32)) )['img'] image = Image.fromarray( np.clip(img_rot, 0, 255).astype(np.uint8)) image = self.totensor(image).float() image = self.normalize(image) intrinsics = torch.from_numpy(intrinsics) extrinsics = torch.from_numpy(extrinsics) # prepare binary segmentation label map seg_label = np.zeros((self.h_net, self.w_net), dtype=np.int8) # seg idx has the same order as gt_lanes seg_idx_label = np.zeros((self.max_lanes, self.h_net, self.w_net), dtype=np.uint8) ground_lanes = np.zeros((self.max_lanes, self.anchor_dim), dtype=np.float32) ground_lanes_dense = np.zeros( (self.max_lanes, self.num_y_steps_dense * 3), dtype=np.float32) gt_lanes = _label_laneline_org # ground gt_laneline_img = [[0]] * len(gt_lanes) H_g2im, P_g2im, H_crop = self.transform_mats_impl(cam_extrinsics, \ cam_intrinsics, _label_cam_pitch, _label_cam_height) M = np.matmul(H_crop, P_g2im) # update transformation with image augmentation if self.data_aug: M = np.matmul(aug_mat, M) lidar2img = np.eye(4).astype(np.float32) lidar2img[:3] = M SEG_WIDTH = 80 thickness = int(SEG_WIDTH / 2650 * self.h_net / 2) for i, lane in enumerate(gt_lanes): if i >= self.max_lanes: break if lane.shape[0] <= 2: continue if _gt_laneline_category_org[i] >= self.num_category: continue xs, zs = resample_laneline_in_y(lane, self.anchor_y_steps) vis = np.logical_and( self.anchor_y_steps > lane[:, 1].min() - 5, self.anchor_y_steps < lane[:, 1].max() + 5) ground_lanes[i][0: self.num_y_steps] = xs ground_lanes[i][self.num_y_steps:2*self.num_y_steps] = zs ground_lanes[i][2*self.num_y_steps:3*self.num_y_steps] = vis * 1.0 ground_lanes[i][self.anchor_dim - self.num_category] = 0.0 ground_lanes[i][self.anchor_dim - self.num_category + _gt_laneline_category_org[i]] = 1.0 xs_dense, zs_dense = resample_laneline_in_y( lane, self.anchor_y_steps_dense) vis_dense = np.logical_and( self.anchor_y_steps_dense > lane[:, 1].min(), self.anchor_y_steps_dense < lane[:, 1].max()) ground_lanes_dense[i][0: self.num_y_steps_dense] = xs_dense ground_lanes_dense[i][1*self.num_y_steps_dense: 2*self.num_y_steps_dense] = zs_dense ground_lanes_dense[i][2*self.num_y_steps_dense: 3*self.num_y_steps_dense] = vis_dense * 1.0 x_2d, y_2d = projective_transformation(M, lane[:, 0], lane[:, 1], lane[:, 2]) gt_laneline_img[i] = np.array([x_2d, y_2d]).T.tolist() for j in range(len(x_2d) - 1): seg_label = cv2.line(seg_label, (int(x_2d[j]), int(y_2d[j])), (int(x_2d[j+1]), int(y_2d[j+1])), color=np.asscalar(np.array([1])), thickness=thickness) seg_idx_label[i] = cv2.line( seg_idx_label[i], (int(x_2d[j]), int(y_2d[j])), (int(x_2d[j+1]), int(y_2d[j+1])), color=gt_category_2d[i].item(), thickness=thickness) seg_label = torch.from_numpy(seg_label.astype(np.float32)) seg_label.unsqueeze_(0) extra_dict['seg_label'] = seg_label 
extra_dict['seg_idx_label'] = seg_idx_label extra_dict['ground_lanes'] = ground_lanes extra_dict['ground_lanes_dense'] = ground_lanes_dense extra_dict['lidar2img'] = lidar2img extra_dict['pad_shape'] = torch.Tensor(seg_idx_label.shape[-2:]).float() extra_dict['idx_json_file'] = idx_json_file extra_dict['image'] = image if self.data_aug: aug_mat = torch.from_numpy(aug_mat.astype(np.float32)) extra_dict['aug_mat'] = aug_mat return extra_dict # old getitem, workable def __getitem__(self, idx): """ Args: idx (int): Index in list to load image """ return self.WIP__getitem__(idx) def transform_mats_impl(self, cam_extrinsics, cam_intrinsics, cam_pitch, cam_height): if not self.fix_cam: if 'openlane' in self.dataset_name or 'once' in self.dataset_name: H_g2im = homograpthy_g2im_extrinsic(cam_extrinsics, cam_intrinsics) P_g2im = projection_g2im_extrinsic(cam_extrinsics, cam_intrinsics) else: H_g2im = homograpthy_g2im(cam_pitch, cam_height, self.K) P_g2im = projection_g2im(cam_pitch, cam_height, self.K) return H_g2im, P_g2im, self.H_crop else: return self.H_g2im, self.P_g2im, self.H_crop def make_lane_y_mono_inc(lane): """ Due to lose of height dim, projected lanes to flat ground plane may not have monotonically increasing y. This function trace the y with monotonically increasing y, and output a pruned lane :param lane: :return: """ idx2del = [] max_y = lane[0, 1] for i in range(1, lane.shape[0]): # hard-coded a smallest step, so the far-away near horizontal tail can be pruned if lane[i, 1] <= max_y + 3: idx2del.append(i) else: max_y = lane[i, 1] lane = np.delete(lane, idx2del, 0) return lane def data_aug_rotate(img): # assume img in PIL image format rot = random.uniform(-np.pi/18, np.pi/18) center_x = img.width / 2 center_y = img.height / 2 rot_mat = cv2.getRotationMatrix2D((center_x, center_y), rot, 1.0) img_rot = np.array(img) img_rot = cv2.warpAffine(img_rot, rot_mat, (img.width, img.height), flags=cv2.INTER_LINEAR) rot_mat = np.vstack([rot_mat, [0, 0, 1]]) return img_rot, rot_mat def seed_worker(worker_id): worker_seed = torch.initial_seed() % 2**32 np.random.seed(worker_seed) random.seed(worker_seed) def get_loader(transformed_dataset, args): """ create dataset from ground-truth return a batch sampler based ont the dataset """ # transformed_dataset = LaneDataset(dataset_base_dir, json_file_path, args) sample_idx = range(transformed_dataset.n_samples) g = torch.Generator() g.manual_seed(0) discarded_sample_start = len(sample_idx) // args.batch_size * args.batch_size if is_main_process(): print("Discarding images:") if hasattr(transformed_dataset, '_label_image_path'): print(transformed_dataset._label_image_path[discarded_sample_start: len(sample_idx)]) else: print(len(sample_idx) - discarded_sample_start) sample_idx = sample_idx[0 : discarded_sample_start] if args.dist: if is_main_process(): print('use distributed sampler') if 'standard' in args.dataset_name or 'rare_subset' in args.dataset_name or 'illus_chg' in args.dataset_name: data_sampler = torch.utils.data.distributed.DistributedSampler(transformed_dataset, shuffle=True, drop_last=True) data_loader = DataLoader(transformed_dataset, batch_size=args.batch_size, sampler=data_sampler, num_workers=args.nworkers, pin_memory=True, persistent_workers=args.nworkers > 0, worker_init_fn=seed_worker, generator=g, drop_last=True) else: data_sampler = torch.utils.data.distributed.DistributedSampler(transformed_dataset) data_loader = DataLoader(transformed_dataset, batch_size=args.batch_size, sampler=data_sampler, num_workers=args.nworkers, 
pin_memory=True, persistent_workers=args.nworkers > 0, worker_init_fn=seed_worker, generator=g) else: if is_main_process(): print("use default sampler") data_sampler = torch.utils.data.sampler.SubsetRandomSampler(sample_idx) data_loader = DataLoader(transformed_dataset, batch_size=args.batch_size, sampler=data_sampler, num_workers=args.nworkers, pin_memory=True, persistent_workers=args.nworkers > 0, worker_init_fn=seed_worker, generator=g) if args.dist: return data_loader, data_sampler return data_loader def map_once_json2img(json_label_file): if 'train' in json_label_file: split_name = 'train' elif 'val' in json_label_file: split_name = 'val' elif 'test' in json_label_file: split_name = 'test' else: raise ValueError("train/val/test not in the json path") image_path = json_label_file.replace(split_name, 'data').replace('.json', '.jpg') return image_path ================================================ FILE: data/__init__.py ================================================ ================================================ FILE: data/apollo_dataset.py ================================================ # ============================================================================== # Copyright (c) 2022 The PersFormer Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
# ============================================================================== import re import os import sys import copy import json import glob import random import pickle import warnings from pathlib import Path import numpy as np from numpy import int32#, result_type import cv2 from PIL import Image from torch.utils.data import Dataset, DataLoader from torchvision import transforms import torchvision.transforms.functional as F from torchvision.transforms import InterpolationMode from utils.utils import * sys.path.append('./') warnings.simplefilter('ignore', np.RankWarning) matplotlib.use('Agg') from .transform import PhotoMetricDistortionMultiViewImage from tqdm import tqdm class ApolloLaneDataset(Dataset): def __init__(self, dataset_base_dir, json_file_path, args, data_aug=False, **kwargs): # define image pre-processor self.totensor = transforms.ToTensor() # expect same mean/std for all torchvision models mean = [0.485, 0.456, 0.406] if args.mean is None else args.mean std = [0.229, 0.224, 0.225] if args.std is None else args.std self.normalize = transforms.Normalize(mean, std) self.data_aug = data_aug if data_aug: if hasattr(args, 'photo_aug'): self.photo_aug = PhotoMetricDistortionMultiViewImage(**args.photo_aug) else: self.photo_aug = False self.dataset_base_dir = dataset_base_dir self.json_file_path = json_file_path # dataset parameters self.dataset_name = args.dataset_name self.num_category = args.num_category self.h_org = args.org_h self.w_org = args.org_w self.h_crop = args.crop_y # parameters related to service network self.h_net = args.resize_h self.w_net = args.resize_w self.ipm_h = args.ipm_h self.ipm_w = args.ipm_w self.u_ratio = float(self.w_net) / float(self.w_org) self.v_ratio = float(self.h_net) / float(self.h_org - self.h_crop) self.top_view_region = args.top_view_region self.max_lanes = args.max_lanes self.K = args.K self.H_crop = homography_crop_resize([args.org_h, args.org_w], args.crop_y, [args.resize_h, args.resize_w]) self.fix_cam = False self.x_min, self.x_max = self.top_view_region[0, 0], self.top_view_region[1, 0] self.y_min, self.y_max = self.top_view_region[2, 1], self.top_view_region[0, 1] self.anchor_y_steps = args.anchor_y_steps self.num_y_steps = len(self.anchor_y_steps) self.anchor_y_steps_dense = args.get( 'anchor_y_steps_dense', np.linspace(3, 103, 200)) args.anchor_y_steps_dense = self.anchor_y_steps_dense self.num_y_steps_dense = len(self.anchor_y_steps_dense) self.anchor_dim = 3 * self.num_y_steps + args.num_category self.save_json_path = args.save_json_path # parse ground-truth file self.processed_info_dict = None self.label_list = self.gen_single_file_json() self.n_samples = len(self.label_list) self.processed_info_dict = self.init_dataset_3D(dataset_base_dir, json_file_path) def gen_single_file_json(self): gt_labels_json_dict = [json.loads(line) for line in open(self.json_file_path, 'r').readlines()] # e.g., xxx/standard/train json_save_dir = self.json_file_path.split('.json')[0] mkdir_if_missing(json_save_dir) label_list_path = self.json_file_path.rsplit('/', 1)[0] + '/%s_json_list.txt' % self.json_file_path.rsplit('/', 1)[-1].split('.json')[0] if os.path.isfile(label_list_path): with open(label_list_path, 'r') as f: label_list = f.readlines() label_list = list(map(lambda x: os.path.join(json_save_dir, x.strip()), label_list)) else: label_list = [] for single_info in tqdm(gt_labels_json_dict): img_p = Path(single_info['raw_file']) json_dir = os.path.join(json_save_dir, img_p.parent.name) mkdir_if_missing(json_dir) json_p = 
os.path.join(json_dir, img_p.stem + '.json') single_info['file_path'] = json_p.split(json_save_dir + '/')[-1] json.dump(single_info, open(json_p, 'w'), separators=(',', ': '), indent=4) label_list.append(json_p) with open(label_list_path, 'w') as f: for label_js in label_list: f.write(label_js) f.write('\n') return label_list def parse_processed_info_dict_apollo(self, idx): keys = self.processed_info_dict.keys() keys = list(keys) res = [] for k in keys: res.append(self.processed_info_dict[k][idx]) return res def __len__(self): """ Conventional len method """ return len(self.label_list) # new getitem, WIP def WIP__getitem__(self, idx): """ Args: idx (int): Index in list to load image """ # preprocess data from json file _label_image_path, _label_cam_height, _label_cam_pitch, \ cam_extrinsics, cam_intrinsics, \ _label_laneline, _label_laneline_org, \ _gt_laneline_visibility, _gt_laneline_category, \ _gt_laneline_category_org, gt_laneline_img = self.parse_processed_info_dict_apollo(idx) if not self.fix_cam: gt_cam_height = _label_cam_height gt_cam_pitch = _label_cam_pitch intrinsics = cam_intrinsics extrinsics = cam_extrinsics else: raise ValueError('check release with training, fix_cam=False') img_name = _label_image_path with open(img_name, 'rb') as f: image = (Image.open(f).convert('RGB')) # image preprocess with crop and resize image = F.crop(image, self.h_crop, 0, self.h_org-self.h_crop, self.w_org) image = F.resize(image, size=(self.h_net, self.w_net), interpolation=InterpolationMode.BILINEAR) gt_category_2d = _gt_laneline_category_org if self.data_aug: img_rot, aug_mat = data_aug_rotate(image) if self.photo_aug: img_rot = self.photo_aug( dict(img=img_rot.copy().astype(np.float32)) )['img'] image = Image.fromarray(np.clip(img_rot, 0, 255).astype(np.uint8)) image = self.totensor(image).float() image = self.normalize(image) gt_cam_height = torch.tensor(gt_cam_height, dtype=torch.float32) gt_cam_pitch = torch.tensor(gt_cam_pitch, dtype=torch.float32) intrinsics = torch.from_numpy(intrinsics) extrinsics = torch.from_numpy(extrinsics) # prepare binary segmentation label map seg_label = np.zeros((self.h_net, self.w_net), dtype=np.uint8) seg_idx_label = np.zeros((self.max_lanes, self.h_net, self.w_net), dtype=np.uint8) ground_lanes = np.zeros((self.max_lanes, self.anchor_dim), dtype=np.float32) ground_lanes_dense = np.zeros( (self.max_lanes, self.num_y_steps_dense * 3), dtype=np.float32) gt_lanes = _label_laneline_org H_g2im, P_g2im, H_crop = self.transform_mats_impl(_label_cam_pitch, _label_cam_height) M = np.matmul(H_crop, P_g2im) # update transformation with image augmentation if self.data_aug: M = np.matmul(aug_mat, M) lidar2img = np.eye(4).astype(np.float32) lidar2img[:3] = M SEG_WIDTH = 80 thickness_st = int(SEG_WIDTH / 2550 * self.h_net / 2) for i, lane in enumerate(gt_lanes): if i >= self.max_lanes: break # TODO remove this if lane.shape[0] < 2: continue if _gt_laneline_category_org[i] > self.num_category: continue xs, zs = resample_laneline_in_y(lane, self.anchor_y_steps) vis = np.logical_and( self.anchor_y_steps > lane[:, 1].min() - 5, self.anchor_y_steps < lane[:, 1].max() + 5) ground_lanes[i][0: self.num_y_steps] = xs ground_lanes[i][self.num_y_steps:2*self.num_y_steps] = zs ground_lanes[i][2*self.num_y_steps:3*self.num_y_steps] = vis * 1.0 ground_lanes[i][self.anchor_dim - self.num_category] = 0.0 ground_lanes[i][self.anchor_dim - self.num_category + 1] = 1.0 xs_dense, zs_dense = resample_laneline_in_y( lane, self.anchor_y_steps_dense) vis_dense = np.logical_and( 
self.anchor_y_steps_dense > lane[:, 1].min(), self.anchor_y_steps_dense < lane[:, 1].max()) ground_lanes_dense[i][0: self.num_y_steps_dense] = xs_dense ground_lanes_dense[i][1*self.num_y_steps_dense: 2*self.num_y_steps_dense] = zs_dense ground_lanes_dense[i][2*self.num_y_steps_dense: 3*self.num_y_steps_dense] = vis_dense * 1.0 x_2d, y_2d = projective_transformation(M, lane[:, 0], lane[:, 1], lane[:, 2]) for j in range(len(x_2d) - 1): # empirical setting. k = 2.7e-2 - ((2.5e-2 - 5e-5) / 600) * y_2d[j] thickness = max(round(thickness_st - k * (self.h_net - y_2d[j])), 2) if thickness >= 6: thickness += 1 seg_label = cv2.line(seg_label, (int(x_2d[j]), int(y_2d[j])), (int(x_2d[j+1]), int(y_2d[j+1])), color=1, thickness=thickness) seg_idx_label[i] = cv2.line(seg_idx_label[i], (int(x_2d[j]), int(y_2d[j])), (int(x_2d[j+1]), int(y_2d[j+1])), color=gt_category_2d[i].item(), thickness=thickness, lineType=cv2.LINE_AA ) seg_label = torch.from_numpy(seg_label.astype(np.float32)) seg_label.unsqueeze_(0) extra_dict = {} extra_dict['seg_label'] = seg_label extra_dict['seg_idx_label'] = seg_idx_label extra_dict['ground_lanes'] = ground_lanes extra_dict['ground_lanes_dense'] = ground_lanes_dense extra_dict['lidar2img'] = lidar2img extra_dict['pad_shape'] = torch.Tensor(seg_idx_label.shape[-2:]).float() extra_dict['idx_json_file'] = self.label_list[idx] extra_dict['image'] = image if self.data_aug: aug_mat = torch.from_numpy(aug_mat.astype(np.float32)) extra_dict['aug_mat'] = aug_mat extra_dict['cam_extrinsics'] = cam_extrinsics extra_dict['cam_intrinsics'] = cam_intrinsics return extra_dict # old getitem, workable def __getitem__(self, idx): """ Args: idx (int): Index in list to load image """ return self.WIP__getitem__(idx) def init_dataset_3D(self, dataset_base_dir, json_file_path): """ :param dataset_info_file: :return: image paths, labels in unormalized net input coordinates data processing: ground truth labels map are scaled wrt network input sizes """ # load image path, and lane pts label_image_path = [] gt_laneline_pts_all = [] gt_centerline_pts_all = [] gt_laneline_visibility_all = [] gt_centerline_visibility_all = [] gt_laneline_category_all = [] gt_cam_height_all = [] gt_cam_pitch_all = [] assert ops.exists(json_file_path), '{:s} not exist'.format(json_file_path) with open(json_file_path, 'r') as file: for idx, line in enumerate(file): info_dict = json.loads(line) # print('load json : %s | %s' % (idx, info_dict['raw_file'])) image_path = ops.join(dataset_base_dir, info_dict['raw_file']) assert ops.exists(image_path), '{:s} not exist'.format(image_path) label_image_path.append(image_path) gt_lane_pts = info_dict['laneLines'] gt_lane_visibility = info_dict['laneLines_visibility'] for i, lane in enumerate(gt_lane_pts): lane = np.array(lane) gt_lane_pts[i] = lane gt_lane_visibility[i] = np.array(gt_lane_visibility[i]) gt_laneline_pts_all.append(gt_lane_pts) gt_laneline_visibility_all.append(gt_lane_visibility) if 'category' in info_dict: gt_laneline_category = info_dict['category'] gt_laneline_category_all.append(np.array(gt_laneline_category, dtype=np.int32)) else: gt_laneline_category_all.append(np.ones(len(gt_lane_pts), dtype=np.int32)) if not self.fix_cam: gt_cam_height = info_dict['cam_height'] gt_cam_height_all.append(gt_cam_height) gt_cam_pitch = info_dict['cam_pitch'] gt_cam_pitch_all.append(gt_cam_pitch) label_image_path = np.array(label_image_path) gt_cam_height_all = np.array(gt_cam_height_all) gt_cam_pitch_all = np.array(gt_cam_pitch_all) gt_laneline_pts_all_org = 
copy.deepcopy(gt_laneline_pts_all) gt_laneline_category_all_org = copy.deepcopy(gt_laneline_category_all) visibility_all_flat = [] gt_laneline_im_all = [] gt_centerline_im_all = [] cam_extrinsics_all = [] cam_intrinsics_all = [] for idx in range(len(gt_laneline_pts_all)): # fetch camera height and pitch gt_cam_height = gt_cam_height_all[idx] gt_cam_pitch = gt_cam_pitch_all[idx] if not self.fix_cam: P_g2im = projection_g2im(gt_cam_pitch, gt_cam_height, self.K) H_g2im = homograpthy_g2im(gt_cam_pitch, gt_cam_height, self.K) H_im2g = np.linalg.inv(H_g2im) else: P_g2im = self.P_g2im H_im2g = self.H_im2g gt_lanes = gt_laneline_pts_all[idx] gt_visibility = gt_laneline_visibility_all[idx] # prune gt lanes by visibility labels gt_lanes = [prune_3d_lane_by_visibility(gt_lane, gt_visibility[k]) for k, gt_lane in enumerate(gt_lanes)] gt_laneline_pts_all_org[idx] = gt_lanes # project gt laneline to image plane gt_laneline_im = [] for gt_lane in gt_lanes: x_vals, y_vals = projective_transformation(P_g2im, gt_lane[:,0], gt_lane[:,1], gt_lane[:,2]) gt_laneline_im_oneline = np.array([x_vals, y_vals]).T.tolist() gt_laneline_im.append(gt_laneline_im_oneline) gt_laneline_im_all.append(gt_laneline_im) # generate ex/in from apollo cam_intrinsics = self.K cam_extrinsics = np.zeros((4,4)) cam_extrinsics[-1, -1] = 1 cam_extrinsics[2,3] = gt_cam_height cam_extrinsics_all.append(cam_extrinsics) cam_intrinsics_all.append(cam_intrinsics) visibility_all_flat = np.array(visibility_all_flat) processed_info_dict = {} processed_info_dict['label_json_path'] = label_image_path processed_info_dict['gt_cam_height_all'] = gt_cam_height_all processed_info_dict['gt_cam_pitch_all'] = gt_cam_pitch_all processed_info_dict['cam_extrinsics'] = cam_extrinsics_all processed_info_dict['cam_intrinsics'] = cam_intrinsics_all processed_info_dict['gt_laneline_pts_all'] = gt_laneline_pts_all processed_info_dict['gt_laneline_pts_all_org'] = gt_laneline_pts_all_org processed_info_dict['gt_laneline_visibility_all'] = gt_laneline_visibility_all processed_info_dict['gt_laneline_category_all'] = gt_laneline_category_all processed_info_dict['gt_laneline_category_all_org'] = gt_laneline_category_all_org processed_info_dict['gt_laneline_im_all'] = gt_laneline_im_all return processed_info_dict def transform_mats_impl(self, cam_pitch, cam_height): if not self.fix_cam: H_g2im = homograpthy_g2im(cam_pitch, cam_height, self.K) P_g2im = projection_g2im(cam_pitch, cam_height, self.K) return H_g2im, P_g2im, self.H_crop else: return self.H_g2im, self.P_g2im, self.H_crop def data_aug_rotate(img): # assume img in PIL image format rot = random.uniform(-np.pi/18, np.pi/18) center_x = img.width / 2 center_y = img.height / 2 rot_mat = cv2.getRotationMatrix2D((center_x, center_y), rot, 1.0) img_rot = np.array(img) img_rot = cv2.warpAffine(img_rot, rot_mat, (img.width, img.height), flags=cv2.INTER_LINEAR) rot_mat = np.vstack([rot_mat, [0, 0, 1]]) return img_rot, rot_mat def seed_worker(worker_id): worker_seed = torch.initial_seed() % 2**32 np.random.seed(worker_seed) random.seed(worker_seed) def get_loader(transformed_dataset, args): """ create dataset from ground-truth return a batch sampler based ont the dataset """ sample_idx = range(transformed_dataset.n_samples) g = torch.Generator() g.manual_seed(0) discarded_sample_start = len(sample_idx) // args.batch_size * args.batch_size if args.proc_id == 0: print("Discarding images:") if args.proc_id == 0: if hasattr(transformed_dataset, '_label_image_path'): 
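            # the tail of the sample list that cannot fill a complete batch is
            # dropped; report the discarded image paths when the dataset exposes them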
print(transformed_dataset._label_image_path[discarded_sample_start: len(sample_idx)]) else: print(len(sample_idx) - discarded_sample_start) sample_idx = sample_idx[0 : discarded_sample_start] if args.dist: if args.proc_id == 0: print('use distributed sampler') if 'standard' in args.dataset_name or 'rare_subset' in args.dataset_name or 'illus_chg' in args.dataset_name: data_sampler = torch.utils.data.distributed.DistributedSampler(transformed_dataset, shuffle=True, drop_last=True) data_loader = DataLoader(transformed_dataset, batch_size=args.batch_size, sampler=data_sampler, num_workers=args.nworkers, pin_memory=True, persistent_workers=True, worker_init_fn=seed_worker, generator=g, drop_last=True) else: data_sampler = torch.utils.data.distributed.DistributedSampler(transformed_dataset) data_loader = DataLoader(transformed_dataset, batch_size=args.batch_size, sampler=data_sampler, num_workers=args.nworkers, pin_memory=True, persistent_workers=True, worker_init_fn=seed_worker, generator=g) else: if args.proc_id == 0: print("use default sampler") data_sampler = torch.utils.data.sampler.SubsetRandomSampler(sample_idx) data_loader = DataLoader(transformed_dataset, batch_size=args.batch_size, sampler=data_sampler, num_workers=args.nworkers, pin_memory=True, persistent_workers=True, worker_init_fn=seed_worker, generator=g) if args.dist: return data_loader, data_sampler return data_loader ================================================ FILE: data/transform.py ================================================ import numpy as np import mmcv import torch import torch.nn.functional as F import PIL import random def get_random_state() -> np.random.RandomState: return np.random.RandomState(random.randint(0, (1 << 32) - 1)) def normal( loc=0.0, scale=1.0, size=None, random_state=None, ): if random_state is None: random_state = get_random_state() return random_state.normal(loc, scale, size) class PhotoMetricDistortionMultiViewImage: """Apply photometric distortion to image sequentially, every transformation is applied with a probability of 0.5. The position of random contrast is in second or second to last. 1. random brightness 2. random contrast (mode 0) 3. convert color from BGR to HSV 4. random saturation 5. random hue 6. convert color from HSV to BGR 7. random contrast (mode 1) 8. randomly swap channels Args: brightness_delta (int): delta of brightness. contrast_range (tuple): range of contrast. saturation_range (tuple): range of saturation. hue_delta (int): delta of hue. """ def __init__(self, brightness_delta=32, contrast_range=(0.5, 1.5), saturation_range=(0.5, 1.5), hue_delta=18): self.brightness_delta = brightness_delta self.contrast_lower, self.contrast_upper = contrast_range self.saturation_lower, self.saturation_upper = saturation_range self.hue_delta = hue_delta def __call__(self, results): """Call function to perform photometric distortion on images. Args: results (dict): Result dict from loading pipeline. Returns: dict: Result dict with images distorted. 
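
        Example (illustrative, with the default distortion parameters):
            >>> import numpy as np
            >>> aug = PhotoMetricDistortionMultiViewImage()
            >>> img = np.random.uniform(0, 255, (270, 480, 3)).astype(np.float32)
            >>> out = aug(dict(img=img))['img']  # same shape, randomly distorted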
""" imgs = results['img'] if not isinstance(imgs, list): imgs = [imgs] new_imgs = [] for img in imgs: assert img.dtype == np.float32, \ 'PhotoMetricDistortion needs the input image of dtype np.float32,'\ ' please set "to_float32=True" in "LoadImageFromFile" pipeline' # random brightness if np.random.randint(2): delta = np.random.uniform(-self.brightness_delta, self.brightness_delta) img += delta # mode == 0 --> do random contrast first # mode == 1 --> do random contrast last mode = np.random.randint(2) if mode == 1: if np.random.randint(2): alpha = np.random.uniform(self.contrast_lower, self.contrast_upper) img *= alpha # convert color from BGR to HSV img = mmcv.bgr2hsv(img) # random saturation if np.random.randint(2): img[..., 1] *= np.random.uniform(self.saturation_lower, self.saturation_upper) # random hue if np.random.randint(2): img[..., 0] += np.random.uniform(-self.hue_delta, self.hue_delta) img[..., 0][img[..., 0] > 360] -= 360 img[..., 0][img[..., 0] < 0] += 360 # convert color from HSV to BGR img = mmcv.hsv2bgr(img) # random contrast if mode == 0: if np.random.randint(2): alpha = np.random.uniform(self.contrast_lower, self.contrast_upper) img *= alpha # randomly swap channels if np.random.randint(2): img = img[..., np.random.permutation(3)] new_imgs.append(img) if not isinstance(results['img'], list): new_imgs = new_imgs[0] results['img'] = new_imgs return results ================================================ FILE: docs/data_preparation.md ================================================ # Data Preparation ## OpenLane Follow [OpenLane](https://github.com/OpenDriveLab/PersFormer_3DLane#dataset) to download dataset and then link it under `data` directory. ```bash cd data && mkdir openlane && cd openlane ln -s ${OPENLANE_PATH}/images . ln -s ${OPENLANE_PATH}/lane3d_1000 . ``` ## ONCE Follow [ONCE](https://github.com/once-3dlanes/once_3dlanes_benchmark#data-preparation) to download dataset, and then link it under `data` directory. ```bash cd data ln -s ${once} . ``` ## Apollo Follow [Apollo](https://github.com/yuliangguo/Pytorch_Generalized_3D_Lane_Detection#data-preparation) to download dataset and link it under `data` directory. ```bash cd data && mkdir apollosyn_gen-lanenet cd apollosyn_gen-lanenet ln -s ${Apollo_Sim_3D_Lane_Release} . ln -s ${data_splits} . ``` Your data directory should be like: ```bash |-- apollosyn_gen-lanenet |-- Apollo_Sim_3D_Lane_Release | |-- depth | |-- images | |-- img_list.txt | |-- labels | |-- laneline_label.json | `-- segmentation `-- data_splits |-- illus_chg |-- rare_subset `-- standard |-- Load_Data.py |-- __init__.py |-- apollo_dataset.py |-- once -> ${once} |-- annotation |-- data |-- data_check.py |-- list |-- raw_cam01 |-- raw_cam_multi |-- train `-- val |-- openlane | |-- images -> ${openlane}/images/ |-- training `-- validation | `-- lane3d_1000 -> ${openlane}/lane3d_1000/ |-- test |-- training `-- validation `-- transform.py ``` ================================================ FILE: docs/install.md ================================================ # Environment It is recommanded to build a new virtual environment. ## 1. Install pytorch and requirements. ```bash # first install pytorch conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cudatoolkit=10.1 -c pytorch # then clone LATR and change directory to it to install requirements cd ${LATR_PATH} python -m pip install -r requirements.txt ``` ## 2. 
Install mm packages ### 2.1 Install `mmcv` ```bash git clone https://github.com/open-mmlab/mmcv.git cd mmcv && git checkout v1.5.0 FORCE_CUDA=1 MMCV_WITH_OPS=1 python -m pip install . ``` ### 2.2 Install other mm packages Install [mmdet](https://github.com/open-mmlab/mmdetection) and [mmdet3d](https://github.com/open-mmlab/mmdetection3d). Note that we use `mmdet==2.24.0` and `mmdet3d==1.0.0rc3`. ================================================ FILE: docs/train_eval.md ================================================ # Training and Evaluation ## Train ### Openlane - Base version: ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node 4 main.py --config config/release_iccv/latr_1000_baseline.py ``` - lite version: ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node 4 main.py --config config/release_iccv/latr_1000_baseline_lite.py ``` ### ONCE ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node 4 main.py --config config/release_iccv/once.py ``` ### Apollo - Balanced Scene ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node 4 main.py --config config/release_iccv/apollo_standard.py ``` - Rare Subset ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node 4 main.py --config config/release_iccv/apollo_rare.py ``` - Visual Variations ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --master_port=29284 --nproc_per_node 4 main.py --config config/release_iccv/apollo_illu.py ``` ## Evaluation ### Openlane - Base version: ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node 4 main.py --config config/release_iccv/latr_1000_baseline.py --cfg-options evaluate=true eval_ckpt=pretrained_models/openlane.pth ``` - lite version: ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node 4 main.py --config config/release_iccv/latr_1000_baseline_lite.py --cfg-options evaluate=true eval_ckpt=pretrained_models/openlane_lite.pth ``` ### ONCE ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node 4 main.py --config config/release_iccv/once.py --cfg-options evaluate=true eval_ckpt=pretrained_models/once.pth ``` ### Apollo - Balanced Scene ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node 4 main.py --config config/release_iccv/apollo_standard.py --cfg-options evaluate=true eval_ckpt=pretrained_models/apollo_standard.pth ``` - Rare Subset ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --nproc_per_node 4 main.py --config config/release_iccv/apollo_rare.py --cfg-options evaluate=true eval_ckpt=pretrained_models/apollo_rare.pth ``` - Visual Variations ```bash CUDA_VISIBLE_DEVICES='0,1,2,3' python -m torch.distributed.launch --master_port=29284 --nproc_per_node 4 main.py --config config/release_iccv/apollo_illu.py --cfg-options evaluate=true eval_ckpt=pretrained_models/apollo_illu.pth ``` ================================================ FILE: experiments/__init__.py ================================================ ================================================ FILE: experiments/ddp.py ================================================ # ============================================================================== # Copyright (c) 2022 The PersFormer Authors. All Rights Reserved. 
# # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== import torch import torch.distributed as dist import torch.multiprocessing as mp import os import subprocess import numpy as np import random def setup_dist_launch(args): args.proc_id = args.local_rank world_size = int(os.getenv('WORLD_SIZE', 1))*args.nodes print("proc_id: " + str(args.proc_id)) print("world size: " + str(world_size)) print("local_rank: " + str(args.local_rank)) os.environ['WORLD_SIZE'] = str(world_size) os.environ['RANK'] = str(args.proc_id) os.environ['LOCAL_RANK'] = str(args.local_rank) def setup_slurm(args): if mp.get_start_method(allow_none=True) is None: mp.set_start_method('spawn') args.proc_id = int(os.environ['SLURM_PROCID']) ntasks = int(os.environ['SLURM_NTASKS']) node_list = os.environ['SLURM_NODELIST'] num_gpus = torch.cuda.device_count() local_rank = args.proc_id % num_gpus args.local_rank = local_rank print("proc_id: " + str(args.proc_id)) print("world size: " + str(ntasks)) print("local_rank: " + str(local_rank)) addr = subprocess.getoutput( f'scontrol show hostname {node_list} | head -n1') os.environ['MASTER_PORT'] = str(args.port) os.environ['MASTER_ADDR'] = addr os.environ['WORLD_SIZE'] = str(ntasks) os.environ['RANK'] = str(args.proc_id) os.environ['LOCAL_RANK'] = str(local_rank) def setup_distributed(args): args.gpu = args.local_rank torch.cuda.set_device(args.gpu) dist.init_process_group(backend='nccl') args.world_size = dist.get_world_size() torch.set_printoptions(precision=10) print('args.world_size', args.world_size) def ddp_init(args): args.proc_id, args.gpu, args.world_size = 0, 0, 1 if args.use_slurm == True: setup_slurm(args) else: setup_dist_launch(args) if 'WORLD_SIZE' in os.environ: args.distributed = int(os.environ['WORLD_SIZE']) >= 1 if args.distributed: setup_distributed(args) # deterministic torch.backends.cudnn.benchmark = False torch.backends.cudnn.deterministic = True torch.manual_seed(args.proc_id) np.random.seed(args.proc_id) random.seed(args.proc_id) def to_python_float(t): if hasattr(t, 'item'): return t.item() else: return t[0] def reduce_tensor(tensor, world_size): rt = tensor.clone() dist.all_reduce(rt, op=dist.ReduceOp.SUM) rt /= world_size return rt def reduce_tensors(*tensors, world_size): return [reduce_tensor(tensor, world_size) for tensor in tensors] ================================================ FILE: experiments/gpu_utils.py ================================================ import torch import torch.distributed as dist def get_rank() -> int: if not dist.is_available(): return 0 if not dist.is_initialized(): return 0 return dist.get_rank() def is_main_process() -> bool: return get_rank() == 0 def gpu_available() -> bool: return torch.cuda.is_available() ================================================ FILE: experiments/runner.py ================================================ import torch import torch.optim import torch.nn as nn import numpy as np import glob import time import os from tqdm import tqdm from 
tensorboardX import SummaryWriter import traceback import shutil from data.Load_Data import * from data.apollo_dataset import ApolloLaneDataset from models.latr import LATR from experiments.gpu_utils import is_main_process from utils import eval_3D_lane, eval_3D_once from utils import eval_3D_lane_apollo from utils.utils import * # ddp related import torch.distributed as dist from torch.nn.parallel import DistributedDataParallel as DDP from .ddp import * import os.path as osp from .gpu_utils import gpu_available from mmcv.runner.optimizer import build_optimizer class Runner: def __init__(self, args): self.args = args set_work_dir(self.args) self.logger = create_logger(args) # Check GPU availability if is_main_process(): if not gpu_available(): raise Exception("No gpu available for usage") if int(os.getenv('WORLD_SIZE', 1)) >= 1: self.logger.info("Let's use %s" % os.environ['WORLD_SIZE'] + "GPUs!") torch.cuda.empty_cache() # Get Dataset if is_main_process(): self.logger.info("Loading Dataset ...") self.val_gt_file = ops.join(args.save_path, 'test.json') if not args.evaluate: self.train_dataset, self.train_loader, self.train_sampler = self._get_train_dataset() else: self.train_dataset, self.train_loader, self.train_sampler = [],[],[] self.valid_dataset, self.valid_loader, self.valid_sampler = self._get_valid_dataset() if 'openlane' in args.dataset_name: self.evaluator = eval_3D_lane.LaneEval(args, logger=self.logger) elif 'apollo' in args.dataset_name: self.evaluator = eval_3D_lane_apollo.LaneEval(args, logger=self.logger) elif 'once' in args.dataset_name: self.evaluator = eval_3D_once.LaneEval() else: assert False # Tensorboard writer if not args.no_tb and is_main_process(): tensorboard_path = os.path.join(args.save_path, 'Tensorboard/') mkdir_if_missing(tensorboard_path) self.writer = SummaryWriter(tensorboard_path) if is_main_process(): self.logger.info("Init Done!") self.is_apollo = False if 'apollo' in args.dataset_name: self.is_apollo = True def train(self): args = self.args # Get Dataset train_loader = self.train_loader train_sampler = self.train_sampler global lowest_loss, best_f1_epoch, best_val_f1, best_epoch # Define model or resume model, optimizer, scheduler, best_epoch, \ lowest_loss, best_f1_epoch, best_val_f1 = self._get_model_ddp() self._log_model_info(model) def save_cur_ckpt( loss, with_eval=True, eval_stats=None): # Save model if not with_eval: self.save_checkpoint({ 'state_dict': model.module.state_dict(), 'optimizer': optimizer.state_dict(), 'scheduler': scheduler.state_dict() }, False, epoch+1, self.args.save_path) else: total_score = loss.item() # loss_list[0].avg if is_main_process(): # File to keep latest epoch with open(os.path.join(args.save_path, 'first_run.txt'), 'w') as f: f.write(str(epoch + 1)) global best_val_f1, best_f1_epoch, lowest_loss, best_epoch to_copy, to_save = False, True # False if args.save_best else True if total_score < lowest_loss: best_epoch = epoch + 1 lowest_loss = total_score if eval_stats[0] > best_val_f1: to_copy = True best_f1_epoch = epoch + 1 best_val_f1 = eval_stats[0] to_save = True self.log_eval_stats(eval_stats) self.logger.info("===> Last best F1 was {:.8f} in epoch {}".format(best_val_f1, best_f1_epoch)) if not to_save: return self.save_checkpoint({ 'state_dict': model.module.state_dict(), 'optimizer': optimizer.state_dict(), 'scheduler': scheduler.state_dict() }, to_copy, epoch+1, self.args.save_path) # Start training and validation for nepochs for epoch in range(args.start_epoch, args.nepochs): if is_main_process(): 
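                # rank-0 only: log the epoch banner and the current learning rate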
self.logger.info("\n => Start train set for EPOCH {}".format(epoch + 1)) self.logger.info('lr is set to {}'.format(optimizer.param_groups[0]['lr'])) if args.distributed: train_sampler.set_epoch(epoch) # Define container objects to keep track of multiple losses/metrics batch_time = AverageMeter() data_time = AverageMeter() # compute FPS epoch_time = AverageMeter() loss = 0 # Specify operation modules model.train() # compute timing end = time.time() epoch_time.start = end # Start training loop train_pbar = tqdm(total=len(train_loader), ncols=60) for i, extra_dict in enumerate(train_loader): train_pbar.update(1) data_time.update(time.time() - end) if gpu_available(): json_files = extra_dict.pop('idx_json_file') for k, v in extra_dict.items(): extra_dict[k] = v.cuda() image = extra_dict['image'] image = image.contiguous().float() # Run model optimizer.zero_grad() output = model(image=image, extra_dict=extra_dict, is_training=True) loss, loss_info = self._log_training_loss( output, epoch, step=i, data_loader=train_loader) train_pbar.set_postfix(loss=loss.item()) if is_main_process(): self.writer.add_scalar('lr', optimizer.param_groups[0]['lr'], epoch) # Setup backward pass loss.backward() # Clip gradients (usefull for instabilities or mistakes in ground truth) if args.clip_grad_norm != 0: nn.utils.clip_grad_norm_(model.parameters(), args.clip_grad_norm) # update params optimizer.step() if args.lr_policy == 'cosine_warmup': scheduler.step(epoch + i / len(train_loader)) elif args.lr_policy == 'PolyLR': scheduler.step() # Time trainig iteration batch_time.update(time.time() - end) end = time.time() # Print info if (i + 1) % args.print_freq == 0 and is_main_process(): self.logger.info('Epoch: [{0}][{1}/{2}]\t' 'Batch Time / Avg Time {batch_time.val:.3f} ({batch_time.avg:.3f})\t' 'Loss {loss:.8f} {loss_info}'.format( epoch+1, i+1, len(train_loader), batch_time=batch_time, data_time=data_time, loss=loss.item(), loss_info=loss_info)) train_pbar.close() epoch_time.update(time.time() - epoch_time.start) if is_main_process(): self.logger.info('Epoch time : {:.3f} hours.'.format(epoch_time.val / 60 / 60)) # Adjust learning rate if args.lr_policy != 'cosine_warmup': scheduler.step() meet_eval_freq = args.eval_freq > 0 and (epoch + 1) % args.eval_freq == 0 last_ep = (epoch == args.nepochs - 1) if meet_eval_freq or last_ep: loss_valid_list, eval_stats = self.validate(model) if eval_stats[0] >= best_val_f1: self.logger.info(' >>> to save new best model at ep : %s with F1 %s' % ((epoch+1), eval_stats[0])) save_cur_ckpt(loss, with_eval=True, eval_stats=eval_stats) elif last_ep: self.logger.info(' >>> to save the last model at ep : %s with F1 %s' % ((epoch+1), eval_stats[0])) save_cur_ckpt(loss, with_eval=True, eval_stats=eval_stats) else: self.logger.info(' >>> skip model at ep : %s with lower F1 : %s' % ((epoch+1), eval_stats[0])) self.log_eval_stats(eval_stats) dist.barrier() torch.cuda.empty_cache() # at the end of training if not args.no_tb and is_main_process(): self.writer.close() def _log_model_info(self, model): args = self.args if not is_main_process(): return self.logger.info(40*"="+"\nArgs:{}\n".format(args)+40*"=") self.logger.info("Init model: '{}'".format(args.mod)) self.logger.info("Number of parameters in model {} is {:.3f}M".format(args.mod, sum(tensor.numel() for tensor in model.parameters())/1e6)) def _log_training_loss(self, output, epoch, step, data_loader): loss = 0.0 loss_info = '' for k, v in output.items(): if 'loss' in k: loss = loss + v loss_info = loss_info + '| %s:%.4f ' % (k, 
                    v.item() if isinstance(v, torch.Tensor) else v)
                if isinstance(v, torch.Tensor):
                    v = v.item()
                if is_main_process():
                    self.writer.add_scalar(k, v, epoch*len(data_loader) + step)
        return loss, loss_info

    def save_checkpoint(self, state, to_copy, epoch, save_path):
        if is_main_process():
            self.logger.info('Saving checkpoint to {}'.format(save_path))
            if to_copy:
                file_pre = f'model_best_epoch_{epoch}.pth.tar'
                self.logger.info('save the best model : %s' % epoch)
            else:
                # keep the suffix consistent with resume_model, which looks for
                # 'checkpoint_model_epoch_{}.pth.tar'
                file_pre = f'checkpoint_model_epoch_{epoch}.pth.tar'
            filepath = os.path.join(save_path, file_pre)
            torch.save(state, filepath)

    def validate(self, model, **kwargs):
        args = self.args
        loader = self.valid_loader
        pred_lines_sub = []
        gt_lines_sub = []
        model.eval()
        # Start validation loop
        with torch.no_grad():
            val_pbar = tqdm(total=len(loader), ncols=50)
            for i, extra_dict in enumerate(loader):
                val_pbar.update(1)
                if not args.no_cuda:
                    json_files = extra_dict.pop('idx_json_file')
                    for k, v in extra_dict.items():
                        extra_dict[k] = v.cuda()
                image = extra_dict['image']
                image = image.contiguous().float()
                output = model(image=image, extra_dict=extra_dict, is_training=False)

                all_line_preds = output['all_line_preds']  # in ground coordinate system
                all_cls_scores = output['all_cls_scores']
                all_line_preds = all_line_preds[-1]
                all_cls_scores = all_cls_scores[-1]
                num_el = all_cls_scores.shape[0]
                if 'cam_extrinsics' in extra_dict:
                    cam_extrinsics_all = extra_dict['cam_extrinsics']
                    cam_intrinsics_all = extra_dict['cam_intrinsics']
                else:
                    cam_extrinsics_all, cam_intrinsics_all = None, None

                # Print info
                if (i + 1) % args.print_freq == 0 and is_main_process():
                    self.logger.info('Test: [{0}/{1}]'.format(i+1, len(loader)))

                # Write results
                for j in range(num_el):
                    json_file = json_files[j]
                    if cam_extrinsics_all is not None:
                        extrinsic = cam_extrinsics_all[j].cpu().numpy()
                        intrinsic = cam_intrinsics_all[j].cpu().numpy()
                    with open(json_file, 'r') as file:
                        if 'apollo' in args.dataset_name:
                            json_line = json.loads(file.read())
                            if 'extrinsic' not in json_line:
                                json_line['extrinsic'] = extrinsic
                            if 'intrinsic' not in json_line:
                                json_line['intrinsic'] = intrinsic
                        else:
                            file_lines = [line for line in file]
                            json_line = json.loads(file_lines[0])
                    json_line['json_file'] = json_file
                    if 'once' in args.dataset_name:
                        if 'train' in json_file:
                            img_path = json_file.replace('train', 'data').replace('.json', '.jpg')
                        elif 'val' in json_file:
                            img_path = json_file.replace('val', 'data').replace('.json', '.jpg')
                        elif 'test' in json_file:
                            img_path = json_file.replace('test', 'data').replace('.json', '.jpg')
                        json_line["file_path"] = img_path
                    gt_lines_sub.append(copy.deepcopy(json_line))

                    # pred in ground
                    lane_pred = all_line_preds[j].cpu().numpy()
                    cls_pred = torch.argmax(all_cls_scores[j], dim=-1).cpu().numpy()
                    pos_lanes = lane_pred[cls_pred > 0]
                    if self.args.num_category > 1:
                        scores_pred = torch.softmax(all_cls_scores[j][cls_pred > 0], dim=-1).cpu().numpy()
                    else:
                        scores_pred = torch.sigmoid(all_cls_scores[j][cls_pred > 0]).cpu().numpy()
                    if pos_lanes.shape[0]:
                        lanelines_pred = []
                        lanelines_prob = []
                        xs = pos_lanes[:, 0:args.num_y_steps]
                        ys = np.tile(args.anchor_y_steps.copy()[None, :], (xs.shape[0], 1))
                        zs = pos_lanes[:, args.num_y_steps:2*args.num_y_steps]
                        vis = pos_lanes[:, 2*args.num_y_steps:]
                        for tmp_idx in range(pos_lanes.shape[0]):
                            cur_vis = vis[tmp_idx] > 0
                            cur_xs = xs[tmp_idx][cur_vis]
                            cur_ys = ys[tmp_idx][cur_vis]
                            cur_zs = zs[tmp_idx][cur_vis]
                            if cur_vis.sum() < 2:
                                continue
                            lanelines_pred.append([])
                            for tmp_inner_idx in range(cur_xs.shape[0]):
                                lanelines_pred[-1].append(
                                    [cur_xs[tmp_inner_idx],
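                                     # one lane point as (x, y, z) in ground coordinates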
cur_ys[tmp_inner_idx], cur_zs[tmp_inner_idx]]) lanelines_prob.append(scores_pred[tmp_idx].tolist()) else: lanelines_pred = [] lanelines_prob = [] json_line["pred_laneLines"] = lanelines_pred json_line["pred_laneLines_prob"] = lanelines_prob pred_lines_sub.append(copy.deepcopy(json_line)) img_path = json_line['file_path'] if args.dataset_name == 'once': self.save_eval_result_once(args, img_path, lanelines_pred, lanelines_prob) val_pbar.close() if 'openlane' in args.dataset_name: eval_stats = self.evaluator.bench_one_submit_ddp( pred_lines_sub, gt_lines_sub, args.model_name, args.pos_threshold, vis=False) elif 'once' in args.dataset_name: eval_stats = self.evaluator.lane_evaluation( args.data_dir + 'val', '%s/once_pred/test' % (args.save_path), args.eval_config_dir, args) elif 'apollo' in args.dataset_name: self.logger.info(' >>> eval mAP | [0.05, 0.95]') eval_stats = self.evaluator.bench_one_submit_ddp( pred_lines_sub, gt_lines_sub, args.model_name, args.pos_threshold, vis=False) else: assert False if any(name in args.dataset_name for name in ['openlane', 'apollo']): gather_output = [None for _ in range(args.world_size)] # all_gather all eval_stats and calculate mean dist.all_gather_object(gather_output, eval_stats) dist.barrier() eval_stats = self._recal_gpus_val(gather_output, eval_stats) loss_list = [] return loss_list, eval_stats elif 'once' in args.dataset_name: loss_list = [] return loss_list, eval_stats def _recal_gpus_val(self, gather_output, eval_stats): args = self.args apollo_metrics = { 'r_lane': 0, 'p_lane': 0, 'cnt_gt': 0, 'cnt_pred': 0 } openlane_metrics = { 'r_lane': 0, 'p_lane': 0, 'c_lane': 0, 'cnt_gt': 0, 'cnt_pred': 0, 'match_num': 0 } if 'apollo' in self.args.dataset_name: # apollo no category accuracy. start_idx = 7 gather_metrics = apollo_metrics else: start_idx = 8 gather_metrics = openlane_metrics for i, k in enumerate(gather_metrics.keys()): gather_metrics[k] = np.sum( [eval_stats_sub[start_idx + i] for eval_stats_sub in gather_output]) if gather_metrics['cnt_gt']!=0 : Recall = gather_metrics['r_lane'] / gather_metrics['cnt_gt'] else: Recall = gather_metrics['r_lane'] / (gather_metrics['cnt_gt'] + 1e-6) if gather_metrics['cnt_pred'] !=0 : Precision = gather_metrics['p_lane'] / gather_metrics['cnt_pred'] else: Precision = gather_metrics['p_lane'] / (gather_metrics['cnt_pred'] + 1e-6) if (Recall + Precision)!=0: f1_score = 2 * Recall * Precision / (Recall + Precision) else: f1_score = 2 * Recall * Precision / (Recall + Precision + 1e-6) if 'apollo' not in self.args.dataset_name: if gather_metrics['match_num']!=0: category_accuracy = gather_metrics['c_lane'] / gather_metrics['match_num'] else: category_accuracy = gather_metrics['c_lane'] / (gather_metrics['match_num'] + 1e-6) eval_stats[0] = f1_score eval_stats[1] = Recall eval_stats[2] = Precision if self.is_apollo: err_start_idx = 3 else: eval_stats[3] = category_accuracy err_start_idx = 4 for i in range(4): err_idx = err_start_idx + i eval_stats[err_idx] = np.sum([eval_stats_sub[err_idx] for eval_stats_sub in gather_output]) / args.world_size return eval_stats def _get_model_from_cfg(self): args = self.args model = LATR(args) if args.sync_bn: if is_main_process(): self.logger.info("Convert model with Sync BatchNorm") model = nn.SyncBatchNorm.convert_sync_batchnorm(model) if gpu_available(): device = torch.device("cuda", args.local_rank) model = model.to(device) return model def _load_ckpt_from_workdir(self, model): args = self.args if args.eval_ckpt: best_file_name = args.eval_ckpt else: best_file_name = 
glob.glob(os.path.join(args.save_path, 'model_best*')) if len(best_file_name) > 0: best_file_name = best_file_name[0] else: best_file_name = '' if os.path.isfile(best_file_name): checkpoint = torch.load(best_file_name) if is_main_process(): self.logger.info("=> loading checkpoint '{}'".format(best_file_name)) model.load_state_dict(checkpoint['state_dict']) else: self.logger.info("=> no checkpoint found at '{}'".format(best_file_name)) def eval(self): self.logger.info('>>>>> start eval <<<<< \n') args = self.args model = self._get_model_from_cfg() self._load_ckpt_from_workdir(model) dist.barrier() # DDP setting if args.distributed: model = DDP(model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True) _, eval_stats = self.validate(model) if is_main_process() and (eval_stats is not None): self.log_eval_stats(eval_stats) def _get_train_dataset(self): args = self.args if 'openlane' in args.dataset_name: train_dataset = LaneDataset(args.dataset_dir, args.data_dir + 'training/', args, data_aug=True) elif 'once' in args.dataset_name: train_dataset = LaneDataset(args.dataset_dir, ops.join(args.data_dir, 'train/'), args, data_aug=True) else: self.logger.info('using Apollo Dataset') train_dataset = ApolloLaneDataset(args.dataset_dir, ops.join(args.data_dir, 'train.json'), args, data_aug=True) train_loader, train_sampler = get_loader(train_dataset, args) return train_dataset, train_loader, train_sampler def _get_model_ddp(self): args = self.args # define network model = LATR(args) # if args.sync_bn: if is_main_process(): self.logger.info("Convert model with Sync BatchNorm") model = nn.SyncBatchNorm.convert_sync_batchnorm(model) if gpu_available(): # Load model on gpu before passing params to optimizer device = torch.device("cuda", args.local_rank) model = model.to(device) """ first load param to model, then model = DDP(model) """ # resume model args.resume = first_run(args.save_path) model, best_epoch, lowest_loss, best_f1_epoch, best_val_f1, \ optim_saved_state, schedule_saved_state = self.resume_model(model) dist.barrier() # DDP setting if args.distributed: model = DDP( model, device_ids=[args.local_rank], output_device=args.local_rank, find_unused_parameters=True ) # Define optimizer and scheduler optimizer = build_optimizer( model, args.optimizer_cfg) scheduler = define_scheduler( optimizer, args, dataset_size=len(self.train_loader)) return model, optimizer, scheduler, best_epoch, lowest_loss, best_f1_epoch, best_val_f1 def resume_model(self, model, path=''): args = self.args best_epoch = 0 lowest_loss = np.inf best_f1_epoch = 0 best_val_f1 = -1e-5 optim_saved_state = None schedule_saved_state = None if len(path) == 0 and args.resume: # try the latest ckpt path = os.path.join(args.save_path, 'checkpoint_model_epoch_{}.pth.tar'.format(int(args.resume))) # try the best ckpt saved if not os.path.isfile(path): path = os.path.join(args.save_path, f'model_best_epoch_{args.resume}.pth.tar') if os.path.isfile(path): self.logger.info("=> loading checkpoint from {}".format(path)) checkpoint = torch.load(path, map_location='cpu') if is_main_process(): model.load_state_dict(checkpoint['state_dict']) self.logger.info("=> loaded checkpoint '{}' (epoch {})".format(args.resume, args.start_epoch)) optim_saved_state = checkpoint['optimizer'] schedule_saved_state = checkpoint['scheduler'] args.start_epoch = int(args.resume) else: if is_main_process(): self.logger.info("=> Warning: no checkpoint found at '{}'".format(path)) return model, best_epoch, lowest_loss, best_f1_epoch, 
best_val_f1, optim_saved_state, schedule_saved_state def _get_valid_dataset(self): args = self.args if 'openlane' in args.dataset_name: if not args.evaluate_case: valid_dataset = LaneDataset(args.dataset_dir, args.data_dir + 'validation/', args) else: # TODO eval case valid_dataset = LaneDataset(args.dataset_dir, args.data_dir + 'test/up_down_case/', args) elif 'once' in args.dataset_name: valid_dataset = LaneDataset(args.dataset_dir, ops.join(args.data_dir, 'val/'), args) else: valid_dataset = ApolloLaneDataset(args.dataset_dir, os.path.join(args.data_dir, 'test.json'), args) valid_loader, valid_sampler = get_loader(valid_dataset, args) return valid_dataset, valid_loader, valid_sampler def save_eval_result_once(self, args, img_path, lanelines_pred, lanelines_prob): # 3d eval result result = {} result_dir = os.path.join(args.save_path, 'once_pred/') mkdir_if_missing(result_dir) result_dir = os.path.join(result_dir, 'test/') mkdir_if_missing(result_dir) file_path_splited = img_path.split('/') result_dir = os.path.join(result_dir, file_path_splited[-3]) # sequence mkdir_if_missing(result_dir) result_dir = os.path.join(result_dir, 'cam01/') mkdir_if_missing(result_dir) result_file_path = ops.join(result_dir, file_path_splited[-1][:-4]+'.json') cam_pitch = 0.3/180*np.pi cam_height = 1.5 cam_extrinsics = np.array([[np.cos(cam_pitch), 0, -np.sin(cam_pitch), 0], [0, 1, 0, 0], [np.sin(cam_pitch), 0, np.cos(cam_pitch), cam_height], [0, 0, 0, 1]], dtype=float) R_vg = np.array([[0, 1, 0], [-1, 0, 0], [0, 0, 1]], dtype=float) R_gc = np.array([[1, 0, 0], [0, 0, 1], [0, -1, 0]], dtype=float) cam_extrinsics[:3, :3] = np.matmul(np.matmul( np.matmul(np.linalg.inv(R_vg), cam_extrinsics[:3, :3]), R_vg), R_gc) cam_extrinsics[0:2, 3] = 0.0 # write lane result lane_lines = [] for k in range(len(lanelines_pred)): lane = np.array(lanelines_pred[k]) lane = np.flip(lane, axis=0) lane = lane.T lane = np.vstack((lane, np.ones((1, lane.shape[1])))) lane = np.matmul(np.linalg.inv(cam_extrinsics), lane) lane = lane[0:3,:].T lane_lines.append({'points': lane.tolist(), 'score': np.max(lanelines_prob[k])}) result['lanes'] = lane_lines with open(result_file_path, 'w') as result_file: json.dump(result, result_file) def log_eval_stats(self, eval_stats): if self.is_apollo: return self._log_genlane_eval_info(eval_stats) if is_main_process(): self.logger.info("===> Evaluation laneline F-measure: {:.8f}".format(eval_stats[0])) self.logger.info("===> Evaluation laneline Recall: {:.8f}".format(eval_stats[1])) self.logger.info("===> Evaluation laneline Precision: {:.8f}".format(eval_stats[2])) self.logger.info("===> Evaluation laneline Category Accuracy: {:.8f}".format(eval_stats[3])) self.logger.info("===> Evaluation laneline x error (close): {:.8f} m".format(eval_stats[4])) self.logger.info("===> Evaluation laneline x error (far): {:.8f} m".format(eval_stats[5])) self.logger.info("===> Evaluation laneline z error (close): {:.8f} m".format(eval_stats[6])) self.logger.info("===> Evaluation laneline z error (far): {:.8f} m".format(eval_stats[7])) def _log_genlane_eval_info(self, eval_stats): if is_main_process(): self.logger.info("===> Evaluation on validation set: \n" "laneline F-measure {:.8} \n" "laneline Recall {:.8} \n" "laneline Precision {:.8} \n" "laneline x error (close) {:.8} m\n" "laneline x error (far) {:.8} m\n" "laneline z error (close) {:.8} m\n" "laneline z error (far) {:.8} m\n".format(eval_stats[0], eval_stats[1], eval_stats[2], eval_stats[3], eval_stats[4], eval_stats[5], eval_stats[6])) def set_work_dir(cfg): # 
=========output path========== # save_prefix = osp.join(os.getcwd(), 'work_dirs') save_root = osp.join(save_prefix, cfg.output_dir) # cur work dirname cfg_path = Path(cfg.config) if cfg.mod is None: cfg.mod = os.path.join(cfg_path.parent.name, cfg_path.stem) save_ppath = Path(save_root, cfg.mod) save_ppath.mkdir(parents=True, exist_ok=True) cfg.save_path = save_ppath.as_posix() cfg.save_json_path = cfg.save_path seg_output_dir = Path(cfg.save_path, 'seg_vis') seg_output_dir.mkdir(parents=True, exist_ok=True) # cp config into cur_work_dir shutil.copy(cfg_path.as_posix(), cfg.save_path) ================================================ FILE: main.py ================================================ import argparse from mmcv.utils import Config, DictAction from utils.utils import * from experiments.ddp import * from experiments.runner import * def get_args(): parser = argparse.ArgumentParser() # DDP setting parser.add_argument('--distributed', action='store_true') parser.add_argument("--local_rank", type=int) parser.add_argument('--gpu', type=int, default=0) parser.add_argument('--world_size', type=int, default=1) parser.add_argument('--nodes', type=int, default=1) parser.add_argument('--use_slurm', default=False, action='store_true') # exp setting parser.add_argument('--config', type=str, help='config file path') parser.add_argument( '--cfg-options', nargs='+', action=DictAction, help='overwrite config param.') return parser.parse_args() def main(): args = get_args() # define runner to begin training or evaluation cfg = Config.fromfile(args.config) if args.cfg_options is not None: cfg.merge_from_dict(args.cfg_options) # initialize distributed data parallel set ddp_init(args) cfg.merge_from_dict(vars(args)) runner = Runner(cfg) if not cfg.evaluate: runner.train() else: runner.eval() if __name__ == '__main__': main() ================================================ FILE: models/__init__.py ================================================ ================================================ FILE: models/latr.py ================================================ import torch import torch.nn as nn import torch.nn.functional as F from utils.utils import * from mmdet3d.models import build_backbone, build_neck from .latr_head import LATRHead from mmcv.utils import Config from .ms2one import build_ms2one from .utils import deepFeatureExtractor_EfficientNet from mmdet.models.builder import BACKBONES # overall network class LATR(nn.Module): def __init__(self, args): super().__init__() self.no_cuda = args.no_cuda self.batch_size = args.batch_size self.num_lane_type = 1 # no centerline self.num_y_steps = args.num_y_steps self.max_lanes = args.max_lanes self.num_category = args.num_category _dim_ = args.latr_cfg.fpn_dim num_query = args.latr_cfg.num_query num_group = args.latr_cfg.num_group sparse_num_group = args.latr_cfg.sparse_num_group self.encoder = build_backbone(args.latr_cfg.encoder) if getattr(args.latr_cfg, 'neck', None): self.neck = build_neck(args.latr_cfg.neck) else: self.neck = None self.encoder.init_weights() self.ms2one = build_ms2one(args.ms2one) # build 2d query-based instance seg self.head = LATRHead( args=args, dim=_dim_, num_group=num_group, num_convs=4, in_channels=_dim_, kernel_dim=_dim_, position_range=args.position_range, top_view_region=args.top_view_region, positional_encoding=dict( type='SinePositionalEncoding', num_feats=_dim_// 2, normalize=True), num_query=num_query, pred_dim=self.num_y_steps, num_classes=args.num_category, embed_dims=_dim_, transformer=args.transformer, 
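            # the sparse instance decoder configured below yields per-lane
            # instance features that seed the head's lane queries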
sparse_ins_decoder=args.sparse_ins_decoder, **args.latr_cfg.get('head', {}), trans_params=args.latr_cfg.get('trans_params', {}) ) def forward(self, image, _M_inv=None, is_training=True, extra_dict=None): out_featList = self.encoder(image) neck_out = self.neck(out_featList) neck_out = self.ms2one(neck_out) output = self.head( dict( x=neck_out, lane_idx=extra_dict['seg_idx_label'], seg=extra_dict['seg_label'], lidar2img=extra_dict['lidar2img'], pad_shape=extra_dict['pad_shape'], ground_lanes=extra_dict['ground_lanes'] if is_training else None, ground_lanes_dense=extra_dict['ground_lanes_dense'] if is_training else None, image=image, ), is_training=is_training, ) return output ================================================ FILE: models/latr_head.py ================================================ import numpy as np import math import cv2 import torch import torch.nn as nn from torch.nn import init import torch.nn.functional as F from torch.nn.init import normal_ from mmcv.cnn import bias_init_with_prob from mmdet.models.builder import build_loss from mmdet.models.utils import build_transformer from mmdet.core import multi_apply from mmcv.utils import Config from models.sparse_ins import SparseInsDecoder from .utils import inverse_sigmoid from .transformer_bricks import * class LATRHead(nn.Module): def __init__(self, args, dim=128, num_group=1, num_convs=4, in_channels=128, kernel_dim=128, positional_encoding=dict( type='SinePositionalEncoding', num_feats=128 // 2, normalize=True), num_classes=21, num_query=30, embed_dims=128, transformer=None, num_reg_fcs=2, depth_num=50, depth_start=3, top_view_region=None, position_range=[-50, 3, -10, 50, 103, 10.], pred_dim=10, loss_cls=dict( type='FocalLoss', use_sigmoid=True, gamma=2.0, alpha=0.25, loss_weight=2.0), loss_reg=dict(type='L1Loss', loss_weight=2.0), loss_vis=dict(type='BCEWithLogitsLoss', reduction='mean'), sparse_ins_decoder=Config( dict( encoder=dict( out_dims=64),# neck output feature channels decoder=dict( num_group=1, output_iam=True, scale_factor=1.), sparse_decoder_weight=1.0, )), xs_loss_weight=1.0, zs_loss_weight=5.0, vis_loss_weight=1.0, cls_loss_weight=20, project_loss_weight=1.0, trans_params=dict( init_z=0, bev_h=250, bev_w=100), pt_as_query=False, num_pt_per_line=5, num_feature_levels=1, gt_project_h=20, gt_project_w=30, project_crit=dict( type='SmoothL1Loss', reduction='none'), ): super().__init__() self.trans_params = dict( top_view_region=top_view_region, z_region=[position_range[2], position_range[5]]) self.trans_params.update(trans_params) self.gt_project_h = gt_project_h self.gt_project_w = gt_project_w self.num_y_steps = args.num_y_steps self.register_buffer('anchor_y_steps', torch.from_numpy(args.anchor_y_steps).float()) self.register_buffer('anchor_y_steps_dense', torch.from_numpy(args.anchor_y_steps_dense).float()) project_crit['reduction'] = 'none' self.project_crit = getattr( nn, project_crit.pop('type'))(**project_crit) self.num_classes = num_classes self.embed_dims = embed_dims # points num along y-axis. 
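        # code_size == len(args.anchor_y_steps): each lane is regressed as
        # (x, z, visibility) values sampled at these fixed y positions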
self.code_size = pred_dim self.num_query = num_query self.num_group = num_group self.num_pred = transformer['decoder']['num_layers'] self.pc_range = position_range self.xs_loss_weight = xs_loss_weight self.zs_loss_weight = zs_loss_weight self.vis_loss_weight = vis_loss_weight self.cls_loss_weight = cls_loss_weight self.project_loss_weight = project_loss_weight loss_reg['reduction'] = 'none' self.reg_crit = build_loss(loss_reg) self.cls_crit = build_loss(loss_cls) self.bce_loss = build_nn_loss(loss_vis) self.sparse_ins = SparseInsDecoder(cfg=sparse_ins_decoder) self.depth_num = depth_num self.position_dim = 3 * self.depth_num self.position_range = position_range self.depth_start = depth_start self.adapt_pos3d = nn.Sequential( nn.Conv2d(self.embed_dims, self.embed_dims*4, kernel_size=1, stride=1, padding=0), nn.ReLU(), nn.Conv2d(self.embed_dims*4, self.embed_dims, kernel_size=1, stride=1, padding=0), ) self.positional_encoding = build_positional_encoding(positional_encoding) self.position_encoder = nn.Sequential( nn.Conv2d(self.position_dim, self.embed_dims*4, kernel_size=1, stride=1, padding=0), nn.ReLU(), nn.Conv2d(self.embed_dims*4, self.embed_dims, kernel_size=1, stride=1, padding=0), ) self.transformer = build_transformer(transformer) self.query_embedding = nn.Sequential( nn.Linear(self.embed_dims, self.embed_dims), nn.ReLU(), nn.Linear(self.embed_dims, self.embed_dims), ) # build pred layer: cls, reg, vis self.num_reg_fcs = num_reg_fcs cls_branch = [] for _ in range(self.num_reg_fcs): cls_branch.append(nn.Linear(self.embed_dims, self.embed_dims)) cls_branch.append(nn.LayerNorm(self.embed_dims)) cls_branch.append(nn.ReLU(inplace=True)) cls_branch.append(nn.Linear(self.embed_dims, self.num_classes)) fc_cls = nn.Sequential(*cls_branch) reg_branch = [] for _ in range(self.num_reg_fcs): reg_branch.append(nn.Linear(self.embed_dims, self.embed_dims)) reg_branch.append(nn.ReLU()) reg_branch.append( nn.Linear( self.embed_dims, 3 * self.code_size // num_pt_per_line)) reg_branch = nn.Sequential(*reg_branch) self.cls_branches = nn.ModuleList( [fc_cls for _ in range(self.num_pred)]) self.reg_branches = nn.ModuleList( [reg_branch for _ in range(self.num_pred)]) self.num_pt_per_line = num_pt_per_line self.point_embedding = nn.Embedding( self.num_pt_per_line, self.embed_dims) self.reference_points = nn.Sequential( nn.Linear(self.embed_dims, self.embed_dims), nn.ReLU(True), nn.Linear(self.embed_dims, self.embed_dims), nn.ReLU(True), nn.Linear(self.embed_dims, 2 * self.code_size // num_pt_per_line)) self.num_feature_levels = num_feature_levels self.level_embeds = nn.Parameter(torch.Tensor( self.num_feature_levels, self.embed_dims)) self._init_weights() def _init_weights(self): self.transformer.init_weights() xavier_init(self.reference_points, distribution='uniform', bias=0) if self.cls_crit.use_sigmoid: bias_init = bias_init_with_prob(0.01) for m in self.cls_branches: nn.init.constant_(m[-1].bias, bias_init) normal_(self.level_embeds) def forward(self, input_dict, is_training=True): output_dict = {} img_feats = input_dict['x'] if not isinstance(img_feats, (list, tuple)): img_feats = [img_feats] sparse_output = self.sparse_ins( img_feats[0], lane_idx_map=input_dict['lane_idx'], input_shape=input_dict['seg'].shape[-2:], is_training=is_training) # generate 2d pos emb B, C, H, W = img_feats[0].shape masks = img_feats[0].new_zeros((B, H, W)) # TODO use actual mask if using padding or other aug sin_embed = self.positional_encoding(masks) sin_embed = self.adapt_pos3d(sin_embed) # init query and reference pt 
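        # one query per lane instance from the sparse decoder; each is expanded
        # with a learned point embedding into num_pt_per_line sub-queries below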
query = sparse_output['inst_features'] # BxNxC # B, N, C -> B, N, num_anchor_per_line, C query = query.unsqueeze(2) + self.point_embedding.weight[None, None, ...] query_embeds = self.query_embedding(query).flatten(1, 2) query = torch.zeros_like(query_embeds) reference_points = self.reference_points(query_embeds) reference_points = reference_points.sigmoid() mlvl_feats = img_feats feat_flatten = [] spatial_shapes = [] mlvl_masks = [] assert self.num_feature_levels == len(mlvl_feats) for lvl, feat in enumerate(mlvl_feats): bs, c, h, w = feat.shape spatial_shape = (h, w) feat = feat.flatten(2).permute(2, 0, 1) # NxBxC feat = feat + self.level_embeds[None, lvl:lvl+1, :].to(feat.device) spatial_shapes.append(spatial_shape) feat_flatten.append(feat) mlvl_masks.append(torch.zeros((bs, *spatial_shape), dtype=torch.bool, device=feat.device)) if self.transformer.with_encoder: mlvl_positional_encodings = [] pos_embed2d = [] for lvl, feat in enumerate(mlvl_feats): mlvl_positional_encodings.append( self.positional_encoding(mlvl_masks[lvl])) pos_embed2d.append( mlvl_positional_encodings[-1].flatten(2).permute(2, 0, 1)) pos_embed2d = torch.cat(pos_embed2d, 0) else: mlvl_positional_encodings = None pos_embed2d = None feat_flatten = torch.cat(feat_flatten, 0) spatial_shapes = torch.as_tensor( spatial_shapes, dtype=torch.long, device=query.device) level_start_index = torch.cat( (spatial_shapes.new_zeros((1, )), spatial_shapes.prod(1).cumsum(0)[:-1]) ) # head pos_embed = None outs_dec, project_results, outputs_classes, outputs_coords = \ self.transformer( feat_flatten, None, query, query_embeds, pos_embed, reference_points=reference_points, reg_branches=self.reg_branches, cls_branches=self.cls_branches, img_feats=img_feats, lidar2img=input_dict['lidar2img'], pad_shape=input_dict['pad_shape'], sin_embed=sin_embed, spatial_shapes=spatial_shapes, level_start_index=level_start_index, mlvl_masks=mlvl_masks, mlvl_positional_encodings=mlvl_positional_encodings, pos_embed2d=pos_embed2d, image=input_dict['image'], **self.trans_params) all_cls_scores = torch.stack(outputs_classes) all_line_preds = torch.stack(outputs_coords) all_line_preds[..., 0] = (all_line_preds[..., 0] * (self.pc_range[3] - self.pc_range[0]) + self.pc_range[0]) all_line_preds[..., 1] = (all_line_preds[..., 1] * (self.pc_range[5] - self.pc_range[2]) + self.pc_range[2]) # reshape to original format all_line_preds = all_line_preds.view( len(outputs_classes), bs, self.num_query, self.transformer.decoder.num_anchor_per_query, self.transformer.decoder.num_points_per_anchor, 2 + 1 # xz+vis ) all_line_preds = all_line_preds.permute(0, 1, 2, 5, 3, 4) all_line_preds = all_line_preds.flatten(3, 5) output_dict.update({ 'all_cls_scores': all_cls_scores, 'all_line_preds': all_line_preds, }) output_dict.update(sparse_output) if is_training: losses = self.get_loss(output_dict, input_dict) project_loss = self.get_project_loss( project_results, input_dict, h=self.gt_project_h, w=self.gt_project_w) losses['project_loss'] = \ self.project_loss_weight * project_loss output_dict.update(losses) return output_dict def get_project_loss(self, results, input_dict, h=20, w=30): gt_lane = input_dict['ground_lanes_dense'] gt_ys = self.anchor_y_steps_dense.clone() code_size = gt_ys.shape[0] gt_xs = gt_lane[..., :code_size] gt_zs = gt_lane[..., code_size : 2*code_size] gt_vis = gt_lane[..., 2*code_size:3*code_size] gt_ys = gt_ys[None, None, :].expand_as(gt_xs) gt_points = torch.stack([gt_xs, gt_ys, gt_zs], dim=-1) B = results[0].shape[0] ref_3d_home = F.pad(gt_points, (0, 1), 
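            # pad each (x, y, z) with 1 -> homogeneous coordinates for lidar2img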
value=1) coords_img = ground2img( ref_3d_home, h, w, input_dict['lidar2img'], input_dict['pad_shape'], mask=gt_vis) all_loss = 0. for projct_result in results: projct_result = F.interpolate( projct_result, size=(h, w), mode='nearest') gt_proj = coords_img.clone() mask = (gt_proj[:, -1, ...] > 0) * (projct_result[:, -1, ...] > 0) diff_loss = self.project_crit( projct_result[:, :3, ...], gt_proj[:, :3, ...], ) diff_y_loss = diff_loss[:, 1, ...] diff_z_loss = diff_loss[:, 2, ...] diff_loss = diff_y_loss * 0.1 + diff_z_loss diff_loss = (diff_loss * mask).sum() / torch.clamp(mask.sum(), 1) all_loss = all_loss + diff_loss return all_loss / len(results) def get_loss(self, output_dict, input_dict): all_cls_pred = output_dict['all_cls_scores'] all_lane_pred = output_dict['all_line_preds'] gt_lanes = input_dict['ground_lanes'] all_xs_loss = 0.0 all_zs_loss = 0.0 all_vis_loss = 0.0 all_cls_loss = 0.0 matched_indices = output_dict['matched_indices'] num_layers = all_lane_pred.shape[0] def single_layer_loss(layer_idx): gcls_pred = all_cls_pred[layer_idx] glane_pred = all_lane_pred[layer_idx] glane_pred = glane_pred.view( glane_pred.shape[0], self.num_group, self.num_query, glane_pred.shape[-1]) gcls_pred = gcls_pred.view( gcls_pred.shape[0], self.num_group, self.num_query, gcls_pred.shape[-1]) per_xs_loss = 0.0 per_zs_loss = 0.0 per_vis_loss = 0.0 per_cls_loss = 0.0 batch_size = len(matched_indices[0]) for b_idx in range(len(matched_indices[0])): for group_idx in range(self.num_group): pred_idx = matched_indices[group_idx][b_idx][0] gt_idx = matched_indices[group_idx][b_idx][1] cls_pred = gcls_pred[:, group_idx, ...] lane_pred = glane_pred[:, group_idx, ...] if gt_idx.shape[0] < 1: cls_target = cls_pred.new_zeros(cls_pred[b_idx].shape[0]).long() cls_loss = self.cls_crit(cls_pred[b_idx], cls_target) per_cls_loss = per_cls_loss + cls_loss per_xs_loss = per_xs_loss + 0.0 * lane_pred[b_idx].mean() continue pos_lane_pred = lane_pred[b_idx][pred_idx] gt_lane = gt_lanes[b_idx][gt_idx] pred_xs = pos_lane_pred[:, :self.code_size] pred_zs = pos_lane_pred[:, self.code_size : 2*self.code_size] pred_vis = pos_lane_pred[:, 2*self.code_size:] gt_xs = gt_lane[:, :self.code_size] gt_zs = gt_lane[:, self.code_size : 2*self.code_size] gt_vis = gt_lane[:, 2*self.code_size:3*self.code_size] loc_mask = gt_vis > 0 xs_loss = self.reg_crit(pred_xs, gt_xs) zs_loss = self.reg_crit(pred_zs, gt_zs) xs_loss = (xs_loss * loc_mask).sum() / torch.clamp(loc_mask.sum(), 1) zs_loss = (zs_loss * loc_mask).sum() / torch.clamp(loc_mask.sum(), 1) vis_loss = self.bce_loss(pred_vis, gt_vis) cls_target = cls_pred.new_zeros(cls_pred[b_idx].shape[0]).long() cls_target[pred_idx] = torch.argmax( gt_lane[:, 3*self.code_size:], dim=1) cls_loss = self.cls_crit(cls_pred[b_idx], cls_target) per_xs_loss += xs_loss per_zs_loss += zs_loss per_vis_loss += vis_loss per_cls_loss += cls_loss return tuple(map(lambda x: x / batch_size / self.num_group, [per_xs_loss, per_zs_loss, per_vis_loss, per_cls_loss])) all_xs_loss, all_zs_loss, all_vis_loss, all_cls_loss = multi_apply( single_layer_loss, range(all_lane_pred.shape[0])) all_xs_loss = sum(all_xs_loss) / num_layers all_zs_loss = sum(all_zs_loss) / num_layers all_vis_loss = sum(all_vis_loss) / num_layers all_cls_loss = sum(all_cls_loss) / num_layers return dict( all_xs_loss=self.xs_loss_weight * all_xs_loss, all_zs_loss=self.zs_loss_weight * all_zs_loss, all_vis_loss=self.vis_loss_weight * all_vis_loss, all_cls_loss=self.cls_loss_weight * all_cls_loss, ) @staticmethod def get_reference_points(H, W, bs=1, 
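            # note: returns normalized (x, y) grid centers in [0, 1], shape (bs, H*W, 2)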

================================================
FILE: models/ms2one.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import copy

from mmcv.cnn import ConvModule
from mmseg.ops import resize


def build_ms2one(config):
    config = copy.deepcopy(config)
    t = config.pop('type')
    if t == 'Naive':
        return Naive(**config)
    elif t == 'DilateNaive':
        return DilateNaive(**config)


class Naive(nn.Module):
    def __init__(self, inc, outc, kernel_size=1):
        super().__init__()
        self.layer = nn.Conv2d(inc, outc, kernel_size=1)

    def forward(self, ms_feats):
        out = self.layer(torch.cat([
            F.interpolate(tmp, ms_feats[0].shape[-2:], mode='bilinear')
            for tmp in ms_feats], dim=1))
        return out


class DilateNaive(nn.Module):
    def __init__(self, inc, outc, num_scales=4,
                 dilations=(1, 2, 5, 9),
                 merge=True, fpn=False,
                 target_shape=None,
                 one_layer_before=False):
        super().__init__()
        self.dilations = dilations
        self.num_scales = num_scales
        if not isinstance(inc, (tuple, list)):
            inc = [inc for _ in range(num_scales)]
        self.inc = inc
        self.outc = outc
        self.merge = merge
        self.fpn = fpn
        self.target_shape = target_shape
        self.layers = nn.ModuleList()
        for i in range(num_scales):
            layers = []
            if one_layer_before:
                layers.extend([
                    nn.Conv2d(inc[i], outc, kernel_size=1, bias=False),
                    nn.BatchNorm2d(outc),
                    nn.ReLU(True)
                ])
            for j in range(len(dilations[:-i])):
                d = dilations[j]
                layers.append(nn.Sequential(
                    nn.Conv2d(inc[i] if j == 0 and not one_layer_before else outc,
                              outc, kernel_size=1 if d == 1 else 3,
                              stride=1, padding=0 if d == 1 else d,
                              dilation=d, bias=False),
                    nn.BatchNorm2d(outc),
                    nn.ReLU(True)))
            self.layers.append(nn.Sequential(*layers))
        if self.merge:
            self.final_layer = nn.Sequential(
                nn.Conv2d(outc, outc, 3, 1, padding=1, bias=False),
                nn.BatchNorm2d(outc),
                nn.ReLU(True),
                nn.Conv2d(outc, outc, 1))

    def forward(self, x):
        outs = []
        for i in range(self.num_scales - 1, -1, -1):
            if self.fpn and i < self.num_scales - 1:
                tmp = self.layers[i](x[i] + F.interpolate(
                    x[i + 1], x[i].shape[2:],
                    mode='bilinear', align_corners=True))
            else:
                tmp = self.layers[i](x[i])
            if self.target_shape is None:
                if i > 0 and self.merge:
                    tmp = F.interpolate(tmp, x[0].shape[2:],
                                        mode='bilinear', align_corners=True)
            else:
                tmp = F.interpolate(tmp, self.target_shape,
                                    mode='bilinear', align_corners=True)
            outs.append(tmp)
        if self.merge:
            out = torch.sum(torch.stack(outs, dim=-1), dim=-1)
            out = self.final_layer(out)
            return out
        else:
            return outs
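
if __name__ == '__main__':
    # Illustrative smoke test (not part of the original file): DilateNaive
    # fuses a 4-level feature pyramid into one map at the finest level's
    # resolution. Channel widths here are arbitrary, but inc must equal outc
    # for the finest level with the default dilations, since its branch
    # carries no conv of its own.
    ms2one = build_ms2one(dict(type='DilateNaive', inc=256, outc=256, num_scales=4))
    feats = [torch.randn(2, 256, 48 // 2 ** i, 64 // 2 ** i) for i in range(4)]
    out = ms2one(feats)
    assert out.shape == (2, 256, 48, 64)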

================================================
FILE: models/scatter_utils.py
================================================
# Copy from https://github.com/rusty1s/pytorch_scatter
from typing import Optional, Tuple
import torch


def broadcast(src: torch.Tensor, other: torch.Tensor, dim: int):
    if dim < 0:
        dim = other.dim() + dim
    if src.dim() == 1:
        for _ in range(0, dim):
            src = src.unsqueeze(0)
    for _ in range(src.dim(), other.dim()):
        src = src.unsqueeze(-1)
    src = src.expand(other.size())
    return src


def scatter_sum(src: torch.Tensor, index: torch.Tensor, dim: int = -1,
                out: Optional[torch.Tensor] = None,
                dim_size: Optional[int] = None) -> torch.Tensor:
    index = broadcast(index, src, dim)
    if out is None:
        size = list(src.size())
        if dim_size is not None:
            size[dim] = dim_size
        elif index.numel() == 0:
            size[dim] = 0
        else:
            size[dim] = int(index.max()) + 1
        out = torch.zeros(size, dtype=src.dtype, device=src.device)
        return out.scatter_add_(dim, index, src)
    else:
        return out.scatter_add_(dim, index, src)


def scatter_add(src: torch.Tensor, index: torch.Tensor, dim: int = -1,
                out: Optional[torch.Tensor] = None,
                dim_size: Optional[int] = None) -> torch.Tensor:
    return scatter_sum(src, index, dim, out, dim_size)


def scatter_mul(src: torch.Tensor, index: torch.Tensor, dim: int = -1,
                out: Optional[torch.Tensor] = None,
                dim_size: Optional[int] = None) -> torch.Tensor:
    return torch.ops.torch_scatter.scatter_mul(src, index, dim, out, dim_size)


def scatter_mean(src: torch.Tensor, index: torch.Tensor, dim: int = -1,
                 out: Optional[torch.Tensor] = None,
                 dim_size: Optional[int] = None) -> torch.Tensor:
    out = scatter_sum(src, index, dim, out, dim_size)
    dim_size = out.size(dim)

    index_dim = dim
    if index_dim < 0:
        index_dim = index_dim + src.dim()
    if index.dim() <= index_dim:
        index_dim = index.dim() - 1

    ones = torch.ones(index.size(), dtype=src.dtype, device=src.device)
    count = scatter_sum(ones, index, index_dim, None, dim_size)
    count[count < 1] = 1
    count = broadcast(count, out, dim)
    if out.is_floating_point():
        out.true_divide_(count)
    else:
        out.div_(count, rounding_mode='floor')
    return out


def scatter_min(
        src: torch.Tensor, index: torch.Tensor, dim: int = -1,
        out: Optional[torch.Tensor] = None,
        dim_size: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor]:
    return torch.ops.torch_scatter.scatter_min(src, index, dim, out, dim_size)


def scatter_max(
        src: torch.Tensor, index: torch.Tensor, dim: int = -1,
        out: Optional[torch.Tensor] = None,
        dim_size: Optional[int] = None) -> Tuple[torch.Tensor, torch.Tensor]:
    return torch.ops.torch_scatter.scatter_max(src, index, dim, out, dim_size)


def scatter(src: torch.Tensor, index: torch.Tensor, dim: int = -1,
            out: Optional[torch.Tensor] = None, dim_size: Optional[int] = None,
            reduce: str = "sum") -> torch.Tensor:
    r"""
    |

    .. image:: https://raw.githubusercontent.com/rusty1s/pytorch_scatter/
            master/docs/source/_figures/add.svg?sanitize=true
        :align: center
        :width: 400px

    |

    Reduces all values from the :attr:`src` tensor into :attr:`out` at the
    indices specified in the :attr:`index` tensor along a given axis
    :attr:`dim`.
    For each value in :attr:`src`, its output index is specified by its index
    in :attr:`src` for dimensions outside of :attr:`dim` and by the
    corresponding value in :attr:`index` for dimension :attr:`dim`.
    The applied reduction is defined via the :attr:`reduce` argument.

    Formally, if :attr:`src` and :attr:`index` are :math:`n`-dimensional
    tensors with size :math:`(x_0, ..., x_{i-1}, x_i, x_{i+1}, ..., x_{n-1})`
    and :attr:`dim` = `i`, then :attr:`out` must be an :math:`n`-dimensional
    tensor with size :math:`(x_0, ..., x_{i-1}, y, x_{i+1}, ..., x_{n-1})`.
    Moreover, the values of :attr:`index` must be between :math:`0` and
    :math:`y - 1`, although no specific ordering of indices is required.
    The :attr:`index` tensor supports broadcasting in case its dimensions do
    not match with :attr:`src`.

    For one-dimensional tensors with :obj:`reduce="sum"`, the operation
    computes

    .. math::
        \mathrm{out}_i = \mathrm{out}_i + \sum_j~\mathrm{src}_j

    where :math:`\sum_j` is over :math:`j` such that
    :math:`\mathrm{index}_j = i`.

    .. note::

        This operation is implemented via atomic operations on the GPU and is
        therefore **non-deterministic** since the order of parallel operations
        to the same value is undetermined.
        For floating-point variables, this results in a source of variance in
        the result.

    :param src: The source tensor.
    :param index: The indices of elements to scatter.
    :param dim: The axis along which to index. (default: :obj:`-1`)
    :param out: The destination tensor.
    :param dim_size: If :attr:`out` is not given, automatically create output
        with size :attr:`dim_size` at dimension :attr:`dim`.
        If :attr:`dim_size` is not given, a minimal sized output tensor
        according to :obj:`index.max() + 1` is returned.
    :param reduce: The reduce operation (:obj:`"sum"`, :obj:`"mul"`,
        :obj:`"mean"`, :obj:`"min"` or :obj:`"max"`). (default: :obj:`"sum"`)

    :rtype: :class:`Tensor`

    .. code-block:: python

        from torch_scatter import scatter

        src = torch.randn(10, 6, 64)
        index = torch.tensor([0, 1, 0, 1, 2, 1])

        # Broadcasting in the first and last dim.
        out = scatter(src, index, dim=1, reduce="sum")

        print(out.size())

    .. code-block::

        torch.Size([10, 3, 64])
    """
    if reduce == 'sum' or reduce == 'add':
        return scatter_sum(src, index, dim, out, dim_size)
    if reduce == 'mul':
        return scatter_mul(src, index, dim, out, dim_size)
    elif reduce == 'mean':
        return scatter_mean(src, index, dim, out, dim_size)
    elif reduce == 'min':
        return scatter_min(src, index, dim, out, dim_size)[0]
    elif reduce == 'max':
        return scatter_max(src, index, dim, out, dim_size)[0]
    else:
        raise ValueError
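
if __name__ == '__main__':
    # Sanity check (illustrative, not part of the original file): mirrors the
    # docstring semantics using the pure-PyTorch 'sum' and 'mean' reductions
    # above, which need no compiled torch_scatter extension.
    src = torch.tensor([[1., 2., 3., 4.]])
    index = torch.tensor([[0, 0, 1, 1]])
    assert torch.equal(scatter(src, index, reduce='sum'), torch.tensor([[3., 7.]]))
    assert torch.equal(scatter(src, index, reduce='mean'), torch.tensor([[1.5, 3.5]]))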

================================================
FILE: models/sparse_ins.py
================================================
import math

import torch
import torch.nn as nn
from torch.nn import init
import torch.nn.functional as F
from fvcore.nn.weight_init import c2_msra_fill, c2_xavier_fill

from .sparse_inst_loss import SparseInstCriterion, SparseInstMatcher


def _make_stack_3x3_convs(num_convs, in_channels, out_channels):
    convs = []
    for _ in range(num_convs):
        convs.append(
            nn.Conv2d(in_channels, out_channels, 3, padding=1))
        convs.append(nn.ReLU(True))
        in_channels = out_channels
    return nn.Sequential(*convs)


class MaskBranch(nn.Module):
    def __init__(self, cfg, in_channels):
        super().__init__()
        dim = cfg.hidden_dim
        num_convs = cfg.num_convs
        kernel_dim = cfg.kernel_dim
        self.mask_convs = _make_stack_3x3_convs(num_convs, in_channels, dim)
        self.projection = nn.Conv2d(dim, kernel_dim, kernel_size=1)
        self._init_weights()

    def _init_weights(self):
        for m in self.mask_convs.modules():
            if isinstance(m, nn.Conv2d):
                c2_msra_fill(m)
        c2_msra_fill(self.projection)

    def forward(self, features):
        # mask features (x4 convs)
        features = self.mask_convs(features)
        return self.projection(features)


class InstanceBranch(nn.Module):
    def __init__(self, cfg, in_channels, **kwargs):
        super().__init__()
        num_mask = cfg.num_query
        dim = cfg.hidden_dim
        num_classes = cfg.num_classes
        kernel_dim = cfg.kernel_dim
        num_convs = cfg.num_convs
        num_group = cfg.get('num_group', 1)
        sparse_num_group = cfg.get('sparse_num_group', 1)

        self.num_group = num_group
        self.sparse_num_group = sparse_num_group
        self.num_mask = num_mask
        self.inst_convs = _make_stack_3x3_convs(
            num_convs=num_convs,
            in_channels=in_channels,
            out_channels=dim)
        self.iam_conv = nn.Conv2d(
            dim * num_group,
            num_group * num_mask * sparse_num_group,
            3, padding=1, groups=num_group * sparse_num_group)
        self.fc = nn.Linear(dim * sparse_num_group, dim)
        # output
        self.mask_kernel = nn.Linear(dim, kernel_dim)
        self.cls_score = nn.Linear(dim, num_classes)
        self.objectness = nn.Linear(dim, 1)
        self.prior_prob = 0.01
        self._init_weights()

    def _init_weights(self):
        for m in self.inst_convs.modules():
            if isinstance(m, nn.Conv2d):
                c2_msra_fill(m)
        bias_value = -math.log((1 - self.prior_prob) / self.prior_prob)
        for module in [self.iam_conv, self.cls_score]:
            init.constant_(module.bias, bias_value)
        init.normal_(self.iam_conv.weight, std=0.01)
        init.normal_(self.cls_score.weight, std=0.01)
        init.normal_(self.mask_kernel.weight, std=0.01)
        init.constant_(self.mask_kernel.bias, 0.0)
        c2_xavier_fill(self.fc)

    def forward(self, seg_features, is_training=True):
        out = {}
        # SparseInst part
        seg_features = self.inst_convs(seg_features)
        # predict instance activation maps
        iam = self.iam_conv(seg_features.tile(
            (1, self.num_group, 1, 1)))
        if not is_training:
            iam = iam.view(
                iam.shape[0], self.num_group,
                self.num_mask * self.sparse_num_group,
                *iam.shape[-2:])
            iam = iam[:, 0, ...]
            num_group = 1
        else:
            num_group = self.num_group
        iam_prob = iam.sigmoid()

        B, N = iam_prob.shape[:2]
        C = seg_features.size(1)
        # BxNxHxW -> BxNx(HW)
        iam_prob = iam_prob.view(B, N, -1)
        normalizer = iam_prob.sum(-1).clamp(min=1e-6)
        iam_prob_norm_hw = iam_prob / normalizer[:, :, None]

        # aggregate features: BxCxHxW -> Bx(HW)xC
        # (B x N x HW) @ (B x HW x C) -> B x N x C
        all_inst_features = torch.bmm(
            iam_prob_norm_hw,
            seg_features.view(B, C, -1).permute(0, 2, 1))  # BxNxC

        # concat sparse group features
        inst_features = all_inst_features.reshape(
            B, num_group,
            self.sparse_num_group,
            self.num_mask, -1
        ).permute(0, 1, 3, 2, 4).reshape(
            B, num_group,
            self.num_mask, -1)
        inst_features = F.relu_(
            self.fc(inst_features))

        # avg over sparse group
        iam_prob = iam_prob.view(
            B, num_group,
            self.sparse_num_group,
            self.num_mask,
            iam_prob.shape[-1])
        iam_prob = iam_prob.mean(dim=2).flatten(1, 2)
        inst_features = inst_features.flatten(1, 2)

        out.update(dict(
            iam_prob=iam_prob,
            inst_features=inst_features))

        if self.training:
            pred_logits = self.cls_score(inst_features)
            pred_kernel = self.mask_kernel(inst_features)
            pred_scores = self.objectness(inst_features)
            out.update(dict(
                pred_logits=pred_logits,
                pred_kernel=pred_kernel,
                pred_scores=pred_scores))
        return out


class SparseInsDecoder(nn.Module):
    def __init__(self, cfg, **kwargs) -> None:
        super().__init__()
        in_channels = cfg.encoder.out_dims + 2
        self.output_iam = cfg.decoder.output_iam
        self.scale_factor = cfg.decoder.scale_factor
        self.sparse_decoder_weight = cfg.sparse_decoder_weight
        self.inst_branch = InstanceBranch(cfg.decoder, in_channels)
        # dim, num_convs, kernel_dim, in_channels
        self.mask_branch = MaskBranch(cfg.decoder, in_channels)
        self.sparse_inst_crit = SparseInstCriterion(
            num_classes=cfg.decoder.num_classes,
            matcher=SparseInstMatcher(),
            cfg=cfg)
        self._init_weights()

    def _init_weights(self):
        self.inst_branch._init_weights()
        self.mask_branch._init_weights()

    @torch.no_grad()
    def compute_coordinates(self, x):
        h, w = x.size(2), x.size(3)
        y_loc = -1.0 + 2.0 * torch.arange(h, device=x.device) / (h - 1)
        x_loc = -1.0 + 2.0 * torch.arange(w, device=x.device) / (w - 1)
        y_loc, x_loc = torch.meshgrid(y_loc, x_loc)
        y_loc = y_loc.expand([x.shape[0], 1, -1, -1])
        x_loc = x_loc.expand([x.shape[0], 1, -1, -1])
        locations = torch.cat([x_loc, y_loc], 1)
        return locations.to(x)

    def forward(self, features, is_training=True, **kwargs):
        output = {}
        coord_features = self.compute_coordinates(features)
        features = torch.cat([coord_features, features], dim=1)
        inst_output = self.inst_branch(features, is_training=is_training)
        output.update(inst_output)

        if is_training:
            mask_features = self.mask_branch(features)
            pred_kernel = inst_output['pred_kernel']
            N = pred_kernel.shape[1]
            B, C, H, W = mask_features.shape
            pred_masks = torch.bmm(pred_kernel, mask_features.view(
                B, C, H * W)).view(B, N, H, W)
            pred_masks = F.interpolate(
                pred_masks, scale_factor=self.scale_factor,
                mode='bilinear', align_corners=False)
            output.update(dict(pred_masks=pred_masks))

        if self.training:
            sparse_inst_losses, matched_indices = self.loss(
                output,
                lane_idx_map=kwargs.get('lane_idx_map'),
                input_shape=kwargs.get('input_shape')
            )
            for k, v in sparse_inst_losses.items():
                sparse_inst_losses[k] = self.sparse_decoder_weight * v
            output.update(sparse_inst_losses)
            output['matched_indices'] = matched_indices
        return output

    def loss(self, output, lane_idx_map, input_shape):
        """
        output : from self.forward
        lane_idx_map : instance-level segmentation map, [20, H, W],
            where 20 = max_lanes
        """
        pred_masks = output['pred_masks']
        pred_masks = output['pred_masks'].view(
            pred_masks.shape[0],
            self.inst_branch.num_group,
            self.inst_branch.num_mask,
            *pred_masks.shape[2:])
        pred_logits = output['pred_logits']
        pred_logits = output['pred_logits'].view(
            pred_logits.shape[0],
            self.inst_branch.num_group,
            self.inst_branch.num_mask,
            *pred_logits.shape[2:])
        pred_scores = output['pred_scores']
        pred_scores = output['pred_scores'].view(
            pred_scores.shape[0],
            self.inst_branch.num_group,
            self.inst_branch.num_mask,
            *pred_scores.shape[2:])

        out = {}
        all_matched_indices = []
        for group_idx in range(self.inst_branch.num_group):
            sparse_inst_losses, matched_indices = \
                self.sparse_inst_crit(
                    outputs=dict(
                        pred_masks=pred_masks[:, group_idx, ...].contiguous(),
                        pred_logits=pred_logits[:, group_idx, ...].contiguous(),
                        pred_scores=pred_scores[:, group_idx, ...].contiguous(),
                    ),
                    targets=self.prepare_targets(lane_idx_map),
                    input_shape=input_shape,  # seg_bev
                )
            for k, v in sparse_inst_losses.items():
                out['%s_%d' % (k, group_idx)] = v
            all_matched_indices.append(matched_indices)
        return out, all_matched_indices

    def prepare_targets(self, targets):
        new_targets = []
        for targets_per_image in targets:
            target = {}
            cls_labels = targets_per_image.flatten(-2).max(-1)[0]
            pos_mask = cls_labels > 0
            target["labels"] = cls_labels[pos_mask].long()
            target["masks"] = targets_per_image[pos_mask] > 0
            new_targets.append(target)
        return new_targets
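
if __name__ == '__main__':
    # The core IAM trick in isolation (illustrative, not part of the original
    # file): each normalized instance activation map acts as soft spatial
    # attention that pools the feature map into one vector per instance query.
    B, N, C, H, W = 2, 5, 8, 16, 24
    iam_prob = torch.rand(B, N, H * W)
    iam_prob = iam_prob / iam_prob.sum(-1, keepdim=True).clamp(min=1e-6)
    seg_feats = torch.randn(B, C, H * W)
    inst_feats = torch.bmm(iam_prob, seg_feats.permute(0, 2, 1))  # B x N x C
    assert inst_feats.shape == (B, N, C)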

================================================
FILE: models/sparse_inst_loss.py
================================================
# Copyright (c) Tianheng Cheng and its affiliates. All Rights Reserved

from typing import Optional, List

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.distributed as dist
from torch import Tensor
from torch.cuda.amp import autocast
import torchvision
from scipy.optimize import linear_sum_assignment
from fvcore.nn import sigmoid_focal_loss_jit


def _max_by_axis(the_list):
    # type: (List[List[int]]) -> List[int]
    maxes = the_list[0]
    for sublist in the_list[1:]:
        for index, item in enumerate(sublist):
            maxes[index] = max(maxes[index], item)
    return maxes


class NestedTensor(object):
    def __init__(self, tensors, mask: Optional[Tensor]):
        self.tensors = tensors
        self.mask = mask

    def to(self, device):
        cast_tensor = self.tensors.to(device)
        mask = self.mask
        if mask is not None:
            cast_mask = mask.to(device)
        else:
            cast_mask = None
        return NestedTensor(cast_tensor, cast_mask)

    def decompose(self):
        return self.tensors, self.mask

    def __repr__(self):
        return str(self.tensors)


# _onnx_nested_tensor_from_tensor_list() is an implementation of
# nested_tensor_from_tensor_list() that is supported by ONNX tracing.
@torch.jit.unused
def _onnx_nested_tensor_from_tensor_list(tensor_list: List[Tensor]) -> NestedTensor:
    max_size = []
    for i in range(tensor_list[0].dim()):
        max_size_i = torch.max(torch.stack(
            [img.shape[i] for img in tensor_list]).to(torch.float32)).to(torch.int64)
        max_size.append(max_size_i)
    max_size = tuple(max_size)

    # work around for
    # pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
    # m[: img.shape[1], :img.shape[2]] = False
    # which is not yet supported in onnx
    padded_imgs = []
    padded_masks = []
    for img in tensor_list:
        padding = [(s1 - s2) for s1, s2 in zip(max_size, tuple(img.shape))]
        padded_img = torch.nn.functional.pad(
            img, (0, padding[2], 0, padding[1], 0, padding[0]))
        padded_imgs.append(padded_img)

        m = torch.zeros_like(img[0], dtype=torch.int, device=img.device)
        padded_mask = torch.nn.functional.pad(
            m, (0, padding[2], 0, padding[1]), "constant", 1)
        padded_masks.append(padded_mask.to(torch.bool))

    tensor = torch.stack(padded_imgs)
    mask = torch.stack(padded_masks)
    return NestedTensor(tensor, mask=mask)


def nested_tensor_from_tensor_list(tensor_list: List[Tensor]):
    # TODO make this more general
    if tensor_list[0].ndim == 3:
        if torchvision._is_tracing():
            # nested_tensor_from_tensor_list() does not export well to ONNX
            # call _onnx_nested_tensor_from_tensor_list() instead
            return _onnx_nested_tensor_from_tensor_list(tensor_list)

        # TODO make it support different-sized images
        max_size = _max_by_axis([list(img.shape) for img in tensor_list])
        # min_size = tuple(min(s) for s in zip(*[img.shape for img in tensor_list]))
        batch_shape = [len(tensor_list)] + max_size
        b, c, h, w = batch_shape
        dtype = tensor_list[0].dtype
        device = tensor_list[0].device
        tensor = torch.zeros(batch_shape, dtype=dtype, device=device)
        mask = torch.ones((b, h, w), dtype=torch.bool, device=device)
        for img, pad_img, m in zip(tensor_list, tensor, mask):
            pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
            m[: img.shape[1], :img.shape[2]] = False
    else:
        raise ValueError('not supported')
    return NestedTensor(tensor, mask)


def nested_masks_from_list(tensor_list: List[Tensor], input_shape=None):
    if tensor_list[0].ndim == 3:
        dim_size = sum([img.shape[0] for img in tensor_list])
        if input_shape is None:
            max_size = _max_by_axis([list(img.shape[-2:]) for img in tensor_list])
        else:
            max_size = [input_shape[0], input_shape[1]]
        batch_shape = [dim_size] + max_size
        # b, h, w = batch_shape
        dtype = tensor_list[0].dtype
        device = tensor_list[0].device
        tensor = torch.zeros(batch_shape, dtype=dtype, device=device)
        mask = torch.zeros(batch_shape, dtype=torch.bool, device=device)
        idx = 0
        for img in tensor_list:
            c = img.shape[0]
            c_ = idx + c
            tensor[idx: c_, :img.shape[1], : img.shape[2]].copy_(img)
            mask[idx: c_, :img.shape[1], :img.shape[2]] = True
            idx = c_
    else:
        raise ValueError('not supported')
    return NestedTensor(tensor, mask)


def is_dist_avail_and_initialized():
    if not dist.is_available():
        return False
    if not dist.is_initialized():
        return False
    return True


def get_world_size():
    if not is_dist_avail_and_initialized():
        return 1
    return dist.get_world_size()


def aligned_bilinear(tensor, factor):
    # borrowed from AdelaiDet: https://github1s.com/aim-uofa/AdelaiDet/blob/HEAD/adet/utils/comm.py
    assert tensor.dim() == 4
    assert factor >= 1
    assert int(factor) == factor

    if factor == 1:
        return tensor

    h, w = tensor.size()[2:]
    tensor = F.pad(tensor, pad=(0, 1, 0, 1), mode="replicate")
    oh = factor * h + 1
    ow = factor * w + 1
    tensor = F.interpolate(
        tensor, size=(oh, ow), mode='bilinear', align_corners=True)
    tensor = F.pad(
        tensor, pad=(factor // 2, 0, factor // 2, 0), mode="replicate")
    return tensor[:, :, :oh - 1, :ow - 1]


def compute_mask_iou(inputs, targets):
    inputs = inputs.sigmoid()
    # thresholding
    binarized_inputs = (inputs >= 0.4).float()
    targets = (targets > 0.5).float()
    intersection = (binarized_inputs * targets).sum(-1)
    union = targets.sum(-1) + binarized_inputs.sum(-1) - intersection
    score = intersection / (union + 1e-6)
    return score


def dice_score(inputs, targets):
    inputs = inputs.sigmoid()
    numerator = 2 * torch.matmul(inputs, targets.t())
    denominator = (
        inputs * inputs).sum(-1)[:, None] + (targets * targets).sum(-1)
    score = numerator / (denominator + 1e-4)
    return score


def dice_loss(inputs, targets, reduction='sum'):
    inputs = inputs.sigmoid()
    assert inputs.shape == targets.shape
    numerator = 2 * (inputs * targets).sum(1)
    denominator = (inputs * inputs).sum(-1) + (targets * targets).sum(-1)
    loss = 1 - (numerator) / (denominator + 1e-4)
    if reduction == 'none':
        return loss
    return loss.sum()
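
if __name__ == '__main__':
    # Sanity check (illustrative, not part of the original file): logits that
    # exactly reproduce the target mask drive the dice loss toward 0 and the
    # binarized IoU toward 1.
    target = torch.tensor([[1., 1., 0., 0.]])
    logits = torch.tensor([[10., 10., -10., -10.]])
    assert dice_loss(logits, target) < 1e-3
    assert compute_mask_iou(logits, target).item() > 0.999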

# @SPARSE_INST_CRITERION_REGISTRY.register()
class SparseInstCriterion(nn.Module):
    # This part is partially derived from:
    # https://github.com/facebookresearch/detr/blob/main/models/detr.py

    def __init__(self, num_classes=4, cfg=None, matcher=None):
        super().__init__()
        self.matcher = matcher
        self.losses = ("labels", "masks")  # cfg.MODEL.SPARSE_INST.LOSS.ITEMS
        self.weight_dict = self.get_weight_dict(cfg)
        self.num_classes = num_classes  # cfg.MODEL.SPARSE_INST.DECODER.NUM_CLASSES

    def get_weight_dict(self, cfg):
        losses = ("loss_ce", "loss_mask", "loss_dice", "loss_objectness")
        weight_dict = {}
        ce_weight = cfg.get('ce_weight', 2.0)
        mask_weight = cfg.get('mask_weight', 5.0)
        dice_weight = cfg.get('dice_weight', 2.0)
        objectness_weight = cfg.get('objectness_weight', 1.0)
        weight_dict = dict(
            zip(losses, (ce_weight, mask_weight, dice_weight, objectness_weight)))
        return weight_dict

    def _get_src_permutation_idx(self, indices):
        # permute predictions following indices
        batch_idx = torch.cat([torch.full_like(src, i)
                               for i, (src, _) in enumerate(indices)])
        src_idx = torch.cat([src for (src, _) in indices])
        return batch_idx, src_idx

    def _get_tgt_permutation_idx(self, indices):
        # permute targets following indices
        batch_idx = torch.cat([torch.full_like(tgt, i)
                               for i, (_, tgt) in enumerate(indices)])
        tgt_idx = torch.cat([tgt for (_, tgt) in indices])
        return batch_idx, tgt_idx

    def loss_labels(self, outputs, targets, indices, num_instances,
                    input_shape=None):
        assert "pred_logits" in outputs
        src_logits = outputs['pred_logits']
        target_classes = torch.full(src_logits.shape[:2], 0,  # self.num_classes,
                                    dtype=torch.int64,
                                    device=src_logits.device)
        if sum([tmp[0].shape[0] for tmp in indices]) > 0:
            idx = self._get_src_permutation_idx(indices)
            target_classes_o = torch.cat(
                [t["labels"][J] for t, (_, J) in zip(targets, indices)])
            target_classes[idx] = target_classes_o

        src_logits = src_logits.flatten(0, 1)
        # prepare one_hot target.
        target_classes = target_classes.flatten(0, 1)
        pos_inds = torch.nonzero(
            target_classes != self.num_classes, as_tuple=True)[0]
        labels = torch.zeros_like(src_logits)
        labels[pos_inds, target_classes[pos_inds]] = 1
        # comp focal loss.
        class_loss = sigmoid_focal_loss_jit(
            src_logits,
            labels,
            alpha=0.25,
            gamma=2.0,
            reduction="sum",
        ) / num_instances
        losses = {'loss_ce': class_loss}
        return losses

    def loss_masks_with_iou_objectness(self, outputs, targets, indices,
                                       num_instances, input_shape):
        src_idx = self._get_src_permutation_idx(indices)
        tgt_idx = self._get_tgt_permutation_idx(indices)
        # Bx100xHxW
        assert "pred_masks" in outputs
        assert "pred_scores" in outputs
        src_iou_scores = outputs["pred_scores"]
        src_masks = outputs["pred_masks"]
        with torch.no_grad():
            target_masks, _ = nested_masks_from_list(
                [t["masks"] for t in targets], input_shape).decompose()
        num_masks = [len(t["masks"]) for t in targets]
        target_masks = target_masks.to(src_masks)
        if len(target_masks) == 0:
            losses = {
                "loss_dice": src_masks.sum() * 0.0,
                "loss_mask": src_masks.sum() * 0.0,
                "loss_objectness": src_iou_scores.sum() * 0.0
            }
            return losses

        src_masks = src_masks[src_idx]
        target_masks = F.interpolate(
            target_masks[:, None], size=src_masks.shape[-2:],
            mode='bilinear', align_corners=False).squeeze(1)

        src_masks = src_masks.flatten(1)
        # FIXME: tgt_idx
        mix_tgt_idx = torch.zeros_like(tgt_idx[1])
        cum_sum = 0
        for num_mask in num_masks:
            mix_tgt_idx[cum_sum: cum_sum + num_mask] = cum_sum
            cum_sum += num_mask
        mix_tgt_idx += tgt_idx[1]

        target_masks = target_masks[mix_tgt_idx].flatten(1)

        with torch.no_grad():
            ious = compute_mask_iou(src_masks, target_masks)

        tgt_iou_scores = ious
        src_iou_scores = src_iou_scores[src_idx]
        tgt_iou_scores = tgt_iou_scores.flatten(0)
        src_iou_scores = src_iou_scores.flatten(0)

        losses = {
            "loss_objectness": F.binary_cross_entropy_with_logits(
                src_iou_scores, tgt_iou_scores, reduction='mean'),
            "loss_dice": dice_loss(src_masks, target_masks) / num_instances,
            "loss_mask": F.binary_cross_entropy_with_logits(
                src_masks, target_masks, reduction='mean')
        }
        return losses

    def get_loss(self, loss, outputs, targets, indices, num_instances, **kwargs):
        loss_map = {
            "labels": self.loss_labels,
            "masks": self.loss_masks_with_iou_objectness,
        }
        if loss == "loss_objectness":
            # NOTE: loss_objectness will be calculated in
            # `loss_masks_with_iou_objectness`
            return {}
        assert loss in loss_map
        return loss_map[loss](outputs, targets, indices, num_instances, **kwargs)

    def forward(self, outputs, targets, input_shape):
        outputs_without_aux = {
            k: v for k, v in outputs.items() if k != 'aux_outputs'}

        # Retrieve the matching between the outputs of the last layer
        # and the targets.
        indices = self.matcher(outputs_without_aux, targets, input_shape)
        # Compute the average number of target boxes across all nodes,
        # for normalization purposes.
        num_instances = sum(len(t["labels"]) for t in targets)
        num_instances = torch.as_tensor(
            [num_instances],
            dtype=torch.float,
            device=next(iter(outputs.values())).device)
        if is_dist_avail_and_initialized():
            torch.distributed.all_reduce(num_instances)
        num_instances = torch.clamp(
            num_instances / get_world_size(), min=1).item()
        # Compute all the requested losses
        losses = {}
        for loss in self.losses:
            losses.update(self.get_loss(loss, outputs, targets, indices,
                                        num_instances, input_shape=input_shape))
        for k in losses.keys():
            if k in self.weight_dict:
                losses[k] *= self.weight_dict[k]
        return losses, indices


# @SPARSE_INST_MATCHER_REGISTRY.register()
class SparseInstMatcherV1(nn.Module):
    def __init__(self, cfg=None):
        super().__init__()
        self.alpha = 0.8  # cfg.MODEL.SPARSE_INST.MATCHER.ALPHA
        self.beta = 0.2   # cfg.MODEL.SPARSE_INST.MATCHER.BETA
        self.mask_score = dice_score

    @torch.no_grad()
    def forward(self, outputs, targets, input_shape):
        B, N, H, W = outputs["pred_masks"].shape
        pred_masks = outputs['pred_masks']
        pred_logits = outputs['pred_logits'].sigmoid()

        indices = []

        for i in range(B):
            tgt_ids = targets[i]["labels"]
            # no annotations
            if tgt_ids.shape[0] == 0:
                indices.append((torch.as_tensor([]), torch.as_tensor([])))
                continue

            tgt_masks = targets[i]['masks'].tensor.to(pred_masks)
            pred_logit = pred_logits[i]
            out_masks = pred_masks[i]

            # upsampling:
            # (1) padding
            # (2) upsampling to 1x input size (input_shape)
            # (3) downsampling to 0.25x input size (output mask size)
            ori_h, ori_w = tgt_masks.size(1), tgt_masks.size(2)
            tgt_masks_ = torch.zeros(
                (1, tgt_masks.size(0), input_shape[0], input_shape[1])
            ).to(pred_masks)
            tgt_masks_[0, :, :ori_h, :ori_w] = tgt_masks
            tgt_masks = F.interpolate(
                tgt_masks_, size=out_masks.shape[-2:],
                mode='bilinear', align_corners=False)[0]

            # compute dice score and classification score
            tgt_masks = tgt_masks.flatten(1)
            out_masks = out_masks.flatten(1)

            mask_score = self.mask_score(out_masks, tgt_masks)
            # Nx(Number of gts)
            matching_prob = pred_logit[:, tgt_ids]
            C = (mask_score ** self.alpha) * (matching_prob ** self.beta)
            # hungarian matching
            inds = linear_sum_assignment(C.cpu(), maximize=True)
            indices.append(inds)
        return [(torch.as_tensor(i, dtype=torch.int64),
                 torch.as_tensor(j, dtype=torch.int64)) for i, j in indices]


# @SPARSE_INST_MATCHER_REGISTRY.register()
class SparseInstMatcher(nn.Module):
    def __init__(self, cfg=None):
        super().__init__()
        self.alpha = 0.8  # cfg.MODEL.SPARSE_INST.MATCHER.ALPHA
        self.beta = 0.2   # cfg.MODEL.SPARSE_INST.MATCHER.BETA
        self.mask_score = dice_score

    def forward(self, outputs, targets, input_shape):
        with torch.no_grad():
            # B x 40 x 90 x 120
            B, N, H, W = outputs["pred_masks"].shape
            pred_masks = outputs['pred_masks']
            pred_logits = outputs['pred_logits'].sigmoid()

            tgt_ids = torch.cat([v["labels"] for v in targets])

            if tgt_ids.shape[0] == 0:
                return [(torch.as_tensor([]).to(pred_logits),
                         torch.as_tensor([]).to(pred_logits))] * B
            tgt_masks, _ = nested_masks_from_list(
                [t["masks"] for t in targets], input_shape).decompose()
            device = pred_masks.device
            tgt_masks = tgt_masks.to(pred_masks)

            tgt_masks = F.interpolate(
                tgt_masks[:, None], size=pred_masks.shape[-2:],
                mode="bilinear", align_corners=False).squeeze(1)

            pred_masks = pred_masks.view(B * N, -1)
            tgt_masks = tgt_masks.flatten(1)
            with autocast(enabled=False):
                pred_masks = pred_masks.float()
                tgt_masks = tgt_masks.float()
                pred_logits = pred_logits.float()
                mask_score = self.mask_score(pred_masks, tgt_masks)
                # Nx(Number of gts)
                matching_prob = pred_logits.view(B * N, -1)[:, tgt_ids]
                C = (mask_score ** self.alpha) * (matching_prob ** self.beta)
            C = C.view(B, N, -1).cpu()
            # hungarian matching
            sizes = [len(v["masks"]) for v in targets]
            indices = [linear_sum_assignment(c[i], maximize=True)
                       for i, c in enumerate(C.split(sizes, -1))]
            indices = [(torch.as_tensor(i, dtype=torch.int64),
                        torch.as_tensor(j, dtype=torch.int64))
                       for i, j in indices]
        return indices


# def build_sparse_inst_matcher(cfg):
#     name = cfg.MODEL.SPARSE_INST.MATCHER.NAME
#     return SPARSE_INST_MATCHER_REGISTRY.get(name)(cfg)

# def build_sparse_inst_criterion(cfg):
#     matcher = build_sparse_inst_matcher(cfg)
#     name = cfg.MODEL.SPARSE_INST.LOSS.NAME
#     return SPARSE_INST_CRITERION_REGISTRY.get(name)(cfg, matcher)
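
if __name__ == '__main__':
    # Illustrative sketch (not part of the original file): both matchers
    # ultimately call scipy's Hungarian solver on a predictions-by-GT score
    # matrix, with maximize=True since larger dice/cls scores are better.
    score = torch.tensor([[0.9, 0.1],
                          [0.2, 0.8],
                          [0.3, 0.3]])
    pred_idx, gt_idx = linear_sum_assignment(score.numpy(), maximize=True)
    assert list(zip(pred_idx.tolist(), gt_idx.tolist())) == [(0, 0), (1, 1)]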

================================================
FILE: models/transformer_bricks.py
================================================
import numpy as np
import math
import warnings

import torch
import torch.nn as nn
from torch.nn import init
import torch.nn.functional as F
from fvcore.nn.weight_init import c2_msra_fill, c2_xavier_fill

from mmdet.models.utils.builder import TRANSFORMER
from mmcv.cnn.bricks.transformer import FFN, build_positional_encoding
from mmcv.cnn import (build_activation_layer, build_conv_layer,
                      build_norm_layer, xavier_init, constant_init)
from mmcv.runner.base_module import BaseModule
from mmcv.cnn.bricks.transformer import (BaseTransformerLayer,
                                         TransformerLayerSequence,
                                         build_transformer_layer_sequence)
from mmcv.cnn.bricks.registry import (ATTENTION, TRANSFORMER_LAYER,
                                      TRANSFORMER_LAYER_SEQUENCE)
from mmcv.ops.multi_scale_deform_attn import (
    MultiScaleDeformableAttnFunction, multi_scale_deformable_attn_pytorch)

from .scatter_utils import scatter_mean
from .utils import inverse_sigmoid


def pos2posemb3d(pos, num_pos_feats=128, temperature=10000):
    scale = 2 * math.pi
    pos = pos * scale
    dim_t = torch.arange(num_pos_feats, dtype=torch.float32, device=pos.device)
    dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)
    pos_x = pos[..., 0, None] / dim_t
    pos_y = pos[..., 1, None] / dim_t
    pos_z = pos[..., 2, None] / dim_t
    pos_x = torch.stack((pos_x[..., 0::2].sin(), pos_x[..., 1::2].cos()),
                        dim=-1).flatten(-2)
    pos_y = torch.stack((pos_y[..., 0::2].sin(), pos_y[..., 1::2].cos()),
                        dim=-1).flatten(-2)
    pos_z = torch.stack((pos_z[..., 0::2].sin(), pos_z[..., 1::2].cos()),
                        dim=-1).flatten(-2)
    posemb = torch.cat((pos_y, pos_x, pos_z), dim=-1)
    return posemb


def generate_ref_pt(minx, miny, maxx, maxy, z, nx, ny, device='cuda'):
    if isinstance(z, list):
        nz = z[-1]
        # minx, miny, maxx, maxy : in ground coords
        xs = torch.linspace(minx, maxx, nx, dtype=torch.float, device=device
                            ).view(1, -1, 1).expand(ny, nx, nz)
        ys = torch.linspace(miny, maxy, ny, dtype=torch.float, device=device
                            ).view(-1, 1, 1).expand(ny, nx, nz)
        zs = torch.linspace(z[0], z[1], nz, dtype=torch.float, device=device
                            ).view(1, 1, -1).expand(ny, nx, nz)
        ref_3d = torch.stack([xs, ys, zs], dim=-1)
        ref_3d = ref_3d.flatten(1, 2)
    else:
        # minx, miny, maxx, maxy : in ground coords
        xs = torch.linspace(minx, maxx, nx, dtype=torch.float, device=device
                            ).view(1, -1, 1).expand(ny, nx, 1)
        ys = torch.linspace(miny, maxy, ny, dtype=torch.float, device=device
                            ).view(-1, 1, 1).expand(ny, nx, 1)
        ref_3d = F.pad(torch.cat([xs, ys], dim=-1), (0, 1),
                       mode='constant', value=z)
    return ref_3d


def ground2img(coords3d, H, W, lidar2img, ori_shape, mask=None,
               return_img_pts=False):
    coords3d = coords3d.clone()
    img_pt = coords3d.flatten(1, 2) @ lidar2img.permute(0, 2, 1)
    img_pt = torch.cat([
        img_pt[..., :2] / torch.maximum(
            img_pt[..., 2:3], torch.ones_like(img_pt[..., 2:3]) * 1e-5),
        img_pt[..., 2:]
    ], dim=-1)

    # rescale to feature_map size
    x = img_pt[..., 0] / ori_shape[0][1] * (W - 1)
    y = img_pt[..., 1] / ori_shape[0][0] * (H - 1)
    valid = (x >= 0) * (y >= 0) * (x <= (W - 1)) \
        * (y <= (H - 1)) * (img_pt[..., 2] > 0)
    if return_img_pts:
        return x, y, valid

    if mask is not None:
        valid = valid * mask.flatten(1, 2).float()

    # B, C, H, W = img_feats.shape
    B = coords3d.shape[0]
    canvas = torch.zeros((B, H, W, 3 + 1),
                         dtype=torch.float32,
                         device=coords3d.device)
    x = x.long()
    y = y.long()
    ind = (x + y * W) * valid.long()
    # ind = torch.clamp(ind, 0, H * W - 1)
    ind = ind.long().unsqueeze(-1).repeat(1, 1, canvas.shape[-1])
    canvas = canvas.flatten(1, 2)
    target = coords3d.flatten(1, 2).clone()
    scatter_mean(target, ind, out=canvas, dim=1)
    canvas = canvas.view(B, H, W, canvas.shape[-1]
                         ).permute(0, 3, 1, 2).contiguous()
    canvas[:, :, 0, 0] = 0
    return canvas
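
def _demo_generate_ref_pt():
    # Illustrative helper (not part of the original file): a ny x nx BEV grid
    # of (x, y, z) reference points on a flat ground plane at z = 0. With a
    # scalar z the function returns 3 coordinates per point; a list-valued z
    # adds a sampled height axis instead.
    ref = generate_ref_pt(-10., 3., 10., 103., z=0., nx=8, ny=10, device='cpu')
    assert ref.shape == (10, 8, 3)
    return ref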

@ATTENTION.register_module()
class MSDeformableAttention3D(BaseModule):
    def __init__(self,
                 embed_dims=256,
                 num_heads=8,
                 num_levels=4,
                 num_points=8,
                 im2col_step=64,
                 dropout=0.1,
                 num_query=None,
                 num_anchor_per_query=None,
                 anchor_y_steps=None,
                 batch_first=False,
                 norm_cfg=None,
                 init_cfg=None):
        super().__init__(init_cfg)
        if embed_dims % num_heads != 0:
            raise ValueError(f'embed_dims must be divisible by num_heads, '
                             f'but got {embed_dims} and {num_heads}')
        dim_per_head = embed_dims // num_heads
        self.norm_cfg = norm_cfg
        self.batch_first = batch_first
        self.output_proj = None
        self.fp16_enabled = False

        self.num_query = num_query
        self.num_anchor_per_query = num_anchor_per_query
        self.register_buffer('anchor_y_steps',
                             torch.from_numpy(anchor_y_steps).float())
        self.num_points_per_anchor = len(anchor_y_steps) // num_anchor_per_query

        # you'd better set dim_per_head to a power of 2
        # which is more efficient in the CUDA implementation
        def _is_power_of_2(n):
            if (not isinstance(n, int)) or (n < 0):
                raise ValueError(
                    'invalid input for _is_power_of_2: {} (type: {})'.format(
                        n, type(n)))
            return (n & (n - 1) == 0) and n != 0

        if not _is_power_of_2(dim_per_head):
            warnings.warn(
                "You'd better set embed_dims in "
                'MultiScaleDeformAttention to make '
                'the dimension of each attention head a power of 2 '
                'which is more efficient in our CUDA implementation.')

        self.im2col_step = im2col_step
        self.embed_dims = embed_dims
        self.num_levels = num_levels
        self.num_heads = num_heads
        self.num_points = num_points
        self.sampling_offsets = nn.Linear(
            embed_dims,
            num_heads * num_levels * num_points * 2 * self.num_points_per_anchor)
        self.attention_weights = nn.Linear(
            embed_dims,
            num_heads * num_levels * num_points * self.num_points_per_anchor)
        self.value_proj = nn.Linear(embed_dims, embed_dims)

        self.init_weights()

    def init_weights(self):
        """Default initialization for Parameters of Module."""
        constant_init(self.sampling_offsets, 0.)
        thetas = torch.arange(
            self.num_heads,
            dtype=torch.float32) * (2.0 * math.pi / self.num_heads)
        grid_init = torch.stack([thetas.cos(), thetas.sin()], -1)
        grid_init = (grid_init /
                     grid_init.abs().max(-1, keepdim=True)[0]).view(
            self.num_heads, 1, 1, 1, 2).repeat(
                1, self.num_points_per_anchor, self.num_levels,
                self.num_points, 1)
        for i in range(self.num_points):
            grid_init[..., i, :] *= i + 1
        self.sampling_offsets.bias.data = grid_init.view(-1)

        constant_init(self.attention_weights, val=0., bias=0.)
        xavier_init(self.value_proj, distribution='uniform', bias=0.)
        xavier_init(self.output_proj, distribution='uniform', bias=0.)
        self._is_init = True

    def ref_to_lidar(self, reference_points, pc_range, not_y=True):
        reference_points = reference_points.clone()
        reference_points[..., 0:1] = reference_points[..., 0:1] * \
            (pc_range[3] - pc_range[0]) + pc_range[0]
        if not not_y:
            reference_points[..., 1:2] = reference_points[..., 1:2] * \
                (pc_range[4] - pc_range[1]) + pc_range[1]
        reference_points[..., 2:3] = reference_points[..., 2:3] * \
            (pc_range[5] - pc_range[2]) + pc_range[2]
        return reference_points

    def point_sampling(self, reference_points, lidar2img, ori_shape):
        x, y, mask = ground2img(
            reference_points, H=2, W=2,
            lidar2img=lidar2img, ori_shape=ori_shape,
            mask=None, return_img_pts=True)
        return torch.stack([x, y], -1), mask

    def forward(self,
                query,
                key=None,
                value=None,
                identity=None,
                query_pos=None,
                key_padding_mask=None,
                reference_points=None,
                spatial_shapes=None,
                level_start_index=None,
                pc_range=None,
                lidar2img=None,
                pad_shape=None,
                key_pos=None,
                **kwargs):
        if value is None:
            assert False
            value = key

        if identity is None:
            identity = query
        if query_pos is not None:
            query = query + query_pos
        if key_pos is not None:
            value = value + key_pos
        if not self.batch_first:
            # change to (bs, num_query, embed_dims)
            query = query.permute(1, 0, 2)
            value = value.permute(1, 0, 2)

        bs, num_query, _ = query.shape
        bs, num_value, _ = value.shape
        assert (spatial_shapes[:, 0] * spatial_shapes[:, 1]).sum() == num_value

        value = self.value_proj(value)
        if key_padding_mask is not None:
            value = value.masked_fill(key_padding_mask[..., None], 0.0)
        value = value.view(bs, num_value, self.num_heads, -1)
        sampling_offsets = self.sampling_offsets(query).view(
            bs, num_query, self.num_points_per_anchor,
            self.num_heads, self.num_levels, self.num_points, 2)
        attention_weights = self.attention_weights(query).view(
            bs, num_query, self.num_points_per_anchor,
            self.num_heads, self.num_levels * self.num_points)
        attention_weights = attention_weights.softmax(-1)
        attention_weights = attention_weights.view(
            bs, num_query, self.num_points_per_anchor,
            self.num_heads, self.num_levels, self.num_points)

        reference_points = reference_points.view(
            bs, self.num_query, self.num_anchor_per_query, -1, 2)
        ref_pt3d = torch.cat([
            reference_points[..., 0:1],  # x
            self.anchor_y_steps.view(1, 1, self.num_anchor_per_query, -1, 1
                                     ).expand_as(reference_points[..., 0:1]),  # y
            reference_points[..., 1:2]  # z
        ], dim=-1)

        sampling_locations = self.ref_to_lidar(ref_pt3d, pc_range, not_y=True)
        sampling_locations2d, mask = self.point_sampling(
            F.pad(sampling_locations.flatten(1, 2), (0, 1), value=1),
            lidar2img=lidar2img,
            ori_shape=pad_shape,
        )
        sampling_locations2d = sampling_locations2d.view(
            *sampling_locations.shape[:-1], 2)

        offset_normalizer = torch.stack(
            [spatial_shapes[..., 1], spatial_shapes[..., 0]], -1)
        sampling_offsets = sampling_offsets / \
            offset_normalizer[None, None, None, :, None, :]
        sampling_locations2d = sampling_locations2d.view(
            bs, self.num_query, self.num_anchor_per_query, -1, 1, 1, 1, 2) \
            + sampling_offsets.view(
                bs, self.num_query, self.num_anchor_per_query,
                self.num_points_per_anchor,
                *sampling_offsets.shape[3:])

        # reshape, move self.num_anchor_per_query to last axis
        sampling_locations2d = sampling_locations2d.permute(
            0, 1, 2, 4, 5, 6, 3, 7)
        attention_weights = attention_weights.permute(0, 1, 3, 4, 5, 2)

        sampling_locations2d = sampling_locations2d.flatten(-3, -2)
        attention_weights = attention_weights.flatten(-2) / self.num_points_per_anchor

        xy = 2
        num_all_points = sampling_locations2d.shape[-2]
        sampling_locations2d = sampling_locations2d.view(
            bs, num_query, self.num_heads, self.num_levels, num_all_points, xy)

        if torch.cuda.is_available() and value.is_cuda:
            output = MultiScaleDeformableAttnFunction.apply(
                value, spatial_shapes, level_start_index,
                sampling_locations2d, attention_weights, self.im2col_step)
        else:
            output = multi_scale_deformable_attn_pytorch(
                value, spatial_shapes, sampling_locations2d, attention_weights)

        if not self.batch_first:
            output = output.permute(1, 0, 2)
        return output

@TRANSFORMER_LAYER.register_module()
class LATRDecoderLayer(BaseTransformerLayer):
    def __init__(self,
                 attn_cfgs,
                 feedforward_channels,
                 ffn_dropout=0.0,
                 operation_order=None,
                 act_cfg=dict(type='ReLU', inplace=True),
                 norm_cfg=dict(type='LN'),
                 ffn_num_fcs=2,
                 **kwargs):
        super(LATRDecoderLayer, self).__init__(
            attn_cfgs=attn_cfgs,
            feedforward_channels=feedforward_channels,
            ffn_dropout=ffn_dropout,
            operation_order=operation_order,
            act_cfg=act_cfg,
            norm_cfg=norm_cfg,
            ffn_num_fcs=ffn_num_fcs,
            **kwargs)

    def forward(self,
                query,
                key=None,
                value=None,
                query_pos=None,
                key_pos=None,
                attn_masks=None,
                query_key_padding_mask=None,
                key_padding_mask=None,
                **kwargs):
        query = super().forward(
            query=query,
            key=key,
            value=value,
            query_pos=query_pos,
            key_pos=key_pos,
            attn_masks=attn_masks,
            query_key_padding_mask=query_key_padding_mask,
            key_padding_mask=key_padding_mask,
            **kwargs)
        return query


@TRANSFORMER_LAYER_SEQUENCE.register_module()
class LATRTransformerDecoder(TransformerLayerSequence):
    def __init__(self, *args,
                 embed_dims=None,
                 post_norm_cfg=dict(type='LN'),
                 enlarge_length=10,
                 M_decay_ratio=10,
                 num_query=None,
                 num_anchor_per_query=None,
                 anchor_y_steps=None,
                 **kwargs):
        super(LATRTransformerDecoder, self).__init__(*args, **kwargs)
        if post_norm_cfg is not None:
            self.post_norm = build_norm_layer(post_norm_cfg, self.embed_dims)[1]
        else:
            self.post_norm = None

        self.num_query = num_query
        self.num_anchor_per_query = num_anchor_per_query
        self.anchor_y_steps = anchor_y_steps
        self.num_points_per_anchor = len(anchor_y_steps) // num_anchor_per_query

        self.embed_dims = embed_dims
        self.gflat_pred_layer = nn.Sequential(
            nn.Conv2d(embed_dims + 4, embed_dims, 3,
                      stride=2, padding=1, bias=False),
            nn.BatchNorm2d(embed_dims),
            nn.ReLU(True),
            nn.Conv2d(embed_dims, embed_dims, 3,
                      stride=2, padding=1, bias=False),
            nn.BatchNorm2d(embed_dims),
            nn.ReLU(True),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(embed_dims, embed_dims, 1),
            nn.BatchNorm2d(embed_dims),
            nn.ReLU(True),
            nn.Conv2d(embed_dims, embed_dims // 4, 1),
            nn.BatchNorm2d(embed_dims // 4),
            nn.ReLU(True),
            nn.Conv2d(embed_dims // 4, 2, 1))

        self.position_encoder = nn.Sequential(
            nn.Conv2d(3, self.embed_dims * 4,
                      kernel_size=1, stride=1, padding=0),
            nn.ReLU(),
            nn.Conv2d(self.embed_dims * 4, self.embed_dims,
                      kernel_size=1, stride=1, padding=0),
        )
        self.M_decay_ratio = M_decay_ratio
        self.enlarge_length = enlarge_length

    def init_weights(self):
        super().init_weights()
        for l in self.gflat_pred_layer:
            xavier_init(l, gain=0.01)

    def pred2M(self, pitch_z):
        pitch_z = pitch_z / self.M_decay_ratio
        t = pitch_z[:, 0] / 100
        z = pitch_z[:, 1]
        one = torch.ones_like(t)
        zero = torch.zeros_like(t)
        # rot first, then translate
        M = torch.stack([
            one, zero, zero, zero,
            zero, t.cos(), -t.sin(), zero,
            zero, t.sin(), t.cos(), z,
            zero, zero, zero, one], dim=-1).view(t.shape[0], 4, 4)
        return M

    def forward(self, query, key, value,
                top_view_region=None, z_region=None,
                bev_h=None, bev_w=None, init_z=0, img_feats=None,
                lidar2img=None, pad_shape=None,
                key_pos=None, key_padding_mask=None,
                sin_embed=None,
                reference_points=None,
                reg_branches=None, cls_branches=None,
                query_pos=None,
                **kwargs):
        assert key_pos is None
        assert key_padding_mask is None

        # init pts and M to generate pos embed for key/value
        batch_size = query.shape[1]
        xmin = top_view_region[0][0] - self.enlarge_length
        xmax = top_view_region[1][0] + self.enlarge_length
        ymin = top_view_region[2][1] - self.enlarge_length
        ymax = top_view_region[0][1] + self.enlarge_length
        zmin = z_region[0]
        zmax = z_region[1]
        init_ref_3d = generate_ref_pt(
            xmin, ymin, xmax, ymax, init_z,
            bev_w, bev_h, query.device)
        init_ref_3d = init_ref_3d[None, ...].repeat(batch_size, 1, 1, 1)
        ref_3d_homo = F.pad(init_ref_3d, (0, 1), value=1)
        init_ref_3d_homo = ref_3d_homo.clone()
        init_M = torch.eye(4, device=query.device).float()
        M = init_M[None, ...].repeat(batch_size, 1, 1)

        intermediate = []
        project_results = []
        outputs_classes = []
        outputs_coords = []
        for layer_idx, layer in enumerate(self.layers):
            coords_img = ground2img(
                ref_3d_homo, *img_feats[0].shape[-2:],
                lidar2img, pad_shape)
            if layer_idx > 0:
                project_results.append(coords_img.clone())
            coords_img_key_pos = coords_img.clone()
            ground_coords = coords_img_key_pos[:, :3, ...]
            img_mask = coords_img_key_pos[:, -1, ...]

            ground_coords[:, 0, ...] = (ground_coords[:, 0, ...] - xmin) / (xmax - xmin)
            ground_coords[:, 1, ...] = (ground_coords[:, 1, ...] - ymin) / (ymax - ymin)
            ground_coords[:, 2, ...] = (ground_coords[:, 2, ...] - zmin) / (zmax - zmin)
            ground_coords = inverse_sigmoid(ground_coords)
            key_pos = self.position_encoder(ground_coords)

            query = layer(
                query, key=key, value=value,
                key_pos=(key_pos + sin_embed
                         ).flatten(2, 3).permute(2, 0, 1).contiguous(),
                reference_points=reference_points,
                pc_range=[xmin, ymin, zmin, xmax, ymax, zmax],
                pad_shape=pad_shape,
                lidar2img=lidar2img,
                query_pos=query_pos,
                **kwargs)

            # update M
            if layer_idx < len(self.layers) - 1:
                input_feat = torch.cat([img_feats[0], coords_img], dim=1)
                M = M.detach() @ self.pred2M(
                    self.gflat_pred_layer(input_feat).squeeze(-1).squeeze(-1))
                ref_3d_homo = (init_ref_3d_homo.flatten(1, 2) @ M.permute(0, 2, 1)
                               ).view(*ref_3d_homo.shape)

            if self.post_norm is not None:
                intermediate.append(self.post_norm(query))
            else:
                intermediate.append(query)

            query = query.permute(1, 0, 2)
            tmp = reg_branches[layer_idx](query)
            bs = tmp.shape[0]

            # iterative update
            tmp = tmp.view(bs, self.num_query,
                           self.num_anchor_per_query, -1, 3)
            reference_points = reference_points.view(
                bs, self.num_query, self.num_anchor_per_query,
                self.num_points_per_anchor, 2)
            reference_points = inverse_sigmoid(reference_points)
            new_reference_points = torch.stack([
                reference_points[..., 0] + tmp[..., 0],
                reference_points[..., 1] + tmp[..., 1],
            ], dim=-1)
            reference_points = new_reference_points.sigmoid()

            cls_feat = query.view(bs, self.num_query,
                                  self.num_anchor_per_query, -1)
            cls_feat = torch.max(cls_feat, dim=2)[0]
            outputs_class = cls_branches[layer_idx](cls_feat)

            outputs_classes.append(outputs_class)
            outputs_coords.append(torch.cat([
                reference_points, tmp[..., -1:]
            ], dim=-1))

            reference_points = reference_points.view(
                bs, self.num_query * self.num_anchor_per_query,
                self.num_points_per_anchor * 2).detach()
            query = query.permute(1, 0, 2)

        return torch.stack(intermediate), project_results, \
            outputs_classes, outputs_coords


@TRANSFORMER.register_module()
class LATRTransformer(BaseModule):
    def __init__(self, encoder=None, decoder=None, init_cfg=None):
        super(LATRTransformer, self).__init__(init_cfg=init_cfg)
        if encoder is not None:
            self.encoder = build_transformer_layer_sequence(encoder)
        else:
            self.encoder = None
        self.decoder = build_transformer_layer_sequence(decoder)
        self.embed_dims = self.decoder.embed_dims
        self.init_weights()

    def init_weights(self):
        # follow the official DETR to init parameters
        for m in self.modules():
            if hasattr(m, 'weight') and m.weight.dim() > 1:
                xavier_init(m, distribution='uniform')
        self._is_init = True

    @staticmethod
    def get_reference_points(spatial_shapes, valid_ratios, device):
        """Get the reference points used in decoder.

        Args:
            spatial_shapes (Tensor): The shape of all feature maps,
                has shape (num_level, 2).
            valid_ratios (Tensor): The ratios of valid points on the
                feature map, has shape (bs, num_levels, 2).
            device (obj:`device`): The device where reference_points
                should be.

        Returns:
            Tensor: reference points used in decoder, has
                shape (bs, num_keys, num_levels, 2).
        """
        reference_points_list = []
        for lvl, (H, W) in enumerate(spatial_shapes):
            # TODO check this 0.5
            ref_y, ref_x = torch.meshgrid(
                torch.linspace(0.5, H - 0.5, H,
                               dtype=torch.float32, device=device),
                torch.linspace(0.5, W - 0.5, W,
                               dtype=torch.float32, device=device))
            ref_y = ref_y.reshape(-1)[None] / (valid_ratios[:, None, lvl, 1] * H)
            ref_x = ref_x.reshape(-1)[None] / (valid_ratios[:, None, lvl, 0] * W)
            ref = torch.stack((ref_x, ref_y), -1)
            reference_points_list.append(ref)
        reference_points = torch.cat(reference_points_list, 1)
        reference_points = reference_points[:, :, None] * valid_ratios[:, None]
        return reference_points

    def get_valid_ratio(self, mask):
        """Get the valid ratios of feature maps of all levels."""
        _, H, W = mask.shape
        valid_H = torch.sum(~mask[:, :, 0], 1)
        valid_W = torch.sum(~mask[:, 0, :], 1)
        valid_ratio_h = valid_H.float() / H
        valid_ratio_w = valid_W.float() / W
        valid_ratio = torch.stack([valid_ratio_w, valid_ratio_h], -1)
        return valid_ratio

    @property
    def with_encoder(self):
        return hasattr(self, 'encoder') and self.encoder

    def forward(self, x, mask, query, query_embed, pos_embed,
                reference_points=None, reg_branches=None, cls_branches=None,
                spatial_shapes=None, level_start_index=None,
                mlvl_masks=None, mlvl_positional_encodings=None,
                pos_embed2d=None, **kwargs):
        assert pos_embed is None
        memory = x

        # encoder
        if hasattr(self, 'encoder') and self.encoder:
            valid_ratios = torch.stack(
                [self.get_valid_ratio(m) for m in mlvl_masks], 1)
            reference_points_2d = \
                self.get_reference_points(spatial_shapes, valid_ratios,
                                          device=x.device)
            memory = self.encoder(
                query=memory,
                key=memory,
                value=memory,
                key_pos=pos_embed2d,
                query_pos=pos_embed2d,
                spatial_shapes=spatial_shapes,
                level_start_index=level_start_index,
                reference_points=reference_points_2d,
                valid_ratios=valid_ratios,
            )

        query_embed = query_embed.permute(1, 0, 2)
        if mask is not None:
            mask = mask.view(mask.shape[0], -1)  # [bs, n, h, w] -> [bs, n*h*w] (n=1)
        target = query.permute(1, 0, 2)

        # out_dec: [num_layers, num_query, bs, dim]
        out_dec, project_results, outputs_classes, outputs_coords = \
            self.decoder(
                query=target,
                key=memory,
                value=memory,
                key_pos=pos_embed,
                query_pos=query_embed,
                key_padding_mask=mask.to(torch.bool) if mask is not None else None,
                reg_branches=reg_branches,
                cls_branches=cls_branches,
                reference_points=reference_points,
                spatial_shapes=spatial_shapes,
                level_start_index=level_start_index,
                **kwargs
            )
        return out_dec.permute(0, 2, 1, 3), project_results, \
            outputs_classes, outputs_coords
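
if __name__ == '__main__':
    # Shape check (illustrative, not part of the original file): the
    # sinusoidal 3D embedding concatenates num_pos_feats channels per axis,
    # in (y, x, z) order.
    pts = torch.rand(2, 40, 3)
    emb = pos2posemb3d(pts, num_pos_feats=128)
    assert emb.shape == (2, 40, 3 * 128)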

================================================
FILE: models/utils.py
================================================
import torch
import geffnet
import torch.nn as nn


def inverse_sigmoid(x, eps=1e-5):
    """Inverse function of sigmoid.

    Args:
        x (Tensor): The tensor to do the inverse.
        eps (float): EPS avoid numerical overflow. Defaults 1e-5.

    Returns:
        Tensor: The x has passed the inverse function of sigmoid, has the
            same shape as the input.
    """
    x = x.clamp(min=0, max=1)
    x1 = x.clamp(min=eps)
    x2 = (1 - x).clamp(min=eps)
    return torch.log(x1 / x2)


class deepFeatureExtractor_EfficientNet(nn.Module):
    def __init__(self, architecture="EfficientNet-B5",
                 lv6=False, lv5=False, lv4=False, lv3=False):
        super(deepFeatureExtractor_EfficientNet, self).__init__()
        assert architecture in ["EfficientNet-B0", "EfficientNet-B1",
                                "EfficientNet-B2", "EfficientNet-B3",
                                "EfficientNet-B4", "EfficientNet-B5",
                                "EfficientNet-B6", "EfficientNet-B7"]

        if architecture == "EfficientNet-B0":
            self.encoder = geffnet.tf_efficientnet_b0_ns(pretrained=True)
            self.dimList = [16, 24, 40, 112, 1280]  # 5th feature is extracted after conv_head or bn2
            # self.dimList = [16, 24, 40, 112, 320]  # 5th feature is extracted after blocks[6]
        elif architecture == "EfficientNet-B1":
            self.encoder = geffnet.tf_efficientnet_b1_ns(pretrained=True)
            self.dimList = [16, 24, 40, 112, 1280]  # 5th feature is extracted after conv_head or bn2
            # self.dimList = [16, 24, 40, 112, 320]  # 5th feature is extracted after blocks[6]
        elif architecture == "EfficientNet-B2":
            self.encoder = geffnet.tf_efficientnet_b2_ns(pretrained=True)
            self.dimList = [16, 24, 48, 120, 1408]  # 5th feature is extracted after conv_head or bn2
            # self.dimList = [16, 24, 48, 120, 352]  # 5th feature is extracted after blocks[6]
        elif architecture == "EfficientNet-B3":
            self.encoder = geffnet.tf_efficientnet_b3_ns(pretrained=True)
            self.dimList = [24, 32, 48, 136, 1536]  # 5th feature is extracted after conv_head or bn2
            # self.dimList = [24, 32, 48, 136, 384]  # 5th feature is extracted after blocks[6]
        elif architecture == "EfficientNet-B4":
            self.encoder = geffnet.tf_efficientnet_b4_ns(pretrained=True)
            self.dimList = [24, 32, 56, 160, 1792]  # 5th feature is extracted after conv_head or bn2
            # self.dimList = [24, 32, 56, 160, 448]  # 5th feature is extracted after blocks[6]
        elif architecture == "EfficientNet-B5":
            self.encoder = geffnet.tf_efficientnet_b5_ns(pretrained=True)
            self.dimList = [24, 40, 64, 176, 2048]  # 5th feature is extracted after conv_head or bn2
            # self.dimList = [24, 40, 64, 176, 512]  # 5th feature is extracted after blocks[6]
        elif architecture == "EfficientNet-B6":
            self.encoder = geffnet.tf_efficientnet_b6_ns(pretrained=True)
            self.dimList = [32, 40, 72, 200, 2304]  # 5th feature is extracted after conv_head or bn2
            # self.dimList = [32, 40, 72, 200, 576]  # 5th feature is extracted after blocks[6]
        elif architecture == "EfficientNet-B7":
            self.encoder = geffnet.tf_efficientnet_b7_ns(pretrained=True)
            self.dimList = [32, 48, 80, 224, 2560]  # 5th feature is extracted after conv_head or bn2
            # self.dimList = [32, 48, 80, 224, 640]  # 5th feature is extracted after blocks[6]
        del self.encoder.global_pool
        del self.encoder.classifier

        # self.block_idx = [3, 4, 5, 7, 9]   # 5th feature is extracted after blocks[6]
        # self.block_idx = [3, 4, 5, 7, 10]  # 5th feature is extracted after conv_head
        self.block_idx = [3, 4, 5, 7, 11]    # 5th feature is extracted after bn2

        if lv6 is False:
            del self.encoder.blocks[6]
            del self.encoder.conv_head
            del self.encoder.bn2
            del self.encoder.act2
            self.block_idx = self.block_idx[:4]
            self.dimList = self.dimList[:4]
        if lv5 is False:
            del self.encoder.blocks[5]
            self.block_idx = self.block_idx[:3]
            self.dimList = self.dimList[:3]
        if lv4 is False:
            del self.encoder.blocks[4]
            self.block_idx = self.block_idx[:2]
            self.dimList = self.dimList[:2]
        if lv3 is False:
            del self.encoder.blocks[3]
            self.block_idx = self.block_idx[:1]
            self.dimList = self.dimList[:1]

        # after passing blocks[3] : H/2  x W/2
        # after passing blocks[4] : H/4  x W/4
        # after passing blocks[5] : H/8  x W/8
        # after passing blocks[7] : H/16 x W/16
        # after passing conv_stem : H/32 x W/32
        self.fixList = ['blocks.0.0', 'bn']

        for name, parameters in self.encoder.named_parameters():
            if name == 'conv_stem.weight':
                parameters.requires_grad = False
            if any(x in name for x in self.fixList):
                parameters.requires_grad = False

    def forward(self, x):
        out_featList = []
        feature = x
        cnt = 0
        block_cnt = 0
        for k, v in self.encoder._modules.items():
            if k == 'act2':
                break
            if k == 'blocks':
                for m, n in v._modules.items():
                    feature = n(feature)
                    try:
                        if self.block_idx[block_cnt] == cnt:
                            out_featList.append(feature)
                            block_cnt += 1
                            break
                        cnt += 1
                    except:
                        continue
            else:
                feature = v(feature)
                if self.block_idx[block_cnt] == cnt:
                    out_featList.append(feature)
                    block_cnt += 1
                    break
                cnt += 1
        return out_featList

    def freeze_bn(self, enable=False):
        """Adapted from https://discuss.pytorch.org/t/how-to-train-with-frozen-batchnorm/12106/8"""
        for module in self.modules():
            if isinstance(module, nn.BatchNorm2d):
                module.train() if enable else module.eval()
                module.weight.requires_grad = enable
                module.bias.requires_grad = enable
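
if __name__ == '__main__':
    # Round-trip check (illustrative, not part of the original file):
    # sigmoid(inverse_sigmoid(x)) recovers x for inputs safely inside (0, 1);
    # eps only matters near the boundaries.
    x = torch.tensor([0.1, 0.5, 0.9])
    assert torch.allclose(torch.sigmoid(inverse_sigmoid(x)), x, atol=1e-6)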
passing blocks[3] : H/2 x W/2 # after passing blocks[4] : H/4 x W/4 # after passing blocks[5] : H/8 x W/8 # after passing blocks[7] : H/16 x W/16 # after passing conv_stem : H/32 x W/32 self.fixList = ['blocks.0.0','bn'] for name, parameters in self.encoder.named_parameters(): if name == 'conv_stem.weight': parameters.requires_grad = False if any(x in name for x in self.fixList): parameters.requires_grad = False def forward(self, x): out_featList = [] feature = x cnt = 0 block_cnt = 0 for k, v in self.encoder._modules.items(): if k == 'act2': break if k == 'blocks': for m, n in v._modules.items(): feature = n(feature) try: if self.block_idx[block_cnt] == cnt: out_featList.append(feature) block_cnt += 1 break cnt += 1 except: continue else: feature = v(feature) if self.block_idx[block_cnt] == cnt: out_featList.append(feature) block_cnt += 1 break cnt += 1 return out_featList def freeze_bn(self, enable=False): """ Adapted from https://discuss.pytorch.org/t/how-to-train-with-frozen-batchnorm/12106/8 """ for module in self.modules(): if isinstance(module, nn.BatchNorm2d): module.train() if enable else module.eval() module.weight.requires_grad = enable module.bias.requires_grad = enable ================================================ FILE: pretrained_models/.gitkeep ================================================ ================================================ FILE: requirements.txt ================================================ opencv_python==4.2.0.32 Shapely==1.7.0 xmljson==0.2.0 thop==0.0.31.post2005241907 matplotlib==3.5.1 scipy==1.4.1 p_tqdm==1.3.3 lxml==4.5.0 tqdm==4.43.0 ujson==1.35 PyYAML==5.3.1 scikit_learn==0.23.2 tensorboard==2.3.0 gdown==4.4.0 ortools==9.2.9972 geffnet==1.0.2 tensorboardX==2.5 ================================================ FILE: utils/MinCostFlow.py ================================================ # ============================================================================== # Binaries and/or source for the following packages or projects are presented under one or more of the following open # source licenses: # MinCostFlow.py The PersFormer Authors Apache License, Version 2.0 # # Contact simachonghao@pjlab.org.cn if you have any issue # # See: # https://github.com/yuliangguo/Pytorch_Generalized_3D_Lane_Detection/blob/master/tools/MinCostFlow.py # # Copyright (c) 2022 The PersFormer Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== from __future__ import print_function import numpy as np from ortools.graph import pywrapgraph import time def SolveMinCostFlow(adj_mat, cost_mat): """ Solving an Assignment Problem with MinCostFlow" :param adj_mat: adjacency matrix with binary values indicating possible matchings between two sets :param cost_mat: cost matrix recording the matching cost of every possible pair of items from two sets :return: """ # Instantiate a SimpleMinCostFlow solver. 
min_cost_flow = pywrapgraph.SimpleMinCostFlow() # Define the directed graph for the flow. cnt_1, cnt_2 = adj_mat.shape cnt_nonzero_row = int(np.sum(np.sum(adj_mat, axis=1) > 0)) cnt_nonzero_col = int(np.sum(np.sum(adj_mat, axis=0) > 0)) # prepare directed graph for the flow start_nodes = np.zeros(cnt_1, dtype=np.int).tolist() +\ np.repeat(np.array(range(1, cnt_1+1)), cnt_2).tolist() + \ [i for i in range(cnt_1+1, cnt_1 + cnt_2 + 1)] end_nodes = [i for i in range(1, cnt_1+1)] + \ np.repeat(np.array([i for i in range(cnt_1+1, cnt_1 + cnt_2 + 1)]).reshape([1, -1]), cnt_1, axis=0).flatten().tolist() + \ [cnt_1 + cnt_2 + 1 for i in range(cnt_2)] capacities = np.ones(cnt_1, dtype=np.int).tolist() + adj_mat.flatten().astype(np.int).tolist() + np.ones(cnt_2, dtype=np.int).tolist() costs = (np.zeros(cnt_1, dtype=np.int).tolist() + cost_mat.flatten().astype(np.int).tolist() + np.zeros(cnt_2, dtype=np.int).tolist()) # Define an array of supplies at each node. supplies = [min(cnt_nonzero_row, cnt_nonzero_col)] + np.zeros(cnt_1 + cnt_2, dtype=np.int).tolist() + [-min(cnt_nonzero_row, cnt_nonzero_col)] # supplies = [min(cnt_1, cnt_2)] + np.zeros(cnt_1 + cnt_2, dtype=np.int).tolist() + [-min(cnt_1, cnt_2)] source = 0 sink = cnt_1 + cnt_2 + 1 # Add each arc. for i in range(len(start_nodes)): min_cost_flow.AddArcWithCapacityAndUnitCost(start_nodes[i], end_nodes[i], capacities[i], costs[i]) # Add node supplies. for i in range(len(supplies)): min_cost_flow.SetNodeSupply(i, supplies[i]) match_results = [] # Find the minimum cost flow between node 0 and node 10. if min_cost_flow.Solve() == min_cost_flow.OPTIMAL: # print('Total cost = ', min_cost_flow.OptimalCost()) # print() for arc in range(min_cost_flow.NumArcs()): # Can ignore arcs leading out of source or into sink. if min_cost_flow.Tail(arc)!=source and min_cost_flow.Head(arc)!=sink: # Arcs in the solution have a flow value of 1. Their start and end nodes # give an assignment of worker to task. if min_cost_flow.Flow(arc) > 0: # print('set A item %d assigned to set B item %d. Cost = %d' % ( # min_cost_flow.Tail(arc)-1, # min_cost_flow.Head(arc)-cnt_1-1, # min_cost_flow.UnitCost(arc))) match_results.append([min_cost_flow.Tail(arc)-1, min_cost_flow.Head(arc)-cnt_1-1, min_cost_flow.UnitCost(arc)]) else: print('There was an issue with the min cost flow input.') return match_results def main(): """Solving an Assignment Problem with MinCostFlow""" # Instantiate a SimpleMinCostFlow solver. min_cost_flow = pywrapgraph.SimpleMinCostFlow() # Define the directed graph for the flow. start_nodes = [0, 0, 0, 0] + [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4] + [5, 6, 7, 8] end_nodes = [1, 2, 3, 4] + [5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8, 5, 6, 7, 8] + [9, 9, 9, 9] capacities = [1, 1, 1, 1] + [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1] + [1, 1, 1, 1] costs = ([0, 0, 0, 0] + [90, 76, 75, 70, 35, 85, 55, 65, 125, 95, 90, 105, 45, 110, 95, 115] + [0, 0, 0, 0]) # Define an array of supplies at each node. supplies = [4, 0, 0, 0, 0, 0, 0, 0, 0, -4] source = 0 sink = 9 tasks = 4 # Add each arc. for i in range(len(start_nodes)): min_cost_flow.AddArcWithCapacityAndUnitCost(start_nodes[i], end_nodes[i], capacities[i], costs[i]) # Add node supplies. for i in range(len(supplies)): min_cost_flow.SetNodeSupply(i, supplies[i]) # Find the minimum cost flow between node 0 and node 10. 
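
# Aside: an illustrative call to the SolveMinCostFlow helper defined above;
# each returned row is [index into set A, index into set B, matching cost].
import numpy as np
from utils.MinCostFlow import SolveMinCostFlow

toy_adj = np.ones((2, 2), dtype=int)                # every pairing is allowed
toy_cost = np.array([[5, 1], [2, 7]], dtype=int)    # costs must be integers
print(SolveMinCostFlow(toy_adj, toy_cost))          # expected: [[0, 1, 1], [1, 0, 2]]
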
if min_cost_flow.Solve() == min_cost_flow.OPTIMAL: print('Total cost = ', min_cost_flow.OptimalCost()) print() for arc in range(min_cost_flow.NumArcs()): # Can ignore arcs leading out of source or into sink. if min_cost_flow.Tail(arc)!=source and min_cost_flow.Head(arc)!=sink: # Arcs in the solution have a flow value of 1. Their start and end nodes # give an assignment of worker to task. if min_cost_flow.Flow(arc) > 0: print('Worker %d assigned to task %d. Cost = %d' % ( min_cost_flow.Tail(arc), min_cost_flow.Head(arc), min_cost_flow.UnitCost(arc))) else: print('There was an issue with the min cost flow input.') if __name__ == '__main__': start_time = time.clock() main() print() print("Time =", time.clock() - start_time, "seconds") ================================================ FILE: utils/__init__.py ================================================ ================================================ FILE: utils/eval_3D_lane.py ================================================ # ============================================================================== # Binaries and/or source for the following packages or projects are presented under one or more of the following open # source licenses: # eval_3D_lane.py The OpenLane Dataset Authors Apache License, Version 2.0 # # Contact simachonghao@pjlab.org.cn if you have any issue # # See: # https://github.com/yuliangguo/Pytorch_Generalized_3D_Lane_Detection/blob/master/tools/eval_3D_lane.py # # Copyright (c) 2022 The OpenLane Dataset Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== """ Description: This code is to evaluate 3D lane detection. The optimal matching between ground-truth set and predicted set of lanes are sought via solving a min cost flow. Evaluation metrics includes: F-scores x error close (0 - 40 m) x error far (0 - 100 m) z error close (0 - 40 m) z error far (0 - 100 m) """ from copy import deepcopy import numpy as np from utils.utils import * from utils.MinCostFlow import SolveMinCostFlow class LaneEval(object): def __init__(self, args, logger): self.dataset_name = args.dataset_name self.dataset_dir = args.dataset_dir self.x_min = args.top_view_region[0, 0] self.x_max = args.top_view_region[1, 0] self.y_min = args.top_view_region[2, 1] self.y_max = args.top_view_region[0, 1] self.y_samples = np.linspace(self.y_min, self.y_max, num=100, endpoint=False) self.dist_th = 1.5 self.ratio_th = 0.75 self.close_range = 40 self.is_apollo = 'apollo' in self.dataset_name self.args = args self.logger = logger def bench(self, pred_lanes, pred_category, gt_lanes, gt_visibility, gt_category, raw_file, gt_cam_height, gt_cam_pitch, vis, P_g2im=None): """ Matching predicted lanes and ground-truth lanes in their IPM projection, ignoring z attributes. 
x error, y_error, and z error are all considered, although the matching does not rely on z The input of prediction and ground-truth lanes are in ground coordinate, x-right, y-forward, z-up The fundamental assumption is: 1. there are no two points from different lanes with identical x, y but different z's 2. there are no two points from a single lane having identical x, y but different z's If the interest area is within the current drivable road, the above assumptions are almost always valid. :param pred_lanes: N X 2 or N X 3 lists depending on 2D or 3D :param gt_lanes: N X 2 or N X 3 lists depending on 2D or 3D :param raw_file: file path rooted in dataset folder :param gt_cam_height: camera height given in ground-truth data :param gt_cam_pitch: camera pitch given in ground-truth data :return: """ # change this properly close_range_idx = np.where(self.y_samples > self.close_range)[0][0] r_lane, p_lane, c_lane = 0., 0., 0. x_error_close = [] x_error_far = [] z_error_close = [] z_error_far = [] # only keep the visible portion gt_lanes = [prune_3d_lane_by_visibility(np.array(gt_lane), np.array(gt_visibility[k])) for k, gt_lane in enumerate(gt_lanes)] if 'openlane' in self.dataset_name: gt_category = [gt_category[k] for k, lane in enumerate(gt_lanes) if lane.shape[0] > 1] gt_lanes = [lane for lane in gt_lanes if lane.shape[0] > 1] # # only consider those pred lanes overlapping with sampling range pred_category = [pred_category[k] for k, lane in enumerate(pred_lanes) if np.array(lane)[0, 1] < self.y_samples[-1] and np.array(lane)[-1, 1] > self.y_samples[0]] pred_lanes = [lane for lane in pred_lanes if np.array(lane)[0, 1] < self.y_samples[-1] and np.array(lane)[-1, 1] > self.y_samples[0]] pred_lanes = [prune_3d_lane_by_range(np.array(lane), self.x_min, self.x_max) for lane in pred_lanes] pred_category = [pred_category[k] for k, lane in enumerate(pred_lanes) if np.array(lane).shape[0] > 1] pred_lanes = [lane for lane in pred_lanes if np.array(lane).shape[0] > 1] # only consider those gt lanes overlapping with sampling range gt_category = [gt_category[k] for k, lane in enumerate(gt_lanes) if lane[0, 1] < self.y_samples[-1] and lane[-1, 1] > self.y_samples[0]] gt_lanes = [lane for lane in gt_lanes if lane[0, 1] < self.y_samples[-1] and lane[-1, 1] > self.y_samples[0]] gt_lanes = [prune_3d_lane_by_range(np.array(lane), self.x_min, self.x_max) for lane in gt_lanes] gt_category = [gt_category[k] for k, lane in enumerate(gt_lanes) if lane.shape[0] > 1] gt_lanes = [lane for lane in gt_lanes if lane.shape[0] > 1] cnt_gt = len(gt_lanes) cnt_pred = len(pred_lanes) gt_visibility_mat = np.zeros((cnt_gt, 100)) pred_visibility_mat = np.zeros((cnt_pred, 100)) # resample gt and pred at y_samples for i in range(cnt_gt): min_y = np.min(np.array(gt_lanes[i])[:, 1]) max_y = np.max(np.array(gt_lanes[i])[:, 1]) x_values, z_values, visibility_vec = resample_laneline_in_y(np.array(gt_lanes[i]), self.y_samples, out_vis=True) gt_lanes[i] = np.vstack([x_values, z_values]).T gt_visibility_mat[i, :] = np.logical_and(x_values >= self.x_min, np.logical_and(x_values <= self.x_max, np.logical_and(self.y_samples >= min_y, self.y_samples <= max_y))) gt_visibility_mat[i, :] = np.logical_and(gt_visibility_mat[i, :], visibility_vec) for i in range(cnt_pred): # # ATTENTION: ensure y mono increase before interpolation: but it can reduce size # pred_lanes[i] = make_lane_y_mono_inc(np.array(pred_lanes[i])) # pred_lane = prune_3d_lane_by_range(np.array(pred_lanes[i]), self.x_min, self.x_max) min_y = np.min(np.array(pred_lanes[i])[:, 1]) 
max_y = np.max(np.array(pred_lanes[i])[:, 1]) x_values, z_values, visibility_vec = resample_laneline_in_y(np.array(pred_lanes[i]), self.y_samples, out_vis=True) pred_lanes[i] = np.vstack([x_values, z_values]).T pred_visibility_mat[i, :] = np.logical_and(x_values >= self.x_min, np.logical_and(x_values <= self.x_max, np.logical_and(self.y_samples >= min_y, self.y_samples <= max_y))) pred_visibility_mat[i, :] = np.logical_and(pred_visibility_mat[i, :], visibility_vec) # pred_visibility_mat[i, :] = np.logical_and(x_values >= self.x_min, x_values <= self.x_max) # at least two-points for both gt and pred gt_lanes = [gt_lanes[k] for k in range(cnt_gt) if np.sum(gt_visibility_mat[k, :]) > 1] gt_category = [gt_category[k] for k in range(cnt_gt) if np.sum(gt_visibility_mat[k, :]) > 1] gt_visibility_mat = gt_visibility_mat[np.sum(gt_visibility_mat, axis=-1) > 1, :] cnt_gt = len(gt_lanes) pred_lanes = [pred_lanes[k] for k in range(cnt_pred) if np.sum(pred_visibility_mat[k, :]) > 1] pred_category = [pred_category[k] for k in range(cnt_pred) if np.sum(pred_visibility_mat[k, :]) > 1] pred_visibility_mat = pred_visibility_mat[np.sum(pred_visibility_mat, axis=-1) > 1, :] cnt_pred = len(pred_lanes) adj_mat = np.zeros((cnt_gt, cnt_pred), dtype=int) cost_mat = np.zeros((cnt_gt, cnt_pred), dtype=int) cost_mat.fill(1000) num_match_mat = np.zeros((cnt_gt, cnt_pred), dtype=float) x_dist_mat_close = np.zeros((cnt_gt, cnt_pred), dtype=float) x_dist_mat_close.fill(1000.) x_dist_mat_far = np.zeros((cnt_gt, cnt_pred), dtype=float) x_dist_mat_far.fill(1000.) z_dist_mat_close = np.zeros((cnt_gt, cnt_pred), dtype=float) z_dist_mat_close.fill(1000.) z_dist_mat_far = np.zeros((cnt_gt, cnt_pred), dtype=float) z_dist_mat_far.fill(1000.) # compute curve to curve distance for i in range(cnt_gt): for j in range(cnt_pred): x_dist = np.abs(gt_lanes[i][:, 0] - pred_lanes[j][:, 0]) z_dist = np.abs(gt_lanes[i][:, 1] - pred_lanes[j][:, 1]) # apply visibility to penalize different partial matching accordingly both_visible_indices = np.logical_and(gt_visibility_mat[i, :] >= 0.5, pred_visibility_mat[j, :] >= 0.5) both_invisible_indices = np.logical_and(gt_visibility_mat[i, :] < 0.5, pred_visibility_mat[j, :] < 0.5) other_indices = np.logical_not(np.logical_or(both_visible_indices, both_invisible_indices)) euclidean_dist = np.sqrt(x_dist ** 2 + z_dist ** 2) euclidean_dist[both_invisible_indices] = 0 euclidean_dist[other_indices] = self.dist_th # if np.average(euclidean_dist) < 2*self.dist_th: # don't prune here to encourage finding perfect match num_match_mat[i, j] = np.sum(euclidean_dist < self.dist_th) - np.sum(both_invisible_indices) adj_mat[i, j] = 1 # ATTENTION: use the sum as int type to meet the requirements of min cost flow optimization (int type) # using num_match_mat as cost does not work? 
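
# Aside: ortools' SimpleMinCostFlow only accepts integer arc costs, which is
# why the float distance sum is rounded below; a hypothetical helper showing
# the rule (a sum in (0, 1) is kept as cost 1 instead of truncating to 0, so
# an imperfect match never looks like a perfect zero-cost one):
def to_flow_cost(dist_sum):
    if 0 < dist_sum < 1:
        return 1
    return int(dist_sum)
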
cost_ = np.sum(euclidean_dist) if cost_<1 and cost_>0: cost_ = 1 else: cost_ = (cost_).astype(int) cost_mat[i, j] = cost_ # use the both visible portion to calculate distance error if np.sum(both_visible_indices[:close_range_idx]) > 0: x_dist_mat_close[i, j] = np.sum( x_dist[:close_range_idx] * both_visible_indices[:close_range_idx]) / np.sum( both_visible_indices[:close_range_idx]) z_dist_mat_close[i, j] = np.sum( z_dist[:close_range_idx] * both_visible_indices[:close_range_idx]) / np.sum( both_visible_indices[:close_range_idx]) else: x_dist_mat_close[i, j] = -1 z_dist_mat_close[i, j] = -1 if np.sum(both_visible_indices[close_range_idx:]) > 0: x_dist_mat_far[i, j] = np.sum( x_dist[close_range_idx:] * both_visible_indices[close_range_idx:]) / np.sum( both_visible_indices[close_range_idx:]) z_dist_mat_far[i, j] = np.sum( z_dist[close_range_idx:] * both_visible_indices[close_range_idx:]) / np.sum( both_visible_indices[close_range_idx:]) else: x_dist_mat_far[i, j] = -1 z_dist_mat_far[i, j] = -1 # solve bipartite matching vis min cost flow solver match_results = SolveMinCostFlow(adj_mat, cost_mat) match_results = np.array(match_results) # only a match with avg cost < self.dist_th is consider valid one match_gt_ids = [] match_pred_ids = [] match_num = 0 if match_results.shape[0] > 0: for i in range(len(match_results)): if match_results[i, 2] < self.dist_th * self.y_samples.shape[0]: match_num += 1 gt_i = match_results[i, 0] pred_i = match_results[i, 1] # consider match when the matched points is above a ratio if num_match_mat[gt_i, pred_i] / np.sum(gt_visibility_mat[gt_i, :]) >= self.ratio_th: r_lane += 1 match_gt_ids.append(gt_i) if num_match_mat[gt_i, pred_i] / np.sum(pred_visibility_mat[pred_i, :]) >= self.ratio_th: p_lane += 1 match_pred_ids.append(pred_i) if pred_category != []: if self.args.num_category == 2: true_pred = pred_category[pred_i] == 1 else: true_pred = (pred_category[pred_i] == gt_category[gt_i]) or (pred_category[pred_i]==20 and gt_category[gt_i]==21) if true_pred: c_lane += 1 # category matched num x_error_close.append(x_dist_mat_close[gt_i, pred_i]) x_error_far.append(x_dist_mat_far[gt_i, pred_i]) z_error_close.append(z_dist_mat_close[gt_i, pred_i]) z_error_far.append(z_dist_mat_far[gt_i, pred_i]) # # Visulization to be added # if vis: # pass return r_lane, p_lane, c_lane, cnt_gt, cnt_pred, match_num, x_error_close, x_error_far, z_error_close, z_error_far def bench_one_submit(self, pred_dir, gt_dir, test_txt, prob_th=0.5, vis=False): pred_lines = open(test_txt).readlines() gt_lines = pred_lines json_pred = [] json_gt = [] print("Loading pred json ...") for pred_file_path in pred_lines: pred_lines = pred_dir + pred_file_path.strip('\n').replace('jpg','json') with open(pred_lines,'r') as fp: json_pred.append(json.load(fp)) print("Loading gt json ...") for gt_file_path in gt_lines: gt_lines = gt_dir + gt_file_path.strip('\n').replace('jpg','json') with open(gt_lines,'r') as fp: json_gt.append(json.load(fp)) if len(json_gt) != len(json_pred): raise Exception('We do not get the predictions of all the test tasks') gts = {l['file_path']: l for l in json_gt} laneline_stats = [] laneline_x_error_close = [] laneline_x_error_far = [] laneline_z_error_close = [] laneline_z_error_far = [] for i, pred in enumerate(json_pred): if i % 1000 == 0 or i == len(json_pred)-1: self.logger.info('eval:{}/{}'.format(i+1,len(json_pred))) if 'file_path' not in pred or 'lane_lines' not in pred: raise Exception('file_path or lane_lines not in some predictions.') raw_file = pred['file_path'] 
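
# Aside: an illustrative (made-up values) prediction record carrying just the
# fields this loop reads; points are shown as rows of [x, y, z], matching how
# the pred lanes are indexed inside bench(), plus an integer category id:
example_pred = {
    'file_path': 'segment-xxxx/xxxx.jpg',   # hypothetical path
    'lane_lines': [
        {'xyz': [[0.0, 5.0, 0.0], [0.1, 10.0, 0.0], [0.2, 15.0, 0.1]],
         'category': 1},
    ],
}
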
            pred_lanelines = pred['lane_lines']
            pred_lanes = [np.array(lane['xyz']) for i, lane in enumerate(pred_lanelines)]
            pred_category = [int(lane['category']) for i, lane in enumerate(pred_lanelines)]
            if raw_file not in gts:
                raise Exception('Some raw_file from your predictions do not exist in the test tasks.')
            gt = gts[raw_file]

            # evaluate lanelines
            cam_extrinsics = np.array(gt['extrinsic'])
            # Re-calculate extrinsic matrix based on ground coordinate
            R_vg = np.array([[0, 1, 0],
                             [-1, 0, 0],
                             [0, 0, 1]], dtype=float)
            R_gc = np.array([[1, 0, 0],
                             [0, 0, 1],
                             [0, -1, 0]], dtype=float)
            cam_extrinsics[:3, :3] = np.matmul(np.matmul(
                np.matmul(np.linalg.inv(R_vg), cam_extrinsics[:3, :3]), R_vg), R_gc)
            gt_cam_height = cam_extrinsics[2, 3]
            gt_cam_pitch = 0
            cam_extrinsics[0:2, 3] = 0.0
            # cam_extrinsics[2, 3] = gt_cam_height
            cam_intrinsics = gt['intrinsic']
            cam_intrinsics = np.array(cam_intrinsics)
            try:
                gt_lanes_packed = gt['lane_lines']
            except KeyError:
                print("error 'lane_lines' in gt: ", gt['file_path'])
                gt_lanes_packed = []  # fall back to an empty GT list
            gt_lanes, gt_visibility, gt_category = [], [], []
            for j, gt_lane_packed in enumerate(gt_lanes_packed):
                # A GT lane can be either 2D or 3D
                # if a GT lane is 3D, the height is intact from 3D GT, so keep it intact here too
                lane = np.array(gt_lane_packed['xyz'])
                lane_visibility = np.array(gt_lane_packed['visibility'])
                lane = np.vstack((lane, np.ones((1, lane.shape[1]))))
                cam_representation = np.linalg.inv(
                    np.array([[0, 0, 1, 0],
                              [-1, 0, 0, 0],
                              [0, -1, 0, 0],
                              [0, 0, 0, 1]], dtype=float))
                lane = np.matmul(cam_extrinsics, np.matmul(cam_representation, lane))
                lane = lane[0:3, :].T
                gt_lanes.append(lane)
                gt_visibility.append(lane_visibility)
                gt_category.append(gt_lane_packed['category'])

            if self.is_apollo:
                P_g2im = projection_g2im(gt_cam_pitch, gt_cam_height, cam_intrinsics)
            else:
                P_g2im = projection_g2im_extrinsic(cam_extrinsics, cam_intrinsics)
            # N to N matching of lanelines
            r_lane, p_lane, c_lane, cnt_gt, cnt_pred, match_num, \
                x_error_close, x_error_far, \
                z_error_close, z_error_far = self.bench(pred_lanes, pred_category, gt_lanes, gt_visibility,
                                                        gt_category, raw_file, gt_cam_height, gt_cam_pitch,
                                                        vis, P_g2im)
            laneline_stats.append(np.array([r_lane, p_lane, c_lane, cnt_gt, cnt_pred, match_num]))
            laneline_x_error_close.extend(x_error_close)
            laneline_x_error_far.extend(x_error_far)
            laneline_z_error_close.extend(z_error_close)
            laneline_z_error_far.extend(z_error_far)

        output_stats = []
        laneline_stats = np.array(laneline_stats)
        laneline_x_error_close = np.array(laneline_x_error_close)
        laneline_x_error_far = np.array(laneline_x_error_far)
        laneline_z_error_close = np.array(laneline_z_error_close)
        laneline_z_error_far = np.array(laneline_z_error_far)
        if np.sum(laneline_stats[:, 3]) != 0:
            R_lane = np.sum(laneline_stats[:, 0]) / (np.sum(laneline_stats[:, 3]))
        else:
            R_lane = np.sum(laneline_stats[:, 0]) / (np.sum(laneline_stats[:, 3]) + 1e-6)  # recall = TP / (TP+FN)
        if np.sum(laneline_stats[:, 4]) != 0:
            P_lane = np.sum(laneline_stats[:, 1]) / (np.sum(laneline_stats[:, 4]))
        else:
            P_lane = np.sum(laneline_stats[:, 1]) / (np.sum(laneline_stats[:, 4]) + 1e-6)  # precision = TP / (TP+FP)
        if np.sum(laneline_stats[:, 5]) != 0:
            C_lane = np.sum(laneline_stats[:, 2]) / (np.sum(laneline_stats[:, 5]))
        else:
            C_lane = np.sum(laneline_stats[:, 2]) / (np.sum(laneline_stats[:, 5]) + 1e-6)  # category_accuracy
        if R_lane + P_lane != 0:
            F_lane = 2 * R_lane * P_lane / (R_lane + P_lane)
        else:
            F_lane = 2 * R_lane * P_lane / (R_lane + P_lane + 1e-6)
        x_error_close_avg = np.average(laneline_x_error_close[laneline_x_error_close > -1 + 1e-6])
        x_error_far_avg =
np.average(laneline_x_error_far[laneline_x_error_far > -1 + 1e-6]) z_error_close_avg = np.average(laneline_z_error_close[laneline_z_error_close > -1 + 1e-6]) z_error_far_avg = np.average(laneline_z_error_far[laneline_z_error_far > -1 + 1e-6]) output_stats.append(F_lane) output_stats.append(R_lane) output_stats.append(P_lane) output_stats.append(C_lane) output_stats.append(x_error_close_avg) output_stats.append(x_error_far_avg) output_stats.append(z_error_close_avg) output_stats.append(z_error_far_avg) output_stats.append(np.sum(laneline_stats[:, 0])) # 8 output_stats.append(np.sum(laneline_stats[:, 1])) # 9 output_stats.append(np.sum(laneline_stats[:, 2])) # 10 output_stats.append(np.sum(laneline_stats[:, 3])) # 11 output_stats.append(np.sum(laneline_stats[:, 4])) # 12 output_stats.append(np.sum(laneline_stats[:, 5])) # 13 return output_stats # compare predicted set and ground-truth set using a fixed lane probability threshold def bench_one_submit_ddp(self, pred_lines_sub, gt_lines_sub, model_name, prob_th=0.5, vis=False): json_gt = gt_lines_sub json_pred = pred_lines_sub gts = {l['file_path']: l for l in json_gt} laneline_stats = [] laneline_x_error_close = [] laneline_x_error_far = [] laneline_z_error_close = [] laneline_z_error_far = [] gt_num_all, pred_num_all = 0, 0 for i, pred in enumerate(json_pred): if 'file_path' not in pred or 'pred_laneLines' not in pred: raise Exception('file_path or lane_lines not in some predictions.') raw_file = pred['file_path'] pred_lanes = pred['pred_laneLines'] pred_lanes_prob = pred['pred_laneLines_prob'] if model_name == "GenLaneNet": pred_lanes = [pred_lanes[ii] for ii in range(len(pred_lanes_prob)) if pred_lanes_prob[ii] > prob_th] pred_category = np.zeros(len(pred_lanes_prob)) # Note: non-lane class is already filtered out in compute_3d_lanes_all_category() else: pred_lanes = [pred_lanes[ii] for ii in range(len(pred_lanes_prob)) if max(pred_lanes_prob[ii]) > prob_th] pred_lanes_prob = [prob for k, prob in enumerate(pred_lanes_prob) if max(prob) > prob_th] if pred_lanes_prob: pred_category = np.argmax(pred_lanes_prob, 1) else: pred_category = [] if raw_file not in gts: raise Exception('Some raw_file from your predictions do not exist in the test tasks.') gt = gts[raw_file] # evaluate lanelines assert 'extrinsic' in gt and 'intrinsic' in gt cam_extrinsics = np.array(gt['extrinsic']) cam_intrinsics = gt['intrinsic'] cam_intrinsics = np.array(cam_intrinsics) # else: # assert all(cam_param in gt for cam_param in ['cam_pitch', 'cam_height']), "without 'extrinsic' in gt json_file dict, AND not 'cam_height' & 'cam_pitch' as well." 
# cam_extrinsics = # cam_intrinsics = np.array(self.args.K) if self.is_apollo: gt_cam_height = gt['cam_height'] gt_cam_pitch = gt['cam_pitch'] gt_lanes_packed = gt['laneLines'] gt_lane_visibility = gt['laneLines_visibility'] gt_lanes, gt_visibility, gt_category = [], [], [] for j, gt_lane_packed in enumerate(gt_lanes_packed): lane = np.array(gt_lane_packed) lane_visibility = np.array(gt_lane_visibility[j]) gt_lanes.append(lane) gt_visibility.append(lane_visibility) gt_category.append(1) else: # Re-calculate extrinsic matrix based on ground coordinate R_vg = np.array([[0, 1, 0], [-1, 0, 0], [0, 0, 1]], dtype=float) R_gc = np.array([[1, 0, 0], [0, 0, 1], [0, -1, 0]], dtype=float) cam_extrinsics[:3, :3] = np.matmul(np.matmul( np.matmul(np.linalg.inv(R_vg), cam_extrinsics[:3, :3]), R_vg), R_gc) gt_cam_height = cam_extrinsics[2, 3] gt_cam_pitch = 0 cam_extrinsics[0:2, 3] = 0.0 # cam_extrinsics[2, 3] = gt_cam_height cam_intrinsics = gt['intrinsic'] cam_intrinsics = np.array(cam_intrinsics) try: gt_lanes_packed = gt['lane_lines'] except: print("error 'lane_lines' in gt: ", gt['file_path']) gt_lanes_packed = [] gt_lanes, gt_visibility, gt_category = [], [], [] for j, gt_lane_packed in enumerate(gt_lanes_packed): # A GT lane can be either 2D or 3D # if a GT lane is 3D, the height is intact from 3D GT, so keep it intact here too lane = np.array(gt_lane_packed['xyz']) lane_visibility = np.array(gt_lane_packed['visibility']) lane = np.vstack((lane, np.ones((1, lane.shape[1])))) cam_representation = np.linalg.inv( np.array([[0, 0, 1, 0], [-1, 0, 0, 0], [0, -1, 0, 0], [0, 0, 0, 1]], dtype=float)) lane = np.matmul(cam_extrinsics, np.matmul(cam_representation, lane)) lane = lane[0:3, :].T gt_lanes.append(lane) gt_visibility.append(lane_visibility) gt_category.append(gt_lane_packed['category']) if self.is_apollo: P_g2im = projection_g2im(gt_cam_pitch, gt_cam_height, cam_intrinsics) else: P_g2im = projection_g2im_extrinsic(cam_extrinsics, cam_intrinsics) # N to N matching of lanelines gt_num_all += len(gt_lanes) pred_num_all += len(pred_lanes) r_lane, p_lane, c_lane, cnt_gt, cnt_pred, match_num, \ x_error_close, x_error_far, \ z_error_close, z_error_far = self.bench(pred_lanes, pred_category, gt_lanes, gt_visibility, gt_category, raw_file, gt_cam_height, gt_cam_pitch, vis, P_g2im) laneline_stats.append(np.array([r_lane, p_lane, c_lane, cnt_gt, cnt_pred, match_num])) # consider x_error z_error only for the matched lanes # if r_lane > 0 and p_lane > 0: laneline_x_error_close.extend(x_error_close) laneline_x_error_far.extend(x_error_far) laneline_z_error_close.extend(z_error_close) laneline_z_error_far.extend(z_error_far) output_stats = [] laneline_stats = np.array(laneline_stats) laneline_x_error_close = np.array(laneline_x_error_close) laneline_x_error_far = np.array(laneline_x_error_far) laneline_z_error_close = np.array(laneline_z_error_close) laneline_z_error_far = np.array(laneline_z_error_far) self.logger.info("match num:"+(str(np.sum(laneline_stats[:,5])))) self.logger.info("cnt_gt_all:"+str(gt_num_all)) self.logger.info("cnt_pred_all:"+str(pred_num_all)) self.logger.info("cnt_gt_matched:"+str(np.sum(laneline_stats[:,3]))) self.logger.info("cnt_pred_matched:"+str(np.sum(laneline_stats[:,4]))) if np.sum(laneline_stats[:, 3])!= 0: R_lane = np.sum(laneline_stats[:, 0]) / (np.sum(laneline_stats[:, 3])) else: R_lane = np.sum(laneline_stats[:, 0]) / (np.sum(laneline_stats[:, 3]) + 1e-6) # recall = TP / (TP+FN) if np.sum(laneline_stats[:, 4]) != 0: P_lane = np.sum(laneline_stats[:, 1]) / 
(np.sum(laneline_stats[:, 4])) else: P_lane = np.sum(laneline_stats[:, 1]) / (np.sum(laneline_stats[:, 4]) + 1e-6) # precision = TP / (TP+FP) if np.sum(laneline_stats[:, 5]) != 0: C_lane = np.sum(laneline_stats[:, 2]) / (np.sum(laneline_stats[:, 5])) else: C_lane = np.sum(laneline_stats[:, 2]) / (np.sum(laneline_stats[:, 5]) + 1e-6) # category_accuracy if (R_lane + P_lane) != 0: F_lane = 2 * R_lane * P_lane / (R_lane + P_lane) else: F_lane = 2 * R_lane * P_lane / (R_lane + P_lane + 1e-6) if laneline_x_error_close.shape[0] > 0: x_error_close_avg = np.average(laneline_x_error_close[laneline_x_error_close > -1 + 1e-6]) else: x_error_close_avg = -1 if laneline_x_error_far.shape[0] > 0: x_error_far_avg = np.average(laneline_x_error_far[laneline_x_error_far > -1 + 1e-6]) else: x_error_far_avg = -1 if laneline_z_error_close.shape[0] > 0: z_error_close_avg = np.average(laneline_z_error_close[laneline_z_error_close > -1 + 1e-6]) else: z_error_close_avg = -1 if laneline_z_error_far.shape[0] > 0: z_error_far_avg = np.average(laneline_z_error_far[laneline_z_error_far > -1 + 1e-6]) else: z_error_far_avg = -1 output_stats.append(F_lane) output_stats.append(R_lane) output_stats.append(P_lane) output_stats.append(C_lane) output_stats.append(x_error_close_avg) output_stats.append(x_error_far_avg) output_stats.append(z_error_close_avg) output_stats.append(z_error_far_avg) output_stats.append(np.sum(laneline_stats[:, 0])) # 8 output_stats.append(np.sum(laneline_stats[:, 1])) # 9 output_stats.append(np.sum(laneline_stats[:, 2])) # 10 output_stats.append(np.sum(laneline_stats[:, 3])) # 11 output_stats.append(np.sum(laneline_stats[:, 4])) # 12 output_stats.append(np.sum(laneline_stats[:, 5])) # 13 return output_stats ================================================ FILE: utils/eval_3D_lane_apollo.py ================================================ """ Description: This code is to evaluate 3D lane detection. The optimal matching between ground-truth set and predicted set of lanes are sought via solving a min cost flow. Evaluation metrics includes: Average Precision (AP) Max F-scores x error close (0 - 40 m) x error far (0 - 100 m) z error close (0 - 40 m) z error far (0 - 100 m) Reference: "Gen-LaneNet: Generalized and Scalable Approach for 3D Lane Detection". Y. Guo. etal. 
2020 Author: Yuliang Guo (33yuliangguo@gmail.com) Date: March, 2020 """ import numpy as np import cv2 import os import os.path as ops import copy import math import ujson as json from scipy.interpolate import interp1d import matplotlib from utils.utils import * from utils.MinCostFlow import SolveMinCostFlow from mpl_toolkits.mplot3d import Axes3D matplotlib.use('Agg') import matplotlib.pyplot as plt plt.rcParams['figure.figsize'] = (35, 30) plt.rcParams.update({'font.size': 25}) plt.rcParams.update({'font.weight': 'semibold'}) color = [[0, 0, 255], # red [0, 255, 0], # green [255, 0, 255], # purple [255, 255, 0]] # cyan vis_min_y = 5 vis_max_y = 80 class LaneEval(object): def __init__(self, args, logger=None): self.dataset_dir = args.dataset_dir self.K = args.K self.no_centerline = True self.resize_h = args.resize_h self.resize_w = args.resize_w self.H_crop = homography_crop_resize([args.org_h, args.org_w], args.crop_y, [args.resize_h, args.resize_w]) self.x_min = args.top_view_region[0, 0] self.x_max = args.top_view_region[1, 0] self.y_min = args.top_view_region[2, 1] self.y_max = args.top_view_region[0, 1] self.y_samples = np.linspace(self.y_min, self.y_max, num=100, endpoint=False) self.dist_th = 1.5 self.ratio_th = 0.75 self.close_range = 40 self.logger = logger self.log_eval_info() def log_eval_info(self): self.logger.info('eval x range : [%s, %s]' % (self.x_min, self.x_max)) self.logger.info('eval y range : [%s, %s)' % (self.y_min, self.y_max)) self.logger.info('eval distance thresh: %s meter' % self.dist_th) self.logger.info('eval points ratio : %s' % self.ratio_th) self.logger.info('eval close range: %s' % self.close_range) def bench(self, pred_lanes, gt_lanes, gt_visibility, raw_file, gt_cam_height, gt_cam_pitch, vis, ax1, ax2): """ Matching predicted lanes and ground-truth lanes in their IPM projection, ignoring z attributes. x error, y_error, and z error are all considered, although the matching does not rely on z The input of prediction and ground-truth lanes are in ground coordinate, x-right, y-forward, z-up The fundamental assumption is: 1. there are no two points from different lanes with identical x, y but different z's 2. there are no two points from a single lane having identical x, y but different z's If the interest area is within the current drivable road, the above assumptions are almost always valid. :param pred_lanes: N X 2 or N X 3 lists depending on 2D or 3D :param gt_lanes: N X 2 or N X 3 lists depending on 2D or 3D :param raw_file: file path rooted in dataset folder :param gt_cam_height: camera height given in ground-truth data :param gt_cam_pitch: camera pitch given in ground-truth data :return: """ # change this properly close_range_idx = np.where(self.y_samples > self.close_range)[0][0] r_lane, p_lane = 0., 0. 
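
# Aside: a standalone sketch (assumed y range; the real one comes from
# args.top_view_region) of how close_range_idx splits the 100 resampled rows
# into a close band (< 40 m) and a far band (>= 40 m):
import numpy as np

y_samples = np.linspace(0, 100, num=100, endpoint=False)   # 0, 1, ..., 99
close_range_idx = np.where(y_samples > 40)[0][0]           # -> 41
close_slice, far_slice = slice(None, close_range_idx), slice(close_range_idx, None)
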
x_error_close = [] x_error_far = [] z_error_close = [] z_error_far = [] # only keep the visible portion gt_lanes = [prune_3d_lane_by_visibility(np.array(gt_lane), np.array(gt_visibility[k])) for k, gt_lane in enumerate(gt_lanes)] gt_lanes = [lane for lane in gt_lanes if lane.shape[0] > 1] # only consider those gt lanes overlapping with sampling range gt_lanes = [lane for lane in gt_lanes if lane[0, 1] < self.y_samples[-1] and lane[-1, 1] > self.y_samples[0]] gt_lanes = [prune_3d_lane_by_range(np.array(gt_lane), 3 * self.x_min, 3 * self.x_max) for gt_lane in gt_lanes] gt_lanes = [lane for lane in gt_lanes if lane.shape[0] > 1] cnt_gt = len(gt_lanes) cnt_pred = len(pred_lanes) gt_visibility_mat = np.zeros((cnt_gt, 100)) pred_visibility_mat = np.zeros((cnt_pred, 100)) # resample gt and pred at y_samples for i in range(cnt_gt): min_y = np.min(np.array(gt_lanes[i])[:, 1]) max_y = np.max(np.array(gt_lanes[i])[:, 1]) x_values, z_values, visibility_vec = resample_laneline_in_y(np.array(gt_lanes[i]), self.y_samples, out_vis=True) gt_lanes[i] = np.vstack([x_values, z_values]).T gt_visibility_mat[i, :] = np.logical_and(x_values >= self.x_min, np.logical_and(x_values <= self.x_max, np.logical_and(self.y_samples >= min_y, self.y_samples <= max_y))) gt_visibility_mat[i, :] = np.logical_and(gt_visibility_mat[i, :], visibility_vec) for i in range(cnt_pred): # # ATTENTION: ensure y mono increase before interpolation: but it can reduce size # pred_lanes[i] = make_lane_y_mono_inc(np.array(pred_lanes[i])) # pred_lane = prune_3d_lane_by_range(np.array(pred_lanes[i]), self.x_min, self.x_max) min_y = np.min(np.array(pred_lanes[i])[:, 1]) max_y = np.max(np.array(pred_lanes[i])[:, 1]) x_values, z_values, visibility_vec = resample_laneline_in_y(np.array(pred_lanes[i]), self.y_samples, out_vis=True) pred_lanes[i] = np.vstack([x_values, z_values]).T pred_visibility_mat[i, :] = np.logical_and(x_values >= self.x_min, np.logical_and(x_values <= self.x_max, np.logical_and(self.y_samples >= min_y, self.y_samples <= max_y))) pred_visibility_mat[i, :] = np.logical_and(pred_visibility_mat[i, :], visibility_vec) # pred_visibility_mat[i, :] = np.logical_and(x_values >= self.x_min, x_values <= self.x_max) adj_mat = np.zeros((cnt_gt, cnt_pred), dtype=np.int) cost_mat = np.zeros((cnt_gt, cnt_pred), dtype=np.int) cost_mat.fill(1000) num_match_mat = np.zeros((cnt_gt, cnt_pred), dtype=np.float) x_dist_mat_close = np.zeros((cnt_gt, cnt_pred), dtype=np.float) x_dist_mat_close.fill(1000.) x_dist_mat_far = np.zeros((cnt_gt, cnt_pred), dtype=np.float) x_dist_mat_far.fill(1000.) z_dist_mat_close = np.zeros((cnt_gt, cnt_pred), dtype=np.float) z_dist_mat_close.fill(1000.) z_dist_mat_far = np.zeros((cnt_gt, cnt_pred), dtype=np.float) z_dist_mat_far.fill(1000.) # compute curve to curve distance for i in range(cnt_gt): for j in range(cnt_pred): x_dist = np.abs(gt_lanes[i][:, 0] - pred_lanes[j][:, 0]) z_dist = np.abs(gt_lanes[i][:, 1] - pred_lanes[j][:, 1]) euclidean_dist = np.sqrt(x_dist ** 2 + z_dist ** 2) # apply visibility to penalize different partial matching accordingly euclidean_dist[ np.logical_or(gt_visibility_mat[i, :] < 0.5, pred_visibility_mat[j, :] < 0.5)] = self.dist_th # if np.average(euclidean_dist) < 2*self.dist_th: # don't prune here to encourage finding perfect match num_match_mat[i, j] = np.sum(euclidean_dist < self.dist_th) adj_mat[i, j] = 1 # ATTENTION: use the sum as int type to meet the requirements of min cost flow optimization (int type) # using num_match_mat as cost does not work? 
cost_mat[i, j] = np.sum(euclidean_dist).astype(np.int) # cost_mat[i, j] = num_match_mat[i, j] # use the both visible portion to calculate distance error both_visible_indices = np.logical_and(gt_visibility_mat[i, :] > 0.5, pred_visibility_mat[j, :] > 0.5) if np.sum(both_visible_indices[:close_range_idx]) > 0: x_dist_mat_close[i, j] = np.sum( x_dist[:close_range_idx] * both_visible_indices[:close_range_idx]) / np.sum( both_visible_indices[:close_range_idx]) z_dist_mat_close[i, j] = np.sum( z_dist[:close_range_idx] * both_visible_indices[:close_range_idx]) / np.sum( both_visible_indices[:close_range_idx]) else: x_dist_mat_close[i, j] = self.dist_th z_dist_mat_close[i, j] = self.dist_th if np.sum(both_visible_indices[close_range_idx:]) > 0: x_dist_mat_far[i, j] = np.sum( x_dist[close_range_idx:] * both_visible_indices[close_range_idx:]) / np.sum( both_visible_indices[close_range_idx:]) z_dist_mat_far[i, j] = np.sum( z_dist[close_range_idx:] * both_visible_indices[close_range_idx:]) / np.sum( both_visible_indices[close_range_idx:]) else: x_dist_mat_far[i, j] = self.dist_th z_dist_mat_far[i, j] = self.dist_th # solve bipartite matching vis min cost flow solver match_results = SolveMinCostFlow(adj_mat, cost_mat) match_results = np.array(match_results) # only a match with avg cost < self.dist_th is consider valid one match_gt_ids = [] match_pred_ids = [] if match_results.shape[0] > 0: for i in range(len(match_results)): if match_results[i, 2] < self.dist_th * self.y_samples.shape[0]: gt_i = match_results[i, 0] pred_i = match_results[i, 1] # consider match when the matched points is above a ratio if num_match_mat[gt_i, pred_i] / np.sum(gt_visibility_mat[gt_i, :]) >= self.ratio_th: r_lane += 1 match_gt_ids.append(gt_i) if num_match_mat[gt_i, pred_i] / np.sum(pred_visibility_mat[pred_i, :]) >= self.ratio_th: p_lane += 1 match_pred_ids.append(pred_i) x_error_close.append(x_dist_mat_close[gt_i, pred_i]) x_error_far.append(x_dist_mat_far[gt_i, pred_i]) z_error_close.append(z_dist_mat_close[gt_i, pred_i]) z_error_far.append(z_dist_mat_far[gt_i, pred_i]) # visualize lanelines and matching results both in image and 3D if vis: P_g2im = projection_g2im(gt_cam_pitch, gt_cam_height, self.K) P_gt = np.matmul(self.H_crop, P_g2im) img = cv2.imread(ops.join(self.dataset_dir, raw_file)) img = cv2.warpPerspective(img, self.H_crop, (self.resize_w, self.resize_h)) img = img.astype(np.float) / 255 for i in range(cnt_gt): x_values = gt_lanes[i][:, 0] z_values = gt_lanes[i][:, 1] x_2d, y_2d = projective_transformation(P_gt, x_values, self.y_samples, z_values) x_2d = x_2d.astype(np.int) y_2d = y_2d.astype(np.int) if i in match_gt_ids: color = [0, 0, 1] else: color = [0, 1, 1] for k in range(1, x_2d.shape[0]): # only draw the visible portion if gt_visibility_mat[i, k - 1] and gt_visibility_mat[i, k]: img = cv2.line(img, (x_2d[k - 1], y_2d[k - 1]), (x_2d[k], y_2d[k]), color[-1::-1], 3) ax2.plot(x_values[np.where(gt_visibility_mat[i, :])], self.y_samples[np.where(gt_visibility_mat[i, :])], z_values[np.where(gt_visibility_mat[i, :])], color=color, linewidth=5) for i in range(cnt_pred): x_values = pred_lanes[i][:, 0] z_values = pred_lanes[i][:, 1] x_2d, y_2d = projective_transformation(P_gt, x_values, self.y_samples, z_values) x_2d = x_2d.astype(np.int) y_2d = y_2d.astype(np.int) if i in match_pred_ids: color = [1, 0, 0] else: color = [1, 0, 1] for k in range(1, x_2d.shape[0]): # only draw the visible portion if pred_visibility_mat[i, k - 1] and pred_visibility_mat[i, k]: img = cv2.line(img, (x_2d[k - 1], y_2d[k - 1]), 
(x_2d[k], y_2d[k]), color[-1::-1], 2) ax2.plot(x_values[np.where(pred_visibility_mat[i, :])], self.y_samples[np.where(pred_visibility_mat[i, :])], z_values[np.where(pred_visibility_mat[i, :])], color=color, linewidth=5) cv2.putText(img, 'Recall: {:.3f}'.format(r_lane / (cnt_gt + 1e-6)), (5, 30), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.7, color=(0, 0, 1), thickness=2) cv2.putText(img, 'Precision: {:.3f}'.format(p_lane / (cnt_pred + 1e-6)), (5, 60), fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.7, color=(0, 0, 1), thickness=2) ax1.imshow(img[:, :, [2, 1, 0]]) return r_lane, p_lane, cnt_gt, cnt_pred, x_error_close, x_error_far, z_error_close, z_error_far # compare predicted set and ground-truth set using a fixed lane probability threshold def bench_one_submit(self, pred_file, gt_file, prob_th=0.5, vis=False): if vis: save_path = pred_file[:pred_file.rfind('/')] save_path += '/vis' if vis and not os.path.exists(save_path): try: os.makedirs(save_path) except OSError as e: print(e.message) pred_lines = open(pred_file).readlines() json_pred = [json.loads(line) for line in pred_lines] json_gt = [json.loads(line) for line in open(gt_file).readlines()] if len(json_gt) != len(json_pred): raise Exception('We do not get the predictions of all the test tasks') gts = {l['raw_file']: l for l in json_gt} laneline_stats = [] laneline_x_error_close = [] laneline_x_error_far = [] laneline_z_error_close = [] laneline_z_error_far = [] centerline_stats = [] centerline_x_error_close = [] centerline_x_error_far = [] centerline_z_error_close = [] centerline_z_error_far = [] print('prob th: %s' % prob_th) for i, pred in enumerate(json_pred): if 'raw_file' not in pred or 'laneLines' not in pred: raise Exception('raw_file or lanelines not in some predictions.') raw_file = pred['raw_file'] pred_lanelines = pred['laneLines'] pred_laneLines_prob = pred['laneLines_prob'] pred_laneLines_prob_one_cls = [] for prob_2cls in pred_laneLines_prob: max_cls = np.argmax(prob_2cls) if max_cls > 0: pred_laneLines_prob_one_cls.append(prob_2cls[max_cls]) else: pred_laneLines_prob_one_cls.append(prob_2cls[0]) pred_lanelines = [pred_lanelines[ii] for ii in range(len(pred_laneLines_prob_one_cls)) if pred_laneLines_prob_one_cls[ii] > prob_th] if raw_file not in gts: raise Exception('Some raw_file from your predictions do not exist in the test tasks.') gt = gts[raw_file] gt_cam_height = gt['cam_height'] gt_cam_pitch = gt['cam_pitch'] if vis: fig = plt.figure() ax1 = fig.add_subplot(221) ax2 = fig.add_subplot(222, projection='3d') ax3 = fig.add_subplot(223) ax4 = fig.add_subplot(224, projection='3d') else: ax1 = 0 ax2 = 0 ax3 = 0 ax4 = 0 # evaluate lanelines gt_lanelines = gt['laneLines'] gt_visibility = gt['laneLines_visibility'] # N to N matching of lanelines r_lane, p_lane, cnt_gt, cnt_pred, \ x_error_close, x_error_far, \ z_error_close, z_error_far = self.bench(pred_lanelines, gt_lanelines, gt_visibility, raw_file, gt_cam_height, gt_cam_pitch, vis, ax1, ax2) c_lane = 0.0 laneline_stats.append(np.array([r_lane, p_lane, cnt_gt, cnt_pred])) laneline_x_error_close.extend(x_error_close) laneline_x_error_far.extend(x_error_far) laneline_z_error_close.extend(z_error_close) laneline_z_error_far.extend(z_error_far) # evaluate centerlines if not self.no_centerline: pred_centerlines = pred['centerLines'] pred_centerlines_prob = pred['centerLines_prob'] pred_centerlines = [pred_centerlines[ii] for ii in range(len(pred_centerlines_prob)) if pred_centerlines_prob[ii] > prob_th] gt_centerlines = gt['centerLines'] gt_visibility = 
gt['centerLines_visibility'] # N to N matching of lanelines r_lane, p_lane, cnt_gt, cnt_pred, \ x_error_close, x_error_far, \ z_error_close, z_error_far = self.bench(pred_centerlines, gt_centerlines, gt_visibility, raw_file, gt_cam_height, gt_cam_pitch, vis, ax3, ax4) centerline_stats.append(np.array([r_lane, p_lane, cnt_gt, cnt_pred])) centerline_x_error_close.extend(x_error_close) centerline_x_error_far.extend(x_error_far) centerline_z_error_close.extend(z_error_close) centerline_z_error_far.extend(z_error_far) if vis: ax1.set_xticks([]) ax1.set_yticks([]) # ax2.set_xlabel('x axis') # ax2.set_ylabel('y axis') # ax2.set_zlabel('z axis') bottom, top = ax2.get_zlim() left, right = ax2.get_xlim() ax2.set_zlim(min(bottom, -0.1), max(top, 0.1)) ax2.set_xlim(left, right) ax2.set_ylim(vis_min_y, vis_max_y) ax2.locator_params(nbins=5, axis='x') ax2.locator_params(nbins=5, axis='z') ax2.tick_params(pad=18) ax3.set_xticks([]) ax3.set_yticks([]) # ax4.set_xlabel('x axis') # ax4.set_ylabel('y axis') # ax4.set_zlabel('z axis') bottom, top = ax4.get_zlim() left, right = ax4.get_xlim() ax4.set_zlim(min(bottom, -0.1), max(top, 0.1)) ax4.set_xlim(left, right) ax4.set_ylim(vis_min_y, vis_max_y) ax4.locator_params(nbins=5, axis='x') ax4.locator_params(nbins=5, axis='z') ax4.tick_params(pad=18) fig.subplots_adjust(wspace=0, hspace=0.01) fig.savefig(ops.join(save_path, raw_file.replace("/", "_"))) plt.close(fig) print('processed sample: {} {}'.format(i, raw_file)) output_stats = [] laneline_stats = np.array(laneline_stats) laneline_x_error_close = np.array(laneline_x_error_close) laneline_x_error_far = np.array(laneline_x_error_far) laneline_z_error_close = np.array(laneline_z_error_close) laneline_z_error_far = np.array(laneline_z_error_far) R_lane = np.sum(laneline_stats[:, 0]) / (np.sum(laneline_stats[:, 2]) + 1e-6) P_lane = np.sum(laneline_stats[:, 1]) / (np.sum(laneline_stats[:, 3]) + 1e-6) F_lane = 2 * R_lane * P_lane / (R_lane + P_lane + 1e-6) x_error_close_avg = np.average(laneline_x_error_close) x_error_far_avg = np.average(laneline_x_error_far) z_error_close_avg = np.average(laneline_z_error_close) z_error_far_avg = np.average(laneline_z_error_far) output_stats.append(F_lane) output_stats.append(R_lane) output_stats.append(P_lane) # output_stats.append(0.0) output_stats.append(x_error_close_avg) output_stats.append(x_error_far_avg) output_stats.append(z_error_close_avg) output_stats.append(z_error_far_avg) # for gpus cal res. 
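
# Aside: the raw sums appended below let DDP workers combine per-rank counts
# before recomputing global metrics; a minimal sketch (hypothetical helper,
# assuming an initialized torch.distributed process group):
import torch
import torch.distributed as dist

def reduce_lane_counts(r_lane, p_lane, cnt_gt, cnt_pred):
    stats = torch.tensor([r_lane, p_lane, cnt_gt, cnt_pred], dtype=torch.float64)
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(stats, op=dist.ReduceOp.SUM)   # sum counts over ranks
    r, p, gt, pred = stats.tolist()
    recall, precision = r / (gt + 1e-6), p / (pred + 1e-6)
    return 2 * recall * precision / (recall + precision + 1e-6)   # global F1
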
output_stats.append(np.sum(laneline_stats[:, 0])) # 7 output_stats.append(np.sum(laneline_stats[:, 1])) # 8 output_stats.append(np.sum(laneline_stats[:, 2])) # 9 output_stats.append(np.sum(laneline_stats[:, 3])) # 10 if not self.no_centerline: centerline_stats = np.array(centerline_stats) centerline_x_error_close = np.array(centerline_x_error_close) centerline_x_error_far = np.array(centerline_x_error_far) centerline_z_error_close = np.array(centerline_z_error_close) centerline_z_error_far = np.array(centerline_z_error_far) R_lane = np.sum(centerline_stats[:, 0]) / (np.sum(centerline_stats[:, 2]) + 1e-6) P_lane = np.sum(centerline_stats[:, 1]) / (np.sum(centerline_stats[:, 3]) + 1e-6) F_lane = 2 * R_lane * P_lane / (R_lane + P_lane + 1e-6) x_error_close_avg = np.average(centerline_x_error_close) x_error_far_avg = np.average(centerline_x_error_far) z_error_close_avg = np.average(centerline_z_error_close) z_error_far_avg = np.average(centerline_z_error_far) output_stats.append(F_lane) output_stats.append(R_lane) output_stats.append(P_lane) output_stats.append(x_error_close_avg) output_stats.append(x_error_far_avg) output_stats.append(z_error_close_avg) output_stats.append(z_error_far_avg) return output_stats def bench_one_submit_ddp(self, pred_lines_sub, gt_lines_sub, model_name, prob_th=0.5, vis=False): gts = {l['file_path']: l for l in gt_lines_sub} laneline_stats = [] laneline_x_error_close = [] laneline_x_error_far = [] laneline_z_error_close = [] laneline_z_error_far = [] gt_num_all, pred_num_all = 0, 0 for i, pred in enumerate(pred_lines_sub): if 'file_path' not in pred or 'pred_laneLines' not in pred: raise Exception('file_path or lane_lines not in some predictions.') raw_file = pred['file_path'] pred_lanes = pred['pred_laneLines'] pred_lanes_prob = pred['pred_laneLines_prob'] pred_lanes = [pred_lanes[ii] for ii in range(len(pred_lanes_prob)) if max(pred_lanes_prob[ii]) > prob_th] pred_lanes_prob = [prob for k, prob in enumerate(pred_lanes_prob) if max(prob) > prob_th] if len(pred_lanes_prob) > 0: pred_category = np.argmax(pred_lanes_prob, 1) else: pred_category = [] if raw_file not in gts: raise Exception('Some raw_file from your predictions do not exist in the test tasks.') gt = gts[raw_file] gt_cam_height = gt['cam_height'] gt_cam_pitch = gt['cam_pitch'] cam_intrinsics = gt['intrinsic'] gt_lanes_packed = gt['laneLines'] gt_lane_visibility = gt['laneLines_visibility'] gt_lanes, gt_visibility, gt_category = [], [], [] for j, gt_lane_packed in enumerate(gt_lanes_packed): lane = np.array(gt_lane_packed) lane_visibility = np.array(gt_lane_visibility[j]) gt_lanes.append(lane) gt_visibility.append(lane_visibility) gt_category.append(1) P_g2im = projection_g2im(gt_cam_pitch, gt_cam_height, cam_intrinsics) if vis: fig = plt.figure() ax1 = fig.add_subplot(221) ax2 = fig.add_subplot(222, projection='3d') ax3 = fig.add_subplot(223) ax4 = fig.add_subplot(224, projection='3d') else: ax1 = 0 ax2 = 0 ax3 = 0 ax4 = 0 # N to N matching of lanelines gt_num_all += len(gt_lanes) pred_num_all += len(pred_lanes) r_lane, p_lane, cnt_gt, cnt_pred, \ x_error_close, x_error_far, \ z_error_close, z_error_far = self.bench(pred_lanes, gt_lanes, gt_visibility, raw_file, gt_cam_height, gt_cam_pitch, vis, ax1, ax2) laneline_stats.append(np.array([r_lane, p_lane, cnt_gt, cnt_pred])) laneline_x_error_close.extend(x_error_close) laneline_x_error_far.extend(x_error_far) laneline_z_error_close.extend(z_error_close) laneline_z_error_far.extend(z_error_far) output_stats = [] laneline_stats = np.array(laneline_stats) 
laneline_x_error_close = np.array(laneline_x_error_close) laneline_x_error_far = np.array(laneline_x_error_far) laneline_z_error_close = np.array(laneline_z_error_close) laneline_z_error_far = np.array(laneline_z_error_far) R_lane = np.sum(laneline_stats[:, 0]) / (np.sum(laneline_stats[:, 2]) + 1e-6) # recall = TP / (TP+FN) P_lane = np.sum(laneline_stats[:, 1]) / (np.sum(laneline_stats[:, 3]) + 1e-6) # precision = TP / (TP+FP) F_lane = 2 * R_lane * P_lane / (R_lane + P_lane + 1e-6) x_error_close_avg = np.average(laneline_x_error_close) x_error_far_avg = np.average(laneline_x_error_far) z_error_close_avg = np.average(laneline_z_error_close) z_error_far_avg = np.average(laneline_z_error_far) output_stats.append(F_lane) output_stats.append(R_lane) output_stats.append(P_lane) output_stats.append(x_error_close_avg) output_stats.append(x_error_far_avg) output_stats.append(z_error_close_avg) output_stats.append(z_error_far_avg) # for gpus output_stats.append(np.sum(laneline_stats[:, 0])) output_stats.append(np.sum(laneline_stats[:, 1])) output_stats.append(np.sum(laneline_stats[:, 2])) output_stats.append(np.sum(laneline_stats[:, 3])) return output_stats def bench_PR(self, pred_lanes, gt_lanes, gt_visibility): """ Matching predicted lanes and ground-truth lanes in their IPM projection, ignoring z attributes. x error, y_error, and z error are all considered, although the matching does not rely on z The input of prediction and ground-truth lanes are in ground coordinate, x-right, y-forward, z-up The fundamental assumption is: 1. there are no two points from different lanes with identical x, y but different z's 2. there are no two points from a single lane having identical x, y but different z's If the interest area is within the current drivable road, the above assumptions are almost always valid. :param pred_lanes: N X 2 or N X 3 lists depending on 2D or 3D :param gt_lanes: N X 2 or N X 3 lists depending on 2D or 3D :return: """ r_lane, p_lane = 0., 0. 
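
# Aside: a simplified sketch (not the repository implementation in
# utils/utils.py) of what resample_laneline_in_y(..., out_vis=True) returns:
# x and z linearly interpolated at the fixed y grid, plus a flag marking
# samples that fall inside the lane's original y extent.
import numpy as np

def resample_laneline_in_y_sketch(lane_xyz, y_samples):
    lane_xyz = lane_xyz[np.argsort(lane_xyz[:, 1])]            # sort by y
    x = np.interp(y_samples, lane_xyz[:, 1], lane_xyz[:, 0])
    z = np.interp(y_samples, lane_xyz[:, 1], lane_xyz[:, 2])
    vis = (y_samples >= lane_xyz[0, 1]) & (y_samples <= lane_xyz[-1, 1])
    return x, z, vis
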
# only keep the visible portion gt_lanes = [prune_3d_lane_by_visibility(np.array(gt_lane), np.array(gt_visibility[k])) for k, gt_lane in enumerate(gt_lanes)] gt_lanes = [lane for lane in gt_lanes if lane.shape[0] > 1] # only consider those gt lanes overlapping with sampling range gt_lanes = [lane for lane in gt_lanes if lane[0, 1] < self.y_samples[-1] and lane[-1, 1] > self.y_samples[0]] gt_lanes = [prune_3d_lane_by_range(np.array(gt_lane), 3 * self.x_min, 3 * self.x_max) for gt_lane in gt_lanes] gt_lanes = [lane for lane in gt_lanes if lane.shape[0] > 1] cnt_gt = len(gt_lanes) cnt_pred = len(pred_lanes) gt_visibility_mat = np.zeros((cnt_gt, 100)) pred_visibility_mat = np.zeros((cnt_pred, 100)) # resample gt and pred at y_samples for i in range(cnt_gt): min_y = np.min(np.array(gt_lanes[i])[:, 1]) max_y = np.max(np.array(gt_lanes[i])[:, 1]) x_values, z_values, visibility_vec = resample_laneline_in_y(np.array(gt_lanes[i]), self.y_samples, out_vis=True) gt_lanes[i] = np.vstack([x_values, z_values]).T gt_visibility_mat[i, :] = np.logical_and(x_values >= self.x_min, np.logical_and(x_values <= self.x_max, np.logical_and(self.y_samples >= min_y, self.y_samples <= max_y))) gt_visibility_mat[i, :] = np.logical_and(gt_visibility_mat[i, :], visibility_vec) for i in range(cnt_pred): # # ATTENTION: ensure y mono increase before interpolation: but it can reduce size # pred_lanes[i] = make_lane_y_mono_inc(np.array(pred_lanes[i])) # pred_lane = prune_3d_lane_by_range(np.array(pred_lanes[i]), self.x_min, self.x_max) min_y = np.min(np.array(pred_lanes[i])[:, 1]) max_y = np.max(np.array(pred_lanes[i])[:, 1]) x_values, z_values, visibility_vec = resample_laneline_in_y(np.array(pred_lanes[i]), self.y_samples, out_vis=True) pred_lanes[i] = np.vstack([x_values, z_values]).T pred_visibility_mat[i, :] = np.logical_and(x_values >= self.x_min, np.logical_and(x_values <= self.x_max, np.logical_and(self.y_samples >= min_y, self.y_samples <= max_y))) pred_visibility_mat[i, :] = np.logical_and(pred_visibility_mat[i, :], visibility_vec) # pred_visibility_mat[i, :] = np.logical_and(x_values >= self.x_min, x_values <= self.x_max) adj_mat = np.zeros((cnt_gt, cnt_pred), dtype=np.int) cost_mat = np.zeros((cnt_gt, cnt_pred), dtype=np.int) cost_mat.fill(1000) num_match_mat = np.zeros((cnt_gt, cnt_pred), dtype=np.float) # compute curve to curve distance for i in range(cnt_gt): for j in range(cnt_pred): x_dist = np.abs(gt_lanes[i][:, 0] - pred_lanes[j][:, 0]) z_dist = np.abs(gt_lanes[i][:, 1] - pred_lanes[j][:, 1]) euclidean_dist = np.sqrt(x_dist ** 2 + z_dist ** 2) # apply visibility to penalize different partial matching accordingly euclidean_dist[ np.logical_or(gt_visibility_mat[i, :] < 0.5, pred_visibility_mat[j, :] < 0.5)] = self.dist_th # if np.average(euclidean_dist) < 2*self.dist_th: # don't prune here to encourage finding perfect match num_match_mat[i, j] = np.sum(euclidean_dist < self.dist_th) adj_mat[i, j] = 1 # ATTENTION: use the sum as int type to meet the requirements of min cost flow optimization (int type) # why using num_match_mat as cost does not work? 
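
# Aside: the solver minimizes total arc cost, while num_match_mat counts
# agreeing points (higher is better), so using it directly would select the
# worst pairs; it would first have to be flipped into a cost (illustrative):
import numpy as np

toy_num_match = np.array([[80, 10], [12, 75]])   # toy similarity counts
toy_cost = 100 - toy_num_match                   # 100 = number of y samples
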
cost_mat[i, j] = np.sum(euclidean_dist).astype(np.int) # cost_mat[i, j] = num_match_mat[i, j] # solve bipartite matching vis min cost flow solver match_results = SolveMinCostFlow(adj_mat, cost_mat) match_results = np.array(match_results) # only a match with avg cost < self.dist_th is consider valid one match_gt_ids = [] match_pred_ids = [] if match_results.shape[0] > 0: for i in range(len(match_results)): if match_results[i, 2] < self.dist_th * self.y_samples.shape[0]: gt_i = match_results[i, 0] pred_i = match_results[i, 1] # consider match when the matched points is above a ratio if num_match_mat[gt_i, pred_i] / np.sum(gt_visibility_mat[gt_i, :]) >= self.ratio_th: r_lane += 1 match_gt_ids.append(gt_i) if num_match_mat[gt_i, pred_i] / np.sum(pred_visibility_mat[pred_i, :]) >= self.ratio_th: p_lane += 1 match_pred_ids.append(pred_i) return r_lane, p_lane, cnt_gt, cnt_pred # evaluate two dataset at varying lane probability threshold to calculate AP def bench_one_submit_varying_probs(self, pred_file, gt_file, eval_out_file=None, eval_fig_file=None): varying_th = np.linspace(0.05, 0.95, 19) # try: pred_lines = open(pred_file).readlines() json_pred = [json.loads(line) for line in pred_lines] # except BaseException as e: # raise Exception('Fail to load json file of the prediction.') json_gt = [json.loads(line) for line in open(gt_file).readlines()] if len(json_gt) != len(json_pred): raise Exception('We do not get the predictions of all the test tasks') gts = {l['raw_file']: l for l in json_gt} laneline_r_all = [] laneline_p_all = [] laneline_gt_cnt_all = [] laneline_pred_cnt_all = [] centerline_r_all = [] centerline_p_all = [] centerline_gt_cnt_all = [] centerline_pred_cnt_all = [] for i, pred in enumerate(json_pred): self.logger.info('Evaluating sample {} / {}'.format(i, len(json_pred))) if 'raw_file' not in pred or 'laneLines' not in pred: raise Exception('raw_file or lanelines not in some predictions.') raw_file = pred['raw_file'] pred_lanelines = pred['laneLines'] pred_laneLines_prob = pred['laneLines_prob'] if raw_file not in gts: raise Exception('Some raw_file from your predictions do not exist in the test tasks.') gt = gts[raw_file] gt_cam_height = gt['cam_height'] gt_cam_pitch = gt['cam_pitch'] # evaluate lanelines gt_lanelines = gt['laneLines'] gt_visibility = gt['laneLines_visibility'] r_lane_vec = [] p_lane_vec = [] cnt_gt_vec = [] cnt_pred_vec = [] for prob_th in varying_th: pred_lanelines_tmp = [pred_lanelines[ii] for ii in range(len(pred_laneLines_prob)) if pred_laneLines_prob[ii][1] > prob_th] pred_laneLines_prob_tmp = [prob[1] for prob in pred_laneLines_prob if prob[1] > prob_th] pred_lanelines_copy = copy.deepcopy(pred_lanelines_tmp) # N to N matching of lanelines r_lane, p_lane, cnt_gt, cnt_pred = self.bench_PR(pred_lanelines_copy, gt_lanelines, gt_visibility) r_lane_vec.append(r_lane) p_lane_vec.append(p_lane) cnt_gt_vec.append(cnt_gt) cnt_pred_vec.append(cnt_pred) laneline_r_all.append(r_lane_vec) laneline_p_all.append(p_lane_vec) laneline_gt_cnt_all.append(cnt_gt_vec) laneline_pred_cnt_all.append(cnt_pred_vec) # evaluate centerlines if not self.no_centerline: pred_centerlines = pred['centerLines'] pred_centerLines_prob = pred['centerLines_prob'] gt_centerlines = gt['centerLines'] gt_visibility = gt['centerLines_visibility'] r_lane_vec = [] p_lane_vec = [] cnt_gt_vec = [] cnt_pred_vec = [] for prob_th in varying_th: pred_centerlines = [pred_centerlines[ii] for ii in range(len(pred_centerLines_prob)) if pred_centerLines_prob[ii] > prob_th] pred_centerLines_prob = [prob for 
prob in pred_centerLines_prob if prob > prob_th] pred_centerlines_copy = copy.deepcopy(pred_centerlines) # N to N matching of lanelines r_lane, p_lane, cnt_gt, cnt_pred = self.bench_PR(pred_centerlines_copy, gt_centerlines, gt_visibility) r_lane_vec.append(r_lane) p_lane_vec.append(p_lane) cnt_gt_vec.append(cnt_gt) cnt_pred_vec.append(cnt_pred) centerline_r_all.append(r_lane_vec) centerline_p_all.append(p_lane_vec) centerline_gt_cnt_all.append(cnt_gt_vec) centerline_pred_cnt_all.append(cnt_pred_vec) output_stats = [] # compute precision, recall laneline_r_all = np.array(laneline_r_all) laneline_p_all = np.array(laneline_p_all) laneline_gt_cnt_all = np.array(laneline_gt_cnt_all) laneline_pred_cnt_all = np.array(laneline_pred_cnt_all) R_lane = np.sum(laneline_r_all, axis=0) / (np.sum(laneline_gt_cnt_all, axis=0) + 1e-6) P_lane = np.sum(laneline_p_all, axis=0) / (np.sum(laneline_pred_cnt_all, axis=0) + 1e-6) F_lane = 2 * R_lane * P_lane / (R_lane + P_lane + 1e-6) output_stats.append(F_lane) output_stats.append(R_lane) output_stats.append(P_lane) # if self.args.dist: # output_stats.append(laneline_r_all) # output_stats.append(laneline_p_all) # output_stats.append(laneline_gt_cnt_all) # output_stats.append(laneline_pred_cnt_all) if not self.no_centerline: centerline_r_all = np.array(centerline_r_all) centerline_p_all = np.array(centerline_p_all) centerline_gt_cnt_all = np.array(centerline_gt_cnt_all) centerline_pred_cnt_all = np.array(centerline_pred_cnt_all) R_lane = np.sum(centerline_r_all, axis=0) / (np.sum(centerline_gt_cnt_all, axis=0) + 1e-6) P_lane = np.sum(centerline_p_all, axis=0) / (np.sum(centerline_pred_cnt_all, axis=0) + 1e-6) F_lane = 2 * R_lane * P_lane / (R_lane + P_lane + 1e-6) output_stats.append(F_lane) output_stats.append(R_lane) output_stats.append(P_lane) # calculate metrics laneline_F = output_stats[0] laneline_F_max = np.max(laneline_F) laneline_max_i = np.argmax(laneline_F) laneline_R = output_stats[1] laneline_P = output_stats[2] if not self.no_centerline: centerline_F = output_stats[3] centerline_F_max = centerline_F[laneline_max_i] centerline_max_i = laneline_max_i centerline_R = output_stats[4] centerline_P = output_stats[5] laneline_R = np.array([1.] + laneline_R.tolist() + [0.]) laneline_P = np.array([0.] + laneline_P.tolist() + [1.]) f_laneline = interp1d(laneline_R, laneline_P) r_range = np.linspace(0.05, 0.95, 19) laneline_AP = np.mean(f_laneline(r_range)) if not self.no_centerline: centerline_R = np.array([1.] + centerline_R.tolist() + [0.]) centerline_P = np.array([0.] 
================================================
FILE: utils/eval_3D_once.py
================================================
import argparse
import os
import time
import tempfile
import json
from multiprocessing import Process

import cv2
import numpy as np
import torch
from munkres import Munkres
from shapely.geometry import LineString
from prettytable import PrettyTable


class Bev_Projector:
    def __init__(self, side_range, fwd_range, height_range, res, lane_width_x, lane_width_y):
        self.side_range = side_range
        self.fwd_range = fwd_range
        self.height_range = height_range
        self.res = res
        self.lane_width_x = lane_width_x
        self.lane_width_y = lane_width_y
        self.zx_xmax = int((self.side_range[1] - self.side_range[0]) / self.res)
        self.zx_ymax = int((self.fwd_range[1] - self.fwd_range[0]) / self.res)
        self.zy_xmax = self.zx_ymax
        self.zy_ymax = int((self.height_range[1] - self.height_range[0]) / self.res)

    def proj_oneline_zx(self, one_lane):
        """Rasterize one lane into a BEV (z-x plane) binary image.

        :param one_lane: N*3, [[x, y, z], ...]
        :return: uint8 image with the lane drawn as 255-valued line segments
        """
        img = np.zeros([self.zx_ymax, self.zx_xmax], dtype=np.uint8)
        one_lane = np.array(one_lane)
        one_lane = one_lane[one_lane[:, 2] < 10]
        lane_x = one_lane[:, 0]
        lane_z = one_lane[:, 2]
        x_img = (lane_x / self.res).astype(np.int32)
        y_img = (-lane_z / self.res).astype(np.int32)
        x_img += int(self.side_range[1] / self.res)
        y_img += int(self.fwd_range[1] / self.res)
        for i in range(y_img.shape[0] - 1):
            cv2.line(img, (x_img[i], y_img[i]), (x_img[i + 1], y_img[i + 1]), 255, self.lane_width_x)
        return img
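# NOTE (editor): a minimal usage sketch, not part of the original module. The
# ranges and resolution below are illustrative placeholders; the real values
# come from config/_base_/once_eval_config.json.
def _demo_bev_projector():
    proj = Bev_Projector(side_range=(-20, 20), fwd_range=(0, 50),
                         height_range=(-3, 3), res=0.1,
                         lane_width_x=3, lane_width_y=3)
    lane = [[0.0, 0.0, 2.0], [0.5, 0.0, 5.0], [1.0, 0.0, 8.0]]  # [x, y, z] points
    mask = proj.proj_oneline_zx(lane)
    print(mask.shape, mask.max())  # rasterized BEV mask used for IoU matching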
class LaneEval:
    @staticmethod
    def file_parser(gt_root_path, pred_root_path):
        gt_files_list = list()
        pred_files_list = list()
        for segment in os.listdir(gt_root_path):
            gt_segment_path = os.path.join(gt_root_path, segment, 'cam01')
            pred_segment_path = os.path.join(pred_root_path, segment)
            # a missing segment is a hard error
            assert os.path.exists(pred_segment_path), '%s missed from pred' % segment
            pred_segment_path = os.path.join(pred_segment_path, 'cam01')
            # list the gt directory once so gt/pred files stay aligned by name
            filenames = [f for f in os.listdir(gt_segment_path) if f.endswith('.json')]
            gt_files_list.extend([os.path.join(gt_segment_path, filename) for filename in filenames])
            pred_files_list.extend([os.path.join(pred_segment_path, filename) for filename in filenames])
        return gt_files_list, pred_files_list

    @staticmethod
    def summarize(res):
        gt_all = 0.
        pred_all = 0.
        tp_all = 0.
        distance_mean = 0.
        for res_spec in res:
            gt_all += res_spec[0]
            pred_all += res_spec[1]
            tp_all += res_spec[2]
            distance_mean += res_spec[3]
        precision = tp_all / pred_all
        recall = tp_all / gt_all
        if precision + recall == 0:
            F_value = 0.
        else:
            F_value = 2 * precision * recall / (precision + recall)
        distance_mean /= tp_all + 1e-5
        return dict(
            F_value=F_value,
            precision=precision,
            recall=recall,
            distance_error=distance_mean,
        )

    def lane_evaluation(self, gt_root_path, pred_root_path, config_path, args=None):
        gt_files_list, pred_files_list = self.file_parser(gt_root_path, pred_root_path)
        with open(config_path, 'r') as file:
            file_lines = [line for line in file]
        if len(file_lines) != 0:
            config = json.loads(file_lines[0])
        process_num = config['process_num']
        score_l = int(config["score_l"] * 100)
        score_h = int(config["score_h"] * 100)
        score_step = int(config["score_step"] * 100)
        score_num = int((score_h - score_l) / score_step)
        config['score_num'] = score_num

        tempfile.tempdir = './tmp'
        if not os.path.exists('./tmp'):
            os.mkdir('./tmp')
        tmp_dir = tempfile.mkdtemp()
        if not os.path.exists(tmp_dir):
            os.mkdir(tmp_dir)

        # round-robin split of the files across worker processes
        gt_in_process = [[] for n in range(process_num)]
        pr_in_process = [[] for n in range(process_num)]
        n_file = 0
        for gt_file, pred_file in zip(gt_files_list, pred_files_list):
            gt_in_process[n_file % process_num].append(gt_file)
            pr_in_process[n_file % process_num].append(pred_file)
            n_file += 1

        process_list = list()
        for n in range(process_num):
            tmp_file = tmp_dir + str(n) + ".json"
            config["tmp_file"] = tmp_file
            p = Process(target=evaluate_list, args=(gt_in_process[n], pr_in_process[n], config))
            process_list.append(p)
            p.start()
        for p in process_list:
            p.join()
        torch.distributed.barrier()

        # accumulate per-process partial sums
        gt_all = np.zeros((score_num,), dtype=np.float32)
        pr_all = np.zeros((score_num,), dtype=np.float32)
        tp_all = np.zeros((score_num,), dtype=np.float32)
        distance_error = np.zeros((score_num,), dtype=np.float32)
        for n in range(process_num):
            tmp_file = tmp_dir + str(n) + ".json"
            json_data = json.load(open(tmp_file))
            gt_all += json_data['gt_all']
            pr_all += json_data['pr_all']
            tp_all += json_data['tp_all']
            distance_error += json_data['distance_error']

        precision = tp_all / pr_all
        recall = tp_all / gt_all
        F_value = 2 * precision * recall / (precision + recall)
        distance_error /= tp_all + 1e-5

        pt = PrettyTable()
        pt.title = f'evaluate by {__file__}'
        pt.field_names = ['prob_thresh', 'F1', 'precision', 'recall', 'D error']
        for i in range(score_l, score_h, score_step):
            index = int((i - score_l) / score_step)
            pt.add_row([str(i / 100), F_value[index], precision[index], recall[index], distance_error[index]])
        if args.proc_id == 0:
            print(pt)
        result_dir = os.path.join(os.path.dirname(__file__), 'eval_results')
        os.makedirs(result_dir, exist_ok=True)
        result_file_name = config['exp_name']  # kept for the (disabled) csv export
        print(f'''legacy evaluate end at {time.strftime('%Y-%m-%d @ %H:%M:%S')}''')
        return F_value
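# NOTE (editor): an illustrative sketch, not part of the original module, of the
# eval-config shape consumed above; see config/_base_/once_eval_config.json for
# the real values. The keys are the ones read in lane_evaluation / evaluate_list,
# the numbers here are placeholders.
_EXAMPLE_ONCE_EVAL_CONFIG = dict(
    process_num=8,                                 # worker processes
    score_l=0.0, score_h=1.0, score_step=0.05,     # score-threshold sweep
    side_range_l=-20, side_range_h=20,             # BEV extents (meters)
    fwd_range_l=0, fwd_range_h=50,
    height_range_l=-3, height_range_h=3,
    res=0.1,                                       # BEV resolution (meters / pixel)
    lane_width_x=3, lane_width_y=3,                # rasterized lane thickness (pixels)
    iou_thresh=0.3, distance_thresh=0.3,
    exp_name='once_eval',
)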
def evaluate_list(gt_path_list, pred_path_list, config):
    bev_projector = Bev_Projector(side_range=(config['side_range_l'], config['side_range_h']),
                                  fwd_range=(config['fwd_range_l'], config['fwd_range_h']),
                                  height_range=(config['height_range_l'], config['height_range_h']),
                                  res=config['res'],
                                  lane_width_x=config['lane_width_x'],
                                  lane_width_y=config['lane_width_y'])
    score_num = config["score_num"]
    tmp_file = config["tmp_file"]
    iou_thresh = config['iou_thresh']
    distance_thresh = config['distance_thresh']
    score_l = int(config["score_l"] * 100)
    score_h = int(config["score_h"] * 100)
    score_step = int(config["score_step"] * 100)

    gt_all = np.zeros((score_num,), dtype=np.float32)
    pr_all = np.zeros((score_num,), dtype=np.float32)
    tp_all = np.zeros((score_num,), dtype=np.float32)
    distance_error = np.zeros((score_num,), dtype=np.float32)
    for gt_path, pred_path in zip(gt_path_list, pred_path_list):
        leof = LaneEvalOneFile(gt_path, pred_path, bev_projector, iou_thresh, distance_thresh,
                               score_l, score_h, score_step)
        gt_num, pr_num, tp_num, distance_tmp = leof.eval()
        gt_all += gt_num
        pr_all += pr_num
        tp_all += tp_num
        distance_error += distance_tmp
    json_out_data = {"gt_all": gt_all.tolist(),
                     "pr_all": pr_all.tolist(),
                     "tp_all": tp_all.tolist(),
                     "distance_error": distance_error.tolist()}
    with open(tmp_file, 'w') as fid_tmp_out:
        json.dump(json_out_data, fid_tmp_out, indent=4)


class LaneEvalOneFile:
    def __init__(self, gt_path, pred_path, bev_projector, iou_thresh, distance_thresh,
                 score_l, score_h, score_step):
        self.gt_path = gt_path
        self.pred_path = pred_path
        self.bev_projector = bev_projector
        self.iou_thresh = iou_thresh
        self.distance_thresh = distance_thresh
        self.score_l = score_l
        self.score_h = score_h
        self.score_step = score_step

    def preprocess(self, store_spec):
        gt_json = json.load(open(self.gt_path))
        pred_json = json.load(open(self.pred_path))
        gt_lanes3d = gt_json['lanes']
        gt_lanes3d = [gt_lanespec3d for gt_lanespec3d in gt_lanes3d if len(gt_lanespec3d) >= 2]
        gt_num = len(gt_lanes3d)
        pred_lanes3d = pred_json['lanes']
        # keep predictions with at least two points and a score above the current threshold;
        # the point-count check must look at the "points" list, not the dict itself,
        # and np.float was removed from numpy (the builtin float is equivalent here)
        pred_lanes3d = [pred_lanespec3d["points"] for pred_lanespec3d in pred_lanes3d
                        if len(pred_lanespec3d["points"]) >= 2 and float(pred_lanespec3d["score"]) > store_spec]
        pred_num = len(pred_lanes3d)
        return gt_lanes3d, gt_num, pred_lanes3d, pred_num
    def calc_iou(self, lane1, lane2):
        """IoU of two lanes rasterized in the BEV z-x plane."""
        img1 = self.bev_projector.proj_oneline_zx(lane1)
        img2 = self.bev_projector.proj_oneline_zx(lane2)
        union_im = cv2.bitwise_or(img1, img2)
        union_sum = union_im.sum()
        inter_sum = img1.sum() + img2.sum() - union_sum
        if union_sum == 0:
            return 0
        else:
            return inter_sum / float(union_sum)

    def cal_mean_dist(self, src_line, dst_line):
        """Mean distance from samples on the gt line to the pred line.

        :param src_line: gt
        :param dst_line: pred
        """
        src_line = LineString(np.array(src_line))
        dst_line = LineString(np.array(dst_line))
        total_distance = 0
        samples = np.arange(0.05, 1, 0.1)
        for sample in samples:
            total_distance += src_line.interpolate(sample, normalized=True).distance(dst_line)
        mean_distance = total_distance / samples.shape[0]
        return mean_distance

    def sort_lanes_z(self, lanes):
        # make each lane's points run from near to far in z
        sorted_lanes = list()
        for lane_spec in lanes:
            if lane_spec[0][-1] > lane_spec[1][-1]:
                lane_spec = lane_spec[::-1]
            sorted_lanes.append(lane_spec)
        return sorted_lanes

    def eval(self):
        gt_num = list()
        pred_num = list()
        tp = list()
        distance_error = list()
        for store in range(self.score_l, self.score_h, self.score_step):
            store_spec = store * 0.01
            gt_lanes, gt_num_spec, pred_lanes, pred_num_spec = self.preprocess(store_spec)
            gt_lanes = self.sort_lanes_z(gt_lanes)
            pred_lanes = self.sort_lanes_z(pred_lanes)
            tp_spec, distance_error_spec = self.cal_tp(gt_num_spec, pred_num_spec, gt_lanes, pred_lanes)
            gt_num.append(gt_num_spec)
            pred_num.append(pred_num_spec)
            tp.append(tp_spec)
            distance_error.append(distance_error_spec)
        return gt_num, pred_num, tp, distance_error

    def cal_tp(self, gt_num, pred_num, gt_lanes, pred_lanes):
        tp = 0
        distance_error = 0
        if gt_num > 0 and pred_num > 0:
            iou_mat = [[0 for col in range(pred_num)] for row in range(gt_num)]
            for i in range(gt_num):
                for j in range(pred_num):
                    iou_mat[i][j] = self.calc_iou(gt_lanes[i], pred_lanes[j])
            # Hungarian matching on (1 - IoU) costs
            cost_mat = []
            for row in iou_mat:
                cost_row = list()
                for col in row:
                    cost_row.append(1.0 - col)
                cost_mat.append(cost_row)
            m = Munkres()
            match_idx = m.compute(cost_mat)
            for row, col in match_idx:
                gt_lane = gt_lanes[row]
                pred_lane = pred_lanes[col]
                cur_distance = self.cal_mean_dist(gt_lane, pred_lane)
                if cur_distance < self.distance_thresh:
                    distance_error += cur_distance
                    tp += 1
        return tp, distance_error


def parse_config():
    parser = argparse.ArgumentParser(description='arg parser')
    parser.add_argument('--cfg_file', type=str,
                        default='/home/dingzihan/PersFormer_3DLane/config/once_eval_config.json',
                        help='specify the config for evaluation')
    args, unknown_args = parser.parse_known_args()
    return args, unknown_args
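# NOTE (editor): a minimal sketch, not part of the original module, of the
# Hungarian matching used in cal_tp, with a made-up 2x2 IoU matrix.
def _demo_munkres_matching():
    iou_mat = [[0.8, 0.1],
               [0.2, 0.6]]
    cost_mat = [[1.0 - v for v in row] for row in iou_mat]  # higher IoU -> lower cost
    match_idx = Munkres().compute(cost_mat)
    print(match_idx)  # -> [(0, 0), (1, 1)]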
================================================
FILE: utils/utils.py
================================================
# ==============================================================================
# Copyright (c) 2022 The PersFormer Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================

import argparse
import errno
import os
import os.path as ops
import sys
import logging
import datetime
from pathlib import Path

import cv2
import matplotlib
import numpy as np
import torch
import torch.nn.init as init
import torch.optim
from torch.optim import lr_scheduler
from scipy.interpolate import interp1d
from scipy.special import softmax

from experiments.gpu_utils import is_main_process
from mmdet.utils import get_root_logger as get_mmdet_root_logger


def create_logger(args):
    datenow = datetime.datetime.now()
    ymd = '-'.join(list(map(str, [datenow.year, datenow.month, datenow.day])))
    hms = ':'.join(list(map(str, [datenow.hour, datenow.minute, datenow.second])))
    logname = '%s_%s' % (ymd, hms)
    logdir = os.path.join(args.save_path, 'logs')
    os.makedirs(logdir, exist_ok=True)
    ckpt_name = Path(args.eval_ckpt).stem.split('checkpoint_model_epoch_')[-1]
    logtype = 'eval_{}'.format(ckpt_name) if args.evaluate else 'train'
    filename = os.path.join(logdir, '%s_%s.log' % (logtype, logname))
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s %(filename)s[line:%(lineno)d] %(levelname)s %(message)s',
                        datefmt='%a, %d-%b-%Y %H:%M:%S',  # %W (week number) was a typo for %M (minutes)
                        filename=filename,
                        filemode='w')
    logger = get_mmdet_root_logger(log_file=filename, log_level=logging.INFO)
    return logger


def define_args():
    parser = argparse.ArgumentParser(description='PersFormer_3DLane_Detection')
    # CUDNN usage
    parser.add_argument("--cudnn", type=str2bool, nargs='?', const=True, default=True,
                        help="cudnn optimization active")
    # DDP setting
    parser.add_argument('--distributed', action='store_true')
    parser.add_argument("--local_rank", type=int)
    parser.add_argument('--gpu', type=int, default=0)
    parser.add_argument('--world_size', type=int, default=1)
    parser.add_argument('--nodes', type=int, default=1)
    parser.add_argument('--eval_ckpt', type=str, default='')
    parser.add_argument('--resume_from', type=str, default='')
    parser.add_argument('--no_eval', action='store_true')
    # General model settings
    parser.add_argument('--nworkers', type=int, default=0, help='num of threads')
    parser.add_argument('--test_mode', action='store_true', help='prevents loading latest saved model')
    parser.add_argument('--start_epoch', type=int, default=0, help='prevents loading latest saved model')
    parser.add_argument('--evaluate', action='store_true', default=False, help='only perform evaluation')
    parser.add_argument('--vis', action='store_true')
    parser.add_argument('--resume', type=str, default='', help='resume latest saved run')
    parser.add_argument('--output_dir', default='openlane', type=str, help='output_dir name under `work_dirs`')
    parser.add_argument('--evaluate_case', default='', type=str, help='scene name; some are abbreviated')
    parser.add_argument('--eval_freq', type=int, default=2,
                        help='evaluation frequency during training, 0 means no eval')
    # eval using gen-laneNet
    parser.add_argument('--rewrite_pred', default=False, action='store_true',
                        help='whether to rewrite an existing pred .json file')
    parser.add_argument('--save_best', default=False, action='store_true', help='only save the best ckpt')
    # workdir
    parser.add_argument('--save_root', default='work_dirs', type=str)
    # dataset
    parser.add_argument('--dataset', default='300', type=str, help='1000 | 300 openlane dataset')
    return parser
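# NOTE (editor): a minimal usage sketch, not part of the original module; the
# checkpoint path is a placeholder.
def _demo_define_args():
    parser = define_args()
    args = parser.parse_args(['--evaluate', '--eval_ckpt', 'pretrained_models/openlane.pth'])
    print(args.evaluate, args.eval_ckpt)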
def prune_3d_lane_by_visibility(lane_3d, visibility):
    lane_3d = lane_3d[visibility > 0, ...]
    return lane_3d


def prune_3d_lane_by_range(lane_3d, x_min, x_max):
    # TODO: solve hard-coded range later
    # remove points with y out of range
    # 3D labels may include a super long straight line with only two points: it does not
    # have to reach 200, but gt needs a minimum step
    # the 2D dataset requires this to rule out points projected to the ground but out of meaningful range
    lane_3d = lane_3d[np.logical_and(lane_3d[:, 1] > 0, lane_3d[:, 1] < 200), ...]
    # remove lane points out of x range
    lane_3d = lane_3d[np.logical_and(lane_3d[:, 0] > x_min, lane_3d[:, 0] < x_max), ...]
    return lane_3d


def resample_laneline_in_y(input_lane, y_steps, out_vis=False):
    """
    Interpolate x, z values at each anchor grid, including those beyond the input lane's y range.

    :param input_lane: N x 2 or N x 3 ndarray, one row per point (x, y, z-optional).
                       Requires y values of the input lane in ascending order
    :param y_steps: a vector of steps in y
    :param out_vis: whether to output a visibility indicator which only depends on the input y range
    :return:
    """
    # at least two points are included
    assert input_lane.shape[0] >= 2

    y_min = np.min(input_lane[:, 1]) - 5
    y_max = np.max(input_lane[:, 1]) + 5

    if input_lane.shape[1] < 3:
        input_lane = np.concatenate([input_lane, np.zeros([input_lane.shape[0], 1], dtype=np.float32)], axis=1)

    f_x = interp1d(input_lane[:, 1], input_lane[:, 0], fill_value="extrapolate")
    f_z = interp1d(input_lane[:, 1], input_lane[:, 2], fill_value="extrapolate")

    x_values = f_x(y_steps)
    z_values = f_z(y_steps)

    if out_vis:
        output_visibility = np.logical_and(y_steps >= y_min, y_steps <= y_max)
        return x_values, z_values, output_visibility.astype(np.float32) + 1e-9
    return x_values, z_values


def resample_laneline_in_y_with_vis(input_lane, y_steps, vis_vec):
    """
    Interpolate x, z values at each anchor grid and keep only the visible points.

    :param input_lane: N x 2 or N x 3 ndarray, one row per point (x, y, z-optional).
                       Requires y values of the input lane in ascending order
    :param y_steps: a vector of steps in y
    :param vis_vec: per-point visibility of the input lane
    :return: M x 3 array of the visible resampled points
    """
    # at least two points are included
    assert input_lane.shape[0] >= 2

    if input_lane.shape[1] < 3:
        input_lane = np.concatenate([input_lane, np.zeros([input_lane.shape[0], 1], dtype=np.float32)], axis=1)

    f_x = interp1d(input_lane[:, 1], input_lane[:, 0], fill_value="extrapolate")
    f_z = interp1d(input_lane[:, 1], input_lane[:, 2], fill_value="extrapolate")
    f_vis = interp1d(input_lane[:, 1], vis_vec, fill_value="extrapolate")

    x_values = f_x(y_steps)
    z_values = f_z(y_steps)
    vis_values = f_vis(y_steps)

    x_values = x_values[vis_values > 0.5]
    y_values = y_steps[vis_values > 0.5]
    z_values = z_values[vis_values > 0.5]
    return np.array([x_values, y_values, z_values]).T
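# NOTE (editor): a minimal sketch, not part of the original module, of how the
# resampling above behaves on a toy two-point lane.
def _demo_resample_laneline_in_y():
    lane = np.array([[1.0, 10.0, 0.1],
                     [2.0, 30.0, 0.3]])            # (x, y, z), y ascending
    y_steps = np.array([5., 10., 20., 30., 40.])
    x, z, vis = resample_laneline_in_y(lane, y_steps, out_vis=True)
    # x/z are linearly interpolated (and extrapolated) at each y step;
    # vis marks steps inside [y_min - 5, y_max + 5] of the input lane
    print(x, z, vis)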
def homograpthy_g2im(cam_pitch, cam_height, K):
    # note: the 'homograpthy' typo in the name is kept, other modules import it
    # transform top-view region to original image region
    R_g2c = np.array([[1, 0, 0],
                      [0, np.cos(np.pi / 2 + cam_pitch), -np.sin(np.pi / 2 + cam_pitch)],
                      [0, np.sin(np.pi / 2 + cam_pitch), np.cos(np.pi / 2 + cam_pitch)]])
    H_g2im = np.matmul(K, np.concatenate([R_g2c[:, 0:2], [[0], [cam_height], [0]]], 1))
    return H_g2im


def projection_g2im(cam_pitch, cam_height, K):
    P_g2c = np.array([[1, 0, 0, 0],
                      [0, np.cos(np.pi / 2 + cam_pitch), -np.sin(np.pi / 2 + cam_pitch), cam_height],
                      [0, np.sin(np.pi / 2 + cam_pitch), np.cos(np.pi / 2 + cam_pitch), 0]])
    P_g2im = np.matmul(K, P_g2c)
    return P_g2im


def homograpthy_g2im_extrinsic(E, K):
    """E: extrinsic matrix, 4*4"""
    E_inv = np.linalg.inv(E)[0:3, :]
    H_g2c = E_inv[:, [0, 1, 3]]
    H_g2im = np.matmul(K, H_g2c)
    return H_g2im


def projection_g2im_extrinsic(E, K):
    E_inv = np.linalg.inv(E)[0:3, :]
    P_g2im = np.matmul(K, E_inv)
    return P_g2im


def homography_crop_resize(org_img_size, crop_y, resize_img_size):
    """
    Compute the homography matrix that maps the original image to the cropped and resized image.

    :param org_img_size: [org_h, org_w]
    :param crop_y:
    :param resize_img_size: [resize_h, resize_w]
    :return:
    """
    # transform original image region to network input region
    ratio_x = resize_img_size[1] / org_img_size[1]
    ratio_y = resize_img_size[0] / (org_img_size[0] - crop_y)
    H_c = np.array([[ratio_x, 0, 0],
                    [0, ratio_y, -ratio_y * crop_y],
                    [0, 0, 1]])
    return H_c


def homographic_transformation(Matrix, x, y):
    """
    Helper function to transform coordinates with a transformation matrix.

    Args:
        Matrix (multi dim - array): 3x3 homography matrix
        x (array): original x coordinates
        y (array): original y coordinates
    """
    ones = np.ones((1, len(y)))
    coordinates = np.vstack((x, y, ones))
    trans = np.matmul(Matrix, coordinates)
    x_vals = trans[0, :] / trans[2, :]
    y_vals = trans[1, :] / trans[2, :]
    return x_vals, y_vals


def projective_transformation(Matrix, x, y, z):
    """
    Helper function to transform coordinates with a transformation matrix.

    Args:
        Matrix (multi dim - array): 3x4 projection matrix
        x (array): original x coordinates
        y (array): original y coordinates
        z (array): original z coordinates
    """
    ones = np.ones((1, len(z)))
    coordinates = np.vstack((x, y, z, ones))
    trans = np.matmul(Matrix, coordinates)
    x_vals = trans[0, :] / trans[2, :]
    y_vals = trans[1, :] / trans[2, :]
    return x_vals, y_vals
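# NOTE (editor): a minimal sketch, not part of the original module, projecting
# ground points into the image; the K / pitch / height values are placeholders.
def _demo_project_lane_to_image():
    K = np.array([[1000., 0., 960.],
                  [0., 1000., 540.],
                  [0., 0., 1.]])
    P_g2im = projection_g2im(cam_pitch=0.0, cam_height=1.5, K=K)  # 3x4
    x = np.array([0.0, 0.5, 1.0])    # lateral (m)
    y = np.array([10., 20., 30.])    # longitudinal (m)
    z = np.array([0.0, 0.0, 0.0])    # height (m)
    u, v = projective_transformation(P_g2im, x, y, z)
    print(u, v)  # pixel coordinates of the lane points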
def first_run(save_path):
    txt_file = os.path.join(save_path, 'first_run.txt')
    if not os.path.exists(txt_file):
        open(txt_file, 'w').close()
    else:
        saved_epoch = open(txt_file).read()
        if not saved_epoch:  # read() never returns None; an empty string means no epoch was saved
            print('You forgot to delete [first run file]')
            return ''
        return saved_epoch
    return ''


def mkdir_if_missing(directory):
    if not os.path.exists(directory):
        try:
            os.makedirs(directory)
        except OSError as e:
            if e.errno != errno.EEXIST:
                raise


# trick from stackoverflow
def str2bool(argument):
    if argument.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif argument.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Wrong argument in argparse, should be a boolean')


class Logger(object):
    """
    Source: https://github.com/Cysu/open-reid/blob/master/reid/utils/logging.py.
    """
    def __init__(self, fpath=None):
        self.console = sys.stdout
        self.file = None
        self.fpath = fpath
        if fpath is not None:
            mkdir_if_missing(os.path.dirname(fpath))
            self.file = open(fpath, 'w')

    def __del__(self):
        self.close()

    def __enter__(self):
        pass

    def __exit__(self, *args):
        self.close()

    def write(self, msg):
        self.console.write(msg)
        if self.file is not None:
            self.file.write(msg)

    def flush(self):
        self.console.flush()
        if self.file is not None:
            self.file.flush()
            os.fsync(self.file.fileno())

    def close(self):
        self.console.close()
        if self.file is not None:
            self.file.close()


class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count


def define_optim(optim, params, lr, weight_decay):
    if optim == 'adam':
        optimizer = torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)
    elif optim == 'adamw':
        optimizer = torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)
    elif optim == 'sgd':
        optimizer = torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=weight_decay)
    elif optim == 'rmsprop':
        optimizer = torch.optim.RMSprop(params, lr=lr, momentum=0.9, weight_decay=weight_decay)
    else:
        raise KeyError("The requested optimizer: {} is not implemented".format(optim))
    return optimizer


def cosine_schedule_with_warmup(k, args, dataset_size=None):
    # k: iteration number
    num_gpu = args.world_size
    batch_size = args.batch_size
    num_epochs = args.nepochs
    if num_gpu == 1:
        warmup_iters = 0
    else:
        warmup_iters = 1000 // num_gpu

    if k < warmup_iters:
        # linear warmup
        return (k + 1) / warmup_iters
    else:
        # half-cosine decay over the remaining iterations
        iter_per_epoch = (dataset_size + batch_size - 1) // batch_size
        return 0.5 * (1 + np.cos(np.pi * (k - warmup_iters) / (num_epochs * iter_per_epoch)))


def define_scheduler(optimizer, args, dataset_size=None):
    if args.lr_policy == 'lambda':
        def lambda_rule(epoch):
            lr_l = 1.0 - max(0, epoch + 1 - args.niter) / float(args.niter_decay + 1)
            return lr_l
        scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda_rule)
    elif args.lr_policy == 'step':
        scheduler = lr_scheduler.StepLR(optimizer, step_size=args.lr_decay_iters, gamma=args.gamma)
    elif args.lr_policy == 'multi_step':
        # MultiStepLR takes `milestones`, not `step_size`
        scheduler = lr_scheduler.MultiStepLR(optimizer, milestones=args.lr_multi_steps, gamma=args.gamma)
    elif args.lr_policy == 'cosine':
        scheduler = lr_scheduler.CosineAnnealingLR(optimizer, T_max=args.T_max, eta_min=args.eta_min)
    elif args.lr_policy == 'cosine_warm':
        # mmdet-style equivalent: policy='CosineAnnealing', warmup='linear',
        # warmup_iters=500, warmup_ratio=1/3, min_lr_ratio=1e-3
        scheduler = lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=args.T_0,
                                                             T_mult=args.T_mult, eta_min=args.eta_min)
    elif args.lr_policy == 'cosine_warmup':
        from functools import partial
        cosine_warmup = partial(cosine_schedule_with_warmup, args=args, dataset_size=dataset_size)
        scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=cosine_warmup)
    elif args.lr_policy == 'None':
        scheduler = None
    else:
        # raise, not return: the original returned the exception object
        raise NotImplementedError('learning rate policy [%s] is not implemented' % args.lr_policy)
    return scheduler
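# NOTE (editor): a minimal sketch, not part of the original module, of the
# warmup + cosine factor above; the args fields are placeholders matching what
# cosine_schedule_with_warmup reads (world_size, batch_size, nepochs).
def _demo_cosine_schedule_with_warmup():
    from argparse import Namespace
    args = Namespace(world_size=8, batch_size=16, nepochs=100)
    dataset_size = 10000
    # the returned factors multiply the base lr: a linear ramp over
    # 1000 // world_size iters, then a half-cosine decay toward 0
    for k in [0, 62, 124, 30000, 62000]:
        print(k, cosine_schedule_with_warmup(k, args, dataset_size=dataset_size))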
def define_init_weights(model, init_w='normal', activation='relu'):
    if init_w == 'normal':
        model.apply(weights_init_normal)
    elif init_w == 'xavier':
        model.apply(weights_init_xavier)
    elif init_w == 'kaiming':
        model.apply(weights_init_kaiming)
    elif init_w == 'orthogonal':
        model.apply(weights_init_orthogonal)
    else:
        raise NotImplementedError('initialization method [{}] is not implemented'.format(init_w))


def weights_init_normal(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1 or classname.find('ConvTranspose') != -1:
        try:
            init.normal_(m.weight.data, 0.0, 0.02)
            if m.bias is not None:
                m.bias.data.zero_()
        except Exception:
            print("{} not support init".format(str(classname)))
    elif classname.find('Linear') != -1:
        try:
            init.normal_(m.weight.data, 0.0, 0.02)
            if m.bias is not None:
                m.bias.data.zero_()
        except Exception:
            print("{} not support init".format(str(classname)))
    elif classname.find('BatchNorm2d') != -1:
        try:
            init.normal_(m.weight.data, 1.0, 0.02)
            init.constant_(m.bias.data, 0.0)
        except Exception:
            print("{} not support init".format(str(classname)))


def weights_init_xavier(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1 or classname.find('ConvTranspose') != -1:
        init.xavier_normal_(m.weight.data, gain=0.02)
        if m.bias is not None:
            m.bias.data.zero_()
    elif classname.find('Linear') != -1:
        init.xavier_normal_(m.weight.data, gain=0.02)
        if m.bias is not None:
            m.bias.data.zero_()
    elif classname.find('BatchNorm2d') != -1:
        init.normal_(m.weight.data, 1.0, 0.02)
        init.constant_(m.bias.data, 0.0)


def weights_init_kaiming(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1 or classname.find('ConvTranspose') != -1:
        init.kaiming_normal_(m.weight.data, a=0, mode='fan_in', nonlinearity='relu')
        if m.bias is not None:
            m.bias.data.zero_()
    elif classname.find('Linear') != -1:
        init.kaiming_normal_(m.weight.data, a=0, mode='fan_in', nonlinearity='relu')
        if m.bias is not None:
            m.bias.data.zero_()
    elif classname.find('BatchNorm2d') != -1:
        init.normal_(m.weight.data, 1.0, 0.02)
        init.constant_(m.bias.data, 0.0)


def weights_init_orthogonal(m):
    classname = m.__class__.__name__
    # init.orthogonal was removed from torch; the in-place variant is init.orthogonal_
    if classname.find('Conv') != -1 or classname.find('ConvTranspose') != -1:
        init.orthogonal_(m.weight.data, gain=1)
        if m.bias is not None:
            m.bias.data.zero_()
    elif classname.find('Linear') != -1:
        init.orthogonal_(m.weight.data, gain=1)
        if m.bias is not None:
            m.bias.data.zero_()
    elif classname.find('BatchNorm2d') != -1:
        init.normal_(m.weight.data, 1.0, 0.02)
        init.constant_(m.bias.data, 0.0)


================================================
FILE: work_dirs/.gitkeep
================================================