Repository: Nicholasli1995/EgoNet
Branch: master
Commit: 13e3758388ab
Files: 56
Total size: 467.4 KB

Directory structure:
gitextract_zura5gg8/

├── .gitignore
├── LICENSE
├── README.md
├── configs/
│   ├── KITTI_inference:demo.yml
│   ├── KITTI_inference:test_submission.yml
│   ├── KITTI_train_IGRs.yml
│   ├── KITTI_train_IGRs_Ped.yml
│   └── KITTI_train_lifting.yml
├── docs/
│   ├── demo.md
│   ├── inference.md
│   ├── preparation.md
│   ├── spec-list.txt
│   └── training.md
├── libs/
│   ├── arguments/
│   │   ├── __init__.py
│   │   └── parse.py
│   ├── common/
│   │   ├── __init__.py
│   │   ├── format.py
│   │   ├── img_proc.py
│   │   ├── transformation.py
│   │   └── utils.py
│   ├── dataset/
│   │   ├── KITTI/
│   │   │   ├── __init__.py
│   │   │   └── car_instance.py
│   │   ├── __init__.py
│   │   ├── basic/
│   │   │   ├── __init__.py
│   │   │   └── basic_classes.py
│   │   └── normalization/
│   │       ├── __init__.py
│   │       └── operations.py
│   ├── logger/
│   │   ├── __init__.py
│   │   └── logger.py
│   ├── loss/
│   │   ├── __init__.py
│   │   └── function.py
│   ├── metric/
│   │   └── criterions.py
│   ├── model/
│   │   ├── FCmodel.py
│   │   ├── __init__.py
│   │   ├── egonet.py
│   │   └── heatmapModel/
│   │       ├── __init__.py
│   │       ├── hrnet.py
│   │       └── resnet.py
│   ├── optimizer/
│   │   ├── __init__.py
│   │   └── optimizer.py
│   ├── trainer/
│   │   ├── __init__.py
│   │   ├── accuracy.py
│   │   └── trainer.py
│   └── visualization/
│       ├── __init__.py
│       ├── debug.py
│       ├── egonet_utils.py
│       └── points.py
└── tools/
    ├── inference.py
    ├── inference_legacy.py
    ├── kitti-eval/
    │   ├── README.md
    │   ├── evaluate_object_3d.cpp
    │   ├── evaluate_object_3d_offline.cpp
    │   ├── evaluate_object_3d_offline_r40.cpp
    │   └── mail.h
    ├── train_IGRs.py
    └── train_lifting.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
#/*
**/__pycache__
.spyproject/
*.log
*.ini
*.bak
*.pth
*.csv
*.jpg
*.png
*.pdf
/tools/kitti-eval/evaluate_object_3d_offline
/outputs


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2022 Shichao Li

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/exploring-intermediate-representation-for/vehicle-pose-estimation-on-kitti-cars-hard)](https://paperswithcode.com/sota/vehicle-pose-estimation-on-kitti-cars-hard?p=exploring-intermediate-representation-for)
# EgoNet
Official project website for the CVPR 2021 paper "Exploring intermediate representation for monocular vehicle pose estimation". This repo includes an implementation that performs vehicle orientation estimation on the KITTI dataset from a single RGB image. 

News:

(2022-??-??): v-1.1 will be released which include pre-trained models for other object classes (Pedestrian and Cyclist in KITTI).

(2021-08-16): v-1.0 is released. The training documentation is added.

(2021-06-21): v-0.9 (beta version) is released. **The inference utility is here!** For Q&A, go to [discussions](https://github.com/Nicholasli1995/EgoNet/discussions). If you believe there is a technical problem, submit to [issues](https://github.com/Nicholasli1995/EgoNet/issues). 

(2021-06-16): This repo is under final code cleaning and documentation preparation. Stay tuned and come back in a week!

**Check our 5-min video ([Youtube](https://www.youtube.com/watch?v=isKo0F3MU68), [爱奇艺](https://www.iqiyi.com/v_y6lrdy33kg.html)) for an introduction.**

**中文详解**：[哔哩哔哩](https://www.bilibili.com/video/BV1jP4y1t7ee)
<p align="center">
  <img src="https://github.com/Nicholasli1995/EgoNet/blob/master/imgs/teaser.jpg"  width="830" height="200" />
</p>

## Run a demo with a one-line command!
Check instructions [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/demo.md).
<p align="center">
  <img src="https://github.com/Nicholasli1995/EgoNet/blob/master/imgs/Ego-Net_demo.png" height="175"/>
  <img src="https://github.com/Nicholasli1995/EgoNet/blob/master/imgs/Ego-Net_demo.gif" height="175"/>
</p>

## Performance: AP<sup>BEV</sup>@R<sub>40</sub> on KITTI val set for Car (monocular RGB)
The validation results in the paper was based on R<sub>11</sub>, the results using R<sub>40</sub> are attached here.
| Method                    | Reference|Easy|Moderate|Hard|
| ------------------------- | ---------------| --------------| --------------| --------------| 
|[M3D-RPN](https://arxiv.org/abs/1907.06038)|ICCV 2019|20.85| 15.62| 11.88|
|[MonoDIS](https://openaccess.thecvf.com/content_ICCV_2019/papers/Simonelli_Disentangling_Monocular_3D_Object_Detection_ICCV_2019_paper.pdf)|ICCV 2019|18.45 |12.58 |10.66|
|[MonoPair](https://arxiv.org/abs/2003.00504)|CVPR 2020|24.12| 18.17| 15.76|
|[D4LCN](https://github.com/dingmyu/D4LCN)|CVPR 2020|31.53 |22.58  |17.87|
|[Kinematic3D](https://arxiv.org/abs/2007.09548)|ECCV 2020|27.83| 19.72| 15.10|
|[GrooMeD-NMS](https://github.com/abhi1kumar/groomed_nms)|CVPR 2021 |27.38|19.75|15.92|
|[MonoDLE](https://github.com/xinzhuma/monodle)|CVPR 2021|24.97| 19.33| 17.01|
|Ours (@R<sub>11</sub>)           |CVPR 2021 |**33.60**|**25.38**|**22.80**|
|Ours (@R<sub>40</sub>)           |CVPR 2021 |**34.31**|**24.80**|**20.16**|

## Performance: AOS@R<sub>40</sub> on KITTI test set for Car (RGB)

| Method                    | Reference|Configuration|Easy|Moderate|Hard|
| ------------------------- | ---------------| --------------| --------------| --------------| --------------| 
|[M3D-RPN](https://arxiv.org/abs/1907.06038)|ICCV 2019|Monocular|88.38 |82.81| 67.08|
|[DSGN](https://github.com/Jia-Research-Lab/DSGN)|CVPR 2020|Stereo|95.42|86.03| 78.27|
|[Disp-RCNN](https://github.com/zju3dv/disprcnn)|CVPR 2020|Stereo |93.02 |	81.70 |	67.16|
|[MonoPair](https://arxiv.org/abs/2003.00504)|CVPR 2020|Monocular|91.65 |86.11 |76.45|
|[D4LCN](https://github.com/dingmyu/D4LCN)|CVPR 2020|Monocular|90.01|82.08| 63.98|
|[Kinematic3D](https://arxiv.org/abs/2007.09548)|ECCV 2020|Monocular|58.33 |	45.50 |	34.81|
|[MonoDLE](https://github.com/xinzhuma/monodle)|CVPR 2021|Monocular|93.46| 90.23| 80.11|
|[Ours](http://www.cvlibs.net/datasets/kitti/eval_object_detail.php?&result=e5233225fd5ef36fa63eb00252d9c00024961f2c)           |CVPR 2021 |Monocular|**96.11**|**91.23**|**80.96**|

## Inference/Deployment
Check instructions [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/inference.md) to **reproduce** the above quantitative results.

## Training
Check instructions [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/training.md) to train Ego-Net and learn how to prepare your own training dataset other than KITTI.

## Citation
Please star this repository and cite the following paper in your publications if it helps your research:

    @InProceedings{Li_2021_CVPR,
    author    = {Li, Shichao and Yan, Zengqiang and Li, Hongyang and Cheng, Kwang-Ting},
    title     = {Exploring intermediate representation for monocular vehicle pose estimation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2021},
    pages     = {1873-1883}
    }

## License
This repository can be used freely for non-commercial purposes. Contact me if you are interested in a commercial license.

## Links
Link to the paper:
[Exploring intermediate representation for monocular vehicle pose estimation](https://arxiv.org/abs/2011.08464)

Link to the presentation video:
[Youtube](https://www.youtube.com/watch?v=isKo0F3MU68), [爱奇艺](https://www.iqiyi.com/v_y6lrdy33kg.html)

Relevant ECCV 2020 work: [GSNet](https://github.com/lkeab/gsnet)


================================================
FILE: configs/KITTI_inference:demo.yml
================================================
# This is a YAML file storing experimental configurations for KITTI dataset

## general settings
name: 'refine a given set of predictions from D4LCN'
exp_type: 'inference'
model_type: 'heatmapModel'
use_gpu: True
use_pred_box: True
use_gt_box: True
gpu_id: [0]

## operations
train: False
save: False
evaluate: False
inference: True

## used directories
dirs:
    # output directory
    output: 'YOUR_OURPUT_DIR' 
    ckpt: 'YOUR_PRETRAINED_DIR' 
    load_prediction_file: '../resources/D4LCN/data'

## CUDNN settings
cudnn:
    enabled: True
    deterministic: True
    benchmark: False

## dataset settings
dataset:
    name: 'KITTI'
    split: 'valid'
    detect_classes: ['Car']
    3d_kpt_sample_style: 'bbox9'
    # interpolate the 3D bbox
    interpolate:
        flag: True
        style: 'bbox12'
        coef: [0.332, 0.667]
    # do some pre-processing
    pre-process: False
    root: 'YOUR_KITTI_DIR'
    # augmentation parameters
    scaling_factor: 0.2
    rotation_factor: 30. # degrees
    # pytorch image transformation setting
    pth_transform:
# mean: [0.485, 0.456, 0.406, 0., 0.] 
# std: [0.229, 0.224, 0.225, 1., 1.]    
        mean: [0.485, 0.456, 0.406] 
        std: [0.229, 0.224, 0.225]          
    # annotation style for 2d key-point
    2d_kpt_style: 'bbox9' # projected 3d bounding box corner and center points
    # input-output representation for 2d-to-3d lifting
    lft_in_rep: 'coordinates2d' # 2d coordinates on screen
    lft_out_rep: 'R3d+T' # 3d coordinates relative to centroid plus translation vector   

## model settings for a fully-connected network if used
FCModel:
    name: 'lifter'
    refine_3d: False 
    norm_twoD: False
    num_blocks: 2 
    input_size: 66 
    output_size: 96 
    num_neurons: 1024
    dropout: 0.5
    leaky: False

## settings for a fully-convolutional heatmap/coordinate regression model
heatmapModel:
    name: hrnet # here a high-resolution (hr) model is used
    add_xy: False # concatenate xy coodrinate maps along with the input
    jitter_bbox: False
    jitter_params:
        shift:
        - 0.1
        - 0.1
        scaling:
        - 0.4
        - 0.4
    input_size: 
    - 256
    - 256
    # rotate and scaling and input images
    augment_input: False
    # one can choose to regress dense semantic heatmaps or coordinates 
    head_type: 'coordinates'
    # up-sampling with pixel-shuffle
    pixel_shuffle: False
    # if an intermediate heatmap is produced
    heatmap_size:
    - 64
    - 64
    init_weights: true
    num_joints: 33
    extra:
        pretrained_layers:
        - 'conv1'
        - 'bn1'
        - 'conv2'
        - 'bn2'
        - 'layer1'
        - 'transition1'
        - 'stage2'
        - 'transition2'
        - 'stage3'
        - 'transition3'
        - 'stage4'
        final_conv_kernel: 1
        stage2:
            num_modules: 1
            num_branches: 2
            block: basic
            num_blocks:
            - 4
            - 4
            num_channels:
            - 48
            - 96
            fuse_method: sum
        stage3:
            num_modules: 4
            num_branches: 3
            block: basic
            num_blocks:
            - 4
            - 4
            - 4
            num_channels:
            - 48
            - 96
            - 192
            fuse_method: sum
        stage4:
            num_modules: 3
            num_branches: 4
            block: basic
            num_blocks:
            - 4
            - 4
            - 4
            - 4
            num_channels:
            - 48
            - 96
            - 192
            - 384
            fuse_method: sum

## testing settings
testing_settings:
    batch_size: 1
    num_threads: 0
    shuffle: True
    pin_memory: False
    apply_dropout: False
    unnormalize: False
    alpha_mode: 'proj'

================================================
FILE: configs/KITTI_inference:test_submission.yml
================================================
# YAML file storing experimental configurations for KITTI dataset

## general settings
name: 'produce vehicle pose predictions on KITTI test split given detected bounding boxes'
exp_type: 'inference'
model_type: 'heatmapModel'
use_gpu: True
use_pred_box: True
use_gt_box: False
gpu_id: [0]

## operations
train: False
save: False
visualize: False # visualize during inference
batch_to_show: 1000000 # how many batches to visualize if needed
evaluate: False
inference: True
conf_thres: 0.1 # discard low score boxes

## used directories
dirs:
    # output directory
    output: 'YOUR_OURPUT_DIR' 
    ckpt: 'YOUR_PRETRAINED_DIR' 
    # raw detection results on test set by using RRC-Net
    load_prediction_file: '../resources/test_boxes'

## CUDNN settings
cudnn:
    enabled: True
    deterministic: True
    benchmark: False

## dataset settings
dataset:
    name: 'KITTI'
    split: 'test'
    detect_classes: ['Car']
    3d_kpt_sample_style: 'bbox9'
    # interpolate the 3D bbox
    interpolate:
        flag: True
        style: 'bbox12'
        coef: [0.332, 0.667]
    # do some pre-processing
    pre-process: False
    root: 'YOUR_KITTI_DIR'
    # augmentation parameters
    scaling_factor: 0.2
    rotation_factor: 30. # degrees
    # pytorch image transformation setting
    pth_transform:
        mean: [0.485, 0.456, 0.406] # TODO re-calculate this: R, G, B, X, Y 
        std: [0.229, 0.224, 0.225]          
    # annotation style for 2d key-point
    2d_kpt_style: 'bbox9' # projected 3d bounding box corner and center points
    # input-output representation for 2d-to-3d lifting
    lft_in_rep: 'coordinates2d' # 2d coordinates on screen
    lft_out_rep: 'R3d+T' # 3d coordinates relative to centroid plus translation vector   

## model settings for a fully-connected network if used
FCModel:
    name: 'lifter'
    refine_3d: False 
    norm_twoD: False
    num_blocks: 2 
    input_size: 66 
    output_size: 96 
    num_neurons: 1024
    dropout: 0.5
    leaky: False

## settings for a fully-convolutional heatmap regression model
heatmapModel:
    name: hrnet # here a high-resolution (hr) model is used
    add_xy: False # concatenate xy coodrinate maps along with the input
    jitter_bbox: True
    jitter_params:
        shift:
        - 0.1
        - 0.1
        scaling:
        - 0.4
        - 0.4
    input_size: 
    - 256
    - 256
    # rotate and scaling and input images
    augment_input: True
    head_type: 'coordinates'
    pixel_shuffle: False
    # if an intermediate heatmap is produced
    heatmap_size:
    - 64
    - 64
    init_weights: true
    num_joints: 33
    use_different_joints_weight: False
    extra:
        pretrained_layers:
        - 'conv1'
        - 'bn1'
        - 'conv2'
        - 'bn2'
        - 'layer1'
        - 'transition1'
        - 'stage2'
        - 'transition2'
        - 'stage3'
        - 'transition3'
        - 'stage4'
        final_conv_kernel: 1
        stage2:
            num_modules: 1
            num_branches: 2
            block: basic
            num_blocks:
            - 4
            - 4
            num_channels:
            - 48
            - 96
            fuse_method: sum
        stage3:
            num_modules: 4
            num_branches: 3
            block: basic
            num_blocks:
            - 4
            - 4
            - 4
            num_channels:
            - 48
            - 96
            - 192
            fuse_method: sum
        stage4:
            num_modules: 3
            num_branches: 4
            block: basic
            num_blocks:
            - 4
            - 4
            - 4
            - 4
            num_channels:
            - 48
            - 96
            - 192
            - 384
            fuse_method: sum

## testing settings
testing_settings:
    batch_size: 1
    num_threads: 0
    shuffle: True
    pin_memory: False
    apply_dropout: False
    unnormalize: False
    alpha_mode: 'proj'

================================================
FILE: configs/KITTI_train_IGRs.yml
================================================
# YAML file storing experimental configurations for training on KITTI dataset

## general settings
name: 'kitti_kpt_loc'
exp_type: 'instanceto2d'
model_type: 'heatmapModel'
use_gpu: True
gpu_id: [0,1,2] # MODIFY this to the GPU/GPUs ids in your computer

## operations
train: True
save: True
visualize: False
evaluate: False

## output directories
dirs:
    # MODIFY them to your preferred directories
    output: '../outputs/training_record' 
    # This directory save intermediate training results (optional)
    debug: '../outputs/training_record/debug' 

## CUDNN settings
cudnn:
    enabled: True
    deterministic: False
    benchmark: False

## dataset settings
dataset:
    name: 'KITTI'
    detect_classes: ['Car']
    3d_kpt_sample_style: 'bbox9'
    interpolate:
        flag: True
        style: 'bbox12'
        coef: [0.332, 0.667]
    # do some pre-processing
    pre-process: False
    # MODIFY this to your KITTI directory
    root: '$YOUR_DIR/KITTI'
    # augmentation parameters
    scaling_factor: 0.2
    rotation_factor: 30. # degrees
    # pytorch image transformation setting
    pth_transform:
#        mean: [0.485, 0.456, 0.406, 0., 0.]
#        std: [0.229, 0.224, 0.225, 1., 1.]    
        mean: [0.485, 0.456, 0.406] 
        std: [0.229, 0.224, 0.225]    
    2d_kpt_style: 'bbox9'

## self-supervision settings
ss:
    flag: False
    # MODIFY this to your unlabeled image record if you enable self-supervised representation learning
    record_path: '$YOUR_DIR/Apollo_ss_record.npy'
    img_root: '$YOUR_DIR/ApolloScape/images'
    max_per_img: 6

## settings for a fully-convolutional heatmap/coordinate regression model
heatmapModel:
    name: hrnet # here a high-resolution (hr) model is used
    add_xy: False # concatenate xy coodrinate maps along with the input
    # data augmentation by adding noise to bounding box location
    jitter_bbox: True
    jitter_params:
        shift:
        - 0.1
        - 0.1
        scaling:
        - 0.4
        - 0.4
    input_size: 
    - 256
    - 256
    # rotate and scaling and input images
    augment_input: True
    head_type: 'coordinates'
    # up-sampling with pixel-shuffle
    pixel_shuffle: False
    # if an intermediate heatmap is produced
    heatmap_size:
    - 64
    - 64
    loss_type: JointsCompositeLoss
    # the following two settings are only valid for JointsCompositeLoss
    loss_spec_list: ['mse', 'l1', 'sl1']
    loss_weight_list: [1.0, 0.1, 'None']
    cr_loss_threshold: 0.15
    init_weights: true
    num_joints: 33
    #use_different_joints_weight: False
    # use a pre-trained checkpoint to initialize the model
    # MODIFY it to your own checkpoint directory
    pretrained: '../resources/start_point.pth'
    target_type: gaussian
    sigma: 1
    extra:
        pretrained_layers:
        - 'conv1'
        - 'bn1'
        - 'conv2'
        - 'bn2'
        - 'layer1'
        - 'transition1'
        - 'stage2'
        - 'transition2'
        - 'stage3'
        - 'transition3'
        - 'stage4'
        final_conv_kernel: 1
        stage2:
            num_modules: 1
            num_branches: 2
            block: basic
            num_blocks:
            - 4
            - 4
            num_channels:
            - 48
            - 96
            fuse_method: sum
        stage3:
            num_modules: 4
            num_branches: 3
            block: basic
            num_blocks:
            - 4
            - 4
            - 4
            num_channels:
            - 48
            - 96
            - 192
            fuse_method: sum
        stage4:
            num_modules: 3
            num_branches: 4
            block: basic
            num_blocks:
            - 4
            - 4
            - 4
            - 4
            num_channels:
            - 48
            - 96
            - 192
            - 384
            fuse_method: sum

## training settings  
training_settings:
    total_epochs: 45
    resume: False
    batch_size: 24
    num_threads: 16 # MODIFY this accordingly based on your machine
    shuffle: True
    pin_memory: False
    # weighted loss computation
    use_target_weight: False
    report_every: 30
    eval_every: 130
    eval_during: False # set this to True if you want to evaluate during training
    eval_metrics: ['JointDistance2DSIP']
    plot_loss: False
    # debugging configurations 
    debug: 
        save: True # save some intermeadiate images with keypoint prediction
        save_images_kpts: True
        save_hms_gt: True
        save_hms_pred: True

## testing settings
testing_settings:
    batch_size: 2
    num_threads: 4
    shuffle: False
    pin_memory: False
    apply_dropout: False
    unnormalize: False
    eval_metrics: ['JointDistance2DSIP']

## optimizer settings
optimizer:
    # for ADAM
    optim_type: 'adam'
    lr: 0.001
    weight_decay: 0.0
    # for SGD
    momentum: 0.9
    # learning rate decay
    milestones: [10, 20, 30, 40]
    gamma: 0.5


================================================
FILE: configs/KITTI_train_IGRs_Ped.yml
================================================
# YAML file storing experimental configurations for training on KITTI dataset for the Pedestrian class

## general settings
name: 'kitti_kpt_loc_pedestrian'
exp_type: 'instanceto2d'
# baselin
model_type: 'heatmapModel'
use_gpu: True
gpu_id: [0,1,]

## operations
train: True
save: True
visualize: False
evaluate: False

## output directories
dirs:
    # MODIFY them to your preferred directories
    output: '../outputs/training_record' 
    # This directory save intermediate training results (optional)
    debug: '../outputs/training_record/debug' 

## CUDNN settings
cudnn:
    enabled: True
    deterministic: False
    benchmark: False

## dataset settings
dataset:
    name: 'KITTI'
    detect_classes: ['Pedestrian']
    3d_kpt_sample_style: 'bbox9'
    interpolate:
        flag: True
        style: 'bbox12'
        coef: [0.332, 0.667]
    enlarge_factor: 1.05 # patch size parameter
    # do some pre-processing
    pre-process: True
    root: '/media/nicholas/Database/datasets/KITTI'
    # augmentation parameters
    scaling_factor: 0.2
    rotation_factor: 30. # degrees
    # pytorch image transformation setting
    pth_transform:  
        mean: [0.485, 0.456, 0.406]
        std: [0.229, 0.224, 0.225]    
    2d_kpt_style: 'bbox9'

## self-supervision settings
ss:
    flag: False
    # MODIFY this to your unlabeled image record if you enable self-supervised representation learning
    record_path: '$YOUR_DIR/Apollo_ss_record.npy'
    img_root: '$YOUR_DIR/ApolloScape/images'
    max_per_img: 6

## settings for a fully-convolutional heatmap regression model
heatmapModel:
    name: hrnet # here a high-resolution (hr) model is used
    add_xy: False # concatenate xy coodrinate maps along with the input
    jitter_bbox: True
    jitter_params:
        shift:
        - 0.1
        - 0.1
        scaling:
        - 0.2
        - 0.2
    input_size: 
    - 192
    - 256
    # rotate and scaling and input images
    augment_input: True
    head_type: 'coordinates'
    # up-sampling with pixel-shuffle
    pixel_shuffle: False
    # if an intermediate heatmap is produced
    heatmap_size:
    - 48
    - 64
    loss_type: JointsCompositeLoss
    # the following two settings are only valid for JointsCompositeLoss
    loss_spec_list: ['mse', 'l1', 'None']
    loss_weight_list: [1.0, 0.1, 0.]
    cr_loss_threshold: 0.1
    init_weights: true
    num_joints: 33
    use_different_joints_weight: False
    # use a pre-trained checkpoint to initialize the model
    # MODIFY it to your own checkpoint directory
    pretrained: '../resources/start_point.pth'
    target_type: gaussian
    sigma: 2
    extra:
        pretrained_layers:
        - 'conv1'
        - 'bn1'
        - 'conv2'
        - 'bn2'
        - 'layer1'
        - 'transition1'
        - 'stage2'
        - 'transition2'
        - 'stage3'
        - 'transition3'
        - 'stage4'
        freeze_layers:
        - 'conv1'
        - 'bn1'
        - 'conv2'
        - 'bn2'
        - 'layer1'
        - 'transition1'
        - 'stage2'        
        final_conv_kernel: 1
        stage2:
            num_modules: 1
            num_branches: 2
            block: basic
            num_blocks:
            - 4
            - 4
            num_channels:
            - 32
            - 64
            fuse_method: sum
        stage3:
            num_modules: 4
            num_branches: 3
            block: basic
            num_blocks:
            - 4
            - 4
            - 4
            num_channels:
            - 32
            - 64
            - 128
            fuse_method: sum
        stage4:
            num_modules: 3
            num_branches: 4
            block: basic
            num_blocks:
            - 4
            - 4
            - 4
            - 4
            num_channels:
            - 32
            - 64
            - 128
            - 256
            fuse_method: sum

## training settings  
training_settings:
    total_epochs: 40
    resume: False
    begin_epoch: 1
    end_epoch: 10
    snapshot_epochs: [20, 30, 40]
    batch_size: 2
    num_threads: 0
    shuffle: True
    pin_memory: False
    # weighted loss computation
    use_target_weight: False
    report_every: 100
    eval_every: 1000
    eval_during: True
    eval_metrics: ['JointDistance2DSIP']
    plot_loss: False
    # debugging configurations 
    debug: 
        save: True # save some intermeadiate results
        save_images_kpts: True
        save_hms_gt: True
        save_hms_pred: True

## testing settings
testing_settings:
    batch_size: 2
    num_threads: 0
    shuffle: False
    pin_memory: False
    apply_dropout: False
    unnormalize: False
    eval_metrics: ['JointDistance2DSIP']
    # save_debug: True
    save_debug: False

## optimizer settings
optimizer:
    # for ADAM
    optim_type: 'adam'
    lr: 0.001
    weight_decay: 0.0
    # for SGD
    momentum: 0.9
    # learning rate decay
    milestones: [10, 20, 30]
    gamma: 0.5


================================================
FILE: configs/KITTI_train_lifting.yml
================================================
# YAML file storing experimental configurations for KITTI dataset

## general settings
name: 'lifter'
exp_type: '2dto3d'
model_type: 'FCModel'
use_gpu: True
gpu_id: [1] # modify this to the GPU ids that you use 

## operations
train: True # perform training
save: True # save the trained model
visualize: False # visualize the training results
evaluate: False # perform evaluation

## paths to the relevant directories
dirs:
    # output directory
    output: '../outputs/training_record' 
    debug: '../outputs/training_record/debug'
    data_vis: '../outputs/training_record/data_vis'

## CUDNN settings
cudnn:
    enabled: True
    deterministic: False
    benchmark: False

## evaluation metrics
metrics:
    R3D:
        T_style: 'direct'
        R_style: 'euler'

## dataset settings
dataset:
    name: 'KITTI'
    detect_classes: ['Car'] # used class for training
    3d_kpt_sample_style: 'bbox9' # construct a cuboid for each 3D bounding box
    # interpolate the 3D bbox
    interpolate:
        flag: True
        style: 'bbox12'
        coef: [0.332, 0.667]
    # do some pre-processing
    pre-process: False
    root: '$YOUR_DIR/KITTI' # MODIFY this to your own path    
    # input-output representation for 2d-to-3d lifting
    lft_in_rep: 'coordinates2d' # 2d coordinates on screen
    lft_out_rep: 'R3d' # 3d coordinates relative to centroid plus translation vector

## optional cascaded regression
cascade: 
    num_stages: 1 # the default is simply no cascade

## model settings for a fully-connected network if used
FCModel:
    name: 'lifter'
    refine_3d: False 
    norm_twoD: False
    num_blocks: 2 
    num_neurons: 1024
    dropout: 0.5
    leaky: False
    loss_type: MSELoss1D
    loss_reduction: 'mean'

## training settings  
training_settings:
    # total_epochs: 300
    total_epochs: 1
    eval_start_epoch: 250 # start evaluation after this epoch
    resume: False
    batch_size: 2048
    num_threads: 4 # set the number of workers that works for your machine
    shuffle: True
    pin_memory: False
#    report_every: 500 # report every 500 batches
#    eval_every: 500 # test on the evaluation set every 500 batches
    report_every: 5 # report every 500 batches
    eval_every: 5 # test on the evaluation set every 500 batches
    eval_during: False # MODIFY this to True if you want to evaluate during the training process
    # how many times to augment data for 2D-to-3D lifting
    lft_aug: True
    lft_aug_times: 100
    # what evaluation metrics to use
    eval_metrics: ['RError3D']
    plot_loss: False # visualize the loss function during training 

## testing settings if used
testing_settings:
    apply_dropout: False
    unnormalize: True
    batch_size: 1024
    num_threads: 4
    shuffle: False
#    vis_epoch: 290 # start ploting after this epoch

## optimizer settings
optimizer:
    # for ADAM
    optim_type: 'adam'
    lr: 0.001
    weight_decay: 0.0
    # for SGD
    momentum: 0.9
    # learning rate will decay at each milestone epoch
    milestones: [50, 100, 150, 250]
    gamma: 0.5


================================================
FILE: docs/demo.md
================================================
Firstly you need to prepare the dataset and pre-trained models as described [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/preparation.md).

Then modify the directories by

```bash
cd ${EgoNet_DIR}/configs && vim KITTI_inference:demo.yml
```

Edit dirs:ckpt to your pre-trained model directory.

Edit dataset:root to your KITTI directory.

Finally, go to ${EgoNet_DIR}/tools and run

```bash
 python inference.py --cfg "../configs/KITTI_inference:demo.yml" --visualize True --batch_to_show 2
```

You can set --batch_to_show to other integers to see more results.

The visualized 3D bounding boxes are distinguished by their colors: 
1. Black indicates ground truth 3D boxes.
2. Magenta indicates 3D bounding boxes predicted by another 3D object detector ([D4LCN](https://github.com/dingmyu/D4LCN)).
3. Red indicates the predictions of Ego-Net, using the 2D bounding boxes from [D4LCN](https://github.com/dingmyu/D4LCN).
4. Yellow indicates the predictions of Ego-Net, using the ground truth 2D bounding boxes.


================================================
FILE: docs/inference.md
================================================
Firstly you need to prepare the dataset and pre-trained models as described [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/preparation.md).

## Reproduce D4LCN + EgoNet on the val split
You need to modify the directories by

```bash
cd ${EgoNet_DIR}/configs && vim KITTI_inference:demo.yml
```
Edit dirs:output to where you want to save the predictions.

Edit dirs:ckpt to your pre-trained model directory.

Edit dataset:root to your KITTI directory.

Finally, go to ${EgoNet_DIR}/tools and run

```bash
 python inference.py --cfg "../configs/KITTI_inference:demo.yml"
```

This will load D4LCN predictions, refine their vehicle orientation predictions and save the results.
The official evaluation program will automatically run to produce quantitative performance.

## Reproduce results on the test split
You need to modify the directories by

```bash
cd ${EgoNet_DIR}/configs && vim KITTI_inference:test_submission.yml
```
Edit dirs:output to where you want to save the predictions.

Edit dirs:ckpt to your pre-trained model directory.

Edit dataset:root to your KITTI directory.

Finally, go to ${EgoNet_DIR}/tools and run

```bash
 python inference.py --cfg "../configs/KITTI_inference:test_submission.yml"
```

This will load prepared 2D bounding boxes, predict the vehicle orientation and save the predictions.

Now you can zip the results and submit it to the [official server](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d)!

You can hit [91.23% AOS](http://www.cvlibs.net/datasets/kitti/eval_object_detail.php?&result=e5233225fd5ef36fa63eb00252d9c00024961f2c) for the moderate setting! This is the **most important** metric for joint vehicle detection and pose estimation on KITTI. You achieved this with a single RGB image without extra training data.


================================================
FILE: docs/preparation.md
================================================
## Data Preparation 
You need to download KITTI dataset [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). Download left images, calibration files and labels.
Download the split files [here](https://drive.google.com/drive/folders/1YLtptqspOFw08QG2MsxewDT9tjF2O45g?usp=sharing) and place them at ${YOUR_KITTI_DIR}/SPLIT/ImageSets.
Your data folder should look like this:

   ```
   ${YOUR_KITTI_DIR}
   ├── training
      ├── calib
          ├── xxxxxx.txt (Camera parameters for image xxxxxx)
      ├── image_2
          ├── xxxxxx.png (image xxxxxx)
      ├── label_2
          ├── xxxxxx.txt (object labels for image xxxxxx)
      ├── ImageSets
         ├── train.txt
         ├── val.txt   
         ├── trainval.txt        
   ├── testing
      ├── calib
          ├── xxxxxx.txt (Camera parameters for image xxxxxx)
      ├── image_2
          ├── xxxxxx.png (image xxxxxx)
      ├── ImageSets
         ├── test.txt
   ```

## Download pre-trained model
You need to download the pre-trained checkpoints [here](https://drive.google.com/file/d/1JsVzw7HMfchxOXoXgvWG1I_bPRD1ierE/view?usp=sharing) in order to use Ego-Net. Unzip it to ${YOUR_MODEL_DIR}.

## Compile the official evaluator
Go to the folder storing the source code
```bash
cd ${EgoNet_DIR}/tools/kitti-eval 
```
Compile the source code
```bash
g++ -o evaluate_object_3d_offline evaluate_object_3d_offline.cpp -O3
```

## Download the input bounding boxes
Download the [resources folder](https://drive.google.com/drive/folders/1atfXLmsLFG6XEtNnwZuEYLydKqjr7Icf?usp=sharing) and unzip its contents. Place the resource folder at ${EgoNet_DIR}/resources


## Environment
You need to create an environment that meets the following dependencies. 
The versions included in the parenthesis are **tested**. Other versions may also work but are **not tested**.

- Python (3.7.9)
- Numpy (1.19.2)
- PyTorch (1.6.0, GPU required)
- Scipy (1.5.2)
- Matplotlib (3.3.4)
- OpenCV (3.4.2)
- pyyaml (5.4.1)

For more details of my tested local environment, refer to [spec-list.txt](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/spec-list.txt). 
The recommended environment manager is [Anaconda](https://www.anaconda.com/), which can create an environment using this provided spec-list. 
For debugging using an IDE, I personally use and recommend Spyder 4.2 which you can get by
```bash
conda install spyder
```


================================================
FILE: docs/spec-list.txt
================================================
# This file may be used to create an environment using:
# $ conda create --name <env> --file <this file>
# platform: linux-64
@EXPLICIT
https://repo.anaconda.com/pkgs/main/linux-64/_libgcc_mutex-0.1-main.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/blas-1.0-mkl.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ca-certificates-2021.1.19-h06a4308_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ld_impl_linux-64-2.33.1-h53a641e_7.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libgfortran-ng-7.3.0-hdf63c60_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libstdcxx-ng-9.1.0-hdf63c60_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pandoc-2.12-h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/cudatoolkit-9.2-0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libgcc-ng-9.1.0-hdf63c60_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/bzip2-1.0.8-h7b6447c_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/expat-2.2.10-he6710b0_2.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/freeglut-3.0.0-hf484d3e_5.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/graphite2-1.3.14-h23475e2_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/icu-58.2-he6710b0_3.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/jpeg-9b-h024ee3a_2.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libffi-3.3-he6710b0_2.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libglu-9.0.0-hf484d3e_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libopus-1.3.1-h7b6447c_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libsodium-1.0.18-h7b6447c_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libspatialindex-1.9.3-h2531618_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libuuid-1.0.3-h1bed415_2.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libvpx-1.7.0-h439df22_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libwebp-base-1.2.0-h27cfd23_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libxcb-1.14-h7b6447c_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/lz4-c-1.9.3-h2531618_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ncurses-6.2-he6710b0_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/openssl-1.1.1k-h27cfd23_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pcre-8.44-he6710b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pixman-0.40.0-h7b6447c_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/xz-5.2.5-h7b6447c_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/yaml-0.2.5-h7b6447c_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/zlib-1.2.11-h7b6447c_3.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/glib-2.68.0-h36276a3_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/hdf5-1.10.2-hba1933b_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/jasper-2.0.14-h07fcdf6_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libpng-1.6.37-hbc83047_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libxml2-2.9.10-hb55368b_3.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/readline-8.1-h27cfd23_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/tk-8.6.10-hbc83047_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/zeromq-4.3.4-h2531618_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/zstd-1.4.5-h9ceee32_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/dbus-1.13.18-hb2f20db_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/freetype-2.10.4-h5ab3b9f_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/gstreamer-1.14.0-h28cd5cc_2.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libtiff-4.2.0-h85742a9_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/sqlite-3.35.2-hdfb4753_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ffmpeg-4.0-hcdf2ecd_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/fontconfig-2.13.1-h6c09931_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/gst-plugins-base-1.14.0-h8213a91_2.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/lcms2-2.11-h396b838_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/python-3.7.9-h7579374_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/alabaster-0.7.12-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/appdirs-1.4.4-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/argh-0.26.2-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/atomicwrites-1.4.0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/attrs-20.3.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/backcall-0.2.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/cairo-1.16.0-hf32fb01_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/certifi-2020.12.5-py37h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/chardet-4.0.0-py37h06a4308_1003.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/click-7.1.2-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/cloudpickle-1.6.0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/colorama-0.4.4-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/decorator-4.4.2-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/defusedxml-0.7.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/diff-match-patch-20200713-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/docutils-0.16-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/entrypoints-0.3-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/future-0.18.2-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/idna-2.10-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/imagesize-1.2.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/ipython_genutils-0.2.0-pyhd3eb1b0_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/jeepney-0.6.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/kiwisolver-1.3.1-py37h2531618_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/lazy-object-proxy-1.6.0-py37h27cfd23_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/markupsafe-1.1.1-py37h14c3975_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/mccabe-0.6.1-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/mistune-0.8.4-py37h14c3975_1001.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/mypy_extensions-0.4.3-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ninja-1.10.2-py37hff7bd54_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/olefile-0.46-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pandocfilters-1.4.3-py37h06a4308_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/parso-0.7.0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pathspec-0.7.0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pickleshare-0.7.5-pyhd3eb1b0_1003.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/psutil-5.8.0-py37h27cfd23_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/ptyprocess-0.7.0-pyhd3eb1b0_2.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pycodestyle-2.6.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pycparser-2.20-py_2.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pyflakes-2.2.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pyparsing-2.4.7-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pyrsistent-0.17.3-py37h7b6447c_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pysocks-1.7.1-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pytz-2021.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pyxdg-0.27-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pyyaml-5.4.1-py37h27cfd23_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pyzmq-20.0.0-py37h2531618_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/qdarkstyle-2.8.1-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/qt-5.9.7-h5867ecd_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/qtpy-1.9.0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/regex-2021.3.17-py37h27cfd23_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/rope-0.18.0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/rtree-0.9.4-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/sip-4.19.8-py37hf484d3e_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/six-1.15.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/snowballstemmer-2.1.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/sortedcontainers-2.3.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-applehelp-1.0.2-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-devhelp-1.0.2-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-htmlhelp-1.0.3-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-jsmath-1.0.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-qthelp-1.0.3-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-serializinghtml-1.1.4-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/testpath-0.4.4-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/textdistance-4.2.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/toml-0.10.2-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/tornado-6.1-py37h27cfd23_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/typed-ast-1.4.2-py37h27cfd23_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/typing_extensions-3.7.4.3-pyha847dfd_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ujson-4.0.2-py37h2531618_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/wcwidth-0.2.5-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/webencodings-0.5.1-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/wheel-0.36.2-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/wrapt-1.12.1-py37h7b6447c_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/wurlitzer-2.0.1-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/yapf-0.31.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/zipp-3.4.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/autopep8-1.5.6-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/babel-2.9.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/black-19.10b0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/cffi-1.14.5-py37h261ae71_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/cycler-0.10.0-py37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/harfbuzz-1.8.8-hffaf4a1_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/importlib-metadata-3.7.3-py37h06a4308_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/intervaltree-3.1.0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/jedi-0.17.2-py37h06a4308_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/packaging-20.9-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pexpect-4.8.0-pyhd3eb1b0_3.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pillow-8.1.2-py37he98fc37_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/prompt-toolkit-3.0.17-pyh06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pydocstyle-6.0.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pyqt-5.9.2-py37h05f1152_2.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/python-dateutil-2.8.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/python-jsonrpc-server-0.4.0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/qtawesome-1.0.2-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/setuptools-52.0.0-py37h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/three-merge-0.1.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/traitlets-5.0.5-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/watchdog-1.0.2-py37h06a4308_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/astroid-2.5-py37h06a4308_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/bleach-3.3.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/brotlipy-0.7.0-py37h27cfd23_1003.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/cryptography-3.4.6-py37hd23ed53_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/flake8-3.9.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/importlib_metadata-3.7.3-hd3eb1b0_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/isort-5.8.0-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/jinja2-2.11.3-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/jupyter_core-4.7.1-py37h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/libopencv-3.4.2-hb342d67_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pip-21.0.1-py37h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pygments-2.8.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ipython-7.21.0-py37hb070fc8_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/jsonschema-3.2.0-py_2.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/jupyter_client-6.1.7-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pluggy-0.13.1-py37h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/pylint-2.7.2-py37h06a4308_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pyopenssl-20.0.1-pyhd3eb1b0_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/secretstorage-3.3.1-py37h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/ipykernel-5.3.4-py37h5ca1d4c_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/keyring-22.3.0-py37h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/nbformat-5.1.2-pyhd3eb1b0_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/python-language-server-0.36.2-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/urllib3-1.26.4-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/nbconvert-5.6.1-py37_1.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pyls-black-0.4.6-hd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/pyls-spyder-0.3.2-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/qtconsole-5.0.3-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/requests-2.25.1-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/spyder-kernels-1.10.2-py37h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/sphinx-3.5.3-pyhd3eb1b0_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/numpydoc-1.1.0-pyhd3eb1b0_1.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/spyder-4.2.4-py37h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/noarch/imageio-2.9.0-py_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/intel-openmp-2020.2-254.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/mkl-2020.2-256.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/mkl-service-2.3.0-py37he8ac12f_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/matplotlib-3.3.4-py37h06a4308_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/matplotlib-base-3.3.4-py37h62a2d02_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/mkl_fft-1.3.0-py37h54f3939_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/mkl_random-1.1.1-py37h0573a6f_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/numpy-1.19.2-py37h54aff64_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/numpy-base-1.19.2-py37hfa32c7d_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/py-opencv-3.4.2-py37hb342d67_1.tar.bz2
https://conda.anaconda.org/pytorch/linux-64/pytorch-1.6.0-py3.7_cuda9.2.148_cudnn7.6.3_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/scipy-1.5.2-py37h0b6359f_0.tar.bz2
https://repo.anaconda.com/pkgs/main/linux-64/opencv-3.4.2-py37h6fd60c2_1.tar.bz2
https://conda.anaconda.org/pytorch/linux-64/torchvision-0.7.0-py37_cu92.tar.bz2


================================================
FILE: docs/training.md
================================================
Firstly you need to prepare the dataset as described [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/preparation.md).

Then download a start point model [here](https://drive.google.com/file/d/1VFtMGgBG0cLGnbr3brrnPnJii2xGYj-9/view?usp=sharing) and place it at ${EgoNet_DIR}/resources. 

The training phase consists of two stages which are described as follows. 

For training on other datasets. You need to prepare the training images and camera parameters accordingly.

## Stage 1: train a lifter (L.pth)
You need to modify the configuration by

```bash
cd ${EgoNet_DIR}/configs && vim KITTI_train_lifting.yml
```
Edit dataset:root to your KITTI directory.

(Optional) Edit dirs:output to where you want to save the output model.

(Optional) You can evaluate during training by setting eval_during to True.

Finally, run

```bash
 cd tools
 python train_lifting.py --cfg "../configs/KITTI_train_lifting.yml"
```


## Stage 2: train the remaining part (HC.pth)
You need to modify the configuration by

```bash
cd ${EgoNet_DIR}/configs && vim KITTI_train_IGRs.yml
```

Edit dataset:root to your KITTI directory.

Edit gpu_id according to your local machine and set batch_size based on how much GPU memory you have. 

(Optional) Edit dirs:output to where you want to save the output model.

(Optional) You can evaluate during training by setting eval_during to True.

(Optional) Edit ss to enable self-supervised representation learning. You need to prepare unlabeled ApolloScape images and download record [here](https://drive.google.com/file/d/1uPdOC7LioomMF5DieUNrx3aZKsgobP5U/view?usp=sharing).

(Optional) Edit training_settings:debug to disable saveing intermediate training results.

Finally, run

```bash
 cd tools
 python train_IGRs.py --cfg "../configs/KITTI_train_IGRs.yml"
```


================================================
FILE: libs/arguments/__init__.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Empty file.
"""


================================================
FILE: libs/arguments/parse.py
================================================
"""
Argument parser for command line inputs and experiment configuration file.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import yaml
import argparse

def read_yaml_file(path):
    """
    Read a .yml file.
    """
    try: 
        with open (path, 'r') as file:
            configs = yaml.safe_load(file)
    except Exception as e:
        print('Error reading the config file: ', e)
    return configs

def parse_args():
    """
    Read a .yml experiment configuration file whose path is provided by the user.
    
    You can add more arguments and modify configs accordingly.
    """
    parser = argparse.ArgumentParser(description='a general parser')
    # path to the configuration file
    parser.add_argument('--cfg',
                        help='experiment configuration file path',
                        type=str
                        )
    parser.add_argument('--visualize',
                        default=False,
                        type=bool
                        )    
    parser.add_argument('--batch_to_show',
                        default=1000000,
                        type=int
                        )    
    args, unknown = parser.parse_known_args()
    configs = read_yaml_file(args.cfg)   
    configs['config_path'] = args.cfg
    configs['visualize'] = args.visualize
    configs['batch_to_show'] = args.batch_to_show
    return configs

================================================
FILE: libs/common/__init__.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Empty file.
"""


================================================
FILE: libs/common/format.py
================================================
"""
Methods for formatted output.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""
import os

from copy import deepcopy

def format_str_submission(roll, pitch, yaw, x, y, z, score):
    """
    Get a prediction string in ApolloScape style.
    """      
    tempt_str = "{pitch:.3f} {yaw:.3f} {roll:.3f} {x:.3f} {y:.3f} {z:.3f} {score:.3f}".format(
            pitch=pitch,
            yaw=yaw,
            roll=roll,
            x=x,
            y=y,
            z=z,
            score=score)
    return tempt_str

def get_instance_str(dic):
    """
    Produce KITTI style prediction string for one instance.
    """     
    string = ""
    string += dic['class'] + " "
    string += "{:.1f} ".format(dic['truncation'])
    string += "{:.1f} ".format(dic['occlusion'])
    string += "{:.6f} ".format(dic['alpha'])
    string += "{:.6f} {:.6f} {:.6f} {:.6f} ".format(dic['bbox'][0], dic['bbox'][1], dic['bbox'][2], dic['bbox'][3])
    string += "{:.6f} {:.6f} {:.6f} ".format(dic['dimensions'][1], dic['dimensions'][2], dic['dimensions'][0])
    string += "{:.6f} {:.6f} {:.6f} ".format(dic['locations'][0], dic['locations'][1], dic['locations'][2])
    string += "{:.6f} ".format(dic['rot_y'])
    if 'score' in dic:
        string += "{:.8f} ".format(dic['score'])
    else:
        string += "{:.8f} ".format(1.0)
    return string

def get_pred_str(record):
    """
    Produce KITTI style prediction string for a record dictionary.
    """      
    # replace the rotation predictions of input bounding boxes
    updated_txt = deepcopy(record['raw_txt_format'])
    for instance_id in range(len(record['euler_angles'])):
        updated_txt[instance_id]['rot_y'] = record['euler_angles'][instance_id, 1]
        updated_txt[instance_id]['alpha'] = record['alphas'][instance_id]
    pred_str = ""
    angles = record['euler_angles']
    for instance_id in range(len(angles)):
        # format a string for submission
        tempt_str = get_instance_str(updated_txt[instance_id])
        if instance_id != len(angles) - 1:
            tempt_str += '\n'
        pred_str += tempt_str
    return pred_str

def save_txt_file(img_path, prediction, params):
    """
    Save a txt file for predictions of an image.
    """    
    if not params['flag']:
        return
    file_name = img_path.split('/')[-1][:-3] + 'txt'
    save_path = os.path.join(params['save_dir'], file_name) 
    with open(save_path, 'w') as f:
        f.write(prediction['pred_str'])
    print('Wrote prediction file at {:s}'.format(save_path))
    return

================================================
FILE: libs/common/img_proc.py
================================================
"""
Image processing utilities.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import cv2
import numpy as np
import torch
import torch.nn.functional as F
import os

SIZE = 200.0

def transform_preds(coords, center, scale, output_size):
    """
    Transform local coordinates within a patch to screen coordinates.
    """      
    target_coords = np.zeros(coords.shape)
    trans = get_affine_transform(center, scale, 0, output_size, inv=1)
    for p in range(coords.shape[0]):
        target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)
    return target_coords

def get_affine_transform(center, 
                         scale, 
                         rot, 
                         output_size,
                         shift=np.array([0, 0], dtype=np.float32), 
                         inv=0
                         ):
    """
    Estimate an affine transformation given crop parameters (center, scale and
    rotation) and output resolution.                                                        
    """  
    if isinstance(scale, list):
        scale = np.array(scale)
    if isinstance(center, list):
        center = np.array(center)
    scale_tmp = scale * SIZE
    src_w = scale_tmp[0]
    dst_h, dst_w = output_size

    rot_rad = np.pi * rot / 180
    src_dir = get_dir([0, src_w * -0.5], rot_rad)
    dst_dir = np.array([0, dst_w * -0.5], np.float32)

    src = np.zeros((3, 2), dtype=np.float32)
    dst = np.zeros((3, 2), dtype=np.float32)
    src[0, :] = center + scale_tmp * shift
    src[1, :] = center + src_dir + scale_tmp * shift
    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]
    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir

    src[2:, :] = get_3rd_point(src[0, :], src[1, :])
    dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :])

    if inv:
        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
    else:
        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))

    return trans

def affine_transform(pt, t):
    new_pt = np.array([pt[0], pt[1], 1.]).T
    new_pt = np.dot(t, new_pt)
    return new_pt[:2]

def affine_transform_modified(pts, t):
    """
    Apply affine transformation with homogeneous coordinates.                                                    
    """ 
    # pts of shape [n, 2]
    new_pts = np.hstack([pts, np.ones((len(pts), 1))]).T
    new_pts = t @ new_pts
    return new_pts[:2, :].T

def get_3rd_point(a, b):
    direct = a - b
    return b + np.array([-direct[1], direct[0]], dtype=np.float32)

def get_dir(src_point, rot_rad):
    sn, cs = np.sin(rot_rad), np.cos(rot_rad)

    src_result = [0, 0]
    src_result[0] = src_point[0] * cs - src_point[1] * sn
    src_result[1] = src_point[0] * sn + src_point[1] * cs

    return src_result

def crop(img, center, scale, output_size, rot=0):
    """
    A cropping function implemented as warping.                                                      
    """     
    trans = get_affine_transform(center, scale, rot, output_size)

    dst_img = cv2.warpAffine(img, 
                             trans, 
                             (int(output_size[0]), int(output_size[1])),
                             flags=cv2.INTER_LINEAR
                             )   

    return dst_img

def simple_crop(input_image, center, crop_size):
    """
    A simple cropping function without warping.
    """  
    assert len(input_image.shape) == 3, 'Unsupported image format.'
    channel = input_image.shape[2]
    # crop a rectangular region around the center in the image
    start_x = int(center[0] - crop_size[0])
    end_x = int(center[0] + crop_size[0]) 
    start_y = int(center[1] - crop_size[1])
    end_y = int(center[1] + crop_size[1])
    cropped = np.zeros((end_y - start_y, end_x - start_x, channel), 
                       dtype = input_image.dtype)
    # new bounding box index 
    new_start_x = max(-start_x, 0)
    new_end_x = min(input_image.shape[1], end_x) - start_x
    new_start_y = max(-start_y, 0)
    new_end_y = min(input_image.shape[0], end_y) - start_y
    # clamped old bounding box index
    old_start_x = max(start_x, 0)
    old_end_x = min(end_x, input_image.shape[1])
    old_start_y = max(start_y, 0)
    old_end_y = min(end_y, input_image.shape[0])
    try:
        cropped[new_start_y:new_end_y, new_start_x:new_end_x,:] = input_image[
            old_start_y:old_end_y, old_start_x:old_end_x,:]
    except ValueError:
        print('Error: cropping fails')
    return cropped

def np_random():
    """
    Return a random number sampled uniformly from [-1, 1]
    """
    return np.random.rand()*2 - 1

def jitter_bbox_with_kpts(old_bbox, joints, parameters):
    """
    Randomly shifting and resizeing a bounding box and mask out occluded joints.
    Used as data augmentation to improve robustness to detector noise.
    
    bbox: [x1, y1, x2, y2]
    joints: [N, 3]
    """
    new_joints = joints.copy()
    width, height = old_bbox[2] - old_bbox[0], old_bbox[3] - old_bbox[1]
    old_center = [0.5*(old_bbox[0] + old_bbox[2]), 
                  0.5*(old_bbox[1] + old_bbox[3])]
    horizontal_shift = parameters['shift'][0]*width*np_random()
    vertical_shift = parameters['shift'][1]*height*np_random()
    new_center = [old_center[0] + horizontal_shift,
                  old_center[1] + vertical_shift]
    horizontal_scaling = parameters['scaling'][0]*np_random() + 1
    vertical_scaling = parameters['scaling'][1]*np_random() + 1
    new_width = width*horizontal_scaling
    new_height = height*vertical_scaling
    new_bbox = [new_center[0] - 0.5*new_width, new_center[1] - 0.5*new_height,
                new_center[0] + 0.5*new_width, new_center[1] + 0.5*new_height]
    # predicate from upper left corner
    predicate1 = joints[:, :2] - np.array([[new_bbox[0], new_bbox[1]]])
    predicate1 = (predicate1 > 0.).prod(axis=1)
    # predicate from lower right corner
    predicate2 = joints[:, :2] - np.array([[new_bbox[2], new_bbox[3]]])
    predicate2 = (predicate2 < 0.).prod(axis=1)
    new_joints[:, 2] *= predicate1*predicate2
    return new_bbox, new_joints

def jitter_bbox_with_kpts_no_occlu(old_bbox, joints, parameters):
    """
    Similar to the function above, but does not produce occluded joints
    """
    width, height = old_bbox[2] - old_bbox[0], old_bbox[3] - old_bbox[1]
    old_center = [0.5 * (old_bbox[0] + old_bbox[2]), 
                  0.5 * (old_bbox[1] + old_bbox[3])]
    horizontal_scaling = parameters['scaling'][0] * np.random.rand() + 1
    vertical_scaling = parameters['scaling'][1] * np.random.rand() + 1
    horizontal_shift = 0.5 * (horizontal_scaling - 1) * width * np_random()
    vertical_shift = 0.5 * (vertical_scaling - 1) * height * np_random()
    new_center = [old_center[0] + horizontal_shift,
                  old_center[1] + vertical_shift]
    new_width = width * horizontal_scaling
    new_height = height * vertical_scaling
    new_bbox = [new_center[0] - 0.5 * new_width, new_center[1] - 0.5 * new_height,
                new_center[0] + 0.5 * new_width, new_center[1] + 0.5 * new_height]
    return new_bbox, joints

def generate_xy_map(bbox, resolution, global_size):
    """
    Generate the normalized coordinates as 2D maps which encodes location 
    information.
    
    bbox: [x1, y1, x2, y2] the local region
    resolution (height, width): target resolution
    global_size (height, width): the size of original image
    """
    map_width, map_height = resolution
    g_height, g_width = global_size
    x_start, x_end = 2*bbox[0]/g_width - 1, 2*bbox[2]/g_width - 1
    y_start, y_end = 2*bbox[1]/g_height - 1, 2*bbox[3]/g_height - 1
    x_map = np.tile(np.linspace(x_start, x_end, map_width), (map_height, 1))
    x_map = x_map.reshape(map_height, map_width, 1)
    y_map = np.linspace(y_start, y_end, map_height).reshape(map_height, 1)
    y_map = np.tile(y_map, (1, map_width))
    y_map = y_map.reshape(map_height, map_width, 1)
    return np.concatenate([x_map, y_map], axis=2)

def crop_single_instance(data_numpy, bbox, joints, parameters, pth_trans=None):
    """
    Crop an instance from an image given the bounding box and part coordinates.
    """
    reso = parameters['input_size'] # (height, width)
    transformed_joints = joints.copy()
    if parameters['jitter_bbox']:
        bbox, joints = jitter_bbox_with_kpts_no_occlu(bbox, 
                                                      joints,
                                                      parameters['jitter_params']
                                                      )
    joints_vis = joints[:, 2]
    if parameters['resize']:
        ret = resize_bbox(bbox[0], bbox[1], bbox[2], bbox[3], 
                          target_ar=reso[0]/reso[1])
        c, s = ret['c'], ret['s']
    else:
        c, s = bbox2cs(bbox)    
    trans = get_affine_transform(c, s, 0.0, reso)
    input = cv2.warpAffine(data_numpy,
                           trans,
                           (int(reso[1]), int(reso[0])),
                           flags=cv2.INTER_LINEAR
                           )
    # add two more channels to encode object location
    if parameters['add_xy']:
        xymap = generate_xy_map(ret['bbox'], reso, parameters['global_size'])
        input = np.concatenate([input, xymap.astype(np.float32)], axis=2)
    #cv2.imwrite('test.jpg', input)
    #input = torch.from_numpy(input.transpose(2,0,1))
    input = input if pth_trans is None else pth_trans(input)
    for i in range(len(joints)):
        if joints_vis[i] > 0.0:
            transformed_joints[i, 0:2] = affine_transform(joints[i, 0:2], trans)   
    c = c.reshape(1, 2)
    s = s.reshape(1, 2)
    return input.unsqueeze(0), transformed_joints, c, s

def get_tensor_from_img(path, 
                        parameters,
                        sf=0.2, 
                        rf=30., 
                        r_prob=0.6, 
                        aug=False, 
                        rgb=True, 
                        joints=None,
                        global_box=None,
                        pth_trans=None,
                        generate_hm=False,
                        max_cnt=None
                        ):
    """
    Read image and apply data augmentation to obtain a tensor. 
    Keypoints are also transformed if given.
    
    path: image path
    c: cropping center
    s: cropping scale
    r: rotation
    reso: resolution of output image
    sf: scaling factor
    rf: rotation factor
    aug: apply data augmentation
    joints: key-point locations with optional visibility [N_instance, N_joint, 3]
    generate_hm: whether to generate heatmap based on joint locations
    """
#    data_numpy = cv2.imread(
#        path, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION
#        )
    data_numpy = cv2.imread(
        path, 1 | 128
        )    
    if data_numpy is None:
        raise ValueError('Fail to read {}'.format(path))    
    if rgb:
        data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)
    all_inputs = []
    all_target = []
    all_centers = []
    all_scales = []
    all_target_weight = []
    # the dimension of the image
    parameters['global_size'] = data_numpy.shape[:-1]
    all_transformed_joints = []
    if parameters['reference'] == 'bbox':
        # crop around the given bounding boxes
        # bbox = [0, 0, data_numpy.shape[1] - 1, data_numpy.shape[0] - 1] \
        #     if 'bbox' not in parameters else parameters['bbox']
        bboxes = parameters['boxes'] # [N_instance, 4]
        for idx, bbox in enumerate(bboxes):
            input, transformed_joints, c, s = crop_single_instance(data_numpy,
                                                                   bbox,
                                                                   joints[idx],
                                                                   parameters,
                                                                   pth_trans
                                                                   )
            all_inputs.append(input)
            all_centers.append(c)
            all_scales.append(s)
        # s = s * np.clip(np.random.randn() * sf + 1, 1 - sf, 1 + sf)
        # r = np.clip(np.random.randn() * rf, -rf, rf) if np.random.rand() <= r_prob else 0
            target = target_weight = 1.
            if generate_hm:
                target, target_weight = generate_target(transformed_joints, 
                                                        transformed_joints[:,2], 
                                                        parameters)
                target = torch.unsqueeze(torch.from_numpy(target), 0)
                target_weight = torch.unsqueeze(torch.from_numpy(target_weight), 0)
            all_target.append(target)
            all_target_weight.append(target_weight)
            all_transformed_joints.append(np.expand_dims(transformed_joints,0))
    all_transformed_joints = np.concatenate(all_transformed_joints)
    if max_cnt is not None and max_cnt < len(all_inputs):
        end = max_cnt
    else:
        end = len(all_inputs)
    end_indices = list(range(end))
    meta = {
        'path': path,
        'original_joints': joints[end_indices],
        'transformed_joints': all_transformed_joints[end_indices],
        'center': np.vstack(all_centers[:end]),
        'scale': np.vstack(all_scales[:end]),
        'joints_vis': all_transformed_joints[end_indices][:,:,2]
        # 'rotation': r,
    }
    inputs = torch.cat(all_inputs[:end], dim=0)
    if generate_hm:
        targets = torch.cat(all_target[:end], dim=0)
        target_weights = torch.cat(all_target_weight[:end], dim=0)
    else:
        targets, target_weights = None, None
    return inputs, targets, target_weights, meta

def generate_target(joints, joints_vis, parameters):
    """
    Generate heatmap targets by drawing Gaussian dots.
    
    joints:  [num_joints, 3]
    joints_vis: [num_joints]
    
    return: target, target_weight (1: visible, 0: invisible)
    """
    num_joints = parameters['num_joints']
    target_type = parameters['target_type']
    input_size = parameters['input_size']
    heatmap_size = parameters['heatmap_size']
    sigma = parameters['sigma']
    target_weight = np.ones((num_joints, 1), dtype=np.float32)
    target_weight[:, 0] = joints_vis

    
    assert target_type == 'gaussian', 'Only support gaussian map now!'

    if target_type == 'gaussian':
        target = np.zeros((num_joints, heatmap_size[0], heatmap_size[1]), 
                          dtype=np.float32)

        tmp_size = sigma * 3

        for joint_id in range(num_joints):
            if target_weight[joint_id] <= 0.5:
                continue
            feat_stride = input_size / heatmap_size
            mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5)
            mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5)
            # Check that any part of the gaussian is in-bounds
            ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)]
            br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)]
            if ul[0] >= heatmap_size[1] or ul[1] >= heatmap_size[0] \
                    or br[0] < 0 or br[1] < 0:
                # If not, just return the image as is
                target_weight[joint_id] = 0
                continue

            # # Generate gaussian
            size = 2 * tmp_size + 1
            x = np.arange(0, size, 1, np.float32)
            y = x[:, np.newaxis]
            x0 = y0 = size // 2
            # The gaussian is not normalized, we want the center value to equal 1
            g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))

            # Usable gaussian range
            g_x = max(0, -ul[0]), min(br[0], heatmap_size[1]) - ul[0]
            g_y = max(0, -ul[1]), min(br[1], heatmap_size[0]) - ul[1]
            # Image range
            img_x = max(0, ul[0]), min(br[0], heatmap_size[1])
            img_y = max(0, ul[1]), min(br[1], heatmap_size[0])

            target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = \
                g[g_y[0]:g_y[1], g_x[0]:g_x[1]]

    if parameters['use_different_joints_weight']:
        target_weight = np.multiply(target_weight, parameters['joints_weight'])

    return target, target_weight

def resize_bbox(left, top, right, bottom, target_ar=1.):
    """
    Resize a bounding box to pre-defined aspect ratio.
    """ 
    width = right - left
    height = bottom - top
    aspect_ratio = height/width
    center_x = (left + right)/2
    center_y = (top + bottom)/2
    if aspect_ratio > target_ar:
        new_width = height*(1/target_ar)
        new_left = center_x - 0.5*new_width
        new_right = center_x + 0.5*new_width
        new_top = top
        new_bottom = bottom        
    else:
        new_height = width*target_ar
        new_left = left
        new_right = right
        new_top = center_y - 0.5*new_height
        new_bottom = center_y + 0.5*new_height
    return {'bbox': [new_left, new_top, new_right, new_bottom],
            'c': np.array([center_x, center_y]),
            's': np.array([(new_right - new_left)/SIZE, (new_bottom - new_top)/SIZE])
            }

def enlarge_bbox(left, top, right, bottom, enlarge):
    """
    Enlarge a bounding box.
    """ 
    width = right - left
    height = bottom - top
    new_width = width * enlarge[0]
    new_height = height * enlarge[1]
    center_x = (left + right) / 2
    center_y = (top + bottom) / 2    
    new_left = center_x - 0.5 * new_width
    new_right = center_x + 0.5 * new_width
    new_top = center_y - 0.5 * new_height
    new_bottom = center_y + 0.5 * new_height
    return [new_left, new_top, new_right, new_bottom]

def modify_bbox(bbox, target_ar, enlarge=1.1):
    """
    Modify a bounding box by enlarging/resizing.
    """
    lbbox = enlarge_bbox(bbox[0], bbox[1], bbox[2], bbox[3], [enlarge, enlarge])
    ret = resize_bbox(lbbox[0], lbbox[1], lbbox[2], lbbox[3], target_ar=target_ar)
    return ret
    
def resize_crop(crop_size, target_ar=None):
    """
    Resize a crop size to a pre-defined aspect ratio.
    """    
    if target_ar is None:
        return crop_size
    width = crop_size[0]
    height = crop_size[1]
    aspect_ratio = height / width    
    if aspect_ratio > target_ar:
        new_width = height * (1 / target_ar)
        new_height = height
    else:
        new_height = width*target_ar
        new_width = width
    return [new_width, new_height]

def bbox2cs(bbox):
    """
    Convert bounding box annotation to center and scale.
    """  
    return [(bbox[0] + bbox[2]/2), (bbox[1] + bbox[3]/2)], \
        [(bbox[2] - bbox[0]/SIZE), (bbox[3] - bbox[1]/SIZE)]

def cs2bbox(center, size):
    """
    Convert center/scale to a bounding box annotation.
    """  
    x1 = center[0] - size[0]
    y1 = center[1] - size[1]
    x2 = center[0] + size[0]
    y2 = center[1] + size[1]
    return [x1, y1, x2, y2]

def kpts2cs(keypoints, 
            enlarge=1.1, 
            method='boundary', 
            target_ar=None, 
            use_visibility=True
            ):
    """
    Convert instance screen coordinates to cropping center and size
    
    keypoints of shape [n_joints, 2/3]
    """   
    assert keypoints.shape[1] in [2, 3], 'Unsupported input.'
    if keypoints.shape[1] == 2:
        visible_keypoints = keypoints
        vis_rate = 1.0
    elif keypoints.shape[1] == 3 and use_visibility:
        visible_indices = keypoints[:, 2].nonzero()[0]
        visible_keypoints = keypoints[visible_indices, :2]
        vis_rate = len(visible_keypoints)/len(keypoints)
    else:
        visible_keypoints = keypoints[:, :2]
        visible_indices = np.array(range(len(keypoints)))
        vis_rate = 1.0
    if method == 'centroid':
        center = np.ceil(visible_keypoints.mean(axis=0, keepdims=True))
        dif = np.abs(visible_keypoints - center).max(axis=0, keepdims=True)
        crop_size = np.ceil(dif*enlarge).squeeze()
        center = center.squeeze()
    elif method == 'boundary':
        left_top = visible_keypoints.min(axis=0, keepdims=True)
        right_bottom = visible_keypoints.max(axis=0, keepdims=True)
        center = ((left_top + right_bottom) / 2).squeeze()
        crop_size = ((right_bottom - left_top)*enlarge/2).squeeze()
    else:
        raise NotImplementedError
    # resize the bounding box to a specified aspect ratio
    crop_size = resize_crop(crop_size, target_ar)
    x1, y1, x2, y2 = cs2bbox(center, crop_size)

    new_origin = np.array([[x1, y1]], dtype=keypoints.dtype)
    new_keypoints = keypoints.copy()
    if keypoints.shape[1] == 2:
        new_keypoints = visible_keypoints - new_origin
    elif keypoints.shape[1] == 3: 
        new_keypoints[visible_indices, :2] = visible_keypoints - new_origin
    return center, crop_size, new_keypoints, vis_rate

def draw_bboxes(img_path, bboxes_dict, save_path=None):
    """
    Draw bounding boxes with OpenCV.
    """
    data_numpy = cv2.imread(img_path, 1 | 128)  
    for name, (color, bboxes) in bboxes_dict.items():
        for bbox in bboxes:
            start_point = (bbox[0], bbox[1])
            end_point = (bbox[2], bbox[3])
            cv2.rectangle(data_numpy, start_point, end_point, color, 2)
    if save_path is not None:
        cv2.imwrite(save_path, data_numpy)
    return data_numpy

def imread_rgb(img_path):
    """
    Read image with OpenCV.
    """    
    data_numpy = cv2.imread(img_path, 1 | 128)  
    data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)
    return data_numpy

def save_cropped_patches(img_path, 
                         keypoints, 
                         save_dir="./", 
                         threshold=0.25,
                         enlarge=1.4, 
                         target_ar=None
                         ):
    """
    Crop instances from a image given part screen coordinates and save them.
    """
#    data_numpy = cv2.imread(
#        img_path, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION
#        )   
    data_numpy = cv2.imread(img_path, 1 | 128)  
    # data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)
    # debug
    # import matplotlib.pyplot as plt
    # plt.imshow(data_numpy[:,:,::-1])
    # plt.plot(keypoints[0][:,0], keypoints[0][:,1], 'ro')
    # plt.pause(0.1)
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)
    new_paths = []
    all_new_keypoints = []
    all_bbox = []
    for i in range(len(keypoints)):
        center, crop_size, new_keypoints, vis_rate = kpts2cs(keypoints[i], 
                                                             enlarge, 
                                                             target_ar=target_ar)
        all_bbox.append(list(map(int, cs2bbox(center, crop_size))))
        if vis_rate < threshold:
            continue
        all_new_keypoints.append(new_keypoints.reshape(1, keypoints.shape[1], -1))
        cropped = simple_crop(data_numpy, center, crop_size)
        save_path = os.path.join(save_dir, "instance_{:d}.jpg".format(i))
        new_paths.append(save_path)
        cv2.imwrite(save_path, cropped)
        del cropped
    if len(new_paths) == 0:
        # No instances cropped
        return new_paths, np.zeros((0, keypoints.shape[1], 3)), all_bbox
    else:
        return new_paths, np.concatenate(all_new_keypoints, axis=0), all_bbox
    
def get_max_preds(batch_heatmaps):
    """
    Get predictions from heatmaps with hard arg-max.
    
    batch_heatmaps: numpy.ndarray([batch_size, num_joints, height, width])
    """
    assert isinstance(batch_heatmaps, np.ndarray), \
        'batch_heatmaps should be numpy.ndarray'
    assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim'

    batch_size = batch_heatmaps.shape[0]
    num_joints = batch_heatmaps.shape[1]
    width = batch_heatmaps.shape[3]
    heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1))
    idx = np.argmax(heatmaps_reshaped, 2)
    maxvals = np.amax(heatmaps_reshaped, 2)

    maxvals = maxvals.reshape((batch_size, num_joints, 1))
    idx = idx.reshape((batch_size, num_joints, 1))

    preds = np.tile(idx, (1, 1, 2)).astype(np.float32)

    preds[:, :, 0] = (preds[:, :, 0]) % width
    preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)

    pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
    pred_mask = pred_mask.astype(np.float32)

    preds *= pred_mask
    return preds, maxvals

def soft_arg_max_np(batch_heatmaps):
    """
    Soft-argmax instead of hard-argmax considering quantization errors.
    """
    assert isinstance(batch_heatmaps, np.ndarray), \
        'batch_heatmaps should be numpy.ndarray'
    assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim'
    batch_size = batch_heatmaps.shape[0]
    num_joints = batch_heatmaps.shape[1]
    height = batch_heatmaps.shape[2]
    width = batch_heatmaps.shape[3]
    # get score/confidence for each joint
    heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1))
    maxvals = np.amax(heatmaps_reshaped, 2)
    maxvals = maxvals.reshape((batch_size, num_joints, 1))    
    # normalize the heatmaps so that they sum to 1
    #assert batch_heatmaps.min() >= 0.0
    batch_heatmaps = np.clip(batch_heatmaps, a_min=0.0, a_max=None)
    temp_sum = heatmaps_reshaped.sum(axis = 2, keepdims=True)
    heatmaps_reshaped /= temp_sum
    ## another normalization method: softmax
    # spatial soft-max
    #heatmaps_reshaped = softmax(heatmaps_reshaped, axis=2)
    ##
    batch_heatmaps = heatmaps_reshaped.reshape(batch_size, num_joints, height, width)
    x = batch_heatmaps.sum(axis = 2)
    y = batch_heatmaps.sum(axis = 3)
    x_indices = np.arange(width).astype(np.float32).reshape(1,1,width)
    y_indices = np.arange(height).astype(np.float32).reshape(1,1,height)
    x *= x_indices
    y *= y_indices
    x = x.sum(axis = 2, keepdims=True)
    y = y.sum(axis = 2, keepdims=True)
    preds = np.concatenate([x, y], axis=2)
    pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))
    pred_mask = pred_mask.astype(np.float32)
    preds *= pred_mask
    return preds, maxvals

def soft_arg_max(batch_heatmaps):
    """
    A pytorch version of soft-argmax
    """
    assert len(batch_heatmaps.shape) == 4, 'batch_images should be 4-ndim'
    batch_size = batch_heatmaps.shape[0]
    num_joints = batch_heatmaps.shape[1]
    height = batch_heatmaps.shape[2]
    width = batch_heatmaps.shape[3]
    heatmaps_reshaped = batch_heatmaps.view((batch_size, num_joints, -1))
    # get score/confidence for each joint    
    maxvals = heatmaps_reshaped.max(dim=2)[0]
    maxvals = maxvals.view((batch_size, num_joints, 1))       
    # normalize the heatmaps so that they sum to 1
    heatmaps_reshaped = F.softmax(heatmaps_reshaped, dim=2)
    batch_heatmaps = heatmaps_reshaped.view(batch_size, num_joints, height, width)
    x = batch_heatmaps.sum(dim = 2)
    y = batch_heatmaps.sum(dim = 3)
    x_indices = torch.arange(width).type(torch.cuda.FloatTensor)
    x_indices = torch.cuda.comm.broadcast(x_indices, devices=[x.device.index])[0]
    x_indices = x_indices.view(1, 1, width)
    y_indices = torch.arange(height).type(torch.cuda.FloatTensor)
    y_indices = torch.cuda.comm.broadcast(y_indices, devices=[y.device.index])[0]
    y_indices = y_indices.view(1, 1, height)    
    x *= x_indices
    y *= y_indices
    x = x.sum(dim = 2, keepdim=True)
    y = y.sum(dim = 2, keepdim=True)
    preds = torch.cat([x, y], dim=2)
    return preds, maxvals

def appro_cr(coordinates):
    """
    Approximate the square of cross-ratio along four ordered 2D points using 
    inner-product
    
    coordinates: PyTorch tensor of shape [4, 2]
    """
    AC = coordinates[2] - coordinates[0]
    BD = coordinates[3] - coordinates[1]
    BC = coordinates[2] - coordinates[1]
    AD = coordinates[3] - coordinates[0]
    return (AC.dot(AC) * BD.dot(BD)) / (BC.dot(BC) * AD.dot(AD))

def to_npy(tensor):
    """
    Convert PyTorch tensor to numpy array.
    """
    if isinstance(tensor, np.ndarray):
        return tensor
    else:
        return tensor.data.cpu().numpy()

================================================
FILE: libs/common/transformation.py
================================================
"""
Coordinate transformation functions.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import numpy as np
import cv2

def move_to(points, xyz=np.zeros((1,3))):
    # points of shape [n_points, 3]
    centroid = points.mean(axis=0, keepdims=True)
    return points - (centroid - xyz)

def world_to_camera_frame(P, R, T):
    """
    Convert points from world to camera coordinates
    
    P: Nx3 3d points in world coordinates
    R: 3x3 Camera rotation matrix
    T: 3x1 Camera translation parameters
    
    Returns
    X_cam: Nx3 3d points in camera coordinates
    """
    assert len(P.shape) == 2
    assert P.shape[1] == 3
    X_cam = R.dot( P.T - T ) # rotate and translate
    return X_cam.T

def camera_to_world_frame(P, R, T):
    """
    Inverse of world_to_camera_frame

    P: Nx3 points in camera coordinates
    R: 3x3 Camera rotation matrix
    T: 3x1 Camera translation parameters
    
    Returns
    X_cam: Nx3 points in world coordinates
    """
    assert len(P.shape) == 2
    assert P.shape[1] == 3
    X_cam = R.T.dot( P.T ) + T # rotate and translate
    return X_cam.T

def compute_similarity_transform(X, Y, compute_optimal_scale=False):
    """
    A port of MATLAB's `procrustes` function to Numpy.
    Adapted from http://stackoverflow.com/a/18927641/1884420
    
    Args
      X: array NxM of targets, with N number of points and M point dimensionality
      Y: array NxM of inputs
      compute_optimal_scale: whether we compute optimal scale or force it to be 1
    
    Returns:
      d: squared error after transformation
      Z: transformed Y
      T: computed rotation
      b: scaling
      c: translation
    """
    muX = X.mean(0)
    muY = Y.mean(0)
    X0 = X - muX
    Y0 = Y - muY
    ssX = (X0**2.).sum()
    ssY = (Y0**2.).sum()
    # centred Frobenius norm
    normX = np.sqrt(ssX)
    normY = np.sqrt(ssY)
    # scale to equal (unit) norm
    X0 = X0 / normX
    Y0 = Y0 / normY
    # optimum rotation matrix of Y
    A = np.dot(X0.T, Y0)
    U,s,Vt = np.linalg.svd(A,full_matrices=False)
    V = Vt.T
    T = np.dot(V, U.T)
    # Make sure we have a rotation
    detT = np.linalg.det(T)
    V[:,-1] *= np.sign( detT )
    s[-1]   *= np.sign( detT )
    T = np.dot(V, U.T)
    traceTA = s.sum()
    if compute_optimal_scale:  # Compute optimum scaling of Y.
        b = traceTA * normX / normY
        d = 1 - traceTA**2
        Z = normX*traceTA*np.dot(Y0, T) + muX
    else:  # If no scaling allowed
        b = 1
        d = 1 + ssY/ssX - 2 * traceTA * normY / normX
        Z = normY*np.dot(Y0, T) + muX
    c = muX - b*np.dot(muY, T)
    return d, Z, T, b, c

def compute_rigid_transform(X, Y, W=None, verbose=False):
    """
    A least-sqaure estimate of rigid transformation by SVD.
    
    Reference: https://content.sakai.rutgers.edu/access/content/group/
    7bee3f05-9013-4fc2-8743-3c5078742791/material/svd_ls_rotation.pdf
    
    X, Y: [d, N] N data points of dimention d
    W: [N, ] optional weight (importance) matrix for N data points
    """    
    assert len(X) == len(Y)
    assert (W is None) or (len(W.shape) in [1, 2])
    # find mean column wise
    centroid_X = np.mean(X, axis=1, keepdims=True)
    centroid_Y = np.mean(Y, axis=1, keepdims=True)
    # subtract mean
    Xm = X - centroid_X
    Ym = Y - centroid_Y
    if W is None:
        H = Xm @ Ym.T
    else:
        W = np.diag(W) if len(W.shape) == 1 else W
        H = Xm @ W @ Ym.T
    # find rotation
    U, S, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:
        # special reflection case
        if verbose:
            print("det(R) < R, reflection detected!, correcting for it ...\n");
        # the global minimizer with a orthogonal transformation is not possible
        # the next best transformation is chosen
        Vt[-1,:] *= -1
        R = Vt.T @ U.T
    t = -R @ centroid_X + centroid_Y
    return R, t

def procrustes_transform(X, Y):
    """
    Compute a rigid transformation trans() from X to Y and return trans(X)
    """
    R, t = compute_rigid_transform(X, Y)
    return R @ X + t

def pnp_refine(prediction, observation, intrinsics, dist_coeffs):
    """
    Refine 3D prediction with observed image projection based on  the PnP algorithm.
    """
    (success, R, T) = cv2.solvePnP(prediction,
                                   observation,
                                   intrinsics,
                                   dist_coeffs,
                                   flags=cv2.SOLVEPNP_ITERATIVE)
    if not success:
        print('PnP failed.')
        return prediction
    else:
        refined_prediction = cv2.Rodrigues(R)[0] @ prediction.T + T    
        return refined_prediction

================================================
FILE: libs/common/utils.py
================================================
"""
Common utilities.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import torch
import torch.nn as nn
import numpy as np

from libs.metric.criterions import PCK_THRES

import os
from os.path import join as pjoin
from collections import namedtuple

def make_dir(name):
    """    
    Create a directory.
    """
    if not os.path.exists(os.path.dirname(name)):
        try:
            os.makedirs(os.path.dirname(name))
        except OSError as exc:
            print('make_dir failed.')
            raise exc
    return

def save_checkpoint(states, is_best, output_dir, filename='checkpoint.pth'):
    torch.save(states, pjoin(output_dir, filename))
    if is_best and 'state_dict' in states:
        torch.save(states['best_state_dict'], pjoin(output_dir, 'model_best.pth'))

def get_model_summary(model, *input_tensors, item_length=26, verbose=False):
    """
    Summarize a model. For now only convolution, batch normalization and 
    linear layers are considered for parameters and FLOPs.
    """
    summary = []
    ModuleDetails = namedtuple(
        "Layer", 
        ["name", "input_size", "output_size", "num_parameters", "multiply_adds"]
        )
    hooks = []
    layer_instances = {}

    def hook(module, input, output):
        class_name = str(module.__class__.__name__)
        instance_index = 1
        if class_name not in layer_instances:
            layer_instances[class_name] = instance_index
        else:
            instance_index = layer_instances[class_name] + 1
            layer_instances[class_name] = instance_index
    
        layer_name = class_name + "_" + str(instance_index)
    
        params = 0
    
        if class_name.find("Conv") != -1 or class_name.find("BatchNorm") != -1 or \
           class_name.find("Linear") != -1:
            for param_ in module.parameters():
                params += param_.view(-1).size(0)
    
        flops = "Not Available"
        if class_name.find("Conv") != -1 and hasattr(module, "weight"):
            flops = (
                torch.prod(
                    torch.LongTensor(list(module.weight.data.size()))) *
                torch.prod(
                    torch.LongTensor(list(output.size())[2:]))).item()
        elif isinstance(module, nn.Linear):
            flops = (torch.prod(torch.LongTensor(list(output.size()))) \
                     * input[0].size(1)).item()
    
        if isinstance(input[0], list):
            input = input[0]
        if isinstance(output, list):
            output = output[0]
    
        summary.append(
            ModuleDetails(
                name=layer_name,
                input_size=list(input[0].size()),
                output_size=list(output.size()),
                num_parameters=params,
                multiply_adds=flops)
        )

    def add_hooks(module):
        if not isinstance(module, nn.ModuleList) \
           and not isinstance(module, nn.Sequential) \
           and module != model:
            hooks.append(module.register_forward_hook(hook))

    model.eval()
    model.apply(add_hooks)

    space_len = item_length

    model(*input_tensors)
    for h in hooks:
        h.remove()

    details = ''
    if verbose:
        details = "Model Summary" + \
            os.linesep + \
            "Name{}Input Size{}Output Size{}Parameters{}Multiply Adds (Flops){}".format(
                ' ' * (space_len - len("Name")),
                ' ' * (space_len - len("Input Size")),
                ' ' * (space_len - len("Output Size")),
                ' ' * (space_len - len("Parameters")),
                ' ' * (space_len - len("Multiply Adds (Flops)"))) \
                + os.linesep + '-' * space_len * 5 + os.linesep

    params_sum = 0
    flops_sum = 0
    for layer in summary:
        params_sum += layer.num_parameters
        if layer.multiply_adds != "Not Available":
            flops_sum += layer.multiply_adds
        if verbose:
            details += "{}{}{}{}{}{}{}{}{}{}".format(
                layer.name,
                ' ' * (space_len - len(layer.name)),
                layer.input_size,
                ' ' * (space_len - len(str(layer.input_size))),
                layer.output_size,
                ' ' * (space_len - len(str(layer.output_size))),
                layer.num_parameters,
                ' ' * (space_len - len(str(layer.num_parameters))),
                layer.multiply_adds,
                ' ' * (space_len - len(str(layer.multiply_adds)))) \
                + os.linesep + '-' * space_len * 5 + os.linesep

    details += os.linesep \
        + "Total Parameters: {:,}".format(params_sum) \
        + os.linesep + '-' * space_len * 5 + os.linesep
    details += "Total Multiply Adds (For Convolution and Linear Layers only): {:,} GFLOPs".format(flops_sum/(1024**3)) \
        + os.linesep + '-' * space_len * 5 + os.linesep
    details += "Number of Layers" + os.linesep
    for layer in layer_instances:
        details += "{} : {} layers   ".format(layer, layer_instances[layer])

    return details

class AverageMeter(object):
    """
    An averaege meter object that computes and stores the average and current value.
    """
    def __init__(self):
        self.reset()
        self.PCK_stats = {}
        
    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0
        return
    
    def update(self, val, n=1, others=None):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count if self.count != 0 else 0
        if others is not None and'correct_cnt' in others:
            if 'sum' not in self.PCK_stats:
                self.PCK_stats['sum'] = np.zeros(len(others['correct_cnt'])) 
            self.PCK_stats['sum'] += others['correct_cnt']
            if 'total' not in self.PCK_stats:
                self.PCK_stats['total'] = 0.
            self.PCK_stats['total'] += n         
        return
    
    def print_content(self):
        if 'sum' in self.PCK_stats:
            for idx, value in enumerate(self.PCK_stats['sum']):
                PCK = value / self.PCK_stats['total']
                print('Average PCK at threshold {:.2f}: {:.3f}'.format(PCK_THRES[idx], PCK))
        return

================================================
FILE: libs/dataset/KITTI/__init__.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Empty file.
"""


================================================
FILE: libs/dataset/KITTI/car_instance.py
================================================
"""
KITTI dataset implemented as PyTorch dataset object.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import libs.dataset.basic.basic_classes as bc
import libs.visualization.points as vp
import libs.common.img_proc as lip

from libs.common.utils import make_dir
from libs.common.img_proc import get_affine_transform

import numpy as np
import matplotlib.pyplot as plt
import torch
import torchvision.transforms as transforms
import cv2
import csv
import copy

from PIL import Image
from mpl_toolkits.mplot3d import Axes3D
from torch.utils.data.dataloader import default_collate
from os.path import join as pjoin
from os.path import sep as osep
from os.path import exists
from os import listdir 

# maximum number of instances to the network depending on your GPU memory
MAX_INS_CNT = 140
#MAX_INS_CNT = 64

TYPE_ID_CONVERSION = {
    'Car': 0,
    'Cyclist': 1,
    'Pedestrian': 2,
}

# annotation style of KITTI dataset
FIELDNAMES = ['type', 
              'truncated', 
              'occluded', 
              'alpha', 
              'xmin', 
              'ymin', 
              'xmax', 
              'ymax', 
              'dh', 
              'dw',
              'dl', 
              'lx', 
              'ly', 
              'lz', 
              'ry']

# the format of prediction has one more field: confidence score
FIELDNAMES_P = FIELDNAMES.copy() + ['score']

# indices used for performing interpolation
# key->value: style->index arrays
interp_dict = {
    'bbox12':(np.array([1,3,5,7,# h direction
                        1,2,3,4,# l direction
                        1,2,5,6]), # w direction
              np.array([2,4,6,8,
                        5,6,7,8,
                        3,4,7,8])
              ),
    'bbox12l':(np.array([1,2,3,4,]), # w direction
              np.array([5,6,7,8])
              ),
    'bbox12h':(np.array([1,3,5,7]), # w direction
              np.array([2,4,6,8])
              ),
    'bbox12w':(np.array([1,2,5,6]), # w direction
              np.array([3,4,7,8])
              ),    
    }

# indices used for computing the cross ratio
cr_indices_dict = {
    'bbox12':np.array([[ 1,  9, 21,  2],
                       [ 3, 10, 22,  4],
                       [ 5, 11, 23,  6],
                       [ 7, 12, 24,  8],
                       [ 1, 13, 25,  5],
                       [ 2, 14, 26,  6],
                       [ 3, 15, 27,  7],
                       [ 4, 16, 28,  8],
                       [ 1, 17, 29,  3],
                       [ 2, 18, 30,  4],
                       [ 5, 19, 31,  7],
                       [ 6, 20, 32,  8]]
                      )
    }

def get_cr_indices():
    """
    Helper function to define the indices used in computing the cross-ratio.
    """
    num_base_pts = 9
    num_lines = 12
    parents, children = interp_dict['bbox12']
    cr_indices = []
    for line_idx in range(num_lines):
        parent_idx = parents[line_idx] # first point
        child_idx = children[line_idx] # last point
        second_point_idx = num_base_pts + line_idx
        third_point_idx = num_base_pts + num_lines + line_idx
        cr_indices.append(np.array([parent_idx, 
                                   second_point_idx, 
                                   third_point_idx,
                                   child_idx]
                                  ).reshape(1,4)
                         )
    cr_indices = np.vstack(cr_indices)
    return cr_indices

class KITTI(bc.SupervisedDataset):
    """
    KITTI dataset.
    """    
    def __init__(self, cfgs, split, logger, scale=1.0):
        super().__init__(cfgs, split, logger)
        self.logger = logger
        self.logger.info("Initializing KITTI {:s} set, please wait...".format(split))
        self.exp_type = cfgs['exp_type'] # exp_type: experiment type 
        self._data_dir = cfgs['dataset']['root'] # root directory
        self._classes = cfgs['dataset']['detect_classes'] # used object classes
        self._get_data_parameters(cfgs) # initialize hyper-parameters
        self._set_paths() # initialize paths
        self._inference_mode = False 
        self.car_sizes = [] # dimension of cars
        self._load_image_list()
        if self.split in ['train', 'valid', 'trainvalid'] and \
            self.exp_type in ['instanceto2d', 'baselinealpha', 'baselinetheta']:
            # prepare local coordinates used in certain types of experiments
            self._prepare_key_points(cfgs)
            # save cropped car instances for debugging
            # cropped_path = pjoin(self._data_config['cropped_dir'], self.kpts_style,
            #                      self.split)
            # if not exists(cropped_path) and cfgs['dataset']['pre-process']:
            #     self._save_cropped_instances()            
        # prepare data used for future loading
        self.generate_pairs()
        # self.visualize()
        if self.split in ['train', 'trainvalid'] and self.exp_type in ['2dto3d']:
            # 2dto3d means the data is used by the lifter that predicts 3D 
            # cuboid based on 2D screen coordinates 
            self.normalize() # data normalization used for the lifter network
        if 'ss' in cfgs and cfgs['ss']['flag']:
            # use unlabeled images for weak self-supervision
            self.use_ss = True
            self.ss_settings = cfgs['ss']
            self._initialize_unlabeled_data(cfgs)
        self.logger.info("Initialization finished for KITTI {:s} set".format(split))
        # self.show_statistics()
        # debugging code if you need
        # test = self[10]
        # test = self.extract_ss_sample(1)
    
    def _get_image_path_list(self):
        """
        Prepare list of image paths for the used split.
        """
        assert 'image_name_list' in self._data_config
        image_path_list = []
        for name in self._data_config['image_name_list']:
            img_path = pjoin(self._data_config['image_dir'], name)
            image_path_list.append(img_path)
        self._data_config['image_path_list'] = image_path_list        
        return
    
    def _initialize_unlabeled_data(self, cfgs):
        """
        Initialize unlabeled data for self-supervision experiment.
        """
        self.ss_record = np.load(cfgs['ss']['record_path'], allow_pickle=True).item()
        self.logger.info('Found prepared self-supervision record at: ' + cfgs['ss']['record_path'])
        return
    
    def _load_image_list(self):
        """
        Prepare list of image names for the used split.
        """
        path = self._data_config[self.split + '_list']       
        with open(path, "r") as f:
            image_name_list = f.read().splitlines()
        for idx, line in enumerate(image_name_list):
            base_name = line.replace("\n", "")
            image_name = base_name + ".png"
            image_name_list[idx] = image_name
        self._data_config['image_name_list'] = image_name_list
        self._get_image_path_list()
        return
    
    def _check_precomputed_file(self, path, name):
        """
        Check if a pre-computed numpy file exists or not.
        """
        if exists(path):
            self.logger.info('Found prepared {0:s} at {1:s}'.format(name, path))
            value = np.load(path, allow_pickle=True).item()
            setattr(self, name, value)
            return True
        else:
            return False
        
    def _save_precomputed_file(self, data_dic, pre_computed_path, name):
        """
        Save a pre-computed numpy file.
        """
        setattr(self, name, data_dic)
        make_dir(pre_computed_path)
        np.save(pre_computed_path, data_dic)
        self.logger.info('Save prepared {0:s} at {1:s}'.format(name, pre_computed_path))        
        return
    
    def _prepare_key_points_custom(self, style, interp_params, vis_thresh=0.25):
        """
        Project 3D bounding boxes to image planes to prepare screen coordinates.
        """        
        assert 'keypoint_dir' in self._data_config
        kpt_dir = self._data_config['keypoint_dir']
        if interp_params['flag']:
            style += str(interp_params['coef'])
        pre_computed_path_kpts = pjoin(kpt_dir, '{0:s}_{1:s}_{2:s}.npy'.format(style, self.split, str(self._classes)))
        pre_computed_path_ids = pjoin(kpt_dir, '{0:s}_{1:s}_{2:s}_ids.npy'.format(style, self.split, str(self._classes)))        
        pre_computed_path_rots = pjoin(kpt_dir, '{0:s}_{1:s}_{2:s}_rots.npy'.format(style, self.split, str(self._classes)))   
        if self._check_precomputed_file(pre_computed_path_kpts, 'keypoints'):
            pass
        if self._check_precomputed_file(pre_computed_path_ids, 'instance_ids'):
            pass     
        if self._check_precomputed_file(pre_computed_path_rots, 'rotations'):
            return    
        path_list = self._data_config['image_path_list']
        data_dic_kpts = {}
        data_dic_ids = {}
        data_dic_rots = {}
        for path in path_list:
            image_name = path.split(osep)[-1]
            # instances that lie out of the image plane will be discarded 
            list_2d, _, list_id, _, list_rots = self.get_2d_3d_pair(path, 
                                                                    style=style, 
                                                                    augment=False,
                                                                    add_visibility=True,
                                                                    filter_outlier=True,
                                                                    add_rotation=True
                                                                    )  
            if len(list_2d) == 0:
                continue
            for idx, kpts in enumerate(list_2d):
                list_2d[idx] = kpts.reshape(1, -1, 3)
            data_dic_kpts[image_name] = np.concatenate(list_2d, axis=0)
            data_dic_ids[image_name] = list_id
            data_dic_rots[image_name] = np.concatenate(list_rots, axis=0)
        self._save_precomputed_file(data_dic_kpts, pre_computed_path_kpts, 'keypoints')
        self._save_precomputed_file(data_dic_ids, pre_computed_path_ids, 'instance_ids')  
        self._save_precomputed_file(data_dic_rots, pre_computed_path_rots, 'rotations') 
        return
    
    def _prepare_key_points(self, cfgs):
        self.kpts_style = cfgs['dataset']['2d_kpt_style']
        self._prepare_key_points_custom(self.kpts_style, cfgs['dataset']['interpolate'])
        if 'enlarge_factor' in cfgs['dataset']:
            self.enlarge_factor = cfgs['dataset']['enlarge_factor']
        else:
            self.enlarge_factor = 1.1
        return
    
    def _save_cropped_instances(self):
        # DEPRECATED, will be removed in a future release
        """ 
        Crop and save car instance images with given 2d key-points
        """        
        assert hasattr(self, 'keypoints')
        all_save_paths = []
        all_keypoints = []
        all_bbox = []
        target_ar = self.hm_para['target_ar']
        for image_name in self.keypoints.keys():
            image_path = pjoin(self._data_config['image_dir'], image_name)
            save_dir = pjoin(self._data_config['cropped_dir'], self.kpts_style,
                             self.split, image_name[:-4])
            keypoints = self.keypoints[image_name]
            new_paths, new_keypoints, bboxes = lip.save_cropped_patches(image_path, 
                                                                keypoints, 
                                                                save_dir, 
                                                                enlarge=self.enlarge_factor,
                                                                target_ar=target_ar)
            all_save_paths += new_paths
            all_keypoints.append(new_keypoints)
            all_bbox += bboxes
        annot_save_name = pjoin(self._data_config['cropped_dir'], 
                                self.kpts_style, self.split, 'annot.npy')
        np.save(annot_save_name, {'paths': all_save_paths,
                                  'kpts': np.concatenate(all_keypoints, axis=0),
                                  'global_box': all_bbox
                                  })
        return
    
    def _prepare_2d_pose_annot(self, threshold=4):
        """ 
        Prepare annotation for training the coordinate regression model.
        """          
        all_paths = []
        all_boxes = []
        all_rotations = []
        all_keypoints = []
        all_keypoints_raw = []
        for image_name in self.keypoints.keys():
            image_path = pjoin(self._data_config['image_dir'], image_name)
            # raw keypoints using camera projection
            keypoints = self.keypoints[image_name]
            rotations = self.rotations[image_name]
            boxes_img = []
            rots_img = []
            visible_kpts_img = []
            for i in range(len(keypoints)):
                # Note here severely-occluded instances are ignored in the trainign data
                visible_cnt = np.sum(keypoints[i][:, 2])
                if visible_cnt < threshold:
                    continue
                else:
                    # now set all keypoints as visible
                    tempt_kpts = keypoints[i][:,:2]
                    visible_kpts_img.append(np.expand_dims(tempt_kpts, 0))
                center, crop_size, new_keypoints, vis_rate = lip.kpts2cs(tempt_kpts, enlarge=self.enlarge_factor)
                bbox_instance = np.array((list(map(int, lip.cs2bbox(center, crop_size)))))
                boxes_img.append(bbox_instance.reshape(1,4))
                rots_img.append(rotations[i].reshape(1,2))
            if len(boxes_img) == 0:
                continue
            all_paths.append(image_path)            
            all_boxes.append(np.concatenate(boxes_img))
            all_rotations.append(np.concatenate(rots_img))
            all_keypoints.append(np.concatenate(visible_kpts_img))
            all_keypoints_raw.append(keypoints)
        return {'paths':all_paths, 
                'boxes':all_boxes, 
                'rots':all_rotations,
                'kpts':all_keypoints,
                'raw_kpts':all_keypoints_raw
                }
    
    def _prepare_detection_records(self, save=False, threshold = 0.1):
        # DEPRECATED UNTIL FURTHER UPDATE
        raise ValueError

    def gather_annotations(self, 
                           threshold=0.1, 
                           use_raw_bbox=False, 
                           add_gt=True,
                           filter_outlier=False
                           ):
        """ 
        Read ground truth 3D bounding box labels.
        """           
        path_list = self._data_config['image_path_list'] 
        record_dict = {}
        for img_path in path_list:
            image_name = img_path.split(osep)[-1]
            if self.split != 'test':
                # default: use gt label and calibration
                label_path = pjoin(self._data_config['label_dir'], 
                                   image_name[:-4] + '.txt'
                                   )
                self.read_single_file(image_name, 
                                      record_dict, 
                                      label_path=label_path,
                                      fieldnames=FIELDNAMES,
                                      add_gt=add_gt,
                                      use_raw_bbox=use_raw_bbox,
                                      filter_outlier=filter_outlier
                                      )
            else:
                record_dict[image_name] = {}
        self.annot_dict = record_dict
        return     
    
    def read_single_file(self, 
                         image_name, 
                         record_dict, 
                         label_path=None,
                         calib_path=None,
                         threshold=0.1,
                         fieldnames=FIELDNAMES_P,
                         add_gt=False,
                         use_raw_bbox=True,
                         filter_outlier=False,
                         bbox_only=False
                         ):
        """ 
        Read labels and prepare annotation for a single image.
        """  
        style = self._data_config['3d_kpt_sample_style']
        image_path = pjoin(self._data_config['image_dir'], image_name)
        if label_path is None:
            # default is ground truth annotation
            label_path = pjoin(self._data_config['label_dir'], image_name[:-3] + 'txt')
        if calib_path is None:
            calib_path = pjoin(self._data_config['calib_dir'], image_name[:-3] + 'txt')
        list_2d, list_3d, list_id, pv, raw_bboxes = self.get_2d_3d_pair(image_path,
                                                                        label_path=label_path,
                                                                        calib_path=calib_path,
                                                                        style=style,
                                                                        augment=False,
                                                                        add_raw_bbox=True,
                                                                        bbox_only=bbox_only,
                                                                        filter_outlier=filter_outlier,
                                                                        fieldnames=fieldnames # also load the confidence score
                                                                        )  
        if len(raw_bboxes) == 0:
            return False        
        if image_name not in record_dict:
            record_dict[image_name] = {}
        raw_annot, P = self.load_annotations(label_path, calib_path, fieldnames=fieldnames)
        # use different (slightly) intrinsic parameters for different images
        K = P[:, :3]  
        if len(list_2d) != 0:
            for idx, kpts in enumerate(list_2d):
                list_2d[idx] = kpts.reshape(1, -1, 3)
                list_3d[idx] = list_3d[idx].reshape(1, -1, 3)
            all_keypoints_2d = np.concatenate(list_2d, axis=0)
            all_keypoints_3d = np.concatenate(list_3d, axis=0)                       
            # compute 2D bounding box based on the projected 3D boxes
            bboxes_kpt = []
            for idx, keypoints in enumerate(all_keypoints_2d):
                # relatively tight bounding box: use enlarge = 1.0
                # delete invisible instances
                center, crop_size, _, _ = lip.kpts2cs(keypoints[:,:2],
                                                      enlarge=1.01)
                bbox = np.array(lip.cs2bbox(center, crop_size))             
                bboxes_kpt.append(np.array(bbox).reshape(1, 4))
            record_dict[image_name]['kpts_3d'] = all_keypoints_3d
            if add_gt:
                # special key name representing ground truth
                record_dict[image_name]['kpts'] = all_keypoints_2d
                record_dict[image_name]['kpts_3d_gt'] = all_keypoints_3d
        if use_raw_bbox:
            bboxes = np.vstack(raw_bboxes)
        elif len(bboxes_kpt) != 0:
            bboxes = np.vstack(bboxes_kpt)
            
        record_dict[image_name]['bbox_2d'] = bboxes
        record_dict[image_name]['raw_txt_format'] = raw_annot
        record_dict[image_name]['K'] = K
        # add some key-value pairs as ground truth annotation
        if add_gt:         
            pvs = np.vstack(pv) if len(pv) != 0 else []
            tempt_dic = {'boxes': bboxes,
                         'pose_vecs_gt':pvs
                         }
            record_dict[image_name] = {**record_dict[image_name], **tempt_dic}              
        return True
    
    def read_predictions(self, path):
        """
        Read the prediction files in the same format as the ground truth.
        """
        self.logger.info("Reading predictions from {:s}".format(path))
        file_list = listdir(path)  
        record_dict = {}
        use_raw_bbox = True if self.split == 'test' else False
        for file_name in file_list:
            if not file_name.endswith(".txt"):
                continue
            image_name = file_name[:-4] + ".png"
            label_path = pjoin(path, file_name)            
            self.read_single_file(image_name, 
                                  record_dict, 
                                  label_path=label_path,
                                  use_raw_bbox=use_raw_bbox
                                  )
        self.logger.info("Reading predictions finished.")
        return record_dict
    
    def _get_data_parameters(self, cfgs):
        """
        Initialize dataset-relevant parameters.
        """
        self._data_config = {}
        self._data_config['image_size_raw'] = NotImplemented
        if self.exp_type in ['2dto3d', 'inference', 'finetune']:
            # parameters relevant to input/output representation
            for key in ['3d_kpt_sample_style', 'lft_in_rep', 'lft_out_rep']:
                self._data_config[key] = cfgs['dataset'][key] 
        if self.exp_type in ['2dto3d']:  
            # parameters relevant to data augmentation              
            for key in ['lft_aug','lft_aug_times']:
                self._data_config[key] = cfgs['training_settings'][key]
        # parameters relevant to cuboid interpolation
        self.interp_params = cfgs['dataset']['interpolate']
        # parameters relevant to heatmap regression model and image data augmentation
        if 'heatmapModel' in cfgs:
            hm = cfgs['heatmapModel']
            jitter_flag = hm['jitter_bbox'] and self.split=='train' and cfgs['train']
            self.hm_para = {'reference': 'bbox',
                            'resize': True,
                            'add_xy': hm['add_xy'],
                            'jitter_bbox': jitter_flag,
                            'jitter_params': hm['jitter_params'],
                            # (height, width)
                            'input_size': np.array([hm['input_size'][1],
                                             hm['input_size'][0]]),
                            'heatmap_size': np.array([hm['heatmap_size'][1],
                                               hm['heatmap_size'][0]]),
                            'target_ar': hm['heatmap_size'][1]/hm['heatmap_size'][0],
                            'augment': hm['augment_input'],
                            'sf': cfgs['dataset']['scaling_factor'],
                            'rf': cfgs['dataset']['rotation_factor'],
                            'num_joints': hm['num_joints'],
                            'sigma': hm['sigma'] if 'sigma' in hm else None,
                            'target_type': hm['target_type'] if 'target_type' in hm else None,
                            'use_different_joints_weight': 
                                hm['use_different_joints_weight'] if 'use_different_joints_weight' in hm else None                            
                              }
            self.num_joints = hm['num_joints']
        # parameters relevant to PyTorch image transformation operations
        if 'pth_transform' in cfgs['dataset']:
            pth_transform = cfgs['dataset']['pth_transform']
            normalize = transforms.Normalize(
                mean=pth_transform['mean'], 
                std=pth_transform['std']
                )
            transform_list = [transforms.ToTensor(), normalize]
            if self.exp_type == 'detect2D' and self.split == 'train':
                transform_list.append(transforms.RandomHorizontalFlip(0.5))
            self.pth_trans = transforms.Compose(transform_list)           

    def _set_paths(self):
        """
        Initialize relevant directories.
        """
        ROOT = self.root
        split = self.split
        # validation set is a sub-set of the official training split
        # train/val/test: 3712/3769/7518
        split = 'train' if self.split == 'valid' else split
        split += 'ing'
        self._data_config['image_dir'] = pjoin(ROOT, split, 'image_2')
        self._data_config['cropped_dir'] = pjoin(ROOT, split, 'cropped')
        self._data_config['drawn_dir'] = pjoin(ROOT, split, 'drawn')
        self._data_config['label_dir'] = pjoin(ROOT, split, 'label_2')
        self._data_config['calib_dir'] = pjoin(ROOT, split, 'calib')
        self._data_config['keypoint_dir'] = pjoin(ROOT, split, 'keypoints')
        self._data_config['stats_dir'] = pjoin(ROOT, 'instance_stats.npy')
        # list of images for each sub-set
        self._data_config['train_list'] = pjoin(ROOT, 'training/ImageSets/train.txt')
        self._data_config['valid_list'] = pjoin(ROOT, 'training/ImageSets/val.txt')
        self._data_config['test_list'] = pjoin(ROOT, 'testing/ImageSets/test.txt')
        self._data_config['trainvalid_list'] = pjoin(ROOT, 'training/ImageSets/trainval.txt')        
        return
    
    def project_3d_to_2d(self, points, K):
        """ 
        Get 2D projection of 3D points in the camera coordinate system. 
        """          
        projected = K @ points.T
        projected[:2, :] /= projected[2, :]
        return projected
    
    def render_car(self, ax, K, obj_class, rot_y, locs, dimension, shift):
        # DEPRECATED
        cam_cord = []
        self.get_cam_cord(cam_cord, shift, rot_y, dimension, locs)
        # get 2D projections 
        projected = self.project_3d_to_2d(cam_cord[0], K)
        ax.plot(projected[0, :], projected[1, :], 'ro')
        vp.plot_3d_bbox(ax, projected[:2, 1:].T)
        return
    
    def show_statistics(self):
        # DEPRECATED
        path = self._data_config['stats_dir']       
        if self._check_precomputed_file(path, 'instance_stats') or self.split != 'train':
            return
        self.instance_statistics = {}
        if hasattr(self, 'car_sizes') and len(self.car_sizes) != 0:
            all_sizes = np.concatenate(self.car_sizes)
            fig, axes = plt.subplots(3,1)
            names = ['x', 'y', 'z']
            for axe_id in range(3):
                axes[axe_id].hist(all_sizes[:, axe_id])
                axes[axe_id].set_xlabel('Car size in {:s} direction'.format(names[axe_id]))
                axes[axe_id].set_ylabel('Counts')
            mean_size = all_sizes.mean(axis=0)
            std_size = all_sizes.std(axis=0)
            self.instance_statistics['size'] = {'mean':mean_size,
                                                'std': std_size
                                                }
            # prepare a reference 3D bounding box
            xmax, xmin = mean_size[0], -mean_size[0]
            ymax, ymin = mean_size[1], -mean_size[1]
            zmax, zmin = mean_size[2], -mean_size[2]
            bbox = np.array([[xmax, ymin, zmax],
                             [xmax, ymax, zmax],
                             [xmax, ymin, zmin],
                             [xmax, ymax, zmin],
                             [xmin, ymin, zmax],
                             [xmin, ymax, zmax],
                             [xmin, ymin, zmin],
                             [xmin, ymax, zmin]])
            bbox = np.vstack([np.array([[0., 0., 0.]]), bbox])            
            self.instance_statistics['ref_box3d'] = bbox
        self._save_precomputed_file(self.instance_statistics, path, 'instance_stats')            
        return
    
    def augment_pose_vector(self, 
                            locs,
                            rot_y,
                            obj_class,
                            dimension,
                            augment,
                            augment_times,
                            std_rot = np.array([15., 50., 15.])*np.pi/180.,
                            std_trans = np.array([0.2, 0.01, 0.2]),
                            ):
        """
        Data augmentation used for training the lifter sub-model.
        
        std_rot: standard deviation of rotation around x, y and z axis
        std_trans: standard deviation of translation along x, y and z axis
        """
        aug_ids, aug_pose_vecs = [], []
        aug_ids.append((obj_class, dimension))
        # KITTI only annotates rotation around y-axis (yaw)
        pose_vec = np.concatenate([locs, np.array([0., rot_y, 0.])]).reshape(1, 6)
        aug_pose_vecs.append(pose_vec)
        if not augment:
            return aug_ids, aug_pose_vecs
        rots_random = np.random.randn(augment_times, 3) * std_rot.reshape(1, 3)
        # y-axis
        rots_random[:, 1] += rot_y
        trans_random = 1 + np.random.randn(augment_times, 3) * std_trans.reshape(1, 3)
        trans_random *= locs.reshape(1, 3)
        for i in range(augment_times):
            # augment 6DoF pose
            aug_ids.append((obj_class, dimension))
            pose_vec = np.concatenate([trans_random[i], rots_random[i]]).reshape(1, 6)
            aug_pose_vecs.append(pose_vec)
        return aug_ids, aug_pose_vecs
    
    def get_representation(self, p2d, p3d, in_rep, out_rep):
        """
        Get input-output representations based on 3d point cloud and its 
        projected 2D screen coordinates.
        """        
        # input representation
        if len(p2d) > 0:
            num_kpts = len(p2d[0])
        if in_rep == 'coordinates2d':
            input_list = [points.reshape(1, num_kpts, -1) for points in p2d]
        elif in_rep == 'coordinates2d+area' and self._data_config['3d_kpt_sample_style'] == 'bbox9':
            # indices: [corner, neighbour1, neighbour2]
            indices = self.area_indices
            input_list = [vp.get_area(points, indices, True) for points in p2d]
        else:
            raise NotImplementedError('Undefined input representation.')
        # output representation
        if out_rep == 'R3d+T':
            # R3D stands for relative 3D shape, T stands for translation
            # center the camera coordinates to remove depth
            output_list = []
            for i in range(len(p3d)):
                # format: the root should be pre-computed as the first 3d point 
                root = p3d[i][[0], :]
                relative_shape = p3d[i][1:, :] - root
                output = np.concatenate([root, relative_shape], axis=0)
                output_list.append(output.reshape(1, -1)) 
        elif out_rep == 'R3d': # relative 3D shape
            output_list = []
            # save a copy of the 3D object roots
            if not hasattr(self, 'root_list'):
                self.root_list = []
            for i in range(len(p3d)):
                # format: the root should be pre-computed as the first 3d point 
                root = p3d[i][[0], :]
                self.root_list.append(root)
                relative_shape = p3d[i][1:, :] - root
                output_list.append(relative_shape.reshape(1, -1)) 
        else:
            raise NotImplementedError('undefined output representation.')
        return input_list, output_list
    
    def get_input_output_size(self):
        """
        Get the input/output size for 2d-to-3d lifting.
        """
        num_joints = self.num_joints
        if self._data_config['lft_in_rep'] == 'coordinates2d':
             input_size = num_joints*2
        else:
             raise NotImplementedError
        if self._data_config['lft_out_rep'] in ['R3d+T']:
             output_size = num_joints*3
        elif self._data_config['lft_out_rep'] in ['R3d']:
             output_size = (num_joints - 1) * 3             
        else:
             raise NotImplementedError        
        return input_size, output_size
    
    def interpolate(self, 
                    bbox_3d, 
                    style, 
                    interp_coef=[0.5], 
                    dimension=None, 
                    strings=['l','h','w']
                    ):
        """
        Interpolate 3d points on a 3D bounding box with a specified style.
        """
        if dimension is not None:
            # size-encoded representation
            l = dimension[0]
            if l < 3.5:
                style += 'l'
            elif l < 4.5:
                style += 'h'
            else:
                style += 'w'       
        pidx, cidx = interp_dict[style]
        parents, children = bbox_3d[:, pidx], bbox_3d[:, cidx]
        lines = children - parents
        new_joints = [(parents + interp_coef[i]*lines) for i in range(len(interp_coef))]
        return np.hstack([bbox_3d, np.hstack(new_joints)])
    
    def construct_box_3d(self, l, h, w, interp_params):
        """
        Construct 3D bounding box corners in the canonical pose.
        """        
        x_corners = [0.5*l, l, l, l, l, 0, 0, 0, 0]
        y_corners = [0.5*h, 0, h, 0, h, 0, h, 0, h]
        z_corners = [0.5*w, w, w, 0, 0, w, w, 0, 0]
        x_corners += - np.float32(l) / 2
        y_corners += - np.float32(h)
        z_corners += - np.float32(w) / 2
        corners_3d = np.array([x_corners, y_corners, z_corners])     
        if interp_params['flag']:
            corners_3d = self.interpolate(corners_3d, 
                                          interp_params['style'],
                                          interp_params['coef'],
                                          #dimension=np.array([l,h,w]) # dimension aware
                                          )
        return corners_3d
    
    def get_cam_cord(self, cam_cord, shift, ids, pose_vecs, rot_xz=False):
        """
        Construct 3D bounding box corners in the camera coordinate system.
        """         
        # does not augment the dimension for now
        dims = ids[0][1]
        l, h, w = dims[0], dims[1], dims[2]
        corners_3d_fixed = self.construct_box_3d(l, h, w, self.interp_params)
        for pose_vec in pose_vecs:
            # translation
            locs = pose_vec[0, :3]
            rots = pose_vec[0, 3:]
            x, y, z = locs[0], locs[1], locs[2] # bottom center of the labeled 3D box
            rx, ry, rz = rots[0], rots[1], rots[2]
            # This purturbation turns out to work well for rotation estimation
#            x *= (1 + np.random.randn()*0.1)
#            y *= (1 + np.random.randn()*0.05)
#            z *= (1 + np.random.randn()*0.1)
            if self.split == 'train' and self.exp_type == '2dto3d' and not self._inference_mode:
                ry += np.random.randn()*np.pi # random perturbation
            rot_maty = np.array([[np.cos(ry), 0, np.sin(ry)],
                                [0, 1, 0],
                                [-np.sin(ry), 0, np.cos(ry)]])
            if rot_xz:
                # rotation. Only yaw angle is considered in KITTI dataset
                rot_matx = np.array([[1, 0, 0],
                                    [0, np.cos(rx), -np.sin(rx)],
                                    [0, np.sin(rx), np.cos(rx)]])        
    
                rot_matz = np.array([[np.cos(rz), -np.sin(rz), 0],
                                    [np.sin(rz), np.cos(rz), 0],
                                    [0, 0, 1]])        
                # TODO: correct here
                rot_mat = rot_matz @ rot_maty @ rot_matx     
            else:
                rot_mat = rot_maty
            corners_3d = np.matmul(rot_mat, corners_3d_fixed)
            # translation
            corners_3d += np.array([x, y, z]).reshape([3, 1])
            camera_coordinates = corners_3d + shift
            cam_cord.append(camera_coordinates.T)
        return 
    
    def csv_read_annot(self, file_path, fieldnames):
        """
        Read instance attributes in the KITTI format. Instances not in the 
        selected class will be ignored. 
        
        A list of python dictionary is returned where each dictionary 
        represents one instsance.
        """        
        annotations = []
        with open(file_path, 'r') as csv_file:
            reader = csv.DictReader(csv_file, delimiter=' ', fieldnames=fieldnames)
            for line, row in enumerate(reader):
                if row["type"] in self._classes:
                    annot_dict = {
                        "class": row["type"],
                        "label": TYPE_ID_CONVERSION[row["type"]],
                        "truncation": float(row["truncated"]),
                        "occlusion": float(row["occluded"]),
                        "alpha": float(row["alpha"]),
                        "dimensions": [float(row['dl']), 
                                       float(row['dh']), 
                                       float(row['dw'])
                                       ],
                        "locations": [float(row['lx']), 
                                      float(row['ly']), 
                                      float(row['lz'])
                                      ],
                        "rot_y": float(row["ry"]),
                        "bbox": [float(row["xmin"]),
                                 float(row["ymin"]),
                                 float(row["xmax"]),
                                 float(row["ymax"])
                                 ]
                    }
                    if "score" in fieldnames:
                        annot_dict["score"] = float(row["score"])
                    annotations.append(annot_dict)        
        return annotations
    
    def csv_read_calib(self, file_path):
        """
        Read camera projection matrix in the KITTI format.
        """  
        with open(file_path, 'r') as csv_file:
            reader = csv.reader(csv_file, delimiter=' ')
            for line, row in enumerate(reader):
                if row[0] == 'P2:':
                    P = row[1:]
                    P = [float(i) for i in P]
                    P = np.array(P, dtype=np.float32).reshape(3, 4)
                    break        
        return P
    
    def load_annotations(self, label_path, calib_path, fieldnames=FIELDNAMES): 
        """
        Read 3D annotation and camera parameters.
        """          
        if self.split in ['train', 'valid', 'trainvalid', 'test']:
            annotations = self.csv_read_annot(label_path, fieldnames)
        # get camera intrinsic matrix K
        P = self.csv_read_calib(calib_path)
        return annotations, P
    
    def add_visibility(self, joints, img_width=1242, img_height=375):
        """
        Compute binary visibility of projected 2D parts.
        """  
        assert joints.shape[1] == 2
        visibility = np.ones((len(joints), 1))
        # predicate from upper left corner
        predicate1 = joints - np.array([[0., 0.]])
        predicate1 = (predicate1 > 0.).prod(axis=1)
        # predicate from lower right corner
        predicate2 = joints - np.array([[img_width, img_height]])
        predicate2 = (predicate2 < 0.).prod(axis=1)
        visibility[:, 0] *= predicate1*predicate2      
        return np.hstack([joints, visibility])
    
    def get_inlier_indices(self, p_2d, threshold=0.3):
        """
        Get indices of instances that are visible 'enough'.
        """  
        indices = []
        num_joints = p_2d[0].shape[0]
        for idx, kpts in enumerate(p_2d):
            if p_2d[idx][:, 2].sum() / num_joints >= threshold:
                indices.append(idx)        
        return indices
    
    def filter_outlier(self, p_2d, p_3d, threshold=0.3):
        """
        Keep instances that are visible 'enough'.
        """  
        p_2d_filtered, p_3d_filtered, indices = [], [], []
        num_joints = p_2d[0].shape[0]
        for idx, kpts in enumerate(p_2d):
            if p_2d[idx][:, 2].sum() / num_joints >= threshold:
                p_2d_filtered.append(p_2d[idx])
                p_3d_filtered.append(p_3d[idx])
                indices.append(idx)
        return p_2d_filtered, p_3d_filtered
    
    def get_img_size(self, path):
        """
        Get the resolution of an image without loading it.
        """
        with Image.open(path) as image:
            size = image.size 
        return size
    
    def get_2d_3d_pair(self, 
                       image_path, 
                       label_path=None,
                       calib_path=None,
                       style='null',
                       in_rep = 'coordinates2d',
                       out_rep = 'R3d+T',
                       augment=False, 
                       augment_times=1,
                       add_visibility=True,
                       add_raw_bbox=False, # add original bbox annotation from KITTI
                       add_rotation=False, # add orientation angles
                       bbox_only=False, # only returns raw bounding box
                       filter_outlier=True,
                       fieldnames=FIELDNAMES
                       ):
        """
        Get (input, output) pair used for training a lifter sub-model from a 
        single image.
        """
        image_name = image_path.split(osep)[-1]
        if label_path is None:
            # default is ground truth annotation
            label_path = pjoin(self._data_config['label_dir'], image_name[:-3] + 'txt')
        if calib_path is None:
            calib_path = pjoin(self._data_config['calib_dir'], image_name[:-3] + 'txt')
        anns, P = self.load_annotations(label_path, calib_path, fieldnames=fieldnames)
        # The intrinsics may vary slightly for different images
        # Yet one may convert them to a fixed one by applying a homography
        K = P[:, :3]
        # Debug: use pre-defined intrinsic parameters
        # K = np.array([[707.0493,   0.    , 604.0814],
        #               [  0.    , 707.0493, 180.5066],
        #               [  0.    ,   0.    ,   1.    ]], dtype=np.float32)
        shift = np.linalg.inv(K) @ P[:, 3].reshape(3,1)      
        # P containes intrinsics and extrinsics, I factorize P to K[I|K^-1t] 
        # and use extrinsics to compute the camera coordinate
        # here the extrinsics represent the shift between current camera to
        # the reference grayscale camera        
        # For more calibration details, refer to "Vision meets Robotics: The KITTI Dataset"
        camera_coordinates = []
        pose_vecs = []
        # id includes the class and size of the object
        ids = []
        if add_raw_bbox:
            bboxes = []
        if add_rotation:
            rotations = []
        for i, a in enumerate(anns):
            a = a.copy()
            obj_class = a["label"]
            dimension = a["dimensions"]
            locs = np.array(a["locations"])
            rot_y = np.array(a["rot_y"])
            if add_raw_bbox:
                bboxes.append(np.array(a["bbox"]).reshape(1,4))
            if add_rotation:
                rotations.append(np.array([a["alpha"], a["rot_y"]]).reshape(1,2))
            # apply data augmentation to represent a larger variation of
            # 3D pose and translation 
            if bbox_only:
                continue
            aug_ids, aug_pose_vecs = self.augment_pose_vector(locs,
                                                              rot_y,
                                                              obj_class,
                                                              dimension,
                                                              augment,
                                                              augment_times
                                                              )
            self.get_cam_cord(camera_coordinates, 
                              shift, 
                              aug_ids, 
                              aug_pose_vecs
                              )                    
            ids += aug_ids
            pose_vecs += aug_pose_vecs
        num_instances = len(camera_coordinates)
        # get 2D projections 
        if len(camera_coordinates) != 0:
            camera_coordinates = np.vstack(camera_coordinates)
            projected = self.project_3d_to_2d(camera_coordinates, K)[:2, :].T
            # target is camera coordinates
            p_2d = np.split(projected, num_instances, axis=0) 
            p_3d = np.split(camera_coordinates, num_instances, axis=0) 
            # set visibility to 0 if the projected keypoints lie out of the image plane
            if add_visibility:
                width, height = self.get_img_size(image_path)
                for idx, joints in enumerate(p_2d):
                    p_2d[idx] = self.add_visibility(joints, width, height)
            # filter out the instances that lie outside of the image
            if filter_outlier:
                indices = self.get_inlier_indices(p_2d)
                p_2d = [p_2d[idx] for idx in indices]
                p_3d = [p_3d[idx] for idx in indices]
                # p_2d, p_3d = self.filter_outlier(p_2d, p_3d)
            if filter_outlier and add_raw_bbox:
                bboxes = [bboxes[idx] for idx in indices]
            if filter_outlier and add_rotation:
                rotations = [rotations[idx] for idx in indices]            
            list_2d, list_3d = self.get_representation(p_2d, p_3d, in_rep, out_rep)

        else:
            list_2d, list_3d, ids, pose_vecs = [], [], [], []
        ret = list_2d, list_3d, ids, pose_vecs
        if add_raw_bbox:
            ret = ret + (bboxes, )
        if add_rotation:
            ret = ret + (rotations, )
        return ret            
    
    def show_annot(self, 
                   image_path, 
                   label_file=None, 
                   calib_file=None, 
                   save_dir=None
                   ):
        """
        Show the annotation of an image.
        """      
        image_name = image_path.split(osep)[-1]
        if label_file is None:
            label_file = pjoin(self._data_config['label_dir'], image_name[:-3] + 'txt')
        if calib_file is None:
            calib_file = pjoin(self._data_config['calib_dir'], image_name[:-3] + 'txt')
        anns, P = self.load_annotations(label_file, calib_file)
        K = P[:, :3]
        shift = np.linalg.inv(K) @ P[:, 3].reshape(3,1)        
        image = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)[:, :, ::-1]
        fig1 = plt.figure(figsize=(11.3, 9))
        ax = plt.subplot(111)
        ax.imshow(image)
        fig2 = plt.figure(figsize=(11.3, 9))
        ax = plt.subplot(111)
        ax.imshow(image)        
        for i, a in enumerate(anns):
            a = a.copy()
            obj_class = a["label"]
            dimension = a["dimensions"]
            locs = np.array(a["locations"])
            rot_y = np.array(a["rot_y"])
            self.render_car(ax, K, obj_class, rot_y, locs, dimension, shift) 
        if save_dir is not None:
            output_path1 =  pjoin(save_dir, image_name + '_original.png')
            output_path2 = pjoin(save_dir, image_name + '_annotated.png')
            make_dir(output_path1)
            fig1.savefig(output_path1, dpi=300)
            fig2.savefig(output_path2, dpi=300)
        return
    
    def _generate_2d_3d_paris(self):
        """
        Prepare pair of 2D screen coordinates and 3D cuboid representation.
        
        """
        path_list = self._data_config['image_path_list']
        kpt_3d_style = self._data_config['3d_kpt_sample_style']
        in_rep = self._data_config['lft_in_rep']
        out_rep = self._data_config['lft_out_rep'] # R3d (Relative 3D shape) encodes 3D rotation
        input_list = []
        output_list = []
        id_list = []
        augment = self._data_config['lft_aug'] if self.split == 'train' else False
        augment_times = self._data_config['lft_aug_times']
        for path in path_list:
            list_2d, list_3d, ids, _ = self.get_2d_3d_pair(path, 
                                                           style=kpt_3d_style,
                                                           in_rep = in_rep,
                                                           out_rep = out_rep,
                                                           augment=augment,
                                                           augment_times=augment_times,
                                                           add_visibility=True
                                                           )            
            input_list += list_2d
            output_list += list_3d
            id_list += ids
        # does not use visibility as input
        num_instance = len(input_list)
        self.input = np.vstack(input_list)[:, :, :2].reshape(num_instance, -1)
        # use visibility as input
        # self.input = np.vstack(input_list).reshape(num_instance, -1)
        self.output = np.vstack(output_list) 
        if hasattr(self, 'root_list'):
            self.root_list = np.vstack(self.root_list)
        self.num_joints = int(self.input.shape[1]/2)      
        return
    
    def generate_pairs(self):
        """
        Prepare data (e.g., input-output pairs and metadata) that will be used 
        depending on the type of experiment.
        """
        if self.exp_type == '2dto3d':           
            # generate 2D screen coordinates and 3D cuboid
            self._generate_2d_3d_paris()
        elif self.exp_type in ['instanceto2d', 'baselinealpha', 'baselinetheta']:
            # # load the annotations containing cropped car instances 
            # path = pjoin(self._data_config['cropped_dir'], 
            #              self.kpts_style, self.split, 'annot.npy')
            # assert exists(path), 'Please prepare instance annotation first.'
            # self.annot_2dpose = np.load(path, allow_pickle=True).item() 
            self.annot_2dpose = self._prepare_2d_pose_annot()
        elif self.exp_type in ['detection2d']:
            self._prepare_detection_records()
            self.total_data = len(self.detection_records)
        elif self.exp_type == 'inference':
            self.gather_annotations()
            self.total_data = len(self.annot_dict)
            self.annoted_img_paths = list(self.annot_dict.keys())
        elif self.exp_type == 'finetune':
            self.gather_annotations(use_raw_bbox=False, 
                                    add_gt=True, 
                                    filter_outlier=True
                                    )
            self.total_data = len(self.annot_dict)
            self.annoted_img_paths = list(self.annot_dict.keys())            
        else:
            raise NotImplementedError('Unknown experiment type.')
        # count of total data
        if self.exp_type == '2dto3d':
            self.input = self.input.astype(np.float32())
            self.output = self.output.astype(np.float32())
            self.total_data = len(self.input)
        elif self.exp_type in ['instanceto2d', 'baselinealpha', 'baselinetheta']:
            self.total_data = len(self.annot_2dpose['paths'])
        return
    
    def visualize(self, plot_num = 1, save_dir=None):
        """
        Show some random images with annotations.
        """        
        path_list = self._data_config['image_path_list']
        chosen = np.random.choice(len(path_list), plot_num, replace=False)
        for img_idx in chosen:
            self.show_annot(path_list[img_idx], save_dir=save_dir)
        return
    
    def get_collate_fn(self):
        return my_collate_fn
    
    def inference(self, flags=[True, True]):
        self._inference_mode = flags[0]
        self._read_img_during_inference = flags[1]
    
    def extract_ss_sample(self, cnt):
        """
        Prepare data for self-supervised representation learning.
        """           
        # cnt: number of fully supervised samples
        extract_cnt = self.ss_settings['max_per_img'] - cnt
        if extract_cnt <= 0:
            num_channel = 5 if self.hm_para['add_xy'] else 3
            return torch.zeros(0, num_channel, 256, 256), None, None, None
        idx = np.random.randint(0, len(self.ss_record['paths']))
        parameters = self.hm_para
        parameters['boxes'] = self.ss_record['boxes'][idx]
        joints = self.ss_record['kpts'][idx]
        img_name = self.ss_record['paths'][idx].split(osep)[-1]
        img_path = pjoin(self.ss_settings['img_root'], img_name)
        image, target, weights, meta = lip.get_tensor_from_img(img_path, 
                                                               parameters, 
                                                               joints=joints,
                                                               pth_trans=self.pth_trans,
                                                               rf=parameters['rf'],
                                                               sf=parameters['sf'],
                                                               generate_hm=False,
                                                               max_cnt=extract_cnt
                                                               )        
        return image, target, weights, meta
    
    def prepare_ft_dict(self, idx):
        """
        Prepare data for fine-tuning.
        """  
        img_name = self.annoted_img_paths[idx]
        img_annot = self.annot_dict[img_name]
        ret = {}
        img_path = pjoin(self._data_config['image_dir'], img_name)
        kpts = img_annot['kpts']
        # the croping bounding box in the original image
        # global_box = self.annot_2dpose['global_box'][idx]
        parameters = self.hm_para
        parameters['boxes'] = img_annot['bbox_2d']
        # fs: fully-supervised ss: self-supervised
        images_fs, heatmaps_fs, weights_fs, meta_fs = lip.get_tensor_from_img(img_path, 
                                                                              parameters, 
                                                                              joints=kpts,
                                                                              pth_trans=self.pth_trans,
                                                                              rf=parameters['rf'],
                                                                              sf=parameters['sf'],
                                                                              generate_hm=True)
        ret['path'] = img_path
        ret['images_fs'] = images_fs
        ret['heatmaps_fs'] = heatmaps_fs
        # ret['meta_fs'] = meta_fs
        ret['kpts_3d'] = img_annot['kpts_3d']
        ret['crop_center'] = meta_fs['center']
        ret['crop_scale'] = meta_fs['scale']
        ret['kpts_local'] = meta_fs['transformed_joints']
        # prepare the affine transformation matrices so map local coordinates
        # back to global screen coordinates
        ret['af_mats'] = []
        for idx in range(len(ret['crop_center'])):
            trans_inv = get_affine_transform(ret['crop_center'][idx],
                                             ret['crop_scale'][idx], 
                                             0., 
                                             self.hm_para['input_size'], 
                                             inv=1)  
            ret['af_mats'].append(trans_inv)
        # use random unlabeled images for data augmentation
        if self.split == 'train' and self.use_ss:
            images_ss, heatmaps_ss, weights_ss, meta_ss = self.extract_ss_sample(len(images_fs))
            ret['images_ss'] = images_ss
            ret['meta_ss'] = meta_ss
        return ret
    
    def __getitem__(self, idx):
        """
        Required by dataloader.
        """  
        # only return testing images during inference
        if self.split == 'test' or self._inference_mode:
            #TODO: consider classes except for cars in the future
            img_name = self.annoted_img_paths[idx]
            # debug: use a specified image for visualization
            # img_name = "006658.png"
            img_path = pjoin(self._data_config['image_dir'], img_name)
            if self._read_img_during_inference:
                image = lip.imread_rgb(img_path)
            else:
                image = None
            if self._read_img_during_inference and hasattr(self, 'pth_trans'):
                # pytorch transformation if provided
                image = self.pth_trans(image)
            record = {'path':img_path}
            # add other available annotations
            if hasattr(self, 'annot_dict'):
                record = {**record, **self.annot_dict[img_name]}
            return image, record
        # for training and validation splits
        if self.exp_type == '2dto3d':
            # the 2D-3D pairs are stored in RAM
            meta_data = {}
            # the 3D global position
            if hasattr(self, 'root_list'):
                meta_data['roots'] = self.root_list[idx]
            return self.input[idx], self.output[idx], np.zeros((0,1)), meta_data
        elif self.exp_type in ['baselinealpha', 'baselinetheta']:
            img_path = self.annot_2dpose['paths'][idx]
            rots = self.annot_2dpose['rots'][idx]
            kpts = self.annot_2dpose['kpts'][idx]
            if kpts.shape[2] == 2:
                kpts = np.concatenate([kpts, np.ones((kpts.shape[0], kpts.shape[1], 1))], axis=2)            
            parameters = self.hm_para
            parameters['boxes'] = self.annot_2dpose['boxes'][idx]
            images_fs, heatmaps_fs, weights_fs, meta_fs = lip.get_tensor_from_img(img_path, 
                                                                                  parameters, 
                                                                                  joints=kpts,
                                                                                  pth_trans=self.pth_trans,
                                                                                  rf=parameters['rf'],
                                                                                  sf=parameters['sf'],
                                                                                  generate_hm=False
                                                                                  )
            if self.exp_type == 'baselinealpha':
                targets = [np.array([[np.cos(rots[idx][0]), np.sin(rots[idx][0])]])  for idx in range(len(rots))]
                meta_fs['angles_gt'] = rots[:, 0]
            elif self.exp_type == 'baselinetheta':
                targets = [np.array([[np.cos(rots[idx][1]), np.sin(rots[idx][1])]]) for idx in range(len(rots))]
                meta_fs['angles_gt'] = rots[:, 1]
            targets = torch.from_numpy(np.concatenate(targets).astype(np.float32))
            return images_fs, targets, weights_fs, meta_fs
        elif self.exp_type == 'instanceto2d':
            # the input images and target heatmaps are produced online
            img_path = self.annot_2dpose['paths'][idx]
            kpts = self.annot_2dpose['kpts'][idx]
            # the croping bounding box in the original image
            # global_box = self.annot_2dpose['global_box'][idx]
            if kpts.shape[2] == 2:
                kpts = np.concatenate([kpts, np.ones((kpts.shape[0], kpts.shape[1], 1))], axis=2)
            parameters = self.hm_para
            parameters['boxes'] = self.annot_2dpose['boxes'][idx]
            # fs: fully-supervised ss: self-supervised
            images_fs, heatmaps_fs, weights_fs, meta_fs = \
                lip.get_tensor_from_img(img_path, 
                                        parameters, 
                                        joints=kpts,
                                        pth_trans=self.pth_trans,
                                        rf=parameters['rf'],
                                        sf=parameters['sf'],
                                        generate_hm=True
                                        )
            # use random unlabeled images for data augmentation
            if self.split == 'train' and hasattr(self, 'use_ss') and self.use_ss:
                images_ss, heatmaps_ss, weights_ss, meta_ss = self.extract_ss_sample(len(images_fs))
                images = [images_fs, images_ss]
                targets = heatmaps_fs
                weights = weights_fs
                meta = meta_fs
            else:
                images = images_fs
                targets = heatmaps_fs
                weights = weights_fs
                meta = meta_fs
            return images, targets, weights, meta
        elif self.exp_type == 'detection2d':
            record = copy.deepcopy(self.detection_records[idx])
            path = record['path']
            image = lip.imread_rgb(path)
            target = record['target']
            if hasattr(self, 'pth_trans'):
                # pytorch transformation if provided
                image = self.pth_trans(image)
            return image, target
        elif self.exp_type == 'finetune':
            # prepare images, 2D and 3D annotations as a dictionary for finetuning 
            ret = self.prepare_ft_dict(idx)
            return ret
        else:
            raise NotImplementedError

def prepare_data(cfgs, logger):
    """
    Prepare training and validation dataset objects.
    """  
    train_set = KITTI(cfgs, 'train', logger)
    valid_set = KITTI(cfgs, 'valid', logger)
    if cfgs['exp_type'] == '2dto3d':
        # normalize 2D keypoints
        valid_set.normalize(train_set.statistics)
    return train_set, valid_set

def get_dataset(cfgs, logger, split):
    return KITTI(cfgs, split, logger)

def collate_dict(dict_list):
    ret = {}
    ret['path'] = [item['path'] for item in dict_list]
    for key in dict_list[0]:
        if key == 'path':
            continue
        ret[key] = np.concatenate([d[key] for d in dict_list], axis=0)
    return ret

def length_limit(instances, targets, target_weights, meta):
    if len(instances) > MAX_INS_CNT and len(instances) == len(targets):
        # normal training
        chosen = np.random.choice(len(instances), MAX_INS_CNT, replace=False)
        ins, tar, tw, = instances[chosen], targets[chosen], target_weights[chosen]
        m = {'path':meta['path']}
        for key in meta:
            if key != 'path':
                m[key] = meta[key][chosen]
    elif len(instances) > MAX_INS_CNT and len(instances) > len(targets) and meta['fs_instance_cnt'] > MAX_INS_CNT:
        # mixed training: fully-supervised instances are too many
        chosen = np.random.choice(meta['fs_instance_cnt'], MAX_INS_CNT, replace=False)
        ins, tar, tw, = instances[chosen], targets[chosen], target_weights[chosen]
        m = {'path':meta['path']}
        for key in meta:
            if key != 'path' and key != 'fs_instance_cnt':
                m[key] = meta[key][chosen]
    elif len(instances) > MAX_INS_CNT and len(instances) > len(targets) and meta['fs_instance_cnt'] <= MAX_INS_CNT:
        # mixed training: self-supervised instances are too many
        ins, tar, tw, m = instances[:MAX_INS_CNT], targets, target_weights, meta
    else:
        ins, tar, tw, m = instances, targets, target_weights, meta
    return ins, tar, tw, m

def my_collate_fn(batch):
    # the collate function for 2d pose training
    instances, targets, target_weights, meta = list(zip(*batch))
    if isinstance(instances[0], list):
        # each batch comes in the format of (fs_instances, ss_instances)
        fs_instances, ss_instances = list(zip(*instances))
        fs_instances = torch.cat(fs_instances)
        ss_instances = torch.cat(ss_instances)
        instances = torch.cat([fs_instances, ss_instances])
        targets = torch.cat(targets, dim=0)
        # target_weights = torch.cat(target_weights, dim=0)
        meta = collate_dict(meta)
        meta['fs_instance_cnt'] = len(fs_instances)
    else:
        instances = torch.cat(instances, dim=0)
        targets = torch.cat(targets, dim=0)
        # target_weights = torch.cat(target_weights, dim=0)
        meta = collate_dict(meta)
    if target_weights[0] is not None:
        target_weights = torch.cat(target_weights, dim=0)
    else:
        #dummy weight
        target_weights = torch.ones(1)
    return length_limit(instances, targets, target_weights, meta)

================================================
FILE: libs/dataset/__init__.py
================================================
#import libs.dataset.ApolloScape
import libs.dataset.KITTI


================================================
FILE: libs/dataset/basic/__init__.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Empty file.
"""


================================================
FILE: libs/dataset/basic/basic_classes.py
================================================
"""
Basic classes for customized dataset classes to inherit.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""
import torch.utils.data
import libs.dataset.normalization.operations as nop

class SupervisedDataset(torch.utils.data.Dataset):
    def __init__(self, cfgs, split, logger=None):
        self.cfgs = cfgs
        self.split = split
        self.logger = logger
        self.root = cfgs['dataset']['root']
        return
    
    def generate_pairs(self, synthetic=True):
        # sub-classes need to override this method to specify the inputs and
        # outputs
        self.input = None
        self.output = None
        self.total_data = 0
        return
    
    def normalize(self, statistics=None):
        """ 
        Normalize the (input, output) pairs with optional statistics.
        """
        if statistics is None:
            mean_in, std_in = nop.get_statistics_1d(self.input)
            mean_out, std_out = nop.get_statistics_1d(self.output)
            self.statistics = {'mean_in': mean_in,
                               'mean_out': mean_out,
                               'std_in': std_in,
                               'std_out': std_out
                               }
        else:
            mean_in, std_in = statistics['mean_in'], statistics['std_in']
            mean_out, std_out = statistics['mean_out'], statistics['std_out']
            self.statistics = statistics
        self.input = nop.normalize_1d(self.input, mean_in, std_in)
        self.output = nop.normalize_1d(self.output, mean_out, std_out)
        return
    
    def unnormalize(self, data, mean, std):
        return nop.unnormalize_1d(data, mean, std)
    
    def __len__(self):
        return self.total_data

    def __getitem__(self, idx):
        return self.input[idx], self.output[idx]

================================================
FILE: libs/dataset/normalization/__init__.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Empty file.
"""


================================================
FILE: libs/dataset/normalization/operations.py
================================================
"""
Dataset normalization operations.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import numpy as np

def get_statistics_1d(data):
    """
    Compute statistics of 1D data.
    
    data of shape [num_sample, vector_length]
    """  
    assert len(data.shape) == 2
    mean = data.mean(axis=0, keepdims=True)
    std = data.std(axis=0, keepdims=True)
    return mean, std

def normalize_1d(data, mean, std, individual=False):
    """
    Normalizes 1D data with mean and standard deviation.
    
    data: dictionary where values are
    mean: np vector with the mean of the data
    std: np vector with the standard deviation of the data
    individual: whether to perform normalization independently for each input
    
    Returns
    data_out: normalized data
    """
    if individual:
        # this representation has the implicit assumption that the representation
        # is translational and scaling invariant
        num_data = len(data)
        data = data.reshape(num_data, -1, 2)
        mean_x = np.mean(data[:,:,0], axis=1).reshape(num_data, 1)
        std_x = np.std(data[:,:,0], axis=1)
        mean_y = np.mean(data[:,:,1], axis=1).reshape(num_data, 1)
        std_y = np.std(data[:,:,1], axis=1)
        denominator = (0.5 * (std_x + std_y)).reshape(num_data, 1)
        data[:,:,0] = (data[:,:,0] - mean_x)/denominator
        data[:,:,1] = (data[:,:,1] - mean_y)/denominator
        data_out = data.reshape(num_data, -1)
    else:
        data_out = (data - mean)/std
    return data_out

def unnormalize_1d(normalized_data, mean, std):
    orig_data = normalized_data*std + mean
    return orig_data

================================================
FILE: libs/logger/__init__.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Empty file.
"""


================================================
FILE: libs/logger/logger.py
================================================
"""
Basic logging functions.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import logging
import os
import time

from libs.common import utils

initialized = False

def get_dirs(cfgs):
    """
    Prepare file directories for a logger object.
    """     
    root_output_dir = cfgs['dirs']['output']
    dataset_name = cfgs['dataset']['name']
    cfg_name = cfgs['name']
    final_output_dir = [root_output_dir, dataset_name]    
    final_output_dir = os.path.join(*final_output_dir)
    time_str = time.strftime('%Y-%m-%d %H:%M')
    log_file = '{}_{}.log'.format(cfg_name, time_str)
    final_log_file = os.path.join(final_output_dir, log_file)
    return final_output_dir, final_log_file

def get_logger(cfgs, head = '%(asctime)-15s %(message)s'):
    """
    Prepare a logger object.
    """     
    final_output_dir, final_log_file = get_dirs(cfgs)
    utils.make_dir(final_log_file)
    logging.basicConfig(filename=str(final_log_file), format=head)
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)
    if len(logger.handlers) == 1:    
        console = logging.StreamHandler()
        logging.getLogger('').addHandler(console)    
    return logger, final_output_dir

================================================
FILE: libs/loss/__init__.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Empty file.
"""


================================================
FILE: libs/loss/function.py
================================================
"""
Loss functions.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from scipy.spatial import distance_matrix

from libs.common.img_proc import soft_arg_max, appro_cr


loss_dict = {'mse': nn.MSELoss(reduction='mean'),
             'sl1': nn.SmoothL1Loss(reduction='mean'),
             'l1': nn.L1Loss(reduction='mean')
             }

class JointsMSELoss(nn.Module):
    def __init__(self, use_target_weight):
        super(JointsMSELoss, self).__init__()
        self.criterion = nn.MSELoss(reduction='mean')
        self.use_target_weight = use_target_weight

    def forward(self, output, target, target_weight, meta=None):
        batch_size = output.size(0)
        num_joints = output.size(1)
        heatmaps_pred = output.reshape((batch_size, num_joints, -1)).split(1, 1)
        heatmaps_gt = target.reshape((batch_size, num_joints, -1)).split(1, 1)
        loss = 0

        for idx in range(num_joints):
            heatmap_pred = heatmaps_pred[idx].squeeze()
            heatmap_gt = heatmaps_gt[idx].squeeze()
            if self.use_target_weight:
                loss += 0.5 * self.criterion(
                    heatmap_pred.mul(target_weight[:, idx]),
                    heatmap_gt.mul(target_weight[:, idx])
                )
            else:
                loss += 0.5 * self.criterion(heatmap_pred, heatmap_gt)

        return loss / num_joints

def get_comp_dict(spec_list = ['None', 'None', 'None'], 
                  loss_weights = [1,1,1]
                  ):
    comp_dict = {}

    if spec_list[0] != 'None':
        comp_dict['hm'] = (loss_dict[spec_list[0]], loss_weights[0])
    if spec_list[1] != 'None':
        comp_dict['coor'] = (loss_dict[spec_list[1]], loss_weights[1])
    if spec_list[2] != 'None':
        comp_dict['cr'] = (loss_dict[spec_list[2]], loss_weights[2])     
    return comp_dict

class JointsCompositeLoss(nn.Module):
    """
    Loss function for 2d screen coordinate regression which consists of 
    multiple terms.
    """
    def __init__(self,
                 spec_list,
                 img_size,
                 hm_size,
                 loss_weights = [1,1,1],
                 target_cr = None,
                 cr_loss_thres = 0.15,
                 use_target_weight=False
                 ):
        """
        comp_dict specify the optional terms used in the loss computation, 
        which is specified with spec_list.
        loss for each component follows the format of [loss_type, weight],
        loss_type speficy the loss type for each component (e.g. L1 or L2) while
        weight gives the weight for this component.
        
        hm: a supervised loss defined with a heatmap target
        coor: a supervised loss defined with 2D coordinates
        cr: a self-supervised loss defined with prior cross-ratio
        """
        super(JointsCompositeLoss, self).__init__()
        self.comp_dict = get_comp_dict(spec_list, loss_weights)
        self.img_size = img_size
        self.hm_size = hm_size
        self.target_cr = target_cr
        self.use_target_weight = use_target_weight
        self.apply_cr_loss = False
        self.cr_loss_thres = cr_loss_thres

    def calc_hm_loss(self, output, target):
        """
        Heatmap loss which corresponds to L_{hm} in the paper.
        
        output: predicted heatmaps of shape [N, K, H, W]
        target: ground truth heatmaps of shape [N, K, H, W]
        """        
        batch_size = output.size(0)
        num_parts = output.size(1)
        heatmaps_pred = output.reshape((batch_size, num_parts, -1)).split(1, 1)
        heatmaps_gt = target.reshape((batch_size, num_parts, -1)).split(1, 1)
        loss = 0
        for idx in range(num_parts):
            heatmap_pred = heatmaps_pred[idx].squeeze()
            heatmap_gt = heatmaps_gt[idx].squeeze()
            loss += 0.5 * self.comp_dict['hm'][0](heatmap_pred, heatmap_gt)        
        return loss / num_parts
    
    def calc_cross_ratio_loss(self, pred_coor, target_cr, mask):
        """
        Cross-ratio loss which corresponds to L_{cr} in the paper.
        
        pred_coor: predicted local coordinates
        target_cr: ground truth cross ratio
        """  
        assert hasattr(self, 'cr_indices')
        # this indices is assumed to be initialized by the user
        loss = 0
        mask = mask.to(pred_coor.device)
        if mask.sum() == 0:
            return loss
        for sample_idx in range(len(pred_coor)):
            for line_idx in range(len(self.cr_indices)):
                if mask[sample_idx][line_idx] == 0:
                    continue
                # predicted cross-ratio square
                pred_cr_sqr = appro_cr(pred_coor[sample_idx][self.cr_indices[line_idx]])
                # normalize the predicted cross-ratio square
                pred_cr_sqr /= target_cr**2
                line_loss = self.comp_dict['cr'][0](pred_cr_sqr, torch.ones(1).to(pred_cr_sqr.device))
                loss += line_loss * mask[sample_idx][line_idx][0]
        return loss/mask.sum()
    
    def get_cr_mask(self, coordinates, threshold = 0.15):
        """
        Mask some edges out when computing the cross-ratio loss.
        Ignore the fore-shortened edges since they will produce large and 
        unstable gradient.
        """          
        assert hasattr(self, 'cr_indices')
        mask = torch.zeros(coordinates.shape[0], len(self.cr_indices), 1)
        for sample_idx in range(len(coordinates)):
            for line_idx in range(len(self.cr_indices)):
                pts = coordinates[sample_idx][self.cr_indices[line_idx]]
                dm = distance_matrix(pts, pts)
                minval = np.min(dm[np.nonzero(dm)])
                if minval > threshold:
                    mask[sample_idx][line_idx] = 1.0
        return mask
    
    def calc_colinear_loss(self):
        # DEPRECATED
        return 0.
    
    def calc_coor_loss(self, coordinates_pred, coordinates_gt):
        """
        Coordinate loss which corresponds to L_{2d} in the paper.
        coordinates_pred: [N, K, 2]
        coordinates_gt: [N, K, 2]
        """  
        coordinates_gt[:, :, 0] /= self.img_size[0]
        coordinates_gt[:, :, 1] /= self.img_size[1]   
        loss = self.comp_dict['coor'][0](coordinates_pred, coordinates_gt) 
        return loss
    
    def forward(self, output, target, target_weight=None, meta=None):
        """
        Loss evaluation.
        Output is in the format of (heatmaps, coordinates) where coordinates
        is optional.
        target refers to the ground truth heatmaps.
        """  
        if type(output) is tuple:
            heatmaps_pred, coordinates_pred = output
        else:
            heatmaps_pred, coordinates_pred = output, None
        total_loss = 0
        if 'hm' in self.comp_dict:
            # some heatmaps map be produced by unlabeled data
            if len(heatmaps_pred) != len(target):
                heatmaps_pred = heatmaps_pred[:len(target)]
            total_loss += self.calc_hm_loss(heatmaps_pred, target) * self.comp_dict['hm'][1]
        if 'coor' in self.comp_dict:
            coordinates_gt = meta['transformed_joints'][:, :, :2].astype(np.float32)
            coordinates_gt = torch.from_numpy(coordinates_gt).cuda()           
            if coordinates_pred == None:
                coordinates_pred, max_vals = soft_arg_max(heatmaps_pred)
                coordinates_pred[:, :, 0] /= self.hm_size[1]
                coordinates_pred[:, :, 1] /= self.hm_size[0]     
            if len(coordinates_pred) != len(coordinates_gt):
                coordinates_pred_fs = coordinates_pred[:len(coordinates_gt)]
            else:
                coordinates_pred_fs = coordinates_pred
            total_loss += self.calc_coor_loss(coordinates_pred_fs, coordinates_gt) * self.comp_dict['coor'][1] 
        if 'cr' in self.comp_dict and self.comp_dict['cr'][1] != "None" and self.apply_cr_loss:
            cr_loss_mask = self.get_cr_mask(coordinates_pred.clone().detach().data.cpu().numpy(), self.cr_loss_thres)
            total_loss += self.calc_cross_ratio_loss(coordinates_pred, self.target_cr, cr_loss_mask) * self.comp_dict['cr'][1]
        return total_loss
    
class MSELoss1D(nn.Module):
    """
    Mean squared error loss.
    """     
    def __init__(self, use_target_weight=False, reduction='mean'):
        super(MSELoss1D, self).__init__()
        self.criterion = nn.MSELoss(reduction=reduction)
        self.use_target_weight = use_target_weight

    def forward(self, output, target, target_weight=None, meta=None):
        loss = self.criterion(output, target)
        return loss
    
class SmoothL1Loss1D(nn.Module):
    """
    Smooth L1 loss.
    """
    def __init__(self, use_target_weight=False):
        super(SmoothL1Loss1D, self).__init__()
        self.criterion = nn.SmoothL1Loss(reduction='mean')
        self.use_target_weight = use_target_weight

    def forward(self, output, target, target_weight=None, meta=None):
        loss = self.criterion(output, target)
        return loss

class DecoupledSL1Loss(nn.Module):
    # DEPRECATED
    def __init__(self, use_target_weight=None):
        super(DecoupledSL1Loss, self).__init__()
        self.criterion = F.smooth_l1_loss

    def forward(self, output, target, target_weight=None):
        # balance the loss for translation and rotation regression
        loss_center = self.criterion(output[:, :3], target[:, :3], reduction='mean')
        loss_else = self.criterion(output[:, 3:], target[:, 3:], reduction='mean')
        return loss_center + loss_else
    
class JointsOHKMMSELoss(nn.Module):
    # DEPRECATED
    def __init__(self, use_target_weight, topk=8):
        super(JointsOHKMMSELoss, self).__init__()
        self.criterion = nn.MSELoss(reduction='none')
        self.use_target_weight = use_target_weight
        self.topk = topk

    def ohkm(self, loss):
        ohkm_loss = 0.
        for i in range(loss.size()[0]):
            sub_loss = loss[i]
            topk_val, topk_idx = torch.topk(
                sub_loss, k=self.topk, dim=0, sorted=False
            )
            tmp_loss = torch.gather(sub_loss, 0, topk_idx)
            ohkm_loss += torch.sum(tmp_loss) / self.topk
        ohkm_loss /= loss.size()[0]
        return ohkm_loss

    def forward(self, output, target, target_weight):
        batch_size = output.size(0)
        num_joints = output.size(1)
        heatmaps_pred = output.reshape((batch_size, num_joints, -1)).split(1, 1)
        heatmaps_gt = target.reshape((batch_size, num_joints, -1)).split(1, 1)

        loss = []
        for idx in range(num_joints):
            heatmap_pred = heatmaps_pred[idx].squeeze()
            heatmap_gt = heatmaps_gt[idx].squeeze()
            if self.use_target_weight:
                loss.append(0.5 * self.criterion(
                    heatmap_pred.mul(target_weight[:, idx]),
                    heatmap_gt.mul(target_weight[:, idx])
                ))
            else:
                loss.append(
                    0.5 * self.criterion(heatmap_pred, heatmap_gt)
                )

        loss = [l.mean(dim=1).unsqueeze(dim=1) for l in loss]
        loss = torch.cat(loss, dim=1)

        return self.ohkm(loss)

class WingLoss(nn.Module):
    # DEPRECATED
    def __init__(self, use_target_weight, width=5, curvature=0.5, image_size=(384, 288)):
        super(WingLoss, self).__init__()
        self.width = width
        self.curvature = curvature
        self.C = self.width - self.width * np.log(1 + self.width / self.curvature)
        self.image_size = image_size
        
    def forward(self, output, target, target_weight):
        prediction, _ = soft_arg_max(output)
        # normalize the coordinates to 0-1
        prediction[:, :, 0] /= self.image_size[1]
        prediction[:, :, 1] /= self.image_size[0]
        target[:, :, 0] /= self.image_size[1]
        target[:, :, 1] /= self.image_size[0]  
        diff = target - prediction
        diff_abs = diff.abs()
        loss = diff_abs.clone()

        idx_smaller = diff_abs < self.width
        idx_bigger = diff_abs >= self.width

        loss[idx_smaller] = self.width * torch.log(1 + diff_abs[idx_smaller] / self.curvature)
        loss[idx_bigger]  = loss[idx_bigger] - self.C
        loss = loss.mean()
        return loss

================================================
FILE: libs/metric/criterions.py
================================================
"""
Metric functions used for validation.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import libs.common.transformation as ltr
import libs.common.img_proc as lip
from libs.common.transformation import compute_similarity_transform

import numpy as np
import torch
from scipy.spatial.transform import Rotation

# threshold for percentage of correct key-points (PCK)
PCK_THRES = np.array([0.1, 0.2, 0.3])

def get_distance(gt, pred):
    """
    2D Euclidean distance of two groups of points with visibility considered. 
    
    gt: [n_joints, 2 or 3]
    pred: [n_joints, 2]
    """    
    if gt.shape[1] == 2:
        sqerr = (gt - pred)**2
        sqerr = sqerr.sum(axis = 1)
        dist_list = list(np.sqrt(sqerr))
    elif gt.shape[1] == 3:
        dist_list = []
        sqerr = (gt[:, :2] - pred)**2
        sqerr = sqerr.sum(axis = 1)
        indices = np.nonzero(gt[:, 2])[0]
        dist_list = list(np.sqrt(sqerr[indices]))        
    else:
        raise ValueError('Array shape not supported.')
    return dist_list

def get_angle_error(pred, meta_data, cfgs=None):
    """
    Compute error for angle prediction.
    """    
    if not isinstance(pred, np.ndarray):
        pred = pred.data.cpu().numpy()    
    angles_pred = np.arctan2(pred[:,1], pred[:,0])
    angles_gt = meta_data['angles_gt']
    dif = np.abs(angles_gt - angles_pred) * 180 / np.pi
    # add or minus 2*pi
    indices = dif > 180
    dif[indices] = 360 - dif[indices]
    cnt = len(pred)
    avg_acc = dif.sum()/cnt
    others = None
    return avg_acc, cnt, others

def get_PCK(pred, gt):
    """
    Get percentage of correct key-points
    """
    distance = np.array(get_distance(gt, pred))
    denominator = (gt[:, 1].max() - gt[:, 1].min()) * 1/3
    correct_cnt = np.zeros((len(PCK_THRES)))
    for idx, thres in enumerate(PCK_THRES):
        correct_cnt[idx] = (distance < thres * denominator).sum()
    return correct_cnt

def get_distance_src(output,
                     meta_data,
                     cfgs=None,
                     image_size = (256.0, 256.0),
                     arg_max='hard'
                     ):
    """
    From predicted heatmaps, obtain local coordinates (\phi_l in the paper) 
    and transform them back to the source images based on metadata. 
    Error is then evaluated on the source image for the screen coordinates 
    (\phi_g in the paper).
    """
    # the error is reported as distance in terms of pixels in the source image
    if type(output) is tuple:
        pred, max_vals = output[1].data.cpu().numpy(), None
    elif isinstance(output, np.ndarray) and arg_max == 'soft':
        pred, max_vals = lip.soft_arg_max_np(output)
    elif isinstance(output, torch.Tensor) and arg_max == 'soft': 
        pred, max_vals = lip.soft_arg_max(output)
    elif isinstance(output, np.ndarray) or isinstance(output, torch.Tensor) and arg_max == 'hard':
        if not isinstance(output, np.ndarray):
            output = output.data.cpu().numpy()        
        pred, max_vals = lip.get_max_preds(output)
    else:
        raise NotImplementedError
    image_size = image_size if cfgs is None else cfgs['heatmapModel']['input_size']
    width, height = image_size
    # multiply by down-sample ratio
    if not isinstance(pred, np.ndarray):
        pred = pred.data.cpu().numpy()
    if (max_vals is not None) and (not isinstance(max_vals, np.ndarray)):
        max_vals = max_vals.data.cpu().numpy()
    # the coordinates need to be rescaled for different cases
    if type(output) is tuple:
        pred *= np.array(image_size).reshape(1, 1, 2)
    else:
        pred *= image_size[0] / output.shape[3]
    # inverse transform and compare pixel didstance
    centers, scales = meta_data['center'], meta_data['scale']
    # some predictions are generated for unlabeled data
    if len(pred) != len(centers):
        pred_used = pred[:len(centers)]
    else:
        pred_used = pred
    if 'rotation' in meta_data:
        rots = meta_data['rotation']
    else:
        rots = [0. for i in range(len(centers))]
    joints_original_batch = meta_data['original_joints']
    distance_list = []
    correct_cnt_sum = np.zeros((len(PCK_THRES)))
    all_src_coordinates = []
    for sample_idx in range(len(pred_used)):
        trans_inv = lip.get_affine_transform(centers[sample_idx], 
                                             scales[sample_idx], 
                                             rots[sample_idx], 
                                             (height, width), 
                                             inv=1
                                             )
        joints_original = joints_original_batch[sample_idx]        
        pred_src_coordinates = lip.affine_transform_modified(pred_used[sample_idx], 
                                                             trans_inv
                                                             ) 
        all_src_coordinates.append(pred_src_coordinates.reshape(1, len(pred_src_coordinates), 2))
        distance_list += get_distance(joints_original, pred_src_coordinates)
        correct_cnt_sum += get_PCK(pred_src_coordinates, joints_original)
    cnt = len(distance_list)
    avg_acc = sum(distance_list) / cnt
    others = {
        'src_coord': np.concatenate(all_src_coordinates, axis=0), # screen coordinates
        'joints_pred': pred, # predicted local coordinates
        'max_vals': max_vals, 
        'correct_cnt': correct_cnt_sum,
        'PCK_batch': correct_cnt_sum / cnt
        }
    return avg_acc, cnt, others

class AngleError():
    """
    Angle error in degrees. 
    """  
    def __init__(self, cfgs, num_joints=None):
        self.name = 'Angle error in degrees'
        self.num_joints = num_joints
        self.count = 0
        self.mean = 0.
        return
  
    def update(self, prediction, meta_data, ground_truth=None, logger=None):
        """
        the prediction and transformation parameters in meta_data are used.
        """    
        avg_acc, cnt, others = get_angle_error(prediction, meta_data)
        self.mean = (self.mean * self.count + cnt * avg_acc) / (self.count + cnt)
        self.count += cnt
        return 
    
    def report(self, logger):
        msg = 'Error type: {error_type:s}\t' \
              'Error: {Error}\t'.format(
                      error_type = self.name,
                      Error = self.mean)     
        logger.info(msg)        
        return

class JointDistance2DSIP():
    """
    Joint distance error evaluated for screen coordinates in the source image plane (SIP). 
    """  
    def __init__(self, cfgs, num_joints=None):
        self.name = 'Joint distance in the source image plane'
        if num_joints is not None:
            self.num_joints = num_joints
        else:
            self.num_joints = cfgs['heatmapModel']['num_joints']
        self.image_size = cfgs['heatmapModel']['input_size']
        if 'arg_max' in cfgs['testing_settings']:
            self.arg_max = cfgs['testing_settings']['arg_max']
        else:
            self.arg_max = None
            self.count = 0
            self.mean = 0.
            self.PCK_counts = np.zeros(len(PCK_THRES))
        return
  
    def update(self, prediction, meta_data, ground_truth=None, logger=None):
        """
        Update statistics for a batch.
        The prediction and transformation parameters in meta_data are used.
        """    
        avg_acc, cnt, others = get_distance_src(prediction, 
                                                meta_data,
                                                arg_max=self.arg_max,
                                                image_size=self.image_size
                                                )       
        self.mean = (self.mean * self.count + cnt * avg_acc) / (self.count + cnt)
        self.count += cnt
        self.PCK_counts += others['correct_cnt']
        return 
    
    def report(self, logger):
        """
        Report final evaluation results.
        """  
        logger.info("Ealuaton Results:")
        msg = 'Error type: {error_type:s}\t' \
              'MPJPE: {MPJPE}\t'.format(error_type = self.name, 
                                        MPJPE = self.mean
                                        )     
        logger.info(msg)        
        for idx, value in enumerate(self.PCK_counts):
            PCK = value / self.count
            logger.info('PCK at threshold {:.2f}: {:.3f}'.format(PCK_THRES[idx], PCK))        
        return

def update_statistics(self, update, num_data, name_str):
    """
    Update error statistics for a data batch.
    """ 
    old_count = getattr(self, 'count'+name_str)
    old_mean = getattr(self, 'mean'+name_str)
    old_max = getattr(self, 'max'+name_str)
    old_min = getattr(self, 'min'+name_str)
    new_mean = (old_count * old_mean + np.sum(update, axis=0)) / (old_count + num_data) 
    new_count = old_count + num_data
    new_max = np.maximum(old_max, update.max(axis=0))
    new_min = np.minimum(old_min, update.min(axis=0))
    setattr(self, 'mean'+name_str, new_mean)
    setattr(self, 'count'+name_str, new_count)
    setattr(self, 'max'+name_str, new_max)
    setattr(self, 'min'+name_str, new_min)    
    return

def update_rotation_error(self, 
                          prediction, 
                          ground_truth, 
                          meta_data=None, 
                          logger=None,
                          name_str='',
                          style='euler'
                          ):
    """
    Get rotation error between two 3D point clouds. 
    """    
    num_data = len(prediction)
    prediction = prediction.reshape(num_data, -1, 3)
    ground_truth = ground_truth.reshape(num_data, -1, 3)
    if style == 'euler':
        results = -np.ones((num_data, 3))
    for data_idx in range(num_data):
        R, T = ltr.compute_rigid_transform(prediction[data_idx].T, 
                                           ground_truth[data_idx].T
                                           )
        if style == 'euler':
            results[data_idx] = np.abs(Rotation.from_matrix(R).as_euler('xyz', 
                                                                        degrees=True
                                                                        )
                                       )
        else:
            raise NotImplementedError
    update_statistics(self, results, num_data, name_str)
    return

def update_joints_3d_error(self, 
                           prediction, 
                           ground_truth, 
                           meta_data=None, 
                           logger=None,
                           name_str='',
                           style='direct'
                           ):
    """
    Get distance error between prediction and ground truth.
    """
    ground_truth = ground_truth.reshape(len(ground_truth), -1, 3)
    prediction = prediction.reshape(len(prediction), -1, 3)
    num_joints = prediction.shape[1]
    if style == 'procrustes':
        # Apply procrustes alignment if asked to do so
        for j in range(len(prediction)):
            gt  = ground_truth[j]
            out = prediction[j]
            _, Z, T, b, c = compute_similarity_transform(gt, out, compute_optimal_scale=True)
            out = (b * out.dot(T)) + c
            prediction[j] = np.reshape(out, [num_joints, 3])
    sqerr = (ground_truth - prediction)**2 
    distance = np.sqrt(np.sum(sqerr, axis=2))        
    num_data = len(prediction)
    update_statistics(self, distance, num_data, name_str)    
    # provide detailed L1 errors if there is only one joint
    if num_joints == 1:
        error_xyz = np.abs(ground_truth - prediction)
        update_statistics(self, error_xyz, num_data, name_str + '_xyz')
    return    

class RotationError3D():
    """
    Helper class for recording rotation estimation error.
    """
    def __init__(self, cfgs):
        self.name = 'Rotation error'
        self.style = cfgs['metrics']['R3D']['style']
        self.count = 0
        if self.style == 'euler':
            self.mean = np.zeros((3))
            self.max = -np.ones((3))
            self.min = np.ones((3))*1e16
        return
    
    def update(self, prediction, ground_truth, meta_data=None, logger=None):
        """
        get rotation error between two point clouds 
        """    
        update_rotation_error(self, 
                              prediction, 
                              ground_truth, 
                              meta_data=meta_data, 
                              logger=logger,
                              style=self.style
                              )
        return 
    
    def report(self, logger):
        msg = 'Error type: {error_type:s}\t' \
              'Mean error: {mean_error}\t' \
              'Max error: {max_error}\t' \
              'Min error: {min_error}\t'.format(
                      error_type = self.name,
                      mean_error= self.mean, 
                      max_error= self.max,
                      min_error= self.min
                      )     
        logger.info(msg)        
        return
    
class JointDistance3D():
    """
    Helper class for recording joint distance error.
    """
    def __init__(self, cfgs):
        self.name = 'Joint distance'
        self.style = cfgs['metrics']['JD3D']['style']
        self.num_joints = int(cfgs['FCModel']['output_size']/3)
        self.count = 0
        if self.style in ['direct', 'procrustes']:
            self.mean = np.zeros((self.num_joints))
            self.max = -np.ones((self.num_joints))
            self.min = np.ones((self.num_joints))*1e16
        else:
            raise NotImplementedError
        return
  
    def update(self, prediction, ground_truth, meta_data=None, logger=None):
        """
        get Euclidean distance between two point clouds 
        """    
        update_joints_3d_error(self, 
                               prediction,
                               ground_truth,
                               meta_data=meta_data,
                               logger=logger,
                               name_str='',
                               style=self.style
                               )        
        return 
    
    def report(self, logger):
        MPJPE = self.mean.sum() / self.num_joints
        msg = 'Error type: {error_type:s}\t' \
              'MPJPE: {MPJPE}\t' \
              'Mean error for each joint: {mean_error}\t' \
              'Max error for each joint: {max_error}\t' \
              'Min error for each joint: {min_error}\t'.format(
                      error_type = self.name,
                      MPJPE = MPJPE,
                      mean_error= self.mean, 
                      max_error= self.max,
                      min_error= self.min
                      )     
        logger.info(msg)        
        return

class RError3D():
    def __init__(self, cfgs, num_joints):
        """
        Relative shape error
        The point cloud should have a format [shape_relative_to_root]
        """           
        self.name = 'RError3D'
        self.T_style = cfgs['metrics']['R3D']['T_style']
        self.R_style = cfgs['metrics']['R3D']['R_style']
        if cfgs['dataset']['3d_kpt_sample_style'] == 'bbox9': 
            self.num_joints = num_joints - 1 # discount the root joint
        else:
            raise NotImplementedError
        self.count_rT = self.count_R = 0
        # translation error of the shape relative to the root
        self.mean_rT = np.zeros((self.num_joints))
        self.max_rT = -np.ones((self.num_joints))
        self.min_rT = np.ones((self.num_joints))*1e16            
        # relative rotation between the ground truth shape and predicted shape
        self.mean_R = np.zeros((3))
        self.max_R = -np.ones((3))
        self.min_R = np.ones((3))*1e16            
        return
  
    def update(self, prediction, ground_truth, meta_data=None, logger=None):
        update_joints_3d_error(self, 
                               prediction=prediction,
                               ground_truth=ground_truth,
                               meta_data=meta_data,
                               logger=logger,
                               name_str='_rT',
                               style=self.T_style
                               )
        update_rotation_error(self,
                              prediction=prediction,
                              ground_truth=ground_truth,
                              meta_data=meta_data,
                              logger=logger,
                              name_str='_R',
                              style=self.R_style
                              )        
        return 
    
    def report(self, logger):
        MPJPE = self.mean_rT.sum() / self.num_joints
        msg = 'Error type: {error_type:s}\t' \
              'MPJPE of the shape relative to the root:\t' \
              'MPJPE: {MPJPE}\t' \
              'Rotation error of the shape relative to the root:\t' \
              'Mean error: {mean_R}\t' \
              'Max error: {max_R}\t' \
              'Min error: {min_R}\t'.format(
                  error_type = self.name,
                  MPJPE = MPJPE,
                  mean_R = self.mean_R,
                  max_R = self.max_R,
                  min_R = self.min_R
                  )     
        logger.info(msg)        
        return
    
class RTError3D():
    def __init__(self, cfgs, num_joints):
        """
        Rotation and translation error combined.
        The point cloud should have a format [root, shape_relative_to_root]
        """           
        self.name = 'RTError3D'
        self.T_style = cfgs['metrics']['RTError3D']['T_style']
        self.R_style = cfgs['metrics']['RTError3D']['R_style']
        if cfgs['dataset']['3d_kpt_sample_style'] == 'bbox9': 
            self.num_joints = num_joints - 1 # discount the root joint
        else:
            raise NotImplementedError
        self.count_T = self.count_T_xyz = self.count_rT = self.count_R = 0
        if self.T_style in ['direct', 'procrustes']:
            # translation error of the root vector
            self.mean_T = np.zeros((1))
            # L1 error for each component
            self.mean_T_xyz = np.zeros((3))
            self.max_T = -np.ones((1))
            self.max_T_xyz = -np.ones((3))
            self.min_T = np.ones((1))*1e16
            self.min_T_xyz = np.ones((3))*1e16
            # translation error of the shape relative to the root
            self.mean_rT = np.zeros((self.num_joints))
            self.max_rT = -np.ones((self.num_joints))
            self.min_rT = np.ones((self.num_joints))*1e16            
        else:
            raise NotImplementedError
        # relative rotation between the ground truth shape and predicted shape
        self.mean_R = np.zeros((3))
        self.max_R = -np.ones((3))
        self.min_R = np.ones((3))*1e16            
        return
  
    def update(self, prediction, ground_truth, meta_data=None, logger=None):
        update_joints_3d_error(self, 
                               prediction=prediction[:, :3],
                               ground_truth=ground_truth[:, :3],
                               meta_data=meta_data,
                               logger=logger,
                               name_str='_T',
                               style=self.T_style
                               )
        update_joints_3d_error(self, 
                               prediction=prediction[:, 3:],
                               ground_truth=ground_truth[:, 3:],
                               meta_data=meta_data,
                               logger=logger,
                               name_str='_rT',
                               style=self.T_style
                               )
        update_rotation_error(self,
                              prediction=prediction[:, 3:],
                              ground_truth=ground_truth[:, 3:],
                              meta_data=meta_data,
                              logger=logger,
                              name_str='_R',
                              style=self.R_style
                              )        
        return 
    
    def report(self, logger):
        MPJPE = self.mean_rT.sum() / self.num_joints
        msg = 'Error type: {error_type:s}\t' \
              'Translation error of the root:\t' \
              'Mean error: {mean_T}\t' \
              'Max error: {max_T}\t' \
              'Min error: {min_T}\t' \
              'Translation error of the root in three directions:\t' \
              'Mean error (L1): {mean_T_xyz}\t' \
              'MPJPE of the shape relative to the root:\t' \
              'MPJPE: {MPJPE}\t' \
              'Rotation error of the shape relative to the root:\t' \
              'Mean error: {mean_R}\t' \
              'Max error: {max_R}\t' \
              'Min error: {min_R}\t'.format(
                  error_type = self.name,
                  MPJPE = MPJPE,
                  mean_T = self.mean_T, 
                  max_T = self.max_T,
                  min_T = self.min_T,
                  mean_T_xyz = self.mean_T_xyz,
                  mean_R = self.mean_R,
                  max_R = self.max_R,
                  min_R = self.min_R)     
        logger.info(msg)        
        return
    
class Evaluator():
    """
    Helper class for recording a list of pre-defined metrics.
    """    
    def __init__(self, metrics, cfgs=None, num_joints=9):
        """
        metrics is a list of strings specifying what metrics to use
        """
        self.metrics = []
        for metric in metrics:
            self.metrics.append(eval(metric + '(cfgs=cfgs, num_joints=num_joints)'))
        return
    
    def update(self, 
               prediction, 
               ground_truth=None,
               meta_data=None,
               logger=None
               ):
        """
        update evaluation with a new batch of prediction and ground truth
        """        
        for metric in self.metrics:
            metric.update(prediction, 
                          ground_truth=ground_truth,
                          meta_data=meta_data,
                          logger=logger
                          )
        return 
    
    def report(self, logger):
        for metric in self.metrics:
            metric.report(logger)       
        return


================================================
FILE: libs/model/FCmodel.py
================================================
"""
Fully-connected model architecture for processing 1D data.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, 
                 num_neurons, 
                 p_dropout=0.5, 
                 kaiming=False, 
                 leaky=False):
        super(ResidualBlock, self).__init__()
        self.num_neurons = num_neurons
        self.leaky = leaky
        self.p_dropout = p_dropout
        if leaky:
            self.relu = nn.LeakyReLU(inplace=True)
        else:
            self.relu = nn.ReLU(inplace=True)
        self.dropout = nn.Dropout(p_dropout)
        self.w1 = nn.Linear(self.num_neurons, self.num_neurons)
        self.batch_norm1 = nn.BatchNorm1d(self.num_neurons)
        self.w2 = nn.Linear(self.num_neurons, self.num_neurons)
        self.batch_norm2 = nn.BatchNorm1d(self.num_neurons)
        if kaiming:
            # kaiming initialization
            self.w1.weight.data = nn.init.kaiming_normal_(self.w1.weight.data)
            self.w2.weight.data = nn.init.kaiming_normal_(self.w2.weight.data)
            
    def forward(self, x):
        y = self.w1(x)
        y = self.batch_norm1(y)
        y = self.relu(y)
        y = self.dropout(y)
        y = self.w2(y)
        y = self.batch_norm2(y)
        y = self.relu(y)
        y = self.dropout(y)
        out = x + y
        return out

class FCModel(nn.Module):
    def __init__(self,
                 stage_id=1,
                 num_neurons=1024,
                 num_blocks=2,
                 p_dropout=0.5,
                 norm_twoD=False,
                 kaiming=False,
                 refine_3d=False, 
                 leaky=False,
                 dm=False,
                 input_size=32,
                 output_size=64):
        """
        dm: use distance matrix feature computed from coordinates (DEPRECATED)
        leaky: use leaky ReLu instead of normal Relu
        """
        super(FCModel, self).__init__()
        self.num_neurons = num_neurons
        self.p_dropout = p_dropout
        self.num_blocks = num_blocks
        self.stage_id = stage_id
        self.refine_3d = refine_3d
        self.leaky = leaky
        self.dm = dm 
        self.input_size = input_size        
        self.output_size = output_size
        # map the input to a representation vector
        self.w1 = nn.Linear(self.input_size, self.num_neurons)
        self.batch_norm1 = nn.BatchNorm1d(self.num_neurons)
        self.res_blocks = []
        for l in range(num_blocks):
            self.res_blocks.append(ResidualBlock(num_neurons=self.num_neurons, 
                                                 p_dropout=self.p_dropout,
                                                 leaky=self.leaky))
        self.res_blocks = nn.ModuleList(self.res_blocks)
        # output
        self.w2 = nn.Linear(self.num_neurons, self.output_size)
        if self.leaky:
            self.relu = nn.LeakyReLU(inplace=True)
        else:
            self.relu = nn.ReLU(inplace=True)
        self.dropout = nn.Dropout(self.p_dropout)
        if kaiming:
            self.w1.weight.data = nn.init.kaiming_normal_(self.w1.weight.data)
            self.w2.weight.data = nn.init.kaiming_normal_(self.w2.weight.data)
            
    def forward(self, x):
        y = self.get_representation(x)
        y = self.w2(y)
        return y
    
    def get_representation(self, x):
        y = self.w1(x)
        y = self.batch_norm1(y)
        y = self.relu(y)
        y = self.dropout(y)
        # residual blocks
        for i in range(self.num_blocks):
            y = self.res_blocks[i](y)        
        return y
    
def get_fc_model(stage_id, 
                 cfgs,
                 input_size,
                 output_size,
                 architecture_type = 'FCModel'):
    return FCModel(stage_id=stage_id, 
                   refine_3d=cfgs[architecture_type]['refine_3d'], 
                   norm_twoD=cfgs[architecture_type]['norm_twoD'],
                   num_blocks=cfgs[architecture_type]['num_blocks'], 
                   input_size=input_size, 
                   output_size=output_size, 
                   num_neurons=cfgs[architecture_type]['num_neurons'],
                   p_dropout=cfgs[architecture_type]['dropout'],
                   leaky=cfgs[architecture_type]['leaky']
                   )
    
def get_cascade():
    return nn.ModuleList([])

================================================
FILE: libs/model/__init__.py
================================================
import libs.model.heatmapModel.hrnet
import libs.model.heatmapModel.resnet


================================================
FILE: libs/model/egonet.py
================================================
"""
A PyTorch implementation of Ego-Net.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import torch
import torch.nn as nn
import numpy as np
import cv2
import math

from scipy.spatial.transform import Rotation
from os.path import join as pjoin

import libs.model as models
import libs.model.FCmodel as FCmodel
import libs.dataset.normalization.operations as nop
import libs.visualization.egonet_utils as vego
import libs.common.transformation as ltr

from libs.common.img_proc import to_npy, get_affine_transform, generate_xy_map, modify_bbox
from libs.common.img_proc import affine_transform_modified
from libs.common.format import save_txt_file, get_pred_str
from libs.dataset.KITTI.car_instance import interp_dict

class EgoNet(nn.Module):
    def __init__(self,
                 cfgs,
                 pre_trained=False
                 ):
        """
        Initialization method of Ego-Net.
        """
        super(EgoNet, self).__init__()
        # initialize a fully-convolutional heatmap regression model
        # this model corresponds to H and C in Equation (2)
        hm_model_settings = cfgs['heatmapModel']
        hm_model_name = hm_model_settings['name']
        # this implementation uses a HR-Net backbone, yet you can use other 
        # backbones as well
        method_str = 'models.heatmapModel.' + hm_model_name + '.get_pose_net'
        self.HC = eval(method_str)(cfgs, is_train=False)
        self.resolution = cfgs['heatmapModel']['input_size']
        # optional channel augmentation
        if 'add_xy' in cfgs['heatmapModel']:
            self.xy_dict = {'flag':cfgs['heatmapModel']['add_xy']}
        else:
            self.xy_dict = None
        # initialize a lifing model
        # this corresponds to L in Equation (2) 
        self.L = FCmodel.get_fc_model(stage_id=1, 
                                      cfgs=cfgs, 
                                      input_size=cfgs['FCModel']['input_size'],
                                      output_size=cfgs['FCModel']['output_size']
                                      )
        if pre_trained:
            # load pre-trained checkpoints
            HC_path = pjoin(cfgs['dirs']['ckpt'], 'HC.pth')
            L_path = pjoin(cfgs['dirs']['ckpt'], 'L.pth')
            LS_path = pjoin(cfgs['dirs']['ckpt'], 'LS.npy')
            self.HC.load_state_dict(torch.load(HC_path))
            # the statistics used by the lifter for normalizing inputs
            self.LS = np.load(LS_path, allow_pickle=True).item()
            self.L.load_state_dict(torch.load(L_path))
    
    def crop_single_instance(self, 
                             img, 
                             bbox, 
                             resolution, 
                             pth_trans=None, 
                             xy_dict=None
                             ):
        """
        Crop a single instance given an image and bounding box.
        """
        bbox = to_npy(bbox)
        width, height = resolution
        target_ar = height / width
        ret = modify_bbox(bbox, target_ar)
        c, s, r = ret['c'], ret['s'], 0.
        # xy_dict: parameters for adding xy coordinate maps
        trans = get_affine_transform(c, s, r, (height, width))
        instance = cv2.warpAffine(img,
                                  trans,
                                  (int(resolution[0]), int(resolution[1])),
                                  flags=cv2.INTER_LINEAR
                                  )
        #cv2.imwrite('instance.jpg', instance)
        if xy_dict is not None and xy_dict['flag']:
            xymap = generate_xy_map(ret['bbox'], resolution, img.shape[:-1])
            instance = np.concatenate([instance, xymap.astype(np.float32)], axis=2)        
        instance = instance if pth_trans is None else pth_trans(instance)
        return instance
    
    def load_cv2(self, path, rgb=True):
        data_numpy = cv2.imread(path, 1 | 128)    
        if data_numpy is None:
            raise ValueError('Fail to read {}'.format(path))    
        if rgb:
            data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)         
        return data_numpy
    
    def crop_instances(self, 
                       annot_dict, 
                       resolution, 
                       pth_trans=None, 
                       rgb=True,
                       xy_dict=None
                       ):
        """
        Crop input instances given an annotation dictionary.
        """
        all_instances = []
        # each record stores attributes of one instance
        all_records = []
        target_ar = resolution[1] / resolution[0]
        for idx, path in enumerate(annot_dict['path']):
            data_numpy = self.load_cv2(path)
            boxes = annot_dict['boxes'][idx]
            if 'labels' in annot_dict:
                labels = annot_dict['labels'][idx]
            else:
                labels = -np.ones((len(boxes)), dtype=np.int64)
            if 'scores' in annot_dict:
                scores = annot_dict['scores'][idx]
            else:
                scores = -np.ones((len(boxes)))
            if len(boxes) == 0:
                continue
            for idx, bbox in enumerate(boxes):
                # crop an instance with required aspect ratio
                instance = self.crop_single_instance(data_numpy,
                                                     bbox, 
                                                     resolution, 
                                                     pth_trans=pth_trans,
                                                     xy_dict=xy_dict
                                                     )
                bbox = to_npy(bbox)
                ret = modify_bbox(bbox, target_ar)
                c, s, r = ret['c'], ret['s'], 0.
                all_instances.append(torch.unsqueeze(instance, dim=0))
                all_records.append({
                    'path': path,
                    'center': c,
                    'scale': s,
                    'bbox': bbox,
                    'bbox_resize': ret['bbox'],
                    'rotation': r,
                    'label': labels[idx],
                    'score': scores[idx]
                    }
                    )
        return torch.cat(all_instances, dim=0), all_records

    def add_orientation_arrow(self, record):
        """
        Generate an arrow for each predicted orientation for visualization.
        """      
        pred_kpts = record['kpts_3d_pred']
        gt_kpts = record['kpts_3d_gt']
        K = record['K']
        arrow_2d = np.zeros((len(pred_kpts), 2, 2))
        for idx in range(len(pred_kpts)):
            vector_3d = (pred_kpts[idx][1] - pred_kpts[idx][5])
            arrow_3d = np.concatenate([gt_kpts[idx][0].reshape(3, 1), 
                                      (gt_kpts[idx][0] + vector_3d).reshape(3, 1)],
                                      axis=1)
            projected = K @ arrow_3d
            arrow_2d[idx][0] = projected[0, :] / projected[2, :]
            arrow_2d[idx][1] = projected[1, :] / projected[2, :]
            # fix the arrow length if not fore-shortened
            vector_2d = arrow_2d[idx][:,1] - arrow_2d[idx][:,0]
            length = np.linalg.norm(vector_2d)
            if length > 50:
                vector_2d = vector_2d/length * 60
            arrow_2d[idx][:,1] = arrow_2d[idx][:,0] + vector_2d
        return arrow_2d

    def write_annot_dict(self, annot_dict, records):
        for idx, path in enumerate(annot_dict['path']):
            if 'boxes' in annot_dict:
                records[path]['boxes'] = to_npy(annot_dict['boxes'][idx])
            if 'kpts' in annot_dict:
                records[path]['kpts_2d_gt'] = to_npy(annot_dict['kpts'][idx])   
            if 'kpts_3d_gt' in annot_dict:
                records[path]['kpts_3d_gt'] = to_npy(annot_dict['kpts_3d_gt'][idx])   
            if 'pose_vecs_gt' in annot_dict:            
                records[path]['pose_vecs_gt'] = to_npy(annot_dict['pose_vecs_gt'][idx])  
            if 'kpts_3d_before' in annot_dict:
                records[path]['kpts_3d_before'] = to_npy(annot_dict['kpts_3d_before'][idx])  
            if 'raw_txt_format' in annot_dict:
                # list of annotation dictionary for each instance
                records[path]['raw_txt_format'] = annot_dict['raw_txt_format'][idx]
            if 'K' in annot_dict:
                # list of annotation dictionary for each instance
                records[path]['K'] = annot_dict['K'][idx]
            if 'kpts_3d_gt' in annot_dict and 'K' in annot_dict:
                records[path]['arrow'] = self.add_orientation_arrow(records[path])
        return records

    def get_observation_angle_trans(self, euler_angles, translations):
        """
        Convert orientation in camera coordinate into local coordinate system
        utilizing known object location (translation)
        """ 
        alphas = euler_angles[:,1].copy()
        for idx in range(len(euler_angles)):
            ry3d = euler_angles[idx][1] # orientation in the camera coordinate system
            x3d, z3d = translations[idx][0], translations[idx][2]
            alpha = ry3d - math.atan2(-z3d, x3d) - 0.5 * math.pi
            #alpha = ry3d - math.atan2(x3d, z3d)# - 0.5 * math.pi
            while alpha > math.pi: alpha -= math.pi * 2
            while alpha < (-math.pi): alpha += math.pi * 2
            alphas[idx] = alpha
        return alphas
    
    def get_observation_angle_proj(self, euler_angles, kpts, K):
        """
        Convert orientation in camera coordinate into local coordinate system
        utilizing the projection of object on the image plane
        """ 
        f = K[0,0]
        cx = K[0,2]
        kpts_x = [kpts[i][0,0] for i in range(len(kpts))]
        alphas = euler_angles[:,1].copy()
        for idx in range(len(euler_angles)):
            ry3d = euler_angles[idx][1] # orientation in the camera coordinate system
            x3d, z3d = kpts_x[idx] - cx, f
            alpha = ry3d - math.atan2(-z3d, x3d) - 0.5 * math.pi
            #alpha = ry3d - math.atan2(x3d, z3d)# - 0.5 * math.pi
            while alpha > math.pi: alpha -= math.pi * 2
            while alpha < (-math.pi): alpha += math.pi * 2
            alphas[idx] = alpha
        return alphas

    def get_template(self, prediction, interp_coef=[0.332, 0.667]):
        """
        Construct a template 3D cuboid at canonical pose. The 3D cuboid is 
        represented as part coordinates in the camera coordinate system.
        """ 
        parents = prediction[interp_dict['bbox12'][0] - 1]
        children = prediction[interp_dict['bbox12'][1] - 1]
        lines = parents - children
        lines = np.sqrt(np.sum(lines**2, axis=1))
        # averaged over the four parallel line segments
        h, l, w = np.sum(lines[:4])/4, np.sum(lines[4:8])/4, np.sum(lines[8:])/4
        x_corners = [l, l, l, l, 0, 0, 0, 0]
        y_corners = [0, h, 0, h, 0, h, 0, h]
        z_corners = [w, w, 0, 0, w, w, 0, 0]
        x_corners += - np.float32(l) / 2
        y_corners += - np.float32(h)
        #y_corners += - np.float32(h/2)
        z_corners += - np.float32(w) / 2
        corners_3d = np.array([x_corners, y_corners, z_corners])    
        if len(prediction) == 32:
            pidx, cidx = interp_dict['bbox12']
            parents, children = corners_3d[:, pidx - 1], corners_3d[:, cidx - 1]
            lines = children - parents
            new_joints = [(parents + interp_coef[i]*lines) for i in range(len(interp_coef))]
            corners_3d = np.hstack([corners_3d, np.hstack(new_joints)])    
        return corners_3d

    def kpts_to_euler(self, template, prediction):
        """
        Convert the predicted cuboid representation to euler angles.
        """    
        # estimate roll, pitch, yaw of the prediction by comparing with a 
        # reference bounding box
        # prediction and template of shape [3, N_points]
        R, T = ltr.compute_rigid_transform(template, prediction)
        # in the order of yaw, pitch and roll
        angles = Rotation.from_matrix(R).as_euler('yxz', degrees=False)
        # re-order in the order of x, y and z
        angles = angles[[1,0,2]]
        return angles, T

    def get_6d_rep(self, predictions, ax=None, color="black"):
        """
        Get the 6DoF representation of a 3D prediction.
        """    
        predictions = predictions.reshape(len(predictions), -1, 3)
        all_angles = []
        for instance_idx in range(len(predictions)):
            prediction = predictions[instance_idx]
            # templates are 3D boxes with no rotation
            # the prediction is estimated as the rotation between prediction and template
            template = self.get_template(prediction)
            instance_angle, instance_trans = self.kpts_to_euler(template, prediction.T)        
            all_angles.append(instance_angle.reshape(1, 3))
        angles = np.concatenate(all_angles)
        # the first point is the predicted point center
        translation = predictions[:, 0, :]    
        return angles, translation

    def gather_lifting_results(self,
                               record,
                               data,
                               prediction, 
                               target=None,
                               pose_vecs_gt=None,
                               intrinsics=None, 
                               refine=False, 
                               visualize=False,
                               template=None,
                               dist_coeffs=np.zeros((4,1)),
                               color='r',
                               get_str=False,
                               alpha_mode='trans'
                               ):
        """
        Convert network outputs to pose angles.
        """
        # prepare the prediction strings for submission
        # compute the roll, pitch and yaw angle of the predicted bounding box
        record['euler_angles'], record['translation'] = \
            self.get_6d_rep(record['kpts_3d_pred'])
        if alpha_mode == 'trans':
            record['alphas'] = self.get_observation_angle_trans(record['euler_angles'], 
                                                                record['translation']
                                                                )
        elif alpha_mode == 'proj':
            record['alphas'] = self.get_observation_angle_proj(record['euler_angles'],
                                                               record['kpts_2d_pred'],
                                                               record['K']
                                                               )        
        else:
             raise NotImplementedError   
        if get_str:
            record['pred_str'] = get_pred_str(record)      
        if visualize:
            record = vego.plot_3d_objects(prediction, 
                                          target, 
                                          pose_vecs_gt, 
                                          record, 
                                          color
                                          )
        return record

    def plot_one_image(self, 
                       img_path, 
                       record, 
                       visualize=False,
                       color_dict={'bbox_2d':'r',
                                   'bbox_3d':'r',
                                   'kpts':['rx', 'b']
                                   },
                       save_dict={'flag':False,
                                  'save_dir':None
                                  },
                       alpha_mode='trans'
                       ):
        """
        Post-process and plot the predictions from one image.
        """
        if visualize:
            # plot 2D predictions 
            vego.plot_2d_objects(img_path, record, color_dict)
        # plot 3d bounding boxes
        all_kpts_2d = np.concatenate(record['kpts_2d_pred'])
        all_kpts_3d_pred = record['kpts_3d_pred'].reshape(len(record['kpts_3d_pred']), -1)
        if 'kpts_3d_gt' in record:
            all_kpts_3d_gt = record['kpts_3d_gt']
            all_pose_vecs_gt = record['pose_vecs_gt']
        else:
            all_kpts_3d_gt = None
            all_pose_vecs_gt = None
        # refine and gather the prediction strings
        record = self.gather_lifting_results(record,
                                             all_kpts_2d,
                                             all_kpts_3d_pred, 
                                             all_kpts_3d_gt,
                                             all_pose_vecs_gt,
                                             color=color_dict['bbox_3d'],
                                             alpha_mode=alpha_mode,
                                             visualize=visualize,
                                             get_str=save_dict['flag']
                                             )

        # save KITTI-style prediction file in .txt format
        save_txt_file(img_path, record, save_dict)
        return record

    def post_process(self, 
                     records,
                     visualize=False, 
                     color_dict={'bbox_2d':'r',
                                 'kpts':['ro', 'b'],
                                 },
                     save_dict={'flag':False,
                                'save_dir':None
                                },
                     alpha_mode='trans'
                     ):
        """
        Save save and visualize them optionally.
        """   
        for img_path in records.keys():
            print("Processing {:s}".format(img_path))
            records[img_path] = self.plot_one_image(img_path, 
                                                    records[img_path],
                                                    visualize=visualize,
                                                    color_dict=color_dict,
                                                    save_dict=save_dict,
                                                    alpha_mode=alpha_mode
                                                    )      
        return records
    
    def new_img_dict(self):
        """
        An empty dictionary for image-level records.
        """
        img_dict = {'center':[], 
                    'scale':[], 
                    'rotation':[], 
                    'bbox_resize':[], # resized bounding box 
                    'kpts_2d_pred':[], 
                    'label':[], 
                    'score':[] 
                    }        
        return img_dict
    
    def get_keypoints(self,
                      instances, 
                      records, 
                      is_cuda=True
                      ):
        """
        Foward pass to obtain the screen coordinates.
        """
        if is_cuda:
            instances = instances.cuda()
        output = self.HC(instances)
        # local part coordinates
        width, height = self.resolution
        local_coord = output[1].data.cpu().numpy()
        local_coord *= np.array(self.resolution).reshape(1, 1, 2)
        # transform local part coordinates to screen coordinates
        centers = [records[i]['center'] for i in range(len(records))]
        scales = [records[i]['scale'] for i in range(len(records))]
        rots = [records[i]['rotation'] for i in range(len(records))]    
        for instance_idx in range(len(local_coord)):
            trans_inv = get_affine_transform(centers[instance_idx],
                                             scales[instance_idx], 
                                             rots[instance_idx], 
                                             (height, width), 
                                             inv=1
                                             )
            screen_coord = affine_transform_modified(local_coord[instance_idx], 
                                                     trans_inv
                                                     ) 
            records[instance_idx]['kpts'] = screen_coord
        # assemble a dictionary where each key corresponds to one image
        ret = {}
        for record in records:
            path = record['path']
            if path not in ret:
                ret[path] = self.new_img_dict()
            ret[path]['kpts_2d_pred'].append(record['kpts'].reshape(1, -1))
            ret[path]['center'].append(record['center'])
            ret[path]['scale'].append(record['scale'])
            ret[path]['bbox_resize'].append(record['bbox_resize'])
            ret[path]['label'].append(record['label'])
            ret[path]['score'].append(record['score'])
            ret[path]['rotation'].append(record['rotation'])
        return ret

    def lift_2d_to_3d(self, records, cuda=True):
        """
        Foward-pass of the lifter sub-model.
        """      
        for path in records.keys():
            data = np.concatenate(records[path]['kpts_2d_pred'], axis=0)
            data = nop.normalize_1d(data, self.LS['mean_in'], self.LS['std_in'])
            data = data.astype(np.float32)
            data = torch.from_numpy(data)
            if cuda:
                data = data.cuda()
            prediction = self.L(data)  
            prediction = nop.unnormalize_1d(prediction.data.cpu().numpy(),
                                            self.LS['mean_out'], 
                                            self.LS['std_out']
                                            )
            records[path]['kpts_3d_pred'] = prediction.reshape(len(prediction), -1, 3)
        return records
    
    def forward(self, annot_dict):
        """
        Process a batch of images.
        
        annot_dict is a Python dictionary storing the following keys: 
            path: list of image paths
            boxes: list of bounding boxes for each image
        """
        all_instances, all_records = self.crop_instances(annot_dict, 
                                                         resolution=self.resolution,
                                                         pth_trans=self.pth_trans,
                                                         xy_dict=self.xy_dict
                                                         )
        # all_records stores records for each instance
        records = self.get_keypoints(all_instances, all_records)
        # records stores records for each image
        records = self.lift_2d_to_3d(records)
        # write the annotation dictionary
        records = self.write_annot_dict(annot_dict, records)
        return records

================================================
FILE: libs/model/heatmapModel/__init__.py
================================================


================================================
FILE: libs/model/heatmapModel/hrnet.py
================================================
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# Modified by Shichao Li (nicholas.li@connect.ust.hk)
# ------------------------------------------------------------------------------

# from __future__ import absolute_import
# from __future__ import division
# from __future__ import print_function

import os
import logging

import torch
import torch.nn as nn

import numpy as np

BN_MOMENTUM = 0.1
logger = logging.getLogger(__name__)


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)

def basicdownsample(in_planes, out_planes):
    downsample = nn.Sequential(
    nn.Conv2d(
        in_planes,
        out_planes,
        kernel_size=1, 
        stride=2, 
        bias=False
        ),
    nn.BatchNorm2d(
        out_planes
        ),
    )
    return downsample

class BasicLinearModule(nn.Module):
    def __init__(self, in_channels, out_channels, mid_channels=512):
        super(BasicLinearModule, self).__init__()
        self.l1 = nn.Linear(in_channels, out_channels)
        # self.l1 = nn.Linear(in_channels, mid_channels)
        # self.bn1 = nn.BatchNorm1d(mid_channels, momentum=BN_MOMENTUM)
        # self.relu = nn.ReLU(inplace=True)
        # self.l2 = nn.Linear(mid_channels, out_channels)

    def forward(self, x):
        x = x.view(len(x), -1)

        out = self.l1(x)
        # out = self.bn1(out)
        # out = self.relu(out)

        # out = self.l2(out)
        return out
    
class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
                               bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion,
                                  momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class HighResolutionModule(nn.Module):
    def __init__(self, num_branches, blocks, num_blocks, num_inchannels,
                 num_channels, fuse_method, multi_scale_output=True):
        super(HighResolutionModule, self).__init__()
        self._check_branches(
            num_branches, blocks, num_blocks, num_inchannels, num_channels)

        self.num_inchannels = num_inchannels
        self.fuse_method = fuse_method
        self.num_branches = num_branches

        self.multi_scale_output = multi_scale_output

        self.branches = self._make_branches(
            num_branches, blocks, num_blocks, num_channels)
        self.fuse_layers = self._make_fuse_layers()
        self.relu = nn.ReLU(True)

    def _check_branches(self, num_branches, blocks, num_blocks,
                        num_inchannels, num_channels):
        if num_branches != len(num_blocks):
            error_msg = 'NUM_BRANCHES({}) <> NUM_BLOCKS({})'.format(
                num_branches, len(num_blocks))
            logger.error(error_msg)
            raise ValueError(error_msg)

        if num_branches != len(num_channels):
            error_msg = 'NUM_BRANCHES({}) <> NUM_CHANNELS({})'.format(
                num_branches, len(num_channels))
            logger.error(error_msg)
            raise ValueError(error_msg)

        if num_branches != len(num_inchannels):
            error_msg = 'NUM_BRANCHES({}) <> NUM_INCHANNELS({})'.format(
                num_branches, len(num_inchannels))
            logger.error(error_msg)
            raise ValueError(error_msg)

    def _make_one_branch(self, branch_index, block, num_blocks, num_channels,
                         stride=1):
        downsample = None
        if stride != 1 or \
           self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(
                    self.num_inchannels[branch_index],
                    num_channels[branch_index] * block.expansion,
                    kernel_size=1, stride=stride, bias=False
                ),
                nn.BatchNorm2d(
                    num_channels[branch_index] * block.expansion,
                    momentum=BN_MOMENTUM
                ),
            )

        layers = []
        layers.append(
            block(
                self.num_inchannels[branch_index],
                num_channels[branch_index],
                stride,
                downsample
            )
        )
        self.num_inchannels[branch_index] = \
            num_channels[branch_index] * block.expansion
        for i in range(1, num_blocks[branch_index]):
            layers.append(
                block(
                    self.num_inchannels[branch_index],
                    num_channels[branch_index]
                )
            )

        return nn.Sequential(*layers)

    def _make_branches(self, num_branches, block, num_blocks, num_channels):
        branches = []

        for i in range(num_branches):
            branches.append(
                self._make_one_branch(i, block, num_blocks, num_channels)
            )

        return nn.ModuleList(branches)

    def _make_fuse_layers(self):
        if self.num_branches == 1:
            return None

        num_branches = self.num_branches
        num_inchannels = self.num_inchannels
        fuse_layers = []
        for i in range(num_branches if self.multi_scale_output else 1):
            fuse_layer = []
            for j in range(num_branches):
                if j > i:
                    fuse_layer.append(
                        nn.Sequential(
                            nn.Conv2d(
                                num_inchannels[j],
                                num_inchannels[i],
                                1, 1, 0, bias=False
                            ),
                            nn.BatchNorm2d(num_inchannels[i]),
                            nn.Upsample(scale_factor=2**(j-i), mode='nearest')
                        )
                    )
                elif j == i:
                    fuse_layer.append(None)
                else:
                    conv3x3s = []
                    for k in range(i-j):
                        if k == i - j - 1:
                            num_outchannels_conv3x3 = num_inchannels[i]
                            conv3x3s.append(
                                nn.Sequential(
                                    nn.Conv2d(
                                        num_inchannels[j],
                                        num_outchannels_conv3x3,
                                        3, 2, 1, bias=False
                                    ),
                                    nn.BatchNorm2d(num_outchannels_conv3x3)
                                )
                            )
                        else:
                            num_outchannels_conv3x3 = num_inchannels[j]
                            conv3x3s.append(
                                nn.Sequential(
                                    nn.Conv2d(
                                        num_inchannels[j],
                                        num_outchannels_conv3x3,
                                        3, 2, 1, bias=False
                                    ),
                                    nn.BatchNorm2d(num_outchannels_conv3x3),
                                    nn.ReLU(True)
                                )
                            )
                    fuse_layer.append(nn.Sequential(*conv3x3s))
            fuse_layers.append(nn.ModuleList(fuse_layer))

        return nn.ModuleList(fuse_layers)

    def get_num_inchannels(self):
        return self.num_inchannels

    def forward(self, x):
        if self.num_branches == 1:
            return [self.branches[0](x[0])]

        for i in range(self.num_branches):
            x[i] = self.branches[i](x[i])

        x_fuse = []

        for i in range(len(self.fuse_layers)):
            y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])
            for j in range(1, self.num_branches):
                if i == j:
                    y = y + x[j]
                else:
                    y = y + self.fuse_layers[i][j](x[j])
            x_fuse.append(self.relu(y))

        return x_fuse


blocks_dict = {
    'basic': BasicBlock,
    'bottleneck': Bottleneck
}


class PoseHighResolutionNet(nn.Module):

    def __init__(self, cfgs, **kwargs):
        self.inplanes = 64
        self.num_joints = cfgs['heatmapModel']['num_joints']
        extra = cfgs['heatmapModel']['extra']
        super(PoseHighResolutionNet, self).__init__()

        # stem net
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1,
                               bias=False)
        self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.layer1 = self._make_layer(Bottleneck, 64, 4)

        self.stage2_cfg = cfgs['heatmapModel']['extra']['stage2']
        num_channels = self.stage2_cfg['num_channels']
        block = blocks_dict[self.stage2_cfg['block']]
        num_channels = [
            num_channels[i] * block.expansion for i in range(len(num_channels))
        ]
        self.transition1 = self._make_transition_layer([256], num_channels)
        self.stage2, pre_stage_channels = self._make_stage(
            self.stage2_cfg, num_channels)

        self.stage3_cfg = cfgs['heatmapModel']['extra']['stage3']
        num_channels = self.stage3_cfg['num_channels']
        block = blocks_dict[self.stage3_cfg['block']]
        num_channels = [
            num_channels[i] * block.expansion for i in range(len(num_channels))
        ]
        self.transition2 = self._make_transition_layer(
            pre_stage_channels, num_channels)
        self.stage3, pre_stage_channels = self._make_stage(
            self.stage3_cfg, num_channels)

        self.stage4_cfg = cfgs['heatmapModel']['extra']['stage4']
        num_channels = self.stage4_cfg['num_channels']
        block = blocks_dict[self.stage4_cfg['block']]
        num_channels = [
            num_channels[i] * block.expansion for i in range(len(num_channels))
        ]
        self.transition3 = self._make_transition_layer(
            pre_stage_channels, num_channels)
        self.stage4, pre_stage_channels = self._make_stage(
            self.stage4_cfg, num_channels, multi_scale_output=False)
        
        self.pretrained_layers = cfgs['heatmapModel']['extra']['pretrained_layers']
        
        # network head
        self.head_type = cfgs['heatmapModel']['head_type']
        self.pixel_shuffle = cfgs['heatmapModel']['pixel_shuffle']
        if self.head_type == 'heatmap':
            self.final_layer = nn.Conv2d(
                in_channels=pre_stage_channels[0],
                out_channels=self.num_joints,
                kernel_size=extra['final_conv_kernel'],
                stride=1,
                padding=1 if extra['final_conv_kernel'] == 3 else 0
            )
            
            if cfgs['heatmapModel']['pixel_shuffle']:
            # Add a pixel shuffle upsampling layer to control the heatmap size
                self.upsamp_fact = int(cfgs['heatmapModel']['heatmap_size'][0]\
                    /cfgs['heatmapModel']['input_size'][0]*4)
                self.upsample_layer = nn.Sequential(
                        nn.Conv2d(self.num_joints, self.num_joints*self.upsamp_fact**2, 
                                  kernel_size=1),
                        nn.BatchNorm2d(self.num_joints*self.upsamp_fact**2),
                        nn.ReLU(inplace=True),                
                        nn.PixelShuffle(self.upsamp_fact)
                        )
        elif self.head_type == 'angleregression':
            num_chan = 256
            self.head = nn.Sequential(
                nn.Conv2d(
                    in_channels=pre_stage_channels[0],
                    out_channels=num_chan,
                    kernel_size=1,
                    stride=1,
                    padding=0
                    ), 
                # produce 8*8*num_joints tensor
                BasicBlock(num_chan, 
                           num_chan, 
                           stride=2,
                           downsample=basicdownsample(num_chan, num_chan)
                           ),
                BasicBlock(num_chan, 
                           num_chan, 
                           stride=2,
                           downsample=basicdownsample(num_chan, num_chan)
                           ),
                BasicBlock(num_chan, 
                           num_chan, 
                           stride=2,
                           downsample=basicdownsample(num_chan, num_chan)
                           ),
                BasicBlock(num_chan, 
                           num_chan, 
                           stride=2,
                           downsample=basicdownsample(num_chan, num_chan)
                           ),
                nn.AvgPool2d(kernel_size=4),                
                )
            self.final_fc = nn.Sequential(
                nn.Linear(256, 256),
                nn.BatchNorm1d(256),
                nn.ReLU(inplace=True),
                nn.Linear(256, 2)
                )
        elif self.head_type == 'coordinates':
            num_chan = self.num_joints
            map_width, map_height = cfgs['heatmapModel']['heatmap_size']
            ks = (int(map_height / 16), int(map_width / 16))
            self.head1 = nn.Sequential(
                nn.Conv2d(
                    in_channels=pre_stage_channels[0],
                    out_channels=self.num_joints,
                    kernel_size=1,
                    stride=1,
                    padding=0
                    ), 
                )
            self.head2 = nn.Sequential(
                BasicBlock(num_chan+2, 
                           num_chan*2, 
                           stride=2,
                           downsample=basicdownsample(num_chan+2, num_chan*2)
                           ),
                BasicBlock(num_chan*2, 
                           num_chan*2, 
                           stride=2,
                           downsample=basicdownsample(num_chan*2, num_chan*2)
                           ),
                BasicBlock(num_chan*2, 
                           num_chan*2, 
                           stride=2,
                           downsample=basicdownsample(num_chan*2, num_chan*2)
                           ),
                BasicBlock(num_chan*2, 
                           num_chan*2, 
                           stride=2,
                           downsample=basicdownsample(num_chan*2, num_chan*2)
                           ),
                nn.Conv2d(num_chan*2, num_chan*2, kernel_size=ks),
                nn.Sigmoid()
                ) 
            # coordinate convolution makes arg-max easier
            x_map = np.tile(np.linspace(0, 1, map_width), (map_height, 1))
            x_map = x_map.reshape(1, 1, map_height, map_width)
            y_map = np.linspace(0, 1, map_height).reshape(map_height, 1)
            y_map = np.tile(y_map, (1, map_width))
            y_map = y_map.reshape(1, 1, map_height, map_width)
            self.coor_maps = np.concatenate([x_map, y_map], axis=1).astype(np.float32)
            self.coor_maps = torch.from_numpy(self.coor_maps)
        else:
            raise NotImplementedError

    def _make_transition_layer(
            self, num_channels_pre_layer, num_channels_cur_layer):
        num_branches_cur = len(num_channels_cur_layer)
        num_branches_pre = len(num_channels_pre_layer)

        transition_layers = []
        for i in range(num_branches_cur):
            if i < num_branches_pre:
                if num_channels_cur_layer[i] != num_channels_pre_layer[i]:
                    transition_layers.append(
                        nn.Sequential(
                            nn.Conv2d(
                                num_channels_pre_layer[i],
                                num_channels_cur_layer[i],
                                3, 1, 1, bias=False
                            ),
                            nn.BatchNorm2d(num_channels_cur_layer[i]),
                            nn.ReLU(inplace=True)
                        )
                    )
                else:
                    transition_layers.append(None)
            else:
                conv3x3s = []
                for j in range(i+1-num_branches_pre):
                    inchannels = num_channels_pre_layer[-1]
                    outchannels = num_channels_cur_layer[i] \
                        if j == i-num_branches_pre else inchannels
                    conv3x3s.append(
                        nn.Sequential(
                            nn.Conv2d(
                                inchannels, outchannels, 3, 2, 1, bias=False
                            ),
                            nn.BatchNorm2d(outchannels),
                            nn.ReLU(inplace=True)
                        )
                    )
                transition_layers.append(nn.Sequential(*conv3x3s))

        return nn.ModuleList(transition_layers)

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(
                    self.inplanes, planes * block.expansion,
                    kernel_size=1, stride=stride, bias=False
                ),
                nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def _make_stage(self, layer_config, num_inchannels,
                    multi_scale_output=True):
        num_modules = layer_config['num_modules']
        num_branches = layer_config['num_branches']
        num_blocks = layer_config['num_blocks']
        num_channels = layer_config['num_channels']
        block = blocks_dict[layer_config['block']]
        fuse_method = layer_config['fuse_method']

        modules = []
        for i in range(num_modules):
            # multi_scale_output is only used last module
            if not multi_scale_output and i == num_modules - 1:
                reset_multi_scale_output = False
            else:
                reset_multi_scale_output = True

            modules.append(
                HighResolutionModule(
                    num_branches,
                    block,
                    num_blocks,
                    num_inchannels,
                    num_channels,
                    fuse_method,
                    reset_multi_scale_output
                )
            )
            num_inchannels = modules[-1].get_num_inchannels()

        return nn.Sequential(*modules), num_inchannels

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.layer1(x)

        x_list = []
        for i in range(self.stage2_cfg['num_branches']):
            if self.transition1[i] is not None:
                x_list.append(self.transition1[i](x))
            else:
                x_list.append(x)
        y_list = self.stage2(x_list)

        x_list = []
        for i in range(self.stage3_cfg['num_branches']):
            if self.transition2[i] is not None:
                x_list.append(self.transition2[i](y_list[-1]))
            else:
                x_list.append(y_list[i])
        y_list = self.stage3(x_list)

        x_list = []
        for i in range(self.stage4_cfg['num_branches']):
            if self.transition3[i] is not None:
                x_list.append(self.transition3[i](y_list[-1]))
            else:
                x_list.append(y_list[i])
        y_list = self.stage4(x_list)
        
        if self.head_type == 'heatmap':
            x = self.final_layer(y_list[0])        
            # upsampling
            if self.pixel_shuffle:
                x = self.upsample_layer(x)
        elif self.head_type == 'coordinates':
            maps = self.head1(y_list[0])
            # concatenate coordinate maps
            num_sample = len(maps)
            coor_maps = self.coor_maps.repeat(num_sample, 1, 1, 1).to(maps.device)
            augmented_maps = torch.cat([maps, coor_maps], dim=1)
            coordinates = self.head2(augmented_maps)
            x = (maps, coordinates.view(len(x), -1, 2))
        elif self.head_type == 'angleregression':
            maps = self.head(y_list[0])
            x = self.final_fc(maps.reshape(len(maps), -1))
        else:
            raise NotImplementedError()
        return x

    def init_weights(self, pretrained=''):
        logger.info('=> init weights from normal distribution')
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                nn.init.normal_(m.weight, std=0.001)
                for name, _ in m.named_parameters():
                    if name in ['bias']:
                        nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
            elif isinstance(m, nn.ConvTranspose2d):
                nn.init.normal_(m.weight, std=0.001)
                for name, _ in m.named_parameters():
                    if name in ['bias']:
                        nn.init.constant_(m.bias, 0)

        if os.path.isfile(pretrained):
            pretrained_state_dict = torch.load(pretrained)
            logger.info('=> loading pretrained model {}'.format(pretrained))

            need_init_state_dict = {}
            for name, m in pretrained_state_dict.items():
                if name.split('.')[0] in self.pretrained_layers \
                   or self.pretrained_layers[0] == '*':
                    need_init_state_dict[name] = m
            self.load_state_dict(need_init_state_dict, strict=False)
            logger.info('{:d} modules initialized.'.format(len(need_init_state_dict)))
        elif pretrained:
            logger.error('=> please download pre-trained models first!')
            raise ValueError('{} does not exist!'.format(pretrained))
    
    def modify_input_channel(self, num_channels):
        if num_channels == 3:
            return
        new_layer = nn.Conv2d(num_channels, 64, kernel_size=3, stride=2, padding=1,
                               bias=False)
        # copy the old weights
        with torch.no_grad():
            new_layer.weight[:,:3,:,:] = self.conv1.weight.clone()
        del self.conv1
        self.conv1 = new_layer
        return
    
    def load_my_state_dict(self, state_dict):
        own_state = self.state_dict()
        for name, param in state_dict.items():
            if name not in own_state:
                 continue
            param = param.data
            own_state[name].copy_(param)

def is_freezed(name, freeze_names):
    for prefix in freeze_names:
        if name.startswith(prefix):
            return True
    return False
            
def get_pose_net(cfgs, is_train, **kwargs):
    model = PoseHighResolutionNet(cfgs, **kwargs)

    if is_train and cfgs['heatmapModel']['init_weights']:
        model.init_weights(cfgs['heatmapModel']['pretrained'])

    # freeze specified pre-trained layers
    freeze_names = cfgs['heatmapModel']['extra'].get('freeze_layers', [])
    for name, param in model.named_parameters():
        if is_freezed(name, freeze_names):
            param.requires_grad = False
            print('{:s} freezed during training.'.format(name))
                
    if cfgs['heatmapModel']['add_xy']:
        model.modify_input_channel(5)
    return model

================================================
FILE: libs/model/heatmapModel/resnet.py
================================================
# ------------------------------------------------------------------------------
# Copyright (c) Microsoft
# Licensed under the MIT License.
# Written by Bin Xiao (Bin.Xiao@microsoft.com)
# ------------------------------------------------------------------------------

#from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import os
import logging

import torch
import torch.nn as nn


BN_MOMENTUM = 0.1
logger = logging.getLogger(__name__)


def conv3x3(in_planes, out_planes, stride=1):
    """3x3 convolution with padding"""
    return nn.Conv2d(
        in_planes, out_planes, kernel_size=3, stride=stride,
        padding=1, bias=False
    )


class BasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes)
        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,
                               bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion,
                                  momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out


class PoseResNet(nn.Module):

    def __init__(self, block, layers, cfg, **kwargs):
        self.inplanes = 64
        extra = cfg.MODEL.EXTRA
        self.deconv_with_bias = extra.DECONV_WITH_BIAS

        super(PoseResNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
                               bias=False)
        self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)

        # used for deconv layers
        self.deconv_layers = self._make_deconv_layer(
            extra.NUM_DECONV_LAYERS,
            extra.NUM_DECONV_FILTERS,
            extra.NUM_DECONV_KERNELS,
        )

        self.final_layer = nn.Conv2d(
            in_channels=extra.NUM_DECONV_FILTERS[-1],
            out_channels=cfg.MODEL.NUM_JOINTS,
            kernel_size=extra.FINAL_CONV_KERNEL,
            stride=1,
            padding=1 if extra.FINAL_CONV_KERNEL == 3 else 0
        )

    def _make_layer(self, block, planes, blocks, stride=1):
        downsample = None
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes * block.expansion,
                          kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),
            )

        layers = []
        layers.append(block(self.inplanes, planes, stride, downsample))
        self.inplanes = planes * block.expansion
        for i in range(1, blocks):
            layers.append(block(self.inplanes, planes))

        return nn.Sequential(*layers)

    def _get_deconv_cfg(self, deconv_kernel, index):
        if deconv_kernel == 4:
            padding = 1
            output_padding = 0
        elif deconv_kernel == 3:
            padding = 1
            output_padding = 1
        elif deconv_kernel == 2:
            padding = 0
            output_padding = 0

        return deconv_kernel, padding, output_padding

    def _make_deconv_layer(self, num_layers, num_filters, num_kernels):
        assert num_layers == len(num_filters), \
            'ERROR: num_deconv_layers is different len(num_deconv_filters)'
        assert num_layers == len(num_kernels), \
            'ERROR: num_deconv_layers is different len(num_deconv_filters)'

        layers = []
        for i in range(num_layers):
            kernel, padding, output_padding = \
                self._get_deconv_cfg(num_kernels[i], i)

            planes = num_filters[i]
            layers.append(
                nn.ConvTranspose2d(
                    in_channels=self.inplanes,
                    out_channels=planes,
                    kernel_size=kernel,
                    stride=2,
                    padding=padding,
                    output_padding=output_padding,
                    bias=self.deconv_with_bias))
            layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM))
            layers.append(nn.ReLU(inplace=True))
            self.inplanes = planes

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.deconv_layers(x)
        x = self.final_layer(x)

        return x

    def init_weights(self, pretrained=''):
        if os.path.isfile(pretrained):
            logger.info('=> init deconv weights from normal distribution')
            for name, m in self.deconv_layers.named_modules():
                if isinstance(m, nn.ConvTranspose2d):
                    logger.info('=> init {}.weight as normal(0, 0.001)'.format(name))
                    logger.info('=> init {}.bias as 0'.format(name))
                    nn.init.normal_(m.weight, std=0.001)
                    if self.deconv_with_bias:
                        nn.init.constant_(m.bias, 0)
                elif isinstance(m, nn.BatchNorm2d):
                    logger.info('=> init {}.weight as 1'.format(name))
                    logger.info('=> init {}.bias as 0'.format(name))
                    nn.init.constant_(m.weight, 1)
                    nn.init.constant_(m.bias, 0)
            logger.info('=> init final conv weights from normal distribution')
            for m in self.final_layer.modules():
                if isinstance(m, nn.Conv2d):
                    # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                    logger.info('=> init {}.weight as normal(0, 0.001)'.format(name))
                    logger.info('=> init {}.bias as 0'.format(name))
                    nn.init.normal_(m.weight, std=0.001)
                    nn.init.constant_(m.bias, 0)

            pretrained_state_dict = torch.load(pretrained)
            logger.info('=> loading pretrained model {}'.format(pretrained))
            self.load_state_dict(pretrained_state_dict, strict=False)
        else:
            logger.info('=> init weights from normal distribution')
            for m in self.modules():
                if isinstance(m, nn.Conv2d):
                    # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
                    nn.init.normal_(m.weight, std=0.001)
                    # nn.init.constant_(m.bias, 0)
                elif isinstance(m, nn.BatchNorm2d):
                    nn.init.constant_(m.weight, 1)
                    nn.init.constant_(m.bias, 0)
                elif isinstance(m, nn.ConvTranspose2d):
                    nn.init.normal_(m.weight, std=0.001)
                    if self.deconv_with_bias:
                        nn.init.constant_(m.bias, 0)


resnet_spec = {
    18: (BasicBlock, [2, 2, 2, 2]),
    34: (BasicBlock, [3, 4, 6, 3]),
    50: (Bottleneck, [3, 4, 6, 3]),
    101: (Bottleneck, [3, 4, 23, 3]),
    152: (Bottleneck, [3, 8, 36, 3])
}


def get_pose_net(cfg, is_train, **kwargs):
    num_layers = cfg.MODEL.EXTRA.NUM_LAYERS

    block_class, layers = resnet_spec[num_layers]

    model = PoseResNet(block_class, layers, cfg, **kwargs)

    if is_train and cfg.MODEL.INIT_WEIGHTS:
        model.init_weights(cfg.MODEL.PRETRAINED)

    return model

================================================
FILE: libs/optimizer/__init__.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Empty file.
"""


================================================
FILE: libs/optimizer/optimizer.py
================================================
"""
Optimization utilities.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""
import torch

def prepare_optim(model, cfgs):
    """
    Get optimizer and scheduler objects from model parameters.
    """      
    params = [ p for p in model.parameters() if p.requires_grad]
    lr = cfgs['optimizer']['lr']
    weight_decay = cfgs['optimizer']['weight_decay']
    momentum = cfgs['optimizer']['momentum']
    milestones = cfgs['optimizer']['milestones']
    gamma = cfgs['optimizer']['gamma']
    if cfgs['optimizer']['optim_type'] == 'adam':
        optimizer = torch.optim.Adam(params, 
                                     lr = lr, 
                                     weight_decay = weight_decay)
    elif cfgs['optimizer']['optim_type'] == 'sgd':
        optimizer = torch.optim.SGD(params, 
                                    lr = lr, 
                                    momentum = momentum,
                                    weight_decay = weight_decay)
    else:
        raise NotImplementedError
    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, 
                                                     milestones = milestones, 
                                                     gamma = gamma
                                                     )
    # A scheduler that automatically decreases the learning rate
#    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
#                                                           mode='min',
#                                                           factor=0.5,
#                                                           patience=10,
#                                                           verbose=True,
#                                                           min_lr=0.01)
    return optimizer, scheduler


================================================
FILE: libs/trainer/__init__.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Empty file.
"""


================================================
FILE: libs/trainer/accuracy.py
================================================
"""
Deprecated. Will be deleted in a future version.
Pre-defined accuracy functions.
"""

import libs.common.img_proc as lip

import numpy as np

def get_distance(gt, pred):
    # gt: [n_joints, 2 or 3]
    # pred: [n_joints, 2]
    if gt.shape[1] == 2:
        sqerr = (gt - pred)**2
        sqerr = sqerr.sum(axis = 1)
        dist_list = list(np.sqrt(sqerr))
    elif gt.shape[1] == 3:
        dist_list = []
        sqerr = (gt[:, :2] - pred)**2
        sqerr = sqerr.sum(axis = 1)
        indices = np.nonzero(gt[:, 2])[0]
        dist_list = list(np.sqrt(sqerr[indices]))        
    else:
        raise ValueError('Array shape not supported.')
    return dist_list

def accuracy_pixel(output, 
                   meta_data, 
                   cfgs=None,
                   image_size = (256.0, 256.0), 
                   arg_max='hard'
                   ):
    """
    pixel-wise distance computed from predicted heatmaps
    """
    # report distance in terms of pixel in the original image
    if arg_max == 'soft':
        if isinstance(output, np.ndarray):
            pred, max_vals = lip.get_max_preds_soft(output)
        else:
            pred, max_vals = lip.get_max_preds_soft_pt(output)
    elif arg_max == 'hard':
        if not isinstance(output, np.ndarray):
            output = output.data.cpu().numpy()
        pred, max_vals = lip.get_max_preds(output)
    else:
        raise NotImplementedError
    image_size = image_size if cfgs is None else cfgs['heatmapModel']['input_size']
    # multiply by down-sample ratio
    if not isinstance(pred, np.ndarray):
        pred = pred.data.cpu().numpy()
        max_vals = max_vals.data.cpu().numpy()
    pred *= image_size[0]/output.shape[3]
    # inverse transform and compare pixel didstance
    centers, scales, rots = meta_data['center'], meta_data['scale'], meta_data['rotation']
    centers = centers.data.cpu().numpy()
    scales = scales.data.cpu().numpy()
    rots = rots.data.cpu().numpy()
    joints_original_batch = meta_data['original_joints'].data.cpu().numpy()
    distance_list = []
    all_src_coordinates = []
    for sample_idx in range(len(pred)):
        trans_inv = lip.get_affine_transform(centers[sample_idx], 
                                             scales[sample_idx], 
                                             rots[sample_idx], 
                                             image_size, 
                                             inv=1)
        joints_original = joints_original_batch[sample_idx]        
        pred_src_coordinates = lip.affine_transform_modified(pred[sample_idx], 
                                                             trans_inv) 
        all_src_coordinates.append(pred_src_coordinates.reshape(1, len(pred_src_coordinates), 2))
        distance_list += get_distance(joints_original, pred_src_coordinates)
    cnt = len(distance_list)
    avg_acc = sum(distance_list)/cnt
    others = {
        'src_coord': np.concatenate(all_src_coordinates, axis=0),
        'joints_pred': pred,
        'max_vals': max_vals
        }
    return avg_acc, cnt, others

================================================
FILE: libs/trainer/trainer.py
================================================
"""
Utilities for training and validation.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import libs.model.FCmodel as FCmodel
import libs.optimizer.optimizer as optimizer
import libs.loss.function as loss_funcs
import libs.visualization.points as vp

from libs.common.transformation import procrustes_transform, pnp_refine
from libs.visualization.debug import save_debug_images
from libs.common.utils import AverageMeter
from libs.metric.criterions import Evaluator
from libs.logger.logger import get_dirs

import torch
import numpy as np
import time
import os
import matplotlib.pyplot as plt

def train_cascade(train_dataset, valid_dataset, cfgs, logger):
    """
    Method for training the lifter sub-model L.pth.
    """    
    # data statistics
    #stats = train_dataset.stats
    stats = None
    # cascaded model
    cascade = FCmodel.get_cascade()
    stage_record = []
    # train each stage
    for stage_id in range(cfgs['cascade']['num_stages']):
        # initialize the model
        input_size, output_size = train_dataset.get_input_output_size()
        cfgs['FCModel']['input_size'] = input_size
        cfgs['FCModel']['output_size'] = output_size
        stage_model = FCmodel.get_fc_model(stage_id + 1, 
                                           cfgs=cfgs,
                                           input_size=input_size,
                                           output_size=output_size
                                           )
        if cfgs['use_gpu']:
            stage_model = stage_model.cuda()
        # prepare the optimizer
        optim, sche = optimizer.prepare_optim(stage_model, cfgs)
        loss_type = cfgs['FCModel']['loss_type']
        loss_func = eval('loss_funcs.' + loss_type)(
        reduction=cfgs['FCModel']['loss_reduction']
        ).cuda()     
        # train the model
        record = train(train_dataset=train_dataset,
                       valid_dataset=valid_dataset, 
                       model=stage_model, 
                       loss_func=loss_func,
                       optim=optim, 
                       sche=sche, 
                       stats=stats, 
                       cfgs=cfgs,
                       logger=logger
                       )
        stage_model = record['model']
        stage_record.append((record['batch_idx'], record['loss']))
        # put into cascade
        cascade.append(stage_model.cpu())     
        # release memory
        del stage_model    
    return {'cascade':cascade, 'record':stage_record}

def evaluate_cascade(cascade, 
                     eval_dataset, 
                     stats, 
                     opt, 
                     save=False, 
                     save_path=None,
                     action_wise=False, 
                     action_eval_list=None, 
                     apply_dropout=False
                     ):
    """
    Method for evaluating the lifter sub-model L.pth.
    """ 
    loss, distance = None, None
    for stage_id in range(len(cascade)):
        # initialize the model
        stage_model = cascade[stage_id]
    
        if opt.cuda:
            stage_model = stage_model.cuda()
        
        # evaluate the model
        loss, distance = evaluate(eval_dataset, 
                                  stage_model, 
                                  stats, 
                                  opt, 
                                  save=save, 
                                  save_path=save_path,
                                  procrustes=False, 
                                  per_joint=True, 
                                  apply_dropout=apply_dropout
                                  )

        # update datasets
        eval_dataset.stage_update(stage_model, stats, opt)
        
        # release memory
        del stage_model       
    return loss, distance

def get_loader(dataset, cfgs, split, collate_fn=None):
    """
    Prepare a PyTorch dataloader object.
    """ 
    setting = split + '_settings'   
    arg_dic = {'batch_size': cfgs[setting]['batch_size'],
               'num_workers': cfgs[setting]['num_threads'],
               'shuffle': cfgs[setting]['shuffle'],
               }
    if collate_fn is not None:
        arg_dic['collate_fn'] = collate_fn
    loader = torch.utils.data.DataLoader(dataset, **arg_dic)     
    return loader

def train(train_dataset,
          model, 
          loss_func,
          optim, 
          sche, 
          cfgs, 
          logger,
          metric_func=None,
          stats=None, 
          valid_dataset=None,
          collate_fn=None,
          save_debug=False
          ):
    """
    Train a model with optional validation during training.
    """
    # training configurations
    total_epochs = cfgs['training_settings']['total_epochs']
    batch_size = cfgs['training_settings']['batch_size']
    report_every = cfgs['training_settings']['report_every']
    eval_during = cfgs['training_settings']['eval_during']
    eval_start_epoch = cfgs['training_settings']['eval_start_epoch'] if \
        'eval_start_epoch' in cfgs['training_settings'] else 0
    # evaluate during training
    if eval_during and valid_dataset is not None:
        eval_every = cfgs['training_settings']['eval_every'] 
        evaluator = Evaluator(cfgs['training_settings']['eval_metrics'], 
                              cfgs,
                              train_dataset.num_joints
                              )
    plot_loss = cfgs['training_settings']['plot_loss'] 
    cuda = cfgs['use_gpu'] and torch.cuda.is_available()
    # optional list storing loss curve 
    x_buffer = []
    y_buffer = []
    # online plotting
    if plot_loss:
        ax, lines, x_buffer, y_buffer = initialize_plot()
    # training
    for epoch in range(1, total_epochs + 1):
        # Apply cross-ratio loss after certain epochs
        if epoch > 1:
            loss_func.apply_cr_loss = True
        # initialize training record
        batch_time = AverageMeter()
        data_time = AverageMeter()
        losses = AverageMeter()
        acc = AverageMeter()           
        model.train()  
        # modify the learning rate according to the scheduler
        sche.step()
        # data loader
        train_loader = get_loader(train_dataset, cfgs, 'training', collate_fn)   
        total_batches = len(train_loader)
        total_sample = len(train_dataset)
        end = time.time()
        for batch_idx, (data, target, weights, meta) in enumerate(train_loader):
            if cuda:
                data, target, weights = data.cuda(), target.cuda(), weights.cuda()
            # measure data loading time
            data_time.update(time.time() - end)
            # erase all computed gradient        
            optim.zero_grad()
            # forward pass to get prediction
            prediction = model(data)
            # compute loss
            loss = loss_func(prediction, target, weights, meta)
            # compute gradient in the computational graph
            loss.backward()
            # update parameters in the model 
            optim.step()
            losses.update(loss.item(), data.size(0))
            # compute other optional metrics besides the loss value
            if metric_func is not None:
                avg_acc, cnt, others = metric_func(prediction, meta, cfgs)
                acc.update(avg_acc, n=cnt, others=others)
                if batch_idx % report_every == 0:
                    acc.print_content()
            else:
                others = None
            # measure elapsed time
            batch_time.update(time.time() - end)
            end = time.time()      
            # logging
            if batch_idx % report_every == 0:
                logger_print(epoch, 
                             batch_idx, 
                             batch_size, 
                             total_sample, 
                             batch_time, 
                             data.size()[0], 
                             data_time, 
                             losses,
                             acc, 
                             logger
                             )
                # optional: save intermediate results for debugging
                if save_debug:
                    save_debug_images(epoch, 
                                      batch_idx, 
                                      cfgs, 
                                      data, 
                                      meta, 
                                      target, 
                                      others, 
                                      prediction, 
                                      'train'
                                      )
                # update loss curve
                x_buffer.append(total_batches * (epoch - 1) + batch_idx)
                y_buffer.append(loss.item())
                if plot_loss:
                    update_curve(ax, lines[0], x_buffer, y_buffer)
            del data, target, weights, meta
            # evaluate model if specified
            if eval_during and epoch> eval_start_epoch and \
                batch_idx and batch_idx % eval_every == 0:
                evaluate(valid_dataset, 
                         model, 
                         loss_func, 
                         cfgs, 
                         logger, 
                         evaluator, 
                         collate_fn=collate_fn,
                         epoch=epoch
                         )
                # back to training mode
                model.train()
        # save a snapshot
        if epoch in cfgs['training_settings'].get('snapshot_epochs', []):
            output_dir, _ = get_dirs(cfgs)
            prefix = cfgs['exp_type']
            model_state_file = os.path.join(output_dir, prefix + '_{:d}.pth'.format(epoch))
            logger.info('=> Snapshot model to {}'.format(model_state_file))
            torch.save(model.module.state_dict(), model_state_file)
    logger.info('Training finished.')
    return {'model':model, 'batch_idx':x_buffer, 'loss':y_buffer}  

def initialize_plot():
    """
    Initialize loss plot.
    """     
    x_buffer, y_buffer = [], []
    ax = plt.subplot(111)
    lines = ax.plot(x_buffer, y_buffer)
    plt.xlabel('batch index')
    plt.ylabel('training loss')    
    return ax, lines, x_buffer, y_buffer

def update_curve(ax, line, x_buffer, y_buffer):
    """
    Update loss plot.
    """ 
    line.set_xdata(x_buffer)
    line.set_ydata(y_buffer)
    # recompute the ax.dataLim
    ax.relim()
    # update ax.viewLim using the new dataLim
    ax.autoscale_view()
    plt.draw()
    plt.pause(0.05)    
    return

def logger_print(epoch, 
                 batch_idx, 
                 batch_size, 
                 total_sample, 
                 batch_time,
                 length,
                 data_time,
                 losses,
                 acc,
                 logger
                 ):
    """
    Print training logs.
    """     
    msg = 'Training Epoch: [{0}][{1}/{2}]\t' \
          'Time {batch_time.val:.3f}s ({batch_time.avg:.3f}s)\t' \
          'Speed {speed:.1f} samples/s\t' \
          'Data {data_time.val:.3f}s ({data_time.avg:.3f}s)\t' \
          'Loss {loss.val:.5f} ({loss.avg:.5f})'.format(
              epoch, 
              batch_idx * batch_size, 
              total_sample, 
              batch_time=batch_time,
              speed=length / batch_time.val,
              data_time=data_time, 
              loss=losses
              )      
    if acc.val != 0 and acc.avg != 0:
        # acc is a pre-defined metric with positive value
        msg += 'Accuracy {acc.val:.3f} ({acc.avg:.3f})'.format(acc=acc)          
    logger.info(msg)
    return

def visualize_lifting_results(data, 
                              prediction, 
                              target=None, 
                              sample_num=None,
                              intrinsics=None, 
                              refine=False, 
                              dist_coeffs=np.zeros((4,1)),
                              meta_data=None
                              ):
    """
    Visualizing predictions of the lifter model (optional).
    """ 
    # only take the coordinates
    if data.shape[1] > 18:
        data = data[:, :18]
    # use the ground truth translation if provided in the meta_data
    if 'roots' in meta_data:
        target = np.hstack([meta_data['roots'], target])
        prediction = np.hstack([meta_data['roots'], prediction])
    sample_num = sample_num if sample_num else len(prediction) 
    chosen = np.random.choice(len(prediction), sample_num, replace=False)
    if target is not None:
        assert len(target) == len(prediction)
        p3d_gt_sample = target[chosen].reshape(sample_num, -1, 3)
    else:
        p3d_gt_sample = None
    p3d_pred_sample = prediction[chosen].reshape(sample_num, -1, 3)
    data_sample = data[chosen].reshape(sample_num, -1, 2)
    # vp.plot_comparison_relative(p3d_pred_sample[:9, 3:], 
    #                             p3d_gt_sample[:9, 3:])
    ax = vp.plot_scene_3dbox(p3d_pred_sample, p3d_gt_sample)
    if not refine:
        return
    # refine 3D point prediction by minimizing re-projection errors
    assert intrinsics is not None
    for idx in range(sample_num):
        prediction = p3d_pred_sample[idx]
        tempt_box_pred = prediction.copy()
        tempt_box_pred[1:, :] += tempt_box_pred[0, :].reshape(1, 3)
        observation = data_sample[idx]
        # use the predicted 3D bounding box size for refinement
        refined_prediction = pnp_refine(tempt_box_pred, observation, intrinsics, 
                                        dist_coeffs)
        vp.plot_lines(ax, 
                      refined_prediction[:, 1:].T, 
                      vp.plot_3d_bbox.connections, 
                      dimension=3, 
                      c='g'
                      )
        # use the gt 3D box size for refinement
        # first align a box with gt size with the predicted box, then refine
        if target is None:
            continue
        tempt_box_gt = p3d_gt_sample[idx].copy()
        tempt_box_gt[1:, :] += tempt_box_gt[0, :].reshape(1, 3) 
        pseudo_box = procrustes_transform(tempt_box_gt.T, tempt_box_pred.T)
        refined_prediction2 = pnp_refine(pseudo_box.T, observation, intrinsics, 
                                        dist_coeffs)
        vp.plot_lines(ax, 
                      pseudo_box[:, 1:].T, 
                      vp.plot_3d_bbox.connections, 
                      dimension=3, 
                      c='y'
                      )         
        vp.plot_lines(ax, 
                      refined_prediction2[:, 1:].T, 
                      vp.plot_3d_bbox.connections, 
                      dimension=3, 
                      c='b'
                      )        
    return

def evaluate(eval_dataset, 
             model, 
             loss_func,
             cfgs, 
             logger, 
             evaluator,
             save=False, 
             save_path=None,
             collate_fn=None,
             epoch=None,
             sample_num=20
             ):
    """
    Method for evaluating a model.
    """     
    # unnormalize the prediction if needed
    if cfgs['testing_settings']['unnormalize']:
        stats = eval_dataset.statistics
        
    # visualize after certain epoch
    if cfgs['exp_type'] == '2dto3d' and 'vis_epoch' in cfgs['testing_settings']:
        vis_epoch = cfgs['testing_settings']['vis_epoch']
    else:
        vis_epoch = -1

    all_dists = []
    model.eval()
    
    # optional: enable dropout in testing to produce loss similar to the training loss
    if cfgs['testing_settings']['apply_dropout']:
        def apply_dropout(m):
            if type(m) == torch.nn.Dropout:
                m.train()        
        model.apply(apply_dropout)
        
    intrinsics = None if not hasattr(eval_dataset, 'intrinsic') else \
        eval_dataset.intrinsic
    refine = False if intrinsics is None else True
    eval_loader = get_loader(eval_dataset, cfgs, 'testing', collate_fn)
    cuda = cfgs['use_gpu'] and torch.cuda.is_available()
    losses = AverageMeter()
    
    # optional: save intermediate results
    if save:
        pred_list = []
        gt_list = []
    has_plot = False # only plot once
    
    for batch_idx, (data, target, weights, meta) in enumerate(eval_loader):
        if cuda:
            data, target, weights = data.cuda(), target.cuda(), weights.cuda()
            
        # forward pass to get prediction
        prediction = model(data)
        
        # optional: save intermediate results for debugging
        if cfgs['testing_settings'].get('save_debug', False) and \
            cfgs.get('exp_type') == 'instanceto2d':
            joints_pred = prediction[1].data.cpu().numpy()
            image_size = cfgs['heatmapModel']['input_size']
            joints_pred *= np.array(image_size).reshape(1, 1, 2)
            save_debug_images(0, 
                              batch_idx, 
                              cfgs, 
                              data, 
                              meta, 
                              target, 
                              {'joints_pred': joints_pred}, 
                              prediction, 
                              'validation'
                              )        
            logger.info('Saved batch {:d}'.format(batch_idx))
#        if save:
#            pred_list.append(prediction.data.cpu().numpy())
        loss = loss_func(prediction, target, weights, meta)
        losses.update(loss.item(), data.size(0))
        
        if cfgs['testing_settings']['unnormalize']:
            # compute distance of body joints in un-normalized format
            target = eval_dataset.unnormalize(target.data.cpu().numpy(), 
                                              stats['mean_out'], 
                                              stats['std_out']
                                              )    
            prediction = eval_dataset.unnormalize(prediction.data.cpu().numpy(), 
                                                  stats['mean_out'], 
                                                  stats['std_out']
                                                  ) 
            
        evaluator.update(prediction, ground_truth=target, meta_data=meta)
        
        ## plot 3D bounding boxes for visualization
        if not has_plot and vis_epoch > 0 and epoch > vis_epoch:
            data_unnorm = eval_dataset.unnormalize(data.data.cpu().numpy(), 
                                                   stats['mean_in'], 
                                                   stats['std_in']
                                                   ) 
            visualize_lifting_results(data_unnorm, 
                                      prediction, 
                                      target, 
                                      sample_num=sample_num, 
                                      intrinsics=intrinsics,
                                      refine=refine,
                                      meta_data=meta
                                      )
            has_plot = True
        if save:
            pred_list.append(prediction)
            gt_list.append(target)

    if save:
        # note the residual update is saved if a cascade is used
        record = {#'data':np.concatenate(data_list, axis=0), 
                  'pred':np.concatenate(pred_list, axis=0), 
                  'error':all_dists, 
                  'gt':np.concatenate(gt_list, axis=0)
                  }
        np.save(save_path, np.array(record))

    evaluator.report(logger)
    return

================================================
FILE: libs/visualization/__init__.py
================================================
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Empty file.
"""


================================================
FILE: libs/visualization/debug.py
================================================
"""
Utilities for saving debugging images.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

from libs.common.img_proc import get_max_preds
from libs.common.utils import make_dir

import math
import numpy as np
import torchvision
import cv2

from os.path import join

def draw_circles(ndarr, 
                 xmaps, 
                 ymaps, 
                 nmaps, 
                 batch_joints, 
                 batch_joints_vis, 
                 width, 
                 height, 
                 padding, 
                 color=[255,0,0],
                 add_idx=True
                 ):
    k = 0
    for y in range(ymaps):
        for x in range(xmaps):
            if k >= nmaps:
                break
            joints = batch_joints[k]
            for idx, joint in enumerate(joints):
                xpos = x * width + padding + joint[0]
                ypos = y * height + padding + joint[1]
                cv2.circle(ndarr, (int(xpos), int(ypos)), 2, color, 2)
                if add_idx:
                    cv2.putText(ndarr, 
                                str(idx+1), 
                                (int(xpos), int(ypos)), 
                                cv2.FONT_HERSHEY_SIMPLEX, 
                                1, color, 1
                                )
            k += 1
    return ndarr

# functions used for debugging heatmap-based keypoint localization model      #
def save_batch_image_with_joints(batch_image, 
                                 record_dict, 
                                 file_name, 
                                 nrow=8, 
                                 padding=2
                                 ):
    """
    batch_image: [batch_size, channel, height, width]
    batch_joints: [batch_size, num_joints, 3],
    batch_joints_vis: [batch_size, num_joints, 1],
    """
    grid = torchvision.utils.make_grid(batch_image[:, :3, :, :], nrow, padding, True)
    ndarr = grid.mul(255).clamp(0, 255).byte().permute(1, 2, 0).cpu().numpy()
    ndarr = ndarr.copy()
    nmaps = batch_image.size(0)
    xmaps = min(nrow, nmaps)
    ymaps = int(math.ceil(float(nmaps) / xmaps))
    height = int(batch_image.size(2) + padding)
    width = int(batch_image.size(3) + padding)
    batch_joints, batch_joints_vis = record_dict['pred'] 
    ndarr = draw_circles(ndarr, xmaps, ymaps, nmaps, batch_joints, batch_joints_vis, 
                         width, height, padding)
    if 'gt' in record_dict:
        nmaps = min(nmaps, len(batch_joints_vis))
        xmaps = min(nrow, nmaps)
        ymaps = int(math.ceil(float(nmaps) / xmaps))
        batch_joints_gt, batch_joints_vis_gt = record_dict['gt']
        ndarr = draw_circles(ndarr, xmaps, ymaps, nmaps, batch_joints_gt, batch_joints_vis_gt, 
                             width, height, padding, color=[0,255,255])        
    cv2.imwrite(file_name, cv2.cvtColor(ndarr, cv2.COLOR_RGB2BGR))
    return

def save_batch_heatmaps(batch_image, 
                        batch_heatmaps, 
                        file_name,
                        normalize=True
                        ):
    """
    batch_image: [batch_size, channel, height, width]
    batch_heatmaps: ['batch_size, num_joints, height, width]
    file_name: saved file name
    """
    if normalize:
        batch_image = batch_image.clone()
        min = float(batch_image.min())
        max = float(batch_image.max())

        batch_image.add_(-min).div_(max - min + 1e-5)

    batch_size = batch_heatmaps.size(0)
    num_joints = batch_heatmaps.size(1)
    heatmap_height = batch_heatmaps.size(2)
    heatmap_width = batch_heatmaps.size(3)

    grid_image = np.zeros((batch_size*heatmap_height,
                           (num_joints+1)*heatmap_width,
                           3),
                          dtype=np.uint8)

    preds, maxvals = get_max_preds(batch_heatmaps.detach().cpu().numpy())

    for i in range(batch_size):
        image = batch_image[i].mul(255)\
                              .clamp(0, 255)\
                              .byte()\
                              .permute(1, 2, 0)\
                              .cpu().numpy()
        heatmaps = batch_heatmaps[i].mul(255)\
                                    .clamp(0, 255)\
                                    .byte()\
                                    .cpu().numpy()

        resized_image = cv2.resize(image,
                                   (int(heatmap_width), int(heatmap_height)))

        height_begin = heatmap_height * i
        height_end = heatmap_height * (i + 1)
        for j in range(num_joints):
            cv2.circle(resized_image,
                       (int(preds[i][j][0]), int(preds[i][j][1])),
                       1, [0, 0, 255], 1)
            heatmap = heatmaps[j, :, :]
            colored_heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
            masked_image = colored_heatmap*0.7 + resized_image*0.3
            cv2.circle(masked_image,
                       (int(preds[i][j][0]), int(preds[i][j][1])),
                       1, [0, 0, 255], 1)

            width_begin = heatmap_width * (j+1)
            width_end = heatmap_width * (j+2)
            grid_image[height_begin:height_end, width_begin:width_end, :] = \
                masked_image
            # grid_image[height_begin:height_end, width_begin:width_end, :] = \
            #     colored_heatmap*0.7 + resized_image*0.3

        grid_image[height_begin:height_end, 0:heatmap_width, :] = resized_image

    cv2.imwrite(file_name, grid_image)
    return

def save_debug_images(epoch, 
                      batch_index, 
                      cfgs, 
                      input, 
                      meta, 
                      target, 
                      others, 
                      output, 
                      split
                      ):
    """
    Save debugging images during training HC.pth.
    """    
    if not cfgs['training_settings']['debug']['save']:
        return
    prefix = join(cfgs['dirs']['output'], 
                  "intermediate_results",
                  split, 
                  '{}_{}'.format(epoch, batch_index)
                  )
    make_dir(prefix)
    joints_pred = others['joints_pred']
    debug_cfgs = cfgs['training_settings']['debug']
    record_dict = {'pred':(joints_pred, meta['joints_vis']),
                   'gt':(meta['transformed_joints'], meta['joints_vis'])}    
    if debug_cfgs['save_images_kpts']:
        save_batch_image_with_joints(
            input[:,:3,:,:], record_dict, '{}_keypoints.jpg'.format(prefix)
        )
    if debug_cfgs['save_hms_gt']:
        save_batch_heatmaps(
            input[:,:3,:,:], target, '{}_hm_gt.jpg'.format(prefix)
        )
    if debug_cfgs['save_hms_pred']:
        output = output[0] if type(output) is tuple else output
        save_batch_heatmaps(
            input[:,:3,:,:], output, '{}_hm_pred.jpg'.format(prefix)
        )
    return

================================================
FILE: libs/visualization/egonet_utils.py
================================================
"""
Visualization utilities for Ego-Net inference.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import cv2
import numpy as np
import matplotlib.pyplot as plt

import libs.visualization.points as vp

def plot_2d_objects(img_path, record, color_dict):
    if 'plots' in record:
        # update old drawing
        fig = record['plots']['fig2d']
        ax = record['plots']['ax2d']
    else:
        # new drawing
        fig = plt.figure(figsize=(11.3, 9))
        ax = plt.subplot(111)
        record['plots'] = {}
        record['plots']['fig2d'] = fig
        record['plots']['ax2d'] = ax
        image = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)[:, :, ::-1]
        height, width, _ = image.shape
        ax.imshow(image) 
        ax.set_xlim([0, width])
        ax.set_ylim([0, height])
        ax.invert_yaxis()
    for idx in range(len(record['kpts_2d_pred'])):
        kpts = record['kpts_2d_pred'][idx].reshape(-1, 2)
        bbox = record['bbox_resize'][idx]
        vp.plot_2d_bbox(ax, bbox, color_dict['bbox_2d'])
        # predicted key-points
        ax.plot(kpts[:, 0], kpts[:, 1], color_dict['kpts'][0])    
    if 'kpts_2d_gt' in record:
        # plot ground truth 2D screen coordinates
        for idx, kpts_gt in enumerate(record['kpts_2d_gt']):
            kpts_gt = kpts_gt.reshape(-1, 3)
            vp.plot_3d_bbox(ax, kpts_gt[1:, :2], color='g', linestyle='-.')
    if 'arrow' in record:
        for idx in range(len(record['arrow'])):
            start = record['arrow'][idx][:,0]
            end = record['arrow'][idx][:,1]
            x, y = start
            dx, dy = end - start
            ax.arrow(x, y, dx, dy, color='r', lw=4, head_width=5, alpha=0.5) 
    # save intermediate results
    # plt.gca().set_axis_off()
    # plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, 
    # hspace = 0, wspace = 0)
    # plt.margins(0,0)
    # plt.gca().xaxis.set_major_locator(plt.NullLocator())
    # plt.gca().yaxis.set_major_locator(plt.NullLocator())
    # img_name = img_path.split('/')[-1]
    # save_dir = './qualitative_results/'
    # plt.savefig(save_dir + img_name, dpi=100, bbox_inches = 'tight', pad_inches = 0)
    return record

def plot_3d_objects(prediction, target, pose_vecs_gt, record, color):
    if target is not None:
        p3d_gt = target.reshape(len(target), -1, 3)
    else:
        p3d_gt = None
    p3d_pred = prediction.reshape(len(prediction), -1, 3)
    if "kpts_3d_before" in record:
        # use predicted translation for visualization
        p3d_pred = np.concatenate([record['kpts_3d_before'][:, [0], :], p3d_pred], axis=1)
    elif p3d_gt is not None and p3d_gt.shape[1] == p3d_pred.shape[1] + 1:
        # use ground truth translation for visualization
        assert len(p3d_pred) == len(p3d_gt)
        p3d_pred = np.concatenate([p3d_gt[:, [0], :], p3d_pred], axis=1) 
    else:
        raise NotImplementedError
    if 'plots' in record and 'ax3d' in record['plots']:
        # update drawing
        ax = record['plots']['ax3d']
        ax = vp.plot_scene_3dbox(p3d_pred, p3d_gt, ax=ax, color=color)
    elif 'plots' in record:
        # plotting a set of 3D boxes
        ax = vp.plot_scene_3dbox(p3d_pred, p3d_gt, color=color)
        ax.set_title("GT: black w/o Ego-Net: magenta w/ Ego-Net: red/yellow")
        vp.draw_pose_vecs(ax, pose_vecs_gt)
        record['plots']['ax3d'] = ax
    else:
        raise NotImplementedError
    # draw pose angle predictions
    translation = p3d_pred[:, 0, :]    
    pose_vecs_pred = np.concatenate([translation, record['euler_angles']], axis=1)
    vp.draw_pose_vecs(ax, pose_vecs_pred, color=color)
    if 'kpts_3d_before' in record and 'plots' in record:
        # plot input 3D bounding boxes before using Ego-Net
        kpts_3d_before = record['kpts_3d_before']
        vp.plot_scene_3dbox(kpts_3d_before, ax=ax, color='m')    
        pose_vecs_before = np.zeros((len(kpts_3d_before), 6))
        for idx in range(len(pose_vecs_before)):
            pose_vecs_before[idx][0:3] = record['raw_txt_format'][idx]['locations']
            pose_vecs_before[idx][4] = record['raw_txt_format'][idx]['rot_y']
        vp.draw_pose_vecs(ax, pose_vecs_before, color='m')
    return record

================================================
FILE: libs/visualization/points.py
================================================
"""
Simple visualization utilities for 2D and 3D points based on Matplotlib.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import numpy as np
import matplotlib.pyplot as plt

from mpl_toolkits.mplot3d import Axes3D

def check_points(points, dimension):
    """
    Assertion function for input dimension.
    """
    if len(points.shape) == 1:
        assert points.shape[0] % dimension == 0
        points = points.reshape(-1, dimension)
    elif len(points.shape) == 2:
        assert points.shape[1] == dimension
    else:
        raise ValueError    
    return points

def set_3d_axe_limits(ax, points=None, center=None, radius=None, ratio=1.2):
    """
    Set 3d axe limits to simulate set_aspect('equal').
    Matplotlib has not yet provided implementation of set_aspect('equal') 
    for 3d axe.
    """
    if points is None:
        assert center is not None and radius is not None
    if center is None or radius is None:
        assert points is not None
    if center is None:
        center = points.mean(axis=0, keepdims=True)
    if radius is None:
        radius = points - center
        radius = np.max(np.abs(radius))*ratio
    #ax.set_aspect('equal')    
    xroot, yroot, zroot = center[0,0], center[0,1], center[0,2]
    ax.set_xlim3d([-radius+xroot, radius+xroot])
    ax.set_ylim3d([-radius+yroot, radius+yroot])
    ax.set_zlim3d([-radius+zroot, radius+zroot])    
    return

def plot_3d_points(ax, 
                   points, 
                   indices=None, 
                   center=None, 
                   radius=None,  
                   add_labels=True, 
                   display_ticks=True, 
                   remove_planes=[],
                   marker='o', 
                   color='k', 
                   size=50, 
                   alpha=1, 
                   set_limits=False
                   ):
    """
    Scatter plot of 3D points.
    
    points are of shape [3*N_points] or [N_points, 3]
    """
    points = check_points(points, dimension=3)
    points = points[indices,:] if indices is not None else points
    ax.scatter(points[:,0], points[:,1], points[:,2], marker=marker, c=color,
               s=size, alpha=alpha)
    if set_limits:
        set_3d_axe_limits(ax, points, center, radius)
    if add_labels:
        ax.set_xlabel("x")
        ax.set_ylabel("y")
        ax.set_zlabel("z")
    # remove tick labels or planes
    if not display_ticks:
        ax.set_xticks([])
        ax.set_yticks([])
        ax.set_zticks([])
        ax.get_xaxis().set_ticklabels([])
        ax.get_yaxis().set_ticklabels([])
        ax.set_zticklabels([])
    white = (1.0, 1.0, 1.0, 1.0)
    if 'x' in remove_planes:
        ax.w_xaxis.set_pane_color(white)
    if 'y' in remove_planes:
        ax.w_xaxis.set_pane_color(white)
    if 'z' in remove_planes:
        ax.w_xaxis.set_pane_color(white)        
    
    plt.show()
    return

def plot_lines(ax, 
               points, 
               connections, 
               dimension, 
               lw=4, 
               c='k', 
               linestyle='-', 
               alpha=0.8, 
               add_index=False
               ):
    """
    Plot 2D/3D lines given points and connection.
    
    connections are of shape [n_lines, 2]
    """
    points = check_points(points, dimension)
    if add_index:
        for idx in range(len(points)):
            if dimension == 2:
                x, y = points[idx][0], points[idx][1]
                ax.text(x, y, str(idx))
            elif dimension == 3:
                x, y, z = points[idx][0], points[idx][1], points[idx][2]
                ax.text(x, y, z, str(idx))                
    connections = connections.reshape(-1, 2)
    for connection in connections:
        x = [points[connection[0]][0], points[connection[1]][0]]
        y = [points[connection[0]][1], points[connection[1]][1]]
        if dimension == 3:
            z = [points[connection[0]][2], points[connection[1]][2]]
            line, = ax.plot(x, y, z, lw=lw, c=c, linestyle=linestyle, alpha=alpha)
        else:
            line, = ax.plot(x, y, lw=lw, c=c, linestyle=linestyle, alpha=alpha)
    plt.show()
    return line

def plot_mesh(ax, vertices, faces, color='grey'):
    """
    Simple mesh plotting.
    
    vertics of shape [N_vertices, 3]
    faces pf shape [N_faces, 3] storing indices
    """    
    set_3d_axe_limits(ax, vertices)
    ax.plot_trisurf(vertices[:, 0], 
                    vertices[:, 1], 
                    faces, 
                    -vertices[:, 2], 
                    shade=True, 
                    color=color
                    )    
    return

def plot_3d_coordinate_system(ax, 
                              origin, 
                              system, 
                              length=300, 
                              colors=['r', 'g', 'b']
                              ):
    """
    Draw a coordinate system at a specified origin
    
    system: [v1, v2, v3] 
    """ 
    origin = origin.reshape(3, 1)
    start_points = np.repeat(origin, 3, axis=1)
    end_points = start_points + system*length
    all_points = np.hstack([origin, end_points])
    for i in range(3):
        plot_lines(ax, 
                   all_points.T, 
                   plot_3d_coordinate_system.connections[i].reshape(1,2),
                   dimension=3, 
                   c=colors[i]
                   )
    return

def plot_3d_bbox(ax, 
                 bbox_3d_projected, 
                 color=None, 
                 linestyle='-', 
                 add_index=False
                 ):
    """
    Draw the projected edges of a 3D cuboid.
    """ 
    c = np.random.rand(3) if color is None else color
    plot_lines(ax, 
               bbox_3d_projected, 
               plot_3d_bbox.connections, 
               dimension=2, 
               c=c, 
               linestyle=linestyle, 
               add_index=add_index
               )
    return

def plot_2d_bbox(ax, 
                 bbox_2d, 
                 color=None, 
                 score=None, 
                 label=None, 
                 linestyle='-'
                 ):
    """
    Draw a 2D bounding box.
    
    bbox_2d in the format [x1, y1, x2, y2]
    """ 
    c = np.random.rand(3) if color is None else color
    x1, y1, x2, y2 = bbox_2d[0], bbox_2d[1], bbox_2d[2], bbox_2d[3],
    points = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=np.float32)
    plot_lines(ax, points, plot_2d_bbox.connections, dimension=2, c=c, linestyle=linestyle)
    if score is not None and label is not None:
        string = "({:.2f}, {:d})".format(score, label)
        ax.text((x1+x2)*0.5, (y1+y2)*0.5, string, bbox=dict(facecolor='red', alpha=0.2))
    return

def plot_comparison_relative(points_pred, points_gt):
    # DEPRECATED
    # plot the comparison of the shape relative to the root point
    plt.figure()
    num_row = 3
    num_col = int(len(points_pred)/num_row)
    for i in range(len(points_pred)):
        ax = plt.subplot(num_row, num_col, i+1, projection='3d')
        pred = points_pred[i]
        gt = points_gt[i]
        plot_3d_points(ax, pred, color='r')
        plot_3d_points(ax, gt, color='k')
        # TODO check here
        pred_bbox = get_bbox_3d(pred)
        gt_bbox = get_bbox_3d(gt)
        plot_lines(ax, pred_bbox, plot_3d_bbox.connections, dimension=3, c='r')
        plot_lines(ax, gt_bbox, plot_3d_bbox.connections, dimension=3, c='k')
        set_3d_axe_limits(ax, 
                          np.vstack([pred_bbox.reshape(-1, 3), 
                                     gt_bbox.reshape(-1, 3)]
                                    ),
                          center=np.zeros((1,3)), 
                          radius=5.
                          )
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    ax.set_zlabel("z")    
    ax.view_init(0., 90.)
    return

def plot_scene_3dbox(points_pred, points_gt=None, ax=None, color='r'):
    """
    Plot the comparison of predicted 3d bounding boxes and ground truth ones.
    """ 
    if ax is None:
        plt.figure()
        ax = plt.subplot(111, projection='3d')
    preds = points_pred.copy()
    # add the root translation
    preds[:,1:,] = preds[:,1:,] + preds[:,[0],]
    if points_gt is not None:
        gts = points_gt.copy() 
        gts[:,1:,] = gts[:,1:,] + gts[:,[0],]
        all_points = np.concatenate([preds, gts], axis=0).reshape(-1, 3)
    else:
        all_points = preds.reshape(-1, 3)
    for pred in preds:
        plot_3d_points(ax, pred, color=color, size=15)
        plot_lines(ax, pred[1:,], plot_3d_bbox.connections, dimension=3, c=color)
    if points_gt is not None:
        for gt in gts:
            plot_3d_points(ax, gt, color='k', size=15)
            plot_lines(ax, gt[1:,], plot_3d_bbox.connections, dimension=3, c='k')         
    set_3d_axe_limits(ax, all_points)
    return ax

def get_area(points, indices, preserve_points=False):
    # DEPRECATED
    # points [N, 2]
    # indices [M, 3]
    vec1 = points[indices[:, 1], :] - points[indices[:, 0], :]
    vec2 = points[indices[:, 2], :] - points[indices[:, 0], :]
    area= np.cross(vec1, vec2)*0.5
    area = area.reshape(1, -1)
    if preserve_points:
        feature = np.hstack([points.reshape(1,-1), area])
    else:
        feature = area
    return feature

def interpolate(start, end, num_interp):
    # DEPRECATED
    # start: [3]
    # end: [3]
    x = np.linspace(start[0], end[0], num=num_interp+2)[1:-1].reshape(num_interp, 1)
    y = np.linspace(start[1], end[1], num=num_interp+2)[1:-1].reshape(num_interp, 1)
    z = np.linspace(start[2], end[2], num=num_interp+2)[1:-1].reshape(num_interp, 1)
    return np.concatenate([x,y,z], axis=1)

def get_interpolated_points(points, indices, num_interp):
    # DEPRECATED
    # points [N, 3]
    # indices [M, 2] point indices for interpolating a line segment
    # num_interp how many points to add for each segment
    new_points = []
    for start_idx, end_idx in indices:
        new_points.append(interpolate(points[start_idx], points[end_idx], num_interp))
    return np.vstack(new_points)

def draw_pose_vecs(ax, pose_vecs=None, color='black'):
    """
    Add pose vectors to a 3D matplotlib axe.
    """     
    if pose_vecs is None:
        return
    for pose_vec in pose_vecs:
        x, y, z, pitch, yaw, roll = pose_vec
        string = "({:.2f}, {:.2f}, {:.2f})".format(pitch, yaw, roll)
        # add some random noise to the text location so that they do not overlap
        nl = 0.02 # noise level
        ax.text(x*(1+np.random.randn()*nl), 
                y*(1+np.random.randn()*nl), 
                z*(1+np.random.randn()*nl), 
                string, 
                color=color
                )
        
def get_bbox_3d(points, add_center=False, interp_style=""):
    """
    Get a 3D bounding boxes from coordinate limits in object coordinate system.
    """  
    assert len(points.shape) == 2 
    if points.shape[0] == 3:
        axis=1 
    elif points.shape[1] == 3:
        axis=0
    limit_min = points.min(axis=axis)
    limit_max = points.max(axis=axis)
    xmax, xmin = limit_max[0], limit_min[0]
    ymax, ymin = limit_max[1], limit_min[1]
    zmax, zmin = limit_max[2], limit_min[2]
    bbox = np.array([[xmax, ymin, zmax],
                     [xmax, ymax, zmax],
                     [xmax, ymin, zmin],
                     [xmax, ymax, zmin],
                     [xmin, ymin, zmax],
                     [xmin, ymax, zmax],
                     [xmin, ymin, zmin],
                     [xmin, ymax, zmin]])
    if add_center:
        bbox = np.vstack([np.array([[0., 0., 0.]]), bbox])
    if interp_style.startswith('bbox9interp'):
        interp_num = int(interp_style[11:])
        # indices for each edge
        indices = np.array([[1,2],
                            [3,4],
                            [1,3],
                            [2,4],
                            [5,6],
                            [7,8],
                            [5,7],
                            [6,8],
                            [1,5],
                            [3,7],
                            [2,6],
                            [4,8]])
        new_points = get_interpolated_points(bbox, indices, interp_num)
        bbox = np.vstack([bbox, new_points])
    return bbox

def ray_intersect_triangle(p0, p1, triangle):
    """
    Tests if a ray starting at point p0, in the direction
    p1 - p0, will intersect with the triangle.
    
    arguments:
    p0, p1: numpy.ndarray, both with shape (3,) for x, y, z.
    triangle: numpy.ndarray, shaped (3,3), with each row
        representing a vertex and three columns for x, y, z.
    
    returns: 
        0.0 if ray does not intersect triangle, 
        1.0 if it will intersect the triangle,
        2.0 if starting point lies in the triangle.
        
    Reference: https://www.erikrotteveel.com/python/three-dimensional-ray-tracing-in-python/
    """
    v0, v1, v2 = triangle
    u = v1 - v0
    v = v2 - v0
    normal = np.cross(u, v)
    b = np.inner(normal, p1 - p0)
    a = np.inner(normal, v0 - p0)
    if (b == 0.0):
        # ray is parallel to the plane
        if a != 0.0:
            # ray is outside but parallel to the plane
            return 0
        else:
            # ray is parallel and lies in the plane
            rI = 0.0
    else:
        rI = a / b
    if rI < 0.0:
        return 0
    w = p0 + rI * (p1 - p0) - v0
    denom = np.inner(u, v) * np.inner(u, v) - \
        np.inner(u, u) * np.inner(v, v)
    si = (np.inner(u, v) * np.inner(w, v) - \
        np.inner(v, v) * np.inner(w, u)) / denom
    if (si < 0.0) | (si > 1.0):
        return 0
    ti = (np.inner(u, v) * np.inner(w, u) - \
        np.inner(u, u) * np.inner(w, v)) / denom
    if (ti < 0.0) | (si + ti > 1.0):
        return 0
    if (rI == 0.0):
        return 2
    return 1

def get_visibility(box3d, triangles):
    """
    Get visibility for each vertex of a 3D bounding box given all the triangles
    in a scene.
    
    box3d: [8, 3] The vertex coordinates in the camera coordinate system.
    triangles: [N, 3, 3]
    """      
    visibility = np.ones(8, dtype=np.bool)
    p1 = np.zeros(3)
    for idx, p0 in enumerate(box3d):
        intersects = set()
        for triangle in triangles:
            intersects.add(ray_intersect_triangle(p0, p1, triangle))
        if 1 in intersects:
            visibility[idx] = False
    return visibility

## static variables implemented as function attributes 
plot_3d_coordinate_system.connections = np.array([[0, 1],
                                                  [0, 2],
                                                  [0, 3]])
plot_3d_bbox.connections = np.array([[0, 1],
                                     [0, 2],
                                     [1, 3],
                                     [2, 3],
                                     [4, 5],
                                     [5, 7],
                                     [4, 6],
                                     [6, 7],
                                     [0, 4],
                                     [1, 5],
                                     [2, 6],
                                     [3, 7]])
plot_2d_bbox.connections = np.array([[0, 1],
                                     [1, 2],
                                     [2, 3],
                                     [3, 0]])

================================================
FILE: tools/inference.py
================================================
"""
Inference of Ego-Net on KITTI dataset.

The user can provide the 3D bounding boxes predicted by other 3D object detectors
and run Ego-Net to refine the orientation of these instances.

The user can also visualize the intermediate results.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import sys
sys.path.append('../')

import libs.arguments.parse as parse
import libs.logger.logger as liblogger
import libs.dataset.KITTI.car_instance as libkitti

from libs.common.img_proc import modify_bbox
from libs.trainer.trainer import get_loader
from libs.model.egonet import EgoNet

import shutil
import torch
import numpy as np
import os
import subprocess
import matplotlib.pyplot as plt
plt.ion()

def filter_detection(detected, thres=0.7):
    """
    Filter predictions based on a confidence threshold.
    """      
    # detected: list of dict
    filtered = []
    for detection in detected:
        tempt_dict = {}
        indices = detection['scores'] > thres
        for key in ['boxes', 'labels', 'scores']:
            tempt_dict[key] = detection[key][indices]
        filtered.append(tempt_dict)
    return filtered

def merge(dict_a, dict_b):
    for key in dict_b.keys():
        dict_a[key] = dict_b[key]
    return

def collate_dict(dict_list):
    ret = {}
    for key in dict_list[0]:
        ret[key] = [d[key] for d in dict_list]
    return ret

def my_collate_fn(batch):
    # the collate function for 2d pose training
    imgs, meta = list(zip(*batch))
    meta = collate_dict(meta)
    return imgs, meta

def filter_conf(record, thres=0.0):
    """
    Filter the object detections with a confidence threshold.
    """
    annots = record['raw_txt_format']
    indices = [i for i in range(len(annots)) if annots[i]['score'] >= thres]
    if len(indices) == 0:
        return False, record
    filterd_record = {
        'bbox_2d': record['bbox_2d'][indices],
        'kpts_3d': record['kpts_3d'][indices],
        'raw_txt_format': [annots[i] for i in indices],
        'scores': [annots[i]['score'] for i in indices],
        'K':record['K']
        }
    return True, filterd_record

def gather_dict(request, 
                references, 
                filter_c=True, 
                larger=True, 
                thres=0.,
                target_ar=1.,
                enlarge=1.2
                ):
    """
    Gather a annotation dictionary from the prepared detections as requsted.
    """
    assert 'path' in request
    ret = {'path':[], 
           'boxes':[], 
           'kpts_3d_before':[], 
           'raw_txt_format':[],
           'scores':[],
           'K':[]}
    for img_path in request['path']:
        img_name = img_path.split('/')[-1]
        if img_name not in references:
            print('Warning: ' + img_name + ' not included in detected images!')
            continue
        ref = references[img_name]
        if filter_c:
            success, ref = filter_conf(ref, thres=thres)
        if filter_c and not success:
            continue
        ret['path'].append(img_path)
        bbox = ref['bbox_2d']
        if larger:
            # enlarge the input bounding box if needed            
            for instance_id in range(len(bbox)):
                bbox[instance_id] = np.array(modify_bbox(bbox[instance_id], 
                                                         target_ar=target_ar, 
                                                         enlarge=enlarge
                                                         )['bbox']
                                             )
        ret['boxes'].append(bbox)
        # 3D key-points from the detections before using Ego-Net
        ret['kpts_3d_before'].append(ref['kpts_3d'])
        # raw prediction strings used for later saving
        ret['raw_txt_format'].append(ref['raw_txt_format'])
        ret['scores'].append(ref['scores'])
        ret['K'].append(ref['K'])
    if 'pose_vecs_gt' in request:
        ret['pose_vecs_gt'] = request['pose_vecs_gt']
    return ret
    
def make_output_dir(cfgs, name):
    save_dir = os.path.join(cfgs['dirs']['output'], name, 'data')
    if not os.path.exists(save_dir):
        os.makedirs(save_dir) 
    return save_dir

@torch.no_grad()
def inference(testset, model, results, cfgs):
    """
    The inference loop.
    
    Set cfgs['visualize'] to True if you want to view the results.
    color_dict stores plotting parameters used by Matplotlib.
    save_dict stores parameters relevant to result saving.
    """
    # data loader
    data_loader = get_loader(testset, cfgs, 'testing', collate_fn=my_collate_fn)          
    # transformation statistics
    model.pth_trans = testset.pth_trans 
    all_records = {}
    for batch_idx, (_, meta) in enumerate(data_loader):
        if cfgs['use_gt_box']:
            save_dir = make_output_dir(cfgs, 'gt_box_test')         
            # use ground truth bounding box to crop RoIs
            record = model(meta)
            record = model.post_process(record,
                                        visualize=cfgs['visualize'],
                                        color_dict={'bbox_2d':'y',
                                                    'bbox_3d':'y',
                                                    'kpts':['yx', 'y']
                                                    },
                                        save_dict={
                                            'flag':True,
                                            'save_dir':save_dir
                                            }
                                        )
            merge(all_records, record)
        if cfgs['use_pred_box']:
            # use detected bounding box from any 2D/3D detector
            thres = cfgs.get('conf_thres', 0.)
            width, height = cfgs['heatmapModel']['input_size']
            enlarge = cfgs['dataset'].get('enlarge_factor', 1.2)
            annot_dict = gather_dict(meta, results['pred'], 
                                     thres=thres,
                                     target_ar=height/width,
                                     enlarge=enlarge
                                     )
            if len(annot_dict['path']) != 0:
                record2 = model(annot_dict)
                # update drawings
                for key in record2:
                    if 'record' in locals() and 'plots' in record[key]:
                        record2[key]['plots'] = record[key]['plots']
                save_dir = make_output_dir(cfgs, 'submission')   
                record2 = model.post_process(record2, 
                                             visualize=cfgs['visualize'],
                                             color_dict={'bbox_2d':'r',
                                                         'bbox_3d':'r',
                                                         'kpts':['rx', 'r'],
                                                         },
                                             save_dict={'flag':True,
                                                        'save_dir':save_dir
                                                        },
                                             alpha_mode=cfgs['testing_settings']['alpha_mode']
                                             )   
        if cfgs['visualize']:
            input("Press Enter to view next batch.")
        # set batch_to_show to a small number if you need to visualize 
        if batch_idx >= cfgs['batch_to_show'] - 1:
            break  
    return

def generate_empty_file(output_dir, label_dir):
    """
    Generate empty files for images without any predictions.
    """    
    all_files = os.listdir(label_dir)
    detected = os.listdir(os.path.join(output_dir, 'data'))
    for file_name in all_files:
        if not file_name.endswith(".txt"):
            continue
        if file_name not in detected:
            file = open(os.path.join(output_dir, 'data', file_name[:-4] + '.txt'), 'w')
            file.close()
    return

def main():
    # experiment configurations
    cfgs = parse.parse_args()
    
    # logging
    logger, final_output_dir = liblogger.get_logger(cfgs)   
    
    # save a copy of the experiment configuration
    save_cfg_path = os.path.join(final_output_dir, 'saved_config.yml')
    shutil.copyfile(cfgs['config_path'], save_cfg_path)
    
    # set GPU
    if cfgs['use_gpu'] and torch.cuda.is_available():
        logger.info('Using GPU:{}'.format(cfgs['gpu_id']))
        os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(list(map(str, cfgs['gpu_id'])))
    else:
        raise ValueError('CPU-based inference is not maintained.')
        
    # cudnn related setting
    torch.backends.cudnn.benchmark = cfgs['cudnn']['benchmark']
    torch.backends.cudnn.deterministic = cfgs['cudnn']['deterministic']
    torch.backends.cudnn.enabled = cfgs['cudnn']['enabled']
    
    # configurations related to the KITTI dataset
    data_cfgs = cfgs['dataset']
    
    # which split to show
    split = data_cfgs['split'] # default: KITTI val split
    dataset_inf = libkitti.get_dataset(cfgs, logger, split)
    
    # set the dataset to inference mode
    dataset_inf.inference([True, False])
    
    # read annotations
    input_file_path = cfgs['dirs']['load_prediction_file']
    # the record for 2D and 3D predictions
    results = {}
    
    # flags: the user can choose to use which type of input bounding boxes to use
    # use_gt_box can be used to re-produce the experiments simulating perfect 2D detection
    results['flags'] = {}
    if cfgs['use_pred_box']:
        # read the predicted boxes as specified by the path
        results['pred'] = dataset_inf.read_predictions(input_file_path)
    
    # Initialize Ego-Net and load the pre-trained checkpoint
    model = EgoNet(cfgs, pre_trained=True)
    model = model.eval().cuda()
    
    # perform inference and save the (updated) predictions
    inference(dataset_inf, model, results, cfgs)       
    if cfgs['visualize']:
        return
    
    evaluator = "./kitti-eval/evaluate_object_3d_offline"
    label_dir = os.path.join(cfgs['dataset']['root'], 'training', 'label_2')
    output_dir = os.path.join(cfgs['dirs']['output'], 'submission')
    
    # When generating submission files for the test split,
    # if no detections are produced for one image, generate an empty file
    if cfgs['dataset']['split'] == 'test':
        test_calib_dir = os.path.join(cfgs['dataset']['root'], 'testing', 'calib')
        generate_empty_file(output_dir, test_calib_dir)
        return
    
    # run kitti-eval to produce official evaluation
    command = "{} {} {}".format(evaluator, label_dir, output_dir)
    output = subprocess.check_output(command, shell=True)
    print(output.decode())
    return output

if __name__ == "__main__":
    main()

================================================
FILE: tools/inference_legacy.py
================================================
"""
This is the legacy inference code which includes some debugging functions.
You don't need to read this file to use Ego-Net.
"""
import sys
sys.path.append('../')

import libs.arguments.parse as parse
import libs.logger.logger as liblogger
import libs.dataset as dataset
import libs.dataset.KITTI.car_instance
import libs.model as models
import libs.model.FCmodel as FCmodel
import libs.dataset.normalization.operations as nop
import libs.visualization.points as vp
import libs.common.transformation as ltr

from libs.common.img_proc import resize_bbox, get_affine_transform, get_max_preds, generate_xy_map
from libs.common.img_proc import affine_transform_modified, cs2bbox, simple_crop, enlarge_bbox
from libs.trainer.trainer import visualize_lifting_results, get_loader
from libs.dataset.KITTI.car_instance import interp_dict

import shutil
import torch
import cv2
import numpy as np
import matplotlib.pyplot as plt
import os
import math
from scipy.spatial.transform import Rotation
from copy import deepcopy

def prepare_models(cfgs, is_cuda=True):
    """
    Initialize and load Ego-Net given a configuration file.
    """
    hm_model_settings = cfgs['heatmapModel']
    hm_model_name = hm_model_settings['name']
    method_str = 'models.heatmapModel.' + hm_model_name + '.get_pose_net'
    hm_model = eval(method_str)(cfgs, is_train=False)
    lifter = FCmodel.get_fc_model(stage_id=1, 
                                  cfgs=cfgs, 
                                  input_size=cfgs['FCModel']['input_size'],
                                  output_size=cfgs['FCModel']['output_size']
                                  )
    hm_model.load_state_dict(torch.load(cfgs['dirs']['load_hm_model']))
    stats = np.load(cfgs['dirs']['load_stats'], allow_pickle=True).item()
    lifter.load_state_dict(torch.load(cfgs['dirs']['load_lifter']))
    
    if is_cuda:
        hm_model = hm_model.cuda()
        lifter = lifter.cuda()
    model_dict = {'heatmap_regression':hm_model.eval(),
                  'lifting':lifter.eval(),
                  'FC_stats':stats                  
                  }
    return model_dict

def modify_bbox(bbox, target_ar, enlarge=1.1):
    """
    Enlarge a bounding box so that occluded parts may be enclosed.
    """
    lbbox = enlarge_bbox(bbox[0], bbox[1], bbox[2], bbox[3], [enlarge, enlarge])
    ret = resize_bbox(lbbox[0], lbbox[1], lbbox[2], lbbox[3], target_ar=target_ar)
    return ret

def crop_single_instance(img, bbox, resolution, pth_trans=None, xy_dict=None):
    """
    Crop a single instance given an image and bounding box.
    """
    bbox = to_npy(bbox)
    target_ar = resolution[0] / resolution[1]
    ret = modify_bbox(bbox, target_ar)
    c, s = ret['c'], ret['s']
    r = 0.
    # xy_dict: parameters for adding xy coordinate maps
    trans = get_affine_transform(c, s, r, resolution)
    instance = cv2.warpAffine(img,
                              trans,
                              (int(resolution[0]), int(resolution[1])),
                              flags=cv2.INTER_LINEAR
                              )
    #cv2.imwrite('test.jpg', input)
    #input = torch.from_numpy(input.transpose(2,0,1))
    if xy_dict is not None and xy_dict['flag']:
        xymap = generate_xy_map(ret['bbox'], resolution, img.shape[:-1])
        instance = np.concatenate([instance, xymap.astype(np.float32)], axis=2)        
    instance = instance if pth_trans is None else pth_trans(instance)
    return instance

def crop_instances(annot_dict, 
                   resolution, 
                   pth_trans=None, 
                   rgb=True,
                   xy_dict=None
                   ):
    """
    Crop input instances given an annotation dictionary.
    """
    all_instances = []
    # each record describes one instance
    all_records = []
    target_ar = resolution[0] / resolution[1]
    for idx, path in enumerate(annot_dict['path']):
        #print(path)
        data_numpy = cv2.imread(path, 1 | 128)    
        if data_numpy is None:
            raise ValueError('Fail to read {}'.format(path))    
        if rgb:
            data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB) 
        boxes = annot_dict['boxes'][idx]
        if 'labels' in annot_dict:
            labels = annot_dict['labels'][idx]
        else:
            labels = -np.ones((len(boxes)), dtype=np.int64)
        if 'scores' in annot_dict:
            scores = annot_dict['scores'][idx]
        else:
            scores = -np.ones((len(boxes)))
        if len(boxes) == 0:
            continue
        for idx, bbox in enumerate(boxes):
            # first crop the instance, and then resize to the required aspect ratio
            instance = crop_single_instance(data_numpy,
                                            bbox, 
                                            resolution, 
                                            pth_trans=pth_trans,
                                            xy_dict=xy_dict
                                            )
            bbox = to_npy(bbox)
            ret = modify_bbox(bbox, target_ar)
            c, s = ret['c'], ret['s']
            r = 0.
            all_instances.append(torch.unsqueeze(instance, dim=0))
            all_records.append({
                'path': path,
                'center': c,
                'scale': s,
                'bbox': bbox,
                'bbox_resize': ret['bbox'],
                'rotation': r,
                'label': labels[idx],
                'score': scores[idx]
                }
                )
            #break
    return torch.cat(all_instances, dim=0), all_records

def get_keypoints(instances, 
                  records, 
                  model, 
                  image_size=(256,256), 
                  arg_max='hard',
                  is_cuda=True
                  ):
    """
    Foward pass to obtain the screen coordinates.
    """
    if is_cuda:
        instances = instances.cuda()
    output = model(instances)
    if type(output) is tuple:
        pred, max_vals = output[1].data.cpu().numpy(), None  
        
    elif arg_max == 'hard':
        if not isinstance(output, np.ndarray):
            output = output.data.cpu().numpy()
        pred, max_vals = get_max_preds(output)
    else:
        raise NotImplementedError
    if type(output) is tuple:
        pred *= image_size[0]
    else:
        pred *= image_size[0]/output.shape[3]
    centers = [records[i]['center'] for i in range(len(records))]
    scales = [records[i]['scale'] for i in range(len(records))]
    rots = [records[i]['rotation'] for i in range(len(records))]    
    for sample_idx in range(len(pred)):
        trans_inv = get_affine_transform(centers[sample_idx],
                                         scales[sample_idx], 
                                         rots[sample_idx], 
                                         image_size, 
                                         inv=1)
        pred_src_coordinates = affine_transform_modified(pred[sample_idx], 
                                                             trans_inv) 
        record = records[sample_idx]
        # pred_src_coordinates += np.array([[record['bbox'][0], record['bbox'][1]]])
        records[sample_idx]['kpts'] = pred_src_coordinates
    # assemble a dictionary where each key corresponds to one image
    ret = {}
    for record in records:
        path = record['path']
        if path not in ret:
            ret[path] = {'center':[], 
                         'scale':[], 
                         'rotation':[], 
                         'bbox_resize':[], # resized bounding box
                         'kpts_2d_pred':[], 
                         'label':[], 
                         'score':[]
                         }
        ret[path]['kpts_2d_pred'].append(record['kpts'].reshape(1, -1))
        ret[path]['center'].append(record['center'])
        ret[path]['scale'].append(record['scale'])
        ret[path]['bbox_resize'].append(record['bbox_resize'])
        ret[path]['label'].append(record['label'])
        ret[path]['score'].append(record['score'])
        ret[path]['rotation'].append(record['rotation'])
    return ret

def kpts_to_euler(template, prediction):
    """
    Convert the predicted cuboid representation to euler angles.
    """    
    # estimate roll, pitch, yaw of the prediction by comparing with a 
    # reference bounding box
    # prediction and template of shape [3, N_points]
    R, T = ltr.compute_rigid_transform(template, prediction)
    # in the order of yaw, pitch and roll
    angles = Rotation.from_matrix(R).as_euler('yxz', degrees=False)
    # re-order in the order of x, y and z
    angles = angles[[1,0,2]]
    return angles, T

def get_template(prediction, interp_coef=[0.332, 0.667]):
    """
    Construct a template 3D cuboid used for computing regid transformation.
    """ 
    parents = prediction[interp_dict['bbox12'][0]]
    children = prediction[interp_dict['bbox12'][1]]
    lines = parents - children
    lines = np.sqrt(np.sum(lines**2, axis=1))
    h = np.sum(lines[:4])/4 # averaged over the four parallel line segments
    l = np.sum(lines[4:8])/4
    w = np.sum(lines[8:])/4
    x_corners = [0.5*l, l, l, l, l, 0, 0, 0, 0]
    y_corners = [0.5*h, 0, h, 0, h, 0, h, 0, h]
    z_corners = [0.5*w, w, w, 0, 0, w, w, 0, 0]
    x_corners += - np.float32(l) / 2
    y_corners += - np.float32(h)
    #y_corners += - np.float32(h/2)
    z_corners += - np.float32(w) / 2
    corners_3d = np.array([x_corners, y_corners, z_corners])    
    if len(prediction) == 33:
        pidx, cidx = interp_dict['bbox12']
        parents, children = corners_3d[:, pidx], corners_3d[:, cidx]
        lines = children - parents
        new_joints = [(parents + interp_coef[i]*lines) for i in range(len(interp_coef))]
        corners_3d = np.hstack([corners_3d, np.hstack(new_joints)])    
    return corners_3d

def get_observation_angle_trans(euler_angles, translations):
    """
    Convert orientation in camera coordinate into local coordinate system
    utilizing known object location (translation)
    """ 
    alphas = euler_angles[:,1].copy()
    for idx in range(len(euler_angles)):
        ry3d = euler_angles[idx][1] # orientation in the camera coordinate system
        x3d, z3d = translations[idx][0], translations[idx][2]
        alpha = ry3d - math.atan2(-z3d, x3d) - 0.5 * math.pi
        #alpha = ry3d - math.atan2(x3d, z3d)# - 0.5 * math.pi
        while alpha > math.pi: alpha -= math.pi * 2
        while alpha < (-math.pi): alpha += math.pi * 2
        alphas[idx] = alpha
    return alphas

def get_observation_angle_proj(euler_angles, kpts, K):
    """
    Convert orientation in camera coordinate into local coordinate system
    utilizing the projection of object on the image plane
    """ 
    f = K[0,0]
    cx = K[0,2]
    kpts_x = [kpts[i][0,0] for i in range(len(kpts))]
    alphas = euler_angles[:,1].copy()
    for idx in range(len(euler_angles)):
        ry3d = euler_angles[idx][1] # orientation in the camera coordinate system
        x3d, z3d = kpts_x[idx] - cx, f
        alpha = ry3d - math.atan2(-z3d, x3d) - 0.5 * math.pi
        #alpha = ry3d - math.atan2(x3d, z3d)# - 0.5 * math.pi
        while alpha > math.pi: alpha -= math.pi * 2
        while alpha < (-math.pi): alpha += math.pi * 2
        alphas[idx] = alpha
    return alphas

def get_6d_rep(predictions, ax=None, color="black"):
    """
    Get the 6DoF representation of a 3D prediction.
    """    
    predictions = predictions.reshape(len(predictions), -1, 3)
    all_angles = []
    for instance_idx in range(len(predictions)):
        prediction = predictions[instance_idx]
        # templates are 3D boxes with no rotation
        # the prediction is estimated as the rotation between prediction and template
        template = get_template(prediction)
        instance_angle, instance_trans = kpts_to_euler(template, prediction.T)        
        all_angles.append(instance_angle.reshape(1, 3))
    angles = np.concatenate(all_angles)
    # the first point is the predicted point center
    translation = predictions[:, 0, :]    
    if ax is not None:
        pose_vecs = np.concatenate([translation, angles], axis=1)
        draw_pose_vecs(ax, pose_vecs, color=color)
    return angles, translation

def format_str_submission(roll, pitch, yaw, x, y, z, score):
    """
    Get a prediction string in ApolloScape style.
    """      
    tempt_str = "{pitch:.3f} {yaw:.3f} {roll:.3f} {x:.3f} {y:.3f} {z:.3f} {score:.3f}".format(
            pitch=pitch,
            yaw=yaw,
            roll=roll,
            x=x,
            y=y,
            z=z,
            score=score)
    return tempt_str

def get_instance_str(dic):
    """
    Produce KITTI style prediction string for one instance.
    """     
    string = ""
    string += dic['class'] + " "
    string += "{:.1f} ".format(dic['truncation'])
    string += "{:.1f} ".format(dic['occlusion'])
    string += "{:.6f} ".format(dic['alpha'])
    string += "{:.6f} {:.6f} {:.6f} {:.6f} ".format(dic['bbox'][0], dic['bbox'][1], dic['bbox'][2], dic['bbox'][3])
    string += "{:.6f} {:.6f} {:.6f} ".format(dic['dimensions'][1], dic['dimensions'][2], dic['dimensions'][0])
    string += "{:.6f} {:.6f} {:.6f} ".format(dic['locations'][0], dic['locations'][1], dic['locations'][2])
    string += "{:.6f} ".format(dic['rot_y'])
    if 'score' in dic:
        string += "{:.8f} ".format(dic['score'])
    else:
        string += "{:.8f} ".format(1.0)
    return string

def get_pred_str(record):
    """
    Produce KITTI style prediction string for a record dictionary.
    """      
    # replace the rotation prediction generated by the previous stage
    updated_txt = deepcopy(record['raw_txt_format'])
    for instance_id in range(len(record['euler_angles'])):
        updated_txt[instance_id]['rot_y'] = record['euler_angles'][instance_id, 1]
        updated_txt[instance_id]['alpha'] = record['alphas'][instance_id]
    pred_str = ""
    angles = record['euler_angles']
    for instance_id in range(len(angles)):
        # format a string for submission
        tempt_str = get_instance_str(updated_txt[instance_id])
        if instance_id != len(angles) - 1:
            tempt_str += '\n'
        pred_str += tempt_str
    return pred_str

def lift_2d_to_3d(records, model, stats, template, cuda=True):
    """
    Foward-pass of the lifter model.
    """      
    for path in records.keys():
        data = np.concatenate(records[path]['kpts_2d_pred'], axis=0)
        data = nop.normalize_1d(data, stats['mean_in'], stats['std_in'])
        data = data.astype(np.float32)
        data = torch.from_numpy(data)
        if cuda:
            data = data.cuda()
        prediction = model(data)  
        prediction = nop.unnormalize_1d(prediction.data.cpu().numpy(),
                                        stats['mean_out'], 
                                        stats['std_out']
                                        )
        records[path]['kpts_3d_pred'] = prediction.reshape(len(prediction), -1, 3)
    return records

def filter_detection(detected, thres=0.7):
    """
    Filter predictions based on a confidence threshold.
    """      
    # detected: list of dict
    filtered = []
    for detection in detected:
        tempt_dict = {}
        indices = detection['scores'] > thres
        for key in ['boxes', 'labels', 'scores']:
            tempt_dict[key] = detection[key][indices]
        filtered.append(tempt_dict)
    return filtered

def add_orientation_arrow(record):
    """
    Generate an arrow for each predicted orientation for visualization.
    """      
    pred_kpts = record['kpts_3d_pred']
    gt_kpts = record['kpts_3d_gt']
    K = record['K']
    arrow_2d = np.zeros((len(pred_kpts), 2, 2))
    for idx in range(len(pred_kpts)):
        vector_3d = (pred_kpts[idx][1] - pred_kpts[idx][5])
        arrow_3d = np.concatenate([gt_kpts[idx][0].reshape(3, 1), 
                                  (gt_kpts[idx][0] + vector_3d).reshape(3, 1)],
                                  axis=1)
        projected = K @ arrow_3d
        arrow_2d[idx][0] = projected[0, :] / projected[2, :]
        arrow_2d[idx][1] = projected[1, :] / projected[2, :]
        # fix the arrow length if not fore-shortened
        vector_2d = arrow_2d[idx][:,1] - arrow_2d[idx][:,0]
        length = np.linalg.norm(vector_2d)
        if length > 50:
            vector_2d = vector_2d/length * 60
        arrow_2d[idx][:,1] = arrow_2d[idx][:,0] + vector_2d
    return arrow_2d

def process_batch(images, 
                  hm_regressor, 
                  lifter, 
                  stats, 
                  template,
                  annot_dict,
                  pth_trans=None, 
                  is_cuda=True, 
                  threshold=None,
                  xy_dict=None
                  ):
    """
    Process a batch of images.
    # annot_dict is a Python dictionary storing
    # keys: 
    #       path: list of image paths
    #       boxes: list of bounding boxes for each image
    """
    all_instances, all_records = crop_instances(annot_dict, 
                                                resolution=(256, 256),
                                                pth_trans=pth_trans,
                                                xy_dict=xy_dict
                                                )
    # all_records stores records for each instance
    records = get_keypoints(all_instances, all_records, hm_regressor)
    # records stores records for each image
    records = lift_2d_to_3d(records, lifter, stats, template)
    # merge with the annotation dictionary
    for idx, path in enumerate(annot_dict['path']):
        if 'boxes' in annot_dict:
            records[path]['boxes'] = to_npy(annot_dict['boxes'][idx])
        if 'kpts' in annot_dict:
            records[path]['kpts_2d_gt'] = to_npy(annot_dict['kpts'][idx])   
        if 'kpts_3d_gt' in annot_dict:
            records[path]['kpts_3d_gt'] = to_npy(annot_dict['kpts_3d_gt'][idx])   
        if 'pose_vecs_gt' in annot_dict:            
            records[path]['pose_vecs_gt'] = to_npy(annot_dict['pose_vecs_gt'][idx])  
        if 'kpts_3d_SMOKE' in annot_dict:
            records[path]['kpts_3d_SMOKE'] = to_npy(annot_dict['kpts_3d_SMOKE'][idx])  
        if 'raw_txt_format' in annot_dict:
            # list of annotation dictionary for each instance
            records[path]['raw_txt_format'] = annot_dict['raw_txt_format'][idx]
        if 'K' in annot_dict:
            # list of annotation dictionary for each instance
            records[path]['K'] = annot_dict['K'][idx]
        if 'kpts_3d_gt' in annot_dict and 'K' in annot_dict:
            records[path]['arrow'] = add_orientation_arrow(records[path])
    return records

def to_npy(tensor):
    """
    Convert PyTorch tensor to numpy array.
    """
    if isinstance(tensor, np.ndarray):
        return tensor
    else:
        return tensor.data.cpu().numpy()

def refine_with_perfect_size(pred, 
                             observation, 
                             intrinsics, 
                             dist_coeffs, 
                             gts, 
                             threshold=5., 
                             ax=None
                             ):
    """
    Use the gt 3D box size for refinement to show the performance gain with 
    size regression.
    If there is a nearby ground truth bbox, use its size.
    pred [9, 3] gts[N, 9, 3]
    """    
    pred_center = pred[0, :].reshape(1,3)
    distance = np.sqrt(np.sum((gts[:, 0, :] - pred_center)**2, axis=1))
    minimum_idx = np.where(distance == distance.min())[0][0]
    if distance[minimum_idx] > threshold:
        return False, None
    else:
        # First align the box with gt size with the predicted box, then refine
        tempt_box_pred = pred.copy()
        tempt_box_pred[1:, :] += tempt_box_pred[0, :].reshape(1, 3)        
        tempt_box_gt = gts[minimum_idx].copy()
        tempt_box_gt[1:, :] += tempt_box_gt[0, :].reshape(1, 3)         
        pseudo_box = ltr.procrustes_transform(tempt_box_gt.T, tempt_box_pred.T)
        refined_prediction = ltr.pnp_refine(pseudo_box.T, observation, intrinsics, 
                                        dist_coeffs) 
        if ax is not None:
            vp.plot_lines(ax, 
                          pseudo_box[:, 1:].T, 
                          vp.plot_3d_bbox.connections, 
                          dimension=3, 
                          c='y',
                          linestyle='-.')         
            vp.plot_lines(ax, 
                          refined_prediction[:, 1:].T, 
                          vp.plot_3d_bbox.connections, 
                          dimension=3, 
                          c='b',
                          linestyle='-.')         
        return True, refined_prediction

def refine_with_predicted_bbox(pred, 
                               observation, 
                               intrinsics, 
                               dist_coeffs, 
                               gts=None, 
                               threshold=5., 
                               ax=None
                               ):
    """
    Refine with predicted 3D cuboid (disabled by default).
    """ 
    tempt_box_pred = pred.copy()
    tempt_box_pred[1:, :] += tempt_box_pred[0, :].reshape(1, 3)
    # use the predicted 3D bounding box size for refinement
    refined_prediction = ltr.pnp_refine(tempt_box_pred, observation, intrinsics, 
                                    dist_coeffs)    
    # discard the results if the refined solution is to far away from the initial position
    distance = refined_prediction[:, 0] - tempt_box_pred[0, :]
    distance = np.sqrt(np.sum(distance**2))
    if distance > threshold:
        return False, None
    else:
        # plotting
        if ax is not None:
            vp.plot_lines(ax, 
                          refined_prediction[:, 1:].T, 
                          vp.plot_3d_bbox.connections, 
                          dimension=3, 
                          c='g')        
        return True, refined_prediction

def draw_pose_vecs(ax, pose_vecs=None, color='black'):
    """
    Add pose vectors to a 3D matplotlib axe.
    """     
    if pose_vecs is None:
        return
    for pose_vec in pose_vecs:
        x, y, z, pitch, yaw, roll = pose_vec
        string = "({:.2f}, {:.2f}, {:.2f})".format(pitch, yaw, roll)
        # add some random noise to the text location so that they do not overlap
        nl = 0.02 # noise level
        ax.text(x*(1+np.random.randn()*nl), 
                y*(1+np.random.randn()*nl), 
                z*(1+np.random.randn()*nl), 
                string, 
                color=color
                )

def refine_solution(est_3d, 
                    est_2d, 
                    K, 
                    dist_coeffs, 
                    refine_func, 
                    output_arr, 
                    output_flags, 
                    gts=None, 
                    ax=None
                    ):
    """
    Refine 3D prediction by minimizing re-projection error.
    est: estimates [N, 9, 3]
    K: intrinsics    
    """      
    for idx in range(len(est_3d)):
        success, refined_prediction = refine_func(est_3d[idx],
                                                  est_2d[idx],
                                                  K,
                                                  dist_coeffs,
                                                  gts=gts,
                                                  ax=ax)
        if success:
            # update the refined solution
            output_arr[idx] = refined_prediction.T
            output_flags[idx] = True
            # # convert to the center-relative shape representation
            # p3d_pred_refined[idx][1:, :] -= p3d_pred_refined[idx][[0]]    
    return

def gather_lifting_results(record,
                           data,
                           prediction, 
                           target=None,
                           pose_vecs=None,
                           intrinsics=None, 
                           refine=False, 
                           visualize=False,
                           template=None,
                           dist_coeffs=np.zeros((4,1)),
                           color='r',
                           get_str=False,
                           alpha_mode='trans'
                           ):
    """
    Lift Screen coordinates to 3D representation and a optimization-based 
    refinement is optional.
    """
    if target is not None:
        p3d_gt = target.reshape(len(target), -1, 3)
    else:
        p3d_gt = None
    p3d_pred = prediction.reshape(len(prediction), -1, 3)
    # only for visualizing the prediciton of shape using gt bboxes
    if "kpts_3d_SMOKE" in record:
        p3d_pred = np.concatenate([record['kpts_3d_SMOKE'][:, [0], :], p3d_pred], axis=1)
    elif p3d_gt is not None and p3d_gt.shape[1] == p3d_pred.shape[1] + 1:
        if len(p3d_pred) != len(p3d_gt):
            print('debug')
        assert len(p3d_pred) == len(p3d_gt)
        p3d_pred = np.concatenate([p3d_gt[:, [0], :], p3d_pred], axis=1) 
    else:
        raise NotImplementedError
    # this object will be updated if one prediction is refined 
    p3d_pred_refined = p3d_pred.copy()
    refine_flags = [False for i in range(len(p3d_pred_refined))]
    # similar object but using a different refinement strategy
    p3d_pred_refined2 = p3d_pred.copy()
    refine_flags2 = [False for i in range(len(p3d_pred_refined2))]
    # input 2D keypoints
    data = data.reshape(len(data), -1, 2)
    if visualize:
        if 'plots' in record and 'ax3d' in record['plots']:
            ax = record['plots']['ax3d']
            ax = vp.plot_scene_3dbox(p3d_pred, p3d_gt, ax=ax, color=color)
        elif 'plots' in record:
        # plotting the 3D scene
            ax = vp.plot_scene_3dbox(p3d_pred, p3d_gt, color=color)
            draw_pose_vecs(ax, pose_vecs)
            record['plots']['ax3d'] = ax
        else:
            raise ValueError
    else:
        ax = None
    if refine:
        assert intrinsics is not None         
        # refine 3D point prediction by minimizing re-projection errors        
        refine_solution(p3d_pred, 
                        data, 
                        intrinsics, 
                        dist_coeffs, 
                        refine_with_predicted_bbox, 
                        p3d_pred_refined, 
                        refine_flags,
                        ax=ax
                        )
        if target is not None:
            # refine with ground truth bounding box size for debugging purpose
            refine_solution(p3d_pred, 
                            data, 
                            intrinsics, 
                            dist_coeffs, 
                            refine_with_perfect_size, 
                            p3d_pred_refined2, 
                            refine_flags2,
                            gts=p3d_gt,
                            ax=ax
                            )        
    record['kpts_3d_refined'] = p3d_pred_refined  
    # prepare the prediction string for submission
    # compute the roll, pitch and yaw angle of the predicted bounding box
    record['euler_angles'], record['translation'] = \
        get_6d_rep(record['kpts_3d_refined'], ax, color=color) # the predicted pose vectors are also drawn here
    if alpha_mode == 'trans':
        record['alphas'] = get_observation_angle_trans(record['euler_angles'], 
                                                       record['translation'])
    elif alpha_mode == 'proj':
        record['alphas'] = get_observation_angle_proj(record['euler_angles'],
                                                      record['kpts_2d_pred'],
                                                      record['K'])        
    else:
         raise NotImplementedError   
    if get_str:
        record['pred_str'] = get_pred_str(record)      
    return record

def save_txt_file(img_path, prediction, params):
    """
    Save a txt file for predictions of an image.
    """    
    if not params['flag']:
        return
    file_name = img_path.split('/')[-1][:-3] + 'txt'
    save_path = os.path.join(params['save_dir'], file_name) 
    with open(save_path, 'w') as f:
        f.write(prediction['pred_str'])
    return

def refine_one_image(img_path, 
                     record, 
                     add_3d_bbox=True, 
                     camera=None, 
                     template=None,
                     visualize=False,
                     color_dict={'bbox_2d':'r',
                                 'bbox_3d':'r',
                                 'kpts':['rx', 'b']
                                 },
                     save_dict={'flag':False,
                                'save_dir':None
                                },
                     alpha_mode='trans'
                     ):
    """
    Refine the predictions from a single image.
    """
    # plot 2D predictions 
    if visualize:
        if 'plots' in record:
            fig = record['plots']['fig2d']
            ax = record['plots']['ax2d']
        else:
            fig = plt.figure(figsize=(11.3, 9))
            ax = plt.subplot(111)
            record['plots'] = {}
            record['plots']['fig2d'] = fig
            record['plots']['ax2d'] = ax
            image = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)[:, :, ::-1]
            height, width, _ = image.shape
            ax.imshow(image) 
            ax.set_xlim([0, width])
            ax.set_ylim([0, height])
            ax.invert_yaxis()
            
        num_instances = len(record['kpts_2d_pred'])
        for idx in range(num_instances):
            kpts = record['kpts_2d_pred'][idx].reshape(-1, 2)
            # kpts_3d = record['kpts_3d'][idx]
            bbox = record['bbox_resize'][idx]
            label = record['label'][idx]
            score = record['score'][idx]
            vp.plot_2d_bbox(ax, bbox, color_dict['bbox_2d'], score, label)
            # predicted key-points
            ax.plot(kpts[:, 0], kpts[:, 1], color_dict['kpts'][0])
            # if add_3d_bbox:
            #     vp.plot_3d_bbox(ax, kpts[1:,], color_dict['kpts'][1])  
            # bbox_3d_projected = project_3d_to_2d(kpts_3d)
            # vp.plot_3d_bbox(ax, bbox_3d_projected[:2, :].T)      
        # plot ground truth
        if 'kpts_2d_gt' in record:
            for idx, kpts_gt in enumerate(record['kpts_2d_gt']):
                kpts_gt = kpts_gt.reshape(-1, 3)
                # ax.plot(kpts_gt[:, 0], kpts_gt[:, 1], 'gx')
                vp.plot_3d_bbox(ax, kpts_gt[1:, :2], color='g', linestyle='-.')
        if 'arrow' in record:
            for idx in range(len(record['arrow'])):
                start = record['arrow'][idx][:,0]
                end = record['arrow'][idx][:,1]
                x, y = start
                dx, dy = end - start
                ax.arrow(x, y, dx, dy, color='r', lw=4, head_width=5, alpha=0.5)         
            # save intermediate results
            plt.gca().set_axis_off()
            plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, 
            hspace = 0, wspace = 0)
            plt.margins(0,0)
            plt.gca().xaxis.set_major_locator(plt.NullLocator())
            plt.gca().yaxis.set_major_locator(plt.NullLocator())
            img_name = img_path.split('/')[-1]
            save_dir = './debug/qualitative_results/'
            plt.savefig(save_dir + img_name, dpi=100, bbox_inches = 'tight', pad_inches = 0)
    # plot 3d bounding boxes
    all_kpts_2d = np.concatenate(record['kpts_2d_pred'])
    all_kpts_3d_pred = record['kpts_3d_pred'].reshape(len(record['kpts_3d_pred']), -1)
    if 'kpts_3d_gt' in record:
        all_kpts_3d_gt = record['kpts_3d_gt']
        all_pose_vecs_gt = record['pose_vecs_gt']
    else:
        all_kpts_3d_gt = None
        all_pose_vecs_gt = None
    refine_args = {'visualize':visualize, 'get_str':save_dict['flag']}
    if camera is not None:
        refine_args['intrinsics'] = camera
        refine_args['refine'] = True
        refine_args['template'] = template
    # refine and gather the prediction strings
    record = gather_lifting_results(record,
                                    all_kpts_2d,
                                    all_kpts_3d_pred, 
                                    all_kpts_3d_gt,
                                    all_pose_vecs_gt,
                                    color=color_dict['bbox_3d'],
                                    alpha_mode=alpha_mode,
                                    **refine_args
                                    )
    # plot 3D bounding box generated by SMOKE
    if 'kpts_3d_SMOKE' in record:
        kpts_3d_SMOKE = record['kpts_3d_SMOKE']
        if 'plots' in record:
            # update drawings
            ax = record['plots']['ax3d']
            vp.plot_scene_3dbox(kpts_3d_SMOKE, ax=ax, color='m')    
            pose_vecs = np.zeros((len(kpts_3d_SMOKE), 6))
            for idx in range(len(pose_vecs)):
                pose_vecs[idx][0:3] = record['raw_txt_format'][idx]['locations']
                pose_vecs[idx][4] = record['raw_txt_format'][idx]['rot_y']
            # plot pose vectors
            draw_pose_vecs(ax, pose_vecs, color='m')
    # save KITTI-style prediction file in .txt format
    save_txt_file(img_path, record, save_dict)
    return record

def post_process(records, 
                 camera=None, 
                 template=None, 
                 visualize=False, 
                 color_dict={'bbox_2d':'r',
                             'kpts':['ro', 'b'],
                             },
                 save_dict={'flag':False,
                            'save_dir':None
                            },
                 alpha_mode='trans'
                 ):
    for img_path in records.keys():
        print(img_path)
        records[img_path] = refine_one_image(img_path, 
                                             records[img_path], 
                                             camera=camera,
                                             template=template,
                                             visualize=visualize,
                                             color_dict=color_dict,
                                             save_dict=save_dict,
                                             alpha_mode=alpha_mode
                                             )      
    return records

def merge(dict_a, dict_b):
    for key in dict_b.keys():
        dict_a[key] = dict_b[key]
    return

def collate_dict(dict_list):
    ret = {}
    for key in dict_list[0]:
        ret[key] = [d[key] for d in dict_list]
    return ret

def my_collate_fn(batch):
    # the collate function for 2d pose training
    imgs, meta = list(zip(*batch))
    meta = collate_dict(meta)
    return imgs, meta

def filter_conf(record, thres=0.0):
    """
    Filter the proposals with a confidence threshold.
    """
    annots = record['raw_txt_format']
    indices = [i for i in range(len(annots)) if annots[i]['score'] >= thres]
    if len(indices) == 0:
        return False, record
    filterd_record = {
        'bbox_2d': record['bbox_2d'][indices],
        'kpts_3d': record['kpts_3d'][indices],
        'raw_txt_format': [annots[i] for i in indices],
        'scores': [annots[i]['score'] for i in indices],
        'K':record['K']
        }
    return True, filterd_record

def gather_dict(request, references, filter_c=True):
    """
    Gather a dict from reference as requsted.
    """
    assert 'path' in request
    ret = {'path':[], 
           'boxes':[], 
           'kpts_3d_SMOKE':[], 
           'raw_txt_format':[],
           'scores':[],
           'K':[]}
    for img_path in request['path']:
        img_name = img_path.split('/')[-1]
        if img_name not in references:
            print('Warning: ' + img_name + ' not included in detected images!')
            continue
        ref = references[img_name]
        if filter_c:
            success, ref = filter_conf(ref)
        if filter_c and not success:
            continue
        ret['path'].append(img_path)
        # ret['boxes'].append(ref['bbox_2d'])
        # temporary hack: enlarge the bounding box from the stage 1 model
        bbox = ref['bbox_2d']
        for instance_id in range(len(bbox)):
            bbox[instance_id] = np.array(modify_bbox(bbox[instance_id], target_ar=1, enlarge=1.2)['bbox'])
        # temporary hack 2: use the gt bounding box for analysis
        ret['boxes'].append(bbox)
        # 3D bounding box produced by SMOKE
        ret['kpts_3d_SMOKE'].append(ref['kpts_3d'])
        ret['raw_txt_format'].append(ref['raw_txt_format'])
        ret['scores'].append(ref['scores'])
        ret['K'].append(ref['K'])
    #ret['kpts_3d_gt'] = request['kpts_3d_gt']
    if 'pose_vecs_gt' in request:
        ret['pose_vecs_gt'] = request['pose_vecs_gt']
    return ret
    
@torch.no_grad()
def inference(testset, model_settings, results, cfgs):
    """
    The main inference function.
    """
    # visualize to plot the 2D detection and 3D scene reconstruction
    data_loader = get_loader(testset, cfgs, 'testing', collate_fn=my_collate_fn)          
    hm_regressor = model_settings['heatmap_regression']
    lifter = model_settings['lifting']
    # statistics for the FC model
    stats = model_settings['FC_stats']
    #template = testset.instance_stats['ref_box3d']
    template = None
    pth_trans = testset.pth_trans
    if 'add_xy' in cfgs['heatmapModel']:
        xy_dict = {'flag':cfgs['heatmapModel']['add_xy']}
    else:
        xy_dict = None
    all_records = {}
    camera = None
    flags = results['flags']
    visualize = cfgs['visualize']
    batch_to_show = cfgs['batch_to_show']
    for batch_idx, (images, meta) in enumerate(data_loader):
        if flags['gt']:
            save_dir = os.path.join(cfgs['dirs']['output'], 'gt_box_test', 'data')
            if not os.path.exists(save_dir):
                os.makedirs(save_dir)            
            # ground truth bounding box for comparison
            record = process_batch(images,
                                   hm_regressor, 
                                   lifter, 
                                   stats, 
                                   template,
                                   annot_dict=meta,
                                   pth_trans=pth_trans, 
                                   threshold=None,
                                   xy_dict=xy_dict
                                   )
            record = post_process(record, 
                                  camera, 
                                  template, 
                                  visualize=visualize,
                                  color_dict={'bbox_2d':'y',
                                              'bbox_3d':'y',
                                              'kpts':['yx', 'y'],
                                              },
                                  save_dict={
                                      'flag':True,
                                      'save_dir':save_dir
                                      }
                                  )
            merge(all_records, record)
        if flags['pred']:
            # use detected bounding box from an anchor-free model
            annot_dict = gather_dict(meta, results['pred'])
            if len(annot_dict['path']) == 0:
                continue
            record2 = process_batch(images,
                                    hm_regressor, 
                                    lifter, 
                                    stats, 
                                    template,
                                    annot_dict,
                                    pth_trans=pth_trans, 
                                    threshold=None,
                                    xy_dict=xy_dict
                                    )
            for key in record2:
                if 'record' in locals() and 'plots' in record[key]:
                    record2[key]['plots'] = record[key]['plots']
            save_dir = os.path.join(cfgs['dirs']['output'], 'submission', 'data')
            if not os.path.exists(save_dir):
                os.makedirs(save_dir)
            record2 = post_process(record2,
                                   camera, 
                                   template, 
                                   visualize=visualize,
                                   color_dict={'bbox_2d':'r',
                                               'bbox_3d':'r',
                                               'kpts':['rx', 'r'],
                                               },
                                   save_dict={'flag':True,
                                              'save_dir':save_dir
                                              },
                                  alpha_mode=cfgs['testing_settings']['alpha_mode']
                                  )   
            del images, record2, meta
        if batch_idx >= batch_to_show - 1:
            break
    # produce a csv file
    # csv_output_path = cfgs['dirs']['csv_output']
    # save_csv(all_records, csv_output_path)        
    return

def generate_empty_file(output_dir, label_dir):
    """
    Generate empty files for images without any predictions.
    """    
    all_files = os.listdir(label_dir)
    detected = os.listdir(os.path.join(output_dir, 'data'))
    for file_name in all_files:
        if file_name[:-4] + ".txt" not in detected:
            file = open(os.path.join(output_dir, 'data', file_name[:-4] + '.txt'), 'w')
            file.close()
    return

def main():
    # experiment configurations
    cfgs = parse.parse_args()
    
    # logging
    logger, final_output_dir = liblogger.get_logger(cfgs)   
    shutil.copyfile(cfgs['config_path'], os.path.join(final_output_dir, 'saved_config.yml'))
    # Set GPU
    if cfgs['use_gpu'] and torch.cuda.is_available():
        GPUs = cfgs['gpu_id']
    else:
        logger.info("GPU acceleration is disabled.")
        
    # cudnn related setting
    torch.backends.cudnn.benchmark = cfgs['cudnn']['benchmark']
    torch.backends.cudnn.deterministic = cfgs['cudnn']['deterministic']
    torch.backends.cudnn.enabled = cfgs['cudnn']['enabled']

    data_cfgs = cfgs['dataset']
    
    # which split to show
    split = 'valid'
    dataset_inf = eval('dataset.' + data_cfgs['name']  
                        + '.car_instance').get_dataset(cfgs, logger, split)
    # set to inference mode but does not read image
    dataset_inf.inference([True, False])
    
    # some temporary testing
    # test_angle_conversion(dataset_inf, dataset_inf.instance_stats['ref_box3d'])
    
    # read annotations
    input_file_path = cfgs['dirs']['load_prediction_file']
    # the record for 2D and 3D predictions
    # key->value: name of the approach->dictionary storing the predictions
    results = {}
    confidence_thres = cfgs['conf_thres']
    
    # flags: use predicted bounding boxes as well as the ground truth boxes
    # for comparison
    results['flags'] = {}
    results['flags']['pred'] = cfgs['use_pred_box']
    if results['flags']['pred']:
        results['pred'] = dataset_inf.read_predictions(input_file_path)
    results['flags']['gt'] = cfgs['use_gt_box']
    
    # load checkpoints
    model_dict = prepare_models(cfgs)
    
    # inference and update prediction
    inference(dataset_inf, model_dict, results, cfgs)       
    
    # then you can run kitti-eval for evaluation
    evaluator = cfgs['dirs']['kitti_evaluator']
    label_dir = cfgs['dirs']['kitti_label']
    output_dir = os.path.join(cfgs['dirs']['output'], 'submission')
    
    # if no detections are produced, generate an empty file
    #generate_empty_file(output_dir, label_dir)
    command = "{} {} {}".format(evaluator, label_dir, output_dir)
    # e.g.
    # ~/Documents/Github/SMOKE/smoke/data/datasets/evaluation/kitti/kitti_eval/evaluate_object_3d_offline /home/nicholas/Documents/Github/SMOKE/datasets/kitti/training/label_2 /media/nicholas/Database/experiments/3DLearning/0826
    # /media/nicholas/Database/Github/M3D-RPN/data/kitti_split1/devkit/cpp/evaluate_object /home/nicholas/Documents/Github/SMOKE/datasets/kitti/training/label_2 /media/nicholas/Database/Github/M3D-RPN/output/tmp_results
    return

if __name__ == "__main__":
    main()


================================================
FILE: tools/kitti-eval/README.md
================================================
# kitti_eval

`evaluate_object_3d_offline.cpp`evaluates your KITTI detection locally on your own computer using your validation data selected from KITTI training dataset, with the following metrics:

- overlap on image (AP)
- oriented overlap on image (AOS)
- overlap on ground-plane (AP)
- overlap in 3D (AP)

Compile `evaluate_object_3d_offline.cpp` (or `evaluate_object_3d_offline_r40.cpp` for the updated metric) with dependency of Boost and Linux `dirent.h` (You should already have it under most Linux).

Run the evalutaion by:

    ./evaluate_object_3d_offline groundtruth_dir result_dir
    
Note that you don't have to detect over all KITTI training data. The evaluator only evaluates samples whose result files exist.


### Updates

- June, 2017:
  * Fixed the bug of detection box filtering based on min height according to KITTI's note on 25.04.2017.


================================================
FILE: tools/kitti-eval/evaluate_object_3d.cpp
================================================
#include <iostream>
#include <algorithm>
#include <stdio.h>
#include <math.h>
#include <vector>
#include <numeric>
#include <strings.h>
#include <assert.h>

#include <dirent.h>

#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/io.hpp>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>
#include <boost/geometry/geometries/adapted/c_array.hpp>

#include "mail.h"

BOOST_GEOMETRY_REGISTER_C_ARRAY_CS(cs::cartesian)

typedef boost::geometry::model::polygon<boost::geometry::model::d2::point_xy<double> > Polygon;


using namespace std;

/*=======================================================================
STATIC EVALUATION PARAMETERS
=======================================================================*/

// holds the number of test images on the server
const int32_t N_TESTIMAGES = 7518;

// easy, moderate and hard evaluation level
enum DIFFICULTY{EASY=0, MODERATE=1, HARD=2};

// evaluation metrics: image, ground or 3D
enum METRIC{IMAGE=0, GROUND=1, BOX3D=2};

// evaluation parameter
const int32_t MIN_HEIGHT[3]     = {40, 25, 25};     // minimum height for evaluated groundtruth/detections
const int32_t MAX_OCCLUSION[3]  = {0, 1, 2};        // maximum occlusion level of the groundtruth used for evaluation
const double  MAX_TRUNCATION[3] = {0.15, 0.3, 0.5}; // maximum truncation level of the groundtruth used for evaluation

// evaluated object classes
enum CLASSES{CAR=0, PEDESTRIAN=1, CYCLIST=2};
const int NUM_CLASS = 3;

// parameters varying per class
vector<string> CLASS_NAMES;
// the minimum overlap required for 2D evaluation on the image/ground plane and 3D evaluation
const double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.5, 0.25, 0.25}, {0.5, 0.25, 0.25}};

// no. of recall steps that should be evaluated (discretized)
const double N_SAMPLE_PTS = 41;


// initialize class names
void initGlobals () {
  CLASS_NAMES.push_back("car");
  CLASS_NAMES.push_back("pedestrian");
  CLASS_NAMES.push_back("cyclist");
}

/*=======================================================================
DATA TYPES FOR EVALUATION
=======================================================================*/

// holding data needed for precision-recall and precision-aos
struct tPrData {
  vector<double> v;           // detection score for computing score thresholds
  double         similarity;  // orientation similarity
  int32_t        tp;          // true positives
  int32_t        fp;          // false positives
  int32_t        fn;          // false negatives
  tPrData () :
    similarity(0), tp(0), fp(0), fn(0) {}
};

// holding bounding boxes for ground truth and detections
struct tBox {
  string  type;     // object type as car, pedestrian or cyclist,...
  double   x1;      // left corner
  double   y1;      // top corner
  double   x2;      // right corner
  double   y2;      // bottom corner
  double   alpha;   // image orientation
  tBox (string type, double x1,double y1,double x2,double y2,double alpha) :
    type(type),x1(x1),y1(y1),x2(x2),y2(y2),alpha(alpha) {}
};

// holding ground truth data
struct tGroundtruth {
  tBox    box;        // object type, box, orientation
  double  truncation; // truncation 0..1
  int32_t occlusion;  // occlusion 0,1,2 (non, partly, fully)
  double ry;
  double  t1, t2, t3;
  double h, w, l;
  tGroundtruth () :
    box(tBox("invalild",-1,-1,-1,-1,-10)),truncation(-1),occlusion(-1) {}
  tGroundtruth (tBox box,double truncation,int32_t occlusion) :
    box(box),truncation(truncation),occlusion(occlusion) {}
  tGroundtruth (string type,double x1,double y1,double x2,double y2,double alpha,double truncation,int32_t occlusion) :
    box(tBox(type,x1,y1,x2,y2,alpha)),truncation(truncation),occlusion(occlusion) {}
};

// holding detection data
struct tDetection {
  tBox    box;    // object type, box, orientation
  double  thresh; // detection score
  double  ry;
  double  t1, t2, t3;
  double  h, w, l;
  tDetection ():
    box(tBox("invalid",-1,-1,-1,-1,-10)),thresh(-1000) {}
  tDetection (tBox box,double thresh) :
    box(box),thresh(thresh) {}
  tDetection (string type,double x1,double y1,double x2,double y2,double alpha,double thresh) :
    box(tBox(type,x1,y1,x2,y2,alpha)),thresh(thresh) {}
};


/*=======================================================================
FUNCTIONS TO LOAD DETECTION AND GROUND TRUTH DATA ONCE, SAVE RESULTS
=======================================================================*/
vector<int32_t> indices;

vector<tDetection> loadDetections(string file_name, bool &compute_aos,
        vector<bool> &eval_image, vector<bool> &eval_ground,
        vector<bool> &eval_3d, bool &success) {

  // holds all detections (ignored detections are indicated by an index vector
  vector<tDetection> detections;
  FILE *fp = fopen(file_name.c_str(),"r");
  if (!fp) {
    success = false;
    return detections;
  }
  while (!feof(fp)) {
    tDetection d;
    double trash;
    char str[255];
    if (fscanf(fp, "%s %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf",
                   str, &trash, &trash, &d.box.alpha, &d.box.x1, &d.box.y1,
                   &d.box.x2, &d.box.y2, &d.h, &d.w, &d.l, &d.t1, &d.t2, &d.t3,
                   &d.ry, &d.thresh)==16) {

        // d.thresh = 1;
      d.box.type = str;
      detections.push_back(d);

      // orientation=-10 is invalid, AOS is not evaluated if at least one orientation is invalid
      if(d.box.alpha == -10)
        compute_aos = false;

      // a class is only evaluated if it is detected at least once
      for (int c = 0; c < NUM_CLASS; c++) {
        if (!strcasecmp(d.box.type.c_str(), CLASS_NAMES[c].c_str())) {
          if (!eval_image[c] && d.box.x1 >= 0)
            eval_image[c] = true;
          if (!eval_ground[c] && d.t1 != -1000)
            eval_ground[c] = true;
          if (!eval_3d[c] && d.t2 != -1000)
            eval_3d[c] = true;
          break;
        }
      }
    }
  }
  fclose(fp);
  success = true;
  return detections;
}

vector<tGroundtruth> loadGroundtruth(string file_name,bool &success) {

  // holds all ground truth (ignored ground truth is indicated by an index vector
  vector<tGroundtruth> groundtruth;
  FILE *fp = fopen(file_name.c_str(),"r");
  if (!fp) {
    success = false;
    return groundtruth;
  }
  while (!feof(fp)) {
    tGroundtruth g;
    char str[255];
    if (fscanf(fp, "%s %lf %d %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf",
                   str, &g.truncation, &g.occlusion, &g.box.alpha,
                   &g.box.x1,   &g.box.y1,     &g.box.x2,    &g.box.y2,
                   &g.h,      &g.w,        &g.l,       &g.t1,
                   &g.t2,      &g.t3,        &g.ry )==15) {
      g.box.type = str;
      groundtruth.push_back(g);
    }
  }
  fclose(fp);
  success = true;
  return groundtruth;
}

void saveStats (const vector<double> &precision, const vector<double> &aos, FILE *fp_det, FILE *fp_ori) {

  // save precision to file
  if(precision.empty())
    return;
  for (int32_t i=0; i<precision.size(); i++)
    fprintf(fp_det,"%f ",precision[i]);
  fprintf(fp_det,"\n");

  // save orientation similarity, only if there were no invalid orientation entries in submission (alpha=-10)
  if(aos.empty())
    return;
  for (int32_t i=0; i<aos.size(); i++)
    fprintf(fp_ori,"%f ",aos[i]);
  fprintf(fp_ori,"\n");
}

/*=======================================================================
EVALUATION HELPER FUNCTIONS
=======================================================================*/

// criterion defines whether the overlap is computed with respect to both areas (ground truth and detection)
// or with respect to box a or b (detection and "dontcare" areas)
inline double imageBoxOverlap(tBox a, tBox b, int32_t criterion=-1){

  // overlap is invalid in the beginning
  double o = -1;

  // get overlapping area
  double x1 = max(a.x1, b.x1);
  double y1 = max(a.y1, b.y1);
  double x2 = min(a.x2, b.x2);
  double y2 = min(a.y2, b.y2);

  // compute width and height of overlapping area
  double w = x2-x1;
  double h = y2-y1;

  // set invalid entries to 0 overlap
  if(w<=0 || h<=0)
    return 0;

  // get overlapping areas
  double inter = w*h;
  double a_area = (a.x2-a.x1) * (a.y2-a.y1);
  double b_area = (b.x2-b.x1) * (b.y2-b.y1);

  // intersection over union overlap depending on users choice
  if(criterion==-1)     // union
    o = inter / (a_area+b_area-inter);
  else if(criterion==0) // bbox_a
    o = inter / a_area;
  else if(criterion==1) // bbox_b
    o = inter / b_area;

  // overlap
  return o;
}

inline double imageBoxOverlap(tDetection a, tGroundtruth b, int32_t criterion=-1){
  return imageBoxOverlap(a.box, b.box, criterion);
}

// compute polygon of an oriented bounding box
template <typename T>
Polygon toPolygon(const T& g) {
    using namespace boost::numeric::ublas;
    using namespace boost::geometry;
    matrix<double> mref(2, 2);
    mref(0, 0) = cos(g.ry); mref(0, 1) = sin(g.ry);
    mref(1, 0) = -sin(g.ry); mref(1, 1) = cos(g.ry);

    static int count = 0;
    matrix<double> corners(2, 4);
    double data[] = {g.l / 2, g.l / 2, -g.l / 2, -g.l / 2,
                     g.w / 2, -g.w / 2, -g.w / 2, g.w / 2};
    std::copy(data, data + 8, corners.data().begin());
    matrix<double> gc = prod(mref, corners);
    for (int i = 0; i < 4; ++i) {
        gc(0, i) += g.t1;
        gc(1, i) += g.t3;
    }

    double points[][2] = {{gc(0, 0), gc(1, 0)},{gc(0, 1), gc(1, 1)},{gc(0, 2), gc(1, 2)},{gc(0, 3), gc(1, 3)},{gc(0, 0), gc(1, 0)}};
    Polygon poly;
    append(poly, points);
    return poly;
}

// measure overlap between bird's eye view bounding boxes, parametrized by (ry, l, w, tx, tz)
inline double groundBoxOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {
    using namespace boost::geometry;
    Polygon gp = toPolygon(g);
    Polygon dp = toPolygon(d);

    std::vector<Polygon> in, un;
    intersection(gp, dp, in);
    union_(gp, dp, un);

    double inter_area = in.empty() ? 0 : area(in.front());
    double union_area = area(un.front());
    double o;
    if(criterion==-1)     // union
        o = inter_area / union_area;
    else if(criterion==0) // bbox_a
        o = inter_area / area(dp);
    else if(criterion==1) // bbox_b
        o = inter_area / area(gp);

    return o;
}

// measure overlap between 3D bounding boxes, parametrized by (ry, h, w, l, tx, ty, tz)
inline double box3DOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {
    using namespace boost::geometry;
    Polygon gp = toPolygon(g);
    Polygon dp = toPolygon(d);

    std::vector<Polygon> in, un;
    intersection(gp, dp, in);
    union_(gp, dp, un);

    double ymax = min(d.t2, g.t2);
    double ymin = max(d.t2 - d.h, g.t2 - g.h);

    double inter_area = in.empty() ? 0 : area(in.front());
    double inter_vol = inter_area * max(0.0, ymax - ymin);

    double det_vol = d.h * d.l * d.w;
    double gt_vol = g.h * g.l * g.w;

    double o;
    if(criterion==-1)     // union
        o = inter_vol / (det_vol + gt_vol - inter_vol);
    else if(criterion==0) // bbox_a
        o = inter_vol / det_vol;
    else if(criterion==1) // bbox_b
        o = inter_vol / gt_vol;

    return o;
}

vector<double> getThresholds(vector<double> &v, double n_groundtruth){

  // holds scores needed to compute N_SAMPLE_PTS recall values
  vector<double> t;

  // sort scores in descending order
  // (highest score is assumed to give best/most confident detections)
  sort(v.begin(), v.end(), greater<double>());

  // get scores for linearly spaced recall
  double current_recall = 0;
  for(int32_t i=0; i<v.size(); i++){

    // check if right-hand-side recall with respect to current recall is close than left-hand-side one
    // in this case, skip the current detection score
    double l_recall, r_recall, recall;
    l_recall = (double)(i+1)/n_groundtruth;
    if(i<(v.size()-1))
      r_recall = (double)(i+2)/n_groundtruth;
    else
      r_recall = l_recall;

    if( (r_recall-current_recall) < (current_recall-l_recall) && i<(v.size()-1))
      continue;

    // left recall is the best approximation, so use this and goto next recall step for approximation
    recall = l_recall;

    // the next recall step was reached
    t.push_back(v[i]);
    current_recall += 1.0/(N_SAMPLE_PTS-1.0);
  }
  return t;
}

void cleanData(CLASSES current_class, const vector<tGroundtruth> &gt, const vector<tDetection> &det, vector<int32_t> &ignored_gt, vector<tGroundtruth> &dc, vector<int32_t> &ignored_det, int32_t &n_gt, DIFFICULTY difficulty){

  // extract ground truth bounding boxes for current evaluation class
  for(int32_t i=0;i<gt.size(); i++){

    // only bounding boxes with a minimum height are used for evaluation
    double height = gt[i].box.y2 - gt[i].box.y1;

    // neighboring classes are ignored ("van" for "car" and "person_sitting" for "pedestrian")
    // (lower/upper cases are ignored)
    int32_t valid_class;

    // all classes without a neighboring class
    if(!strcasecmp(gt[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))
      valid_class = 1;

    // classes with a neighboring class
    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), "Pedestrian") && !strcasecmp("Person_sitting", gt[i].box.type.c_str()))
      valid_class = 0;
    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), "Car") && !strcasecmp("Van", gt[i].box.type.c_str()))
      valid_class = 0;

    // classes not used for evaluation
    else
      valid_class = -1;

    // ground truth is ignored, if occlusion, truncation exceeds the difficulty or ground truth is too small
    // (doesn't count as FN nor TP, although detections may be assigned)
    bool ignore = false;
    if(gt[i].occlusion>MAX_OCCLUSION[difficulty] || gt[i].truncation>MAX_TRUNCATION[difficulty] || height<MIN_HEIGHT[difficulty])
      ignore = true;

    // set ignored vector for ground truth
    // current class and not ignored (total no. of ground truth is detected for recall denominator)
    if(valid_class==1 && !ignore){
      ignored_gt.push_back(0);
      n_gt++;
    }

    // neighboring class, or current class but ignored
    else if(valid_class==0 || (ignore && valid_class==1))
      ignored_gt.push_back(1);

    // all other classes which are FN in the evaluation
    else
      ignored_gt.push_back(-1);
  }

  // extract dontcare areas
  for(int32_t i=0;i<gt.size(); i++)
    if(!strcasecmp("DontCare", gt[i].box.type.c_str()))
      dc.push_back(gt[i]);

  // extract detections bounding boxes of the current class
  for(int32_t i=0;i<det.size(); i++){

    // neighboring classes are not evaluated
    int32_t valid_class;
    if(!strcasecmp(det[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))
      valid_class = 1;
    else
      valid_class = -1;

    int32_t height = fabs(det[i].box.y1 - det[i].box.y2);
    // set ignored vector for detections
    if(height<MIN_HEIGHT[difficulty])
      ignored_det.push_back(1);
    else if(valid_class==1)
      ignored_det.push_back(0);
    else
      ignored_det.push_back(-1);
  }
}

tPrData computeStatistics(CLASSES current_class, const vector<tGroundtruth> &gt,
        const vector<tDetection> &det, const vector<tGroundtruth> &dc,
        const vector<int32_t> &ignored_gt, const vector<int32_t>  &ignored_det,
        bool compute_fp, double (*boxoverlap)(tDetection, tGroundtruth, int32_t),
        METRIC metric, bool compute_aos=false, double thresh=0, bool debug=false){

  tPrData stat = tPrData();
  const double NO_DETECTION = -10000000;
  vector<double> delta;            // holds angular difference for TPs (needed for AOS evaluation)
  vector<bool> assigned_detection; // holds wether a detection was assigned to a valid or ignored ground truth
  assigned_detection.assign(det.size(), false);
  vector<bool> ignored_threshold;
  ignored_threshold.assign(det.size(), false); // holds detections with a threshold lower than thresh if FP are computed

  // detections with a low score are ignored for computing precision (needs FP)
  if(compute_fp)
    for(int32_t i=0; i<det.size(); i++)
      if(det[i].thresh<thresh)
        ignored_threshold[i] = true;

  // evaluate all ground truth boxes
  for(int32_t i=0; i<gt.size(); i++){

    // this ground truth is not of the current or a neighboring class and therefore ignored
    if(ignored_gt[i]==-1)
      continue;

    /*=======================================================================
    find candidates (overlap with ground truth > 0.5) (logical len(det))
    =======================================================================*/
    int32_t det_idx          = -1;
    double valid_detection = NO_DETECTION;
    double max_overlap     = 0;

    // search for a possible detection
    bool assigned_ignored_det = false;
    for(int32_t j=0; j<det.size(); j++){

      // detections not of the current class, already assigned or with a low threshold are ignored
      if(ignored_det[j]==-1)
        continue;
      if(assigned_detection[j])
        continue;
      if(ignored_threshold[j])
        continue;

      // find the maximum score for the candidates and get idx of respective detection
      double overlap = boxoverlap(det[j], gt[i], -1);

      // for computing recall thresholds, the candidate with highest score is considered
      if(!compute_fp && overlap>MIN_OVERLAP[metric][current_class] && det[j].thresh>valid_detection){
        det_idx         = j;
        valid_detection = det[j].thresh;
      }

      // for computing pr curve values, the candidate with the greatest overlap is considered
      // if the greatest overlap is an ignored detection (min_height), the overlapping detection is used
      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && (overlap>max_overlap || assigned_ignored_det) && ignored_det[j]==0){
        max_overlap     = overlap;
        det_idx         = j;
        valid_detection = 1;
        assigned_ignored_det = false;
      }
      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && valid_detection==NO_DETECTION && ignored_det[j]==1){
        det_idx              = j;
        valid_detection      = 1;
        assigned_ignored_det = true;
      }
    }

    /*=======================================================================
    compute TP, FP and FN
    =======================================================================*/

    // nothing was assigned to this valid ground truth
    if(valid_detection==NO_DETECTION && ignored_gt[i]==0) {
      stat.fn++;
    }

    // only evaluate valid ground truth <=> detection assignments (considering difficulty level)
    else if(valid_detection!=NO_DETECTION && (ignored_gt[i]==1 || ignored_det[det_idx]==1))
      assigned_detection[det_idx] = true;

    // found a valid true positive
    else if(valid_detection!=NO_DETECTION){

      // write highest score to threshold vector
      stat.tp++;
      stat.v.push_back(det[det_idx].thresh);

      // compute angular difference of detection and ground truth if valid detection orientation was provided
      if(compute_aos)
        delta.push_back(gt[i].box.alpha - det[det_idx].box.alpha);

      // clean up
      assigned_detection[det_idx] = true;
    }
  }

  // if FP are requested, consider stuff area
  if(compute_fp){

    // count fp
    for(int32_t i=0; i<det.size(); i++){

      // count false positives if required (height smaller than required is ignored (ignored_det==1)
      if(!(assigned_detection[i] || ignored_det[i]==-1 || ignored_det[i]==1 || ignored_threshold[i]))
        stat.fp++;
    }

    // do not consider detections overlapping with stuff area
    int32_t nstuff = 0;
    for(int32_t i=0; i<dc.size(); i++){
      for(int32_t j=0; j<det.size(); j++){

        // detections not of the current class, already assigned, with a low threshold or a low minimum height are ignored
        if(assigned_detection[j])
          continue;
        if(ignored_det[j]==-1 || ignored_det[j]==1)
          continue;
        if(ignored_threshold[j])
          continue;

        // compute overlap and assign to stuff area, if overlap exceeds class specific value
        double overlap = boxoverlap(det[j], dc[i], 0);
        if(overlap>MIN_OVERLAP[metric][current_class]){
          assigned_detection[j] = true;
          nstuff++;
        }
      }
    }

    // FP = no. of all not to ground truth assigned detections - detections assigned to stuff areas
    stat.fp -= nstuff;

    // if all orientation values are valid, the AOS is computed
    if(compute_aos){
      vector<double> tmp;

      // FP have a similarity of 0, for all TP compute AOS
      tmp.assign(stat.fp, 0);
      for(int32_t i=0; i<delta.size(); i++)
        tmp.push_back((1.0+cos(delta[i]))/2.0);

      // be sure, that all orientation deltas are computed
      assert(tmp.size()==stat.fp+stat.tp);
      assert(delta.size()==stat.tp);

      // get the mean orientation similarity for this image
      if(stat.tp>0 || stat.fp>0)
        stat.similarity = accumulate(tmp.begin(), tmp.end(), 0.0);

      // there was neither a FP nor a TP, so the similarity is ignored in the evaluation
      else
        stat.similarity = -1;
    }
  }
  return stat;
}

/*=======================================================================
EVALUATE CLASS-WISE
=======================================================================*/

bool eval_class (FILE *fp_det, FILE *fp_ori, CLASSES current_class,
        const vector< vector<tGroundtruth> > &groundtruth,
        const vector< vector<tDetection> > &detections, bool compute_aos,
        double (*boxoverlap)(tDetection, tGroundtruth, int32_t),
        vector<double> &precision, vector<double> &aos,
        DIFFICULTY difficulty, METRIC metric) {
    assert(groundtruth.size() == detections.size());

  // init
  int32_t n_gt=0;                                     // total no. of gt (denominator of recall)
  vector<double> v, thresholds;                       // detection scores, evaluated for recall discretization
  vector< vector<int32_t> > ignored_gt, ignored_det;  // index of ignored gt detection for current class/difficulty
  vector< vector<tGroundtruth> > dontcare;            // index of dontcare areas, included in ground truth

  // for all test images do
  for (int32_t i=0; i<groundtruth.size(); i++){

    // holds ignored ground truth, ignored detections and dontcare areas for current frame
    vector<int32_t> i_gt, i_det;
    vector<tGroundtruth> dc;

    // only evaluate objects of current class and ignore occluded, truncated objects
    cleanData(current_class, groundtruth[i], detections[i], i_gt, dc, i_det, n_gt, difficulty);
    ignored_gt.push_back(i_gt);
    ignored_det.push_back(i_det);
    dontcare.push_back(dc);

    // compute statistics to get recall values
    tPrData pr_tmp = tPrData();
    pr_tmp = computeStatistics(current_class, groundtruth[i], detections[i], dc, i_gt, i_det, false, boxoverlap, metric);

    // add detection scores to vector over all images
    for(int32_t j=0; j<pr_tmp.v.size(); j++)
      v.push_back(pr_tmp.v[j]);
  }

  // get scores that must be evaluated for recall discretization
  thresholds = getThresholds(v, n_gt);

  // compute TP,FP,FN for relevant scores
  vector<tPrData> pr;
  pr.assign(thresholds.size(),tPrData());
  for (int32_t i=0; i<groundtruth.size(); i++){

    // for all scores/recall thresholds do:
    for(int32_t t=0; t<thresholds.size(); t++){
      tPrData tmp = tPrData();
      tmp = computeStatistics(current_class, groundtruth[i], detections[i], dontcare[i],
                              ignored_gt[i], ignored_det[i], true, boxoverlap, metric,
                              compute_aos, thresholds[t], t==38);

      // add no. of TP, FP, FN, AOS for current frame to total evaluation for current threshold
      pr[t].tp += tmp.tp;
      pr[t].fp += tmp.fp;
      pr[t].fn += tmp.fn;
      if(tmp.similarity!=-1)
        pr[t].similarity += tmp.similarity;
    }
  }

  // compute recall, precision and AOS
  vector<double> recall;
  precision.assign(N_SAMPLE_PTS, 0);
  if(compute_aos)
    aos.assign(N_SAMPLE_PTS, 0);
  double r=0;
  for (int32_t i=0; i<thresholds.size(); i++){
    r = pr[i].tp/(double)(pr[i].tp + pr[i].fn);
    recall.push_back(r);
    precision[i] = pr[i].tp/(double)(pr[i].tp + pr[i].fp);
    if(compute_aos)
      aos[i] = pr[i].similarity/(double)(pr[i].tp + pr[i].fp);
  }

  // filter precision and AOS using max_{i..end}(precision)
  for (int32_t i=0; i<thresholds.size(); i++){
    precision[i] = *max_element(precision.begin()+i, precision.end());
    if(compute_aos)
      aos[i] = *max_element(aos.begin()+i, aos.end());
  }

  // save statisics and finish with success
  saveStats(precision, aos, fp_det, fp_ori);
    return true;
}

void saveAndPlotPlots(string dir_name,string file_name,string obj_type,vector<double> vals[],bool is_aos){

  char command[1024];

  // save plot data to file
  FILE *fp = fopen((dir_name + "/" + file_name + ".txt").c_str(),"w");
  printf("save %s\n", (dir_name + "/" + file_name + ".txt").c_str());
  for (int32_t i=0; i<(int)N_SAMPLE_PTS; i++)
    fprintf(fp,"%f %f %f %f\n",(double)i/(N_SAMPLE_PTS-1.0),vals[0][i],vals[1][i],vals[2][i]);
  fclose(fp);

  // create png + eps
  for (int32_t j=0; j<2; j++) {

    // open file
    FILE *fp = fopen((dir_name + "/" + file_name + ".gp").c_str(),"w");

    // save gnuplot instructions
    if (j==0) {
      fprintf(fp,"set term png size 450,315 font \"Helvetica\" 11\n");
      fprintf(fp,"set output \"%s.png\"\n",file_name.c_str());
    } else {
      fprintf(fp,"set term postscript eps enhanced color font \"Helvetica\" 20\n");
      fprintf(fp,"set output \"%s.eps\"\n",file_name.c_str());
    }

    // set labels and ranges
    fprintf(fp,"set size ratio 0.7\n");
    fprintf(fp,"set xrange [0:1]\n");
    fprintf(fp,"set yrange [0:1]\n");
    fprintf(fp,"set xlabel \"Recall\"\n");
    if (!is_aos) fprintf(fp,"set ylabel \"Precision\"\n");
    else         fprintf(fp,"set ylabel \"Orientation Similarity\"\n");
    obj_type[0] = toupper(obj_type[0]);
    fprintf(fp,"set title \"%s\"\n",obj_type.c_str());

    // line width
    int32_t   lw = 5;
    if (j==0) lw = 3;

    // plot error curve
    fprintf(fp,"plot ");
    fprintf(fp,"\"%s.txt\" using 1:2 title 'Easy' with lines ls 1 lw %d,",file_name.c_str(),lw);
    fprintf(fp,"\"%s.txt\" using 1:3 title 'Moderate' with lines ls 2 lw %d,",file_name.c_str(),lw);
    fprintf(fp,"\"%s.txt\" using 1:4 title 'Hard' with lines ls 3 lw %d",file_name.c_str(),lw);

    // close file
    fclose(fp);

    // run gnuplot => create png + eps
    sprintf(command,"cd %s; gnuplot %s",dir_name.c_str(),(file_name + ".gp").c_str());
    system(command);
  }

  // create pdf and crop
  sprintf(command,"cd %s; ps2pdf %s.eps %s_large.pdf",dir_name.c_str(),file_name.c_str(),file_name.c_str());
  system(command);
  sprintf(command,"cd %s; pdfcrop %s_large.pdf %s.pdf",dir_name.c_str(),file_name.c_str(),file_name.c_str());
  system(command);
  sprintf(command,"cd %s; rm %s_large.pdf",dir_name.c_str(),file_name.c_str());
  system(command);
}

bool eval(string result_sha,Mail* mail){

  // set some global parameters
  initGlobals();

  // ground truth and result directories
  string gt_dir         = "data/object/label_2";
  string result_dir     = "results/" + result_sha;
  string plot_dir       = result_dir + "/plot";

  // create output directories
  system(("mkdir " + plot_dir).c_str());

  // hold detections and ground truth in memory
  vector< vector<tGroundtruth> > groundtruth;
  vector< vector<tDetection> >   detections;

  // holds wether orientation similarity shall be computed (might be set to false while loading detections)
  // and which labels where provided by this submission
  bool compute_aos=true;
  vector<bool> eval_image(NUM_CLASS, false);
  vector<bool> eval_ground(NUM_CLASS, false);
  vector<bool> eval_3d(NUM_CLASS, false);

  // for all images read groundtruth and detections
  mail->msg("Loading detections...");
  for (int32_t i=0; i<N_TESTIMAGES; i++) {

    // file name
    char file_name[256];
    sprintf(file_name,"%06d.txt",indices.at(i));

    // read ground truth and result poses
    bool gt_success,det_success;
    vector<tGroundtruth> gt   = loadGroundtruth(gt_dir + "/" + file_name,gt_success);
    vector<tDetection>   det  = loadDetections(result_dir + "/data/" + file_name,
            compute_aos, eval_image, eval_ground, eval_3d, det_success);
    groundtruth.push_back(gt);
    detections.push_back(det);

    // check for errors
    if (!gt_success) {
      mail->msg("ERROR: Couldn't read: %s of ground truth. Please write me an email!", file_name);
      return false;
    }
    if (!det_success) {
      mail->msg("ERROR: Couldn't read: %s", file_name);
      return false;
    }
  }
  mail->msg("  done.");

  // holds pointers for result files
  FILE *fp_det=0, *fp_ori=0;

  // eval image 2D bounding boxes
  for (int c = 0; c < NUM_CLASS; c++) {
    CLASSES cls = (CLASSES)c;
    if (eval_image[c]) {
      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection.txt").c_str(), "w");
      if(compute_aos)
        fp_ori = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_orientation.txt").c_str(),"w");
      vector<double> precision[3], aos[3];
      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[0], aos[0], EASY, IMAGE)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[1], aos[1], MODERATE, IMAGE)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[2], aos[2], HARD, IMAGE)) {
        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
        return false;
      }
      fclose(fp_det);
      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection", CLASS_NAMES[c], precision, 0);
      if(compute_aos){
        saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_orientation", CLASS_NAMES[c], aos, 1);
        fclose(fp_ori);
      }
    }
  }

  // don't evaluate AOS for birdview boxes and 3D boxes
  compute_aos = false;

  // eval bird's eye view bounding boxes
  for (int c = 0; c < NUM_CLASS; c++) {
    CLASSES cls = (CLASSES)c;
    if (eval_ground[c]) {
      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection_ground.txt").c_str(), "w");
      vector<double> precision[3], aos[3];
      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[0], aos[0], EASY, GROUND)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[1], aos[1], MODERATE, GROUND)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[2], aos[2], HARD, GROUND)) {
        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
        return false;
      }
      fclose(fp_det);
      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection_ground", CLASS_NAMES[c], precision, 0);
    }
  }

  // eval 3D bounding boxes
  for (int c = 0; c < NUM_CLASS; c++) {
    CLASSES cls = (CLASSES)c;
    if (eval_3d[c]) {
      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection_3d.txt").c_str(), "w");
      vector<double> precision[3], aos[3];
      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[0], aos[0], EASY, BOX3D)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[1], aos[1], MODERATE, BOX3D)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[2], aos[2], HARD, BOX3D)) {
        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
        return false;
      }
      fclose(fp_det);
      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection_3d", CLASS_NAMES[c], precision, 0);
    }
  }

  // success
  return true;
}

int32_t main (int32_t argc,char *argv[]) {

  // we need 2 or 4 arguments!
  if (argc!=2 && argc!=4) {
    cout << "Usage: ./eval_detection result_sha [user_sha email]" << endl;
    return 1;
  }

  // read arguments
  string result_sha = argv[1];

  // init notification mail
  Mail *mail;
  if (argc==4) mail = new Mail(argv[3]);
  else         mail = new Mail();
  mail->msg("Thank you for participating in our evaluation!");

  // run evaluation
  if (eval(result_sha,mail)) {
    mail->msg("Your evaluation results are available at:");
    mail->msg("http://www.cvlibs.net/datasets/kitti/user_submit_check_login.php?benchmark=object&user=%s&result=%s",argv[2], result_sha.c_str());
  } else {
    system(("rm -r results/" + result_sha).c_str());
    mail->msg("An error occured while processing your results.");
    mail->msg("Please make sure that the data in your zip archive has the right format!");
  }

  // send mail and exit
  delete mail;

  return 0;
}


================================================
FILE: tools/kitti-eval/evaluate_object_3d_offline.cpp
================================================
#include <iostream>
#include <algorithm>
#include <stdio.h>
#include <math.h>
#include <vector>
#include <numeric>
#include <strings.h>
#include <assert.h>

#include <dirent.h>

#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/io.hpp>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>
#include <boost/geometry/geometries/adapted/c_array.hpp>

#include "mail.h"

BOOST_GEOMETRY_REGISTER_C_ARRAY_CS(cs::cartesian)

typedef boost::geometry::model::polygon<boost::geometry::model::d2::point_xy<double> > Polygon;


using namespace std;

/*=======================================================================
STATIC EVALUATION PARAMETERS
=======================================================================*/

// holds the number of test images on the server
const int32_t N_TESTIMAGES = 7518;

// easy, moderate and hard evaluation level
enum DIFFICULTY{EASY=0, MODERATE=1, HARD=2};

// evaluation metrics: image, ground or 3D
enum METRIC{IMAGE=0, GROUND=1, BOX3D=2};

// evaluation parameter
const int32_t MIN_HEIGHT[3]     = {40, 25, 25};     // minimum height for evaluated groundtruth/detections
const int32_t MAX_OCCLUSION[3]  = {0, 1, 2};        // maximum occlusion level of the groundtruth used for evaluation
const double  MAX_TRUNCATION[3] = {0.15, 0.3, 0.5}; // maximum truncation level of the groundtruth used for evaluation

// evaluated object classes
enum CLASSES{CAR=0, PEDESTRIAN=1, CYCLIST=2};
const int NUM_CLASS = 3;

// parameters varying per class
vector<string> CLASS_NAMES;
// the minimum overlap required for 2D evaluation on the image/ground plane and 3D evaluation
// const double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.5, 0.25, 0.25}, {0.5, 0.25, 0.25}};
const double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.7, 0.5, 0.5}, {0.7, 0.5, 0.5}};

// no. of recall steps that should be evaluated (discretized)
const double N_SAMPLE_PTS = 41;

// initialize class names
void initGlobals () {
  CLASS_NAMES.push_back("car");
  CLASS_NAMES.push_back("pedestrian");
  CLASS_NAMES.push_back("cyclist");
}

/*=======================================================================
DATA TYPES FOR EVALUATION
=======================================================================*/

// holding data needed for precision-recall and precision-aos
struct tPrData {
  vector<double> v;           // detection score for computing score thresholds
  double         similarity;  // orientation similarity
  int32_t        tp;          // true positives
  int32_t        fp;          // false positives
  int32_t        fn;          // false negatives
  tPrData () :
    similarity(0), tp(0), fp(0), fn(0) {}
};

// holding bounding boxes for ground truth and detections
struct tBox {
  string  type;     // object type as car, pedestrian or cyclist,...
  double   x1;      // left corner
  double   y1;      // top corner
  double   x2;      // right corner
  double   y2;      // bottom corner
  double   alpha;   // image orientation
  tBox (string type, double x1,double y1,double x2,double y2,double alpha) :
    type(type),x1(x1),y1(y1),x2(x2),y2(y2),alpha(alpha) {}
};

// holding ground truth data
struct tGroundtruth {
  tBox    box;        // object type, box, orientation
  double  truncation; // truncation 0..1
  int32_t occlusion;  // occlusion 0,1,2 (non, partly, fully)
  double ry;
  double  t1, t2, t3;
  double h, w, l;
  tGroundtruth () :
    box(tBox("invalild",-1,-1,-1,-1,-10)),truncation(-1),occlusion(-1) {}
  tGroundtruth (tBox box,double truncation,int32_t occlusion) :
    box(box),truncation(truncation),occlusion(occlusion) {}
  tGroundtruth (string type,double x1,double y1,double x2,double y2,double alpha,double truncation,int32_t occlusion) :
    box(tBox(type,x1,y1,x2,y2,alpha)),truncation(truncation),occlusion(occlusion) {}
};

// holding detection data
struct tDetection {
  tBox    box;    // object type, box, orientation
  double  thresh; // detection score
  double  ry;
  double  t1, t2, t3;
  double  h, w, l;
  tDetection ():
    box(tBox("invalid",-1,-1,-1,-1,-10)),thresh(-1000) {}
  tDetection (tBox box,double thresh) :
    box(box),thresh(thresh) {}
  tDetection (string type,double x1,double y1,double x2,double y2,double alpha,double thresh) :
    box(tBox(type,x1,y1,x2,y2,alpha)),thresh(thresh) {}
};


/*=======================================================================
FUNCTIONS TO LOAD DETECTION AND GROUND TRUTH DATA ONCE, SAVE RESULTS
=======================================================================*/
vector<int32_t> indices;

vector<tDetection> loadDetections(string file_name, bool &compute_aos,
        vector<bool> &eval_image, vector<bool> &eval_ground,
        vector<bool> &eval_3d, bool &success) {

  // holds all detections (ignored detections are indicated by an index vector
  vector<tDetection> detections;
  FILE *fp = fopen(file_name.c_str(),"r");
  if (!fp) {
    success = false;
    return detections;
  }
  while (!feof(fp)) {
    tDetection d;
    double trash;
    char str[255];
    if (fscanf(fp, "%s %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf",
                   str, &trash, &trash, &d.box.alpha, &d.box.x1, &d.box.y1,
                   &d.box.x2, &d.box.y2, &d.h, &d.w, &d.l, &d.t1, &d.t2, &d.t3,
                   &d.ry, &d.thresh)==16) {

        // d.thresh = 1;
      d.box.type = str;
      detections.push_back(d);

      // orientation=-10 is invalid, AOS is not evaluated if at least one orientation is invalid
      if(d.box.alpha == -10)
        compute_aos = false;

      // a class is only evaluated if it is detected at least once
      for (int c = 0; c < NUM_CLASS; c++) {
        if (!strcasecmp(d.box.type.c_str(), CLASS_NAMES[c].c_str())) {
          if (!eval_image[c] && d.box.x1 >= 0)
            eval_image[c] = true;
          if (!eval_ground[c] && d.t1 != -1000)
            eval_ground[c] = true;
          if (!eval_3d[c] && d.t2 != -1000)
            eval_3d[c] = true;
          break;
        }
      }
    }
  }
  fclose(fp);
  success = true;
  return detections;
}

vector<tGroundtruth> loadGroundtruth(string file_name,bool &success) {

  // holds all ground truth (ignored ground truth is indicated by an index vector
  vector<tGroundtruth> groundtruth;
  FILE *fp = fopen(file_name.c_str(),"r");
  if (!fp) {
    success = false;
    return groundtruth;
  }
  while (!feof(fp)) {
    tGroundtruth g;
    char str[255];
    if (fscanf(fp, "%s %lf %d %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf",
                   str, &g.truncation, &g.occlusion, &g.box.alpha,
                   &g.box.x1,   &g.box.y1,     &g.box.x2,    &g.box.y2,
                   &g.h,      &g.w,        &g.l,       &g.t1,
                   &g.t2,      &g.t3,        &g.ry )==15) {
      g.box.type = str;
      groundtruth.push_back(g);
    }
  }
  fclose(fp);
  success = true;
  return groundtruth;
}

void saveStats (const vector<double> &precision, const vector<double> &aos, FILE *fp_det, FILE *fp_ori) {

  // save precision to file
  if(precision.empty())
    return;
  for (int32_t i=0; i<precision.size(); i++)
    fprintf(fp_det,"%f ",precision[i]);
  fprintf(fp_det,"\n");

  // save orientation similarity, only if there were no invalid orientation entries in submission (alpha=-10)
  if(aos.empty())
    return;
  for (int32_t i=0; i<aos.size(); i++)
    fprintf(fp_ori,"%f ",aos[i]);
  fprintf(fp_ori,"\n");
}

/*=======================================================================
EVALUATION HELPER FUNCTIONS
=======================================================================*/

// criterion defines whether the overlap is computed with respect to both areas (ground truth and detection)
// or with respect to box a or b (detection and "dontcare" areas)
inline double imageBoxOverlap(tBox a, tBox b, int32_t criterion=-1){

  // overlap is invalid in the beginning
  double o = -1;

  // get overlapping area
  double x1 = max(a.x1, b.x1);
  double y1 = max(a.y1, b.y1);
  double x2 = min(a.x2, b.x2);
  double y2 = min(a.y2, b.y2);

  // compute width and height of overlapping area
  double w = x2-x1;
  double h = y2-y1;

  // set invalid entries to 0 overlap
  if(w<=0 || h<=0)
    return 0;

  // get overlapping areas
  double inter = w*h;
  double a_area = (a.x2-a.x1) * (a.y2-a.y1);
  double b_area = (b.x2-b.x1) * (b.y2-b.y1);

  // intersection over union overlap depending on users choice
  if(criterion==-1)     // union
    o = inter / (a_area+b_area-inter);
  else if(criterion==0) // bbox_a
    o = inter / a_area;
  else if(criterion==1) // bbox_b
    o = inter / b_area;

  // overlap
  return o;
}

inline double imageBoxOverlap(tDetection a, tGroundtruth b, int32_t criterion=-1){
  return imageBoxOverlap(a.box, b.box, criterion);
}

// compute polygon of an oriented bounding box
template <typename T>
Polygon toPolygon(const T& g) {
    using namespace boost::numeric::ublas;
    using namespace boost::geometry;
    matrix<double> mref(2, 2);
    mref(0, 0) = cos(g.ry); mref(0, 1) = sin(g.ry);
    mref(1, 0) = -sin(g.ry); mref(1, 1) = cos(g.ry);

    static int count = 0;
    matrix<double> corners(2, 4);
    double data[] = {g.l / 2, g.l / 2, -g.l / 2, -g.l / 2,
                     g.w / 2, -g.w / 2, -g.w / 2, g.w / 2};
    std::copy(data, data + 8, corners.data().begin());
    matrix<double> gc = prod(mref, corners);
    for (int i = 0; i < 4; ++i) {
        gc(0, i) += g.t1;
        gc(1, i) += g.t3;
    }

    double points[][2] = {{gc(0, 0), gc(1, 0)},{gc(0, 1), gc(1, 1)},{gc(0, 2), gc(1, 2)},{gc(0, 3), gc(1, 3)},{gc(0, 0), gc(1, 0)}};
    Polygon poly;
    append(poly, points);
    return poly;
}

// measure overlap between bird's eye view bounding boxes, parametrized by (ry, l, w, tx, tz)
inline double groundBoxOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {
    using namespace boost::geometry;
    Polygon gp = toPolygon(g);
    Polygon dp = toPolygon(d);

    std::vector<Polygon> in, un;
    intersection(gp, dp, in);
    union_(gp, dp, un);

    double inter_area = in.empty() ? 0 : area(in.front());
    double union_area = area(un.front());
    double o;
    if(criterion==-1)     // union
        o = inter_area / union_area;
    else if(criterion==0) // bbox_a
        o = inter_area / area(dp);
    else if(criterion==1) // bbox_b
        o = inter_area / area(gp);

    return o;
}

// measure overlap between 3D bounding boxes, parametrized by (ry, h, w, l, tx, ty, tz)
inline double box3DOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {
    using namespace boost::geometry;
    Polygon gp = toPolygon(g);
    Polygon dp = toPolygon(d);

    std::vector<Polygon> in, un;
    intersection(gp, dp, in);
    union_(gp, dp, un);

    double ymax = min(d.t2, g.t2);
    double ymin = max(d.t2 - d.h, g.t2 - g.h);

    double inter_area = in.empty() ? 0 : area(in.front());
    double inter_vol = inter_area * max(0.0, ymax - ymin);

    double det_vol = d.h * d.l * d.w;
    double gt_vol = g.h * g.l * g.w;

    double o;
    if(criterion==-1)     // union
        o = inter_vol / (det_vol + gt_vol - inter_vol);
    else if(criterion==0) // bbox_a
        o = inter_vol / det_vol;
    else if(criterion==1) // bbox_b
        o = inter_vol / gt_vol;

    return o;
}

vector<double> getThresholds(vector<double> &v, double n_groundtruth){

  // holds scores needed to compute N_SAMPLE_PTS recall values
  vector<double> t;

  // sort scores in descending order
  // (highest score is assumed to give best/most confident detections)
  sort(v.begin(), v.end(), greater<double>());

  // get scores for linearly spaced recall
  double current_recall = 0;
  for(int32_t i=0; i<v.size(); i++){

    // check if right-hand-side recall with respect to current recall is close than left-hand-side one
    // in this case, skip the current detection score
    double l_recall, r_recall, recall;
    l_recall = (double)(i+1)/n_groundtruth;
    if(i<(v.size()-1))
      r_recall = (double)(i+2)/n_groundtruth;
    else
      r_recall = l_recall;

    if( (r_recall-current_recall) < (current_recall-l_recall) && i<(v.size()-1))
      continue;

    // left recall is the best approximation, so use this and goto next recall step for approximation
    recall = l_recall;

    // the next recall step was reached
    t.push_back(v[i]);
    current_recall += 1.0/(N_SAMPLE_PTS-1.0);
  }
  return t;
}

void cleanData(CLASSES current_class, const vector<tGroundtruth> &gt, const vector<tDetection> &det, vector<int32_t> &ignored_gt, vector<tGroundtruth> &dc, vector<int32_t> &ignored_det, int32_t &n_gt, DIFFICULTY difficulty){

  // extract ground truth bounding boxes for current evaluation class
  for(int32_t i=0;i<gt.size(); i++){

    // only bounding boxes with a minimum height are used for evaluation
    double height = gt[i].box.y2 - gt[i].box.y1;

    // neighboring classes are ignored ("van" for "car" and "person_sitting" for "pedestrian")
    // (lower/upper cases are ignored)
    int32_t valid_class;

    // all classes without a neighboring class
    if(!strcasecmp(gt[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))
      valid_class = 1;

    // classes with a neighboring class
    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), "Pedestrian") && !strcasecmp("Person_sitting", gt[i].box.type.c_str()))
      valid_class = 0;
    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), "Car") && !strcasecmp("Van", gt[i].box.type.c_str()))
      valid_class = 0;

    // classes not used for evaluation
    else
      valid_class = -1;

    // ground truth is ignored, if occlusion, truncation exceeds the difficulty or ground truth is too small
    // (doesn't count as FN nor TP, although detections may be assigned)
    bool ignore = false;
    if(gt[i].occlusion>MAX_OCCLUSION[difficulty] || gt[i].truncation>MAX_TRUNCATION[difficulty] || height<MIN_HEIGHT[difficulty])
      ignore = true;

    // set ignored vector for ground truth
    // current class and not ignored (total no. of ground truth is detected for recall denominator)
    if(valid_class==1 && !ignore){
      ignored_gt.push_back(0);
      n_gt++;
    }

    // neighboring class, or current class but ignored
    else if(valid_class==0 || (ignore && valid_class==1))
      ignored_gt.push_back(1);

    // all other classes which are FN in the evaluation
    else
      ignored_gt.push_back(-1);
  }

  // extract dontcare areas
  for(int32_t i=0;i<gt.size(); i++)
    if(!strcasecmp("DontCare", gt[i].box.type.c_str()))
      dc.push_back(gt[i]);

  // extract detections bounding boxes of the current class
  for(int32_t i=0;i<det.size(); i++){

    // neighboring classes are not evaluated
    int32_t valid_class;
    if(!strcasecmp(det[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))
      valid_class = 1;
    else
      valid_class = -1;

    int32_t height = fabs(det[i].box.y1 - det[i].box.y2);

    // set ignored vector for detections
    if(height<MIN_HEIGHT[difficulty])
      ignored_det.push_back(1);
    else if(valid_class==1)
      ignored_det.push_back(0);
    else
      ignored_det.push_back(-1);
  }
}

tPrData computeStatistics(CLASSES current_class, const vector<tGroundtruth> &gt,
        const vector<tDetection> &det, const vector<tGroundtruth> &dc,
        const vector<int32_t> &ignored_gt, const vector<int32_t>  &ignored_det,
        bool compute_fp, double (*boxoverlap)(tDetection, tGroundtruth, int32_t),
        METRIC metric, bool compute_aos=false, double thresh=0, bool debug=false){

  tPrData stat = tPrData();
  const double NO_DETECTION = -10000000;
  vector<double> delta;            // holds angular difference for TPs (needed for AOS evaluation)
  vector<bool> assigned_detection; // holds wether a detection was assigned to a valid or ignored ground truth
  assigned_detection.assign(det.size(), false);
  vector<bool> ignored_threshold;
  ignored_threshold.assign(det.size(), false); // holds detections with a threshold lower than thresh if FP are computed

  // detections with a low score are ignored for computing precision (needs FP)
  if(compute_fp)
    for(int32_t i=0; i<det.size(); i++)
      if(det[i].thresh<thresh)
        ignored_threshold[i] = true;

  // evaluate all ground truth boxes
  for(int32_t i=0; i<gt.size(); i++){

    // this ground truth is not of the current or a neighboring class and therefore ignored
    if(ignored_gt[i]==-1)
      continue;

    /*=======================================================================
    find candidates (overlap with ground truth > 0.5) (logical len(det))
    =======================================================================*/
    int32_t det_idx          = -1;
    double valid_detection = NO_DETECTION;
    double max_overlap     = 0;

    // search for a possible detection
    bool assigned_ignored_det = false;
    for(int32_t j=0; j<det.size(); j++){

      // detections not of the current class, already assigned or with a low threshold are ignored
      if(ignored_det[j]==-1)
        continue;
      if(assigned_detection[j])
        continue;
      if(ignored_threshold[j])
        continue;

      // find the maximum score for the candidates and get idx of respective detection
      double overlap = boxoverlap(det[j], gt[i], -1);

      // for computing recall thresholds, the candidate with highest score is considered
      if(!compute_fp && overlap>MIN_OVERLAP[metric][current_class] && det[j].thresh>valid_detection){
        det_idx         = j;
        valid_detection = det[j].thresh;
      }

      // for computing pr curve values, the candidate with the greatest overlap is considered
      // if the greatest overlap is an ignored detection (min_height), the overlapping detection is used
      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && (overlap>max_overlap || assigned_ignored_det) && ignored_det[j]==0){
        max_overlap     = overlap;
        det_idx         = j;
        valid_detection = 1;
        assigned_ignored_det = false;
      }
      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && valid_detection==NO_DETECTION && ignored_det[j]==1){
        det_idx              = j;
        valid_detection      = 1;
        assigned_ignored_det = true;
      }
    }

    /*=======================================================================
    compute TP, FP and FN
    =======================================================================*/

    // nothing was assigned to this valid ground truth
    if(valid_detection==NO_DETECTION && ignored_gt[i]==0) {
      stat.fn++;
    }

    // only evaluate valid ground truth <=> detection assignments (considering difficulty level)
    else if(valid_detection!=NO_DETECTION && (ignored_gt[i]==1 || ignored_det[det_idx]==1))
      assigned_detection[det_idx] = true;

    // found a valid true positive
    else if(valid_detection!=NO_DETECTION){

      // write highest score to threshold vector
      stat.tp++;
      stat.v.push_back(det[det_idx].thresh);

      // compute angular difference of detection and ground truth if valid detection orientation was provided
      if(compute_aos)
        delta.push_back(gt[i].box.alpha - det[det_idx].box.alpha);
      // test use ry as the error measure
	//delta.push_back(gt[i].ry - det[det_idx].ry);

      // clean up
      assigned_detection[det_idx] = true;
    }
  }

  // if FP are requested, consider stuff area
  if(compute_fp){

    // count fp
    for(int32_t i=0; i<det.size(); i++){

      // count false positives if required (height smaller than required is ignored (ignored_det==1)
      if(!(assigned_detection[i] || ignored_det[i]==-1 || ignored_det[i]==1 || ignored_threshold[i]))
        stat.fp++;
    }

    // do not consider detections overlapping with stuff area
    int32_t nstuff = 0;
    for(int32_t i=0; i<dc.size(); i++){
      for(int32_t j=0; j<det.size(); j++){

        // detections not of the current class, already assigned, with a low threshold or a low minimum height are ignored
        if(assigned_detection[j])
          continue;
        if(ignored_det[j]==-1 || ignored_det[j]==1)
          continue;
        if(ignored_threshold[j])
          continue;

        // compute overlap and assign to stuff area, if overlap exceeds class specific value
        double overlap = boxoverlap(det[j], dc[i], 0);
        if(overlap>MIN_OVERLAP[metric][current_class]){
          assigned_detection[j] = true;
          nstuff++;
        }
      }
    }

    // FP = no. of all not to ground truth assigned detections - detections assigned to stuff areas
    stat.fp -= nstuff;

    // if all orientation values are valid, the AOS is computed
    if(compute_aos){
      vector<double> tmp;

      // FP have a similarity of 0, for all TP compute AOS
      tmp.assign(stat.fp, 0);
      for(int32_t i=0; i<delta.size(); i++)
        tmp.push_back((1.0+cos(delta[i]))/2.0);

      // be sure, that all orientation deltas are computed
      assert(tmp.size()==stat.fp+stat.tp);
      assert(delta.size()==stat.tp);

      // get the mean orientation similarity for this image
      if(stat.tp>0 || stat.fp>0)
        stat.similarity = accumulate(tmp.begin(), tmp.end(), 0.0);

      // there was neither a FP nor a TP, so the similarity is ignored in the evaluation
      else
        stat.similarity = -1;
    }
  }
  return stat;
}

/*=======================================================================
EVALUATE CLASS-WISE
=======================================================================*/

bool eval_class (FILE *fp_det, FILE *fp_ori, CLASSES current_class,
        const vector< vector<tGroundtruth> > &groundtruth,
        const vector< vector<tDetection> > &detections, bool compute_aos,
        double (*boxoverlap)(tDetection, tGroundtruth, int32_t),
        vector<double> &precision, vector<double> &aos,
        DIFFICULTY difficulty, METRIC metric) {
    assert(groundtruth.size() == detections.size());

  // init
  int32_t n_gt=0;                                     // total no. of gt (denominator of recall)
  vector<double> v, thresholds;                       // detection scores, evaluated for recall discretization
  vector< vector<int32_t> > ignored_gt, ignored_det;  // index of ignored gt detection for current class/difficulty
  vector< vector<tGroundtruth> > dontcare;            // index of dontcare areas, included in ground truth

  // for all test images do
  for (int32_t i=0; i<groundtruth.size(); i++){

    // holds ignored ground truth, ignored detections and dontcare areas for current frame
    vector<int32_t> i_gt, i_det;
    vector<tGroundtruth> dc;

    // only evaluate objects of current class and ignore occluded, truncated objects
    cleanData(current_class, groundtruth[i], detections[i], i_gt, dc, i_det, n_gt, difficulty);
    ignored_gt.push_back(i_gt);
    ignored_det.push_back(i_det);
    dontcare.push_back(dc);

    // compute statistics to get recall values
    tPrData pr_tmp = tPrData();
    pr_tmp = computeStatistics(current_class, groundtruth[i], detections[i], dc, i_gt, i_det, false, boxoverlap, metric);

    // add detection scores to vector over all images
    for(int32_t j=0; j<pr_tmp.v.size(); j++)
      v.push_back(pr_tmp.v[j]);
  }

  // get scores that must be evaluated for recall discretization
  thresholds = getThresholds(v, n_gt);

  // compute TP,FP,FN for relevant scores
  vector<tPrData> pr;
  pr.assign(thresholds.size(),tPrData());
  for (int32_t i=0; i<groundtruth.size(); i++){

    // for all scores/recall thresholds do:
    for(int32_t t=0; t<thresholds.size(); t++){
      tPrData tmp = tPrData();
      tmp = computeStatistics(current_class, groundtruth[i], detections[i], dontcare[i],
                              ignored_gt[i], ignored_det[i], true, boxoverlap, metric,
                              compute_aos, thresholds[t], t==38);

      // add no. of TP, FP, FN, AOS for current frame to total evaluation for current threshold
      pr[t].tp += tmp.tp;
      pr[t].fp += tmp.fp;
      pr[t].fn += tmp.fn;
      if(tmp.similarity!=-1)
        pr[t].similarity += tmp.similarity;
    }
  }

  // compute recall, precision and AOS
  vector<double> recall;
  precision.assign(N_SAMPLE_PTS, 0);
  if(compute_aos)
    aos.assign(N_SAMPLE_PTS, 0);
  double r=0;
  for (int32_t i=0; i<thresholds.size(); i++){
    r = pr[i].tp/(double)(pr[i].tp + pr[i].fn);
    recall.push_back(r);
    precision[i] = pr[i].tp/(double)(pr[i].tp + pr[i].fp);
    if(compute_aos)
      aos[i] = pr[i].similarity/(double)(pr[i].tp + pr[i].fp);
  }

  // filter precision and AOS using max_{i..end}(precision)
  for (int32_t i=0; i<thresholds.size(); i++){
    precision[i] = *max_element(precision.begin()+i, precision.end());
    if(compute_aos)
      aos[i] = *max_element(aos.begin()+i, aos.end());
  }

  // save statisics and finish with success
  saveStats(precision, aos, fp_det, fp_ori);
    return true;
}

void saveAndPlotPlots(string dir_name,string file_name,string obj_type,vector<double> vals[],bool is_aos){

  char command[1024];

  // save plot data to file
  FILE *fp = fopen((dir_name + "/" + file_name + ".txt").c_str(),"w");
  printf("save %s\n", (dir_name + "/" + file_name + ".txt").c_str());
  for (int32_t i=0; i<(int)N_SAMPLE_PTS; i++)
    fprintf(fp,"%f %f %f %f\n",(double)i/(N_SAMPLE_PTS-1.0),vals[0][i],vals[1][i],vals[2][i]);
  fclose(fp);

  float sum[3] = {0, 0, 0};
  for (int v = 0; v < 3; ++v)
      for (int i = 0; i < vals[v].size(); i = i + 4)
          sum[v] += vals[v][i];
  printf("%s AP: %f %f %f\n", file_name.c_str(), sum[0] / 11 * 100, sum[1] / 11 * 100, sum[2] / 11 * 100);


  // create png + eps
  for (int32_t j=0; j<2; j++) {

    // open file
    FILE *fp = fopen((dir_name + "/" + file_name + ".gp").c_str(),"w");

    // save gnuplot instructions
    if (j==0) {
      fprintf(fp,"set term png size 450,315 font \"Helvetica\" 11\n");
      fprintf(fp,"set output \"%s.png\"\n",file_name.c_str());
    } else {
      fprintf(fp,"set term postscript eps enhanced color font \"Helvetica\" 20\n");
      fprintf(fp,"set output \"%s.eps\"\n",file_name.c_str());
    }

    // set labels and ranges
    fprintf(fp,"set size ratio 0.7\n");
    fprintf(fp,"set xrange [0:1]\n");
    fprintf(fp,"set yrange [0:1]\n");
    fprintf(fp,"set xlabel \"Recall\"\n");
    if (!is_aos) fprintf(fp,"set ylabel \"Precision\"\n");
    else         fprintf(fp,"set ylabel \"Orientation Similarity\"\n");
    obj_type[0] = toupper(obj_type[0]);
    fprintf(fp,"set title \"%s\"\n",obj_type.c_str());

    // line width
    int32_t   lw = 5;
    if (j==0) lw = 3;

    // plot error curve
    fprintf(fp,"plot ");
    fprintf(fp,"\"%s.txt\" using 1:2 title 'Easy' with lines ls 1 lw %d,",file_name.c_str(),lw);
    fprintf(fp,"\"%s.txt\" using 1:3 title 'Moderate' with lines ls 2 lw %d,",file_name.c_str(),lw);
    fprintf(fp,"\"%s.txt\" using 1:4 title 'Hard' with lines ls 3 lw %d",file_name.c_str(),lw);

    // close file
    fclose(fp);

    // run gnuplot => create png + eps
    sprintf(command,"cd %s; gnuplot %s",dir_name.c_str(),(file_name + ".gp").c_str());
    system(command);
  }

  // create pdf and crop
  sprintf(command,"cd %s; ps2pdf %s.eps %s_large.pdf",dir_name.c_str(),file_name.c_str(),file_name.c_str());
  system(command);
  sprintf(command,"cd %s; pdfcrop %s_large.pdf %s.pdf",dir_name.c_str(),file_name.c_str(),file_name.c_str());
  system(command);
  sprintf(command,"cd %s; rm %s_large.pdf",dir_name.c_str(),file_name.c_str());
  system(command);
}

vector<int32_t> getEvalIndices(const string& result_dir) {

    DIR* dir;
    dirent* entity;
    dir = opendir(result_dir.c_str());
    if (dir) {
        while (entity = readdir(dir)) {
            string path(entity->d_name);
            int32_t len = path.size();
            if (len < 10) continue;
            int32_t index = atoi(path.substr(len - 10, 10).c_str());
            indices.push_back(index);
        }
    }
    return indices;
}

bool eval(string gt_dir, string result_dir, Mail* mail){

  // set some global parameters
  initGlobals();

  // ground truth and result directories
  // string gt_dir         = "data/object/label_2";
  // string result_dir     = "results/" + result_sha;
  string plot_dir       = result_dir + "/plot";

  // create output directories
  system(("mkdir " + plot_dir).c_str());

  // hold detections and ground truth in memory
  vector< vector<tGroundtruth> > groundtruth;
  vector< vector<tDetection> >   detections;

  // holds wether orientation similarity shall be computed (might be set to false while loading detections)
  // and which labels where provided by this submission
  bool compute_aos=true;
  vector<bool> eval_image(NUM_CLASS, false);
  vector<bool> eval_ground(NUM_CLASS, false);
  vector<bool> eval_3d(NUM_CLASS, false);

  // for all images read groundtruth and detections
  mail->msg("Loading detections...");
  std::vector<int32_t> indices = getEvalIndices(result_dir + "/data/");
  printf("number of files for evaluation: %d\n", (int)indices.size());

  for (int32_t i=0; i<indices.size(); i++) {

    // file name
    char file_name[256];
    sprintf(file_name,"%06d.txt",indices.at(i));

    // read ground truth and result poses
    bool gt_success,det_success;
    vector<tGroundtruth> gt   = loadGroundtruth(gt_dir + "/" + file_name,gt_success);
    vector<tDetection>   det  = loadDetections(result_dir + "/data/" + file_name,
            compute_aos, eval_image, eval_ground, eval_3d, det_success);
    groundtruth.push_back(gt);
    detections.push_back(det);

    // check for errors
    if (!gt_success) {
      mail->msg("ERROR: Couldn't read: %s of ground truth. Please write me an email!", file_name);
      return false;
    }
    if (!det_success) {
      mail->msg("ERROR: Couldn't read: %s", file_name);
      return false;
    }
  }
  mail->msg("  done.");

  // holds pointers for result files
  FILE *fp_det=0, *fp_ori=0;

  // eval image 2D bounding boxes
  for (int c = 0; c < NUM_CLASS; c++) {
    CLASSES cls = (CLASSES)c;
    if (eval_image[c]) {
      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection.txt").c_str(), "w");
      if(compute_aos)
        fp_ori = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_orientation.txt").c_str(),"w");
      vector<double> precision[3], aos[3];
      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[0], aos[0], EASY, IMAGE)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[1], aos[1], MODERATE, IMAGE)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[2], aos[2], HARD, IMAGE)) {
        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
        return false;
      }
      fclose(fp_det);
      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection", CLASS_NAMES[c], precision, 0);
      if(compute_aos){
        saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_orientation", CLASS_NAMES[c], aos, 1);
        fclose(fp_ori);
      }
    }
  }

  // don't evaluate AOS for birdview boxes and 3D boxes
  compute_aos = false;

  // eval bird's eye view bounding boxes
  for (int c = 0; c < NUM_CLASS; c++) {
    CLASSES cls = (CLASSES)c;
    if (eval_ground[c]) {
      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection_ground.txt").c_str(), "w");
      vector<double> precision[3], aos[3];
      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[0], aos[0], EASY, GROUND)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[1], aos[1], MODERATE, GROUND)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[2], aos[2], HARD, GROUND)) {
        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
        return false;
      }
      fclose(fp_det);
      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection_ground", CLASS_NAMES[c], precision, 0);
    }
  }

  // eval 3D bounding boxes
  for (int c = 0; c < NUM_CLASS; c++) {
    CLASSES cls = (CLASSES)c;
    if (eval_3d[c]) {
      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection_3d.txt").c_str(), "w");
      vector<double> precision[3], aos[3];
      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[0], aos[0], EASY, BOX3D)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[1], aos[1], MODERATE, BOX3D)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[2], aos[2], HARD, BOX3D)) {
        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
        return false;
      }
      fclose(fp_det);
      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection_3d", CLASS_NAMES[c], precision, 0);
    }
  }

  // success
  return true;
}

int32_t main (int32_t argc,char *argv[]) {

  // we need 2 or 4 arguments!
  if (argc!=3) {
    cout << "Usage: ./eval_detection_3d_offline gt_dir result_dir" << endl;
    return 1;
  }

  // read arguments
  string gt_dir = argv[1];
  string result_dir = argv[2];

  // init notification mail
  Mail *mail;
  mail = new Mail();
  mail->msg("Thank you for participating in our evaluation!");

  // run evaluation
  if (eval(gt_dir, result_dir, mail)) {
    mail->msg("Your evaluation results are available at:");
    mail->msg(result_dir.c_str());
  } else {
    system(("rm -r " + result_dir + "/plot").c_str());
    mail->msg("An error occured while processing your results.");
  }

  // send mail and exit
  delete mail;

  return 0;
}


================================================
FILE: tools/kitti-eval/evaluate_object_3d_offline_r40.cpp
================================================
#include <iostream>
#include <algorithm>
#include <stdio.h>
#include <math.h>
#include <vector>
#include <numeric>
#include <strings.h>
#include <assert.h>

#include <dirent.h>

#include <boost/numeric/ublas/matrix.hpp>
#include <boost/numeric/ublas/io.hpp>

#include <boost/geometry.hpp>
#include <boost/geometry/geometries/point_xy.hpp>
#include <boost/geometry/geometries/polygon.hpp>
#include <boost/geometry/geometries/adapted/c_array.hpp>

#include "mail.h"

BOOST_GEOMETRY_REGISTER_C_ARRAY_CS(cs::cartesian)

typedef boost::geometry::model::polygon<boost::geometry::model::d2::point_xy<double> > Polygon;


using namespace std;

/*=======================================================================
STATIC EVALUATION PARAMETERS
=======================================================================*/

// holds the number of test images on the server
const int32_t N_TESTIMAGES = 7518;

// easy, moderate and hard evaluation level
enum DIFFICULTY{EASY=0, MODERATE=1, HARD=2};

// evaluation metrics: image, ground or 3D
enum METRIC{IMAGE=0, GROUND=1, BOX3D=2};

// evaluation parameter
const int32_t MIN_HEIGHT[3]     = {40, 25, 25};     // minimum height for evaluated groundtruth/detections
const int32_t MAX_OCCLUSION[3]  = {0, 1, 2};        // maximum occlusion level of the groundtruth used for evaluation
const double  MAX_TRUNCATION[3] = {0.15, 0.3, 0.5}; // maximum truncation level of the groundtruth used for evaluation

// evaluated object classes
enum CLASSES{CAR=0, PEDESTRIAN=1, CYCLIST=2};
const int NUM_CLASS = 3;

// parameters varying per class
vector<string> CLASS_NAMES;
// the minimum overlap required for 2D evaluation on the image/ground plane and 3D evaluation
// const double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.5, 0.25, 0.25}, {0.5, 0.25, 0.25}};
const double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.7, 0.5, 0.5}, {0.7, 0.5, 0.5}};

// no. of recall steps that should be evaluated (discretized)
const double N_SAMPLE_PTS = 41;

// initialize class names
void initGlobals () {
  CLASS_NAMES.push_back("car");
  CLASS_NAMES.push_back("pedestrian");
  CLASS_NAMES.push_back("cyclist");
}

/*=======================================================================
DATA TYPES FOR EVALUATION
=======================================================================*/

// holding data needed for precision-recall and precision-aos
struct tPrData {
  vector<double> v;           // detection score for computing score thresholds
  double         similarity;  // orientation similarity
  int32_t        tp;          // true positives
  int32_t        fp;          // false positives
  int32_t        fn;          // false negatives
  tPrData () :
    similarity(0), tp(0), fp(0), fn(0) {}
};

// holding bounding boxes for ground truth and detections
struct tBox {
  string  type;     // object type as car, pedestrian or cyclist,...
  double   x1;      // left corner
  double   y1;      // top corner
  double   x2;      // right corner
  double   y2;      // bottom corner
  double   alpha;   // image orientation
  tBox (string type, double x1,double y1,double x2,double y2,double alpha) :
    type(type),x1(x1),y1(y1),x2(x2),y2(y2),alpha(alpha) {}
};

// holding ground truth data
struct tGroundtruth {
  tBox    box;        // object type, box, orientation
  double  truncation; // truncation 0..1
  int32_t occlusion;  // occlusion 0,1,2 (non, partly, fully)
  double ry;
  double  t1, t2, t3;
  double h, w, l;
  tGroundtruth () :
    box(tBox("invalild",-1,-1,-1,-1,-10)),truncation(-1),occlusion(-1) {}
  tGroundtruth (tBox box,double truncation,int32_t occlusion) :
    box(box),truncation(truncation),occlusion(occlusion) {}
  tGroundtruth (string type,double x1,double y1,double x2,double y2,double alpha,double truncation,int32_t occlusion) :
    box(tBox(type,x1,y1,x2,y2,alpha)),truncation(truncation),occlusion(occlusion) {}
};

// holding detection data
struct tDetection {
  tBox    box;    // object type, box, orientation
  double  thresh; // detection score
  double  ry;
  double  t1, t2, t3;
  double  h, w, l;
  tDetection ():
    box(tBox("invalid",-1,-1,-1,-1,-10)),thresh(-1000) {}
  tDetection (tBox box,double thresh) :
    box(box),thresh(thresh) {}
  tDetection (string type,double x1,double y1,double x2,double y2,double alpha,double thresh) :
    box(tBox(type,x1,y1,x2,y2,alpha)),thresh(thresh) {}
};


/*=======================================================================
FUNCTIONS TO LOAD DETECTION AND GROUND TRUTH DATA ONCE, SAVE RESULTS
=======================================================================*/
vector<int32_t> indices;

vector<tDetection> loadDetections(string file_name, bool &compute_aos,
        vector<bool> &eval_image, vector<bool> &eval_ground,
        vector<bool> &eval_3d, bool &success) {

  // holds all detections (ignored detections are indicated by an index vector
  vector<tDetection> detections;
  FILE *fp = fopen(file_name.c_str(),"r");
  if (!fp) {
    success = false;
    return detections;
  }
  while (!feof(fp)) {
    tDetection d;
    double trash;
    char str[255];
    if (fscanf(fp, "%s %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf",
                   str, &trash, &trash, &d.box.alpha, &d.box.x1, &d.box.y1,
                   &d.box.x2, &d.box.y2, &d.h, &d.w, &d.l, &d.t1, &d.t2, &d.t3,
                   &d.ry, &d.thresh)==16) {

        // d.thresh = 1;
      d.box.type = str;
      detections.push_back(d);

      // orientation=-10 is invalid, AOS is not evaluated if at least one orientation is invalid
      if(d.box.alpha == -10)
        compute_aos = false;

      // a class is only evaluated if it is detected at least once
      for (int c = 0; c < NUM_CLASS; c++) {
        if (!strcasecmp(d.box.type.c_str(), CLASS_NAMES[c].c_str())) {
          if (!eval_image[c] && d.box.x1 >= 0)
            eval_image[c] = true;
          if (!eval_ground[c] && d.t1 != -1000)
            eval_ground[c] = true;
          if (!eval_3d[c] && d.t2 != -1000)
            eval_3d[c] = true;
          break;
        }
      }
    }
  }
  fclose(fp);
  success = true;
  return detections;
}

vector<tGroundtruth> loadGroundtruth(string file_name,bool &success) {

  // holds all ground truth (ignored ground truth is indicated by an index vector
  vector<tGroundtruth> groundtruth;
  FILE *fp = fopen(file_name.c_str(),"r");
  if (!fp) {
    success = false;
    return groundtruth;
  }
  while (!feof(fp)) {
    tGroundtruth g;
    char str[255];
    if (fscanf(fp, "%s %lf %d %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf",
                   str, &g.truncation, &g.occlusion, &g.box.alpha,
                   &g.box.x1,   &g.box.y1,     &g.box.x2,    &g.box.y2,
                   &g.h,      &g.w,        &g.l,       &g.t1,
                   &g.t2,      &g.t3,        &g.ry )==15) {
      g.box.type = str;
      groundtruth.push_back(g);
    }
  }
  fclose(fp);
  success = true;
  return groundtruth;
}

void saveStats (const vector<double> &precision, const vector<double> &aos, FILE *fp_det, FILE *fp_ori) {

  // save precision to file
  if(precision.empty())
    return;
  for (int32_t i=0; i<precision.size(); i++)
    fprintf(fp_det,"%f ",precision[i]);
  fprintf(fp_det,"\n");

  // save orientation similarity, only if there were no invalid orientation entries in submission (alpha=-10)
  if(aos.empty())
    return;
  for (int32_t i=0; i<aos.size(); i++)
    fprintf(fp_ori,"%f ",aos[i]);
  fprintf(fp_ori,"\n");
}

/*=======================================================================
EVALUATION HELPER FUNCTIONS
=======================================================================*/

// criterion defines whether the overlap is computed with respect to both areas (ground truth and detection)
// or with respect to box a or b (detection and "dontcare" areas)
inline double imageBoxOverlap(tBox a, tBox b, int32_t criterion=-1){

  // overlap is invalid in the beginning
  double o = -1;

  // get overlapping area
  double x1 = max(a.x1, b.x1);
  double y1 = max(a.y1, b.y1);
  double x2 = min(a.x2, b.x2);
  double y2 = min(a.y2, b.y2);

  // compute width and height of overlapping area
  double w = x2-x1;
  double h = y2-y1;

  // set invalid entries to 0 overlap
  if(w<=0 || h<=0)
    return 0;

  // get overlapping areas
  double inter = w*h;
  double a_area = (a.x2-a.x1) * (a.y2-a.y1);
  double b_area = (b.x2-b.x1) * (b.y2-b.y1);

  // intersection over union overlap depending on users choice
  if(criterion==-1)     // union
    o = inter / (a_area+b_area-inter);
  else if(criterion==0) // bbox_a
    o = inter / a_area;
  else if(criterion==1) // bbox_b
    o = inter / b_area;

  // overlap
  return o;
}

inline double imageBoxOverlap(tDetection a, tGroundtruth b, int32_t criterion=-1){
  return imageBoxOverlap(a.box, b.box, criterion);
}

// compute polygon of an oriented bounding box
template <typename T>
Polygon toPolygon(const T& g) {
    using namespace boost::numeric::ublas;
    using namespace boost::geometry;
    matrix<double> mref(2, 2);
    mref(0, 0) = cos(g.ry); mref(0, 1) = sin(g.ry);
    mref(1, 0) = -sin(g.ry); mref(1, 1) = cos(g.ry);

    static int count = 0;
    matrix<double> corners(2, 4);
    double data[] = {g.l / 2, g.l / 2, -g.l / 2, -g.l / 2,
                     g.w / 2, -g.w / 2, -g.w / 2, g.w / 2};
    std::copy(data, data + 8, corners.data().begin());
    matrix<double> gc = prod(mref, corners);
    for (int i = 0; i < 4; ++i) {
        gc(0, i) += g.t1;
        gc(1, i) += g.t3;
    }

    double points[][2] = {{gc(0, 0), gc(1, 0)},{gc(0, 1), gc(1, 1)},{gc(0, 2), gc(1, 2)},{gc(0, 3), gc(1, 3)},{gc(0, 0), gc(1, 0)}};
    Polygon poly;
    append(poly, points);
    return poly;
}

// measure overlap between bird's eye view bounding boxes, parametrized by (ry, l, w, tx, tz)
inline double groundBoxOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {
    using namespace boost::geometry;
    Polygon gp = toPolygon(g);
    Polygon dp = toPolygon(d);

    std::vector<Polygon> in, un;
    intersection(gp, dp, in);
    union_(gp, dp, un);

    double inter_area = in.empty() ? 0 : area(in.front());
    double union_area = area(un.front());
    double o;
    if(criterion==-1)     // union
        o = inter_area / union_area;
    else if(criterion==0) // bbox_a
        o = inter_area / area(dp);
    else if(criterion==1) // bbox_b
        o = inter_area / area(gp);

    return o;
}

// measure overlap between 3D bounding boxes, parametrized by (ry, h, w, l, tx, ty, tz)
inline double box3DOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {
    using namespace boost::geometry;
    Polygon gp = toPolygon(g);
    Polygon dp = toPolygon(d);

    std::vector<Polygon> in, un;
    intersection(gp, dp, in);
    union_(gp, dp, un);

    double ymax = min(d.t2, g.t2);
    double ymin = max(d.t2 - d.h, g.t2 - g.h);

    double inter_area = in.empty() ? 0 : area(in.front());
    double inter_vol = inter_area * max(0.0, ymax - ymin);

    double det_vol = d.h * d.l * d.w;
    double gt_vol = g.h * g.l * g.w;

    double o;
    if(criterion==-1)     // union
        o = inter_vol / (det_vol + gt_vol - inter_vol);
    else if(criterion==0) // bbox_a
        o = inter_vol / det_vol;
    else if(criterion==1) // bbox_b
        o = inter_vol / gt_vol;

    return o;
}

vector<double> getThresholds(vector<double> &v, double n_groundtruth){

  // holds scores needed to compute N_SAMPLE_PTS recall values
  vector<double> t;

  // sort scores in descending order
  // (highest score is assumed to give best/most confident detections)
  sort(v.begin(), v.end(), greater<double>());

  // get scores for linearly spaced recall
  double current_recall = 0;
  for(int32_t i=0; i<v.size(); i++){

    // check if right-hand-side recall with respect to current recall is close than left-hand-side one
    // in this case, skip the current detection score
    double l_recall, r_recall, recall;
    l_recall = (double)(i+1)/n_groundtruth;
    if(i<(v.size()-1))
      r_recall = (double)(i+2)/n_groundtruth;
    else
      r_recall = l_recall;

    if( (r_recall-current_recall) < (current_recall-l_recall) && i<(v.size()-1))
      continue;

    // left recall is the best approximation, so use this and goto next recall step for approximation
    recall = l_recall;

    // the next recall step was reached
    t.push_back(v[i]);
    current_recall += 1.0/(N_SAMPLE_PTS-1.0);
  }
  return t;
}

void cleanData(CLASSES current_class, const vector<tGroundtruth> &gt, const vector<tDetection> &det, vector<int32_t> &ignored_gt, vector<tGroundtruth> &dc, vector<int32_t> &ignored_det, int32_t &n_gt, DIFFICULTY difficulty){

  // extract ground truth bounding boxes for current evaluation class
  for(int32_t i=0;i<gt.size(); i++){

    // only bounding boxes with a minimum height are used for evaluation
    double height = gt[i].box.y2 - gt[i].box.y1;

    // neighboring classes are ignored ("van" for "car" and "person_sitting" for "pedestrian")
    // (lower/upper cases are ignored)
    int32_t valid_class;

    // all classes without a neighboring class
    if(!strcasecmp(gt[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))
      valid_class = 1;

    // classes with a neighboring class
    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), "Pedestrian") && !strcasecmp("Person_sitting", gt[i].box.type.c_str()))
      valid_class = 0;
    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), "Car") && !strcasecmp("Van", gt[i].box.type.c_str()))
      valid_class = 0;

    // classes not used for evaluation
    else
      valid_class = -1;

    // ground truth is ignored, if occlusion, truncation exceeds the difficulty or ground truth is too small
    // (doesn't count as FN nor TP, although detections may be assigned)
    bool ignore = false;
    if(gt[i].occlusion>MAX_OCCLUSION[difficulty] || gt[i].truncation>MAX_TRUNCATION[difficulty] || height<MIN_HEIGHT[difficulty])
      ignore = true;

    // set ignored vector for ground truth
    // current class and not ignored (total no. of ground truth is detected for recall denominator)
    if(valid_class==1 && !ignore){
      ignored_gt.push_back(0);
      n_gt++;
    }

    // neighboring class, or current class but ignored
    else if(valid_class==0 || (ignore && valid_class==1))
      ignored_gt.push_back(1);

    // all other classes which are FN in the evaluation
    else
      ignored_gt.push_back(-1);
  }

  // extract dontcare areas
  for(int32_t i=0;i<gt.size(); i++)
    if(!strcasecmp("DontCare", gt[i].box.type.c_str()))
      dc.push_back(gt[i]);

  // extract detections bounding boxes of the current class
  for(int32_t i=0;i<det.size(); i++){

    // neighboring classes are not evaluated
    int32_t valid_class;
    if(!strcasecmp(det[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))
      valid_class = 1;
    else
      valid_class = -1;

    int32_t height = fabs(det[i].box.y1 - det[i].box.y2);

    // set ignored vector for detections
    if(height<MIN_HEIGHT[difficulty])
      ignored_det.push_back(1);
    else if(valid_class==1)
      ignored_det.push_back(0);
    else
      ignored_det.push_back(-1);
  }
}

tPrData computeStatistics(CLASSES current_class, const vector<tGroundtruth> &gt,
        const vector<tDetection> &det, const vector<tGroundtruth> &dc,
        const vector<int32_t> &ignored_gt, const vector<int32_t>  &ignored_det,
        bool compute_fp, double (*boxoverlap)(tDetection, tGroundtruth, int32_t),
        METRIC metric, bool compute_aos=false, double thresh=0, bool debug=false){

  tPrData stat = tPrData();
  const double NO_DETECTION = -10000000;
  vector<double> delta;            // holds angular difference for TPs (needed for AOS evaluation)
  vector<bool> assigned_detection; // holds wether a detection was assigned to a valid or ignored ground truth
  assigned_detection.assign(det.size(), false);
  vector<bool> ignored_threshold;
  ignored_threshold.assign(det.size(), false); // holds detections with a threshold lower than thresh if FP are computed

  // detections with a low score are ignored for computing precision (needs FP)
  if(compute_fp)
    for(int32_t i=0; i<det.size(); i++)
      if(det[i].thresh<thresh)
        ignored_threshold[i] = true;

  // evaluate all ground truth boxes
  for(int32_t i=0; i<gt.size(); i++){

    // this ground truth is not of the current or a neighboring class and therefore ignored
    if(ignored_gt[i]==-1)
      continue;

    /*=======================================================================
    find candidates (overlap with ground truth > 0.5) (logical len(det))
    =======================================================================*/
    int32_t det_idx          = -1;
    double valid_detection = NO_DETECTION;
    double max_overlap     = 0;

    // search for a possible detection
    bool assigned_ignored_det = false;
    for(int32_t j=0; j<det.size(); j++){

      // detections not of the current class, already assigned or with a low threshold are ignored
      if(ignored_det[j]==-1)
        continue;
      if(assigned_detection[j])
        continue;
      if(ignored_threshold[j])
        continue;

      // find the maximum score for the candidates and get idx of respective detection
      double overlap = boxoverlap(det[j], gt[i], -1);

      // for computing recall thresholds, the candidate with highest score is considered
      if(!compute_fp && overlap>MIN_OVERLAP[metric][current_class] && det[j].thresh>valid_detection){
        det_idx         = j;
        valid_detection = det[j].thresh;
      }

      // for computing pr curve values, the candidate with the greatest overlap is considered
      // if the greatest overlap is an ignored detection (min_height), the overlapping detection is used
      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && (overlap>max_overlap || assigned_ignored_det) && ignored_det[j]==0){
        max_overlap     = overlap;
        det_idx         = j;
        valid_detection = 1;
        assigned_ignored_det = false;
      }
      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && valid_detection==NO_DETECTION && ignored_det[j]==1){
        det_idx              = j;
        valid_detection      = 1;
        assigned_ignored_det = true;
      }
    }

    /*=======================================================================
    compute TP, FP and FN
    =======================================================================*/

    // nothing was assigned to this valid ground truth
    if(valid_detection==NO_DETECTION && ignored_gt[i]==0) {
      stat.fn++;
    }

    // only evaluate valid ground truth <=> detection assignments (considering difficulty level)
    else if(valid_detection!=NO_DETECTION && (ignored_gt[i]==1 || ignored_det[det_idx]==1))
      assigned_detection[det_idx] = true;

    // found a valid true positive
    else if(valid_detection!=NO_DETECTION){

      // write highest score to threshold vector
      stat.tp++;
      stat.v.push_back(det[det_idx].thresh);

      // compute angular difference of detection and ground truth if valid detection orientation was provided
      if(compute_aos)
        delta.push_back(gt[i].box.alpha - det[det_idx].box.alpha);
      // test use ry as the error measure
	//delta.push_back(gt[i].ry - det[det_idx].ry);

      // clean up
      assigned_detection[det_idx] = true;
    }
  }

  // if FP are requested, consider stuff area
  if(compute_fp){

    // count fp
    for(int32_t i=0; i<det.size(); i++){

      // count false positives if required (height smaller than required is ignored (ignored_det==1)
      if(!(assigned_detection[i] || ignored_det[i]==-1 || ignored_det[i]==1 || ignored_threshold[i]))
        stat.fp++;
    }

    // do not consider detections overlapping with stuff area
    int32_t nstuff = 0;
    for(int32_t i=0; i<dc.size(); i++){
      for(int32_t j=0; j<det.size(); j++){

        // detections not of the current class, already assigned, with a low threshold or a low minimum height are ignored
        if(assigned_detection[j])
          continue;
        if(ignored_det[j]==-1 || ignored_det[j]==1)
          continue;
        if(ignored_threshold[j])
          continue;

        // compute overlap and assign to stuff area, if overlap exceeds class specific value
        double overlap = boxoverlap(det[j], dc[i], 0);
        if(overlap>MIN_OVERLAP[metric][current_class]){
          assigned_detection[j] = true;
          nstuff++;
        }
      }
    }

    // FP = no. of all not to ground truth assigned detections - detections assigned to stuff areas
    stat.fp -= nstuff;

    // if all orientation values are valid, the AOS is computed
    if(compute_aos){
      vector<double> tmp;

      // FP have a similarity of 0, for all TP compute AOS
      tmp.assign(stat.fp, 0);
      for(int32_t i=0; i<delta.size(); i++)
        tmp.push_back((1.0+cos(delta[i]))/2.0);

      // be sure, that all orientation deltas are computed
      assert(tmp.size()==stat.fp+stat.tp);
      assert(delta.size()==stat.tp);

      // get the mean orientation similarity for this image
      if(stat.tp>0 || stat.fp>0)
        stat.similarity = accumulate(tmp.begin(), tmp.end(), 0.0);

      // there was neither a FP nor a TP, so the similarity is ignored in the evaluation
      else
        stat.similarity = -1;
    }
  }
  return stat;
}

/*=======================================================================
EVALUATE CLASS-WISE
=======================================================================*/

bool eval_class (FILE *fp_det, FILE *fp_ori, CLASSES current_class,
        const vector< vector<tGroundtruth> > &groundtruth,
        const vector< vector<tDetection> > &detections, bool compute_aos,
        double (*boxoverlap)(tDetection, tGroundtruth, int32_t),
        vector<double> &precision, vector<double> &aos,
        DIFFICULTY difficulty, METRIC metric) {
    assert(groundtruth.size() == detections.size());

  // init
  int32_t n_gt=0;                                     // total no. of gt (denominator of recall)
  vector<double> v, thresholds;                       // detection scores, evaluated for recall discretization
  vector< vector<int32_t> > ignored_gt, ignored_det;  // index of ignored gt detection for current class/difficulty
  vector< vector<tGroundtruth> > dontcare;            // index of dontcare areas, included in ground truth

  // for all test images do
  for (int32_t i=0; i<groundtruth.size(); i++){

    // holds ignored ground truth, ignored detections and dontcare areas for current frame
    vector<int32_t> i_gt, i_det;
    vector<tGroundtruth> dc;

    // only evaluate objects of current class and ignore occluded, truncated objects
    cleanData(current_class, groundtruth[i], detections[i], i_gt, dc, i_det, n_gt, difficulty);
    ignored_gt.push_back(i_gt);
    ignored_det.push_back(i_det);
    dontcare.push_back(dc);

    // compute statistics to get recall values
    tPrData pr_tmp = tPrData();
    pr_tmp = computeStatistics(current_class, groundtruth[i], detections[i], dc, i_gt, i_det, false, boxoverlap, metric);

    // add detection scores to vector over all images
    for(int32_t j=0; j<pr_tmp.v.size(); j++)
      v.push_back(pr_tmp.v[j]);
  }

  // get scores that must be evaluated for recall discretization
  thresholds = getThresholds(v, n_gt);

  // compute TP,FP,FN for relevant scores
  vector<tPrData> pr;
  pr.assign(thresholds.size(),tPrData());
  for (int32_t i=0; i<groundtruth.size(); i++){

    // for all scores/recall thresholds do:
    for(int32_t t=0; t<thresholds.size(); t++){
      tPrData tmp = tPrData();
      tmp = computeStatistics(current_class, groundtruth[i], detections[i], dontcare[i],
                              ignored_gt[i], ignored_det[i], true, boxoverlap, metric,
                              compute_aos, thresholds[t], t==38);

      // add no. of TP, FP, FN, AOS for current frame to total evaluation for current threshold
      pr[t].tp += tmp.tp;
      pr[t].fp += tmp.fp;
      pr[t].fn += tmp.fn;
      if(tmp.similarity!=-1)
        pr[t].similarity += tmp.similarity;
    }
  }

  // compute recall, precision and AOS
  vector<double> recall;
  precision.assign(N_SAMPLE_PTS, 0);
  if(compute_aos)
    aos.assign(N_SAMPLE_PTS, 0);
  double r=0;
  for (int32_t i=0; i<thresholds.size(); i++){
    r = pr[i].tp/(double)(pr[i].tp + pr[i].fn);
    recall.push_back(r);
    precision[i] = pr[i].tp/(double)(pr[i].tp + pr[i].fp);
    if(compute_aos)
      aos[i] = pr[i].similarity/(double)(pr[i].tp + pr[i].fp);
  }

  // filter precision and AOS using max_{i..end}(precision)
  for (int32_t i=0; i<thresholds.size(); i++){
    precision[i] = *max_element(precision.begin()+i, precision.end());
    if(compute_aos)
      aos[i] = *max_element(aos.begin()+i, aos.end());
  }

  // save statisics and finish with success
  saveStats(precision, aos, fp_det, fp_ori);
    return true;
}

void saveAndPlotPlots(string dir_name,string file_name,string obj_type,vector<double> vals[],bool is_aos){

  char command[1024];

  // save plot data to file
  FILE *fp = fopen((dir_name + "/" + file_name + ".txt").c_str(),"w");
  printf("save %s\n", (dir_name + "/" + file_name + ".txt").c_str());
  for (int32_t i=0; i<(int)N_SAMPLE_PTS; i++)
    fprintf(fp,"%f %f %f %f\n",(double)i/(N_SAMPLE_PTS-1.0),vals[0][i],vals[1][i],vals[2][i]);
  fclose(fp);

  float sum[3] = {0, 0, 0};
  for (int v = 0; v < 3; ++v)
      for (int i = 1; i < vals[v].size(); i = i + 1)
          sum[v] += vals[v][i];
  printf("%s AP: %f %f %f\n", file_name.c_str(), sum[0] / 40 * 100, sum[1] / 40 * 100, sum[2] / 40 * 100);


  // create png + eps
  for (int32_t j=0; j<2; j++) {

    // open file
    FILE *fp = fopen((dir_name + "/" + file_name + ".gp").c_str(),"w");

    // save gnuplot instructions
    if (j==0) {
      fprintf(fp,"set term png size 450,315 font \"Helvetica\" 11\n");
      fprintf(fp,"set output \"%s.png\"\n",file_name.c_str());
    } else {
      fprintf(fp,"set term postscript eps enhanced color font \"Helvetica\" 20\n");
      fprintf(fp,"set output \"%s.eps\"\n",file_name.c_str());
    }

    // set labels and ranges
    fprintf(fp,"set size ratio 0.7\n");
    fprintf(fp,"set xrange [0:1]\n");
    fprintf(fp,"set yrange [0:1]\n");
    fprintf(fp,"set xlabel \"Recall\"\n");
    if (!is_aos) fprintf(fp,"set ylabel \"Precision\"\n");
    else         fprintf(fp,"set ylabel \"Orientation Similarity\"\n");
    obj_type[0] = toupper(obj_type[0]);
    fprintf(fp,"set title \"%s\"\n",obj_type.c_str());

    // line width
    int32_t   lw = 5;
    if (j==0) lw = 3;

    // plot error curve
    fprintf(fp,"plot ");
    fprintf(fp,"\"%s.txt\" using 1:2 title 'Easy' with lines ls 1 lw %d,",file_name.c_str(),lw);
    fprintf(fp,"\"%s.txt\" using 1:3 title 'Moderate' with lines ls 2 lw %d,",file_name.c_str(),lw);
    fprintf(fp,"\"%s.txt\" using 1:4 title 'Hard' with lines ls 3 lw %d",file_name.c_str(),lw);

    // close file
    fclose(fp);

    // run gnuplot => create png + eps
    sprintf(command,"cd %s; gnuplot %s",dir_name.c_str(),(file_name + ".gp").c_str());
    system(command);
  }

  // create pdf and crop
  sprintf(command,"cd %s; ps2pdf %s.eps %s_large.pdf",dir_name.c_str(),file_name.c_str(),file_name.c_str());
  system(command);
  sprintf(command,"cd %s; pdfcrop %s_large.pdf %s.pdf",dir_name.c_str(),file_name.c_str(),file_name.c_str());
  system(command);
  sprintf(command,"cd %s; rm %s_large.pdf",dir_name.c_str(),file_name.c_str());
  system(command);
}

vector<int32_t> getEvalIndices(const string& result_dir) {

    DIR* dir;
    dirent* entity;
    dir = opendir(result_dir.c_str());
    if (dir) {
        while (entity = readdir(dir)) {
            string path(entity->d_name);
            int32_t len = path.size();
            if (len < 10) continue;
            int32_t index = atoi(path.substr(len - 10, 10).c_str());
            indices.push_back(index);
        }
    }
    return indices;
}

bool eval(string gt_dir, string result_dir, Mail* mail){

  // set some global parameters
  initGlobals();

  // ground truth and result directories
  // string gt_dir         = "data/object/label_2";
  // string result_dir     = "results/" + result_sha;
  string plot_dir       = result_dir + "/plot";

  // create output directories
  system(("mkdir " + plot_dir).c_str());

  // hold detections and ground truth in memory
  vector< vector<tGroundtruth> > groundtruth;
  vector< vector<tDetection> >   detections;

  // holds wether orientation similarity shall be computed (might be set to false while loading detections)
  // and which labels where provided by this submission
  bool compute_aos=true;
  vector<bool> eval_image(NUM_CLASS, false);
  vector<bool> eval_ground(NUM_CLASS, false);
  vector<bool> eval_3d(NUM_CLASS, false);

  // for all images read groundtruth and detections
  mail->msg("Loading detections...");
  std::vector<int32_t> indices = getEvalIndices(result_dir + "/data/");
  printf("number of files for evaluation: %d\n", (int)indices.size());

  for (int32_t i=0; i<indices.size(); i++) {

    // file name
    char file_name[256];
    sprintf(file_name,"%06d.txt",indices.at(i));

    // read ground truth and result poses
    bool gt_success,det_success;
    vector<tGroundtruth> gt   = loadGroundtruth(gt_dir + "/" + file_name,gt_success);
    vector<tDetection>   det  = loadDetections(result_dir + "/data/" + file_name,
            compute_aos, eval_image, eval_ground, eval_3d, det_success);
    groundtruth.push_back(gt);
    detections.push_back(det);

    // check for errors
    if (!gt_success) {
      mail->msg("ERROR: Couldn't read: %s of ground truth. Please write me an email!", file_name);
      return false;
    }
    if (!det_success) {
      mail->msg("ERROR: Couldn't read: %s", file_name);
      return false;
    }
  }
  mail->msg("  done.");

  // holds pointers for result files
  FILE *fp_det=0, *fp_ori=0;

  // eval image 2D bounding boxes
  for (int c = 0; c < NUM_CLASS; c++) {
    CLASSES cls = (CLASSES)c;
    if (eval_image[c]) {
      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection.txt").c_str(), "w");
      if(compute_aos)
        fp_ori = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_orientation.txt").c_str(),"w");
      vector<double> precision[3], aos[3];
      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[0], aos[0], EASY, IMAGE)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[1], aos[1], MODERATE, IMAGE)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[2], aos[2], HARD, IMAGE)) {
        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
        return false;
      }
      fclose(fp_det);
      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection", CLASS_NAMES[c], precision, 0);
      if(compute_aos){
        saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_orientation", CLASS_NAMES[c], aos, 1);
        fclose(fp_ori);
      }
    }
  }

  // don't evaluate AOS for birdview boxes and 3D boxes
  compute_aos = false;

  // eval bird's eye view bounding boxes
  for (int c = 0; c < NUM_CLASS; c++) {
    CLASSES cls = (CLASSES)c;
    if (eval_ground[c]) {
      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection_ground.txt").c_str(), "w");
      vector<double> precision[3], aos[3];
      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[0], aos[0], EASY, GROUND)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[1], aos[1], MODERATE, GROUND)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[2], aos[2], HARD, GROUND)) {
        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
        return false;
      }
      fclose(fp_det);
      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection_ground", CLASS_NAMES[c], precision, 0);
    }
  }

  // eval 3D bounding boxes
  for (int c = 0; c < NUM_CLASS; c++) {
    CLASSES cls = (CLASSES)c;
    if (eval_3d[c]) {
      fp_det = fopen((result_dir + "/stats_" + CLASS_NAMES[c] + "_detection_3d.txt").c_str(), "w");
      vector<double> precision[3], aos[3];
      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[0], aos[0], EASY, BOX3D)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[1], aos[1], MODERATE, BOX3D)
         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[2], aos[2], HARD, BOX3D)) {
        mail->msg("%s evaluation failed.", CLASS_NAMES[c].c_str());
        return false;
      }
      fclose(fp_det);
      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + "_detection_3d", CLASS_NAMES[c], precision, 0);
    }
  }

  // success
  return true;
}

int32_t main (int32_t argc,char *argv[]) {

  // we need 2 or 4 arguments!
  if (argc!=3) {
    cout << "Usage: ./eval_detection_3d_offline gt_dir result_dir" << endl;
    return 1;
  }

  // read arguments
  string gt_dir = argv[1];
  string result_dir = argv[2];

  // init notification mail
  Mail *mail;
  mail = new Mail();
  mail->msg("Thank you for participating in our evaluation!");

  // run evaluation
  if (eval(gt_dir, result_dir, mail)) {
    mail->msg("Your evaluation results are available at:");
    mail->msg(result_dir.c_str());
  } else {
    system(("rm -r " + result_dir + "/plot").c_str());
    mail->msg("An error occured while processing your results.");
  }

  // send mail and exit
  delete mail;

  return 0;
}


================================================
FILE: tools/kitti-eval/mail.h
================================================
#ifndef MAIL_H
#define MAIL_H

#include <stdio.h>
#include <stdarg.h>
#include <string.h>

class Mail {

public:

  Mail (std::string email = "") {
    if (email.compare("")) {
      mail = popen("/usr/lib/sendmail -t -f noreply@cvlibs.net","w");
      fprintf(mail,"To: %s\n", email.c_str());
      fprintf(mail,"From: noreply@cvlibs.net\n");
      fprintf(mail,"Subject: KITTI Evaluation Benchmark\n");
      fprintf(mail,"\n\n");
    } else {
      mail = 0;
    }
  }
  
  ~Mail() {
    if (mail) {
      pclose(mail);
    }
  }
  
  void msg (const char *format, ...) {
    va_list args;
    va_start(args,format);
    if (mail) {
      vfprintf(mail,format,args);
      fprintf(mail,"\n");
    }
    vprintf(format,args);
    printf("\n");
    va_end(args);
  }
    
private:

  FILE *mail;
  
};

#endif


================================================
FILE: tools/train_IGRs.py
================================================
"""
Training the coordinate localization sub-network.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import sys
sys.path.append('../')
import torch
import os

import libs.arguments.parse as parse
import libs.logger.logger as liblogger
import libs.dataset as dataset
# import libs.dataset.ApolloScape.car_instance
import libs.dataset.KITTI.car_instance
import libs.trainer.trainer as trainer
import libs.model as models
import libs.optimizer.optimizer as optimizer
import libs.loss.function as loss_func

from libs.common.utils import get_model_summary
from libs.metric.criterions import get_distance_src, get_angle_error
from libs.metric.criterions import Evaluator

def choose_loss_func(model_settings, cfgs):
    """
    Initialize the loss function used for training. 
    """
    loss_type = model_settings['loss_type']
    if loss_type == 'JointsCompositeLoss':
        spec_list = model_settings['loss_spec_list']
        loss_weights = model_settings['loss_weight_list']
        func = loss_func.JointsCompositeLoss(spec_list=spec_list,
                                             img_size=model_settings['input_size'],
                                             hm_size=model_settings['heatmap_size'],
                                             cr_loss_thres=model_settings['cr_loss_threshold'],
                                             loss_weights=loss_weights
                                             )
    else:
        func = eval('loss_func.' + loss_type)(use_target_weight=cfgs['training_settings']['use_target_weight'])    
    # the order of the points are needed when computing the cross-ratio loss
    if model_settings['loss_spec_list'][2] != 'None':
        func.cr_indices = libs.dataset.KITTI.car_instance.cr_indices_dict['bbox12']
        func.target_cr = 4/3
    return func.cuda()

def train(model, model_settings, GPUs, cfgs, logger, final_output_dir):
    """
    The training method.
    """
    # get model summary
    input_size = model_settings['input_size']
    input_channels = 5 if cfgs['heatmapModel']['add_xy'] else 3
    dump_input = torch.rand((1, input_channels, input_size[1], input_size[0]))
    logger.info(get_model_summary(model, dump_input))
    
    model = torch.nn.DataParallel(model, device_ids=GPUs).cuda()

    # get forward-pass time if you need 
    # import time
    # dump_input = torch.rand((64, input_channels, input_size[1], input_size[0])).cuda()
    # t1 = time.clock()
    # out = model(dump_input)
    # l = out[0].sum()
    # l.backward()
    # torch.cuda.synchronize()
    # print(time.clock() - t1)

    # specify loss function 
    func = choose_loss_func(model_settings, cfgs)
    
    # dataset preparation
    data_cfgs = cfgs['dataset']
    train_dataset, valid_dataset = eval('dataset.' + data_cfgs['name'] + 
                                        '.car_instance').prepare_data(cfgs, logger)
    
    # get the optimizer and learning rate scheduler    
    optim, sche = optimizer.prepare_optim(model, cfgs)

    # metrics used for training error
    if cfgs['exp_type'] in ['baselinealpha', 'baselinetheta']:
        metric_function = get_angle_error
        save_debug_images = False
    elif cfgs['exp_type'] == 'instanceto2d':
        metric_function = get_distance_src
        save_debug_images = cfgs['training_settings']['debug']['save']
    collate_fn = train_dataset.get_collate_fn()
    trainer.train(train_dataset=train_dataset, 
                  valid_dataset=valid_dataset,
                  model=model,                   
                  loss_func=func,
                  optim=optim, 
                  sche=sche, 
                  metric_func=metric_function,
                  cfgs=cfgs, 
                  logger=logger,
                  collate_fn=collate_fn,
                  save_debug=save_debug_images
                  )

    final_model_state_file = os.path.join(final_output_dir, 'HC.pth')
    logger.info('=> saving final model state to {}'.format(final_model_state_file))
    torch.save(model.module.state_dict(), final_model_state_file)
    return

def evaluate(model, model_settings, GPUs, cfgs, logger, final_output_dir, eval_train=False):
    saved_path = cfgs['dirs']['load_hm_model']
    model.load_state_dict(torch.load(saved_path))
    model = torch.nn.DataParallel(model, device_ids=GPUs).cuda()
    evaluator = Evaluator(cfgs['testing_settings']['eval_metrics'], cfgs)
    # define loss function (criterion) and optimizer
    loss_func = choose_loss_func(model_settings, cfgs)
    # dataset preparation
    data_cfgs = cfgs['dataset']
    train_dataset, valid_dataset = eval('dataset.' + data_cfgs['name'] + 
                                        '.car_instance').prepare_data(cfgs, logger)
    collate_fn = valid_dataset.get_collate_fn()
    logger.info("Evaluation on the validation split:")
    trainer.evaluate(valid_dataset, model, loss_func, cfgs, logger, evaluator, collate_fn=collate_fn)    
    if eval_train:
        logger.info("Evaluation on the training split:")
        trainer.evaluate(train_dataset, model, loss_func, cfgs, logger, evaluator, collate_fn=collate_fn)
    return

def main():
    # experiment configurations
    cfgs = parse.parse_args()
    
    # logging
    logger, final_output_dir = liblogger.get_logger(cfgs)   
    
    # Set GPU
    if cfgs['use_gpu'] and torch.cuda.is_available():
        GPUs = cfgs['gpu_id']
    else:
        logger.info("GPU acceleration is disabled.")
    
    if len(GPUs) == 1:
        torch.cuda.set_device(GPUs[0])
        
    # cudnn related setting
    torch.backends.cudnn.benchmark = cfgs['cudnn']['benchmark']
    torch.backends.cudnn.deterministic = cfgs['cudnn']['deterministic']
    torch.backends.cudnn.enabled = cfgs['cudnn']['enabled']
    
    # model initialization
    model_settings = cfgs['heatmapModel']
    model_name = model_settings['name']
    method_str = 'models.heatmapModel' + '.' + model_name + '.get_pose_net'
    model = eval(method_str)(cfgs, is_train=cfgs['train'])

    if cfgs['train']:
        train(model, model_settings, GPUs, cfgs, logger, final_output_dir)
    elif cfgs['evaluate']:
        evaluate(model, model_settings, GPUs, cfgs, logger, final_output_dir)
    
if __name__ == '__main__':
    main()
    torch.cuda.empty_cache()

================================================
FILE: tools/train_lifting.py
================================================
"""
Training the sub-network \mathcal{L}() that predicts 3D cuboid 
given 2D screen coordinates as input.

Author: Shichao Li
Contact: nicholas.li@connect.ust.hk
"""

import sys
sys.path.append('../')

import libs.arguments.parse as parse
import libs.logger.logger as liblogger
# Deprecated: Apolloscape dataset
#import libs.dataset.ApolloScape.car_instance as car_instance
# KITTI dataset
import libs.dataset.KITTI.car_instance as car_instance
import libs.trainer.trainer as trainer

import torch
import numpy as np
import os

def main():
    # experiment configurations
    cfgs = parse.parse_args()

    # logging
    logger, final_output_dir = liblogger.get_logger(cfgs)
    
    # Set GPU
    if cfgs['use_gpu'] and torch.cuda.is_available():
        GPUs = cfgs['gpu_id']
    else:
        logger.info("GPU acceleration is disabled.")

    # load datasets
    train_dataset, eval_dataset = car_instance.prepare_data(cfgs, logger)
    logger.info("Finished preparing datasets...")
    
    # training
    if cfgs['train']:
        record = trainer.train_cascade(train_dataset, eval_dataset, cfgs, logger)
        cascade = record['cascade']

    if cfgs['save'] and 'cascade' in locals():
        save_path = os.path.join(cfgs['dirs']['output'], "KITTI")
        if not os.path.exists(save_path):
            os.mkdir(save_path)
        # save the model and the normalization statistics
        torch.save(cascade[0].cpu().state_dict(), 
                   os.path.join(save_path, 'L.pth')
                   )
        np.save(os.path.join(save_path, 'LS.npy'), train_dataset.statistics)
        logger.info('=> saving final model state to {}'.format(save_path))        
        # save loss history
        #np.save(os.path.join(save_path, 'record.npy'), record['record'])
        
    if cfgs['visualize'] or cfgs['evaluate']:
        # visualize the predictions
        cascade = torch.load(cfgs['load_model_path'])        
        if cfgs['use_gpu']:
            cascade.cuda()
            
    if cfgs['evaluate']:   
        trainer.evaluate_cascade(cascade, eval_dataset, cfgs) 
        
    return record

if __name__ == "__main__":
    record = main()
    torch.cuda.empty_cache()