[
  {
    "path": ".gitignore",
    "content": "#/*\n**/__pycache__\n.spyproject/\n*.log\n*.ini\n*.bak\n*.pth\n*.csv\n*.jpg\n*.png\n*.pdf\n/tools/kitti-eval/evaluate_object_3d_offline\n/outputs\n\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2022 Shichao Li\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/exploring-intermediate-representation-for/vehicle-pose-estimation-on-kitti-cars-hard)](https://paperswithcode.com/sota/vehicle-pose-estimation-on-kitti-cars-hard?p=exploring-intermediate-representation-for)\n# EgoNet\nOfficial project website for the CVPR 2021 paper \"Exploring intermediate representation for monocular vehicle pose estimation\". This repo includes an implementation that performs vehicle orientation estimation on the KITTI dataset from a single RGB image. \n\nNews:\n\n(2022-??-??): v-1.1 will be released which include pre-trained models for other object classes (Pedestrian and Cyclist in KITTI).\n\n(2021-08-16): v-1.0 is released. The training documentation is added.\n\n(2021-06-21): v-0.9 (beta version) is released. **The inference utility is here!** For Q&A, go to [discussions](https://github.com/Nicholasli1995/EgoNet/discussions). If you believe there is a technical problem, submit to [issues](https://github.com/Nicholasli1995/EgoNet/issues). \n\n(2021-06-16): This repo is under final code cleaning and documentation preparation. 
Stay tuned and come back in a week!\n\n**Check our 5-min video ([YouTube](https://www.youtube.com/watch?v=isKo0F3MU68), [iQIYI](https://www.iqiyi.com/v_y6lrdy33kg.html)) for an introduction.**\n\n**Detailed explanation in Chinese:** [Bilibili](https://www.bilibili.com/video/BV1jP4y1t7ee)\n<p align=\"center\">\n  <img src=\"https://github.com/Nicholasli1995/EgoNet/blob/master/imgs/teaser.jpg\"  width=\"830\" height=\"200\" />\n</p>\n\n## Run a demo with a one-line command!\nCheck instructions [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/demo.md).\n<p align=\"center\">\n  <img src=\"https://github.com/Nicholasli1995/EgoNet/blob/master/imgs/Ego-Net_demo.png\" height=\"175\"/>\n  <img src=\"https://github.com/Nicholasli1995/EgoNet/blob/master/imgs/Ego-Net_demo.gif\" height=\"175\"/>\n</p>\n\n## Performance: AP<sup>BEV</sup>@R<sub>40</sub> on KITTI val set for Car (monocular RGB)\nThe validation results in the paper were based on R<sub>11</sub>; the results using R<sub>40</sub> are listed here as well.\n| Method                    | Reference|Easy|Moderate|Hard|\n| ------------------------- | ---------------| --------------| --------------| --------------| \n|[M3D-RPN](https://arxiv.org/abs/1907.06038)|ICCV 2019|20.85| 15.62| 11.88|\n|[MonoDIS](https://openaccess.thecvf.com/content_ICCV_2019/papers/Simonelli_Disentangling_Monocular_3D_Object_Detection_ICCV_2019_paper.pdf)|ICCV 2019|18.45 |12.58 |10.66|\n|[MonoPair](https://arxiv.org/abs/2003.00504)|CVPR 2020|24.12| 18.17| 15.76|\n|[D4LCN](https://github.com/dingmyu/D4LCN)|CVPR 2020|31.53 |22.58  |17.87|\n|[Kinematic3D](https://arxiv.org/abs/2007.09548)|ECCV 2020|27.83| 19.72| 15.10|\n|[GrooMeD-NMS](https://github.com/abhi1kumar/groomed_nms)|CVPR 2021 |27.38|19.75|15.92|\n|[MonoDLE](https://github.com/xinzhuma/monodle)|CVPR 2021|24.97| 19.33| 17.01|\n|Ours (@R<sub>11</sub>)           |CVPR 2021 |**33.60**|**25.38**|**22.80**|\n|Ours (@R<sub>40</sub>)           |CVPR 2021 |**34.31**|**24.80**|**20.16**|\n\n## Performance: AOS@R<sub>40</sub> 
on KITTI test set for Car (RGB)\n\n| Method                    | Reference|Configuration|Easy|Moderate|Hard|\n| ------------------------- | ---------------| --------------| --------------| --------------| --------------| \n|[M3D-RPN](https://arxiv.org/abs/1907.06038)|ICCV 2019|Monocular|88.38 |82.81| 67.08|\n|[DSGN](https://github.com/Jia-Research-Lab/DSGN)|CVPR 2020|Stereo|95.42|86.03| 78.27|\n|[Disp-RCNN](https://github.com/zju3dv/disprcnn)|CVPR 2020|Stereo |93.02 |\t81.70 |\t67.16|\n|[MonoPair](https://arxiv.org/abs/2003.00504)|CVPR 2020|Monocular|91.65 |86.11 |76.45|\n|[D4LCN](https://github.com/dingmyu/D4LCN)|CVPR 2020|Monocular|90.01|82.08| 63.98|\n|[Kinematic3D](https://arxiv.org/abs/2007.09548)|ECCV 2020|Monocular|58.33 |\t45.50 |\t34.81|\n|[MonoDLE](https://github.com/xinzhuma/monodle)|CVPR 2021|Monocular|93.46| 90.23| 80.11|\n|[Ours](http://www.cvlibs.net/datasets/kitti/eval_object_detail.php?&result=e5233225fd5ef36fa63eb00252d9c00024961f2c)           |CVPR 2021 |Monocular|**96.11**|**91.23**|**80.96**|\n\n## Inference/Deployment\nCheck instructions [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/inference.md) to **reproduce** the above quantitative results.\n\n## Training\nCheck instructions [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/training.md) to train Ego-Net and learn how to prepare your own training dataset other than KITTI.\n\n## Citation\nPlease star this repository and cite the following paper in your publications if it helps your research:\n\n    @InProceedings{Li_2021_CVPR,\n    author    = {Li, Shichao and Yan, Zengqiang and Li, Hongyang and Cheng, Kwang-Ting},\n    title     = {Exploring intermediate representation for monocular vehicle pose estimation},\n    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},\n    month     = {June},\n    year      = {2021},\n    pages     = {1873-1883}\n    }\n\n## License\nThis repository can be used freely for 
non-commercial purposes. Contact me if you are interested in a commercial license.\n\n## Links\nLink to the paper:\n[Exploring intermediate representation for monocular vehicle pose estimation](https://arxiv.org/abs/2011.08464)\n\nLink to the presentation video:\n[YouTube](https://www.youtube.com/watch?v=isKo0F3MU68), [iQIYI](https://www.iqiyi.com/v_y6lrdy33kg.html)\n\nRelevant ECCV 2020 work: [GSNet](https://github.com/lkeab/gsnet)\n"
  },
  {
    "path": "configs/KITTI_inference:demo.yml",
    "content": "# This is a YAML file storing experimental configurations for KITTI dataset\n\n## general settings\nname: 'refine a given set of predictions from D4LCN'\nexp_type: 'inference'\nmodel_type: 'heatmapModel'\nuse_gpu: True\nuse_pred_box: True\nuse_gt_box: True\ngpu_id: [0]\n\n## operations\ntrain: False\nsave: False\nevaluate: False\ninference: True\n\n## used directories\ndirs:\n    # output directory\n    output: 'YOUR_OURPUT_DIR' \n    ckpt: 'YOUR_PRETRAINED_DIR' \n    load_prediction_file: '../resources/D4LCN/data'\n\n## CUDNN settings\ncudnn:\n    enabled: True\n    deterministic: True\n    benchmark: False\n\n## dataset settings\ndataset:\n    name: 'KITTI'\n    split: 'valid'\n    detect_classes: ['Car']\n    3d_kpt_sample_style: 'bbox9'\n    # interpolate the 3D bbox\n    interpolate:\n        flag: True\n        style: 'bbox12'\n        coef: [0.332, 0.667]\n    # do some pre-processing\n    pre-process: False\n    root: 'YOUR_KITTI_DIR'\n    # augmentation parameters\n    scaling_factor: 0.2\n    rotation_factor: 30. # degrees\n    # pytorch image transformation setting\n    pth_transform:\n# mean: [0.485, 0.456, 0.406, 0., 0.] \n# std: [0.229, 0.224, 0.225, 1., 1.]    
\n        mean: [0.485, 0.456, 0.406] \n        std: [0.229, 0.224, 0.225]          \n    # annotation style for 2d key-point\n    2d_kpt_style: 'bbox9' # projected 3d bounding box corner and center points\n    # input-output representation for 2d-to-3d lifting\n    lft_in_rep: 'coordinates2d' # 2d coordinates on screen\n    lft_out_rep: 'R3d+T' # 3d coordinates relative to centroid plus translation vector   \n\n## model settings for a fully-connected network if used\nFCModel:\n    name: 'lifter'\n    refine_3d: False \n    norm_twoD: False\n    num_blocks: 2 \n    input_size: 66 \n    output_size: 96 \n    num_neurons: 1024\n    dropout: 0.5\n    leaky: False\n\n## settings for a fully-convolutional heatmap/coordinate regression model\nheatmapModel:\n    name: hrnet # here a high-resolution (hr) model is used\n    add_xy: False # concatenate xy coordinate maps along with the input\n    jitter_bbox: False\n    jitter_params:\n        shift:\n        - 0.1\n        - 0.1\n        scaling:\n        - 0.4\n        - 0.4\n    input_size: \n    - 256\n    - 256\n    # rotate and scale the input images\n    augment_input: False\n    # one can choose to regress dense semantic heatmaps or coordinates \n    head_type: 'coordinates'\n    # up-sampling with pixel-shuffle\n    pixel_shuffle: False\n    # if an intermediate heatmap is produced\n    heatmap_size:\n    - 64\n    - 64\n    init_weights: true\n    num_joints: 33\n    extra:\n        pretrained_layers:\n        - 'conv1'\n        - 'bn1'\n        - 'conv2'\n        - 'bn2'\n        - 'layer1'\n        - 'transition1'\n        - 'stage2'\n        - 'transition2'\n        - 'stage3'\n        - 'transition3'\n        - 'stage4'\n        final_conv_kernel: 1\n        stage2:\n            num_modules: 1\n            num_branches: 2\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            num_channels:\n            - 48\n            - 96\n            fuse_method: sum\n        
stage3:\n            num_modules: 4\n            num_branches: 3\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            - 4\n            num_channels:\n            - 48\n            - 96\n            - 192\n            fuse_method: sum\n        stage4:\n            num_modules: 3\n            num_branches: 4\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            - 4\n            - 4\n            num_channels:\n            - 48\n            - 96\n            - 192\n            - 384\n            fuse_method: sum\n\n## testing settings\ntesting_settings:\n    batch_size: 1\n    num_threads: 0\n    shuffle: True\n    pin_memory: False\n    apply_dropout: False\n    unnormalize: False\n    alpha_mode: 'proj'"
  },
  {
    "path": "configs/KITTI_inference:test_submission.yml",
    "content": "# YAML file storing experimental configurations for KITTI dataset\n\n## general settings\nname: 'produce vehicle pose predictions on KITTI test split given detected bounding boxes'\nexp_type: 'inference'\nmodel_type: 'heatmapModel'\nuse_gpu: True\nuse_pred_box: True\nuse_gt_box: False\ngpu_id: [0]\n\n## operations\ntrain: False\nsave: False\nvisualize: False # visualize during inference\nbatch_to_show: 1000000 # how many batches to visualize if needed\nevaluate: False\ninference: True\nconf_thres: 0.1 # discard low score boxes\n\n## used directories\ndirs:\n    # output directory\n    output: 'YOUR_OURPUT_DIR' \n    ckpt: 'YOUR_PRETRAINED_DIR' \n    # raw detection results on test set by using RRC-Net\n    load_prediction_file: '../resources/test_boxes'\n\n## CUDNN settings\ncudnn:\n    enabled: True\n    deterministic: True\n    benchmark: False\n\n## dataset settings\ndataset:\n    name: 'KITTI'\n    split: 'test'\n    detect_classes: ['Car']\n    3d_kpt_sample_style: 'bbox9'\n    # interpolate the 3D bbox\n    interpolate:\n        flag: True\n        style: 'bbox12'\n        coef: [0.332, 0.667]\n    # do some pre-processing\n    pre-process: False\n    root: 'YOUR_KITTI_DIR'\n    # augmentation parameters\n    scaling_factor: 0.2\n    rotation_factor: 30. 
# degrees\n    # pytorch image transformation setting\n    pth_transform:\n        mean: [0.485, 0.456, 0.406] # TODO re-calculate this: R, G, B, X, Y \n        std: [0.229, 0.224, 0.225]          \n    # annotation style for 2d key-point\n    2d_kpt_style: 'bbox9' # projected 3d bounding box corner and center points\n    # input-output representation for 2d-to-3d lifting\n    lft_in_rep: 'coordinates2d' # 2d coordinates on screen\n    lft_out_rep: 'R3d+T' # 3d coordinates relative to centroid plus translation vector   \n\n## model settings for a fully-connected network if used\nFCModel:\n    name: 'lifter'\n    refine_3d: False \n    norm_twoD: False\n    num_blocks: 2 \n    input_size: 66 \n    output_size: 96 \n    num_neurons: 1024\n    dropout: 0.5\n    leaky: False\n\n## settings for a fully-convolutional heatmap regression model\nheatmapModel:\n    name: hrnet # here a high-resolution (hr) model is used\n    add_xy: False # concatenate xy coodrinate maps along with the input\n    jitter_bbox: True\n    jitter_params:\n        shift:\n        - 0.1\n        - 0.1\n        scaling:\n        - 0.4\n        - 0.4\n    input_size: \n    - 256\n    - 256\n    # rotate and scaling and input images\n    augment_input: True\n    head_type: 'coordinates'\n    pixel_shuffle: False\n    # if an intermediate heatmap is produced\n    heatmap_size:\n    - 64\n    - 64\n    init_weights: true\n    num_joints: 33\n    use_different_joints_weight: False\n    extra:\n        pretrained_layers:\n        - 'conv1'\n        - 'bn1'\n        - 'conv2'\n        - 'bn2'\n        - 'layer1'\n        - 'transition1'\n        - 'stage2'\n        - 'transition2'\n        - 'stage3'\n        - 'transition3'\n        - 'stage4'\n        final_conv_kernel: 1\n        stage2:\n            num_modules: 1\n            num_branches: 2\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            num_channels:\n            - 48\n            - 96\n            
fuse_method: sum\n        stage3:\n            num_modules: 4\n            num_branches: 3\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            - 4\n            num_channels:\n            - 48\n            - 96\n            - 192\n            fuse_method: sum\n        stage4:\n            num_modules: 3\n            num_branches: 4\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            - 4\n            - 4\n            num_channels:\n            - 48\n            - 96\n            - 192\n            - 384\n            fuse_method: sum\n\n## testing settings\ntesting_settings:\n    batch_size: 1\n    num_threads: 0\n    shuffle: True\n    pin_memory: False\n    apply_dropout: False\n    unnormalize: False\n    alpha_mode: 'proj'"
  },
  {
    "path": "configs/KITTI_train_IGRs.yml",
    "content": "# YAML file storing experimental configurations for training on KITTI dataset\n\n## general settings\nname: 'kitti_kpt_loc'\nexp_type: 'instanceto2d'\nmodel_type: 'heatmapModel'\nuse_gpu: True\ngpu_id: [0,1,2] # MODIFY this to the GPU/GPUs ids in your computer\n\n## operations\ntrain: True\nsave: True\nvisualize: False\nevaluate: False\n\n## output directories\ndirs:\n    # MODIFY them to your preferred directories\n    output: '../outputs/training_record' \n    # This directory save intermediate training results (optional)\n    debug: '../outputs/training_record/debug' \n\n## CUDNN settings\ncudnn:\n    enabled: True\n    deterministic: False\n    benchmark: False\n\n## dataset settings\ndataset:\n    name: 'KITTI'\n    detect_classes: ['Car']\n    3d_kpt_sample_style: 'bbox9'\n    interpolate:\n        flag: True\n        style: 'bbox12'\n        coef: [0.332, 0.667]\n    # do some pre-processing\n    pre-process: False\n    # MODIFY this to your KITTI directory\n    root: '$YOUR_DIR/KITTI'\n    # augmentation parameters\n    scaling_factor: 0.2\n    rotation_factor: 30. # degrees\n    # pytorch image transformation setting\n    pth_transform:\n#        mean: [0.485, 0.456, 0.406, 0., 0.]\n#        std: [0.229, 0.224, 0.225, 1., 1.]    
\n        mean: [0.485, 0.456, 0.406] \n        std: [0.229, 0.224, 0.225]    \n    2d_kpt_style: 'bbox9'\n\n## self-supervision settings\nss:\n    flag: False\n    # MODIFY this to your unlabeled image record if you enable self-supervised representation learning\n    record_path: '$YOUR_DIR/Apollo_ss_record.npy'\n    img_root: '$YOUR_DIR/ApolloScape/images'\n    max_per_img: 6\n\n## settings for a fully-convolutional heatmap/coordinate regression model\nheatmapModel:\n    name: hrnet # here a high-resolution (hr) model is used\n    add_xy: False # concatenate xy coodrinate maps along with the input\n    # data augmentation by adding noise to bounding box location\n    jitter_bbox: True\n    jitter_params:\n        shift:\n        - 0.1\n        - 0.1\n        scaling:\n        - 0.4\n        - 0.4\n    input_size: \n    - 256\n    - 256\n    # rotate and scaling and input images\n    augment_input: True\n    head_type: 'coordinates'\n    # up-sampling with pixel-shuffle\n    pixel_shuffle: False\n    # if an intermediate heatmap is produced\n    heatmap_size:\n    - 64\n    - 64\n    loss_type: JointsCompositeLoss\n    # the following two settings are only valid for JointsCompositeLoss\n    loss_spec_list: ['mse', 'l1', 'sl1']\n    loss_weight_list: [1.0, 0.1, 'None']\n    cr_loss_threshold: 0.15\n    init_weights: true\n    num_joints: 33\n    #use_different_joints_weight: False\n    # use a pre-trained checkpoint to initialize the model\n    # MODIFY it to your own checkpoint directory\n    pretrained: '../resources/start_point.pth'\n    target_type: gaussian\n    sigma: 1\n    extra:\n        pretrained_layers:\n        - 'conv1'\n        - 'bn1'\n        - 'conv2'\n        - 'bn2'\n        - 'layer1'\n        - 'transition1'\n        - 'stage2'\n        - 'transition2'\n        - 'stage3'\n        - 'transition3'\n        - 'stage4'\n        final_conv_kernel: 1\n        stage2:\n            num_modules: 1\n            num_branches: 2\n            block: 
basic\n            num_blocks:\n            - 4\n            - 4\n            num_channels:\n            - 48\n            - 96\n            fuse_method: sum\n        stage3:\n            num_modules: 4\n            num_branches: 3\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            - 4\n            num_channels:\n            - 48\n            - 96\n            - 192\n            fuse_method: sum\n        stage4:\n            num_modules: 3\n            num_branches: 4\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            - 4\n            - 4\n            num_channels:\n            - 48\n            - 96\n            - 192\n            - 384\n            fuse_method: sum\n\n## training settings  \ntraining_settings:\n    total_epochs: 45\n    resume: False\n    batch_size: 24\n    num_threads: 16 # MODIFY this based on your machine\n    shuffle: True\n    pin_memory: False\n    # weighted loss computation\n    use_target_weight: False\n    report_every: 30\n    eval_every: 130\n    eval_during: False # set this to True if you want to evaluate during training\n    eval_metrics: ['JointDistance2DSIP']\n    plot_loss: False\n    # debugging configurations \n    debug: \n        save: True # save some intermediate images with keypoint predictions\n        save_images_kpts: True\n        save_hms_gt: True\n        save_hms_pred: True\n\n## testing settings\ntesting_settings:\n    batch_size: 2\n    num_threads: 4\n    shuffle: False\n    pin_memory: False\n    apply_dropout: False\n    unnormalize: False\n    eval_metrics: ['JointDistance2DSIP']\n\n## optimizer settings\noptimizer:\n    # for ADAM\n    optim_type: 'adam'\n    lr: 0.001\n    weight_decay: 0.0\n    # for SGD\n    momentum: 0.9\n    # learning rate decay\n    milestones: [10, 20, 30, 40]\n    gamma: 0.5\n"
  },
  {
    "path": "configs/KITTI_train_IGRs_Ped.yml",
    "content": "# YAML file storing experimental configurations for training on KITTI dataset for the Pedestrian class\n\n## general settings\nname: 'kitti_kpt_loc_pedestrian'\nexp_type: 'instanceto2d'\n# baselin\nmodel_type: 'heatmapModel'\nuse_gpu: True\ngpu_id: [0,1,]\n\n## operations\ntrain: True\nsave: True\nvisualize: False\nevaluate: False\n\n## output directories\ndirs:\n    # MODIFY them to your preferred directories\n    output: '../outputs/training_record' \n    # This directory save intermediate training results (optional)\n    debug: '../outputs/training_record/debug' \n\n## CUDNN settings\ncudnn:\n    enabled: True\n    deterministic: False\n    benchmark: False\n\n## dataset settings\ndataset:\n    name: 'KITTI'\n    detect_classes: ['Pedestrian']\n    3d_kpt_sample_style: 'bbox9'\n    interpolate:\n        flag: True\n        style: 'bbox12'\n        coef: [0.332, 0.667]\n    enlarge_factor: 1.05 # patch size parameter\n    # do some pre-processing\n    pre-process: True\n    root: '/media/nicholas/Database/datasets/KITTI'\n    # augmentation parameters\n    scaling_factor: 0.2\n    rotation_factor: 30. 
# degrees\n    # pytorch image transformation setting\n    pth_transform:  \n        mean: [0.485, 0.456, 0.406]\n        std: [0.229, 0.224, 0.225]    \n    2d_kpt_style: 'bbox9'\n\n## self-supervision settings\nss:\n    flag: False\n    # MODIFY this to your unlabeled image record if you enable self-supervised representation learning\n    record_path: '$YOUR_DIR/Apollo_ss_record.npy'\n    img_root: '$YOUR_DIR/ApolloScape/images'\n    max_per_img: 6\n\n## settings for a fully-convolutional heatmap regression model\nheatmapModel:\n    name: hrnet # here a high-resolution (hr) model is used\n    add_xy: False # concatenate xy coodrinate maps along with the input\n    jitter_bbox: True\n    jitter_params:\n        shift:\n        - 0.1\n        - 0.1\n        scaling:\n        - 0.2\n        - 0.2\n    input_size: \n    - 192\n    - 256\n    # rotate and scaling and input images\n    augment_input: True\n    head_type: 'coordinates'\n    # up-sampling with pixel-shuffle\n    pixel_shuffle: False\n    # if an intermediate heatmap is produced\n    heatmap_size:\n    - 48\n    - 64\n    loss_type: JointsCompositeLoss\n    # the following two settings are only valid for JointsCompositeLoss\n    loss_spec_list: ['mse', 'l1', 'None']\n    loss_weight_list: [1.0, 0.1, 0.]\n    cr_loss_threshold: 0.1\n    init_weights: true\n    num_joints: 33\n    use_different_joints_weight: False\n    # use a pre-trained checkpoint to initialize the model\n    # MODIFY it to your own checkpoint directory\n    pretrained: '../resources/start_point.pth'\n    target_type: gaussian\n    sigma: 2\n    extra:\n        pretrained_layers:\n        - 'conv1'\n        - 'bn1'\n        - 'conv2'\n        - 'bn2'\n        - 'layer1'\n        - 'transition1'\n        - 'stage2'\n        - 'transition2'\n        - 'stage3'\n        - 'transition3'\n        - 'stage4'\n        freeze_layers:\n        - 'conv1'\n        - 'bn1'\n        - 'conv2'\n        - 'bn2'\n        - 'layer1'\n        - 
'transition1'\n        - 'stage2'        \n        final_conv_kernel: 1\n        stage2:\n            num_modules: 1\n            num_branches: 2\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            num_channels:\n            - 32\n            - 64\n            fuse_method: sum\n        stage3:\n            num_modules: 4\n            num_branches: 3\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            - 4\n            num_channels:\n            - 32\n            - 64\n            - 128\n            fuse_method: sum\n        stage4:\n            num_modules: 3\n            num_branches: 4\n            block: basic\n            num_blocks:\n            - 4\n            - 4\n            - 4\n            - 4\n            num_channels:\n            - 32\n            - 64\n            - 128\n            - 256\n            fuse_method: sum\n\n## training settings  \ntraining_settings:\n    total_epochs: 40\n    resume: False\n    begin_epoch: 1\n    end_epoch: 10\n    snapshot_epochs: [20, 30, 40]\n    batch_size: 2\n    num_threads: 0\n    shuffle: True\n    pin_memory: False\n    # weighted loss computation\n    use_target_weight: False\n    report_every: 100\n    eval_every: 1000\n    eval_during: True\n    eval_metrics: ['JointDistance2DSIP']\n    plot_loss: False\n    # debugging configurations \n    debug: \n        save: True # save some intermediate results\n        save_images_kpts: True\n        save_hms_gt: True\n        save_hms_pred: True\n\n## testing settings\ntesting_settings:\n    batch_size: 2\n    num_threads: 0\n    shuffle: False\n    pin_memory: False\n    apply_dropout: False\n    unnormalize: False\n    eval_metrics: ['JointDistance2DSIP']\n    # save_debug: True\n    save_debug: False\n\n## optimizer settings\noptimizer:\n    # for ADAM\n    optim_type: 'adam'\n    lr: 0.001\n    weight_decay: 0.0\n    # for SGD\n    momentum: 0.9\n    # learning rate 
decay\n    milestones: [10, 20, 30]\n    gamma: 0.5\n"
  },
  {
    "path": "configs/KITTI_train_lifting.yml",
    "content": "# YAML file storing experimental configurations for KITTI dataset\n\n## general settings\nname: 'lifter'\nexp_type: '2dto3d'\nmodel_type: 'FCModel'\nuse_gpu: True\ngpu_id: [1] # modify this to the GPU ids that you use \n\n## operations\ntrain: True # perform training\nsave: True # save the trained model\nvisualize: False # visualize the training results\nevaluate: False # perform evaluation\n\n## paths to the relevant directories\ndirs:\n    # output directory\n    output: '../outputs/training_record' \n    debug: '../outputs/training_record/debug'\n    data_vis: '../outputs/training_record/data_vis'\n\n## CUDNN settings\ncudnn:\n    enabled: True\n    deterministic: False\n    benchmark: False\n\n## evaluation metrics\nmetrics:\n    R3D:\n        T_style: 'direct'\n        R_style: 'euler'\n\n## dataset settings\ndataset:\n    name: 'KITTI'\n    detect_classes: ['Car'] # used class for training\n    3d_kpt_sample_style: 'bbox9' # construct a cuboid for each 3D bounding box\n    # interpolate the 3D bbox\n    interpolate:\n        flag: True\n        style: 'bbox12'\n        coef: [0.332, 0.667]\n    # do some pre-processing\n    pre-process: False\n    root: '$YOUR_DIR/KITTI' # MODIFY this to your own path    \n    # input-output representation for 2d-to-3d lifting\n    lft_in_rep: 'coordinates2d' # 2d coordinates on screen\n    lft_out_rep: 'R3d' # 3d coordinates relative to centroid plus translation vector\n\n## optional cascaded regression\ncascade: \n    num_stages: 1 # the default is simply no cascade\n\n## model settings for a fully-connected network if used\nFCModel:\n    name: 'lifter'\n    refine_3d: False \n    norm_twoD: False\n    num_blocks: 2 \n    num_neurons: 1024\n    dropout: 0.5\n    leaky: False\n    loss_type: MSELoss1D\n    loss_reduction: 'mean'\n\n## training settings  \ntraining_settings:\n    # total_epochs: 300\n    total_epochs: 1\n    eval_start_epoch: 250 # start evaluation after this epoch\n    resume: False\n    
batch_size: 2048\n    num_threads: 4 # set the number of workers to suit your machine\n    shuffle: True\n    pin_memory: False\n#    report_every: 500 # report every 500 batches\n#    eval_every: 500 # test on the evaluation set every 500 batches\n    report_every: 5 # report every 5 batches\n    eval_every: 5 # test on the evaluation set every 5 batches\n    eval_during: False # MODIFY this to True if you want to evaluate during the training process\n    # how many times to augment data for 2D-to-3D lifting\n    lft_aug: True\n    lft_aug_times: 100\n    # what evaluation metrics to use\n    eval_metrics: ['RError3D']\n    plot_loss: False # visualize the loss function during training \n\n## testing settings if used\ntesting_settings:\n    apply_dropout: False\n    unnormalize: True\n    batch_size: 1024\n    num_threads: 4\n    shuffle: False\n#    vis_epoch: 290 # start plotting after this epoch\n\n## optimizer settings\noptimizer:\n    # for ADAM\n    optim_type: 'adam'\n    lr: 0.001\n    weight_decay: 0.0\n    # for SGD\n    momentum: 0.9\n    # learning rate will decay at each milestone epoch\n    milestones: [50, 100, 150, 250]\n    gamma: 0.5\n"
  },
  {
    "path": "docs/demo.md",
    "content": "Firstly you need to prepare the dataset and pre-trained models as described [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/preparation.md).\r\n\r\nThen modify the directories by\r\n\r\n```bash\r\ncd ${EgoNet_DIR}/configs && vim KITTI_inference:demo.yml\r\n```\r\n\r\nEdit dirs:ckpt to your pre-trained model directory.\r\n\r\nEdit dataset:root to your KITTI directory.\r\n\r\nFinally, go to ${EgoNet_DIR}/tools and run\r\n\r\n```bash\r\n python inference.py --cfg \"../configs/KITTI_inference:demo.yml\" --visualize True --batch_to_show 2\r\n```\r\n\r\nYou can set --batch_to_show to other integers to see more results.\r\n\r\nThe visualized 3D bounding boxes are distinguished by their colors: \r\n1. Black indicates ground truth 3D boxes.\r\n2. Magenta indicates 3D bounding boxes predicted by another 3D object detector ([D4LCN](https://github.com/dingmyu/D4LCN)).\r\n3. Red indicates the predictions of Ego-Net, using the 2D bounding boxes from [D4LCN](https://github.com/dingmyu/D4LCN).\r\n4. Yellow indicates the predictions of Ego-Net, using the ground truth 2D bounding boxes.\r\n"
  },
  {
    "path": "docs/inference.md",
    "content": "Firstly you need to prepare the dataset and pre-trained models as described [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/preparation.md).\r\n\r\n## Reproduce D4LCN + EgoNet on the val split\r\nYou need to modify the directories by\r\n\r\n```bash\r\ncd ${EgoNet_DIR}/configs && vim KITTI_inference:demo.yml\r\n```\r\nEdit dirs:output to where you want to save the predictions.\r\n\r\nEdit dirs:ckpt to your pre-trained model directory.\r\n\r\nEdit dataset:root to your KITTI directory.\r\n\r\nFinally, go to ${EgoNet_DIR}/tools and run\r\n\r\n```bash\r\n python inference.py --cfg \"../configs/KITTI_inference:demo.yml\"\r\n```\r\n\r\nThis will load D4LCN predictions, refine their vehicle orientation predictions and save the results.\r\nThe official evaluation program will automatically run to produce quantitative performance.\r\n\r\n## Reproduce results on the test split\r\nYou need to modify the directories by\r\n\r\n```bash\r\ncd ${EgoNet_DIR}/configs && vim KITTI_inference:test_submission.yml\r\n```\r\nEdit dirs:output to where you want to save the predictions.\r\n\r\nEdit dirs:ckpt to your pre-trained model directory.\r\n\r\nEdit dataset:root to your KITTI directory.\r\n\r\nFinally, go to ${EgoNet_DIR}/tools and run\r\n\r\n```bash\r\n python inference.py --cfg \"../configs/KITTI_inference:test_submission.yml\"\r\n```\r\n\r\nThis will load prepared 2D bounding boxes, predict the vehicle orientation and save the predictions.\r\n\r\nNow you can zip the results and submit it to the [official server](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=2d)!\r\n\r\nYou can hit [91.23% AOS](http://www.cvlibs.net/datasets/kitti/eval_object_detail.php?&result=e5233225fd5ef36fa63eb00252d9c00024961f2c) for the moderate setting! This is the **most important** metric for joint vehicle detection and pose estimation on KITTI. You achieved this with a single RGB image without extra training data.\r\n"
  },
  {
    "path": "docs/preparation.md",
    "content": "## Data Preparation \nYou need to download KITTI dataset [here](http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d). Download left images, calibration files and labels.\nDownload the split files [here](https://drive.google.com/drive/folders/1YLtptqspOFw08QG2MsxewDT9tjF2O45g?usp=sharing) and place them at ${YOUR_KITTI_DIR}/SPLIT/ImageSets.\nYour data folder should look like this:\n\n   ```\n   ${YOUR_KITTI_DIR}\n   ├── training\n      ├── calib\n          ├── xxxxxx.txt (Camera parameters for image xxxxxx)\n      ├── image_2\n          ├── xxxxxx.png (image xxxxxx)\n      ├── label_2\n          ├── xxxxxx.txt (object labels for image xxxxxx)\n      ├── ImageSets\n         ├── train.txt\n         ├── val.txt   \n         ├── trainval.txt        \n   ├── testing\n      ├── calib\n          ├── xxxxxx.txt (Camera parameters for image xxxxxx)\n      ├── image_2\n          ├── xxxxxx.png (image xxxxxx)\n      ├── ImageSets\n         ├── test.txt\n   ```\n\n## Download pre-trained model\nYou need to download the pre-trained checkpoints [here](https://drive.google.com/file/d/1JsVzw7HMfchxOXoXgvWG1I_bPRD1ierE/view?usp=sharing) in order to use Ego-Net. Unzip it to ${YOUR_MODEL_DIR}.\n\n## Compile the official evaluator\nGo to the folder storing the source code\n```bash\ncd ${EgoNet_DIR}/tools/kitti-eval \n```\nCompile the source code\n```bash\ng++ -o evaluate_object_3d_offline evaluate_object_3d_offline.cpp -O3\n```\n\n## Download the input bounding boxes\nDownload the [resources folder](https://drive.google.com/drive/folders/1atfXLmsLFG6XEtNnwZuEYLydKqjr7Icf?usp=sharing) and unzip its contents. Place the resource folder at ${EgoNet_DIR}/resources\n\n\n## Environment\nYou need to create an environment that meets the following dependencies. \nThe versions included in the parenthesis are **tested**. 
Other versions may also work but are **not tested**.\n\n- Python (3.7.9)\n- Numpy (1.19.2)\n- PyTorch (1.6.0, GPU required)\n- Scipy (1.5.2)\n- Matplotlib (3.3.4)\n- OpenCV (3.4.2)\n- pyyaml (5.4.1)\n\nFor more details of my tested local environment, refer to [spec-list.txt](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/spec-list.txt). \nThe recommended environment manager is [Anaconda](https://www.anaconda.com/), which can create an environment using this provided spec-list. \nFor debugging using an IDE, I personally use and recommend Spyder 4.2 which you can get by\n```bash\nconda install spyder\n```\n"
  },
  {
    "path": "docs/spec-list.txt",
    "content": "# This file may be used to create an environment using:\n# $ conda create --name <env> --file <this file>\n# platform: linux-64\n@EXPLICIT\nhttps://repo.anaconda.com/pkgs/main/linux-64/_libgcc_mutex-0.1-main.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/blas-1.0-mkl.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/ca-certificates-2021.1.19-h06a4308_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/ld_impl_linux-64-2.33.1-h53a641e_7.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libgfortran-ng-7.3.0-hdf63c60_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libstdcxx-ng-9.1.0-hdf63c60_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pandoc-2.12-h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/cudatoolkit-9.2-0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libgcc-ng-9.1.0-hdf63c60_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/bzip2-1.0.8-h7b6447c_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/expat-2.2.10-he6710b0_2.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/freeglut-3.0.0-hf484d3e_5.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/graphite2-1.3.14-h23475e2_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/icu-58.2-he6710b0_3.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/jpeg-9b-h024ee3a_2.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libffi-3.3-he6710b0_2.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libglu-9.0.0-hf484d3e_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libopus-1.3.1-h7b6447c_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libsodium-1.0.18-h7b6447c_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libspatialindex-1.9.3-h2531618_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libuuid-1.0.3-h1bed415_2.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libvpx-1.7.0-h439df22_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libwebp-base-1.2.0-h27cfd23_0.tar.bz2\nhttps://repo.
anaconda.com/pkgs/main/linux-64/libxcb-1.14-h7b6447c_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/lz4-c-1.9.3-h2531618_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/ncurses-6.2-he6710b0_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/openssl-1.1.1k-h27cfd23_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pcre-8.44-he6710b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pixman-0.40.0-h7b6447c_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/xz-5.2.5-h7b6447c_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/yaml-0.2.5-h7b6447c_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/zlib-1.2.11-h7b6447c_3.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/glib-2.68.0-h36276a3_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/hdf5-1.10.2-hba1933b_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/jasper-2.0.14-h07fcdf6_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libpng-1.6.37-hbc83047_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libxml2-2.9.10-hb55368b_3.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/readline-8.1-h27cfd23_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/tk-8.6.10-hbc83047_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/zeromq-4.3.4-h2531618_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/zstd-1.4.5-h9ceee32_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/dbus-1.13.18-hb2f20db_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/freetype-2.10.4-h5ab3b9f_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/gstreamer-1.14.0-h28cd5cc_2.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libtiff-4.2.0-h85742a9_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/sqlite-3.35.2-hdfb4753_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/ffmpeg-4.0-hcdf2ecd_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/fontconfig-2.13.1-h6c09931_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/gst-plugins-base-1.
14.0-h8213a91_2.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/lcms2-2.11-h396b838_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/python-3.7.9-h7579374_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/alabaster-0.7.12-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/appdirs-1.4.4-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/argh-0.26.2-py37_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/atomicwrites-1.4.0-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/attrs-20.3.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/backcall-0.2.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/cairo-1.16.0-hf32fb01_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/certifi-2020.12.5-py37h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/chardet-4.0.0-py37h06a4308_1003.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/click-7.1.2-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/cloudpickle-1.6.0-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/colorama-0.4.4-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/decorator-4.4.2-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/defusedxml-0.7.1-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/diff-match-patch-20200713-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/docutils-0.16-py37_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/entrypoints-0.3-py37_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/future-0.18.2-py37_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/idna-2.10-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/imagesize-1.2.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/ipython_genutils-0.2.0-pyhd3eb1b0_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/jeepney-0.6.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/kiwisolver-1.3.1-py37h2531618_0.tar.b
z2\nhttps://repo.anaconda.com/pkgs/main/linux-64/lazy-object-proxy-1.6.0-py37h27cfd23_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/markupsafe-1.1.1-py37h14c3975_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/mccabe-0.6.1-py37_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/mistune-0.8.4-py37h14c3975_1001.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/mypy_extensions-0.4.3-py37_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/ninja-1.10.2-py37hff7bd54_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/olefile-0.46-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pandocfilters-1.4.3-py37h06a4308_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/parso-0.7.0-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pathspec-0.7.0-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pickleshare-0.7.5-pyhd3eb1b0_1003.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/psutil-5.8.0-py37h27cfd23_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/ptyprocess-0.7.0-pyhd3eb1b0_2.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pycodestyle-2.6.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pycparser-2.20-py_2.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pyflakes-2.2.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pyparsing-2.4.7-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pyrsistent-0.17.3-py37h7b6447c_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pysocks-1.7.1-py37_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pytz-2021.1-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pyxdg-0.27-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pyyaml-5.4.1-py37h27cfd23_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pyzmq-20.0.0-py37h2531618_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/qdarkstyle-2.8.1-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/qt-5.9.7-h5867ecd_1.tar.bz2\
nhttps://repo.anaconda.com/pkgs/main/noarch/qtpy-1.9.0-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/regex-2021.3.17-py37h27cfd23_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/rope-0.18.0-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/rtree-0.9.4-py37_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/sip-4.19.8-py37hf484d3e_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/six-1.15.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/snowballstemmer-2.1.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/sortedcontainers-2.3.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-applehelp-1.0.2-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-devhelp-1.0.2-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-htmlhelp-1.0.3-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-jsmath-1.0.1-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-qthelp-1.0.3-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/sphinxcontrib-serializinghtml-1.1.4-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/testpath-0.4.4-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/textdistance-4.2.1-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/toml-0.10.2-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/tornado-6.1-py37h27cfd23_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/typed-ast-1.4.2-py37h27cfd23_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/typing_extensions-3.7.4.3-pyha847dfd_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/ujson-4.0.2-py37h2531618_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/wcwidth-0.2.5-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/webencodings-0.5.1-py37_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/wheel-0.36.2-pyhd3eb1b0_0.tar.bz2
\nhttps://repo.anaconda.com/pkgs/main/linux-64/wrapt-1.12.1-py37h7b6447c_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/wurlitzer-2.0.1-py37_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/yapf-0.31.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/zipp-3.4.1-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/autopep8-1.5.6-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/babel-2.9.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/black-19.10b0-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/cffi-1.14.5-py37h261ae71_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/cycler-0.10.0-py37_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/harfbuzz-1.8.8-hffaf4a1_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/importlib-metadata-3.7.3-py37h06a4308_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/intervaltree-3.1.0-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/jedi-0.17.2-py37h06a4308_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/packaging-20.9-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pexpect-4.8.0-pyhd3eb1b0_3.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pillow-8.1.2-py37he98fc37_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/prompt-toolkit-3.0.17-pyh06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pydocstyle-6.0.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pyqt-5.9.2-py37h05f1152_2.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/python-dateutil-2.8.1-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/python-jsonrpc-server-0.4.0-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/qtawesome-1.0.2-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/setuptools-52.0.0-py37h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/three-merge-0.1.1-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/traitlets-5.0.5
-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/watchdog-1.0.2-py37h06a4308_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/astroid-2.5-py37h06a4308_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/bleach-3.3.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/brotlipy-0.7.0-py37h27cfd23_1003.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/cryptography-3.4.6-py37hd23ed53_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/flake8-3.9.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/importlib_metadata-3.7.3-hd3eb1b0_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/isort-5.8.0-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/jinja2-2.11.3-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/jupyter_core-4.7.1-py37h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/libopencv-3.4.2-hb342d67_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pip-21.0.1-py37h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pygments-2.8.1-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/ipython-7.21.0-py37hb070fc8_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/jsonschema-3.2.0-py_2.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/jupyter_client-6.1.7-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pluggy-0.13.1-py37h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/pylint-2.7.2-py37h06a4308_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pyopenssl-20.0.1-pyhd3eb1b0_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/secretstorage-3.3.1-py37h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/ipykernel-5.3.4-py37h5ca1d4c_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/keyring-22.3.0-py37h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/nbformat-5.1.2-pyhd3eb1b0_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/python-language-server-0.36.2-pyhd3eb1b0_0.t
ar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/urllib3-1.26.4-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/nbconvert-5.6.1-py37_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pyls-black-0.4.6-hd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/pyls-spyder-0.3.2-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/qtconsole-5.0.3-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/requests-2.25.1-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/spyder-kernels-1.10.2-py37h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/sphinx-3.5.3-pyhd3eb1b0_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/numpydoc-1.1.0-pyhd3eb1b0_1.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/spyder-4.2.4-py37h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/noarch/imageio-2.9.0-py_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/intel-openmp-2020.2-254.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/mkl-2020.2-256.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/mkl-service-2.3.0-py37he8ac12f_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/matplotlib-3.3.4-py37h06a4308_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/matplotlib-base-3.3.4-py37h62a2d02_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/mkl_fft-1.3.0-py37h54f3939_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/mkl_random-1.1.1-py37h0573a6f_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/numpy-1.19.2-py37h54aff64_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/numpy-base-1.19.2-py37hfa32c7d_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/py-opencv-3.4.2-py37hb342d67_1.tar.bz2\nhttps://conda.anaconda.org/pytorch/linux-64/pytorch-1.6.0-py3.7_cuda9.2.148_cudnn7.6.3_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/scipy-1.5.2-py37h0b6359f_0.tar.bz2\nhttps://repo.anaconda.com/pkgs/main/linux-64/opencv-3.4.2-py37h6fd60c2_1.tar.bz2\nhttps://conda
.anaconda.org/pytorch/linux-64/torchvision-0.7.0-py37_cu92.tar.bz2\n"
  },
  {
    "path": "docs/training.md",
    "content": "Firstly you need to prepare the dataset as described [here](https://github.com/Nicholasli1995/EgoNet/blob/master/docs/preparation.md).\n\nThen download a start point model [here](https://drive.google.com/file/d/1VFtMGgBG0cLGnbr3brrnPnJii2xGYj-9/view?usp=sharing) and place it at ${EgoNet_DIR}/resources. \n\nThe training phase consists of two stages which are described as follows. \n\nFor training on other datasets. You need to prepare the training images and camera parameters accordingly.\n\n## Stage 1: train a lifter (L.pth)\nYou need to modify the configuration by\n\n```bash\ncd ${EgoNet_DIR}/configs && vim KITTI_train_lifting.yml\n```\nEdit dataset:root to your KITTI directory.\n\n(Optional) Edit dirs:output to where you want to save the output model.\n\n(Optional) You can evaluate during training by setting eval_during to True.\n\nFinally, run\n\n```bash\n cd tools\n python train_lifting.py --cfg \"../configs/KITTI_train_lifting.yml\"\n```\n\n\n## Stage 2: train the remaining part (HC.pth)\nYou need to modify the configuration by\n\n```bash\ncd ${EgoNet_DIR}/configs && vim KITTI_train_IGRs.yml\n```\n\nEdit dataset:root to your KITTI directory.\n\nEdit gpu_id according to your local machine and set batch_size based on how much GPU memory you have. \n\n(Optional) Edit dirs:output to where you want to save the output model.\n\n(Optional) You can evaluate during training by setting eval_during to True.\n\n(Optional) Edit ss to enable self-supervised representation learning. You need to prepare unlabeled ApolloScape images and download record [here](https://drive.google.com/file/d/1uPdOC7LioomMF5DieUNrx3aZKsgobP5U/view?usp=sharing).\n\n(Optional) Edit training_settings:debug to disable saveing intermediate training results.\n\nFinally, run\n\n```bash\n cd tools\n python train_IGRs.py --cfg \"../configs/KITTI_train_IGRs.yml\"\n```\n"
  },
  {
    "path": "libs/arguments/__init__.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nEmpty file.\n\"\"\"\n\n\n"
  },
  {
    "path": "libs/arguments/parse.py",
    "content": "\"\"\"\nArgument parser for command line inputs and experiment configuration file.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport yaml\nimport argparse\n\ndef read_yaml_file(path):\n    \"\"\"\n    Read a .yml file.\n    \"\"\"\n    try: \n        with open (path, 'r') as file:\n            configs = yaml.safe_load(file)\n    except Exception as e:\n        print('Error reading the config file: ', e)\n    return configs\n\ndef parse_args():\n    \"\"\"\n    Read a .yml experiment configuration file whose path is provided by the user.\n    \n    You can add more arguments and modify configs accordingly.\n    \"\"\"\n    parser = argparse.ArgumentParser(description='a general parser')\n    # path to the configuration file\n    parser.add_argument('--cfg',\n                        help='experiment configuration file path',\n                        type=str\n                        )\n    parser.add_argument('--visualize',\n                        default=False,\n                        type=bool\n                        )    \n    parser.add_argument('--batch_to_show',\n                        default=1000000,\n                        type=int\n                        )    \n    args, unknown = parser.parse_known_args()\n    configs = read_yaml_file(args.cfg)   \n    configs['config_path'] = args.cfg\n    configs['visualize'] = args.visualize\n    configs['batch_to_show'] = args.batch_to_show\n    return configs"
  },
  {
    "path": "libs/common/__init__.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nEmpty file.\n\"\"\"\n\n\n"
  },
  {
    "path": "libs/common/format.py",
    "content": "\"\"\"\nMethods for formatted output.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\nimport os\n\nfrom copy import deepcopy\n\ndef format_str_submission(roll, pitch, yaw, x, y, z, score):\n    \"\"\"\n    Get a prediction string in ApolloScape style.\n    \"\"\"      \n    tempt_str = \"{pitch:.3f} {yaw:.3f} {roll:.3f} {x:.3f} {y:.3f} {z:.3f} {score:.3f}\".format(\n            pitch=pitch,\n            yaw=yaw,\n            roll=roll,\n            x=x,\n            y=y,\n            z=z,\n            score=score)\n    return tempt_str\n\ndef get_instance_str(dic):\n    \"\"\"\n    Produce KITTI style prediction string for one instance.\n    \"\"\"     \n    string = \"\"\n    string += dic['class'] + \" \"\n    string += \"{:.1f} \".format(dic['truncation'])\n    string += \"{:.1f} \".format(dic['occlusion'])\n    string += \"{:.6f} \".format(dic['alpha'])\n    string += \"{:.6f} {:.6f} {:.6f} {:.6f} \".format(dic['bbox'][0], dic['bbox'][1], dic['bbox'][2], dic['bbox'][3])\n    string += \"{:.6f} {:.6f} {:.6f} \".format(dic['dimensions'][1], dic['dimensions'][2], dic['dimensions'][0])\n    string += \"{:.6f} {:.6f} {:.6f} \".format(dic['locations'][0], dic['locations'][1], dic['locations'][2])\n    string += \"{:.6f} \".format(dic['rot_y'])\n    if 'score' in dic:\n        string += \"{:.8f} \".format(dic['score'])\n    else:\n        string += \"{:.8f} \".format(1.0)\n    return string\n\ndef get_pred_str(record):\n    \"\"\"\n    Produce KITTI style prediction string for a record dictionary.\n    \"\"\"      \n    # replace the rotation predictions of input bounding boxes\n    updated_txt = deepcopy(record['raw_txt_format'])\n    for instance_id in range(len(record['euler_angles'])):\n        updated_txt[instance_id]['rot_y'] = record['euler_angles'][instance_id, 1]\n        updated_txt[instance_id]['alpha'] = record['alphas'][instance_id]\n    pred_str = \"\"\n    angles = record['euler_angles']\n    for instance_id in 
range(len(angles)):\n        # format a string for submission\n        tempt_str = get_instance_str(updated_txt[instance_id])\n        if instance_id != len(angles) - 1:\n            tempt_str += '\\n'\n        pred_str += tempt_str\n    return pred_str\n\ndef save_txt_file(img_path, prediction, params):\n    \"\"\"\n    Save a txt file for predictions of an image.\n    \"\"\"    \n    if not params['flag']:\n        return\n    file_name = img_path.split('/')[-1][:-3] + 'txt'\n    save_path = os.path.join(params['save_dir'], file_name) \n    with open(save_path, 'w') as f:\n        f.write(prediction['pred_str'])\n    print('Wrote prediction file at {:s}'.format(save_path))\n    return"
  },
  {
    "path": "libs/common/img_proc.py",
    "content": "\"\"\"\nImage processing utilities.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport cv2\nimport numpy as np\nimport torch\nimport torch.nn.functional as F\nimport os\n\nSIZE = 200.0\n\ndef transform_preds(coords, center, scale, output_size):\n    \"\"\"\n    Transform local coordinates within a patch to screen coordinates.\n    \"\"\"      \n    target_coords = np.zeros(coords.shape)\n    trans = get_affine_transform(center, scale, 0, output_size, inv=1)\n    for p in range(coords.shape[0]):\n        target_coords[p, 0:2] = affine_transform(coords[p, 0:2], trans)\n    return target_coords\n\ndef get_affine_transform(center, \n                         scale, \n                         rot, \n                         output_size,\n                         shift=np.array([0, 0], dtype=np.float32), \n                         inv=0\n                         ):\n    \"\"\"\n    Estimate an affine transformation given crop parameters (center, scale and\n    rotation) and output resolution.                                                        
\n    \"\"\"  \n    if isinstance(scale, list):\n        scale = np.array(scale)\n    if isinstance(center, list):\n        center = np.array(center)\n    scale_tmp = scale * SIZE\n    src_w = scale_tmp[0]\n    dst_h, dst_w = output_size\n\n    rot_rad = np.pi * rot / 180\n    src_dir = get_dir([0, src_w * -0.5], rot_rad)\n    dst_dir = np.array([0, dst_w * -0.5], np.float32)\n\n    src = np.zeros((3, 2), dtype=np.float32)\n    dst = np.zeros((3, 2), dtype=np.float32)\n    src[0, :] = center + scale_tmp * shift\n    src[1, :] = center + src_dir + scale_tmp * shift\n    dst[0, :] = [dst_w * 0.5, dst_h * 0.5]\n    dst[1, :] = np.array([dst_w * 0.5, dst_h * 0.5]) + dst_dir\n\n    src[2:, :] = get_3rd_point(src[0, :], src[1, :])\n    dst[2:, :] = get_3rd_point(dst[0, :], dst[1, :])\n\n    if inv:\n        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))\n    else:\n        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))\n\n    return trans\n\ndef affine_transform(pt, t):\n    new_pt = np.array([pt[0], pt[1], 1.]).T\n    new_pt = np.dot(t, new_pt)\n    return new_pt[:2]\n\ndef affine_transform_modified(pts, t):\n    \"\"\"\n    Apply affine transformation with homogeneous coordinates.                                                    \n    \"\"\" \n    # pts of shape [n, 2]\n    new_pts = np.hstack([pts, np.ones((len(pts), 1))]).T\n    new_pts = t @ new_pts\n    return new_pts[:2, :].T\n\ndef get_3rd_point(a, b):\n    direct = a - b\n    return b + np.array([-direct[1], direct[0]], dtype=np.float32)\n\ndef get_dir(src_point, rot_rad):\n    sn, cs = np.sin(rot_rad), np.cos(rot_rad)\n\n    src_result = [0, 0]\n    src_result[0] = src_point[0] * cs - src_point[1] * sn\n    src_result[1] = src_point[0] * sn + src_point[1] * cs\n\n    return src_result\n\ndef crop(img, center, scale, output_size, rot=0):\n    \"\"\"\n    A cropping function implemented as warping.                                                      
\n    \"\"\"     \n    trans = get_affine_transform(center, scale, rot, output_size)\n\n    dst_img = cv2.warpAffine(img, \n                             trans, \n                             (int(output_size[0]), int(output_size[1])),\n                             flags=cv2.INTER_LINEAR\n                             )   \n\n    return dst_img\n\ndef simple_crop(input_image, center, crop_size):\n    \"\"\"\n    A simple cropping function without warping.\n    \"\"\"  \n    assert len(input_image.shape) == 3, 'Unsupported image format.'\n    channel = input_image.shape[2]\n    # crop a rectangular region around the center in the image\n    start_x = int(center[0] - crop_size[0])\n    end_x = int(center[0] + crop_size[0]) \n    start_y = int(center[1] - crop_size[1])\n    end_y = int(center[1] + crop_size[1])\n    cropped = np.zeros((end_y - start_y, end_x - start_x, channel), \n                       dtype = input_image.dtype)\n    # new bounding box index \n    new_start_x = max(-start_x, 0)\n    new_end_x = min(input_image.shape[1], end_x) - start_x\n    new_start_y = max(-start_y, 0)\n    new_end_y = min(input_image.shape[0], end_y) - start_y\n    # clamped old bounding box index\n    old_start_x = max(start_x, 0)\n    old_end_x = min(end_x, input_image.shape[1])\n    old_start_y = max(start_y, 0)\n    old_end_y = min(end_y, input_image.shape[0])\n    try:\n        cropped[new_start_y:new_end_y, new_start_x:new_end_x,:] = input_image[\n            old_start_y:old_end_y, old_start_x:old_end_x,:]\n    except ValueError:\n        print('Error: cropping fails')\n    return cropped\n\ndef np_random():\n    \"\"\"\n    Return a random number sampled uniformly from [-1, 1]\n    \"\"\"\n    return np.random.rand()*2 - 1\n\ndef jitter_bbox_with_kpts(old_bbox, joints, parameters):\n    \"\"\"\n    Randomly shifting and resizeing a bounding box and mask out occluded joints.\n    Used as data augmentation to improve robustness to detector noise.\n    \n    bbox: [x1, y1, x2, 
y2]\n    joints: [N, 3]\n    \"\"\"\n    new_joints = joints.copy()\n    width, height = old_bbox[2] - old_bbox[0], old_bbox[3] - old_bbox[1]\n    old_center = [0.5*(old_bbox[0] + old_bbox[2]), \n                  0.5*(old_bbox[1] + old_bbox[3])]\n    horizontal_shift = parameters['shift'][0]*width*np_random()\n    vertical_shift = parameters['shift'][1]*height*np_random()\n    new_center = [old_center[0] + horizontal_shift,\n                  old_center[1] + vertical_shift]\n    horizontal_scaling = parameters['scaling'][0]*np_random() + 1\n    vertical_scaling = parameters['scaling'][1]*np_random() + 1\n    new_width = width*horizontal_scaling\n    new_height = height*vertical_scaling\n    new_bbox = [new_center[0] - 0.5*new_width, new_center[1] - 0.5*new_height,\n                new_center[0] + 0.5*new_width, new_center[1] + 0.5*new_height]\n    # predicate from upper left corner\n    predicate1 = joints[:, :2] - np.array([[new_bbox[0], new_bbox[1]]])\n    predicate1 = (predicate1 > 0.).prod(axis=1)\n    # predicate from lower right corner\n    predicate2 = joints[:, :2] - np.array([[new_bbox[2], new_bbox[3]]])\n    predicate2 = (predicate2 < 0.).prod(axis=1)\n    new_joints[:, 2] *= predicate1*predicate2\n    return new_bbox, new_joints\n\ndef jitter_bbox_with_kpts_no_occlu(old_bbox, joints, parameters):\n    \"\"\"\n    Similar to the function above, but does not produce occluded joints\n    \"\"\"\n    width, height = old_bbox[2] - old_bbox[0], old_bbox[3] - old_bbox[1]\n    old_center = [0.5 * (old_bbox[0] + old_bbox[2]), \n                  0.5 * (old_bbox[1] + old_bbox[3])]\n    horizontal_scaling = parameters['scaling'][0] * np.random.rand() + 1\n    vertical_scaling = parameters['scaling'][1] * np.random.rand() + 1\n    horizontal_shift = 0.5 * (horizontal_scaling - 1) * width * np_random()\n    vertical_shift = 0.5 * (vertical_scaling - 1) * height * np_random()\n    new_center = [old_center[0] + horizontal_shift,\n                  old_center[1] + 
vertical_shift]\n    new_width = width * horizontal_scaling\n    new_height = height * vertical_scaling\n    new_bbox = [new_center[0] - 0.5 * new_width, new_center[1] - 0.5 * new_height,\n                new_center[0] + 0.5 * new_width, new_center[1] + 0.5 * new_height]\n    return new_bbox, joints\n\ndef generate_xy_map(bbox, resolution, global_size):\n    \"\"\"\n    Generate the normalized coordinates as 2D maps that encode location \n    information.\n    \n    bbox: [x1, y1, x2, y2] the local region\n    resolution (height, width): target resolution\n    global_size (height, width): the size of original image\n    \"\"\"\n    # resolution is (height, width), matching parameters['input_size']\n    map_height, map_width = resolution\n    g_height, g_width = global_size\n    x_start, x_end = 2*bbox[0]/g_width - 1, 2*bbox[2]/g_width - 1\n    y_start, y_end = 2*bbox[1]/g_height - 1, 2*bbox[3]/g_height - 1\n    x_map = np.tile(np.linspace(x_start, x_end, map_width), (map_height, 1))\n    x_map = x_map.reshape(map_height, map_width, 1)\n    y_map = np.linspace(y_start, y_end, map_height).reshape(map_height, 1)\n    y_map = np.tile(y_map, (1, map_width))\n    y_map = y_map.reshape(map_height, map_width, 1)\n    return np.concatenate([x_map, y_map], axis=2)\n\ndef crop_single_instance(data_numpy, bbox, joints, parameters, pth_trans=None):\n    \"\"\"\n    Crop an instance from an image given the bounding box and part coordinates.\n    \"\"\"\n    reso = parameters['input_size'] # (height, width)\n    transformed_joints = joints.copy()\n    if parameters['jitter_bbox']:\n        bbox, joints = jitter_bbox_with_kpts_no_occlu(bbox, \n                                                      joints,\n                                                      parameters['jitter_params']\n                                                      )\n    joints_vis = joints[:, 2]\n    if parameters['resize']:\n        ret = resize_bbox(bbox[0], bbox[1], bbox[2], bbox[3], \n                          target_ar=reso[0]/reso[1])\n        c, s = ret['c'], 
ret['s']\n    else:\n        c, s = bbox2cs(bbox)    \n    trans = get_affine_transform(c, s, 0.0, reso)\n    input = cv2.warpAffine(data_numpy,\n                           trans,\n                           (int(reso[1]), int(reso[0])),\n                           flags=cv2.INTER_LINEAR\n                           )\n    # add two more channels to encode object location\n    if parameters['add_xy']:\n        xymap = generate_xy_map(ret['bbox'], reso, parameters['global_size'])\n        input = np.concatenate([input, xymap.astype(np.float32)], axis=2)\n    #cv2.imwrite('test.jpg', input)\n    #input = torch.from_numpy(input.transpose(2,0,1))\n    input = input if pth_trans is None else pth_trans(input)\n    for i in range(len(joints)):\n        if joints_vis[i] > 0.0:\n            transformed_joints[i, 0:2] = affine_transform(joints[i, 0:2], trans)   \n    c = c.reshape(1, 2)\n    s = s.reshape(1, 2)\n    return input.unsqueeze(0), transformed_joints, c, s\n\ndef get_tensor_from_img(path, \n                        parameters,\n                        sf=0.2, \n                        rf=30., \n                        r_prob=0.6, \n                        aug=False, \n                        rgb=True, \n                        joints=None,\n                        global_box=None,\n                        pth_trans=None,\n                        generate_hm=False,\n                        max_cnt=None\n                        ):\n    \"\"\"\n    Read image and apply data augmentation to obtain a tensor. 
\n    Keypoints are also transformed if given.\n    \n    path: image path\n    c: cropping center\n    s: cropping scale\n    r: rotation\n    reso: resolution of output image\n    sf: scaling factor\n    rf: rotation factor\n    aug: apply data augmentation\n    joints: key-point locations with optional visibility [N_instance, N_joint, 3]\n    generate_hm: whether to generate heatmap based on joint locations\n    \"\"\"\n#    data_numpy = cv2.imread(\n#        path, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION\n#        )\n    data_numpy = cv2.imread(\n        path, 1 | 128\n        )    \n    if data_numpy is None:\n        raise ValueError('Fail to read {}'.format(path))    \n    if rgb:\n        data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)\n    all_inputs = []\n    all_target = []\n    all_centers = []\n    all_scales = []\n    all_target_weight = []\n    # the dimension of the image\n    parameters['global_size'] = data_numpy.shape[:-1]\n    all_transformed_joints = []\n    if parameters['reference'] == 'bbox':\n        # crop around the given bounding boxes\n        # bbox = [0, 0, data_numpy.shape[1] - 1, data_numpy.shape[0] - 1] \\\n        #     if 'bbox' not in parameters else parameters['bbox']\n        bboxes = parameters['boxes'] # [N_instance, 4]\n        for idx, bbox in enumerate(bboxes):\n            input, transformed_joints, c, s = crop_single_instance(data_numpy,\n                                                                   bbox,\n                                                                   joints[idx],\n                                                                   parameters,\n                                                                   pth_trans\n                                                                   )\n            all_inputs.append(input)\n            all_centers.append(c)\n            all_scales.append(s)\n        # s = s * np.clip(np.random.randn() * sf + 1, 1 - sf, 1 + sf)\n        # r = 
np.clip(np.random.randn() * rf, -rf, rf) if np.random.rand() <= r_prob else 0\n            target = target_weight = 1.\n            if generate_hm:\n                target, target_weight = generate_target(transformed_joints, \n                                                        transformed_joints[:,2], \n                                                        parameters)\n                target = torch.unsqueeze(torch.from_numpy(target), 0)\n                target_weight = torch.unsqueeze(torch.from_numpy(target_weight), 0)\n            all_target.append(target)\n            all_target_weight.append(target_weight)\n            all_transformed_joints.append(np.expand_dims(transformed_joints,0))\n    all_transformed_joints = np.concatenate(all_transformed_joints)\n    if max_cnt is not None and max_cnt < len(all_inputs):\n        end = max_cnt\n    else:\n        end = len(all_inputs)\n    end_indices = list(range(end))\n    meta = {\n        'path': path,\n        'original_joints': joints[end_indices],\n        'transformed_joints': all_transformed_joints[end_indices],\n        'center': np.vstack(all_centers[:end]),\n        'scale': np.vstack(all_scales[:end]),\n        'joints_vis': all_transformed_joints[end_indices][:,:,2]\n        # 'rotation': r,\n    }\n    inputs = torch.cat(all_inputs[:end], dim=0)\n    if generate_hm:\n        targets = torch.cat(all_target[:end], dim=0)\n        target_weights = torch.cat(all_target_weight[:end], dim=0)\n    else:\n        targets, target_weights = None, None\n    return inputs, targets, target_weights, meta\n\ndef generate_target(joints, joints_vis, parameters):\n    \"\"\"\n    Generate heatmap targets by drawing Gaussian dots.\n    \n    joints:  [num_joints, 3]\n    joints_vis: [num_joints]\n    \n    return: target, target_weight (1: visible, 0: invisible)\n    \"\"\"\n    num_joints = parameters['num_joints']\n    target_type = parameters['target_type']\n    input_size = parameters['input_size']\n    
heatmap_size = parameters['heatmap_size']\n    sigma = parameters['sigma']\n    target_weight = np.ones((num_joints, 1), dtype=np.float32)\n    target_weight[:, 0] = joints_vis\n\n    \n    assert target_type == 'gaussian', 'Only support gaussian map now!'\n\n    if target_type == 'gaussian':\n        target = np.zeros((num_joints, heatmap_size[0], heatmap_size[1]), \n                          dtype=np.float32)\n\n        tmp_size = sigma * 3\n\n        for joint_id in range(num_joints):\n            if target_weight[joint_id] <= 0.5:\n                continue\n            feat_stride = input_size / heatmap_size\n            mu_x = int(joints[joint_id][0] / feat_stride[0] + 0.5)\n            mu_y = int(joints[joint_id][1] / feat_stride[1] + 0.5)\n            # Check that any part of the gaussian is in-bounds\n            ul = [int(mu_x - tmp_size), int(mu_y - tmp_size)]\n            br = [int(mu_x + tmp_size + 1), int(mu_y + tmp_size + 1)]\n            if ul[0] >= heatmap_size[1] or ul[1] >= heatmap_size[0] \\\n                    or br[0] < 0 or br[1] < 0:\n                # If not, just return the image as is\n                target_weight[joint_id] = 0\n                continue\n\n            # # Generate gaussian\n            size = 2 * tmp_size + 1\n            x = np.arange(0, size, 1, np.float32)\n            y = x[:, np.newaxis]\n            x0 = y0 = size // 2\n            # The gaussian is not normalized, we want the center value to equal 1\n            g = np.exp(- ((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))\n\n            # Usable gaussian range\n            g_x = max(0, -ul[0]), min(br[0], heatmap_size[1]) - ul[0]\n            g_y = max(0, -ul[1]), min(br[1], heatmap_size[0]) - ul[1]\n            # Image range\n            img_x = max(0, ul[0]), min(br[0], heatmap_size[1])\n            img_y = max(0, ul[1]), min(br[1], heatmap_size[0])\n\n            target[joint_id][img_y[0]:img_y[1], img_x[0]:img_x[1]] = \\\n                g[g_y[0]:g_y[1], 
g_x[0]:g_x[1]]\n\n    if parameters['use_different_joints_weight']:\n        target_weight = np.multiply(target_weight, parameters['joints_weight'])\n\n    return target, target_weight\n\ndef resize_bbox(left, top, right, bottom, target_ar=1.):\n    \"\"\"\n    Resize a bounding box to pre-defined aspect ratio.\n    \"\"\" \n    width = right - left\n    height = bottom - top\n    aspect_ratio = height/width\n    center_x = (left + right)/2\n    center_y = (top + bottom)/2\n    if aspect_ratio > target_ar:\n        new_width = height*(1/target_ar)\n        new_left = center_x - 0.5*new_width\n        new_right = center_x + 0.5*new_width\n        new_top = top\n        new_bottom = bottom        \n    else:\n        new_height = width*target_ar\n        new_left = left\n        new_right = right\n        new_top = center_y - 0.5*new_height\n        new_bottom = center_y + 0.5*new_height\n    return {'bbox': [new_left, new_top, new_right, new_bottom],\n            'c': np.array([center_x, center_y]),\n            's': np.array([(new_right - new_left)/SIZE, (new_bottom - new_top)/SIZE])\n            }\n\ndef enlarge_bbox(left, top, right, bottom, enlarge):\n    \"\"\"\n    Enlarge a bounding box.\n    \"\"\" \n    width = right - left\n    height = bottom - top\n    new_width = width * enlarge[0]\n    new_height = height * enlarge[1]\n    center_x = (left + right) / 2\n    center_y = (top + bottom) / 2    \n    new_left = center_x - 0.5 * new_width\n    new_right = center_x + 0.5 * new_width\n    new_top = center_y - 0.5 * new_height\n    new_bottom = center_y + 0.5 * new_height\n    return [new_left, new_top, new_right, new_bottom]\n\ndef modify_bbox(bbox, target_ar, enlarge=1.1):\n    \"\"\"\n    Modify a bounding box by enlarging/resizing.\n    \"\"\"\n    lbbox = enlarge_bbox(bbox[0], bbox[1], bbox[2], bbox[3], [enlarge, enlarge])\n    ret = resize_bbox(lbbox[0], lbbox[1], lbbox[2], lbbox[3], target_ar=target_ar)\n    return ret\n    \ndef resize_crop(crop_size, 
target_ar=None):\n    \"\"\"\n    Resize a crop size to a pre-defined aspect ratio.\n    \"\"\"    \n    if target_ar is None:\n        return crop_size\n    width = crop_size[0]\n    height = crop_size[1]\n    aspect_ratio = height / width    \n    if aspect_ratio > target_ar:\n        new_width = height * (1 / target_ar)\n        new_height = height\n    else:\n        new_height = width*target_ar\n        new_width = width\n    return [new_width, new_height]\n\ndef bbox2cs(bbox):\n    \"\"\"\n    Convert bounding box annotation [x1, y1, x2, y2] to center and scale.\n    \"\"\"  \n    # center is the box midpoint; scale is the box extent divided by SIZE\n    return [(bbox[0] + bbox[2])/2, (bbox[1] + bbox[3])/2], \\\n        [(bbox[2] - bbox[0])/SIZE, (bbox[3] - bbox[1])/SIZE]\n\ndef cs2bbox(center, size):\n    \"\"\"\n    Convert center/scale to a bounding box annotation.\n    \"\"\"  \n    x1 = center[0] - size[0]\n    y1 = center[1] - size[1]\n    x2 = center[0] + size[0]\n    y2 = center[1] + size[1]\n    return [x1, y1, x2, y2]\n\ndef kpts2cs(keypoints, \n            enlarge=1.1, \n            method='boundary', \n            target_ar=None, \n            use_visibility=True\n            ):\n    \"\"\"\n    Convert instance screen coordinates to cropping center and size\n    \n    keypoints of shape [n_joints, 2/3]\n    \"\"\"   \n    assert keypoints.shape[1] in [2, 3], 'Unsupported input.'\n    if keypoints.shape[1] == 2:\n        visible_keypoints = keypoints\n        vis_rate = 1.0\n    elif keypoints.shape[1] == 3 and use_visibility:\n        visible_indices = keypoints[:, 2].nonzero()[0]\n        visible_keypoints = keypoints[visible_indices, :2]\n        vis_rate = len(visible_keypoints)/len(keypoints)\n    else:\n        visible_keypoints = keypoints[:, :2]\n        visible_indices = np.array(range(len(keypoints)))\n        vis_rate = 1.0\n    if method == 'centroid':\n        center = np.ceil(visible_keypoints.mean(axis=0, keepdims=True))\n        dif = np.abs(visible_keypoints - center).max(axis=0, keepdims=True)\n        crop_size = 
np.ceil(dif*enlarge).squeeze()\n        center = center.squeeze()\n    elif method == 'boundary':\n        left_top = visible_keypoints.min(axis=0, keepdims=True)\n        right_bottom = visible_keypoints.max(axis=0, keepdims=True)\n        center = ((left_top + right_bottom) / 2).squeeze()\n        crop_size = ((right_bottom - left_top)*enlarge/2).squeeze()\n    else:\n        raise NotImplementedError\n    # resize the bounding box to a specified aspect ratio\n    crop_size = resize_crop(crop_size, target_ar)\n    x1, y1, x2, y2 = cs2bbox(center, crop_size)\n\n    new_origin = np.array([[x1, y1]], dtype=keypoints.dtype)\n    new_keypoints = keypoints.copy()\n    if keypoints.shape[1] == 2:\n        new_keypoints = visible_keypoints - new_origin\n    elif keypoints.shape[1] == 3: \n        new_keypoints[visible_indices, :2] = visible_keypoints - new_origin\n    return center, crop_size, new_keypoints, vis_rate\n\ndef draw_bboxes(img_path, bboxes_dict, save_path=None):\n    \"\"\"\n    Draw bounding boxes with OpenCV.\n    \"\"\"\n    data_numpy = cv2.imread(img_path, 1 | 128)  \n    for name, (color, bboxes) in bboxes_dict.items():\n        for bbox in bboxes:\n            start_point = (bbox[0], bbox[1])\n            end_point = (bbox[2], bbox[3])\n            cv2.rectangle(data_numpy, start_point, end_point, color, 2)\n    if save_path is not None:\n        cv2.imwrite(save_path, data_numpy)\n    return data_numpy\n\ndef imread_rgb(img_path):\n    \"\"\"\n    Read image with OpenCV.\n    \"\"\"    \n    data_numpy = cv2.imread(img_path, 1 | 128)  \n    data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)\n    return data_numpy\n\ndef save_cropped_patches(img_path, \n                         keypoints, \n                         save_dir=\"./\", \n                         threshold=0.25,\n                         enlarge=1.4, \n                         target_ar=None\n                         ):\n    \"\"\"\n    Crop instances from a image given part screen 
coordinates and save them.\n    \"\"\"\n#    data_numpy = cv2.imread(\n#        img_path, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION\n#        )   \n    data_numpy = cv2.imread(img_path, 1 | 128)  \n    # data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)\n    # debug\n    # import matplotlib.pyplot as plt\n    # plt.imshow(data_numpy[:,:,::-1])\n    # plt.plot(keypoints[0][:,0], keypoints[0][:,1], 'ro')\n    # plt.pause(0.1)\n    if not os.path.exists(save_dir):\n        os.makedirs(save_dir)\n    new_paths = []\n    all_new_keypoints = []\n    all_bbox = []\n    for i in range(len(keypoints)):\n        center, crop_size, new_keypoints, vis_rate = kpts2cs(keypoints[i], \n                                                             enlarge, \n                                                             target_ar=target_ar)\n        all_bbox.append(list(map(int, cs2bbox(center, crop_size))))\n        if vis_rate < threshold:\n            continue\n        all_new_keypoints.append(new_keypoints.reshape(1, keypoints.shape[1], -1))\n        cropped = simple_crop(data_numpy, center, crop_size)\n        save_path = os.path.join(save_dir, \"instance_{:d}.jpg\".format(i))\n        new_paths.append(save_path)\n        cv2.imwrite(save_path, cropped)\n        del cropped\n    if len(new_paths) == 0:\n        # No instances cropped\n        return new_paths, np.zeros((0, keypoints.shape[1], 3)), all_bbox\n    else:\n        return new_paths, np.concatenate(all_new_keypoints, axis=0), all_bbox\n    \ndef get_max_preds(batch_heatmaps):\n    \"\"\"\n    Get predictions from heatmaps with hard arg-max.\n    \n    batch_heatmaps: numpy.ndarray([batch_size, num_joints, height, width])\n    \"\"\"\n    assert isinstance(batch_heatmaps, np.ndarray), \\\n        'batch_heatmaps should be numpy.ndarray'\n    assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim'\n\n    batch_size = batch_heatmaps.shape[0]\n    num_joints = batch_heatmaps.shape[1]\n    width = 
batch_heatmaps.shape[3]\n    heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1))\n    idx = np.argmax(heatmaps_reshaped, 2)\n    maxvals = np.amax(heatmaps_reshaped, 2)\n\n    maxvals = maxvals.reshape((batch_size, num_joints, 1))\n    idx = idx.reshape((batch_size, num_joints, 1))\n\n    preds = np.tile(idx, (1, 1, 2)).astype(np.float32)\n\n    preds[:, :, 0] = (preds[:, :, 0]) % width\n    preds[:, :, 1] = np.floor((preds[:, :, 1]) / width)\n\n    pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))\n    pred_mask = pred_mask.astype(np.float32)\n\n    preds *= pred_mask\n    return preds, maxvals\n\ndef soft_arg_max_np(batch_heatmaps):\n    \"\"\"\n    Soft-argmax instead of hard-argmax considering quantization errors.\n    \"\"\"\n    assert isinstance(batch_heatmaps, np.ndarray), \\\n        'batch_heatmaps should be numpy.ndarray'\n    assert batch_heatmaps.ndim == 4, 'batch_images should be 4-ndim'\n    batch_size = batch_heatmaps.shape[0]\n    num_joints = batch_heatmaps.shape[1]\n    height = batch_heatmaps.shape[2]\n    width = batch_heatmaps.shape[3]\n    # get score/confidence for each joint\n    heatmaps_reshaped = batch_heatmaps.reshape((batch_size, num_joints, -1))\n    maxvals = np.amax(heatmaps_reshaped, 2)\n    maxvals = maxvals.reshape((batch_size, num_joints, 1))    \n    # normalize the heatmaps so that they sum to 1\n    #assert batch_heatmaps.min() >= 0.0\n    batch_heatmaps = np.clip(batch_heatmaps, a_min=0.0, a_max=None)\n    temp_sum = heatmaps_reshaped.sum(axis = 2, keepdims=True)\n    heatmaps_reshaped /= temp_sum\n    ## another normalization method: softmax\n    # spatial soft-max\n    #heatmaps_reshaped = softmax(heatmaps_reshaped, axis=2)\n    ##\n    batch_heatmaps = heatmaps_reshaped.reshape(batch_size, num_joints, height, width)\n    x = batch_heatmaps.sum(axis = 2)\n    y = batch_heatmaps.sum(axis = 3)\n    x_indices = np.arange(width).astype(np.float32).reshape(1,1,width)\n    y_indices = 
np.arange(height).astype(np.float32).reshape(1,1,height)\n    x *= x_indices\n    y *= y_indices\n    x = x.sum(axis = 2, keepdims=True)\n    y = y.sum(axis = 2, keepdims=True)\n    preds = np.concatenate([x, y], axis=2)\n    pred_mask = np.tile(np.greater(maxvals, 0.0), (1, 1, 2))\n    pred_mask = pred_mask.astype(np.float32)\n    preds *= pred_mask\n    return preds, maxvals\n\ndef soft_arg_max(batch_heatmaps):\n    \"\"\"\n    A pytorch version of soft-argmax\n    \"\"\"\n    assert len(batch_heatmaps.shape) == 4, 'batch_images should be 4-ndim'\n    batch_size = batch_heatmaps.shape[0]\n    num_joints = batch_heatmaps.shape[1]\n    height = batch_heatmaps.shape[2]\n    width = batch_heatmaps.shape[3]\n    heatmaps_reshaped = batch_heatmaps.view((batch_size, num_joints, -1))\n    # get score/confidence for each joint    \n    maxvals = heatmaps_reshaped.max(dim=2)[0]\n    maxvals = maxvals.view((batch_size, num_joints, 1))       \n    # normalize the heatmaps so that they sum to 1\n    heatmaps_reshaped = F.softmax(heatmaps_reshaped, dim=2)\n    batch_heatmaps = heatmaps_reshaped.view(batch_size, num_joints, height, width)\n    x = batch_heatmaps.sum(dim = 2)\n    y = batch_heatmaps.sum(dim = 3)\n    x_indices = torch.arange(width).type(torch.cuda.FloatTensor)\n    x_indices = torch.cuda.comm.broadcast(x_indices, devices=[x.device.index])[0]\n    x_indices = x_indices.view(1, 1, width)\n    y_indices = torch.arange(height).type(torch.cuda.FloatTensor)\n    y_indices = torch.cuda.comm.broadcast(y_indices, devices=[y.device.index])[0]\n    y_indices = y_indices.view(1, 1, height)    \n    x *= x_indices\n    y *= y_indices\n    x = x.sum(dim = 2, keepdim=True)\n    y = y.sum(dim = 2, keepdim=True)\n    preds = torch.cat([x, y], dim=2)\n    return preds, maxvals\n\ndef appro_cr(coordinates):\n    \"\"\"\n    Approximate the square of cross-ratio along four ordered 2D points using \n    inner-product\n    \n    coordinates: PyTorch tensor of shape [4, 2]\n    
\"\"\"\n    AC = coordinates[2] - coordinates[0]\n    BD = coordinates[3] - coordinates[1]\n    BC = coordinates[2] - coordinates[1]\n    AD = coordinates[3] - coordinates[0]\n    return (AC.dot(AC) * BD.dot(BD)) / (BC.dot(BC) * AD.dot(AD))\n\ndef to_npy(tensor):\n    \"\"\"\n    Convert PyTorch tensor to numpy array.\n    \"\"\"\n    if isinstance(tensor, np.ndarray):\n        return tensor\n    else:\n        return tensor.data.cpu().numpy()"
  },
  {
    "path": "libs/common/transformation.py",
    "content": "\"\"\"\r\nCoordinate transformation functions.\r\n\r\nAuthor: Shichao Li\r\nContact: nicholas.li@connect.ust.hk\r\n\"\"\"\r\n\r\nimport numpy as np\r\nimport cv2\r\n\r\ndef move_to(points, xyz=np.zeros((1,3))):\r\n    # points of shape [n_points, 3]\r\n    centroid = points.mean(axis=0, keepdims=True)\r\n    return points - (centroid - xyz)\r\n\r\ndef world_to_camera_frame(P, R, T):\r\n    \"\"\"\r\n    Convert points from world to camera coordinates\r\n    \r\n    P: Nx3 3d points in world coordinates\r\n    R: 3x3 Camera rotation matrix\r\n    T: 3x1 Camera translation parameters\r\n    \r\n    Returns\r\n    X_cam: Nx3 3d points in camera coordinates\r\n    \"\"\"\r\n    assert len(P.shape) == 2\r\n    assert P.shape[1] == 3\r\n    X_cam = R.dot( P.T - T ) # rotate and translate\r\n    return X_cam.T\r\n\r\ndef camera_to_world_frame(P, R, T):\r\n    \"\"\"\r\n    Inverse of world_to_camera_frame\r\n\r\n    P: Nx3 points in camera coordinates\r\n    R: 3x3 Camera rotation matrix\r\n    T: 3x1 Camera translation parameters\r\n    \r\n    Returns\r\n    X_cam: Nx3 points in world coordinates\r\n    \"\"\"\r\n    assert len(P.shape) == 2\r\n    assert P.shape[1] == 3\r\n    X_cam = R.T.dot( P.T ) + T # rotate and translate\r\n    return X_cam.T\r\n\r\ndef compute_similarity_transform(X, Y, compute_optimal_scale=False):\r\n    \"\"\"\r\n    A port of MATLAB's `procrustes` function to Numpy.\r\n    Adapted from http://stackoverflow.com/a/18927641/1884420\r\n    \r\n    Args\r\n      X: array NxM of targets, with N number of points and M point dimensionality\r\n      Y: array NxM of inputs\r\n      compute_optimal_scale: whether we compute optimal scale or force it to be 1\r\n    \r\n    Returns:\r\n      d: squared error after transformation\r\n      Z: transformed Y\r\n      T: computed rotation\r\n      b: scaling\r\n      c: translation\r\n    \"\"\"\r\n    muX = X.mean(0)\r\n    muY = Y.mean(0)\r\n    X0 = X - muX\r\n    Y0 = Y - muY\r\n    ssX = 
(X0**2.).sum()\r\n    ssY = (Y0**2.).sum()\r\n    # centred Frobenius norm\r\n    normX = np.sqrt(ssX)\r\n    normY = np.sqrt(ssY)\r\n    # scale to equal (unit) norm\r\n    X0 = X0 / normX\r\n    Y0 = Y0 / normY\r\n    # optimum rotation matrix of Y\r\n    A = np.dot(X0.T, Y0)\r\n    U,s,Vt = np.linalg.svd(A,full_matrices=False)\r\n    V = Vt.T\r\n    T = np.dot(V, U.T)\r\n    # Make sure we have a rotation\r\n    detT = np.linalg.det(T)\r\n    V[:,-1] *= np.sign( detT )\r\n    s[-1]   *= np.sign( detT )\r\n    T = np.dot(V, U.T)\r\n    traceTA = s.sum()\r\n    if compute_optimal_scale:  # Compute optimum scaling of Y.\r\n        b = traceTA * normX / normY\r\n        d = 1 - traceTA**2\r\n        Z = normX*traceTA*np.dot(Y0, T) + muX\r\n    else:  # If no scaling allowed\r\n        b = 1\r\n        d = 1 + ssY/ssX - 2 * traceTA * normY / normX\r\n        Z = normY*np.dot(Y0, T) + muX\r\n    c = muX - b*np.dot(muY, T)\r\n    return d, Z, T, b, c\r\n\r\ndef compute_rigid_transform(X, Y, W=None, verbose=False):\r\n    \"\"\"\r\n    A least-squares estimate of rigid transformation by SVD.\r\n    \r\n    Reference: https://content.sakai.rutgers.edu/access/content/group/\r\n    7bee3f05-9013-4fc2-8743-3c5078742791/material/svd_ls_rotation.pdf\r\n    \r\n    X, Y: [d, N] N data points of dimension d\r\n    W: [N, ] optional weight (importance) matrix for N data points\r\n    \"\"\"    \r\n    assert len(X) == len(Y)\r\n    assert (W is None) or (len(W.shape) in [1, 2])\r\n    # find mean column wise\r\n    centroid_X = np.mean(X, axis=1, keepdims=True)\r\n    centroid_Y = np.mean(Y, axis=1, keepdims=True)\r\n    # subtract mean\r\n    Xm = X - centroid_X\r\n    Ym = Y - centroid_Y\r\n    if W is None:\r\n        H = Xm @ Ym.T\r\n    else:\r\n        W = np.diag(W) if len(W.shape) == 1 else W\r\n        H = Xm @ W @ Ym.T\r\n    # find rotation\r\n    U, S, Vt = np.linalg.svd(H)\r\n    R = Vt.T @ U.T\r\n    if np.linalg.det(R) < 0:\r\n        # special reflection case\r\n  
      if verbose:\r\n            print(\"det(R) < 0, reflection detected! Correcting for it ...\\n\")\r\n        # the global minimizer with an orthogonal transformation is not possible\r\n        # the next best transformation is chosen\r\n        Vt[-1,:] *= -1\r\n        R = Vt.T @ U.T\r\n    t = -R @ centroid_X + centroid_Y\r\n    return R, t\r\n\r\ndef procrustes_transform(X, Y):\r\n    \"\"\"\r\n    Compute a rigid transformation trans() from X to Y and return trans(X)\r\n    \"\"\"\r\n    R, t = compute_rigid_transform(X, Y)\r\n    return R @ X + t\r\n\r\ndef pnp_refine(prediction, observation, intrinsics, dist_coeffs):\r\n    \"\"\"\r\n    Refine 3D prediction with observed image projection based on the PnP algorithm.\r\n    \"\"\"\r\n    (success, R, T) = cv2.solvePnP(prediction,\r\n                                   observation,\r\n                                   intrinsics,\r\n                                   dist_coeffs,\r\n                                   flags=cv2.SOLVEPNP_ITERATIVE)\r\n    if not success:\r\n        print('PnP failed.')\r\n        return prediction\r\n    else:\r\n        refined_prediction = cv2.Rodrigues(R)[0] @ prediction.T + T    \r\n        return refined_prediction"
  },
  {
    "path": "libs/common/utils.py",
    "content": "\"\"\"\nCommon utilities.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport torch\nimport torch.nn as nn\nimport numpy as np\n\nfrom libs.metric.criterions import PCK_THRES\n\nimport os\nfrom os.path import join as pjoin\nfrom collections import namedtuple\n\ndef make_dir(name):\n    \"\"\"    \n    Create a directory.\n    \"\"\"\n    if not os.path.exists(os.path.dirname(name)):\n        try:\n            os.makedirs(os.path.dirname(name))\n        except OSError as exc:\n            print('make_dir failed.')\n            raise exc\n    return\n\ndef save_checkpoint(states, is_best, output_dir, filename='checkpoint.pth'):\n    torch.save(states, pjoin(output_dir, filename))\n    if is_best and 'state_dict' in states:\n        torch.save(states['best_state_dict'], pjoin(output_dir, 'model_best.pth'))\n\ndef get_model_summary(model, *input_tensors, item_length=26, verbose=False):\n    \"\"\"\n    Summarize a model. For now only convolution, batch normalization and \n    linear layers are considered for parameters and FLOPs.\n    \"\"\"\n    summary = []\n    ModuleDetails = namedtuple(\n        \"Layer\", \n        [\"name\", \"input_size\", \"output_size\", \"num_parameters\", \"multiply_adds\"]\n        )\n    hooks = []\n    layer_instances = {}\n\n    def hook(module, input, output):\n        class_name = str(module.__class__.__name__)\n        instance_index = 1\n        if class_name not in layer_instances:\n            layer_instances[class_name] = instance_index\n        else:\n            instance_index = layer_instances[class_name] + 1\n            layer_instances[class_name] = instance_index\n    \n        layer_name = class_name + \"_\" + str(instance_index)\n    \n        params = 0\n    \n        if class_name.find(\"Conv\") != -1 or class_name.find(\"BatchNorm\") != -1 or \\\n           class_name.find(\"Linear\") != -1:\n            for param_ in module.parameters():\n                params += 
param_.view(-1).size(0)\n    \n        flops = \"Not Available\"\n        if class_name.find(\"Conv\") != -1 and hasattr(module, \"weight\"):\n            flops = (\n                torch.prod(\n                    torch.LongTensor(list(module.weight.data.size()))) *\n                torch.prod(\n                    torch.LongTensor(list(output.size())[2:]))).item()\n        elif isinstance(module, nn.Linear):\n            flops = (torch.prod(torch.LongTensor(list(output.size()))) \\\n                     * input[0].size(1)).item()\n    \n        if isinstance(input[0], list):\n            input = input[0]\n        if isinstance(output, list):\n            output = output[0]\n    \n        summary.append(\n            ModuleDetails(\n                name=layer_name,\n                input_size=list(input[0].size()),\n                output_size=list(output.size()),\n                num_parameters=params,\n                multiply_adds=flops)\n        )\n\n    def add_hooks(module):\n        if not isinstance(module, nn.ModuleList) \\\n           and not isinstance(module, nn.Sequential) \\\n           and module != model:\n            hooks.append(module.register_forward_hook(hook))\n\n    model.eval()\n    model.apply(add_hooks)\n\n    space_len = item_length\n\n    model(*input_tensors)\n    for h in hooks:\n        h.remove()\n\n    details = ''\n    if verbose:\n        details = \"Model Summary\" + \\\n            os.linesep + \\\n            \"Name{}Input Size{}Output Size{}Parameters{}Multiply Adds (Flops){}\".format(\n                ' ' * (space_len - len(\"Name\")),\n                ' ' * (space_len - len(\"Input Size\")),\n                ' ' * (space_len - len(\"Output Size\")),\n                ' ' * (space_len - len(\"Parameters\")),\n                ' ' * (space_len - len(\"Multiply Adds (Flops)\"))) \\\n                + os.linesep + '-' * space_len * 5 + os.linesep\n\n    params_sum = 0\n    flops_sum = 0\n    for layer in summary:\n        
params_sum += layer.num_parameters\n        if layer.multiply_adds != \"Not Available\":\n            flops_sum += layer.multiply_adds\n        if verbose:\n            details += \"{}{}{}{}{}{}{}{}{}{}\".format(\n                layer.name,\n                ' ' * (space_len - len(layer.name)),\n                layer.input_size,\n                ' ' * (space_len - len(str(layer.input_size))),\n                layer.output_size,\n                ' ' * (space_len - len(str(layer.output_size))),\n                layer.num_parameters,\n                ' ' * (space_len - len(str(layer.num_parameters))),\n                layer.multiply_adds,\n                ' ' * (space_len - len(str(layer.multiply_adds)))) \\\n                + os.linesep + '-' * space_len * 5 + os.linesep\n\n    details += os.linesep \\\n        + \"Total Parameters: {:,}\".format(params_sum) \\\n        + os.linesep + '-' * space_len * 5 + os.linesep\n    details += \"Total Multiply Adds (For Convolution and Linear Layers only): {:,} GFLOPs\".format(flops_sum/(1024**3)) \\\n        + os.linesep + '-' * space_len * 5 + os.linesep\n    details += \"Number of Layers\" + os.linesep\n    for layer in layer_instances:\n        details += \"{} : {} layers   \".format(layer, layer_instances[layer])\n\n    return details\n\nclass AverageMeter(object):\n    \"\"\"\n    An average meter object that computes and stores the average and current value.\n    \"\"\"\n    def __init__(self):\n        self.reset()\n        self.PCK_stats = {}\n        \n    def reset(self):\n        self.val = 0\n        self.avg = 0\n        self.sum = 0\n        self.count = 0\n        return\n    \n    def update(self, val, n=1, others=None):\n        self.val = val\n        self.sum += val * n\n        self.count += n\n        self.avg = self.sum / self.count if self.count != 0 else 0\n        if others is not None and 'correct_cnt' in others:\n            if 'sum' not in self.PCK_stats:\n                self.PCK_stats['sum'] = 
np.zeros(len(others['correct_cnt'])) \n            self.PCK_stats['sum'] += others['correct_cnt']\n            if 'total' not in self.PCK_stats:\n                self.PCK_stats['total'] = 0.\n            self.PCK_stats['total'] += n         \n        return\n    \n    def print_content(self):\n        if 'sum' in self.PCK_stats:\n            for idx, value in enumerate(self.PCK_stats['sum']):\n                PCK = value / self.PCK_stats['total']\n                print('Average PCK at threshold {:.2f}: {:.3f}'.format(PCK_THRES[idx], PCK))\n        return"
  },
  {
    "path": "libs/dataset/KITTI/__init__.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nEmpty file.\n\"\"\"\n\n\n"
  },
  {
    "path": "libs/dataset/KITTI/car_instance.py",
    "content": "\"\"\"\nKITTI dataset implemented as PyTorch dataset object.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport libs.dataset.basic.basic_classes as bc\nimport libs.visualization.points as vp\nimport libs.common.img_proc as lip\n\nfrom libs.common.utils import make_dir\nfrom libs.common.img_proc import get_affine_transform\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport torch\nimport torchvision.transforms as transforms\nimport cv2\nimport csv\nimport copy\n\nfrom PIL import Image\nfrom mpl_toolkits.mplot3d import Axes3D\nfrom torch.utils.data.dataloader import default_collate\nfrom os.path import join as pjoin\nfrom os.path import sep as osep\nfrom os.path import exists\nfrom os import listdir \n\n# maximum number of instances to the network depending on your GPU memory\nMAX_INS_CNT = 140\n#MAX_INS_CNT = 64\n\nTYPE_ID_CONVERSION = {\n    'Car': 0,\n    'Cyclist': 1,\n    'Pedestrian': 2,\n}\n\n# annotation style of KITTI dataset\nFIELDNAMES = ['type', \n              'truncated', \n              'occluded', \n              'alpha', \n              'xmin', \n              'ymin', \n              'xmax', \n              'ymax', \n              'dh', \n              'dw',\n              'dl', \n              'lx', \n              'ly', \n              'lz', \n              'ry']\n\n# the format of prediction has one more field: confidence score\nFIELDNAMES_P = FIELDNAMES.copy() + ['score']\n\n# indices used for performing interpolation\n# key->value: style->index arrays\ninterp_dict = {\n    'bbox12':(np.array([1,3,5,7,# h direction\n                        1,2,3,4,# l direction\n                        1,2,5,6]), # w direction\n              np.array([2,4,6,8,\n                        5,6,7,8,\n                        3,4,7,8])\n              ),\n    'bbox12l':(np.array([1,2,3,4,]), # l direction\n              np.array([5,6,7,8])\n              ),\n    'bbox12h':(np.array([1,3,5,7]), # h direction\n            
  np.array([2,4,6,8])\n              ),\n    'bbox12w':(np.array([1,2,5,6]), # w direction\n              np.array([3,4,7,8])\n              ),    \n    }\n\n# indices used for computing the cross ratio\ncr_indices_dict = {\n    'bbox12':np.array([[ 1,  9, 21,  2],\n                       [ 3, 10, 22,  4],\n                       [ 5, 11, 23,  6],\n                       [ 7, 12, 24,  8],\n                       [ 1, 13, 25,  5],\n                       [ 2, 14, 26,  6],\n                       [ 3, 15, 27,  7],\n                       [ 4, 16, 28,  8],\n                       [ 1, 17, 29,  3],\n                       [ 2, 18, 30,  4],\n                       [ 5, 19, 31,  7],\n                       [ 6, 20, 32,  8]]\n                      )\n    }\n\ndef get_cr_indices():\n    \"\"\"\n    Helper function to define the indices used in computing the cross-ratio.\n    \"\"\"\n    num_base_pts = 9\n    num_lines = 12\n    parents, children = interp_dict['bbox12']\n    cr_indices = []\n    for line_idx in range(num_lines):\n        parent_idx = parents[line_idx] # first point\n        child_idx = children[line_idx] # last point\n        second_point_idx = num_base_pts + line_idx\n        third_point_idx = num_base_pts + num_lines + line_idx\n        cr_indices.append(np.array([parent_idx, \n                                   second_point_idx, \n                                   third_point_idx,\n                                   child_idx]\n                                  ).reshape(1,4)\n                         )\n    cr_indices = np.vstack(cr_indices)\n    return cr_indices\n\nclass KITTI(bc.SupervisedDataset):\n    \"\"\"\n    KITTI dataset.\n    \"\"\"    \n    def __init__(self, cfgs, split, logger, scale=1.0):\n        super().__init__(cfgs, split, logger)\n        self.logger = logger\n        self.logger.info(\"Initializing KITTI {:s} set, please wait...\".format(split))\n        self.exp_type = cfgs['exp_type'] # exp_type: experiment type \n        
self._data_dir = cfgs['dataset']['root'] # root directory\n        self._classes = cfgs['dataset']['detect_classes'] # used object classes\n        self._get_data_parameters(cfgs) # initialize hyper-parameters\n        self._set_paths() # initialize paths\n        self._inference_mode = False \n        self.car_sizes = [] # dimension of cars\n        self._load_image_list()\n        if self.split in ['train', 'valid', 'trainvalid'] and \\\n            self.exp_type in ['instanceto2d', 'baselinealpha', 'baselinetheta']:\n            # prepare local coordinates used in certain types of experiments\n            self._prepare_key_points(cfgs)\n            # save cropped car instances for debugging\n            # cropped_path = pjoin(self._data_config['cropped_dir'], self.kpts_style,\n            #                      self.split)\n            # if not exists(cropped_path) and cfgs['dataset']['pre-process']:\n            #     self._save_cropped_instances()            \n        # prepare data used for future loading\n        self.generate_pairs()\n        # self.visualize()\n        if self.split in ['train', 'trainvalid'] and self.exp_type in ['2dto3d']:\n            # 2dto3d means the data is used by the lifter that predicts 3D \n            # cuboid based on 2D screen coordinates \n            self.normalize() # data normalization used for the lifter network\n        if 'ss' in cfgs and cfgs['ss']['flag']:\n            # use unlabeled images for weak self-supervision\n            self.use_ss = True\n            self.ss_settings = cfgs['ss']\n            self._initialize_unlabeled_data(cfgs)\n        self.logger.info(\"Initialization finished for KITTI {:s} set\".format(split))\n        # self.show_statistics()\n        # debugging code if you need\n        # test = self[10]\n        # test = self.extract_ss_sample(1)\n    \n    def _get_image_path_list(self):\n        \"\"\"\n        Prepare list of image paths for the used split.\n        \"\"\"\n        assert 
'image_name_list' in self._data_config\n        image_path_list = []\n        for name in self._data_config['image_name_list']:\n            img_path = pjoin(self._data_config['image_dir'], name)\n            image_path_list.append(img_path)\n        self._data_config['image_path_list'] = image_path_list        \n        return\n    \n    def _initialize_unlabeled_data(self, cfgs):\n        \"\"\"\n        Initialize unlabeled data for self-supervision experiment.\n        \"\"\"\n        self.ss_record = np.load(cfgs['ss']['record_path'], allow_pickle=True).item()\n        self.logger.info('Found prepared self-supervision record at: ' + cfgs['ss']['record_path'])\n        return\n    \n    def _load_image_list(self):\n        \"\"\"\n        Prepare list of image names for the used split.\n        \"\"\"\n        path = self._data_config[self.split + '_list']       \n        with open(path, \"r\") as f:\n            image_name_list = f.read().splitlines()\n        for idx, line in enumerate(image_name_list):\n            base_name = line.replace(\"\\n\", \"\")\n            image_name = base_name + \".png\"\n            image_name_list[idx] = image_name\n        self._data_config['image_name_list'] = image_name_list\n        self._get_image_path_list()\n        return\n    \n    def _check_precomputed_file(self, path, name):\n        \"\"\"\n        Check if a pre-computed numpy file exists or not.\n        \"\"\"\n        if exists(path):\n            self.logger.info('Found prepared {0:s} at {1:s}'.format(name, path))\n            value = np.load(path, allow_pickle=True).item()\n            setattr(self, name, value)\n            return True\n        else:\n            return False\n        \n    def _save_precomputed_file(self, data_dic, pre_computed_path, name):\n        \"\"\"\n        Save a pre-computed numpy file.\n        \"\"\"\n        setattr(self, name, data_dic)\n        make_dir(pre_computed_path)\n        np.save(pre_computed_path, data_dic)\n       
 self.logger.info('Save prepared {0:s} at {1:s}'.format(name, pre_computed_path))        \n        return\n    \n    def _prepare_key_points_custom(self, style, interp_params, vis_thresh=0.25):\n        \"\"\"\n        Project 3D bounding boxes to image planes to prepare screen coordinates.\n        \"\"\"        \n        assert 'keypoint_dir' in self._data_config\n        kpt_dir = self._data_config['keypoint_dir']\n        if interp_params['flag']:\n            style += str(interp_params['coef'])\n        pre_computed_path_kpts = pjoin(kpt_dir, '{0:s}_{1:s}_{2:s}.npy'.format(style, self.split, str(self._classes)))\n        pre_computed_path_ids = pjoin(kpt_dir, '{0:s}_{1:s}_{2:s}_ids.npy'.format(style, self.split, str(self._classes)))        \n        pre_computed_path_rots = pjoin(kpt_dir, '{0:s}_{1:s}_{2:s}_rots.npy'.format(style, self.split, str(self._classes)))   \n        if self._check_precomputed_file(pre_computed_path_kpts, 'keypoints'):\n            pass\n        if self._check_precomputed_file(pre_computed_path_ids, 'instance_ids'):\n            pass     \n        if self._check_precomputed_file(pre_computed_path_rots, 'rotations'):\n            return    \n        path_list = self._data_config['image_path_list']\n        data_dic_kpts = {}\n        data_dic_ids = {}\n        data_dic_rots = {}\n        for path in path_list:\n            image_name = path.split(osep)[-1]\n            # instances that lie out of the image plane will be discarded \n            list_2d, _, list_id, _, list_rots = self.get_2d_3d_pair(path, \n                                                                    style=style, \n                                                                    augment=False,\n                                                                    add_visibility=True,\n                                                                    filter_outlier=True,\n                                                                    add_rotation=True\n     
                                                               )  \n            if len(list_2d) == 0:\n                continue\n            for idx, kpts in enumerate(list_2d):\n                list_2d[idx] = kpts.reshape(1, -1, 3)\n            data_dic_kpts[image_name] = np.concatenate(list_2d, axis=0)\n            data_dic_ids[image_name] = list_id\n            data_dic_rots[image_name] = np.concatenate(list_rots, axis=0)\n        self._save_precomputed_file(data_dic_kpts, pre_computed_path_kpts, 'keypoints')\n        self._save_precomputed_file(data_dic_ids, pre_computed_path_ids, 'instance_ids')  \n        self._save_precomputed_file(data_dic_rots, pre_computed_path_rots, 'rotations') \n        return\n    \n    def _prepare_key_points(self, cfgs):\n        self.kpts_style = cfgs['dataset']['2d_kpt_style']\n        self._prepare_key_points_custom(self.kpts_style, cfgs['dataset']['interpolate'])\n        if 'enlarge_factor' in cfgs['dataset']:\n            self.enlarge_factor = cfgs['dataset']['enlarge_factor']\n        else:\n            self.enlarge_factor = 1.1\n        return\n    \n    def _save_cropped_instances(self):\n        # DEPRECATED, will be removed in a future release\n        \"\"\" \n        Crop and save car instance images with given 2d key-points\n        \"\"\"        \n        assert hasattr(self, 'keypoints')\n        all_save_paths = []\n        all_keypoints = []\n        all_bbox = []\n        target_ar = self.hm_para['target_ar']\n        for image_name in self.keypoints.keys():\n            image_path = pjoin(self._data_config['image_dir'], image_name)\n            save_dir = pjoin(self._data_config['cropped_dir'], self.kpts_style,\n                             self.split, image_name[:-4])\n            keypoints = self.keypoints[image_name]\n            new_paths, new_keypoints, bboxes = lip.save_cropped_patches(image_path, \n                                                                keypoints, \n                                 
                               save_dir, \n                                                                enlarge=self.enlarge_factor,\n                                                                target_ar=target_ar)\n            all_save_paths += new_paths\n            all_keypoints.append(new_keypoints)\n            all_bbox += bboxes\n        annot_save_name = pjoin(self._data_config['cropped_dir'], \n                                self.kpts_style, self.split, 'annot.npy')\n        np.save(annot_save_name, {'paths': all_save_paths,\n                                  'kpts': np.concatenate(all_keypoints, axis=0),\n                                  'global_box': all_bbox\n                                  })\n        return\n    \n    def _prepare_2d_pose_annot(self, threshold=4):\n        \"\"\" \n        Prepare annotation for training the coordinate regression model.\n        \"\"\"          \n        all_paths = []\n        all_boxes = []\n        all_rotations = []\n        all_keypoints = []\n        all_keypoints_raw = []\n        for image_name in self.keypoints.keys():\n            image_path = pjoin(self._data_config['image_dir'], image_name)\n            # raw keypoints using camera projection\n            keypoints = self.keypoints[image_name]\n            rotations = self.rotations[image_name]\n            boxes_img = []\n            rots_img = []\n            visible_kpts_img = []\n            for i in range(len(keypoints)):\n                # Note: severely occluded instances are ignored in the training data\n                visible_cnt = np.sum(keypoints[i][:, 2])\n                if visible_cnt < threshold:\n                    continue\n                else:\n                    # now set all keypoints as visible\n                    tempt_kpts = keypoints[i][:,:2]\n                    visible_kpts_img.append(np.expand_dims(tempt_kpts, 0))\n                center, crop_size, new_keypoints, vis_rate = lip.kpts2cs(tempt_kpts, 
enlarge=self.enlarge_factor)\n                bbox_instance = np.array((list(map(int, lip.cs2bbox(center, crop_size)))))\n                boxes_img.append(bbox_instance.reshape(1,4))\n                rots_img.append(rotations[i].reshape(1,2))\n            if len(boxes_img) == 0:\n                continue\n            all_paths.append(image_path)            \n            all_boxes.append(np.concatenate(boxes_img))\n            all_rotations.append(np.concatenate(rots_img))\n            all_keypoints.append(np.concatenate(visible_kpts_img))\n            all_keypoints_raw.append(keypoints)\n        return {'paths':all_paths, \n                'boxes':all_boxes, \n                'rots':all_rotations,\n                'kpts':all_keypoints,\n                'raw_kpts':all_keypoints_raw\n                }\n    \n    def _prepare_detection_records(self, save=False, threshold = 0.1):\n        # DEPRECATED UNTIL FURTHER UPDATE\n        raise ValueError\n\n    def gather_annotations(self, \n                           threshold=0.1, \n                           use_raw_bbox=False, \n                           add_gt=True,\n                           filter_outlier=False\n                           ):\n        \"\"\" \n        Read ground truth 3D bounding box labels.\n        \"\"\"           \n        path_list = self._data_config['image_path_list'] \n        record_dict = {}\n        for img_path in path_list:\n            image_name = img_path.split(osep)[-1]\n            if self.split != 'test':\n                # default: use gt label and calibration\n                label_path = pjoin(self._data_config['label_dir'], \n                                   image_name[:-4] + '.txt'\n                                   )\n                self.read_single_file(image_name, \n                                      record_dict, \n                                      label_path=label_path,\n                                      fieldnames=FIELDNAMES,\n                               
       add_gt=add_gt,\n                                      use_raw_bbox=use_raw_bbox,\n                                      filter_outlier=filter_outlier\n                                      )\n            else:\n                record_dict[image_name] = {}\n        self.annot_dict = record_dict\n        return     \n    \n    def read_single_file(self, \n                         image_name, \n                         record_dict, \n                         label_path=None,\n                         calib_path=None,\n                         threshold=0.1,\n                         fieldnames=FIELDNAMES_P,\n                         add_gt=False,\n                         use_raw_bbox=True,\n                         filter_outlier=False,\n                         bbox_only=False\n                         ):\n        \"\"\" \n        Read labels and prepare annotation for a single image.\n        \"\"\"  \n        style = self._data_config['3d_kpt_sample_style']\n        image_path = pjoin(self._data_config['image_dir'], image_name)\n        if label_path is None:\n            # default is ground truth annotation\n            label_path = pjoin(self._data_config['label_dir'], image_name[:-3] + 'txt')\n        if calib_path is None:\n            calib_path = pjoin(self._data_config['calib_dir'], image_name[:-3] + 'txt')\n        list_2d, list_3d, list_id, pv, raw_bboxes = self.get_2d_3d_pair(image_path,\n                                                                        label_path=label_path,\n                                                                        calib_path=calib_path,\n                                                                        style=style,\n                                                                        augment=False,\n                                                                        add_raw_bbox=True,\n                                                                        bbox_only=bbox_only,\n                  
                                                      filter_outlier=filter_outlier,\n                                                                        fieldnames=fieldnames # also load the confidence score\n                                                                        )  \n        if len(raw_bboxes) == 0:\n            return False        \n        if image_name not in record_dict:\n            record_dict[image_name] = {}\n        raw_annot, P = self.load_annotations(label_path, calib_path, fieldnames=fieldnames)\n        # use different (slightly) intrinsic parameters for different images\n        K = P[:, :3]  \n        if len(list_2d) != 0:\n            for idx, kpts in enumerate(list_2d):\n                list_2d[idx] = kpts.reshape(1, -1, 3)\n                list_3d[idx] = list_3d[idx].reshape(1, -1, 3)\n            all_keypoints_2d = np.concatenate(list_2d, axis=0)\n            all_keypoints_3d = np.concatenate(list_3d, axis=0)                       \n            # compute 2D bounding box based on the projected 3D boxes\n            bboxes_kpt = []\n            for idx, keypoints in enumerate(all_keypoints_2d):\n                # relatively tight bounding box: use enlarge = 1.0\n                # delete invisible instances\n                center, crop_size, _, _ = lip.kpts2cs(keypoints[:,:2],\n                                                      enlarge=1.01)\n                bbox = np.array(lip.cs2bbox(center, crop_size))             \n                bboxes_kpt.append(np.array(bbox).reshape(1, 4))\n            record_dict[image_name]['kpts_3d'] = all_keypoints_3d\n            if add_gt:\n                # special key name representing ground truth\n                record_dict[image_name]['kpts'] = all_keypoints_2d\n                record_dict[image_name]['kpts_3d_gt'] = all_keypoints_3d\n        if use_raw_bbox:\n            bboxes = np.vstack(raw_bboxes)\n        elif len(bboxes_kpt) != 0:\n            bboxes = np.vstack(bboxes_kpt)\n    
        \n        record_dict[image_name]['bbox_2d'] = bboxes\n        record_dict[image_name]['raw_txt_format'] = raw_annot\n        record_dict[image_name]['K'] = K\n        # add some key-value pairs as ground truth annotation\n        if add_gt:         \n            pvs = np.vstack(pv) if len(pv) != 0 else []\n            tempt_dic = {'boxes': bboxes,\n                         'pose_vecs_gt':pvs\n                         }\n            record_dict[image_name] = {**record_dict[image_name], **tempt_dic}              \n        return True\n    \n    def read_predictions(self, path):\n        \"\"\"\n        Read the prediction files in the same format as the ground truth.\n        \"\"\"\n        self.logger.info(\"Reading predictions from {:s}\".format(path))\n        file_list = listdir(path)  \n        record_dict = {}\n        use_raw_bbox = True if self.split == 'test' else False\n        for file_name in file_list:\n            if not file_name.endswith(\".txt\"):\n                continue\n            image_name = file_name[:-4] + \".png\"\n            label_path = pjoin(path, file_name)            \n            self.read_single_file(image_name, \n                                  record_dict, \n                                  label_path=label_path,\n                                  use_raw_bbox=use_raw_bbox\n                                  )\n        self.logger.info(\"Reading predictions finished.\")\n        return record_dict\n    \n    def _get_data_parameters(self, cfgs):\n        \"\"\"\n        Initialize dataset-relevant parameters.\n        \"\"\"\n        self._data_config = {}\n        self._data_config['image_size_raw'] = NotImplemented\n        if self.exp_type in ['2dto3d', 'inference', 'finetune']:\n            # parameters relevant to input/output representation\n            for key in ['3d_kpt_sample_style', 'lft_in_rep', 'lft_out_rep']:\n                self._data_config[key] = cfgs['dataset'][key] \n        if self.exp_type in 
['2dto3d']:  \n            # parameters relevant to data augmentation              \n            for key in ['lft_aug','lft_aug_times']:\n                self._data_config[key] = cfgs['training_settings'][key]\n        # parameters relevant to cuboid interpolation\n        self.interp_params = cfgs['dataset']['interpolate']\n        # parameters relevant to heatmap regression model and image data augmentation\n        if 'heatmapModel' in cfgs:\n            hm = cfgs['heatmapModel']\n            jitter_flag = hm['jitter_bbox'] and self.split=='train' and cfgs['train']\n            self.hm_para = {'reference': 'bbox',\n                            'resize': True,\n                            'add_xy': hm['add_xy'],\n                            'jitter_bbox': jitter_flag,\n                            'jitter_params': hm['jitter_params'],\n                            # (height, width)\n                            'input_size': np.array([hm['input_size'][1],\n                                             hm['input_size'][0]]),\n                            'heatmap_size': np.array([hm['heatmap_size'][1],\n                                               hm['heatmap_size'][0]]),\n                            'target_ar': hm['heatmap_size'][1]/hm['heatmap_size'][0],\n                            'augment': hm['augment_input'],\n                            'sf': cfgs['dataset']['scaling_factor'],\n                            'rf': cfgs['dataset']['rotation_factor'],\n                            'num_joints': hm['num_joints'],\n                            'sigma': hm['sigma'] if 'sigma' in hm else None,\n                            'target_type': hm['target_type'] if 'target_type' in hm else None,\n                            'use_different_joints_weight': \n                                hm['use_different_joints_weight'] if 'use_different_joints_weight' in hm else None                            \n                              }\n            self.num_joints = hm['num_joints']\n 
       # parameters relevant to PyTorch image transformation operations\n        if 'pth_transform' in cfgs['dataset']:\n            pth_transform = cfgs['dataset']['pth_transform']\n            normalize = transforms.Normalize(\n                mean=pth_transform['mean'], \n                std=pth_transform['std']\n                )\n            transform_list = [transforms.ToTensor(), normalize]\n            if self.exp_type == 'detect2D' and self.split == 'train':\n                transform_list.append(transforms.RandomHorizontalFlip(0.5))\n            self.pth_trans = transforms.Compose(transform_list)           \n\n    def _set_paths(self):\n        \"\"\"\n        Initialize relevant directories.\n        \"\"\"\n        ROOT = self.root\n        split = self.split\n        # validation set is a sub-set of the official training split\n        # train/val/test: 3712/3769/7518\n        split = 'train' if self.split == 'valid' else split\n        split += 'ing'\n        self._data_config['image_dir'] = pjoin(ROOT, split, 'image_2')\n        self._data_config['cropped_dir'] = pjoin(ROOT, split, 'cropped')\n        self._data_config['drawn_dir'] = pjoin(ROOT, split, 'drawn')\n        self._data_config['label_dir'] = pjoin(ROOT, split, 'label_2')\n        self._data_config['calib_dir'] = pjoin(ROOT, split, 'calib')\n        self._data_config['keypoint_dir'] = pjoin(ROOT, split, 'keypoints')\n        self._data_config['stats_dir'] = pjoin(ROOT, 'instance_stats.npy')\n        # list of images for each sub-set\n        self._data_config['train_list'] = pjoin(ROOT, 'training/ImageSets/train.txt')\n        self._data_config['valid_list'] = pjoin(ROOT, 'training/ImageSets/val.txt')\n        self._data_config['test_list'] = pjoin(ROOT, 'testing/ImageSets/test.txt')\n        self._data_config['trainvalid_list'] = pjoin(ROOT, 'training/ImageSets/trainval.txt')        \n        return\n    \n    def project_3d_to_2d(self, points, K):\n        \"\"\" \n        Get 2D 
projection of 3D points in the camera coordinate system. \n        \"\"\"          \n        projected = K @ points.T\n        projected[:2, :] /= projected[2, :]\n        return projected\n    \n    def render_car(self, ax, K, obj_class, rot_y, locs, dimension, shift):\n        # DEPRECATED\n        cam_cord = []\n        self.get_cam_cord(cam_cord, shift, rot_y, dimension, locs)\n        # get 2D projections \n        projected = self.project_3d_to_2d(cam_cord[0], K)\n        ax.plot(projected[0, :], projected[1, :], 'ro')\n        vp.plot_3d_bbox(ax, projected[:2, 1:].T)\n        return\n    \n    def show_statistics(self):\n        # DEPRECATED\n        path = self._data_config['stats_dir']       \n        if self._check_precomputed_file(path, 'instance_stats') or self.split != 'train':\n            return\n        self.instance_statistics = {}\n        if hasattr(self, 'car_sizes') and len(self.car_sizes) != 0:\n            all_sizes = np.concatenate(self.car_sizes)\n            fig, axes = plt.subplots(3,1)\n            names = ['x', 'y', 'z']\n            for axe_id in range(3):\n                axes[axe_id].hist(all_sizes[:, axe_id])\n                axes[axe_id].set_xlabel('Car size in {:s} direction'.format(names[axe_id]))\n                axes[axe_id].set_ylabel('Counts')\n            mean_size = all_sizes.mean(axis=0)\n            std_size = all_sizes.std(axis=0)\n            self.instance_statistics['size'] = {'mean':mean_size,\n                                                'std': std_size\n                                                }\n            # prepare a reference 3D bounding box\n            xmax, xmin = mean_size[0], -mean_size[0]\n            ymax, ymin = mean_size[1], -mean_size[1]\n            zmax, zmin = mean_size[2], -mean_size[2]\n            bbox = np.array([[xmax, ymin, zmax],\n                             [xmax, ymax, zmax],\n                             [xmax, ymin, zmin],\n                             [xmax, ymax, zmin],\n     
                        [xmin, ymin, zmax],\n                             [xmin, ymax, zmax],\n                             [xmin, ymin, zmin],\n                             [xmin, ymax, zmin]])\n            bbox = np.vstack([np.array([[0., 0., 0.]]), bbox])            \n            self.instance_statistics['ref_box3d'] = bbox\n        self._save_precomputed_file(self.instance_statistics, path, 'instance_stats')            \n        return\n    \n    def augment_pose_vector(self, \n                            locs,\n                            rot_y,\n                            obj_class,\n                            dimension,\n                            augment,\n                            augment_times,\n                            std_rot = np.array([15., 50., 15.])*np.pi/180.,\n                            std_trans = np.array([0.2, 0.01, 0.2]),\n                            ):\n        \"\"\"\n        Data augmentation used for training the lifter sub-model.\n        \n        std_rot: standard deviation of rotation around x, y and z axis\n        std_trans: standard deviation of translation along x, y and z axis\n        \"\"\"\n        aug_ids, aug_pose_vecs = [], []\n        aug_ids.append((obj_class, dimension))\n        # KITTI only annotates rotation around y-axis (yaw)\n        pose_vec = np.concatenate([locs, np.array([0., rot_y, 0.])]).reshape(1, 6)\n        aug_pose_vecs.append(pose_vec)\n        if not augment:\n            return aug_ids, aug_pose_vecs\n        rots_random = np.random.randn(augment_times, 3) * std_rot.reshape(1, 3)\n        # y-axis\n        rots_random[:, 1] += rot_y\n        trans_random = 1 + np.random.randn(augment_times, 3) * std_trans.reshape(1, 3)\n        trans_random *= locs.reshape(1, 3)\n        for i in range(augment_times):\n            # augment 6DoF pose\n            aug_ids.append((obj_class, dimension))\n            pose_vec = np.concatenate([trans_random[i], rots_random[i]]).reshape(1, 6)\n            
aug_pose_vecs.append(pose_vec)\n        return aug_ids, aug_pose_vecs\n    \n    def get_representation(self, p2d, p3d, in_rep, out_rep):\n        \"\"\"\n        Get input-output representations based on 3d point cloud and its \n        projected 2D screen coordinates.\n        \"\"\"        \n        # input representation\n        if len(p2d) > 0:\n            num_kpts = len(p2d[0])\n        if in_rep == 'coordinates2d':\n            input_list = [points.reshape(1, num_kpts, -1) for points in p2d]\n        elif in_rep == 'coordinates2d+area' and self._data_config['3d_kpt_sample_style'] == 'bbox9':\n            # indices: [corner, neighbour1, neighbour2]\n            indices = self.area_indices\n            input_list = [vp.get_area(points, indices, True) for points in p2d]\n        else:\n            raise NotImplementedError('Undefined input representation.')\n        # output representation\n        if out_rep == 'R3d+T':\n            # R3D stands for relative 3D shape, T stands for translation\n            # center the camera coordinates to remove depth\n            output_list = []\n            for i in range(len(p3d)):\n                # format: the root should be pre-computed as the first 3d point \n                root = p3d[i][[0], :]\n                relative_shape = p3d[i][1:, :] - root\n                output = np.concatenate([root, relative_shape], axis=0)\n                output_list.append(output.reshape(1, -1)) \n        elif out_rep == 'R3d': # relative 3D shape\n            output_list = []\n            # save a copy of the 3D object roots\n            if not hasattr(self, 'root_list'):\n                self.root_list = []\n            for i in range(len(p3d)):\n                # format: the root should be pre-computed as the first 3d point \n                root = p3d[i][[0], :]\n                self.root_list.append(root)\n                relative_shape = p3d[i][1:, :] - root\n                output_list.append(relative_shape.reshape(1, -1)) 
\n        else:\n            raise NotImplementedError('undefined output representation.')\n        return input_list, output_list\n    \n    def get_input_output_size(self):\n        \"\"\"\n        Get the input/output size for 2d-to-3d lifting.\n        \"\"\"\n        num_joints = self.num_joints\n        if self._data_config['lft_in_rep'] == 'coordinates2d':\n             input_size = num_joints*2\n        else:\n             raise NotImplementedError\n        if self._data_config['lft_out_rep'] in ['R3d+T']:\n             output_size = num_joints*3\n        elif self._data_config['lft_out_rep'] in ['R3d']:\n             output_size = (num_joints - 1) * 3             \n        else:\n             raise NotImplementedError        \n        return input_size, output_size\n    \n    def interpolate(self, \n                    bbox_3d, \n                    style, \n                    interp_coef=[0.5], \n                    dimension=None, \n                    strings=['l','h','w']\n                    ):\n        \"\"\"\n        Interpolate 3d points on a 3D bounding box with a specified style.\n        \"\"\"\n        if dimension is not None:\n            # size-encoded representation\n            l = dimension[0]\n            if l < 3.5:\n                style += 'l'\n            elif l < 4.5:\n                style += 'h'\n            else:\n                style += 'w'       \n        pidx, cidx = interp_dict[style]\n        parents, children = bbox_3d[:, pidx], bbox_3d[:, cidx]\n        lines = children - parents\n        new_joints = [(parents + interp_coef[i]*lines) for i in range(len(interp_coef))]\n        return np.hstack([bbox_3d, np.hstack(new_joints)])\n    \n    def construct_box_3d(self, l, h, w, interp_params):\n        \"\"\"\n        Construct 3D bounding box corners in the canonical pose.\n        \"\"\"        \n        x_corners = [0.5*l, l, l, l, l, 0, 0, 0, 0]\n        y_corners = [0.5*h, 0, h, 0, h, 0, h, 0, h]\n        z_corners = 
[0.5*w, w, w, 0, 0, w, w, 0, 0]\n        x_corners += - np.float32(l) / 2\n        y_corners += - np.float32(h)\n        z_corners += - np.float32(w) / 2\n        corners_3d = np.array([x_corners, y_corners, z_corners])     \n        if interp_params['flag']:\n            corners_3d = self.interpolate(corners_3d, \n                                          interp_params['style'],\n                                          interp_params['coef'],\n                                          #dimension=np.array([l,h,w]) # dimension aware\n                                          )\n        return corners_3d\n    \n    def get_cam_cord(self, cam_cord, shift, ids, pose_vecs, rot_xz=False):\n        \"\"\"\n        Construct 3D bounding box corners in the camera coordinate system.\n        \"\"\"         \n        # does not augment the dimension for now\n        dims = ids[0][1]\n        l, h, w = dims[0], dims[1], dims[2]\n        corners_3d_fixed = self.construct_box_3d(l, h, w, self.interp_params)\n        for pose_vec in pose_vecs:\n            # translation and rotation\n            locs = pose_vec[0, :3]\n            rots = pose_vec[0, 3:]\n            x, y, z = locs[0], locs[1], locs[2] # bottom center of the labeled 3D box\n            rx, ry, rz = rots[0], rots[1], rots[2]\n            # This perturbation turns out to work well for rotation estimation\n#            x *= (1 + np.random.randn()*0.1)\n#            y *= (1 + np.random.randn()*0.05)\n#            z *= (1 + np.random.randn()*0.1)\n            if self.split == 'train' and self.exp_type == '2dto3d' and not self._inference_mode:\n                ry += np.random.randn()*np.pi # random perturbation\n            rot_maty = np.array([[np.cos(ry), 0, np.sin(ry)],\n                                [0, 1, 0],\n                                [-np.sin(ry), 0, np.cos(ry)]])\n            if rot_xz:\n                # rotation. 
Only yaw angle is considered in KITTI dataset\n                rot_matx = np.array([[1, 0, 0],\n                                    [0, np.cos(rx), -np.sin(rx)],\n                                    [0, np.sin(rx), np.cos(rx)]])        \n    \n                rot_matz = np.array([[np.cos(rz), -np.sin(rz), 0],\n                                    [np.sin(rz), np.cos(rz), 0],\n                                    [0, 0, 1]])        \n                # TODO: correct here\n                rot_mat = rot_matz @ rot_maty @ rot_matx     \n            else:\n                rot_mat = rot_maty\n            corners_3d = np.matmul(rot_mat, corners_3d_fixed)\n            # translation\n            corners_3d += np.array([x, y, z]).reshape([3, 1])\n            camera_coordinates = corners_3d + shift\n            cam_cord.append(camera_coordinates.T)\n        return \n    \n    def csv_read_annot(self, file_path, fieldnames):\n        \"\"\"\n        Read instance attributes in the KITTI format. Instances not in the \n        selected class will be ignored. 
\n        \n        A list of Python dictionaries is returned, where each dictionary \n        represents one instance.\n        \"\"\"        \n        annotations = []\n        with open(file_path, 'r') as csv_file:\n            reader = csv.DictReader(csv_file, delimiter=' ', fieldnames=fieldnames)\n            for line, row in enumerate(reader):\n                if row[\"type\"] in self._classes:\n                    annot_dict = {\n                        \"class\": row[\"type\"],\n                        \"label\": TYPE_ID_CONVERSION[row[\"type\"]],\n                        \"truncation\": float(row[\"truncated\"]),\n                        \"occlusion\": float(row[\"occluded\"]),\n                        \"alpha\": float(row[\"alpha\"]),\n                        \"dimensions\": [float(row['dl']), \n                                       float(row['dh']), \n                                       float(row['dw'])\n                                       ],\n                        \"locations\": [float(row['lx']), \n                                      float(row['ly']), \n                                      float(row['lz'])\n                                      ],\n                        \"rot_y\": float(row[\"ry\"]),\n                        \"bbox\": [float(row[\"xmin\"]),\n                                 float(row[\"ymin\"]),\n                                 float(row[\"xmax\"]),\n                                 float(row[\"ymax\"])\n                                 ]\n                    }\n                    if \"score\" in fieldnames:\n                        annot_dict[\"score\"] = float(row[\"score\"])\n                    annotations.append(annot_dict)        \n        return annotations\n    \n    def csv_read_calib(self, file_path):\n        \"\"\"\n        Read camera projection matrix in the KITTI format.\n        \"\"\"  \n        with open(file_path, 'r') as csv_file:\n            reader = csv.reader(csv_file, delimiter=' ')\n            
for line, row in enumerate(reader):\n                if row[0] == 'P2:':\n                    P = row[1:]\n                    P = [float(i) for i in P]\n                    P = np.array(P, dtype=np.float32).reshape(3, 4)\n                    break        \n        return P\n    \n    def load_annotations(self, label_path, calib_path, fieldnames=FIELDNAMES): \n        \"\"\"\n        Read 3D annotation and camera parameters.\n        \"\"\"          \n        if self.split in ['train', 'valid', 'trainvalid', 'test']:\n            annotations = self.csv_read_annot(label_path, fieldnames)\n        # get camera intrinsic matrix K\n        P = self.csv_read_calib(calib_path)\n        return annotations, P\n    \n    def add_visibility(self, joints, img_width=1242, img_height=375):\n        \"\"\"\n        Compute binary visibility of projected 2D parts.\n        \"\"\"  \n        assert joints.shape[1] == 2\n        visibility = np.ones((len(joints), 1))\n        # predicate from upper left corner\n        predicate1 = joints - np.array([[0., 0.]])\n        predicate1 = (predicate1 > 0.).prod(axis=1)\n        # predicate from lower right corner\n        predicate2 = joints - np.array([[img_width, img_height]])\n        predicate2 = (predicate2 < 0.).prod(axis=1)\n        visibility[:, 0] *= predicate1*predicate2      \n        return np.hstack([joints, visibility])\n    \n    def get_inlier_indices(self, p_2d, threshold=0.3):\n        \"\"\"\n        Get indices of instances that are visible 'enough'.\n        \"\"\"  \n        indices = []\n        num_joints = p_2d[0].shape[0]\n        for idx, kpts in enumerate(p_2d):\n            if p_2d[idx][:, 2].sum() / num_joints >= threshold:\n                indices.append(idx)        \n        return indices\n    \n    def filter_outlier(self, p_2d, p_3d, threshold=0.3):\n        \"\"\"\n        Keep instances that are visible 'enough'.\n        \"\"\"  \n        p_2d_filtered, p_3d_filtered, indices = [], [], []\n        
num_joints = p_2d[0].shape[0]\n        for idx, kpts in enumerate(p_2d):\n            if p_2d[idx][:, 2].sum() / num_joints >= threshold:\n                p_2d_filtered.append(p_2d[idx])\n                p_3d_filtered.append(p_3d[idx])\n                indices.append(idx)\n        return p_2d_filtered, p_3d_filtered\n    \n    def get_img_size(self, path):\n        \"\"\"\n        Get the resolution of an image without loading it.\n        \"\"\"\n        with Image.open(path) as image:\n            size = image.size \n        return size\n    \n    def get_2d_3d_pair(self, \n                       image_path, \n                       label_path=None,\n                       calib_path=None,\n                       style='null',\n                       in_rep = 'coordinates2d',\n                       out_rep = 'R3d+T',\n                       augment=False, \n                       augment_times=1,\n                       add_visibility=True,\n                       add_raw_bbox=False, # add original bbox annotation from KITTI\n                       add_rotation=False, # add orientation angles\n                       bbox_only=False, # only returns raw bounding box\n                       filter_outlier=True,\n                       fieldnames=FIELDNAMES\n                       ):\n        \"\"\"\n        Get (input, output) pair used for training a lifter sub-model from a \n        single image.\n        \"\"\"\n        image_name = image_path.split(osep)[-1]\n        if label_path is None:\n            # default is ground truth annotation\n            label_path = pjoin(self._data_config['label_dir'], image_name[:-3] + 'txt')\n        if calib_path is None:\n            calib_path = pjoin(self._data_config['calib_dir'], image_name[:-3] + 'txt')\n        anns, P = self.load_annotations(label_path, calib_path, fieldnames=fieldnames)\n        # The intrinsics may vary slightly for different images\n        # Yet one may convert them to a fixed one by applying a 
homography\n        K = P[:, :3]\n        # Debug: use pre-defined intrinsic parameters\n        # K = np.array([[707.0493,   0.    , 604.0814],\n        #               [  0.    , 707.0493, 180.5066],\n        #               [  0.    ,   0.    ,   1.    ]], dtype=np.float32)\n        shift = np.linalg.inv(K) @ P[:, 3].reshape(3,1)      \n        # P contains both intrinsics and extrinsics. It is factorized as K[I|K^-1 t] \n        # and the extrinsic part is used to compute the camera coordinates.\n        # Here the extrinsics represent the shift from the current camera to\n        # the reference grayscale camera.        \n        # For more calibration details, refer to \"Vision meets Robotics: The KITTI Dataset\"\n        camera_coordinates = []\n        pose_vecs = []\n        # id includes the class and size of the object\n        ids = []\n        if add_raw_bbox:\n            bboxes = []\n        if add_rotation:\n            rotations = []\n        for i, a in enumerate(anns):\n            a = a.copy()\n            obj_class = a[\"label\"]\n            dimension = a[\"dimensions\"]\n            locs = np.array(a[\"locations\"])\n            rot_y = np.array(a[\"rot_y\"])\n            if add_raw_bbox:\n                bboxes.append(np.array(a[\"bbox\"]).reshape(1,4))\n            if add_rotation:\n                rotations.append(np.array([a[\"alpha\"], a[\"rot_y\"]]).reshape(1,2))\n            # apply data augmentation to represent a larger variation of\n            # 3D pose and translation \n            if bbox_only:\n                continue\n            aug_ids, aug_pose_vecs = self.augment_pose_vector(locs,\n                                                              rot_y,\n                                                              obj_class,\n                                                              dimension,\n                                                              augment,\n                                                              augment_times\n     
                                                         )\n            self.get_cam_cord(camera_coordinates, \n                              shift, \n                              aug_ids, \n                              aug_pose_vecs\n                              )                    \n            ids += aug_ids\n            pose_vecs += aug_pose_vecs\n        num_instances = len(camera_coordinates)\n        # get 2D projections \n        if len(camera_coordinates) != 0:\n            camera_coordinates = np.vstack(camera_coordinates)\n            projected = self.project_3d_to_2d(camera_coordinates, K)[:2, :].T\n            # target is camera coordinates\n            p_2d = np.split(projected, num_instances, axis=0) \n            p_3d = np.split(camera_coordinates, num_instances, axis=0) \n            # set visibility to 0 if the projected keypoints lie out of the image plane\n            if add_visibility:\n                width, height = self.get_img_size(image_path)\n                for idx, joints in enumerate(p_2d):\n                    p_2d[idx] = self.add_visibility(joints, width, height)\n            # filter out the instances that lie outside of the image\n            if filter_outlier:\n                indices = self.get_inlier_indices(p_2d)\n                p_2d = [p_2d[idx] for idx in indices]\n                p_3d = [p_3d[idx] for idx in indices]\n                # p_2d, p_3d = self.filter_outlier(p_2d, p_3d)\n            if filter_outlier and add_raw_bbox:\n                bboxes = [bboxes[idx] for idx in indices]\n            if filter_outlier and add_rotation:\n                rotations = [rotations[idx] for idx in indices]            \n            list_2d, list_3d = self.get_representation(p_2d, p_3d, in_rep, out_rep)\n\n        else:\n            list_2d, list_3d, ids, pose_vecs = [], [], [], []\n        ret = list_2d, list_3d, ids, pose_vecs\n        if add_raw_bbox:\n            ret = ret + (bboxes, )\n        if add_rotation:\n            
ret = ret + (rotations, )\n        return ret            \n    \n    def show_annot(self, \n                   image_path, \n                   label_file=None, \n                   calib_file=None, \n                   save_dir=None\n                   ):\n        \"\"\"\n        Show the annotation of an image.\n        \"\"\"      \n        image_name = image_path.split(osep)[-1]\n        if label_file is None:\n            label_file = pjoin(self._data_config['label_dir'], image_name[:-3] + 'txt')\n        if calib_file is None:\n            calib_file = pjoin(self._data_config['calib_dir'], image_name[:-3] + 'txt')\n        anns, P = self.load_annotations(label_file, calib_file)\n        K = P[:, :3]\n        shift = np.linalg.inv(K) @ P[:, 3].reshape(3,1)        \n        image = cv2.imread(image_path, cv2.IMREAD_UNCHANGED)[:, :, ::-1]\n        fig1 = plt.figure(figsize=(11.3, 9))\n        ax = plt.subplot(111)\n        ax.imshow(image)\n        fig2 = plt.figure(figsize=(11.3, 9))\n        ax = plt.subplot(111)\n        ax.imshow(image)        \n        for i, a in enumerate(anns):\n            a = a.copy()\n            obj_class = a[\"label\"]\n            dimension = a[\"dimensions\"]\n            locs = np.array(a[\"locations\"])\n            rot_y = np.array(a[\"rot_y\"])\n            self.render_car(ax, K, obj_class, rot_y, locs, dimension, shift) \n        if save_dir is not None:\n            output_path1 =  pjoin(save_dir, image_name + '_original.png')\n            output_path2 = pjoin(save_dir, image_name + '_annotated.png')\n            make_dir(output_path1)\n            fig1.savefig(output_path1, dpi=300)\n            fig2.savefig(output_path2, dpi=300)\n        return\n    \n    def _generate_2d_3d_paris(self):\n        \"\"\"\n        Prepare pair of 2D screen coordinates and 3D cuboid representation.\n        \n        \"\"\"\n        path_list = self._data_config['image_path_list']\n        kpt_3d_style = 
self._data_config['3d_kpt_sample_style']\n        in_rep = self._data_config['lft_in_rep']\n        out_rep = self._data_config['lft_out_rep'] # R3d (Relative 3D shape) encodes 3D rotation\n        input_list = []\n        output_list = []\n        id_list = []\n        augment = self._data_config['lft_aug'] if self.split == 'train' else False\n        augment_times = self._data_config['lft_aug_times']\n        for path in path_list:\n            list_2d, list_3d, ids, _ = self.get_2d_3d_pair(path, \n                                                           style=kpt_3d_style,\n                                                           in_rep = in_rep,\n                                                           out_rep = out_rep,\n                                                           augment=augment,\n                                                           augment_times=augment_times,\n                                                           add_visibility=True\n                                                           )            \n            input_list += list_2d\n            output_list += list_3d\n            id_list += ids\n        # does not use visibility as input\n        num_instance = len(input_list)\n        self.input = np.vstack(input_list)[:, :, :2].reshape(num_instance, -1)\n        # use visibility as input\n        # self.input = np.vstack(input_list).reshape(num_instance, -1)\n        self.output = np.vstack(output_list) \n        if hasattr(self, 'root_list'):\n            self.root_list = np.vstack(self.root_list)\n        self.num_joints = int(self.input.shape[1]/2)      \n        return\n    \n    def generate_pairs(self):\n        \"\"\"\n        Prepare data (e.g., input-output pairs and metadata) that will be used \n        depending on the type of experiment.\n        \"\"\"\n        if self.exp_type == '2dto3d':           \n            # generate 2D screen coordinates and 3D cuboid\n            self._generate_2d_3d_paris()\n 
       elif self.exp_type in ['instanceto2d', 'baselinealpha', 'baselinetheta']:\n            # # load the annotations containing cropped car instances \n            # path = pjoin(self._data_config['cropped_dir'], \n            #              self.kpts_style, self.split, 'annot.npy')\n            # assert exists(path), 'Please prepare instance annotation first.'\n            # self.annot_2dpose = np.load(path, allow_pickle=True).item() \n            self.annot_2dpose = self._prepare_2d_pose_annot()\n        elif self.exp_type in ['detection2d']:\n            self._prepare_detection_records()\n            self.total_data = len(self.detection_records)\n        elif self.exp_type == 'inference':\n            self.gather_annotations()\n            self.total_data = len(self.annot_dict)\n            self.annoted_img_paths = list(self.annot_dict.keys())\n        elif self.exp_type == 'finetune':\n            self.gather_annotations(use_raw_bbox=False, \n                                    add_gt=True, \n                                    filter_outlier=True\n                                    )\n            self.total_data = len(self.annot_dict)\n            self.annoted_img_paths = list(self.annot_dict.keys())            \n        else:\n            raise NotImplementedError('Unknown experiment type.')\n        # count of total data\n        if self.exp_type == '2dto3d':\n            self.input = self.input.astype(np.float32())\n            self.output = self.output.astype(np.float32())\n            self.total_data = len(self.input)\n        elif self.exp_type in ['instanceto2d', 'baselinealpha', 'baselinetheta']:\n            self.total_data = len(self.annot_2dpose['paths'])\n        return\n    \n    def visualize(self, plot_num = 1, save_dir=None):\n        \"\"\"\n        Show some random images with annotations.\n        \"\"\"        \n        path_list = self._data_config['image_path_list']\n        chosen = np.random.choice(len(path_list), plot_num, 
replace=False)\n        for img_idx in chosen:\n            self.show_annot(path_list[img_idx], save_dir=save_dir)\n        return\n    \n    def get_collate_fn(self):\n        return my_collate_fn\n    \n    def inference(self, flags=[True, True]):\n        self._inference_mode = flags[0]\n        self._read_img_during_inference = flags[1]\n    \n    def extract_ss_sample(self, cnt):\n        \"\"\"\n        Prepare data for self-supervised representation learning.\n        \"\"\"           \n        # cnt: number of fully supervised samples\n        extract_cnt = self.ss_settings['max_per_img'] - cnt\n        if extract_cnt <= 0:\n            num_channel = 5 if self.hm_para['add_xy'] else 3\n            return torch.zeros(0, num_channel, 256, 256), None, None, None\n        idx = np.random.randint(0, len(self.ss_record['paths']))\n        parameters = self.hm_para\n        parameters['boxes'] = self.ss_record['boxes'][idx]\n        joints = self.ss_record['kpts'][idx]\n        img_name = self.ss_record['paths'][idx].split(osep)[-1]\n        img_path = pjoin(self.ss_settings['img_root'], img_name)\n        image, target, weights, meta = lip.get_tensor_from_img(img_path, \n                                                               parameters, \n                                                               joints=joints,\n                                                               pth_trans=self.pth_trans,\n                                                               rf=parameters['rf'],\n                                                               sf=parameters['sf'],\n                                                               generate_hm=False,\n                                                               max_cnt=extract_cnt\n                                                               )        \n        return image, target, weights, meta\n    \n    def prepare_ft_dict(self, idx):\n        \"\"\"\n        Prepare data for fine-tuning.\n        
\"\"\"  \n        img_name = self.annoted_img_paths[idx]\n        img_annot = self.annot_dict[img_name]\n        ret = {}\n        img_path = pjoin(self._data_config['image_dir'], img_name)\n        kpts = img_annot['kpts']\n        # the croping bounding box in the original image\n        # global_box = self.annot_2dpose['global_box'][idx]\n        parameters = self.hm_para\n        parameters['boxes'] = img_annot['bbox_2d']\n        # fs: fully-supervised ss: self-supervised\n        images_fs, heatmaps_fs, weights_fs, meta_fs = lip.get_tensor_from_img(img_path, \n                                                                              parameters, \n                                                                              joints=kpts,\n                                                                              pth_trans=self.pth_trans,\n                                                                              rf=parameters['rf'],\n                                                                              sf=parameters['sf'],\n                                                                              generate_hm=True)\n        ret['path'] = img_path\n        ret['images_fs'] = images_fs\n        ret['heatmaps_fs'] = heatmaps_fs\n        # ret['meta_fs'] = meta_fs\n        ret['kpts_3d'] = img_annot['kpts_3d']\n        ret['crop_center'] = meta_fs['center']\n        ret['crop_scale'] = meta_fs['scale']\n        ret['kpts_local'] = meta_fs['transformed_joints']\n        # prepare the affine transformation matrices so map local coordinates\n        # back to global screen coordinates\n        ret['af_mats'] = []\n        for idx in range(len(ret['crop_center'])):\n            trans_inv = get_affine_transform(ret['crop_center'][idx],\n                                             ret['crop_scale'][idx], \n                                             0., \n                                             self.hm_para['input_size'], \n                    
                         inv=1)  \n            ret['af_mats'].append(trans_inv)\n        # use random unlabeled images for data augmentation\n        if self.split == 'train' and self.use_ss:\n            images_ss, heatmaps_ss, weights_ss, meta_ss = self.extract_ss_sample(len(images_fs))\n            ret['images_ss'] = images_ss\n            ret['meta_ss'] = meta_ss\n        return ret\n    \n    def __getitem__(self, idx):\n        \"\"\"\n        Required by dataloader.\n        \"\"\"  \n        # only return testing images during inference\n        if self.split == 'test' or self._inference_mode:\n            #TODO: consider classes except for cars in the future\n            img_name = self.annoted_img_paths[idx]\n            # debug: use a specified image for visualization\n            # img_name = \"006658.png\"\n            img_path = pjoin(self._data_config['image_dir'], img_name)\n            if self._read_img_during_inference:\n                image = lip.imread_rgb(img_path)\n            else:\n                image = None\n            if self._read_img_during_inference and hasattr(self, 'pth_trans'):\n                # pytorch transformation if provided\n                image = self.pth_trans(image)\n            record = {'path':img_path}\n            # add other available annotations\n            if hasattr(self, 'annot_dict'):\n                record = {**record, **self.annot_dict[img_name]}\n            return image, record\n        # for training and validation splits\n        if self.exp_type == '2dto3d':\n            # the 2D-3D pairs are stored in RAM\n            meta_data = {}\n            # the 3D global position\n            if hasattr(self, 'root_list'):\n                meta_data['roots'] = self.root_list[idx]\n            return self.input[idx], self.output[idx], np.zeros((0,1)), meta_data\n        elif self.exp_type in ['baselinealpha', 'baselinetheta']:\n            img_path = self.annot_2dpose['paths'][idx]\n            rots = 
self.annot_2dpose['rots'][idx]\n            kpts = self.annot_2dpose['kpts'][idx]\n            if kpts.shape[2] == 2:\n                kpts = np.concatenate([kpts, np.ones((kpts.shape[0], kpts.shape[1], 1))], axis=2)            \n            parameters = self.hm_para\n            parameters['boxes'] = self.annot_2dpose['boxes'][idx]\n            images_fs, heatmaps_fs, weights_fs, meta_fs = lip.get_tensor_from_img(img_path, \n                                                                                  parameters, \n                                                                                  joints=kpts,\n                                                                                  pth_trans=self.pth_trans,\n                                                                                  rf=parameters['rf'],\n                                                                                  sf=parameters['sf'],\n                                                                                  generate_hm=False\n                                                                                  )\n            if self.exp_type == 'baselinealpha':\n                targets = [np.array([[np.cos(rots[idx][0]), np.sin(rots[idx][0])]])  for idx in range(len(rots))]\n                meta_fs['angles_gt'] = rots[:, 0]\n            elif self.exp_type == 'baselinetheta':\n                targets = [np.array([[np.cos(rots[idx][1]), np.sin(rots[idx][1])]]) for idx in range(len(rots))]\n                meta_fs['angles_gt'] = rots[:, 1]\n            targets = torch.from_numpy(np.concatenate(targets).astype(np.float32))\n            return images_fs, targets, weights_fs, meta_fs\n        elif self.exp_type == 'instanceto2d':\n            # the input images and target heatmaps are produced online\n            img_path = self.annot_2dpose['paths'][idx]\n            kpts = self.annot_2dpose['kpts'][idx]\n            # the croping bounding box in the original image\n          
  # global_box = self.annot_2dpose['global_box'][idx]\n            if kpts.shape[2] == 2:\n                kpts = np.concatenate([kpts, np.ones((kpts.shape[0], kpts.shape[1], 1))], axis=2)\n            parameters = self.hm_para\n            parameters['boxes'] = self.annot_2dpose['boxes'][idx]\n            # fs: fully-supervised ss: self-supervised\n            images_fs, heatmaps_fs, weights_fs, meta_fs = \\\n                lip.get_tensor_from_img(img_path, \n                                        parameters, \n                                        joints=kpts,\n                                        pth_trans=self.pth_trans,\n                                        rf=parameters['rf'],\n                                        sf=parameters['sf'],\n                                        generate_hm=True\n                                        )\n            # use random unlabeled images for data augmentation\n            if self.split == 'train' and hasattr(self, 'use_ss') and self.use_ss:\n                images_ss, heatmaps_ss, weights_ss, meta_ss = self.extract_ss_sample(len(images_fs))\n                images = [images_fs, images_ss]\n                targets = heatmaps_fs\n                weights = weights_fs\n                meta = meta_fs\n            else:\n                images = images_fs\n                targets = heatmaps_fs\n                weights = weights_fs\n                meta = meta_fs\n            return images, targets, weights, meta\n        elif self.exp_type == 'detection2d':\n            record = copy.deepcopy(self.detection_records[idx])\n            path = record['path']\n            image = lip.imread_rgb(path)\n            target = record['target']\n            if hasattr(self, 'pth_trans'):\n                # pytorch transformation if provided\n                image = self.pth_trans(image)\n            return image, target\n        elif self.exp_type == 'finetune':\n            # prepare images, 2D and 3D annotations as a 
dictionary for finetuning \n            ret = self.prepare_ft_dict(idx)\n            return ret\n        else:\n            raise NotImplementedError\n\ndef prepare_data(cfgs, logger):\n    \"\"\"\n    Prepare training and validation dataset objects.\n    \"\"\"  \n    train_set = KITTI(cfgs, 'train', logger)\n    valid_set = KITTI(cfgs, 'valid', logger)\n    if cfgs['exp_type'] == '2dto3d':\n        # normalize 2D keypoints\n        valid_set.normalize(train_set.statistics)\n    return train_set, valid_set\n\ndef get_dataset(cfgs, logger, split):\n    return KITTI(cfgs, split, logger)\n\ndef collate_dict(dict_list):\n    ret = {}\n    ret['path'] = [item['path'] for item in dict_list]\n    for key in dict_list[0]:\n        if key == 'path':\n            continue\n        ret[key] = np.concatenate([d[key] for d in dict_list], axis=0)\n    return ret\n\ndef length_limit(instances, targets, target_weights, meta):\n    if len(instances) > MAX_INS_CNT and len(instances) == len(targets):\n        # normal training\n        chosen = np.random.choice(len(instances), MAX_INS_CNT, replace=False)\n        ins, tar, tw, = instances[chosen], targets[chosen], target_weights[chosen]\n        m = {'path':meta['path']}\n        for key in meta:\n            if key != 'path':\n                m[key] = meta[key][chosen]\n    elif len(instances) > MAX_INS_CNT and len(instances) > len(targets) and meta['fs_instance_cnt'] > MAX_INS_CNT:\n        # mixed training: fully-supervised instances are too many\n        chosen = np.random.choice(meta['fs_instance_cnt'], MAX_INS_CNT, replace=False)\n        ins, tar, tw, = instances[chosen], targets[chosen], target_weights[chosen]\n        m = {'path':meta['path']}\n        for key in meta:\n            if key != 'path' and key != 'fs_instance_cnt':\n                m[key] = meta[key][chosen]\n    elif len(instances) > MAX_INS_CNT and len(instances) > len(targets) and meta['fs_instance_cnt'] <= MAX_INS_CNT:\n        # mixed training: 
self-supervised instances are too many\n        ins, tar, tw, m = instances[:MAX_INS_CNT], targets, target_weights, meta\n    else:\n        ins, tar, tw, m = instances, targets, target_weights, meta\n    return ins, tar, tw, m\n\ndef my_collate_fn(batch):\n    # the collate function for 2d pose training\n    instances, targets, target_weights, meta = list(zip(*batch))\n    if isinstance(instances[0], list):\n        # each batch comes in the format of (fs_instances, ss_instances)\n        fs_instances, ss_instances = list(zip(*instances))\n        fs_instances = torch.cat(fs_instances)\n        ss_instances = torch.cat(ss_instances)\n        instances = torch.cat([fs_instances, ss_instances])\n        targets = torch.cat(targets, dim=0)\n        # target_weights = torch.cat(target_weights, dim=0)\n        meta = collate_dict(meta)\n        meta['fs_instance_cnt'] = len(fs_instances)\n    else:\n        instances = torch.cat(instances, dim=0)\n        targets = torch.cat(targets, dim=0)\n        # target_weights = torch.cat(target_weights, dim=0)\n        meta = collate_dict(meta)\n    if target_weights[0] is not None:\n        target_weights = torch.cat(target_weights, dim=0)\n    else:\n        #dummy weight\n        target_weights = torch.ones(1)\n    return length_limit(instances, targets, target_weights, meta)"
  },
  {
    "path": "libs/dataset/__init__.py",
    "content": "#import libs.dataset.ApolloScape\nimport libs.dataset.KITTI\n\n"
  },
  {
    "path": "libs/dataset/basic/__init__.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nEmpty file.\n\"\"\"\n\n\n"
  },
  {
    "path": "libs/dataset/basic/basic_classes.py",
    "content": "\"\"\"\nBasic classes for customized dataset classes to inherit.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\nimport torch.utils.data\nimport libs.dataset.normalization.operations as nop\n\nclass SupervisedDataset(torch.utils.data.Dataset):\n    def __init__(self, cfgs, split, logger=None):\n        self.cfgs = cfgs\n        self.split = split\n        self.logger = logger\n        self.root = cfgs['dataset']['root']\n        return\n    \n    def generate_pairs(self, synthetic=True):\n        # sub-classes need to override this method to specify the inputs and\n        # outputs\n        self.input = None\n        self.output = None\n        self.total_data = 0\n        return\n    \n    def normalize(self, statistics=None):\n        \"\"\" \n        Normalize the (input, output) pairs with optional statistics.\n        \"\"\"\n        if statistics is None:\n            mean_in, std_in = nop.get_statistics_1d(self.input)\n            mean_out, std_out = nop.get_statistics_1d(self.output)\n            self.statistics = {'mean_in': mean_in,\n                               'mean_out': mean_out,\n                               'std_in': std_in,\n                               'std_out': std_out\n                               }\n        else:\n            mean_in, std_in = statistics['mean_in'], statistics['std_in']\n            mean_out, std_out = statistics['mean_out'], statistics['std_out']\n            self.statistics = statistics\n        self.input = nop.normalize_1d(self.input, mean_in, std_in)\n        self.output = nop.normalize_1d(self.output, mean_out, std_out)\n        return\n    \n    def unnormalize(self, data, mean, std):\n        return nop.unnormalize_1d(data, mean, std)\n    \n    def __len__(self):\n        return self.total_data\n\n    def __getitem__(self, idx):\n        return self.input[idx], self.output[idx]"
  },
  {
    "path": "libs/dataset/normalization/__init__.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nEmpty file.\n\"\"\"\n\n\n"
  },
  {
    "path": "libs/dataset/normalization/operations.py",
    "content": "\"\"\"\nDataset normalization operations.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport numpy as np\n\ndef get_statistics_1d(data):\n    \"\"\"\n    Compute statistics of 1D data.\n    \n    data of shape [num_sample, vector_length]\n    \"\"\"  \n    assert len(data.shape) == 2\n    mean = data.mean(axis=0, keepdims=True)\n    std = data.std(axis=0, keepdims=True)\n    return mean, std\n\ndef normalize_1d(data, mean, std, individual=False):\n    \"\"\"\n    Normalizes 1D data with mean and standard deviation.\n    \n    data: np array of shape [num_sample, vector_length]\n    mean: np vector with the mean of the data\n    std: np vector with the standard deviation of the data\n    individual: whether to perform normalization independently for each input\n    \n    Returns\n    data_out: normalized data\n    \"\"\"\n    if individual:\n        # this representation has the implicit assumption that the representation\n        # is translation and scale invariant\n        num_data = len(data)\n        data = data.reshape(num_data, -1, 2)\n        mean_x = np.mean(data[:,:,0], axis=1).reshape(num_data, 1)\n        std_x = np.std(data[:,:,0], axis=1)\n        mean_y = np.mean(data[:,:,1], axis=1).reshape(num_data, 1)\n        std_y = np.std(data[:,:,1], axis=1)\n        denominator = (0.5 * (std_x + std_y)).reshape(num_data, 1)\n        data[:,:,0] = (data[:,:,0] - mean_x)/denominator\n        data[:,:,1] = (data[:,:,1] - mean_y)/denominator\n        data_out = data.reshape(num_data, -1)\n    else:\n        data_out = (data - mean)/std\n    return data_out\n\ndef unnormalize_1d(normalized_data, mean, std):\n    orig_data = normalized_data*std + mean\n    return orig_data"
  },
  {
    "path": "libs/logger/__init__.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nEmpty file.\n\"\"\"\n\n\n"
  },
  {
    "path": "libs/logger/logger.py",
    "content": "\"\"\"\nBasic logging functions.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport logging\nimport os\nimport time\n\nfrom libs.common import utils\n\ninitialized = False\n\ndef get_dirs(cfgs):\n    \"\"\"\n    Prepare file directories for a logger object.\n    \"\"\"     \n    root_output_dir = cfgs['dirs']['output']\n    dataset_name = cfgs['dataset']['name']\n    cfg_name = cfgs['name']\n    final_output_dir = [root_output_dir, dataset_name]    \n    final_output_dir = os.path.join(*final_output_dir)\n    time_str = time.strftime('%Y-%m-%d %H:%M')\n    log_file = '{}_{}.log'.format(cfg_name, time_str)\n    final_log_file = os.path.join(final_output_dir, log_file)\n    return final_output_dir, final_log_file\n\ndef get_logger(cfgs, head = '%(asctime)-15s %(message)s'):\n    \"\"\"\n    Prepare a logger object.\n    \"\"\"     \n    final_output_dir, final_log_file = get_dirs(cfgs)\n    utils.make_dir(final_log_file)\n    logging.basicConfig(filename=str(final_log_file), format=head)\n    logger = logging.getLogger()\n    logger.setLevel(logging.INFO)\n    if len(logger.handlers) == 1:    \n        console = logging.StreamHandler()\n        logging.getLogger('').addHandler(console)    \n    return logger, final_output_dir"
  },
  {
    "path": "libs/loss/__init__.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nEmpty file.\n\"\"\"\n\n\n"
  },
  {
    "path": "libs/loss/function.py",
    "content": "\"\"\"\nLoss functions.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport numpy as np\nfrom scipy.spatial import distance_matrix\n\nfrom libs.common.img_proc import soft_arg_max, appro_cr\n\n\nloss_dict = {'mse': nn.MSELoss(reduction='mean'),\n             'sl1': nn.SmoothL1Loss(reduction='mean'),\n             'l1': nn.L1Loss(reduction='mean')\n             }\n\nclass JointsMSELoss(nn.Module):\n    def __init__(self, use_target_weight):\n        super(JointsMSELoss, self).__init__()\n        self.criterion = nn.MSELoss(reduction='mean')\n        self.use_target_weight = use_target_weight\n\n    def forward(self, output, target, target_weight, meta=None):\n        batch_size = output.size(0)\n        num_joints = output.size(1)\n        heatmaps_pred = output.reshape((batch_size, num_joints, -1)).split(1, 1)\n        heatmaps_gt = target.reshape((batch_size, num_joints, -1)).split(1, 1)\n        loss = 0\n\n        for idx in range(num_joints):\n            heatmap_pred = heatmaps_pred[idx].squeeze()\n            heatmap_gt = heatmaps_gt[idx].squeeze()\n            if self.use_target_weight:\n                loss += 0.5 * self.criterion(\n                    heatmap_pred.mul(target_weight[:, idx]),\n                    heatmap_gt.mul(target_weight[:, idx])\n                )\n            else:\n                loss += 0.5 * self.criterion(heatmap_pred, heatmap_gt)\n\n        return loss / num_joints\n\ndef get_comp_dict(spec_list = ['None', 'None', 'None'], \n                  loss_weights = [1,1,1]\n                  ):\n    comp_dict = {}\n\n    if spec_list[0] != 'None':\n        comp_dict['hm'] = (loss_dict[spec_list[0]], loss_weights[0])\n    if spec_list[1] != 'None':\n        comp_dict['coor'] = (loss_dict[spec_list[1]], loss_weights[1])\n    if spec_list[2] != 'None':\n        comp_dict['cr'] = (loss_dict[spec_list[2]], loss_weights[2])     \n    
return comp_dict\n\nclass JointsCompositeLoss(nn.Module):\n    \"\"\"\n    Loss function for 2d screen coordinate regression which consists of \n    multiple terms.\n    \"\"\"\n    def __init__(self,\n                 spec_list,\n                 img_size,\n                 hm_size,\n                 loss_weights = [1,1,1],\n                 target_cr = None,\n                 cr_loss_thres = 0.15,\n                 use_target_weight=False\n                 ):\n        \"\"\"\n        comp_dict specifies the optional terms used in the loss computation \n        and is built from spec_list.\n        Each component follows the format of [loss_type, weight]:\n        loss_type specifies the loss type for the component (e.g. L1 or L2) while\n        weight gives the weight for this component.\n        \n        hm: a supervised loss defined with a heatmap target\n        coor: a supervised loss defined with 2D coordinates\n        cr: a self-supervised loss defined with prior cross-ratio\n        \"\"\"\n        super(JointsCompositeLoss, self).__init__()\n        self.comp_dict = get_comp_dict(spec_list, loss_weights)\n        self.img_size = img_size\n        self.hm_size = hm_size\n        self.target_cr = target_cr\n        self.use_target_weight = use_target_weight\n        self.apply_cr_loss = False\n        self.cr_loss_thres = cr_loss_thres\n\n    def calc_hm_loss(self, output, target):\n        \"\"\"\n        Heatmap loss which corresponds to L_{hm} in the paper.\n        \n        output: predicted heatmaps of shape [N, K, H, W]\n        target: ground truth heatmaps of shape [N, K, H, W]\n        \"\"\"        \n        batch_size = output.size(0)\n        num_parts = output.size(1)\n        heatmaps_pred = output.reshape((batch_size, num_parts, -1)).split(1, 1)\n        heatmaps_gt = target.reshape((batch_size, num_parts, -1)).split(1, 1)\n        loss = 0\n        for idx in range(num_parts):\n            heatmap_pred = 
heatmaps_pred[idx].squeeze()\n            heatmap_gt = heatmaps_gt[idx].squeeze()\n            loss += 0.5 * self.comp_dict['hm'][0](heatmap_pred, heatmap_gt)        \n        return loss / num_parts\n    \n    def calc_cross_ratio_loss(self, pred_coor, target_cr, mask):\n        \"\"\"\n        Cross-ratio loss which corresponds to L_{cr} in the paper.\n        \n        pred_coor: predicted local coordinates\n        target_cr: ground truth cross ratio\n        \"\"\"  \n        assert hasattr(self, 'cr_indices')\n        # this indices is assumed to be initialized by the user\n        loss = 0\n        mask = mask.to(pred_coor.device)\n        if mask.sum() == 0:\n            return loss\n        for sample_idx in range(len(pred_coor)):\n            for line_idx in range(len(self.cr_indices)):\n                if mask[sample_idx][line_idx] == 0:\n                    continue\n                # predicted cross-ratio square\n                pred_cr_sqr = appro_cr(pred_coor[sample_idx][self.cr_indices[line_idx]])\n                # normalize the predicted cross-ratio square\n                pred_cr_sqr /= target_cr**2\n                line_loss = self.comp_dict['cr'][0](pred_cr_sqr, torch.ones(1).to(pred_cr_sqr.device))\n                loss += line_loss * mask[sample_idx][line_idx][0]\n        return loss/mask.sum()\n    \n    def get_cr_mask(self, coordinates, threshold = 0.15):\n        \"\"\"\n        Mask some edges out when computing the cross-ratio loss.\n        Ignore the fore-shortened edges since they will produce large and \n        unstable gradient.\n        \"\"\"          \n        assert hasattr(self, 'cr_indices')\n        mask = torch.zeros(coordinates.shape[0], len(self.cr_indices), 1)\n        for sample_idx in range(len(coordinates)):\n            for line_idx in range(len(self.cr_indices)):\n                pts = coordinates[sample_idx][self.cr_indices[line_idx]]\n                dm = distance_matrix(pts, pts)\n                minval = 
np.min(dm[np.nonzero(dm)])\n                if minval > threshold:\n                    mask[sample_idx][line_idx] = 1.0\n        return mask\n    \n    def calc_colinear_loss(self):\n        # DEPRECATED\n        return 0.\n    \n    def calc_coor_loss(self, coordinates_pred, coordinates_gt):\n        \"\"\"\n        Coordinate loss which corresponds to L_{2d} in the paper.\n        coordinates_pred: [N, K, 2]\n        coordinates_gt: [N, K, 2]\n        \"\"\"  \n        coordinates_gt[:, :, 0] /= self.img_size[0]\n        coordinates_gt[:, :, 1] /= self.img_size[1]   \n        loss = self.comp_dict['coor'][0](coordinates_pred, coordinates_gt) \n        return loss\n    \n    def forward(self, output, target, target_weight=None, meta=None):\n        \"\"\"\n        Loss evaluation.\n        Output is in the format of (heatmaps, coordinates) where coordinates\n        is optional.\n        target refers to the ground truth heatmaps.\n        \"\"\"  \n        if type(output) is tuple:\n            heatmaps_pred, coordinates_pred = output\n        else:\n            heatmaps_pred, coordinates_pred = output, None\n        total_loss = 0\n        if 'hm' in self.comp_dict:\n            # some heatmaps may be produced by unlabeled data\n            if len(heatmaps_pred) != len(target):\n                heatmaps_pred = heatmaps_pred[:len(target)]\n            total_loss += self.calc_hm_loss(heatmaps_pred, target) * self.comp_dict['hm'][1]\n        if 'coor' in self.comp_dict:\n            coordinates_gt = meta['transformed_joints'][:, :, :2].astype(np.float32)\n            coordinates_gt = torch.from_numpy(coordinates_gt).cuda()           \n            if coordinates_pred is None:\n                coordinates_pred, max_vals = soft_arg_max(heatmaps_pred)\n                coordinates_pred[:, :, 0] /= self.hm_size[1]\n                coordinates_pred[:, :, 1] /= self.hm_size[0]     \n            if len(coordinates_pred) != len(coordinates_gt):\n                
coordinates_pred_fs = coordinates_pred[:len(coordinates_gt)]\n            else:\n                coordinates_pred_fs = coordinates_pred\n            total_loss += self.calc_coor_loss(coordinates_pred_fs, coordinates_gt) * self.comp_dict['coor'][1] \n        if 'cr' in self.comp_dict and self.comp_dict['cr'][1] != \"None\" and self.apply_cr_loss:\n            cr_loss_mask = self.get_cr_mask(coordinates_pred.clone().detach().data.cpu().numpy(), self.cr_loss_thres)\n            total_loss += self.calc_cross_ratio_loss(coordinates_pred, self.target_cr, cr_loss_mask) * self.comp_dict['cr'][1]\n        return total_loss\n    \nclass MSELoss1D(nn.Module):\n    \"\"\"\n    Mean squared error loss.\n    \"\"\"     \n    def __init__(self, use_target_weight=False, reduction='mean'):\n        super(MSELoss1D, self).__init__()\n        self.criterion = nn.MSELoss(reduction=reduction)\n        self.use_target_weight = use_target_weight\n\n    def forward(self, output, target, target_weight=None, meta=None):\n        loss = self.criterion(output, target)\n        return loss\n    \nclass SmoothL1Loss1D(nn.Module):\n    \"\"\"\n    Smooth L1 loss.\n    \"\"\"\n    def __init__(self, use_target_weight=False):\n        super(SmoothL1Loss1D, self).__init__()\n        self.criterion = nn.SmoothL1Loss(reduction='mean')\n        self.use_target_weight = use_target_weight\n\n    def forward(self, output, target, target_weight=None, meta=None):\n        loss = self.criterion(output, target)\n        return loss\n\nclass DecoupledSL1Loss(nn.Module):\n    # DEPRECATED\n    def __init__(self, use_target_weight=None):\n        super(DecoupledSL1Loss, self).__init__()\n        self.criterion = F.smooth_l1_loss\n\n    def forward(self, output, target, target_weight=None):\n        # balance the loss for translation and rotation regression\n        loss_center = self.criterion(output[:, :3], target[:, :3], reduction='mean')\n        loss_else = self.criterion(output[:, 3:], target[:, 3:], 
reduction='mean')\n        return loss_center + loss_else\n    \nclass JointsOHKMMSELoss(nn.Module):\n    # DEPRECATED\n    def __init__(self, use_target_weight, topk=8):\n        super(JointsOHKMMSELoss, self).__init__()\n        self.criterion = nn.MSELoss(reduction='none')\n        self.use_target_weight = use_target_weight\n        self.topk = topk\n\n    def ohkm(self, loss):\n        ohkm_loss = 0.\n        for i in range(loss.size()[0]):\n            sub_loss = loss[i]\n            topk_val, topk_idx = torch.topk(\n                sub_loss, k=self.topk, dim=0, sorted=False\n            )\n            tmp_loss = torch.gather(sub_loss, 0, topk_idx)\n            ohkm_loss += torch.sum(tmp_loss) / self.topk\n        ohkm_loss /= loss.size()[0]\n        return ohkm_loss\n\n    def forward(self, output, target, target_weight):\n        batch_size = output.size(0)\n        num_joints = output.size(1)\n        heatmaps_pred = output.reshape((batch_size, num_joints, -1)).split(1, 1)\n        heatmaps_gt = target.reshape((batch_size, num_joints, -1)).split(1, 1)\n\n        loss = []\n        for idx in range(num_joints):\n            heatmap_pred = heatmaps_pred[idx].squeeze()\n            heatmap_gt = heatmaps_gt[idx].squeeze()\n            if self.use_target_weight:\n                loss.append(0.5 * self.criterion(\n                    heatmap_pred.mul(target_weight[:, idx]),\n                    heatmap_gt.mul(target_weight[:, idx])\n                ))\n            else:\n                loss.append(\n                    0.5 * self.criterion(heatmap_pred, heatmap_gt)\n                )\n\n        loss = [l.mean(dim=1).unsqueeze(dim=1) for l in loss]\n        loss = torch.cat(loss, dim=1)\n\n        return self.ohkm(loss)\n\nclass WingLoss(nn.Module):\n    # DEPRECATED\n    def __init__(self, use_target_weight, width=5, curvature=0.5, image_size=(384, 288)):\n        super(WingLoss, self).__init__()\n        self.width = width\n        self.curvature = curvature\n  
      self.C = self.width - self.width * np.log(1 + self.width / self.curvature)\n        self.image_size = image_size\n        \n    def forward(self, output, target, target_weight):\n        prediction, _ = soft_arg_max(output)\n        # normalize the coordinates to 0-1\n        prediction[:, :, 0] /= self.image_size[1]\n        prediction[:, :, 1] /= self.image_size[0]\n        target[:, :, 0] /= self.image_size[1]\n        target[:, :, 1] /= self.image_size[0]  \n        diff = target - prediction\n        diff_abs = diff.abs()\n        loss = diff_abs.clone()\n\n        idx_smaller = diff_abs < self.width\n        idx_bigger = diff_abs >= self.width\n\n        loss[idx_smaller] = self.width * torch.log(1 + diff_abs[idx_smaller] / self.curvature)\n        loss[idx_bigger]  = loss[idx_bigger] - self.C\n        loss = loss.mean()\n        return loss"
  },
  {
    "path": "libs/metric/criterions.py",
    "content": "\"\"\"\nMetric functions used for validation.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport libs.common.transformation as ltr\nimport libs.common.img_proc as lip\nfrom libs.common.transformation import compute_similarity_transform\n\nimport numpy as np\nimport torch\nfrom scipy.spatial.transform import Rotation\n\n# threshold for percentage of correct key-points (PCK)\nPCK_THRES = np.array([0.1, 0.2, 0.3])\n\ndef get_distance(gt, pred):\n    \"\"\"\n    2D Euclidean distance of two groups of points with visibility considered. \n    \n    gt: [n_joints, 2 or 3]\n    pred: [n_joints, 2]\n    \"\"\"    \n    if gt.shape[1] == 2:\n        sqerr = (gt - pred)**2\n        sqerr = sqerr.sum(axis = 1)\n        dist_list = list(np.sqrt(sqerr))\n    elif gt.shape[1] == 3:\n        dist_list = []\n        sqerr = (gt[:, :2] - pred)**2\n        sqerr = sqerr.sum(axis = 1)\n        indices = np.nonzero(gt[:, 2])[0]\n        dist_list = list(np.sqrt(sqerr[indices]))        \n    else:\n        raise ValueError('Array shape not supported.')\n    return dist_list\n\ndef get_angle_error(pred, meta_data, cfgs=None):\n    \"\"\"\n    Compute error for angle prediction.\n    \"\"\"    \n    if not isinstance(pred, np.ndarray):\n        pred = pred.data.cpu().numpy()    \n    angles_pred = np.arctan2(pred[:,1], pred[:,0])\n    angles_gt = meta_data['angles_gt']\n    dif = np.abs(angles_gt - angles_pred) * 180 / np.pi\n    # add or minus 2*pi\n    indices = dif > 180\n    dif[indices] = 360 - dif[indices]\n    cnt = len(pred)\n    avg_acc = dif.sum()/cnt\n    others = None\n    return avg_acc, cnt, others\n\ndef get_PCK(pred, gt):\n    \"\"\"\n    Get percentage of correct key-points\n    \"\"\"\n    distance = np.array(get_distance(gt, pred))\n    denominator = (gt[:, 1].max() - gt[:, 1].min()) * 1/3\n    correct_cnt = np.zeros((len(PCK_THRES)))\n    for idx, thres in enumerate(PCK_THRES):\n        correct_cnt[idx] = (distance < thres * 
denominator).sum()\n    return correct_cnt\n\ndef get_distance_src(output,\n                     meta_data,\n                     cfgs=None,\n                     image_size = (256.0, 256.0),\n                     arg_max='hard'\n                     ):\n    \"\"\"\n    From predicted heatmaps, obtain local coordinates (\\phi_l in the paper) \n    and transform them back to the source images based on metadata. \n    Error is then evaluated on the source image for the screen coordinates \n    (\\phi_g in the paper).\n    \"\"\"\n    # the error is reported as distance in terms of pixels in the source image\n    if type(output) is tuple:\n        pred, max_vals = output[1].data.cpu().numpy(), None\n    elif isinstance(output, np.ndarray) and arg_max == 'soft':\n        pred, max_vals = lip.soft_arg_max_np(output)\n    elif isinstance(output, torch.Tensor) and arg_max == 'soft': \n        pred, max_vals = lip.soft_arg_max(output)\n    elif isinstance(output, np.ndarray) or isinstance(output, torch.Tensor) and arg_max == 'hard':\n        if not isinstance(output, np.ndarray):\n            output = output.data.cpu().numpy()        \n        pred, max_vals = lip.get_max_preds(output)\n    else:\n        raise NotImplementedError\n    image_size = image_size if cfgs is None else cfgs['heatmapModel']['input_size']\n    width, height = image_size\n    # multiply by down-sample ratio\n    if not isinstance(pred, np.ndarray):\n        pred = pred.data.cpu().numpy()\n    if (max_vals is not None) and (not isinstance(max_vals, np.ndarray)):\n        max_vals = max_vals.data.cpu().numpy()\n    # the coordinates need to be rescaled for different cases\n    if type(output) is tuple:\n        pred *= np.array(image_size).reshape(1, 1, 2)\n    else:\n        pred *= image_size[0] / output.shape[3]\n    # inverse transform and compare pixel distance\n    centers, scales = meta_data['center'], meta_data['scale']\n    # some predictions are generated for unlabeled data\n    if 
len(pred) != len(centers):\n        pred_used = pred[:len(centers)]\n    else:\n        pred_used = pred\n    if 'rotation' in meta_data:\n        rots = meta_data['rotation']\n    else:\n        rots = [0. for i in range(len(centers))]\n    joints_original_batch = meta_data['original_joints']\n    distance_list = []\n    correct_cnt_sum = np.zeros((len(PCK_THRES)))\n    all_src_coordinates = []\n    for sample_idx in range(len(pred_used)):\n        trans_inv = lip.get_affine_transform(centers[sample_idx], \n                                             scales[sample_idx], \n                                             rots[sample_idx], \n                                             (height, width), \n                                             inv=1\n                                             )\n        joints_original = joints_original_batch[sample_idx]        \n        pred_src_coordinates = lip.affine_transform_modified(pred_used[sample_idx], \n                                                             trans_inv\n                                                             ) \n        all_src_coordinates.append(pred_src_coordinates.reshape(1, len(pred_src_coordinates), 2))\n        distance_list += get_distance(joints_original, pred_src_coordinates)\n        correct_cnt_sum += get_PCK(pred_src_coordinates, joints_original)\n    cnt = len(distance_list)\n    avg_acc = sum(distance_list) / cnt\n    others = {\n        'src_coord': np.concatenate(all_src_coordinates, axis=0), # screen coordinates\n        'joints_pred': pred, # predicted local coordinates\n        'max_vals': max_vals, \n        'correct_cnt': correct_cnt_sum,\n        'PCK_batch': correct_cnt_sum / cnt\n        }\n    return avg_acc, cnt, others\n\nclass AngleError():\n    \"\"\"\n    Angle error in degrees. 
\n    \"\"\"  \n    def __init__(self, cfgs, num_joints=None):\n        self.name = 'Angle error in degrees'\n        self.num_joints = num_joints\n        self.count = 0\n        self.mean = 0.\n        return\n  \n    def update(self, prediction, meta_data, ground_truth=None, logger=None):\n        \"\"\"\n        the prediction and transformation parameters in meta_data are used.\n        \"\"\"    \n        avg_acc, cnt, others = get_angle_error(prediction, meta_data)\n        self.mean = (self.mean * self.count + cnt * avg_acc) / (self.count + cnt)\n        self.count += cnt\n        return \n    \n    def report(self, logger):\n        msg = 'Error type: {error_type:s}\\t' \\\n              'Error: {Error}\\t'.format(\n                      error_type = self.name,\n                      Error = self.mean)     \n        logger.info(msg)        \n        return\n\nclass JointDistance2DSIP():\n    \"\"\"\n    Joint distance error evaluated for screen coordinates in the source image plane (SIP). 
\n    \"\"\"  \n    def __init__(self, cfgs, num_joints=None):\n        self.name = 'Joint distance in the source image plane'\n        if num_joints is not None:\n            self.num_joints = num_joints\n        else:\n            self.num_joints = cfgs['heatmapModel']['num_joints']\n        self.image_size = cfgs['heatmapModel']['input_size']\n        if 'arg_max' in cfgs['testing_settings']:\n            self.arg_max = cfgs['testing_settings']['arg_max']\n        else:\n            self.arg_max = None\n        # running statistics must be initialized regardless of the arg_max setting\n        self.count = 0\n        self.mean = 0.\n        self.PCK_counts = np.zeros(len(PCK_THRES))\n        return\n  \n    def update(self, prediction, meta_data, ground_truth=None, logger=None):\n        \"\"\"\n        Update statistics for a batch.\n        The prediction and transformation parameters in meta_data are used.\n        \"\"\"    \n        avg_acc, cnt, others = get_distance_src(prediction, \n                                                meta_data,\n                                                arg_max=self.arg_max,\n                                                image_size=self.image_size\n                                                )       \n        self.mean = (self.mean * self.count + cnt * avg_acc) / (self.count + cnt)\n        self.count += cnt\n        self.PCK_counts += others['correct_cnt']\n        return \n    \n    def report(self, logger):\n        \"\"\"\n        Report final evaluation results.\n        \"\"\"  \n        logger.info(\"Evaluation Results:\")\n        msg = 'Error type: {error_type:s}\\t' \\\n              'MPJPE: {MPJPE}\\t'.format(error_type = self.name, \n                                        MPJPE = self.mean\n                                        )     \n        logger.info(msg)        \n        for idx, value in enumerate(self.PCK_counts):\n            PCK = value / self.count\n            logger.info('PCK at threshold {:.2f}: {:.3f}'.format(PCK_THRES[idx], PCK))        \n        
return\n\ndef update_statistics(self, update, num_data, name_str):\n    \"\"\"\n    Update error statistics for a data batch.\n    \"\"\" \n    old_count = getattr(self, 'count'+name_str)\n    old_mean = getattr(self, 'mean'+name_str)\n    old_max = getattr(self, 'max'+name_str)\n    old_min = getattr(self, 'min'+name_str)\n    new_mean = (old_count * old_mean + np.sum(update, axis=0)) / (old_count + num_data) \n    new_count = old_count + num_data\n    new_max = np.maximum(old_max, update.max(axis=0))\n    new_min = np.minimum(old_min, update.min(axis=0))\n    setattr(self, 'mean'+name_str, new_mean)\n    setattr(self, 'count'+name_str, new_count)\n    setattr(self, 'max'+name_str, new_max)\n    setattr(self, 'min'+name_str, new_min)    \n    return\n\ndef update_rotation_error(self, \n                          prediction, \n                          ground_truth, \n                          meta_data=None, \n                          logger=None,\n                          name_str='',\n                          style='euler'\n                          ):\n    \"\"\"\n    Get rotation error between two 3D point clouds. 
\n    \"\"\"    \n    num_data = len(prediction)\n    prediction = prediction.reshape(num_data, -1, 3)\n    ground_truth = ground_truth.reshape(num_data, -1, 3)\n    if style == 'euler':\n        results = -np.ones((num_data, 3))\n    for data_idx in range(num_data):\n        R, T = ltr.compute_rigid_transform(prediction[data_idx].T, \n                                           ground_truth[data_idx].T\n                                           )\n        if style == 'euler':\n            results[data_idx] = np.abs(Rotation.from_matrix(R).as_euler('xyz', \n                                                                        degrees=True\n                                                                        )\n                                       )\n        else:\n            raise NotImplementedError\n    update_statistics(self, results, num_data, name_str)\n    return\n\ndef update_joints_3d_error(self, \n                           prediction, \n                           ground_truth, \n                           meta_data=None, \n                           logger=None,\n                           name_str='',\n                           style='direct'\n                           ):\n    \"\"\"\n    Get distance error between prediction and ground truth.\n    \"\"\"\n    ground_truth = ground_truth.reshape(len(ground_truth), -1, 3)\n    prediction = prediction.reshape(len(prediction), -1, 3)\n    num_joints = prediction.shape[1]\n    if style == 'procrustes':\n        # Apply procrustes alignment if asked to do so\n        for j in range(len(prediction)):\n            gt  = ground_truth[j]\n            out = prediction[j]\n            _, Z, T, b, c = compute_similarity_transform(gt, out, compute_optimal_scale=True)\n            out = (b * out.dot(T)) + c\n            prediction[j] = np.reshape(out, [num_joints, 3])\n    sqerr = (ground_truth - prediction)**2 \n    distance = np.sqrt(np.sum(sqerr, axis=2))        \n    num_data = len(prediction)\n    
update_statistics(self, distance, num_data, name_str)    \n    # provide detailed L1 errors if there is only one joint\n    if num_joints == 1:\n        error_xyz = np.abs(ground_truth - prediction)\n        update_statistics(self, error_xyz, num_data, name_str + '_xyz')\n    return    \n\nclass RotationError3D():\n    \"\"\"\n    Helper class for recording rotation estimation error.\n    \"\"\"\n    def __init__(self, cfgs):\n        self.name = 'Rotation error'\n        self.style = cfgs['metrics']['R3D']['style']\n        self.count = 0\n        if self.style == 'euler':\n            self.mean = np.zeros((3))\n            self.max = -np.ones((3))\n            self.min = np.ones((3))*1e16\n        return\n    \n    def update(self, prediction, ground_truth, meta_data=None, logger=None):\n        \"\"\"\n        get rotation error between two point clouds \n        \"\"\"    \n        update_rotation_error(self, \n                              prediction, \n                              ground_truth, \n                              meta_data=meta_data, \n                              logger=logger,\n                              style=self.style\n                              )\n        return \n    \n    def report(self, logger):\n        msg = 'Error type: {error_type:s}\\t' \\\n              'Mean error: {mean_error}\\t' \\\n              'Max error: {max_error}\\t' \\\n              'Min error: {min_error}\\t'.format(\n                      error_type = self.name,\n                      mean_error= self.mean, \n                      max_error= self.max,\n                      min_error= self.min\n                      )     \n        logger.info(msg)        \n        return\n    \nclass JointDistance3D():\n    \"\"\"\n    Helper class for recording joint distance error.\n    \"\"\"\n    def __init__(self, cfgs):\n        self.name = 'Joint distance'\n        self.style = cfgs['metrics']['JD3D']['style']\n        self.num_joints = 
int(cfgs['FCModel']['output_size']/3)\n        self.count = 0\n        if self.style in ['direct', 'procrustes']:\n            self.mean = np.zeros((self.num_joints))\n            self.max = -np.ones((self.num_joints))\n            self.min = np.ones((self.num_joints))*1e16\n        else:\n            raise NotImplementedError\n        return\n  \n    def update(self, prediction, ground_truth, meta_data=None, logger=None):\n        \"\"\"\n        get Euclidean distance between two point clouds \n        \"\"\"    \n        update_joints_3d_error(self, \n                               prediction,\n                               ground_truth,\n                               meta_data=meta_data,\n                               logger=logger,\n                               name_str='',\n                               style=self.style\n                               )        \n        return \n    \n    def report(self, logger):\n        MPJPE = self.mean.sum() / self.num_joints\n        msg = 'Error type: {error_type:s}\\t' \\\n              'MPJPE: {MPJPE}\\t' \\\n              'Mean error for each joint: {mean_error}\\t' \\\n              'Max error for each joint: {max_error}\\t' \\\n              'Min error for each joint: {min_error}\\t'.format(\n                      error_type = self.name,\n                      MPJPE = MPJPE,\n                      mean_error= self.mean, \n                      max_error= self.max,\n                      min_error= self.min\n                      )     \n        logger.info(msg)        \n        return\n\nclass RError3D():\n    def __init__(self, cfgs, num_joints):\n        \"\"\"\n        Relative shape error\n        The point cloud should have a format [shape_relative_to_root]\n        \"\"\"           \n        self.name = 'RError3D'\n        self.T_style = cfgs['metrics']['R3D']['T_style']\n        self.R_style = cfgs['metrics']['R3D']['R_style']\n        if cfgs['dataset']['3d_kpt_sample_style'] == 'bbox9': \n           
 self.num_joints = num_joints - 1 # discount the root joint\n        else:\n            raise NotImplementedError\n        self.count_rT = self.count_R = 0\n        # translation error of the shape relative to the root\n        self.mean_rT = np.zeros((self.num_joints))\n        self.max_rT = -np.ones((self.num_joints))\n        self.min_rT = np.ones((self.num_joints))*1e16            \n        # relative rotation between the ground truth shape and predicted shape\n        self.mean_R = np.zeros((3))\n        self.max_R = -np.ones((3))\n        self.min_R = np.ones((3))*1e16            \n        return\n  \n    def update(self, prediction, ground_truth, meta_data=None, logger=None):\n        update_joints_3d_error(self, \n                               prediction=prediction,\n                               ground_truth=ground_truth,\n                               meta_data=meta_data,\n                               logger=logger,\n                               name_str='_rT',\n                               style=self.T_style\n                               )\n        update_rotation_error(self,\n                              prediction=prediction,\n                              ground_truth=ground_truth,\n                              meta_data=meta_data,\n                              logger=logger,\n                              name_str='_R',\n                              style=self.R_style\n                              )        \n        return \n    \n    def report(self, logger):\n        MPJPE = self.mean_rT.sum() / self.num_joints\n        msg = 'Error type: {error_type:s}\\t' \\\n              'MPJPE of the shape relative to the root:\\t' \\\n              'MPJPE: {MPJPE}\\t' \\\n              'Rotation error of the shape relative to the root:\\t' \\\n              'Mean error: {mean_R}\\t' \\\n              'Max error: {max_R}\\t' \\\n              'Min error: {min_R}\\t'.format(\n                  error_type = self.name,\n                  MPJPE = 
MPJPE,\n                  mean_R = self.mean_R,\n                  max_R = self.max_R,\n                  min_R = self.min_R\n                  )     \n        logger.info(msg)        \n        return\n    \nclass RTError3D():\n    def __init__(self, cfgs, num_joints):\n        \"\"\"\n        Rotation and translation error combined.\n        The point cloud should have a format [root, shape_relative_to_root]\n        \"\"\"           \n        self.name = 'RTError3D'\n        self.T_style = cfgs['metrics']['RTError3D']['T_style']\n        self.R_style = cfgs['metrics']['RTError3D']['R_style']\n        if cfgs['dataset']['3d_kpt_sample_style'] == 'bbox9': \n            self.num_joints = num_joints - 1 # discount the root joint\n        else:\n            raise NotImplementedError\n        self.count_T = self.count_T_xyz = self.count_rT = self.count_R = 0\n        if self.T_style in ['direct', 'procrustes']:\n            # translation error of the root vector\n            self.mean_T = np.zeros((1))\n            # L1 error for each component\n            self.mean_T_xyz = np.zeros((3))\n            self.max_T = -np.ones((1))\n            self.max_T_xyz = -np.ones((3))\n            self.min_T = np.ones((1))*1e16\n            self.min_T_xyz = np.ones((3))*1e16\n            # translation error of the shape relative to the root\n            self.mean_rT = np.zeros((self.num_joints))\n            self.max_rT = -np.ones((self.num_joints))\n            self.min_rT = np.ones((self.num_joints))*1e16            \n        else:\n            raise NotImplementedError\n        # relative rotation between the ground truth shape and predicted shape\n        self.mean_R = np.zeros((3))\n        self.max_R = -np.ones((3))\n        self.min_R = np.ones((3))*1e16            \n        return\n  \n    def update(self, prediction, ground_truth, meta_data=None, logger=None):\n        update_joints_3d_error(self, \n                               prediction=prediction[:, :3],\n              
                 ground_truth=ground_truth[:, :3],\n                               meta_data=meta_data,\n                               logger=logger,\n                               name_str='_T',\n                               style=self.T_style\n                               )\n        update_joints_3d_error(self, \n                               prediction=prediction[:, 3:],\n                               ground_truth=ground_truth[:, 3:],\n                               meta_data=meta_data,\n                               logger=logger,\n                               name_str='_rT',\n                               style=self.T_style\n                               )\n        update_rotation_error(self,\n                              prediction=prediction[:, 3:],\n                              ground_truth=ground_truth[:, 3:],\n                              meta_data=meta_data,\n                              logger=logger,\n                              name_str='_R',\n                              style=self.R_style\n                              )        \n        return \n    \n    def report(self, logger):\n        MPJPE = self.mean_rT.sum() / self.num_joints\n        msg = 'Error type: {error_type:s}\\t' \\\n              'Translation error of the root:\\t' \\\n              'Mean error: {mean_T}\\t' \\\n              'Max error: {max_T}\\t' \\\n              'Min error: {min_T}\\t' \\\n              'Translation error of the root in three directions:\\t' \\\n              'Mean error (L1): {mean_T_xyz}\\t' \\\n              'MPJPE of the shape relative to the root:\\t' \\\n              'MPJPE: {MPJPE}\\t' \\\n              'Rotation error of the shape relative to the root:\\t' \\\n              'Mean error: {mean_R}\\t' \\\n              'Max error: {max_R}\\t' \\\n              'Min error: {min_R}\\t'.format(\n                  error_type = self.name,\n                  MPJPE = MPJPE,\n                  mean_T = self.mean_T, \n                  max_T 
= self.max_T,\n                  min_T = self.min_T,\n                  mean_T_xyz = self.mean_T_xyz,\n                  mean_R = self.mean_R,\n                  max_R = self.max_R,\n                  min_R = self.min_R)     \n        logger.info(msg)        \n        return\n    \nclass Evaluator():\n    \"\"\"\n    Helper class for recording a list of pre-defined metrics.\n    \"\"\"    \n    def __init__(self, metrics, cfgs=None, num_joints=9):\n        \"\"\"\n        metrics is a list of strings specifying what metrics to use\n        \"\"\"\n        self.metrics = []\n        for metric in metrics:\n            self.metrics.append(eval(metric + '(cfgs=cfgs, num_joints=num_joints)'))\n        return\n    \n    def update(self, \n               prediction, \n               ground_truth=None,\n               meta_data=None,\n               logger=None\n               ):\n        \"\"\"\n        update evaluation with a new batch of prediction and ground truth\n        \"\"\"        \n        for metric in self.metrics:\n            metric.update(prediction, \n                          ground_truth=ground_truth,\n                          meta_data=meta_data,\n                          logger=logger\n                          )\n        return \n    \n    def report(self, logger):\n        for metric in self.metrics:\n            metric.report(logger)       \n        return\n"
  },
  {
    "path": "libs/model/FCmodel.py",
    "content": "\"\"\"\nFully-connected model architecture for processing 1D data.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\nimport torch.nn as nn\n\nclass ResidualBlock(nn.Module):\n    def __init__(self, \n                 num_neurons, \n                 p_dropout=0.5, \n                 kaiming=False, \n                 leaky=False):\n        super(ResidualBlock, self).__init__()\n        self.num_neurons = num_neurons\n        self.leaky = leaky\n        self.p_dropout = p_dropout\n        if leaky:\n            self.relu = nn.LeakyReLU(inplace=True)\n        else:\n            self.relu = nn.ReLU(inplace=True)\n        self.dropout = nn.Dropout(p_dropout)\n        self.w1 = nn.Linear(self.num_neurons, self.num_neurons)\n        self.batch_norm1 = nn.BatchNorm1d(self.num_neurons)\n        self.w2 = nn.Linear(self.num_neurons, self.num_neurons)\n        self.batch_norm2 = nn.BatchNorm1d(self.num_neurons)\n        if kaiming:\n            # kaiming initialization\n            self.w1.weight.data = nn.init.kaiming_normal_(self.w1.weight.data)\n            self.w2.weight.data = nn.init.kaiming_normal_(self.w2.weight.data)\n            \n    def forward(self, x):\n        y = self.w1(x)\n        y = self.batch_norm1(y)\n        y = self.relu(y)\n        y = self.dropout(y)\n        y = self.w2(y)\n        y = self.batch_norm2(y)\n        y = self.relu(y)\n        y = self.dropout(y)\n        out = x + y\n        return out\n\nclass FCModel(nn.Module):\n    def __init__(self,\n                 stage_id=1,\n                 num_neurons=1024,\n                 num_blocks=2,\n                 p_dropout=0.5,\n                 norm_twoD=False,\n                 kaiming=False,\n                 refine_3d=False, \n                 leaky=False,\n                 dm=False,\n                 input_size=32,\n                 output_size=64):\n        \"\"\"\n        dm: use distance matrix feature computed from coordinates (DEPRECATED)\n        leaky: 
use leaky ReLu instead of normal Relu\n        \"\"\"\n        super(FCModel, self).__init__()\n        self.num_neurons = num_neurons\n        self.p_dropout = p_dropout\n        self.num_blocks = num_blocks\n        self.stage_id = stage_id\n        self.refine_3d = refine_3d\n        self.leaky = leaky\n        self.dm = dm \n        self.input_size = input_size        \n        self.output_size = output_size\n        # map the input to a representation vector\n        self.w1 = nn.Linear(self.input_size, self.num_neurons)\n        self.batch_norm1 = nn.BatchNorm1d(self.num_neurons)\n        self.res_blocks = []\n        for l in range(num_blocks):\n            self.res_blocks.append(ResidualBlock(num_neurons=self.num_neurons, \n                                                 p_dropout=self.p_dropout,\n                                                 leaky=self.leaky))\n        self.res_blocks = nn.ModuleList(self.res_blocks)\n        # output\n        self.w2 = nn.Linear(self.num_neurons, self.output_size)\n        if self.leaky:\n            self.relu = nn.LeakyReLU(inplace=True)\n        else:\n            self.relu = nn.ReLU(inplace=True)\n        self.dropout = nn.Dropout(self.p_dropout)\n        if kaiming:\n            self.w1.weight.data = nn.init.kaiming_normal_(self.w1.weight.data)\n            self.w2.weight.data = nn.init.kaiming_normal_(self.w2.weight.data)\n            \n    def forward(self, x):\n        y = self.get_representation(x)\n        y = self.w2(y)\n        return y\n    \n    def get_representation(self, x):\n        y = self.w1(x)\n        y = self.batch_norm1(y)\n        y = self.relu(y)\n        y = self.dropout(y)\n        # residual blocks\n        for i in range(self.num_blocks):\n            y = self.res_blocks[i](y)        \n        return y\n    \ndef get_fc_model(stage_id, \n                 cfgs,\n                 input_size,\n                 output_size,\n                 architecture_type = 'FCModel'):\n    return 
FCModel(stage_id=stage_id, \n                   refine_3d=cfgs[architecture_type]['refine_3d'], \n                   norm_twoD=cfgs[architecture_type]['norm_twoD'],\n                   num_blocks=cfgs[architecture_type]['num_blocks'], \n                   input_size=input_size, \n                   output_size=output_size, \n                   num_neurons=cfgs[architecture_type]['num_neurons'],\n                   p_dropout=cfgs[architecture_type]['dropout'],\n                   leaky=cfgs[architecture_type]['leaky']\n                   )\n    \ndef get_cascade():\n    return nn.ModuleList([])"
  },
  {
    "path": "libs/model/__init__.py",
    "content": "import libs.model.heatmapModel.hrnet\nimport libs.model.heatmapModel.resnet\n"
  },
  {
    "path": "libs/model/egonet.py",
    "content": "\"\"\"\nA PyTorch implementation of Ego-Net.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport torch\nimport torch.nn as nn\nimport numpy as np\nimport cv2\nimport math\n\nfrom scipy.spatial.transform import Rotation\nfrom os.path import join as pjoin\n\nimport libs.model as models\nimport libs.model.FCmodel as FCmodel\nimport libs.dataset.normalization.operations as nop\nimport libs.visualization.egonet_utils as vego\nimport libs.common.transformation as ltr\n\nfrom libs.common.img_proc import to_npy, get_affine_transform, generate_xy_map, modify_bbox\nfrom libs.common.img_proc import affine_transform_modified\nfrom libs.common.format import save_txt_file, get_pred_str\nfrom libs.dataset.KITTI.car_instance import interp_dict\n\nclass EgoNet(nn.Module):\n    def __init__(self,\n                 cfgs,\n                 pre_trained=False\n                 ):\n        \"\"\"\n        Initialization method of Ego-Net.\n        \"\"\"\n        super(EgoNet, self).__init__()\n        # initialize a fully-convolutional heatmap regression model\n        # this model corresponds to H and C in Equation (2)\n        hm_model_settings = cfgs['heatmapModel']\n        hm_model_name = hm_model_settings['name']\n        # this implementation uses a HR-Net backbone, yet you can use other \n        # backbones as well\n        method_str = 'models.heatmapModel.' 
+ hm_model_name + '.get_pose_net'\n        self.HC = eval(method_str)(cfgs, is_train=False)\n        self.resolution = cfgs['heatmapModel']['input_size']\n        # optional channel augmentation\n        if 'add_xy' in cfgs['heatmapModel']:\n            self.xy_dict = {'flag':cfgs['heatmapModel']['add_xy']}\n        else:\n            self.xy_dict = None\n        # initialize a lifting model\n        # this corresponds to L in Equation (2) \n        self.L = FCmodel.get_fc_model(stage_id=1, \n                                      cfgs=cfgs, \n                                      input_size=cfgs['FCModel']['input_size'],\n                                      output_size=cfgs['FCModel']['output_size']\n                                      )\n        if pre_trained:\n            # load pre-trained checkpoints\n            HC_path = pjoin(cfgs['dirs']['ckpt'], 'HC.pth')\n            L_path = pjoin(cfgs['dirs']['ckpt'], 'L.pth')\n            LS_path = pjoin(cfgs['dirs']['ckpt'], 'LS.npy')\n            self.HC.load_state_dict(torch.load(HC_path))\n            # the statistics used by the lifter for normalizing inputs\n            self.LS = np.load(LS_path, allow_pickle=True).item()\n            self.L.load_state_dict(torch.load(L_path))\n    \n    def crop_single_instance(self, \n                             img, \n                             bbox, \n                             resolution, \n                             pth_trans=None, \n                             xy_dict=None\n                             ):\n        \"\"\"\n        Crop a single instance given an image and bounding box.\n        \"\"\"\n        bbox = to_npy(bbox)\n        width, height = resolution\n        target_ar = height / width\n        ret = modify_bbox(bbox, target_ar)\n        c, s, r = ret['c'], ret['s'], 0.\n        # xy_dict: parameters for adding xy coordinate maps\n        trans = get_affine_transform(c, s, r, (height, width))\n        instance = cv2.warpAffine(img,\n               
                   trans,\n                                  (int(resolution[0]), int(resolution[1])),\n                                  flags=cv2.INTER_LINEAR\n                                  )\n        #cv2.imwrite('instance.jpg', instance)\n        if xy_dict is not None and xy_dict['flag']:\n            xymap = generate_xy_map(ret['bbox'], resolution, img.shape[:-1])\n            instance = np.concatenate([instance, xymap.astype(np.float32)], axis=2)        \n        instance = instance if pth_trans is None else pth_trans(instance)\n        return instance\n    \n    def load_cv2(self, path, rgb=True):\n        # read a 3-channel image and ignore the EXIF orientation flag\n        data_numpy = cv2.imread(path, cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)    \n        if data_numpy is None:\n            raise ValueError('Failed to read {}'.format(path))    \n        if rgb:\n            data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB)         \n        return data_numpy\n    \n    def crop_instances(self, \n                       annot_dict, \n                       resolution, \n                       pth_trans=None, \n                       rgb=True,\n                       xy_dict=None\n                       ):\n        \"\"\"\n        Crop input instances given an annotation dictionary.\n        \"\"\"\n        all_instances = []\n        # each record stores attributes of one instance\n        all_records = []\n        target_ar = resolution[1] / resolution[0]\n        for idx, path in enumerate(annot_dict['path']):\n            # forward the rgb flag so that the caller's choice is respected\n            data_numpy = self.load_cv2(path, rgb=rgb)\n            boxes = annot_dict['boxes'][idx]\n            if 'labels' in annot_dict:\n                labels = annot_dict['labels'][idx]\n            else:\n                labels = -np.ones((len(boxes)), dtype=np.int64)\n            if 'scores' in annot_dict:\n                scores = annot_dict['scores'][idx]\n            else:\n                scores = -np.ones((len(boxes)))\n            if len(boxes) == 0:\n                continue\n            for idx, bbox in enumerate(boxes):\n           
     # crop an instance with required aspect ratio\n                instance = self.crop_single_instance(data_numpy,\n                                                     bbox, \n                                                     resolution, \n                                                     pth_trans=pth_trans,\n                                                     xy_dict=xy_dict\n                                                     )\n                bbox = to_npy(bbox)\n                ret = modify_bbox(bbox, target_ar)\n                c, s, r = ret['c'], ret['s'], 0.\n                all_instances.append(torch.unsqueeze(instance, dim=0))\n                all_records.append({\n                    'path': path,\n                    'center': c,\n                    'scale': s,\n                    'bbox': bbox,\n                    'bbox_resize': ret['bbox'],\n                    'rotation': r,\n                    'label': labels[idx],\n                    'score': scores[idx]\n                    }\n                    )\n        return torch.cat(all_instances, dim=0), all_records\n\n    def add_orientation_arrow(self, record):\n        \"\"\"\n        Generate an arrow for each predicted orientation for visualization.\n        \"\"\"      \n        pred_kpts = record['kpts_3d_pred']\n        gt_kpts = record['kpts_3d_gt']\n        K = record['K']\n        arrow_2d = np.zeros((len(pred_kpts), 2, 2))\n        for idx in range(len(pred_kpts)):\n            vector_3d = (pred_kpts[idx][1] - pred_kpts[idx][5])\n            arrow_3d = np.concatenate([gt_kpts[idx][0].reshape(3, 1), \n                                      (gt_kpts[idx][0] + vector_3d).reshape(3, 1)],\n                                      axis=1)\n            projected = K @ arrow_3d\n            arrow_2d[idx][0] = projected[0, :] / projected[2, :]\n            arrow_2d[idx][1] = projected[1, :] / projected[2, :]\n            # fix the arrow length if not fore-shortened\n            vector_2d = 
arrow_2d[idx][:,1] - arrow_2d[idx][:,0]\n            length = np.linalg.norm(vector_2d)\n            if length > 50:\n                vector_2d = vector_2d/length * 60\n            arrow_2d[idx][:,1] = arrow_2d[idx][:,0] + vector_2d\n        return arrow_2d\n\n    def write_annot_dict(self, annot_dict, records):\n        \"\"\"\n        Copy the available annotations of each image into its record.\n        \"\"\"\n        for idx, path in enumerate(annot_dict['path']):\n            if 'boxes' in annot_dict:\n                records[path]['boxes'] = to_npy(annot_dict['boxes'][idx])\n            if 'kpts' in annot_dict:\n                records[path]['kpts_2d_gt'] = to_npy(annot_dict['kpts'][idx])   \n            if 'kpts_3d_gt' in annot_dict:\n                records[path]['kpts_3d_gt'] = to_npy(annot_dict['kpts_3d_gt'][idx])   \n            if 'pose_vecs_gt' in annot_dict:            \n                records[path]['pose_vecs_gt'] = to_npy(annot_dict['pose_vecs_gt'][idx])  \n            if 'kpts_3d_before' in annot_dict:\n                records[path]['kpts_3d_before'] = to_npy(annot_dict['kpts_3d_before'][idx])  \n            if 'raw_txt_format' in annot_dict:\n                # list of annotation dictionaries, one per instance\n                records[path]['raw_txt_format'] = annot_dict['raw_txt_format'][idx]\n            if 'K' in annot_dict:\n                # camera intrinsics used to project 3D points for this image\n                records[path]['K'] = annot_dict['K'][idx]\n            if 'kpts_3d_gt' in annot_dict and 'K' in annot_dict:\n                records[path]['arrow'] = self.add_orientation_arrow(records[path])\n        return records\n\n    def get_observation_angle_trans(self, euler_angles, translations):\n        \"\"\"\n        Convert the orientation in the camera coordinate system into the local\n        coordinate system using the known object location (translation).\n        \"\"\" \n        alphas = euler_angles[:,1].copy()\n        for idx in range(len(euler_angles)):\n            ry3d = euler_angles[idx][1] # orientation in the camera coordinate system\n            x3d, 
z3d = translations[idx][0], translations[idx][2]\n            alpha = ry3d - math.atan2(-z3d, x3d) - 0.5 * math.pi\n            #alpha = ry3d - math.atan2(x3d, z3d)# - 0.5 * math.pi\n            while alpha > math.pi: alpha -= math.pi * 2\n            while alpha < (-math.pi): alpha += math.pi * 2\n            alphas[idx] = alpha\n        return alphas\n    \n    def get_observation_angle_proj(self, euler_angles, kpts, K):\n        \"\"\"\n        Convert orientation in camera coordinate into local coordinate system\n        utilizing the projection of object on the image plane\n        \"\"\" \n        f = K[0,0]\n        cx = K[0,2]\n        kpts_x = [kpts[i][0,0] for i in range(len(kpts))]\n        alphas = euler_angles[:,1].copy()\n        for idx in range(len(euler_angles)):\n            ry3d = euler_angles[idx][1] # orientation in the camera coordinate system\n            x3d, z3d = kpts_x[idx] - cx, f\n            alpha = ry3d - math.atan2(-z3d, x3d) - 0.5 * math.pi\n            #alpha = ry3d - math.atan2(x3d, z3d)# - 0.5 * math.pi\n            while alpha > math.pi: alpha -= math.pi * 2\n            while alpha < (-math.pi): alpha += math.pi * 2\n            alphas[idx] = alpha\n        return alphas\n\n    def get_template(self, prediction, interp_coef=[0.332, 0.667]):\n        \"\"\"\n        Construct a template 3D cuboid at canonical pose. 
The 3D cuboid is \n        represented as part coordinates in the camera coordinate system.\n        \"\"\" \n        parents = prediction[interp_dict['bbox12'][0] - 1]\n        children = prediction[interp_dict['bbox12'][1] - 1]\n        lines = parents - children\n        lines = np.sqrt(np.sum(lines**2, axis=1))\n        # averaged over the four parallel line segments\n        h, l, w = np.sum(lines[:4])/4, np.sum(lines[4:8])/4, np.sum(lines[8:])/4\n        x_corners = [l, l, l, l, 0, 0, 0, 0]\n        y_corners = [0, h, 0, h, 0, h, 0, h]\n        z_corners = [w, w, 0, 0, w, w, 0, 0]\n        x_corners += - np.float32(l) / 2\n        y_corners += - np.float32(h)\n        #y_corners += - np.float32(h/2)\n        z_corners += - np.float32(w) / 2\n        corners_3d = np.array([x_corners, y_corners, z_corners])    \n        if len(prediction) == 32:\n            pidx, cidx = interp_dict['bbox12']\n            parents, children = corners_3d[:, pidx - 1], corners_3d[:, cidx - 1]\n            lines = children - parents\n            new_joints = [(parents + interp_coef[i]*lines) for i in range(len(interp_coef))]\n            corners_3d = np.hstack([corners_3d, np.hstack(new_joints)])    \n        return corners_3d\n\n    def kpts_to_euler(self, template, prediction):\n        \"\"\"\n        Convert the predicted cuboid representation to euler angles.\n        \"\"\"    \n        # estimate roll, pitch, yaw of the prediction by comparing with a \n        # reference bounding box\n        # prediction and template of shape [3, N_points]\n        R, T = ltr.compute_rigid_transform(template, prediction)\n        # in the order of yaw, pitch and roll\n        angles = Rotation.from_matrix(R).as_euler('yxz', degrees=False)\n        # re-order in the order of x, y and z\n        angles = angles[[1,0,2]]\n        return angles, T\n\n    def get_6d_rep(self, predictions, ax=None, color=\"black\"):\n        \"\"\"\n        Get the 6DoF representation of a 3D prediction.\n       
 \"\"\"    \n        predictions = predictions.reshape(len(predictions), -1, 3)\n        all_angles = []\n        for instance_idx in range(len(predictions)):\n            prediction = predictions[instance_idx]\n            # templates are 3D boxes with no rotation\n            # the prediction is estimated as the rotation between prediction and template\n            template = self.get_template(prediction)\n            instance_angle, instance_trans = self.kpts_to_euler(template, prediction.T)        \n            all_angles.append(instance_angle.reshape(1, 3))\n        angles = np.concatenate(all_angles)\n        # the first point is the predicted point center\n        translation = predictions[:, 0, :]    \n        return angles, translation\n\n    def gather_lifting_results(self,\n                               record,\n                               data,\n                               prediction, \n                               target=None,\n                               pose_vecs_gt=None,\n                               intrinsics=None, \n                               refine=False, \n                               visualize=False,\n                               template=None,\n                               dist_coeffs=np.zeros((4,1)),\n                               color='r',\n                               get_str=False,\n                               alpha_mode='trans'\n                               ):\n        \"\"\"\n        Convert network outputs to pose angles.\n        \"\"\"\n        # prepare the prediction strings for submission\n        # compute the roll, pitch and yaw angle of the predicted bounding box\n        record['euler_angles'], record['translation'] = \\\n            self.get_6d_rep(record['kpts_3d_pred'])\n        if alpha_mode == 'trans':\n            record['alphas'] = self.get_observation_angle_trans(record['euler_angles'], \n                                                                record['translation']\n             
                                                   )\n        elif alpha_mode == 'proj':\n            record['alphas'] = self.get_observation_angle_proj(record['euler_angles'],\n                                                               record['kpts_2d_pred'],\n                                                               record['K']\n                                                               )        \n        else:\n             raise NotImplementedError   \n        if get_str:\n            record['pred_str'] = get_pred_str(record)      \n        if visualize:\n            record = vego.plot_3d_objects(prediction, \n                                          target, \n                                          pose_vecs_gt, \n                                          record, \n                                          color\n                                          )\n        return record\n\n    def plot_one_image(self, \n                       img_path, \n                       record, \n                       visualize=False,\n                       color_dict={'bbox_2d':'r',\n                                   'bbox_3d':'r',\n                                   'kpts':['rx', 'b']\n                                   },\n                       save_dict={'flag':False,\n                                  'save_dir':None\n                                  },\n                       alpha_mode='trans'\n                       ):\n        \"\"\"\n        Post-process and plot the predictions from one image.\n        \"\"\"\n        if visualize:\n            # plot 2D predictions \n            vego.plot_2d_objects(img_path, record, color_dict)\n        # plot 3d bounding boxes\n        all_kpts_2d = np.concatenate(record['kpts_2d_pred'])\n        all_kpts_3d_pred = record['kpts_3d_pred'].reshape(len(record['kpts_3d_pred']), -1)\n        if 'kpts_3d_gt' in record:\n            all_kpts_3d_gt = record['kpts_3d_gt']\n            all_pose_vecs_gt = 
record['pose_vecs_gt']\n        else:\n            all_kpts_3d_gt = None\n            all_pose_vecs_gt = None\n        # refine and gather the prediction strings\n        record = self.gather_lifting_results(record,\n                                             all_kpts_2d,\n                                             all_kpts_3d_pred, \n                                             all_kpts_3d_gt,\n                                             all_pose_vecs_gt,\n                                             color=color_dict['bbox_3d'],\n                                             alpha_mode=alpha_mode,\n                                             visualize=visualize,\n                                             get_str=save_dict['flag']\n                                             )\n\n        # save KITTI-style prediction file in .txt format\n        save_txt_file(img_path, record, save_dict)\n        return record\n\n    def post_process(self, \n                     records,\n                     visualize=False, \n                     color_dict={'bbox_2d':'r',\n                                 # plot_one_image reads color_dict['bbox_3d']\n                                 'bbox_3d':'r',\n                                 'kpts':['ro', 'b'],\n                                 },\n                     save_dict={'flag':False,\n                                'save_dir':None\n                                },\n                     alpha_mode='trans'\n                     ):\n        \"\"\"\n        Save the predictions and visualize them optionally.\n        \"\"\"   \n        for img_path in records.keys():\n            print(\"Processing {:s}\".format(img_path))\n            records[img_path] = self.plot_one_image(img_path, \n                                                    records[img_path],\n                                                    visualize=visualize,\n                                                    color_dict=color_dict,\n                                                    save_dict=save_dict,\n                                                    alpha_mode=alpha_mode\n        
                                            )      \n        return records\n    \n    def new_img_dict(self):\n        \"\"\"\n        An empty dictionary for image-level records.\n        \"\"\"\n        img_dict = {'center':[], \n                    'scale':[], \n                    'rotation':[], \n                    'bbox_resize':[], # resized bounding box \n                    'kpts_2d_pred':[], \n                    'label':[], \n                    'score':[] \n                    }        \n        return img_dict\n    \n    def get_keypoints(self,\n                      instances, \n                      records, \n                      is_cuda=True\n                      ):\n        \"\"\"\n        Foward pass to obtain the screen coordinates.\n        \"\"\"\n        if is_cuda:\n            instances = instances.cuda()\n        output = self.HC(instances)\n        # local part coordinates\n        width, height = self.resolution\n        local_coord = output[1].data.cpu().numpy()\n        local_coord *= np.array(self.resolution).reshape(1, 1, 2)\n        # transform local part coordinates to screen coordinates\n        centers = [records[i]['center'] for i in range(len(records))]\n        scales = [records[i]['scale'] for i in range(len(records))]\n        rots = [records[i]['rotation'] for i in range(len(records))]    \n        for instance_idx in range(len(local_coord)):\n            trans_inv = get_affine_transform(centers[instance_idx],\n                                             scales[instance_idx], \n                                             rots[instance_idx], \n                                             (height, width), \n                                             inv=1\n                                             )\n            screen_coord = affine_transform_modified(local_coord[instance_idx], \n                                                     trans_inv\n                                                     ) \n            
records[instance_idx]['kpts'] = screen_coord\n        # assemble a dictionary where each key corresponds to one image\n        ret = {}\n        for record in records:\n            path = record['path']\n            if path not in ret:\n                ret[path] = self.new_img_dict()\n            ret[path]['kpts_2d_pred'].append(record['kpts'].reshape(1, -1))\n            ret[path]['center'].append(record['center'])\n            ret[path]['scale'].append(record['scale'])\n            ret[path]['bbox_resize'].append(record['bbox_resize'])\n            ret[path]['label'].append(record['label'])\n            ret[path]['score'].append(record['score'])\n            ret[path]['rotation'].append(record['rotation'])\n        return ret\n\n    def lift_2d_to_3d(self, records, cuda=True):\n        \"\"\"\n        Forward pass of the lifter sub-model.\n        \"\"\"      \n        for path in records.keys():\n            data = np.concatenate(records[path]['kpts_2d_pred'], axis=0)\n            data = nop.normalize_1d(data, self.LS['mean_in'], self.LS['std_in'])\n            data = data.astype(np.float32)\n            data = torch.from_numpy(data)\n            if cuda:\n                data = data.cuda()\n            prediction = self.L(data)  \n            prediction = nop.unnormalize_1d(prediction.data.cpu().numpy(),\n                                            self.LS['mean_out'], \n                                            self.LS['std_out']\n                                            )\n            records[path]['kpts_3d_pred'] = prediction.reshape(len(prediction), -1, 3)\n        return records\n    \n    def forward(self, annot_dict):\n        \"\"\"\n        Process a batch of images.\n        \n        annot_dict is a Python dictionary storing the following keys: \n            path: list of image paths\n            boxes: list of bounding boxes for each image\n        \"\"\"\n        all_instances, all_records = self.crop_instances(annot_dict, \n                    
                                     resolution=self.resolution,\n                                                         pth_trans=self.pth_trans,\n                                                         xy_dict=self.xy_dict\n                                                         )\n        # all_records stores records for each instance\n        records = self.get_keypoints(all_instances, all_records)\n        # records stores records for each image\n        records = self.lift_2d_to_3d(records)\n        # write the annotation dictionary\n        records = self.write_annot_dict(annot_dict, records)\n        return records"
  },
  {
    "path": "libs/model/heatmapModel/__init__.py",
    "content": ""
  },
  {
    "path": "libs/model/heatmapModel/hrnet.py",
    "content": "# ------------------------------------------------------------------------------\n# Copyright (c) Microsoft\n# Licensed under the MIT License.\n# Written by Bin Xiao (Bin.Xiao@microsoft.com)\n# Modified by Shichao Li (nicholas.li@connect.ust.hk)\n# ------------------------------------------------------------------------------\n\n# from __future__ import absolute_import\n# from __future__ import division\n# from __future__ import print_function\n\nimport os\nimport logging\n\nimport torch\nimport torch.nn as nn\n\nimport numpy as np\n\nBN_MOMENTUM = 0.1\nlogger = logging.getLogger(__name__)\n\n\ndef conv3x3(in_planes, out_planes, stride=1):\n    \"\"\"3x3 convolution with padding\"\"\"\n    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,\n                     padding=1, bias=False)\n\ndef basicdownsample(in_planes, out_planes):\n    downsample = nn.Sequential(\n    nn.Conv2d(\n        in_planes,\n        out_planes,\n        kernel_size=1, \n        stride=2, \n        bias=False\n        ),\n    nn.BatchNorm2d(\n        out_planes\n        ),\n    )\n    return downsample\n\nclass BasicLinearModule(nn.Module):\n    def __init__(self, in_channels, out_channels, mid_channels=512):\n        super(BasicLinearModule, self).__init__()\n        self.l1 = nn.Linear(in_channels, out_channels)\n        # self.l1 = nn.Linear(in_channels, mid_channels)\n        # self.bn1 = nn.BatchNorm1d(mid_channels, momentum=BN_MOMENTUM)\n        # self.relu = nn.ReLU(inplace=True)\n        # self.l2 = nn.Linear(mid_channels, out_channels)\n\n    def forward(self, x):\n        x = x.view(len(x), -1)\n\n        out = self.l1(x)\n        # out = self.bn1(out)\n        # out = self.relu(out)\n\n        # out = self.l2(out)\n        return out\n    \nclass BasicBlock(nn.Module):\n    expansion = 1\n\n    def __init__(self, inplanes, planes, stride=1, downsample=None):\n        super(BasicBlock, self).__init__()\n        self.conv1 = conv3x3(inplanes, 
planes, stride)\n        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = conv3x3(planes, planes)\n        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n\n        if self.downsample is not None:\n            residual = self.downsample(x)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\n\nclass Bottleneck(nn.Module):\n    expansion = 4\n\n    def __init__(self, inplanes, planes, stride=1, downsample=None):\n        super(Bottleneck, self).__init__()\n        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)\n        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)\n        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,\n                               padding=1, bias=False)\n        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)\n        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,\n                               bias=False)\n        self.bn3 = nn.BatchNorm2d(planes * self.expansion,\n                                  momentum=BN_MOMENTUM)\n        self.relu = nn.ReLU(inplace=True)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu(out)\n\n        out = self.conv3(out)\n        out = self.bn3(out)\n\n        if self.downsample is not None:\n            residual = self.downsample(x)\n\n        out += residual\n        out = self.relu(out)\n\n        return 
out\n\n\nclass HighResolutionModule(nn.Module):\n    def __init__(self, num_branches, blocks, num_blocks, num_inchannels,\n                 num_channels, fuse_method, multi_scale_output=True):\n        super(HighResolutionModule, self).__init__()\n        self._check_branches(\n            num_branches, blocks, num_blocks, num_inchannels, num_channels)\n\n        self.num_inchannels = num_inchannels\n        self.fuse_method = fuse_method\n        self.num_branches = num_branches\n\n        self.multi_scale_output = multi_scale_output\n\n        self.branches = self._make_branches(\n            num_branches, blocks, num_blocks, num_channels)\n        self.fuse_layers = self._make_fuse_layers()\n        self.relu = nn.ReLU(True)\n\n    def _check_branches(self, num_branches, blocks, num_blocks,\n                        num_inchannels, num_channels):\n        if num_branches != len(num_blocks):\n            error_msg = 'NUM_BRANCHES({}) <> NUM_BLOCKS({})'.format(\n                num_branches, len(num_blocks))\n            logger.error(error_msg)\n            raise ValueError(error_msg)\n\n        if num_branches != len(num_channels):\n            error_msg = 'NUM_BRANCHES({}) <> NUM_CHANNELS({})'.format(\n                num_branches, len(num_channels))\n            logger.error(error_msg)\n            raise ValueError(error_msg)\n\n        if num_branches != len(num_inchannels):\n            error_msg = 'NUM_BRANCHES({}) <> NUM_INCHANNELS({})'.format(\n                num_branches, len(num_inchannels))\n            logger.error(error_msg)\n            raise ValueError(error_msg)\n\n    def _make_one_branch(self, branch_index, block, num_blocks, num_channels,\n                         stride=1):\n        downsample = None\n        if stride != 1 or \\\n           self.num_inchannels[branch_index] != num_channels[branch_index] * block.expansion:\n            downsample = nn.Sequential(\n                nn.Conv2d(\n                    
self.num_inchannels[branch_index],\n                    num_channels[branch_index] * block.expansion,\n                    kernel_size=1, stride=stride, bias=False\n                ),\n                nn.BatchNorm2d(\n                    num_channels[branch_index] * block.expansion,\n                    momentum=BN_MOMENTUM\n                ),\n            )\n\n        layers = []\n        layers.append(\n            block(\n                self.num_inchannels[branch_index],\n                num_channels[branch_index],\n                stride,\n                downsample\n            )\n        )\n        self.num_inchannels[branch_index] = \\\n            num_channels[branch_index] * block.expansion\n        for i in range(1, num_blocks[branch_index]):\n            layers.append(\n                block(\n                    self.num_inchannels[branch_index],\n                    num_channels[branch_index]\n                )\n            )\n\n        return nn.Sequential(*layers)\n\n    def _make_branches(self, num_branches, block, num_blocks, num_channels):\n        branches = []\n\n        for i in range(num_branches):\n            branches.append(\n                self._make_one_branch(i, block, num_blocks, num_channels)\n            )\n\n        return nn.ModuleList(branches)\n\n    def _make_fuse_layers(self):\n        if self.num_branches == 1:\n            return None\n\n        num_branches = self.num_branches\n        num_inchannels = self.num_inchannels\n        fuse_layers = []\n        for i in range(num_branches if self.multi_scale_output else 1):\n            fuse_layer = []\n            for j in range(num_branches):\n                if j > i:\n                    fuse_layer.append(\n                        nn.Sequential(\n                            nn.Conv2d(\n                                num_inchannels[j],\n                                num_inchannels[i],\n                                1, 1, 0, bias=False\n                            ),\n    
                        nn.BatchNorm2d(num_inchannels[i]),\n                            nn.Upsample(scale_factor=2**(j-i), mode='nearest')\n                        )\n                    )\n                elif j == i:\n                    fuse_layer.append(None)\n                else:\n                    conv3x3s = []\n                    for k in range(i-j):\n                        if k == i - j - 1:\n                            num_outchannels_conv3x3 = num_inchannels[i]\n                            conv3x3s.append(\n                                nn.Sequential(\n                                    nn.Conv2d(\n                                        num_inchannels[j],\n                                        num_outchannels_conv3x3,\n                                        3, 2, 1, bias=False\n                                    ),\n                                    nn.BatchNorm2d(num_outchannels_conv3x3)\n                                )\n                            )\n                        else:\n                            num_outchannels_conv3x3 = num_inchannels[j]\n                            conv3x3s.append(\n                                nn.Sequential(\n                                    nn.Conv2d(\n                                        num_inchannels[j],\n                                        num_outchannels_conv3x3,\n                                        3, 2, 1, bias=False\n                                    ),\n                                    nn.BatchNorm2d(num_outchannels_conv3x3),\n                                    nn.ReLU(True)\n                                )\n                            )\n                    fuse_layer.append(nn.Sequential(*conv3x3s))\n            fuse_layers.append(nn.ModuleList(fuse_layer))\n\n        return nn.ModuleList(fuse_layers)\n\n    def get_num_inchannels(self):\n        return self.num_inchannels\n\n    def forward(self, x):\n        if self.num_branches == 1:\n            return 
[self.branches[0](x[0])]\n\n        for i in range(self.num_branches):\n            x[i] = self.branches[i](x[i])\n\n        x_fuse = []\n\n        for i in range(len(self.fuse_layers)):\n            y = x[0] if i == 0 else self.fuse_layers[i][0](x[0])\n            for j in range(1, self.num_branches):\n                if i == j:\n                    y = y + x[j]\n                else:\n                    y = y + self.fuse_layers[i][j](x[j])\n            x_fuse.append(self.relu(y))\n\n        return x_fuse\n\n\nblocks_dict = {\n    'basic': BasicBlock,\n    'bottleneck': Bottleneck\n}\n\n\nclass PoseHighResolutionNet(nn.Module):\n\n    def __init__(self, cfgs, **kwargs):\n        self.inplanes = 64\n        self.num_joints = cfgs['heatmapModel']['num_joints']\n        extra = cfgs['heatmapModel']['extra']\n        super(PoseHighResolutionNet, self).__init__()\n\n        # stem net\n        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1,\n                               bias=False)\n        self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)\n        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1,\n                               bias=False)\n        self.bn2 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)\n        self.relu = nn.ReLU(inplace=True)\n        self.layer1 = self._make_layer(Bottleneck, 64, 4)\n\n        self.stage2_cfg = cfgs['heatmapModel']['extra']['stage2']\n        num_channels = self.stage2_cfg['num_channels']\n        block = blocks_dict[self.stage2_cfg['block']]\n        num_channels = [\n            num_channels[i] * block.expansion for i in range(len(num_channels))\n        ]\n        self.transition1 = self._make_transition_layer([256], num_channels)\n        self.stage2, pre_stage_channels = self._make_stage(\n            self.stage2_cfg, num_channels)\n\n        self.stage3_cfg = cfgs['heatmapModel']['extra']['stage3']\n        num_channels = self.stage3_cfg['num_channels']\n        block = 
blocks_dict[self.stage3_cfg['block']]\n        num_channels = [\n            num_channels[i] * block.expansion for i in range(len(num_channels))\n        ]\n        self.transition2 = self._make_transition_layer(\n            pre_stage_channels, num_channels)\n        self.stage3, pre_stage_channels = self._make_stage(\n            self.stage3_cfg, num_channels)\n\n        self.stage4_cfg = cfgs['heatmapModel']['extra']['stage4']\n        num_channels = self.stage4_cfg['num_channels']\n        block = blocks_dict[self.stage4_cfg['block']]\n        num_channels = [\n            num_channels[i] * block.expansion for i in range(len(num_channels))\n        ]\n        self.transition3 = self._make_transition_layer(\n            pre_stage_channels, num_channels)\n        self.stage4, pre_stage_channels = self._make_stage(\n            self.stage4_cfg, num_channels, multi_scale_output=False)\n        \n        self.pretrained_layers = cfgs['heatmapModel']['extra']['pretrained_layers']\n        \n        # network head\n        self.head_type = cfgs['heatmapModel']['head_type']\n        self.pixel_shuffle = cfgs['heatmapModel']['pixel_shuffle']\n        if self.head_type == 'heatmap':\n            self.final_layer = nn.Conv2d(\n                in_channels=pre_stage_channels[0],\n                out_channels=self.num_joints,\n                kernel_size=extra['final_conv_kernel'],\n                stride=1,\n                padding=1 if extra['final_conv_kernel'] == 3 else 0\n            )\n            \n            if cfgs['heatmapModel']['pixel_shuffle']:\n            # Add a pixel shuffle upsampling layer to control the heatmap size\n                self.upsamp_fact = int(cfgs['heatmapModel']['heatmap_size'][0]\\\n                    /cfgs['heatmapModel']['input_size'][0]*4)\n                self.upsample_layer = nn.Sequential(\n                        nn.Conv2d(self.num_joints, self.num_joints*self.upsamp_fact**2, \n                                  kernel_size=1),\n    
                    nn.BatchNorm2d(self.num_joints*self.upsamp_fact**2),\n                        nn.ReLU(inplace=True),                \n                        nn.PixelShuffle(self.upsamp_fact)\n                        )\n        elif self.head_type == 'angleregression':\n            num_chan = 256\n            self.head = nn.Sequential(\n                nn.Conv2d(\n                    in_channels=pre_stage_channels[0],\n                    out_channels=num_chan,\n                    kernel_size=1,\n                    stride=1,\n                    padding=0\n                    ), \n                # produce 8*8*num_joints tensor\n                BasicBlock(num_chan, \n                           num_chan, \n                           stride=2,\n                           downsample=basicdownsample(num_chan, num_chan)\n                           ),\n                BasicBlock(num_chan, \n                           num_chan, \n                           stride=2,\n                           downsample=basicdownsample(num_chan, num_chan)\n                           ),\n                BasicBlock(num_chan, \n                           num_chan, \n                           stride=2,\n                           downsample=basicdownsample(num_chan, num_chan)\n                           ),\n                BasicBlock(num_chan, \n                           num_chan, \n                           stride=2,\n                           downsample=basicdownsample(num_chan, num_chan)\n                           ),\n                nn.AvgPool2d(kernel_size=4),                \n                )\n            self.final_fc = nn.Sequential(\n                nn.Linear(256, 256),\n                nn.BatchNorm1d(256),\n                nn.ReLU(inplace=True),\n                nn.Linear(256, 2)\n                )\n        elif self.head_type == 'coordinates':\n            num_chan = self.num_joints\n            map_width, map_height = cfgs['heatmapModel']['heatmap_size']\n            
ks = (int(map_height / 16), int(map_width / 16))\n            self.head1 = nn.Sequential(\n                nn.Conv2d(\n                    in_channels=pre_stage_channels[0],\n                    out_channels=self.num_joints,\n                    kernel_size=1,\n                    stride=1,\n                    padding=0\n                    ), \n                )\n            self.head2 = nn.Sequential(\n                BasicBlock(num_chan+2, \n                           num_chan*2, \n                           stride=2,\n                           downsample=basicdownsample(num_chan+2, num_chan*2)\n                           ),\n                BasicBlock(num_chan*2, \n                           num_chan*2, \n                           stride=2,\n                           downsample=basicdownsample(num_chan*2, num_chan*2)\n                           ),\n                BasicBlock(num_chan*2, \n                           num_chan*2, \n                           stride=2,\n                           downsample=basicdownsample(num_chan*2, num_chan*2)\n                           ),\n                BasicBlock(num_chan*2, \n                           num_chan*2, \n                           stride=2,\n                           downsample=basicdownsample(num_chan*2, num_chan*2)\n                           ),\n                nn.Conv2d(num_chan*2, num_chan*2, kernel_size=ks),\n                nn.Sigmoid()\n                ) \n            # coordinate convolution makes arg-max easier\n            x_map = np.tile(np.linspace(0, 1, map_width), (map_height, 1))\n            x_map = x_map.reshape(1, 1, map_height, map_width)\n            y_map = np.linspace(0, 1, map_height).reshape(map_height, 1)\n            y_map = np.tile(y_map, (1, map_width))\n            y_map = y_map.reshape(1, 1, map_height, map_width)\n            self.coor_maps = np.concatenate([x_map, y_map], axis=1).astype(np.float32)\n            self.coor_maps = torch.from_numpy(self.coor_maps)\n        
else:\n            raise NotImplementedError\n\n    def _make_transition_layer(\n            self, num_channels_pre_layer, num_channels_cur_layer):\n        num_branches_cur = len(num_channels_cur_layer)\n        num_branches_pre = len(num_channels_pre_layer)\n\n        transition_layers = []\n        for i in range(num_branches_cur):\n            if i < num_branches_pre:\n                if num_channels_cur_layer[i] != num_channels_pre_layer[i]:\n                    transition_layers.append(\n                        nn.Sequential(\n                            nn.Conv2d(\n                                num_channels_pre_layer[i],\n                                num_channels_cur_layer[i],\n                                3, 1, 1, bias=False\n                            ),\n                            nn.BatchNorm2d(num_channels_cur_layer[i]),\n                            nn.ReLU(inplace=True)\n                        )\n                    )\n                else:\n                    transition_layers.append(None)\n            else:\n                conv3x3s = []\n                for j in range(i+1-num_branches_pre):\n                    inchannels = num_channels_pre_layer[-1]\n                    outchannels = num_channels_cur_layer[i] \\\n                        if j == i-num_branches_pre else inchannels\n                    conv3x3s.append(\n                        nn.Sequential(\n                            nn.Conv2d(\n                                inchannels, outchannels, 3, 2, 1, bias=False\n                            ),\n                            nn.BatchNorm2d(outchannels),\n                            nn.ReLU(inplace=True)\n                        )\n                    )\n                transition_layers.append(nn.Sequential(*conv3x3s))\n\n        return nn.ModuleList(transition_layers)\n\n    def _make_layer(self, block, planes, blocks, stride=1):\n        downsample = None\n        if stride != 1 or self.inplanes != planes * block.expansion:\n    
        downsample = nn.Sequential(\n                nn.Conv2d(\n                    self.inplanes, planes * block.expansion,\n                    kernel_size=1, stride=stride, bias=False\n                ),\n                nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),\n            )\n\n        layers = []\n        layers.append(block(self.inplanes, planes, stride, downsample))\n        self.inplanes = planes * block.expansion\n        for i in range(1, blocks):\n            layers.append(block(self.inplanes, planes))\n\n        return nn.Sequential(*layers)\n\n    def _make_stage(self, layer_config, num_inchannels,\n                    multi_scale_output=True):\n        num_modules = layer_config['num_modules']\n        num_branches = layer_config['num_branches']\n        num_blocks = layer_config['num_blocks']\n        num_channels = layer_config['num_channels']\n        block = blocks_dict[layer_config['block']]\n        fuse_method = layer_config['fuse_method']\n\n        modules = []\n        for i in range(num_modules):\n            # multi_scale_output is only used last module\n            if not multi_scale_output and i == num_modules - 1:\n                reset_multi_scale_output = False\n            else:\n                reset_multi_scale_output = True\n\n            modules.append(\n                HighResolutionModule(\n                    num_branches,\n                    block,\n                    num_blocks,\n                    num_inchannels,\n                    num_channels,\n                    fuse_method,\n                    reset_multi_scale_output\n                )\n            )\n            num_inchannels = modules[-1].get_num_inchannels()\n\n        return nn.Sequential(*modules), num_inchannels\n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = self.bn1(x)\n        x = self.relu(x)\n        x = self.conv2(x)\n        x = self.bn2(x)\n        x = self.relu(x)\n        x = self.layer1(x)\n\n        
x_list = []\n        for i in range(self.stage2_cfg['num_branches']):\n            if self.transition1[i] is not None:\n                x_list.append(self.transition1[i](x))\n            else:\n                x_list.append(x)\n        y_list = self.stage2(x_list)\n\n        x_list = []\n        for i in range(self.stage3_cfg['num_branches']):\n            if self.transition2[i] is not None:\n                x_list.append(self.transition2[i](y_list[-1]))\n            else:\n                x_list.append(y_list[i])\n        y_list = self.stage3(x_list)\n\n        x_list = []\n        for i in range(self.stage4_cfg['num_branches']):\n            if self.transition3[i] is not None:\n                x_list.append(self.transition3[i](y_list[-1]))\n            else:\n                x_list.append(y_list[i])\n        y_list = self.stage4(x_list)\n        \n        if self.head_type == 'heatmap':\n            x = self.final_layer(y_list[0])        \n            # upsampling\n            if self.pixel_shuffle:\n                x = self.upsample_layer(x)\n        elif self.head_type == 'coordinates':\n            maps = self.head1(y_list[0])\n            # concatenate coordinate maps\n            num_sample = len(maps)\n            coor_maps = self.coor_maps.repeat(num_sample, 1, 1, 1).to(maps.device)\n            augmented_maps = torch.cat([maps, coor_maps], dim=1)\n            coordinates = self.head2(augmented_maps)\n            x = (maps, coordinates.view(len(x), -1, 2))\n        elif self.head_type == 'angleregression':\n            maps = self.head(y_list[0])\n            x = self.final_fc(maps.reshape(len(maps), -1))\n        else:\n            raise NotImplementedError()\n        return x\n\n    def init_weights(self, pretrained=''):\n        logger.info('=> init weights from normal distribution')\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n      
          nn.init.normal_(m.weight, std=0.001)\n                for name, _ in m.named_parameters():\n                    if name in ['bias']:\n                        nn.init.constant_(m.bias, 0)\n            elif isinstance(m, nn.BatchNorm2d):\n                nn.init.constant_(m.weight, 1)\n                nn.init.constant_(m.bias, 0)\n            elif isinstance(m, nn.ConvTranspose2d):\n                nn.init.normal_(m.weight, std=0.001)\n                for name, _ in m.named_parameters():\n                    if name in ['bias']:\n                        nn.init.constant_(m.bias, 0)\n\n        if os.path.isfile(pretrained):\n            pretrained_state_dict = torch.load(pretrained)\n            logger.info('=> loading pretrained model {}'.format(pretrained))\n\n            need_init_state_dict = {}\n            for name, m in pretrained_state_dict.items():\n                if name.split('.')[0] in self.pretrained_layers \\\n                   or self.pretrained_layers[0] == '*':\n                    need_init_state_dict[name] = m\n            self.load_state_dict(need_init_state_dict, strict=False)\n            logger.info('{:d} modules initialized.'.format(len(need_init_state_dict)))\n        elif pretrained:\n            logger.error('=> please download pre-trained models first!')\n            raise ValueError('{} does not exist!'.format(pretrained))\n    \n    def modify_input_channel(self, num_channels):\n        if num_channels == 3:\n            return\n        new_layer = nn.Conv2d(num_channels, 64, kernel_size=3, stride=2, padding=1,\n                               bias=False)\n        # copy the old weights\n        with torch.no_grad():\n            new_layer.weight[:,:3,:,:] = self.conv1.weight.clone()\n        del self.conv1\n        self.conv1 = new_layer\n        return\n    \n    def load_my_state_dict(self, state_dict):\n        own_state = self.state_dict()\n        for name, param in state_dict.items():\n            if name not in 
own_state:\n                 continue\n            param = param.data\n            own_state[name].copy_(param)\n\ndef is_freezed(name, freeze_names):\n    for prefix in freeze_names:\n        if name.startswith(prefix):\n            return True\n    return False\n            \ndef get_pose_net(cfgs, is_train, **kwargs):\n    model = PoseHighResolutionNet(cfgs, **kwargs)\n\n    if is_train and cfgs['heatmapModel']['init_weights']:\n        model.init_weights(cfgs['heatmapModel']['pretrained'])\n\n    # freeze specified pre-trained layers\n    freeze_names = cfgs['heatmapModel']['extra'].get('freeze_layers', [])\n    for name, param in model.named_parameters():\n        if is_freezed(name, freeze_names):\n            param.requires_grad = False\n            print('{:s} freezed during training.'.format(name))\n                \n    if cfgs['heatmapModel']['add_xy']:\n        model.modify_input_channel(5)\n    return model"
  },
  {
    "path": "libs/model/heatmapModel/resnet.py",
    "content": "# ------------------------------------------------------------------------------\n# Copyright (c) Microsoft\n# Licensed under the MIT License.\n# Written by Bin Xiao (Bin.Xiao@microsoft.com)\n# ------------------------------------------------------------------------------\n\n#from __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport os\nimport logging\n\nimport torch\nimport torch.nn as nn\n\n\nBN_MOMENTUM = 0.1\nlogger = logging.getLogger(__name__)\n\n\ndef conv3x3(in_planes, out_planes, stride=1):\n    \"\"\"3x3 convolution with padding\"\"\"\n    return nn.Conv2d(\n        in_planes, out_planes, kernel_size=3, stride=stride,\n        padding=1, bias=False\n    )\n\n\nclass BasicBlock(nn.Module):\n    expansion = 1\n\n    def __init__(self, inplanes, planes, stride=1, downsample=None):\n        super(BasicBlock, self).__init__()\n        self.conv1 = conv3x3(inplanes, planes, stride)\n        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)\n        self.relu = nn.ReLU(inplace=True)\n        self.conv2 = conv3x3(planes, planes)\n        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n\n        if self.downsample is not None:\n            residual = self.downsample(x)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\n\nclass Bottleneck(nn.Module):\n    expansion = 4\n\n    def __init__(self, inplanes, planes, stride=1, downsample=None):\n        super(Bottleneck, self).__init__()\n        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)\n        self.bn1 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)\n        self.conv2 = nn.Conv2d(planes, planes, 
kernel_size=3, stride=stride,\n                               padding=1, bias=False)\n        self.bn2 = nn.BatchNorm2d(planes, momentum=BN_MOMENTUM)\n        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1,\n                               bias=False)\n        self.bn3 = nn.BatchNorm2d(planes * self.expansion,\n                                  momentum=BN_MOMENTUM)\n        self.relu = nn.ReLU(inplace=True)\n        self.downsample = downsample\n        self.stride = stride\n\n    def forward(self, x):\n        residual = x\n\n        out = self.conv1(x)\n        out = self.bn1(out)\n        out = self.relu(out)\n\n        out = self.conv2(out)\n        out = self.bn2(out)\n        out = self.relu(out)\n\n        out = self.conv3(out)\n        out = self.bn3(out)\n\n        if self.downsample is not None:\n            residual = self.downsample(x)\n\n        out += residual\n        out = self.relu(out)\n\n        return out\n\n\nclass PoseResNet(nn.Module):\n\n    def __init__(self, block, layers, cfg, **kwargs):\n        self.inplanes = 64\n        extra = cfg.MODEL.EXTRA\n        self.deconv_with_bias = extra.DECONV_WITH_BIAS\n\n        super(PoseResNet, self).__init__()\n        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,\n                               bias=False)\n        self.bn1 = nn.BatchNorm2d(64, momentum=BN_MOMENTUM)\n        self.relu = nn.ReLU(inplace=True)\n        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)\n        self.layer1 = self._make_layer(block, 64, layers[0])\n        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)\n        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)\n        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)\n\n        # used for deconv layers\n        self.deconv_layers = self._make_deconv_layer(\n            extra.NUM_DECONV_LAYERS,\n            extra.NUM_DECONV_FILTERS,\n            
extra.NUM_DECONV_KERNELS,\n        )\n\n        self.final_layer = nn.Conv2d(\n            in_channels=extra.NUM_DECONV_FILTERS[-1],\n            out_channels=cfg.MODEL.NUM_JOINTS,\n            kernel_size=extra.FINAL_CONV_KERNEL,\n            stride=1,\n            padding=1 if extra.FINAL_CONV_KERNEL == 3 else 0\n        )\n\n    def _make_layer(self, block, planes, blocks, stride=1):\n        downsample = None\n        if stride != 1 or self.inplanes != planes * block.expansion:\n            downsample = nn.Sequential(\n                nn.Conv2d(self.inplanes, planes * block.expansion,\n                          kernel_size=1, stride=stride, bias=False),\n                nn.BatchNorm2d(planes * block.expansion, momentum=BN_MOMENTUM),\n            )\n\n        layers = []\n        layers.append(block(self.inplanes, planes, stride, downsample))\n        self.inplanes = planes * block.expansion\n        for i in range(1, blocks):\n            layers.append(block(self.inplanes, planes))\n\n        return nn.Sequential(*layers)\n\n    def _get_deconv_cfg(self, deconv_kernel, index):\n        if deconv_kernel == 4:\n            padding = 1\n            output_padding = 0\n        elif deconv_kernel == 3:\n            padding = 1\n            output_padding = 1\n        elif deconv_kernel == 2:\n            padding = 0\n            output_padding = 0\n        else:\n            # only kernel sizes 2, 3 and 4 are supported\n            raise ValueError('Unsupported deconv kernel size: {}'.format(deconv_kernel))\n\n        return deconv_kernel, padding, output_padding\n\n    def _make_deconv_layer(self, num_layers, num_filters, num_kernels):\n        assert num_layers == len(num_filters), \\\n            'ERROR: num_deconv_layers is different from len(num_deconv_filters)'\n        assert num_layers == len(num_kernels), \\\n            'ERROR: num_deconv_layers is different from len(num_deconv_kernels)'\n\n        layers = []\n        for i in range(num_layers):\n            kernel, padding, output_padding = \\\n                self._get_deconv_cfg(num_kernels[i], i)\n\n            planes = num_filters[i]\n            layers.append(\n        
        nn.ConvTranspose2d(\n                    in_channels=self.inplanes,\n                    out_channels=planes,\n                    kernel_size=kernel,\n                    stride=2,\n                    padding=padding,\n                    output_padding=output_padding,\n                    bias=self.deconv_with_bias))\n            layers.append(nn.BatchNorm2d(planes, momentum=BN_MOMENTUM))\n            layers.append(nn.ReLU(inplace=True))\n            self.inplanes = planes\n\n        return nn.Sequential(*layers)\n\n    def forward(self, x):\n        x = self.conv1(x)\n        x = self.bn1(x)\n        x = self.relu(x)\n        x = self.maxpool(x)\n\n        x = self.layer1(x)\n        x = self.layer2(x)\n        x = self.layer3(x)\n        x = self.layer4(x)\n\n        x = self.deconv_layers(x)\n        x = self.final_layer(x)\n\n        return x\n\n    def init_weights(self, pretrained=''):\n        if os.path.isfile(pretrained):\n            logger.info('=> init deconv weights from normal distribution')\n            for name, m in self.deconv_layers.named_modules():\n                if isinstance(m, nn.ConvTranspose2d):\n                    logger.info('=> init {}.weight as normal(0, 0.001)'.format(name))\n                    logger.info('=> init {}.bias as 0'.format(name))\n                    nn.init.normal_(m.weight, std=0.001)\n                    if self.deconv_with_bias:\n                        nn.init.constant_(m.bias, 0)\n                elif isinstance(m, nn.BatchNorm2d):\n                    logger.info('=> init {}.weight as 1'.format(name))\n                    logger.info('=> init {}.bias as 0'.format(name))\n                    nn.init.constant_(m.weight, 1)\n                    nn.init.constant_(m.bias, 0)\n            logger.info('=> init final conv weights from normal distribution')\n            for m in self.final_layer.modules():\n                if isinstance(m, nn.Conv2d):\n                    # nn.init.kaiming_normal_(m.weight, 
mode='fan_out', nonlinearity='relu')\n                    logger.info('=> init {}.weight as normal(0, 0.001)'.format(name))\n                    logger.info('=> init {}.bias as 0'.format(name))\n                    nn.init.normal_(m.weight, std=0.001)\n                    nn.init.constant_(m.bias, 0)\n\n            pretrained_state_dict = torch.load(pretrained)\n            logger.info('=> loading pretrained model {}'.format(pretrained))\n            self.load_state_dict(pretrained_state_dict, strict=False)\n        else:\n            logger.info('=> init weights from normal distribution')\n            for m in self.modules():\n                if isinstance(m, nn.Conv2d):\n                    # nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')\n                    nn.init.normal_(m.weight, std=0.001)\n                    # nn.init.constant_(m.bias, 0)\n                elif isinstance(m, nn.BatchNorm2d):\n                    nn.init.constant_(m.weight, 1)\n                    nn.init.constant_(m.bias, 0)\n                elif isinstance(m, nn.ConvTranspose2d):\n                    nn.init.normal_(m.weight, std=0.001)\n                    if self.deconv_with_bias:\n                        nn.init.constant_(m.bias, 0)\n\n\nresnet_spec = {\n    18: (BasicBlock, [2, 2, 2, 2]),\n    34: (BasicBlock, [3, 4, 6, 3]),\n    50: (Bottleneck, [3, 4, 6, 3]),\n    101: (Bottleneck, [3, 4, 23, 3]),\n    152: (Bottleneck, [3, 8, 36, 3])\n}\n\n\ndef get_pose_net(cfg, is_train, **kwargs):\n    num_layers = cfg.MODEL.EXTRA.NUM_LAYERS\n\n    block_class, layers = resnet_spec[num_layers]\n\n    model = PoseResNet(block_class, layers, cfg, **kwargs)\n\n    if is_train and cfg.MODEL.INIT_WEIGHTS:\n        model.init_weights(cfg.MODEL.PRETRAINED)\n\n    return model"
  },
  {
    "path": "libs/optimizer/__init__.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nEmpty file.\n\"\"\"\n\n\n"
  },
  {
    "path": "libs/optimizer/optimizer.py",
    "content": "\"\"\"\nOptimization utilities.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\nimport torch\n\ndef prepare_optim(model, cfgs):\n    \"\"\"\n    Get optimizer and scheduler objects from model parameters.\n    \"\"\"      \n    params = [ p for p in model.parameters() if p.requires_grad]\n    lr = cfgs['optimizer']['lr']\n    weight_decay = cfgs['optimizer']['weight_decay']\n    momentum = cfgs['optimizer']['momentum']\n    milestones = cfgs['optimizer']['milestones']\n    gamma = cfgs['optimizer']['gamma']\n    if cfgs['optimizer']['optim_type'] == 'adam':\n        optimizer = torch.optim.Adam(params, \n                                     lr = lr, \n                                     weight_decay = weight_decay)\n    elif cfgs['optimizer']['optim_type'] == 'sgd':\n        optimizer = torch.optim.SGD(params, \n                                    lr = lr, \n                                    momentum = momentum,\n                                    weight_decay = weight_decay)\n    else:\n        raise NotImplementedError\n    scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, \n                                                     milestones = milestones, \n                                                     gamma = gamma\n                                                     )\n    # A scheduler that automatically decreases the learning rate\n#    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,\n#                                                           mode='min',\n#                                                           factor=0.5,\n#                                                           patience=10,\n#                                                           verbose=True,\n#                                                           min_lr=0.01)\n    return optimizer, scheduler\n"
  },
  {
    "path": "libs/trainer/__init__.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nEmpty file.\n\"\"\"\n\n\n"
  },
  {
    "path": "libs/trainer/accuracy.py",
    "content": "\"\"\"\nDeprecated. Will be deleted in a future version.\nPre-defined accuracy functions.\n\"\"\"\n\nimport libs.common.img_proc as lip\n\nimport numpy as np\n\ndef get_distance(gt, pred):\n    # gt: [n_joints, 2 or 3]\n    # pred: [n_joints, 2]\n    if gt.shape[1] == 2:\n        sqerr = (gt - pred)**2\n        sqerr = sqerr.sum(axis = 1)\n        dist_list = list(np.sqrt(sqerr))\n    elif gt.shape[1] == 3:\n        dist_list = []\n        sqerr = (gt[:, :2] - pred)**2\n        sqerr = sqerr.sum(axis = 1)\n        indices = np.nonzero(gt[:, 2])[0]\n        dist_list = list(np.sqrt(sqerr[indices]))        \n    else:\n        raise ValueError('Array shape not supported.')\n    return dist_list\n\ndef accuracy_pixel(output, \n                   meta_data, \n                   cfgs=None,\n                   image_size = (256.0, 256.0), \n                   arg_max='hard'\n                   ):\n    \"\"\"\n    pixel-wise distance computed from predicted heatmaps\n    \"\"\"\n    # report distance in pixels in the original image\n    if arg_max == 'soft':\n        if isinstance(output, np.ndarray):\n            pred, max_vals = lip.get_max_preds_soft(output)\n        else:\n            pred, max_vals = lip.get_max_preds_soft_pt(output)\n    elif arg_max == 'hard':\n        if not isinstance(output, np.ndarray):\n            output = output.data.cpu().numpy()\n        pred, max_vals = lip.get_max_preds(output)\n    else:\n        raise NotImplementedError\n    image_size = image_size if cfgs is None else cfgs['heatmapModel']['input_size']\n    # multiply by down-sample ratio\n    if not isinstance(pred, np.ndarray):\n        pred = pred.data.cpu().numpy()\n        max_vals = max_vals.data.cpu().numpy()\n    pred *= image_size[0]/output.shape[3]\n    # inverse transform and compare pixel distance\n    centers, scales, rots = meta_data['center'], meta_data['scale'], meta_data['rotation']\n    centers = centers.data.cpu().numpy()\n    scales = 
scales.data.cpu().numpy()\n    rots = rots.data.cpu().numpy()\n    joints_original_batch = meta_data['original_joints'].data.cpu().numpy()\n    distance_list = []\n    all_src_coordinates = []\n    for sample_idx in range(len(pred)):\n        trans_inv = lip.get_affine_transform(centers[sample_idx], \n                                             scales[sample_idx], \n                                             rots[sample_idx], \n                                             image_size, \n                                             inv=1)\n        joints_original = joints_original_batch[sample_idx]        \n        pred_src_coordinates = lip.affine_transform_modified(pred[sample_idx], \n                                                             trans_inv) \n        all_src_coordinates.append(pred_src_coordinates.reshape(1, len(pred_src_coordinates), 2))\n        distance_list += get_distance(joints_original, pred_src_coordinates)\n    cnt = len(distance_list)\n    avg_acc = sum(distance_list)/cnt\n    others = {\n        'src_coord': np.concatenate(all_src_coordinates, axis=0),\n        'joints_pred': pred,\n        'max_vals': max_vals\n        }\n    return avg_acc, cnt, others"
  },
  {
    "path": "libs/trainer/trainer.py",
    "content": "\"\"\"\nUtilities for training and validation.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport libs.model.FCmodel as FCmodel\nimport libs.optimizer.optimizer as optimizer\nimport libs.loss.function as loss_funcs\nimport libs.visualization.points as vp\n\nfrom libs.common.transformation import procrustes_transform, pnp_refine\nfrom libs.visualization.debug import save_debug_images\nfrom libs.common.utils import AverageMeter\nfrom libs.metric.criterions import Evaluator\nfrom libs.logger.logger import get_dirs\n\nimport torch\nimport numpy as np\nimport time\nimport os\nimport matplotlib.pyplot as plt\n\ndef train_cascade(train_dataset, valid_dataset, cfgs, logger):\n    \"\"\"\n    Method for training the lifter sub-model L.pth.\n    \"\"\"    \n    # data statistics\n    #stats = train_dataset.stats\n    stats = None\n    # cascaded model\n    cascade = FCmodel.get_cascade()\n    stage_record = []\n    # train each stage\n    for stage_id in range(cfgs['cascade']['num_stages']):\n        # initialize the model\n        input_size, output_size = train_dataset.get_input_output_size()\n        cfgs['FCModel']['input_size'] = input_size\n        cfgs['FCModel']['output_size'] = output_size\n        stage_model = FCmodel.get_fc_model(stage_id + 1, \n                                           cfgs=cfgs,\n                                           input_size=input_size,\n                                           output_size=output_size\n                                           )\n        if cfgs['use_gpu']:\n            stage_model = stage_model.cuda()\n        # prepare the optimizer\n        optim, sche = optimizer.prepare_optim(stage_model, cfgs)\n        loss_type = cfgs['FCModel']['loss_type']\n        loss_func = eval('loss_funcs.' 
+ loss_type)(\n        reduction=cfgs['FCModel']['loss_reduction']\n        ).cuda()     \n        # train the model\n        record = train(train_dataset=train_dataset,\n                       valid_dataset=valid_dataset, \n                       model=stage_model, \n                       loss_func=loss_func,\n                       optim=optim, \n                       sche=sche, \n                       stats=stats, \n                       cfgs=cfgs,\n                       logger=logger\n                       )\n        stage_model = record['model']\n        stage_record.append((record['batch_idx'], record['loss']))\n        # put into cascade\n        cascade.append(stage_model.cpu())     \n        # release memory\n        del stage_model    \n    return {'cascade':cascade, 'record':stage_record}\n\ndef evaluate_cascade(cascade, \n                     eval_dataset, \n                     stats, \n                     opt, \n                     save=False, \n                     save_path=None,\n                     action_wise=False, \n                     action_eval_list=None, \n                     apply_dropout=False\n                     ):\n    \"\"\"\n    Method for evaluating the lifter sub-model L.pth.\n    \"\"\" \n    loss, distance = None, None\n    for stage_id in range(len(cascade)):\n        # initialize the model\n        stage_model = cascade[stage_id]\n    \n        if opt.cuda:\n            stage_model = stage_model.cuda()\n        \n        # evaluate the model\n        loss, distance = evaluate(eval_dataset, \n                                  stage_model, \n                                  stats, \n                                  opt, \n                                  save=save, \n                                  save_path=save_path,\n                                  procrustes=False, \n                                  per_joint=True, \n                                  apply_dropout=apply_dropout\n                            
      )\n\n        # update datasets\n        eval_dataset.stage_update(stage_model, stats, opt)\n        \n        # release memory\n        del stage_model       \n    return loss, distance\n\ndef get_loader(dataset, cfgs, split, collate_fn=None):\n    \"\"\"\n    Prepare a PyTorch dataloader object.\n    \"\"\" \n    setting = split + '_settings'   \n    arg_dic = {'batch_size': cfgs[setting]['batch_size'],\n               'num_workers': cfgs[setting]['num_threads'],\n               'shuffle': cfgs[setting]['shuffle'],\n               }\n    if collate_fn is not None:\n        arg_dic['collate_fn'] = collate_fn\n    loader = torch.utils.data.DataLoader(dataset, **arg_dic)     \n    return loader\n\ndef train(train_dataset,\n          model, \n          loss_func,\n          optim, \n          sche, \n          cfgs, \n          logger,\n          metric_func=None,\n          stats=None, \n          valid_dataset=None,\n          collate_fn=None,\n          save_debug=False\n          ):\n    \"\"\"\n    Train a model with optional validation during training.\n    \"\"\"\n    # training configurations\n    total_epochs = cfgs['training_settings']['total_epochs']\n    batch_size = cfgs['training_settings']['batch_size']\n    report_every = cfgs['training_settings']['report_every']\n    eval_during = cfgs['training_settings']['eval_during']\n    eval_start_epoch = cfgs['training_settings']['eval_start_epoch'] if \\\n        'eval_start_epoch' in cfgs['training_settings'] else 0\n    # evaluate during training\n    if eval_during and valid_dataset is not None:\n        eval_every = cfgs['training_settings']['eval_every'] \n        evaluator = Evaluator(cfgs['training_settings']['eval_metrics'], \n                              cfgs,\n                              train_dataset.num_joints\n                              )\n    plot_loss = cfgs['training_settings']['plot_loss'] \n    cuda = cfgs['use_gpu'] and torch.cuda.is_available()\n    # optional list storing loss 
curve \n    x_buffer = []\n    y_buffer = []\n    # online plotting\n    if plot_loss:\n        ax, lines, x_buffer, y_buffer = initialize_plot()\n    # training\n    for epoch in range(1, total_epochs + 1):\n        # Apply cross-ratio loss after certain epochs\n        if epoch > 1:\n            loss_func.apply_cr_loss = True\n        # initialize training record\n        batch_time = AverageMeter()\n        data_time = AverageMeter()\n        losses = AverageMeter()\n        acc = AverageMeter()           \n        model.train()  \n        # modify the learning rate according to the scheduler\n        sche.step()\n        # data loader\n        train_loader = get_loader(train_dataset, cfgs, 'training', collate_fn)   \n        total_batches = len(train_loader)\n        total_sample = len(train_dataset)\n        end = time.time()\n        for batch_idx, (data, target, weights, meta) in enumerate(train_loader):\n            if cuda:\n                data, target, weights = data.cuda(), target.cuda(), weights.cuda()\n            # measure data loading time\n            data_time.update(time.time() - end)\n            # erase all computed gradient        \n            optim.zero_grad()\n            # forward pass to get prediction\n            prediction = model(data)\n            # compute loss\n            loss = loss_func(prediction, target, weights, meta)\n            # compute gradient in the computational graph\n            loss.backward()\n            # update parameters in the model \n            optim.step()\n            losses.update(loss.item(), data.size(0))\n            # compute other optional metrics besides the loss value\n            if metric_func is not None:\n                avg_acc, cnt, others = metric_func(prediction, meta, cfgs)\n                acc.update(avg_acc, n=cnt, others=others)\n                if batch_idx % report_every == 0:\n                    acc.print_content()\n            else:\n                others = None\n            # 
measure elapsed time\n            batch_time.update(time.time() - end)\n            end = time.time()      \n            # logging\n            if batch_idx % report_every == 0:\n                logger_print(epoch, \n                             batch_idx, \n                             batch_size, \n                             total_sample, \n                             batch_time, \n                             data.size()[0], \n                             data_time, \n                             losses,\n                             acc, \n                             logger\n                             )\n                # optional: save intermediate results for debugging\n                if save_debug:\n                    save_debug_images(epoch, \n                                      batch_idx, \n                                      cfgs, \n                                      data, \n                                      meta, \n                                      target, \n                                      others, \n                                      prediction, \n                                      'train'\n                                      )\n                # update loss curve\n                x_buffer.append(total_batches * (epoch - 1) + batch_idx)\n                y_buffer.append(loss.item())\n                if plot_loss:\n                    update_curve(ax, lines[0], x_buffer, y_buffer)\n            del data, target, weights, meta\n            # evaluate model if specified\n            if eval_during and epoch> eval_start_epoch and \\\n                batch_idx and batch_idx % eval_every == 0:\n                evaluate(valid_dataset, \n                         model, \n                         loss_func, \n                         cfgs, \n                         logger, \n                         evaluator, \n                         collate_fn=collate_fn,\n                         epoch=epoch\n                         )\n   
             # back to training mode\n                model.train()\n        # save a snapshot\n        if epoch in cfgs['training_settings'].get('snapshot_epochs', []):\n            output_dir, _ = get_dirs(cfgs)\n            prefix = cfgs['exp_type']\n            model_state_file = os.path.join(output_dir, prefix + '_{:d}.pth'.format(epoch))\n            logger.info('=> Snapshot model to {}'.format(model_state_file))\n            torch.save(model.module.state_dict(), model_state_file)\n    logger.info('Training finished.')\n    return {'model':model, 'batch_idx':x_buffer, 'loss':y_buffer}  \n\ndef initialize_plot():\n    \"\"\"\n    Initialize loss plot.\n    \"\"\"     \n    x_buffer, y_buffer = [], []\n    ax = plt.subplot(111)\n    lines = ax.plot(x_buffer, y_buffer)\n    plt.xlabel('batch index')\n    plt.ylabel('training loss')    \n    return ax, lines, x_buffer, y_buffer\n\ndef update_curve(ax, line, x_buffer, y_buffer):\n    \"\"\"\n    Update loss plot.\n    \"\"\" \n    line.set_xdata(x_buffer)\n    line.set_ydata(y_buffer)\n    # recompute the ax.dataLim\n    ax.relim()\n    # update ax.viewLim using the new dataLim\n    ax.autoscale_view()\n    plt.draw()\n    plt.pause(0.05)    \n    return\n\ndef logger_print(epoch, \n                 batch_idx, \n                 batch_size, \n                 total_sample, \n                 batch_time,\n                 length,\n                 data_time,\n                 losses,\n                 acc,\n                 logger\n                 ):\n    \"\"\"\n    Print training logs.\n    \"\"\"     \n    msg = 'Training Epoch: [{0}][{1}/{2}]\\t' \\\n          'Time {batch_time.val:.3f}s ({batch_time.avg:.3f}s)\\t' \\\n          'Speed {speed:.1f} samples/s\\t' \\\n          'Data {data_time.val:.3f}s ({data_time.avg:.3f}s)\\t' \\\n          'Loss {loss.val:.5f} ({loss.avg:.5f})'.format(\n              epoch, \n              batch_idx * batch_size, \n              total_sample, \n              
batch_time=batch_time,\n              speed=length / batch_time.val,\n              data_time=data_time, \n              loss=losses\n              )      \n    if acc.val != 0 and acc.avg != 0:\n        # acc is a pre-defined metric with positive value\n        msg += 'Accuracy {acc.val:.3f} ({acc.avg:.3f})'.format(acc=acc)          \n    logger.info(msg)\n    return\n\ndef visualize_lifting_results(data, \n                              prediction, \n                              target=None, \n                              sample_num=None,\n                              intrinsics=None, \n                              refine=False, \n                              dist_coeffs=np.zeros((4,1)),\n                              meta_data=None\n                              ):\n    \"\"\"\n    Visualizing predictions of the lifter model (optional).\n    \"\"\" \n    # only take the coordinates\n    if data.shape[1] > 18:\n        data = data[:, :18]\n    # use the ground truth translation if provided in the meta_data\n    # (meta_data and target are both optional, so guard against None)\n    if meta_data is not None and 'roots' in meta_data:\n        if target is not None:\n            target = np.hstack([meta_data['roots'], target])\n        prediction = np.hstack([meta_data['roots'], prediction])\n    sample_num = sample_num if sample_num else len(prediction) \n    chosen = np.random.choice(len(prediction), sample_num, replace=False)\n    if target is not None:\n        assert len(target) == len(prediction)\n        p3d_gt_sample = target[chosen].reshape(sample_num, -1, 3)\n    else:\n        p3d_gt_sample = None\n    p3d_pred_sample = prediction[chosen].reshape(sample_num, -1, 3)\n    data_sample = data[chosen].reshape(sample_num, -1, 2)\n    # vp.plot_comparison_relative(p3d_pred_sample[:9, 3:], \n    #                             p3d_gt_sample[:9, 3:])\n    ax = vp.plot_scene_3dbox(p3d_pred_sample, p3d_gt_sample)\n    if not refine:\n        return\n    # refine 3D point prediction by minimizing re-projection errors\n    assert intrinsics is not None\n    for idx in range(sample_num):\n        
prediction = p3d_pred_sample[idx]\n        tempt_box_pred = prediction.copy()\n        tempt_box_pred[1:, :] += tempt_box_pred[0, :].reshape(1, 3)\n        observation = data_sample[idx]\n        # use the predicted 3D bounding box size for refinement\n        refined_prediction = pnp_refine(tempt_box_pred, observation, intrinsics, \n                                        dist_coeffs)\n        vp.plot_lines(ax, \n                      refined_prediction[:, 1:].T, \n                      vp.plot_3d_bbox.connections, \n                      dimension=3, \n                      c='g'\n                      )\n        # use the gt 3D box size for refinement\n        # first align a box with gt size with the predicted box, then refine\n        if target is None:\n            continue\n        tempt_box_gt = p3d_gt_sample[idx].copy()\n        tempt_box_gt[1:, :] += tempt_box_gt[0, :].reshape(1, 3) \n        pseudo_box = procrustes_transform(tempt_box_gt.T, tempt_box_pred.T)\n        refined_prediction2 = pnp_refine(pseudo_box.T, observation, intrinsics, \n                                        dist_coeffs)\n        vp.plot_lines(ax, \n                      pseudo_box[:, 1:].T, \n                      vp.plot_3d_bbox.connections, \n                      dimension=3, \n                      c='y'\n                      )         \n        vp.plot_lines(ax, \n                      refined_prediction2[:, 1:].T, \n                      vp.plot_3d_bbox.connections, \n                      dimension=3, \n                      c='b'\n                      )        \n    return\n\ndef evaluate(eval_dataset, \n             model, \n             loss_func,\n             cfgs, \n             logger, \n             evaluator,\n             save=False, \n             save_path=None,\n             collate_fn=None,\n             epoch=None,\n             sample_num=20\n             ):\n    \"\"\"\n    Method for evaluating a model.\n    \"\"\"     \n    # unnormalize the prediction if 
needed\n    if cfgs['testing_settings']['unnormalize']:\n        stats = eval_dataset.statistics\n        \n    # visualize after certain epoch\n    if cfgs['exp_type'] == '2dto3d' and 'vis_epoch' in cfgs['testing_settings']:\n        vis_epoch = cfgs['testing_settings']['vis_epoch']\n    else:\n        vis_epoch = -1\n\n    all_dists = []\n    model.eval()\n    \n    # optional: enable dropout in testing to produce loss similar to the training loss\n    if cfgs['testing_settings']['apply_dropout']:\n        def apply_dropout(m):\n            if type(m) == torch.nn.Dropout:\n                m.train()        \n        model.apply(apply_dropout)\n        \n    intrinsics = None if not hasattr(eval_dataset, 'intrinsic') else \\\n        eval_dataset.intrinsic\n    refine = False if intrinsics is None else True\n    eval_loader = get_loader(eval_dataset, cfgs, 'testing', collate_fn)\n    cuda = cfgs['use_gpu'] and torch.cuda.is_available()\n    losses = AverageMeter()\n    \n    # optional: save intermediate results\n    if save:\n        pred_list = []\n        gt_list = []\n    has_plot = False # only plot once\n    \n    for batch_idx, (data, target, weights, meta) in enumerate(eval_loader):\n        if cuda:\n            data, target, weights = data.cuda(), target.cuda(), weights.cuda()\n            \n        # forward pass to get prediction\n        prediction = model(data)\n        \n        # optional: save intermediate results for debugging\n        if cfgs['testing_settings'].get('save_debug', False) and \\\n            cfgs.get('exp_type') == 'instanceto2d':\n            joints_pred = prediction[1].data.cpu().numpy()\n            image_size = cfgs['heatmapModel']['input_size']\n            joints_pred *= np.array(image_size).reshape(1, 1, 2)\n            save_debug_images(0, \n                              batch_idx, \n                              cfgs, \n                              data, \n                              meta, \n                             
 target, \n                              {'joints_pred': joints_pred}, \n                              prediction, \n                              'validation'\n                              )        \n            logger.info('Saved batch {:d}'.format(batch_idx))\n#        if save:\n#            pred_list.append(prediction.data.cpu().numpy())\n        loss = loss_func(prediction, target, weights, meta)\n        losses.update(loss.item(), data.size(0))\n        \n        if cfgs['testing_settings']['unnormalize']:\n            # compute distance of body joints in un-normalized format\n            target = eval_dataset.unnormalize(target.data.cpu().numpy(), \n                                              stats['mean_out'], \n                                              stats['std_out']\n                                              )    \n            prediction = eval_dataset.unnormalize(prediction.data.cpu().numpy(), \n                                                  stats['mean_out'], \n                                                  stats['std_out']\n                                                  ) \n            \n        evaluator.update(prediction, ground_truth=target, meta_data=meta)\n        \n        ## plot 3D bounding boxes for visualization\n        if not has_plot and vis_epoch > 0 and epoch > vis_epoch:\n            data_unnorm = eval_dataset.unnormalize(data.data.cpu().numpy(), \n                                                   stats['mean_in'], \n                                                   stats['std_in']\n                                                   ) \n            visualize_lifting_results(data_unnorm, \n                                      prediction, \n                                      target, \n                                      sample_num=sample_num, \n                                      intrinsics=intrinsics,\n                                      refine=refine,\n                                      
meta_data=meta\n                                      )\n            has_plot = True\n        if save:\n            pred_list.append(prediction)\n            gt_list.append(target)\n\n    if save:\n        # note the residual update is saved if a cascade is used\n        record = {#'data':np.concatenate(data_list, axis=0), \n                  'pred':np.concatenate(pred_list, axis=0), \n                  'error':all_dists, \n                  'gt':np.concatenate(gt_list, axis=0)\n                  }\n        np.save(save_path, np.array(record))\n\n    evaluator.report(logger)\n    return"
  },
  {
    "path": "libs/visualization/__init__.py",
    "content": "#!/usr/bin/env python3\n# -*- coding: utf-8 -*-\n\"\"\"\nEmpty file.\n\"\"\"\n\n\n"
  },
  {
    "path": "libs/visualization/debug.py",
    "content": "\"\"\"\nUtilities for saving debugging images.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nfrom libs.common.img_proc import get_max_preds\nfrom libs.common.utils import make_dir\n\nimport math\nimport numpy as np\nimport torchvision\nimport cv2\n\nfrom os.path import join\n\ndef draw_circles(ndarr, \n                 xmaps, \n                 ymaps, \n                 nmaps, \n                 batch_joints, \n                 batch_joints_vis, \n                 width, \n                 height, \n                 padding, \n                 color=[255,0,0],\n                 add_idx=True\n                 ):\n    k = 0\n    for y in range(ymaps):\n        for x in range(xmaps):\n            if k >= nmaps:\n                break\n            joints = batch_joints[k]\n            for idx, joint in enumerate(joints):\n                xpos = x * width + padding + joint[0]\n                ypos = y * height + padding + joint[1]\n                cv2.circle(ndarr, (int(xpos), int(ypos)), 2, color, 2)\n                if add_idx:\n                    cv2.putText(ndarr, \n                                str(idx+1), \n                                (int(xpos), int(ypos)), \n                                cv2.FONT_HERSHEY_SIMPLEX, \n                                1, color, 1\n                                )\n            k += 1\n    return ndarr\n\n# functions used for debugging heatmap-based keypoint localization model      #\ndef save_batch_image_with_joints(batch_image, \n                                 record_dict, \n                                 file_name, \n                                 nrow=8, \n                                 padding=2\n                                 ):\n    \"\"\"\n    batch_image: [batch_size, channel, height, width]\n    batch_joints: [batch_size, num_joints, 3],\n    batch_joints_vis: [batch_size, num_joints, 1],\n    \"\"\"\n    grid = torchvision.utils.make_grid(batch_image[:, :3, :, :], 
nrow, padding, True)\n    ndarr = grid.mul(255).clamp(0, 255).byte().permute(1, 2, 0).cpu().numpy()\n    ndarr = ndarr.copy()\n    nmaps = batch_image.size(0)\n    xmaps = min(nrow, nmaps)\n    ymaps = int(math.ceil(float(nmaps) / xmaps))\n    height = int(batch_image.size(2) + padding)\n    width = int(batch_image.size(3) + padding)\n    batch_joints, batch_joints_vis = record_dict['pred'] \n    ndarr = draw_circles(ndarr, xmaps, ymaps, nmaps, batch_joints, batch_joints_vis, \n                         width, height, padding)\n    if 'gt' in record_dict:\n        nmaps = min(nmaps, len(batch_joints_vis))\n        xmaps = min(nrow, nmaps)\n        ymaps = int(math.ceil(float(nmaps) / xmaps))\n        batch_joints_gt, batch_joints_vis_gt = record_dict['gt']\n        ndarr = draw_circles(ndarr, xmaps, ymaps, nmaps, batch_joints_gt, batch_joints_vis_gt, \n                             width, height, padding, color=[0,255,255])        \n    cv2.imwrite(file_name, cv2.cvtColor(ndarr, cv2.COLOR_RGB2BGR))\n    return\n\ndef save_batch_heatmaps(batch_image, \n                        batch_heatmaps, \n                        file_name,\n                        normalize=True\n                        ):\n    \"\"\"\n    batch_image: [batch_size, channel, height, width]\n    batch_heatmaps: [batch_size, num_joints, height, width]\n    file_name: saved file name\n    \"\"\"\n    if normalize:\n        batch_image = batch_image.clone()\n        # avoid shadowing the built-in min/max\n        min_val = float(batch_image.min())\n        max_val = float(batch_image.max())\n\n        batch_image.add_(-min_val).div_(max_val - min_val + 1e-5)\n\n    batch_size = batch_heatmaps.size(0)\n    num_joints = batch_heatmaps.size(1)\n    heatmap_height = batch_heatmaps.size(2)\n    heatmap_width = batch_heatmaps.size(3)\n\n    grid_image = np.zeros((batch_size*heatmap_height,\n                           (num_joints+1)*heatmap_width,\n                           3),\n                          dtype=np.uint8)\n\n    preds, maxvals = 
get_max_preds(batch_heatmaps.detach().cpu().numpy())\n\n    for i in range(batch_size):\n        image = batch_image[i].mul(255)\\\n                              .clamp(0, 255)\\\n                              .byte()\\\n                              .permute(1, 2, 0)\\\n                              .cpu().numpy()\n        heatmaps = batch_heatmaps[i].mul(255)\\\n                                    .clamp(0, 255)\\\n                                    .byte()\\\n                                    .cpu().numpy()\n\n        resized_image = cv2.resize(image,\n                                   (int(heatmap_width), int(heatmap_height)))\n\n        height_begin = heatmap_height * i\n        height_end = heatmap_height * (i + 1)\n        for j in range(num_joints):\n            cv2.circle(resized_image,\n                       (int(preds[i][j][0]), int(preds[i][j][1])),\n                       1, [0, 0, 255], 1)\n            heatmap = heatmaps[j, :, :]\n            colored_heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)\n            masked_image = colored_heatmap*0.7 + resized_image*0.3\n            cv2.circle(masked_image,\n                       (int(preds[i][j][0]), int(preds[i][j][1])),\n                       1, [0, 0, 255], 1)\n\n            width_begin = heatmap_width * (j+1)\n            width_end = heatmap_width * (j+2)\n            grid_image[height_begin:height_end, width_begin:width_end, :] = \\\n                masked_image\n            # grid_image[height_begin:height_end, width_begin:width_end, :] = \\\n            #     colored_heatmap*0.7 + resized_image*0.3\n\n        grid_image[height_begin:height_end, 0:heatmap_width, :] = resized_image\n\n    cv2.imwrite(file_name, grid_image)\n    return\n\ndef save_debug_images(epoch, \n                      batch_index, \n                      cfgs, \n                      input, \n                      meta, \n                      target, \n                      others, \n                      output, 
\n                      split\n                      ):\n    \"\"\"\n    Save debugging images while training the heatmap-based keypoint \n    localization model (HC.pth).\n    \"\"\"    \n    if not cfgs['training_settings']['debug']['save']:\n        return\n    prefix = join(cfgs['dirs']['output'], \n                  \"intermediate_results\",\n                  split, \n                  '{}_{}'.format(epoch, batch_index)\n                  )\n    make_dir(prefix)\n    joints_pred = others['joints_pred']\n    debug_cfgs = cfgs['training_settings']['debug']\n    record_dict = {'pred':(joints_pred, meta['joints_vis']),\n                   'gt':(meta['transformed_joints'], meta['joints_vis'])}    \n    if debug_cfgs['save_images_kpts']:\n        save_batch_image_with_joints(\n            input[:,:3,:,:], record_dict, '{}_keypoints.jpg'.format(prefix)\n        )\n    if debug_cfgs['save_hms_gt']:\n        save_batch_heatmaps(\n            input[:,:3,:,:], target, '{}_hm_gt.jpg'.format(prefix)\n        )\n    if debug_cfgs['save_hms_pred']:\n        output = output[0] if isinstance(output, tuple) else output\n        save_batch_heatmaps(\n            input[:,:3,:,:], output, '{}_hm_pred.jpg'.format(prefix)\n        )\n    return"
  },
  {
    "path": "libs/visualization/egonet_utils.py",
    "content": "\"\"\"\nVisualization utilities for Ego-Net inference.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport cv2\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nimport libs.visualization.points as vp\n\ndef plot_2d_objects(img_path, record, color_dict):\n    if 'plots' in record:\n        # update old drawing\n        fig = record['plots']['fig2d']\n        ax = record['plots']['ax2d']\n    else:\n        # new drawing\n        fig = plt.figure(figsize=(11.3, 9))\n        ax = plt.subplot(111)\n        record['plots'] = {}\n        record['plots']['fig2d'] = fig\n        record['plots']['ax2d'] = ax\n        image = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)[:, :, ::-1]\n        height, width, _ = image.shape\n        ax.imshow(image) \n        ax.set_xlim([0, width])\n        ax.set_ylim([0, height])\n        ax.invert_yaxis()\n    for idx in range(len(record['kpts_2d_pred'])):\n        kpts = record['kpts_2d_pred'][idx].reshape(-1, 2)\n        bbox = record['bbox_resize'][idx]\n        vp.plot_2d_bbox(ax, bbox, color_dict['bbox_2d'])\n        # predicted key-points\n        ax.plot(kpts[:, 0], kpts[:, 1], color_dict['kpts'][0])    \n    if 'kpts_2d_gt' in record:\n        # plot ground truth 2D screen coordinates\n        for idx, kpts_gt in enumerate(record['kpts_2d_gt']):\n            kpts_gt = kpts_gt.reshape(-1, 3)\n            vp.plot_3d_bbox(ax, kpts_gt[1:, :2], color='g', linestyle='-.')\n    if 'arrow' in record:\n        for idx in range(len(record['arrow'])):\n            start = record['arrow'][idx][:,0]\n            end = record['arrow'][idx][:,1]\n            x, y = start\n            dx, dy = end - start\n            ax.arrow(x, y, dx, dy, color='r', lw=4, head_width=5, alpha=0.5) \n    # save intermediate results\n    # plt.gca().set_axis_off()\n    # plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, \n    # hspace = 0, wspace = 0)\n    # plt.margins(0,0)\n    # 
plt.gca().xaxis.set_major_locator(plt.NullLocator())\n    # plt.gca().yaxis.set_major_locator(plt.NullLocator())\n    # img_name = img_path.split('/')[-1]\n    # save_dir = './qualitative_results/'\n    # plt.savefig(save_dir + img_name, dpi=100, bbox_inches = 'tight', pad_inches = 0)\n    return record\n\ndef plot_3d_objects(prediction, target, pose_vecs_gt, record, color):\n    if target is not None:\n        p3d_gt = target.reshape(len(target), -1, 3)\n    else:\n        p3d_gt = None\n    p3d_pred = prediction.reshape(len(prediction), -1, 3)\n    if \"kpts_3d_before\" in record:\n        # use predicted translation for visualization\n        p3d_pred = np.concatenate([record['kpts_3d_before'][:, [0], :], p3d_pred], axis=1)\n    elif p3d_gt is not None and p3d_gt.shape[1] == p3d_pred.shape[1] + 1:\n        # use ground truth translation for visualization\n        assert len(p3d_pred) == len(p3d_gt)\n        p3d_pred = np.concatenate([p3d_gt[:, [0], :], p3d_pred], axis=1) \n    else:\n        raise NotImplementedError\n    if 'plots' in record and 'ax3d' in record['plots']:\n        # update drawing\n        ax = record['plots']['ax3d']\n        ax = vp.plot_scene_3dbox(p3d_pred, p3d_gt, ax=ax, color=color)\n    elif 'plots' in record:\n        # plotting a set of 3D boxes\n        ax = vp.plot_scene_3dbox(p3d_pred, p3d_gt, color=color)\n        ax.set_title(\"GT: black w/o Ego-Net: magenta w/ Ego-Net: red/yellow\")\n        vp.draw_pose_vecs(ax, pose_vecs_gt)\n        record['plots']['ax3d'] = ax\n    else:\n        raise NotImplementedError\n    # draw pose angle predictions\n    translation = p3d_pred[:, 0, :]    \n    pose_vecs_pred = np.concatenate([translation, record['euler_angles']], axis=1)\n    vp.draw_pose_vecs(ax, pose_vecs_pred, color=color)\n    if 'kpts_3d_before' in record and 'plots' in record:\n        # plot input 3D bounding boxes before using Ego-Net\n        kpts_3d_before = record['kpts_3d_before']\n        
vp.plot_scene_3dbox(kpts_3d_before, ax=ax, color='m')    \n        pose_vecs_before = np.zeros((len(kpts_3d_before), 6))\n        for idx in range(len(pose_vecs_before)):\n            pose_vecs_before[idx][0:3] = record['raw_txt_format'][idx]['locations']\n            pose_vecs_before[idx][4] = record['raw_txt_format'][idx]['rot_y']\n        vp.draw_pose_vecs(ax, pose_vecs_before, color='m')\n    return record"
  },
  {
    "path": "libs/visualization/points.py",
    "content": "\"\"\"\nSimple visualization utilities for 2D and 3D points based on Matplotlib.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport numpy as np\nimport matplotlib.pyplot as plt\n\nfrom mpl_toolkits.mplot3d import Axes3D\n\ndef check_points(points, dimension):\n    \"\"\"\n    Assertion function for input dimension.\n    \"\"\"\n    if len(points.shape) == 1:\n        assert points.shape[0] % dimension == 0\n        points = points.reshape(-1, dimension)\n    elif len(points.shape) == 2:\n        assert points.shape[1] == dimension\n    else:\n        raise ValueError    \n    return points\n\ndef set_3d_axe_limits(ax, points=None, center=None, radius=None, ratio=1.2):\n    \"\"\"\n    Set 3d axe limits to simulate set_aspect('equal').\n    Matplotlib has not yet provided implementation of set_aspect('equal') \n    for 3d axe.\n    \"\"\"\n    if points is None:\n        assert center is not None and radius is not None\n    if center is None or radius is None:\n        assert points is not None\n    if center is None:\n        center = points.mean(axis=0, keepdims=True)\n    if radius is None:\n        radius = points - center\n        radius = np.max(np.abs(radius))*ratio\n    #ax.set_aspect('equal')    \n    xroot, yroot, zroot = center[0,0], center[0,1], center[0,2]\n    ax.set_xlim3d([-radius+xroot, radius+xroot])\n    ax.set_ylim3d([-radius+yroot, radius+yroot])\n    ax.set_zlim3d([-radius+zroot, radius+zroot])    \n    return\n\ndef plot_3d_points(ax, \n                   points, \n                   indices=None, \n                   center=None, \n                   radius=None,  \n                   add_labels=True, \n                   display_ticks=True, \n                   remove_planes=[],\n                   marker='o', \n                   color='k', \n                   size=50, \n                   alpha=1, \n                   set_limits=False\n                   ):\n    \"\"\"\n    Scatter plot of 3D 
points.\n    \n    points are of shape [3*N_points] or [N_points, 3]\n    \"\"\"\n    points = check_points(points, dimension=3)\n    points = points[indices,:] if indices is not None else points\n    ax.scatter(points[:,0], points[:,1], points[:,2], marker=marker, c=color,\n               s=size, alpha=alpha)\n    if set_limits:\n        set_3d_axe_limits(ax, points, center, radius)\n    if add_labels:\n        ax.set_xlabel(\"x\")\n        ax.set_ylabel(\"y\")\n        ax.set_zlabel(\"z\")\n    # remove tick labels or planes\n    if not display_ticks:\n        ax.set_xticks([])\n        ax.set_yticks([])\n        ax.set_zticks([])\n        ax.get_xaxis().set_ticklabels([])\n        ax.get_yaxis().set_ticklabels([])\n        ax.set_zticklabels([])\n    white = (1.0, 1.0, 1.0, 1.0)\n    # whiten the pane of each requested axis\n    if 'x' in remove_planes:\n        ax.w_xaxis.set_pane_color(white)\n    if 'y' in remove_planes:\n        ax.w_yaxis.set_pane_color(white)\n    if 'z' in remove_planes:\n        ax.w_zaxis.set_pane_color(white)        \n    \n    plt.show()\n    return\n\ndef plot_lines(ax, \n               points, \n               connections, \n               dimension, \n               lw=4, \n               c='k', \n               linestyle='-', \n               alpha=0.8, \n               add_index=False\n               ):\n    \"\"\"\n    Plot 2D/3D lines given points and connection.\n    \n    connections are of shape [n_lines, 2]\n    \"\"\"\n    points = check_points(points, dimension)\n    if add_index:\n        for idx in range(len(points)):\n            if dimension == 2:\n                x, y = points[idx][0], points[idx][1]\n                ax.text(x, y, str(idx))\n            elif dimension == 3:\n                x, y, z = points[idx][0], points[idx][1], points[idx][2]\n                ax.text(x, y, z, str(idx))                \n    connections = connections.reshape(-1, 2)\n    for connection in connections:\n        x = [points[connection[0]][0], points[connection[1]][0]]\n        y 
= [points[connection[0]][1], points[connection[1]][1]]\n        if dimension == 3:\n            z = [points[connection[0]][2], points[connection[1]][2]]\n            line, = ax.plot(x, y, z, lw=lw, c=c, linestyle=linestyle, alpha=alpha)\n        else:\n            line, = ax.plot(x, y, lw=lw, c=c, linestyle=linestyle, alpha=alpha)\n    plt.show()\n    return line\n\ndef plot_mesh(ax, vertices, faces, color='grey'):\n    \"\"\"\n    Simple mesh plotting.\n    \n    vertices of shape [N_vertices, 3]\n    faces of shape [N_faces, 3] storing indices\n    \"\"\"    \n    set_3d_axe_limits(ax, vertices)\n    ax.plot_trisurf(vertices[:, 0], \n                    vertices[:, 1], \n                    faces, \n                    -vertices[:, 2], \n                    shade=True, \n                    color=color\n                    )    \n    return\n\ndef plot_3d_coordinate_system(ax, \n                              origin, \n                              system, \n                              length=300, \n                              colors=['r', 'g', 'b']\n                              ):\n    \"\"\"\n    Draw a coordinate system at a specified origin.\n    \n    system: [v1, v2, v3] \n    \"\"\" \n    origin = origin.reshape(3, 1)\n    start_points = np.repeat(origin, 3, axis=1)\n    end_points = start_points + system*length\n    all_points = np.hstack([origin, end_points])\n    for i in range(3):\n        plot_lines(ax, \n                   all_points.T, \n                   plot_3d_coordinate_system.connections[i].reshape(1,2),\n                   dimension=3, \n                   c=colors[i]\n                   )\n    return\n\ndef plot_3d_bbox(ax, \n                 bbox_3d_projected, \n                 color=None, \n                 linestyle='-', \n                 add_index=False\n                 ):\n    \"\"\"\n    Draw the projected edges of a 3D cuboid.\n    \"\"\" \n    c = np.random.rand(3) if color is None else color\n    plot_lines(ax, \n               
bbox_3d_projected, \n               plot_3d_bbox.connections, \n               dimension=2, \n               c=c, \n               linestyle=linestyle, \n               add_index=add_index\n               )\n    return\n\ndef plot_2d_bbox(ax, \n                 bbox_2d, \n                 color=None, \n                 score=None, \n                 label=None, \n                 linestyle='-'\n                 ):\n    \"\"\"\n    Draw a 2D bounding box.\n    \n    bbox_2d in the format [x1, y1, x2, y2]\n    \"\"\" \n    c = np.random.rand(3) if color is None else color\n    x1, y1, x2, y2 = bbox_2d[0], bbox_2d[1], bbox_2d[2], bbox_2d[3],\n    points = np.array([[x1, y1], [x2, y1], [x2, y2], [x1, y2]], dtype=np.float32)\n    plot_lines(ax, points, plot_2d_bbox.connections, dimension=2, c=c, linestyle=linestyle)\n    if score is not None and label is not None:\n        string = \"({:.2f}, {:d})\".format(score, label)\n        ax.text((x1+x2)*0.5, (y1+y2)*0.5, string, bbox=dict(facecolor='red', alpha=0.2))\n    return\n\ndef plot_comparison_relative(points_pred, points_gt):\n    # DEPRECATED\n    # plot the comparison of the shape relative to the root point\n    plt.figure()\n    num_row = 3\n    num_col = int(len(points_pred)/num_row)\n    for i in range(len(points_pred)):\n        ax = plt.subplot(num_row, num_col, i+1, projection='3d')\n        pred = points_pred[i]\n        gt = points_gt[i]\n        plot_3d_points(ax, pred, color='r')\n        plot_3d_points(ax, gt, color='k')\n        # TODO check here\n        pred_bbox = get_bbox_3d(pred)\n        gt_bbox = get_bbox_3d(gt)\n        plot_lines(ax, pred_bbox, plot_3d_bbox.connections, dimension=3, c='r')\n        plot_lines(ax, gt_bbox, plot_3d_bbox.connections, dimension=3, c='k')\n        set_3d_axe_limits(ax, \n                          np.vstack([pred_bbox.reshape(-1, 3), \n                                     gt_bbox.reshape(-1, 3)]\n                                    ),\n                          
center=np.zeros((1,3)), \n                          radius=5.\n                          )\n    ax.set_xlabel(\"x\")\n    ax.set_ylabel(\"y\")\n    ax.set_zlabel(\"z\")    \n    ax.view_init(0., 90.)\n    return\n\ndef plot_scene_3dbox(points_pred, points_gt=None, ax=None, color='r'):\n    \"\"\"\n    Plot the comparison of predicted 3d bounding boxes and ground truth ones.\n    \"\"\" \n    if ax is None:\n        plt.figure()\n        ax = plt.subplot(111, projection='3d')\n    preds = points_pred.copy()\n    # add the root translation\n    preds[:,1:,] = preds[:,1:,] + preds[:,[0],]\n    if points_gt is not None:\n        gts = points_gt.copy() \n        gts[:,1:,] = gts[:,1:,] + gts[:,[0],]\n        all_points = np.concatenate([preds, gts], axis=0).reshape(-1, 3)\n    else:\n        all_points = preds.reshape(-1, 3)\n    for pred in preds:\n        plot_3d_points(ax, pred, color=color, size=15)\n        plot_lines(ax, pred[1:,], plot_3d_bbox.connections, dimension=3, c=color)\n    if points_gt is not None:\n        for gt in gts:\n            plot_3d_points(ax, gt, color='k', size=15)\n            plot_lines(ax, gt[1:,], plot_3d_bbox.connections, dimension=3, c='k')         \n    set_3d_axe_limits(ax, all_points)\n    return ax\n\ndef get_area(points, indices, preserve_points=False):\n    # DEPRECATED\n    # points [N, 2]\n    # indices [M, 3]\n    vec1 = points[indices[:, 1], :] - points[indices[:, 0], :]\n    vec2 = points[indices[:, 2], :] - points[indices[:, 0], :]\n    area= np.cross(vec1, vec2)*0.5\n    area = area.reshape(1, -1)\n    if preserve_points:\n        feature = np.hstack([points.reshape(1,-1), area])\n    else:\n        feature = area\n    return feature\n\ndef interpolate(start, end, num_interp):\n    # DEPRECATED\n    # start: [3]\n    # end: [3]\n    x = np.linspace(start[0], end[0], num=num_interp+2)[1:-1].reshape(num_interp, 1)\n    y = np.linspace(start[1], end[1], num=num_interp+2)[1:-1].reshape(num_interp, 1)\n    z = 
np.linspace(start[2], end[2], num=num_interp+2)[1:-1].reshape(num_interp, 1)\n    return np.concatenate([x,y,z], axis=1)\n\ndef get_interpolated_points(points, indices, num_interp):\n    # DEPRECATED\n    # points [N, 3]\n    # indices [M, 2] point indices for interpolating a line segment\n    # num_interp how many points to add for each segment\n    new_points = []\n    for start_idx, end_idx in indices:\n        new_points.append(interpolate(points[start_idx], points[end_idx], num_interp))\n    return np.vstack(new_points)\n\ndef draw_pose_vecs(ax, pose_vecs=None, color='black'):\n    \"\"\"\n    Add pose vectors to a 3D matplotlib axe.\n    \"\"\"     \n    if pose_vecs is None:\n        return\n    for pose_vec in pose_vecs:\n        x, y, z, pitch, yaw, roll = pose_vec\n        string = \"({:.2f}, {:.2f}, {:.2f})\".format(pitch, yaw, roll)\n        # add some random noise to the text location so that they do not overlap\n        nl = 0.02 # noise level\n        ax.text(x*(1+np.random.randn()*nl), \n                y*(1+np.random.randn()*nl), \n                z*(1+np.random.randn()*nl), \n                string, \n                color=color\n                )\n        \ndef get_bbox_3d(points, add_center=False, interp_style=\"\"):\n    \"\"\"\n    Get a 3D bounding boxes from coordinate limits in object coordinate system.\n    \"\"\"  \n    assert len(points.shape) == 2 \n    if points.shape[0] == 3:\n        axis=1 \n    elif points.shape[1] == 3:\n        axis=0\n    limit_min = points.min(axis=axis)\n    limit_max = points.max(axis=axis)\n    xmax, xmin = limit_max[0], limit_min[0]\n    ymax, ymin = limit_max[1], limit_min[1]\n    zmax, zmin = limit_max[2], limit_min[2]\n    bbox = np.array([[xmax, ymin, zmax],\n                     [xmax, ymax, zmax],\n                     [xmax, ymin, zmin],\n                     [xmax, ymax, zmin],\n                     [xmin, ymin, zmax],\n                     [xmin, ymax, zmax],\n                     [xmin, ymin, 
zmin],\n                     [xmin, ymax, zmin]])\n    if add_center:\n        bbox = np.vstack([np.array([[0., 0., 0.]]), bbox])\n    if interp_style.startswith('bbox9interp'):\n        interp_num = int(interp_style[11:])\n        # indices for each edge\n        indices = np.array([[1,2],\n                            [3,4],\n                            [1,3],\n                            [2,4],\n                            [5,6],\n                            [7,8],\n                            [5,7],\n                            [6,8],\n                            [1,5],\n                            [3,7],\n                            [2,6],\n                            [4,8]])\n        new_points = get_interpolated_points(bbox, indices, interp_num)\n        bbox = np.vstack([bbox, new_points])\n    return bbox\n\ndef ray_intersect_triangle(p0, p1, triangle):\n    \"\"\"\n    Tests if a ray starting at point p0, in the direction\n    p1 - p0, will intersect with the triangle.\n    \n    arguments:\n    p0, p1: numpy.ndarray, both with shape (3,) for x, y, z.\n    triangle: numpy.ndarray, shaped (3,3), with each row\n        representing a vertex and three columns for x, y, z.\n    \n    returns: \n        0.0 if ray does not intersect triangle, \n        1.0 if it will intersect the triangle,\n        2.0 if starting point lies in the triangle.\n        \n    Reference: https://www.erikrotteveel.com/python/three-dimensional-ray-tracing-in-python/\n    \"\"\"\n    v0, v1, v2 = triangle\n    u = v1 - v0\n    v = v2 - v0\n    normal = np.cross(u, v)\n    b = np.inner(normal, p1 - p0)\n    a = np.inner(normal, v0 - p0)\n    if (b == 0.0):\n        # ray is parallel to the plane\n        if a != 0.0:\n            # ray is outside but parallel to the plane\n            return 0\n        else:\n            # ray is parallel and lies in the plane\n            rI = 0.0\n    else:\n        rI = a / b\n    if rI < 0.0:\n        return 0\n    w = p0 + rI * (p1 - p0) - v0\n   
 denom = np.inner(u, v) * np.inner(u, v) - \\\n        np.inner(u, u) * np.inner(v, v)\n    si = (np.inner(u, v) * np.inner(w, v) - \\\n        np.inner(v, v) * np.inner(w, u)) / denom\n    if (si < 0.0) | (si > 1.0):\n        return 0\n    ti = (np.inner(u, v) * np.inner(w, u) - \\\n        np.inner(u, u) * np.inner(w, v)) / denom\n    if (ti < 0.0) | (si + ti > 1.0):\n        return 0\n    if (rI == 0.0):\n        return 2\n    return 1\n\ndef get_visibility(box3d, triangles):\n    \"\"\"\n    Get visibility for each vertex of a 3D bounding box given all the triangles\n    in a scene.\n    \n    box3d: [8, 3] The vertex coordinates in the camera coordinate system.\n    triangles: [N, 3, 3]\n    \"\"\"      \n    # the np.bool alias was removed in NumPy 1.24; use the built-in bool\n    visibility = np.ones(8, dtype=bool)\n    p1 = np.zeros(3)\n    for idx, p0 in enumerate(box3d):\n        intersects = set()\n        for triangle in triangles:\n            intersects.add(ray_intersect_triangle(p0, p1, triangle))\n        if 1 in intersects:\n            visibility[idx] = False\n    return visibility\n\n## static variables implemented as function attributes \nplot_3d_coordinate_system.connections = np.array([[0, 1],\n                                                  [0, 2],\n                                                  [0, 3]])\nplot_3d_bbox.connections = np.array([[0, 1],\n                                     [0, 2],\n                                     [1, 3],\n                                     [2, 3],\n                                     [4, 5],\n                                     [5, 7],\n                                     [4, 6],\n                                     [6, 7],\n                                     [0, 4],\n                                     [1, 5],\n                                     [2, 6],\n                                     [3, 7]])\nplot_2d_bbox.connections = np.array([[0, 1],\n                                     [1, 2],\n                                     [2, 3],\n                         
            [3, 0]])"
  },
  {
    "path": "tools/inference.py",
    "content": "\"\"\"\nInference of Ego-Net on KITTI dataset.\n\nThe user can provide the 3D bounding boxes predicted by other 3D object detectors\nand run Ego-Net to refine the orientation of these instances.\n\nThe user can also visualize the intermediate results.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport sys\nsys.path.append('../')\n\nimport libs.arguments.parse as parse\nimport libs.logger.logger as liblogger\nimport libs.dataset.KITTI.car_instance as libkitti\n\nfrom libs.common.img_proc import modify_bbox\nfrom libs.trainer.trainer import get_loader\nfrom libs.model.egonet import EgoNet\n\nimport shutil\nimport torch\nimport numpy as np\nimport os\nimport subprocess\nimport matplotlib.pyplot as plt\nplt.ion()\n\ndef filter_detection(detected, thres=0.7):\n    \"\"\"\n    Filter predictions based on a confidence threshold.\n    \"\"\"      \n    # detected: list of dict\n    filtered = []\n    for detection in detected:\n        tempt_dict = {}\n        indices = detection['scores'] > thres\n        for key in ['boxes', 'labels', 'scores']:\n            tempt_dict[key] = detection[key][indices]\n        filtered.append(tempt_dict)\n    return filtered\n\ndef merge(dict_a, dict_b):\n    for key in dict_b.keys():\n        dict_a[key] = dict_b[key]\n    return\n\ndef collate_dict(dict_list):\n    ret = {}\n    for key in dict_list[0]:\n        ret[key] = [d[key] for d in dict_list]\n    return ret\n\ndef my_collate_fn(batch):\n    # the collate function for 2d pose training\n    imgs, meta = list(zip(*batch))\n    meta = collate_dict(meta)\n    return imgs, meta\n\ndef filter_conf(record, thres=0.0):\n    \"\"\"\n    Filter the object detections with a confidence threshold.\n    \"\"\"\n    annots = record['raw_txt_format']\n    indices = [i for i in range(len(annots)) if annots[i]['score'] >= thres]\n    if len(indices) == 0:\n        return False, record\n    filterd_record = {\n        'bbox_2d': 
record['bbox_2d'][indices],\n        'kpts_3d': record['kpts_3d'][indices],\n        'raw_txt_format': [annots[i] for i in indices],\n        'scores': [annots[i]['score'] for i in indices],\n        'K':record['K']\n        }\n    return True, filterd_record\n\ndef gather_dict(request, \n                references, \n                filter_c=True, \n                larger=True, \n                thres=0.,\n                target_ar=1.,\n                enlarge=1.2\n                ):\n    \"\"\"\n    Gather an annotation dictionary from the prepared detections as requested.\n    \"\"\"\n    assert 'path' in request\n    ret = {'path':[], \n           'boxes':[], \n           'kpts_3d_before':[], \n           'raw_txt_format':[],\n           'scores':[],\n           'K':[]}\n    for img_path in request['path']:\n        img_name = img_path.split('/')[-1]\n        if img_name not in references:\n            print('Warning: ' + img_name + ' not included in detected images!')\n            continue\n        ref = references[img_name]\n        if filter_c:\n            success, ref = filter_conf(ref, thres=thres)\n        if filter_c and not success:\n            continue\n        ret['path'].append(img_path)\n        bbox = ref['bbox_2d']\n        if larger:\n            # enlarge the input bounding box if needed            \n            for instance_id in range(len(bbox)):\n                bbox[instance_id] = np.array(modify_bbox(bbox[instance_id], \n                                                         target_ar=target_ar, \n                                                         enlarge=enlarge\n                                                         )['bbox']\n                                             )\n        ret['boxes'].append(bbox)\n        # 3D key-points from the detections before using Ego-Net\n        ret['kpts_3d_before'].append(ref['kpts_3d'])\n        # raw prediction strings used for later saving\n        
ret['raw_txt_format'].append(ref['raw_txt_format'])\n        ret['scores'].append(ref['scores'])\n        ret['K'].append(ref['K'])\n    if 'pose_vecs_gt' in request:\n        ret['pose_vecs_gt'] = request['pose_vecs_gt']\n    return ret\n    \ndef make_output_dir(cfgs, name):\n    save_dir = os.path.join(cfgs['dirs']['output'], name, 'data')\n    if not os.path.exists(save_dir):\n        os.makedirs(save_dir) \n    return save_dir\n\n@torch.no_grad()\ndef inference(testset, model, results, cfgs):\n    \"\"\"\n    The inference loop.\n    \n    Set cfgs['visualize'] to True if you want to view the results.\n    color_dict stores plotting parameters used by Matplotlib.\n    save_dict stores parameters relevant to result saving.\n    \"\"\"\n    # data loader\n    data_loader = get_loader(testset, cfgs, 'testing', collate_fn=my_collate_fn)          \n    # transformation statistics\n    model.pth_trans = testset.pth_trans \n    all_records = {}\n    for batch_idx, (_, meta) in enumerate(data_loader):\n        if cfgs['use_gt_box']:\n            save_dir = make_output_dir(cfgs, 'gt_box_test')         \n            # use ground truth bounding box to crop RoIs\n            record = model(meta)\n            record = model.post_process(record,\n                                        visualize=cfgs['visualize'],\n                                        color_dict={'bbox_2d':'y',\n                                                    'bbox_3d':'y',\n                                                    'kpts':['yx', 'y']\n                                                    },\n                                        save_dict={\n                                            'flag':True,\n                                            'save_dir':save_dir\n                                            }\n                                        )\n            merge(all_records, record)\n        if cfgs['use_pred_box']:\n            # use detected bounding box from any 2D/3D detector\n   
         thres = cfgs.get('conf_thres', 0.)\n            width, height = cfgs['heatmapModel']['input_size']\n            enlarge = cfgs['dataset'].get('enlarge_factor', 1.2)\n            annot_dict = gather_dict(meta, results['pred'], \n                                     thres=thres,\n                                     target_ar=height/width,\n                                     enlarge=enlarge\n                                     )\n            if len(annot_dict['path']) != 0:\n                record2 = model(annot_dict)\n                # update drawings\n                for key in record2:\n                    if 'record' in locals() and 'plots' in record[key]:\n                        record2[key]['plots'] = record[key]['plots']\n                save_dir = make_output_dir(cfgs, 'submission')   \n                record2 = model.post_process(record2, \n                                             visualize=cfgs['visualize'],\n                                             color_dict={'bbox_2d':'r',\n                                                         'bbox_3d':'r',\n                                                         'kpts':['rx', 'r'],\n                                                         },\n                                             save_dict={'flag':True,\n                                                        'save_dir':save_dir\n                                                        },\n                                             alpha_mode=cfgs['testing_settings']['alpha_mode']\n                                             )   \n        if cfgs['visualize']:\n            input(\"Press Enter to view next batch.\")\n        # set batch_to_show to a small number if you need to visualize \n        if batch_idx >= cfgs['batch_to_show'] - 1:\n            break  \n    return\n\ndef generate_empty_file(output_dir, label_dir):\n    \"\"\"\n    Generate empty files for images without any predictions.\n    \"\"\"    \n    all_files = 
os.listdir(label_dir)\n    detected = os.listdir(os.path.join(output_dir, 'data'))\n    for file_name in all_files:\n        if not file_name.endswith(\".txt\"):\n            continue\n        if file_name not in detected:\n            file = open(os.path.join(output_dir, 'data', file_name[:-4] + '.txt'), 'w')\n            file.close()\n    return\n\ndef main():\n    # experiment configurations\n    cfgs = parse.parse_args()\n    \n    # logging\n    logger, final_output_dir = liblogger.get_logger(cfgs)   \n    \n    # save a copy of the experiment configuration\n    save_cfg_path = os.path.join(final_output_dir, 'saved_config.yml')\n    shutil.copyfile(cfgs['config_path'], save_cfg_path)\n    \n    # set GPU\n    if cfgs['use_gpu'] and torch.cuda.is_available():\n        logger.info('Using GPU:{}'.format(cfgs['gpu_id']))\n        os.environ['CUDA_VISIBLE_DEVICES'] = ','.join(list(map(str, cfgs['gpu_id'])))\n    else:\n        raise ValueError('CPU-based inference is not maintained.')\n        \n    # cudnn related setting\n    torch.backends.cudnn.benchmark = cfgs['cudnn']['benchmark']\n    torch.backends.cudnn.deterministic = cfgs['cudnn']['deterministic']\n    torch.backends.cudnn.enabled = cfgs['cudnn']['enabled']\n    \n    # configurations related to the KITTI dataset\n    data_cfgs = cfgs['dataset']\n    \n    # which split to show\n    split = data_cfgs['split'] # default: KITTI val split\n    dataset_inf = libkitti.get_dataset(cfgs, logger, split)\n    \n    # set the dataset to inference mode\n    dataset_inf.inference([True, False])\n    \n    # read annotations\n    input_file_path = cfgs['dirs']['load_prediction_file']\n    # the record for 2D and 3D predictions\n    results = {}\n    \n    # flags: the user can choose to use which type of input bounding boxes to use\n    # use_gt_box can be used to re-produce the experiments simulating perfect 2D detection\n    results['flags'] = {}\n    if cfgs['use_pred_box']:\n        # read the predicted boxes as 
specified by the path\n        results['pred'] = dataset_inf.read_predictions(input_file_path)\n    \n    # Initialize Ego-Net and load the pre-trained checkpoint\n    model = EgoNet(cfgs, pre_trained=True)\n    model = model.eval().cuda()\n    \n    # perform inference and save the (updated) predictions\n    inference(dataset_inf, model, results, cfgs)       \n    if cfgs['visualize']:\n        return\n    \n    evaluator = \"./kitti-eval/evaluate_object_3d_offline\"\n    label_dir = os.path.join(cfgs['dataset']['root'], 'training', 'label_2')\n    output_dir = os.path.join(cfgs['dirs']['output'], 'submission')\n    \n    # When generating submission files for the test split,\n    # if no detections are produced for one image, generate an empty file\n    if cfgs['dataset']['split'] == 'test':\n        test_calib_dir = os.path.join(cfgs['dataset']['root'], 'testing', 'calib')\n        generate_empty_file(output_dir, test_calib_dir)\n        return\n    \n    # run kitti-eval to produce official evaluation\n    command = \"{} {} {}\".format(evaluator, label_dir, output_dir)\n    output = subprocess.check_output(command, shell=True)\n    print(output.decode())\n    return output\n\nif __name__ == \"__main__\":\n    main()"
  },
  {
    "path": "tools/inference_legacy.py",
    "content": "\"\"\"\nThis is the legacy inference code which includes some debugging functions.\nYou don't need to read this file to use Ego-Net.\n\"\"\"\nimport sys\nsys.path.append('../')\n\nimport libs.arguments.parse as parse\nimport libs.logger.logger as liblogger\nimport libs.dataset as dataset\nimport libs.dataset.KITTI.car_instance\nimport libs.model as models\nimport libs.model.FCmodel as FCmodel\nimport libs.dataset.normalization.operations as nop\nimport libs.visualization.points as vp\nimport libs.common.transformation as ltr\n\nfrom libs.common.img_proc import resize_bbox, get_affine_transform, get_max_preds, generate_xy_map\nfrom libs.common.img_proc import affine_transform_modified, cs2bbox, simple_crop, enlarge_bbox\nfrom libs.trainer.trainer import visualize_lifting_results, get_loader\nfrom libs.dataset.KITTI.car_instance import interp_dict\n\nimport shutil\nimport torch\nimport cv2\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport os\nimport math\nfrom scipy.spatial.transform import Rotation\nfrom copy import deepcopy\n\ndef prepare_models(cfgs, is_cuda=True):\n    \"\"\"\n    Initialize and load Ego-Net given a configuration file.\n    \"\"\"\n    hm_model_settings = cfgs['heatmapModel']\n    hm_model_name = hm_model_settings['name']\n    method_str = 'models.heatmapModel.' 
+ hm_model_name + '.get_pose_net'\n    hm_model = eval(method_str)(cfgs, is_train=False)\n    lifter = FCmodel.get_fc_model(stage_id=1, \n                                  cfgs=cfgs, \n                                  input_size=cfgs['FCModel']['input_size'],\n                                  output_size=cfgs['FCModel']['output_size']\n                                  )\n    hm_model.load_state_dict(torch.load(cfgs['dirs']['load_hm_model']))\n    stats = np.load(cfgs['dirs']['load_stats'], allow_pickle=True).item()\n    lifter.load_state_dict(torch.load(cfgs['dirs']['load_lifter']))\n    \n    if is_cuda:\n        hm_model = hm_model.cuda()\n        lifter = lifter.cuda()\n    model_dict = {'heatmap_regression':hm_model.eval(),\n                  'lifting':lifter.eval(),\n                  'FC_stats':stats                  \n                  }\n    return model_dict\n\ndef modify_bbox(bbox, target_ar, enlarge=1.1):\n    \"\"\"\n    Enlarge a bounding box so that occluded parts may be enclosed.\n    \"\"\"\n    lbbox = enlarge_bbox(bbox[0], bbox[1], bbox[2], bbox[3], [enlarge, enlarge])\n    ret = resize_bbox(lbbox[0], lbbox[1], lbbox[2], lbbox[3], target_ar=target_ar)\n    return ret\n\ndef crop_single_instance(img, bbox, resolution, pth_trans=None, xy_dict=None):\n    \"\"\"\n    Crop a single instance given an image and bounding box.\n    \"\"\"\n    bbox = to_npy(bbox)\n    target_ar = resolution[0] / resolution[1]\n    ret = modify_bbox(bbox, target_ar)\n    c, s = ret['c'], ret['s']\n    r = 0.\n    # xy_dict: parameters for adding xy coordinate maps\n    trans = get_affine_transform(c, s, r, resolution)\n    instance = cv2.warpAffine(img,\n                              trans,\n                              (int(resolution[0]), int(resolution[1])),\n                              flags=cv2.INTER_LINEAR\n                              )\n    #cv2.imwrite('test.jpg', input)\n    #input = torch.from_numpy(input.transpose(2,0,1))\n    if xy_dict is not None and 
xy_dict['flag']:\n        xymap = generate_xy_map(ret['bbox'], resolution, img.shape[:-1])\n        instance = np.concatenate([instance, xymap.astype(np.float32)], axis=2)        \n    instance = instance if pth_trans is None else pth_trans(instance)\n    return instance\n\ndef crop_instances(annot_dict, \n                   resolution, \n                   pth_trans=None, \n                   rgb=True,\n                   xy_dict=None\n                   ):\n    \"\"\"\n    Crop input instances given an annotation dictionary.\n    \"\"\"\n    all_instances = []\n    # each record describes one instance\n    all_records = []\n    target_ar = resolution[0] / resolution[1]\n    for idx, path in enumerate(annot_dict['path']):\n        #print(path)\n        data_numpy = cv2.imread(path, 1 | 128)    \n        if data_numpy is None:\n            raise ValueError('Fail to read {}'.format(path))    \n        if rgb:\n            data_numpy = cv2.cvtColor(data_numpy, cv2.COLOR_BGR2RGB) \n        boxes = annot_dict['boxes'][idx]\n        if 'labels' in annot_dict:\n            labels = annot_dict['labels'][idx]\n        else:\n            labels = -np.ones((len(boxes)), dtype=np.int64)\n        if 'scores' in annot_dict:\n            scores = annot_dict['scores'][idx]\n        else:\n            scores = -np.ones((len(boxes)))\n        if len(boxes) == 0:\n            continue\n        for idx, bbox in enumerate(boxes):\n            # first crop the instance, and then resize to the required aspect ratio\n            instance = crop_single_instance(data_numpy,\n                                            bbox, \n                                            resolution, \n                                            pth_trans=pth_trans,\n                                            xy_dict=xy_dict\n                                            )\n            bbox = to_npy(bbox)\n            ret = modify_bbox(bbox, target_ar)\n            c, s = ret['c'], ret['s']\n            r = 
0.\n            all_instances.append(torch.unsqueeze(instance, dim=0))\n            all_records.append({\n                'path': path,\n                'center': c,\n                'scale': s,\n                'bbox': bbox,\n                'bbox_resize': ret['bbox'],\n                'rotation': r,\n                'label': labels[idx],\n                'score': scores[idx]\n                }\n                )\n            #break\n    return torch.cat(all_instances, dim=0), all_records\n\ndef get_keypoints(instances, \n                  records, \n                  model, \n                  image_size=(256,256), \n                  arg_max='hard',\n                  is_cuda=True\n                  ):\n    \"\"\"\n    Forward pass to obtain the screen coordinates.\n    \"\"\"\n    if is_cuda:\n        instances = instances.cuda()\n    output = model(instances)\n    if type(output) is tuple:\n        pred, max_vals = output[1].data.cpu().numpy(), None  \n        \n    elif arg_max == 'hard':\n        if not isinstance(output, np.ndarray):\n            output = output.data.cpu().numpy()\n        pred, max_vals = get_max_preds(output)\n    else:\n        raise NotImplementedError\n    if type(output) is tuple:\n        pred *= image_size[0]\n    else:\n        pred *= image_size[0]/output.shape[3]\n    centers = [records[i]['center'] for i in range(len(records))]\n    scales = [records[i]['scale'] for i in range(len(records))]\n    rots = [records[i]['rotation'] for i in range(len(records))]    \n    for sample_idx in range(len(pred)):\n        trans_inv = get_affine_transform(centers[sample_idx],\n                                         scales[sample_idx], \n                                         rots[sample_idx], \n                                         image_size, \n                                         inv=1)\n        pred_src_coordinates = affine_transform_modified(pred[sample_idx], \n                                                             
trans_inv) \n        record = records[sample_idx]\n        # pred_src_coordinates += np.array([[record['bbox'][0], record['bbox'][1]]])\n        records[sample_idx]['kpts'] = pred_src_coordinates\n    # assemble a dictionary where each key corresponds to one image\n    ret = {}\n    for record in records:\n        path = record['path']\n        if path not in ret:\n            ret[path] = {'center':[], \n                         'scale':[], \n                         'rotation':[], \n                         'bbox_resize':[], # resized bounding box\n                         'kpts_2d_pred':[], \n                         'label':[], \n                         'score':[]\n                         }\n        ret[path]['kpts_2d_pred'].append(record['kpts'].reshape(1, -1))\n        ret[path]['center'].append(record['center'])\n        ret[path]['scale'].append(record['scale'])\n        ret[path]['bbox_resize'].append(record['bbox_resize'])\n        ret[path]['label'].append(record['label'])\n        ret[path]['score'].append(record['score'])\n        ret[path]['rotation'].append(record['rotation'])\n    return ret\n\ndef kpts_to_euler(template, prediction):\n    \"\"\"\n    Convert the predicted cuboid representation to Euler angles.\n    \"\"\"    \n    # estimate roll, pitch, yaw of the prediction by comparing with a \n    # reference bounding box\n    # prediction and template of shape [3, N_points]\n    R, T = ltr.compute_rigid_transform(template, prediction)\n    # in the order of yaw, pitch and roll\n    angles = Rotation.from_matrix(R).as_euler('yxz', degrees=False)\n    # re-order in the order of x, y and z\n    angles = angles[[1,0,2]]\n    return angles, T\n\ndef get_template(prediction, interp_coef=[0.332, 0.667]):\n    \"\"\"\n    Construct a template 3D cuboid used for computing the rigid transformation.\n    \"\"\" \n    parents = prediction[interp_dict['bbox12'][0]]\n    children = prediction[interp_dict['bbox12'][1]]\n    lines = parents - children\n    lines 
= np.sqrt(np.sum(lines**2, axis=1))\n    h = np.sum(lines[:4])/4 # averaged over the four parallel line segments\n    l = np.sum(lines[4:8])/4\n    w = np.sum(lines[8:])/4\n    x_corners = [0.5*l, l, l, l, l, 0, 0, 0, 0]\n    y_corners = [0.5*h, 0, h, 0, h, 0, h, 0, h]\n    z_corners = [0.5*w, w, w, 0, 0, w, w, 0, 0]\n    x_corners += - np.float32(l) / 2\n    y_corners += - np.float32(h)\n    #y_corners += - np.float32(h/2)\n    z_corners += - np.float32(w) / 2\n    corners_3d = np.array([x_corners, y_corners, z_corners])    \n    if len(prediction) == 33:\n        pidx, cidx = interp_dict['bbox12']\n        parents, children = corners_3d[:, pidx], corners_3d[:, cidx]\n        lines = children - parents\n        new_joints = [(parents + interp_coef[i]*lines) for i in range(len(interp_coef))]\n        corners_3d = np.hstack([corners_3d, np.hstack(new_joints)])    \n    return corners_3d\n\ndef get_observation_angle_trans(euler_angles, translations):\n    \"\"\"\n    Convert orientation in camera coordinate into local coordinate system\n    utilizing known object location (translation)\n    \"\"\" \n    alphas = euler_angles[:,1].copy()\n    for idx in range(len(euler_angles)):\n        ry3d = euler_angles[idx][1] # orientation in the camera coordinate system\n        x3d, z3d = translations[idx][0], translations[idx][2]\n        alpha = ry3d - math.atan2(-z3d, x3d) - 0.5 * math.pi\n        #alpha = ry3d - math.atan2(x3d, z3d)# - 0.5 * math.pi\n        while alpha > math.pi: alpha -= math.pi * 2\n        while alpha < (-math.pi): alpha += math.pi * 2\n        alphas[idx] = alpha\n    return alphas\n\ndef get_observation_angle_proj(euler_angles, kpts, K):\n    \"\"\"\n    Convert orientation in camera coordinate into local coordinate system\n    utilizing the projection of object on the image plane\n    \"\"\" \n    f = K[0,0]\n    cx = K[0,2]\n    kpts_x = [kpts[i][0,0] for i in range(len(kpts))]\n    alphas = euler_angles[:,1].copy()\n    for idx in 
range(len(euler_angles)):\n        ry3d = euler_angles[idx][1] # orientation in the camera coordinate system\n        x3d, z3d = kpts_x[idx] - cx, f\n        alpha = ry3d - math.atan2(-z3d, x3d) - 0.5 * math.pi\n        #alpha = ry3d - math.atan2(x3d, z3d)# - 0.5 * math.pi\n        while alpha > math.pi: alpha -= math.pi * 2\n        while alpha < (-math.pi): alpha += math.pi * 2\n        alphas[idx] = alpha\n    return alphas\n\ndef get_6d_rep(predictions, ax=None, color=\"black\"):\n    \"\"\"\n    Get the 6DoF representation of a 3D prediction.\n    \"\"\"    \n    predictions = predictions.reshape(len(predictions), -1, 3)\n    all_angles = []\n    for instance_idx in range(len(predictions)):\n        prediction = predictions[instance_idx]\n        # templates are 3D boxes with no rotation\n        # the prediction is estimated as the rotation between prediction and template\n        template = get_template(prediction)\n        instance_angle, instance_trans = kpts_to_euler(template, prediction.T)        \n        all_angles.append(instance_angle.reshape(1, 3))\n    angles = np.concatenate(all_angles)\n    # the first point is the predicted point center\n    translation = predictions[:, 0, :]    \n    if ax is not None:\n        pose_vecs = np.concatenate([translation, angles], axis=1)\n        draw_pose_vecs(ax, pose_vecs, color=color)\n    return angles, translation\n\ndef format_str_submission(roll, pitch, yaw, x, y, z, score):\n    \"\"\"\n    Get a prediction string in ApolloScape style.\n    \"\"\"      \n    tempt_str = \"{pitch:.3f} {yaw:.3f} {roll:.3f} {x:.3f} {y:.3f} {z:.3f} {score:.3f}\".format(\n            pitch=pitch,\n            yaw=yaw,\n            roll=roll,\n            x=x,\n            y=y,\n            z=z,\n            score=score)\n    return tempt_str\n\ndef get_instance_str(dic):\n    \"\"\"\n    Produce KITTI style prediction string for one instance.\n    \"\"\"     \n    string = \"\"\n    string += dic['class'] + \" \"\n    string 
+= \"{:.1f} \".format(dic['truncation'])\n    string += \"{:.1f} \".format(dic['occlusion'])\n    string += \"{:.6f} \".format(dic['alpha'])\n    string += \"{:.6f} {:.6f} {:.6f} {:.6f} \".format(dic['bbox'][0], dic['bbox'][1], dic['bbox'][2], dic['bbox'][3])\n    string += \"{:.6f} {:.6f} {:.6f} \".format(dic['dimensions'][1], dic['dimensions'][2], dic['dimensions'][0])\n    string += \"{:.6f} {:.6f} {:.6f} \".format(dic['locations'][0], dic['locations'][1], dic['locations'][2])\n    string += \"{:.6f} \".format(dic['rot_y'])\n    if 'score' in dic:\n        string += \"{:.8f} \".format(dic['score'])\n    else:\n        string += \"{:.8f} \".format(1.0)\n    return string\n\ndef get_pred_str(record):\n    \"\"\"\n    Produce KITTI style prediction string for a record dictionary.\n    \"\"\"      \n    # replace the rotation prediction generated by the previous stage\n    updated_txt = deepcopy(record['raw_txt_format'])\n    for instance_id in range(len(record['euler_angles'])):\n        updated_txt[instance_id]['rot_y'] = record['euler_angles'][instance_id, 1]\n        updated_txt[instance_id]['alpha'] = record['alphas'][instance_id]\n    pred_str = \"\"\n    angles = record['euler_angles']\n    for instance_id in range(len(angles)):\n        # format a string for submission\n        tempt_str = get_instance_str(updated_txt[instance_id])\n        if instance_id != len(angles) - 1:\n            tempt_str += '\\n'\n        pred_str += tempt_str\n    return pred_str\n\ndef lift_2d_to_3d(records, model, stats, template, cuda=True):\n    \"\"\"\n    Forward pass of the lifter model.\n    \"\"\"      \n    for path in records.keys():\n        data = np.concatenate(records[path]['kpts_2d_pred'], axis=0)\n        data = nop.normalize_1d(data, stats['mean_in'], stats['std_in'])\n        data = data.astype(np.float32)\n        data = torch.from_numpy(data)\n        if cuda:\n            data = data.cuda()\n        prediction = model(data)  \n        prediction = 
nop.unnormalize_1d(prediction.data.cpu().numpy(),\n                                        stats['mean_out'], \n                                        stats['std_out']\n                                        )\n        records[path]['kpts_3d_pred'] = prediction.reshape(len(prediction), -1, 3)\n    return records\n\ndef filter_detection(detected, thres=0.7):\n    \"\"\"\n    Filter predictions based on a confidence threshold.\n    \"\"\"      \n    # detected: list of dict\n    filtered = []\n    for detection in detected:\n        tempt_dict = {}\n        indices = detection['scores'] > thres\n        for key in ['boxes', 'labels', 'scores']:\n            tempt_dict[key] = detection[key][indices]\n        filtered.append(tempt_dict)\n    return filtered\n\ndef add_orientation_arrow(record):\n    \"\"\"\n    Generate an arrow for each predicted orientation for visualization.\n    \"\"\"      \n    pred_kpts = record['kpts_3d_pred']\n    gt_kpts = record['kpts_3d_gt']\n    K = record['K']\n    arrow_2d = np.zeros((len(pred_kpts), 2, 2))\n    for idx in range(len(pred_kpts)):\n        vector_3d = (pred_kpts[idx][1] - pred_kpts[idx][5])\n        arrow_3d = np.concatenate([gt_kpts[idx][0].reshape(3, 1), \n                                  (gt_kpts[idx][0] + vector_3d).reshape(3, 1)],\n                                  axis=1)\n        projected = K @ arrow_3d\n        arrow_2d[idx][0] = projected[0, :] / projected[2, :]\n        arrow_2d[idx][1] = projected[1, :] / projected[2, :]\n        # fix the arrow length if not fore-shortened\n        vector_2d = arrow_2d[idx][:,1] - arrow_2d[idx][:,0]\n        length = np.linalg.norm(vector_2d)\n        if length > 50:\n            vector_2d = vector_2d/length * 60\n        arrow_2d[idx][:,1] = arrow_2d[idx][:,0] + vector_2d\n    return arrow_2d\n\ndef process_batch(images, \n                  hm_regressor, \n                  lifter, \n                  stats, \n                  template,\n                  annot_dict,\n    
              pth_trans=None, \n                  is_cuda=True, \n                  threshold=None,\n                  xy_dict=None\n                  ):\n    \"\"\"\n    Process a batch of images.\n    # annot_dict is a Python dictionary storing\n    # keys: \n    #       path: list of image paths\n    #       boxes: list of bounding boxes for each image\n    \"\"\"\n    all_instances, all_records = crop_instances(annot_dict, \n                                                resolution=(256, 256),\n                                                pth_trans=pth_trans,\n                                                xy_dict=xy_dict\n                                                )\n    # all_records stores records for each instance\n    records = get_keypoints(all_instances, all_records, hm_regressor)\n    # records stores records for each image\n    records = lift_2d_to_3d(records, lifter, stats, template)\n    # merge with the annotation dictionary\n    for idx, path in enumerate(annot_dict['path']):\n        if 'boxes' in annot_dict:\n            records[path]['boxes'] = to_npy(annot_dict['boxes'][idx])\n        if 'kpts' in annot_dict:\n            records[path]['kpts_2d_gt'] = to_npy(annot_dict['kpts'][idx])   \n        if 'kpts_3d_gt' in annot_dict:\n            records[path]['kpts_3d_gt'] = to_npy(annot_dict['kpts_3d_gt'][idx])   \n        if 'pose_vecs_gt' in annot_dict:            \n            records[path]['pose_vecs_gt'] = to_npy(annot_dict['pose_vecs_gt'][idx])  \n        if 'kpts_3d_SMOKE' in annot_dict:\n            records[path]['kpts_3d_SMOKE'] = to_npy(annot_dict['kpts_3d_SMOKE'][idx])  \n        if 'raw_txt_format' in annot_dict:\n            # list of annotation dictionary for each instance\n            records[path]['raw_txt_format'] = annot_dict['raw_txt_format'][idx]\n        if 'K' in annot_dict:\n            # list of annotation dictionary for each instance\n            records[path]['K'] = annot_dict['K'][idx]\n        if 'kpts_3d_gt' in 
annot_dict and 'K' in annot_dict:\n            records[path]['arrow'] = add_orientation_arrow(records[path])\n    return records\n\ndef to_npy(tensor):\n    \"\"\"\n    Convert PyTorch tensor to numpy array.\n    \"\"\"\n    if isinstance(tensor, np.ndarray):\n        return tensor\n    else:\n        return tensor.data.cpu().numpy()\n\ndef refine_with_perfect_size(pred, \n                             observation, \n                             intrinsics, \n                             dist_coeffs, \n                             gts, \n                             threshold=5., \n                             ax=None\n                             ):\n    \"\"\"\n    Use the gt 3D box size for refinement to show the performance gain with \n    size regression.\n    If there is a nearby ground truth bbox, use its size.\n    pred [9, 3] gts[N, 9, 3]\n    \"\"\"    \n    pred_center = pred[0, :].reshape(1,3)\n    distance = np.sqrt(np.sum((gts[:, 0, :] - pred_center)**2, axis=1))\n    minimum_idx = np.where(distance == distance.min())[0][0]\n    if distance[minimum_idx] > threshold:\n        return False, None\n    else:\n        # First align the box with gt size with the predicted box, then refine\n        tempt_box_pred = pred.copy()\n        tempt_box_pred[1:, :] += tempt_box_pred[0, :].reshape(1, 3)        \n        tempt_box_gt = gts[minimum_idx].copy()\n        tempt_box_gt[1:, :] += tempt_box_gt[0, :].reshape(1, 3)         \n        pseudo_box = ltr.procrustes_transform(tempt_box_gt.T, tempt_box_pred.T)\n        refined_prediction = ltr.pnp_refine(pseudo_box.T, observation, intrinsics, \n                                        dist_coeffs) \n        if ax is not None:\n            vp.plot_lines(ax, \n                          pseudo_box[:, 1:].T, \n                          vp.plot_3d_bbox.connections, \n                          dimension=3, \n                          c='y',\n                          linestyle='-.')         \n            vp.plot_lines(ax, \n 
                         refined_prediction[:, 1:].T, \n                          vp.plot_3d_bbox.connections, \n                          dimension=3, \n                          c='b',\n                          linestyle='-.')         \n        return True, refined_prediction\n\ndef refine_with_predicted_bbox(pred, \n                               observation, \n                               intrinsics, \n                               dist_coeffs, \n                               gts=None, \n                               threshold=5., \n                               ax=None\n                               ):\n    \"\"\"\n    Refine with predicted 3D cuboid (disabled by default).\n    \"\"\" \n    tempt_box_pred = pred.copy()\n    tempt_box_pred[1:, :] += tempt_box_pred[0, :].reshape(1, 3)\n    # use the predicted 3D bounding box size for refinement\n    refined_prediction = ltr.pnp_refine(tempt_box_pred, observation, intrinsics, \n                                    dist_coeffs)    \n    # discard the result if the refined solution is too far away from the initial position\n    distance = refined_prediction[:, 0] - tempt_box_pred[0, :]\n    distance = np.sqrt(np.sum(distance**2))\n    if distance > threshold:\n        return False, None\n    else:\n        # plotting\n        if ax is not None:\n            vp.plot_lines(ax, \n                          refined_prediction[:, 1:].T, \n                          vp.plot_3d_bbox.connections, \n                          dimension=3, \n                          c='g')        \n        return True, refined_prediction\n\ndef draw_pose_vecs(ax, pose_vecs=None, color='black'):\n    \"\"\"\n    Add pose vectors to a 3D matplotlib axes.\n    \"\"\"     \n    if pose_vecs is None:\n        return\n    for pose_vec in pose_vecs:\n        x, y, z, pitch, yaw, roll = pose_vec\n        string = \"({:.2f}, {:.2f}, {:.2f})\".format(pitch, yaw, roll)\n        # add some random noise to the text location so that they do not 
overlap\n        nl = 0.02 # noise level\n        ax.text(x*(1+np.random.randn()*nl), \n                y*(1+np.random.randn()*nl), \n                z*(1+np.random.randn()*nl), \n                string, \n                color=color\n                )\n\ndef refine_solution(est_3d, \n                    est_2d, \n                    K, \n                    dist_coeffs, \n                    refine_func, \n                    output_arr, \n                    output_flags, \n                    gts=None, \n                    ax=None\n                    ):\n    \"\"\"\n    Refine 3D prediction by minimizing re-projection error.\n    est: estimates [N, 9, 3]\n    K: intrinsics    \n    \"\"\"      \n    for idx in range(len(est_3d)):\n        success, refined_prediction = refine_func(est_3d[idx],\n                                                  est_2d[idx],\n                                                  K,\n                                                  dist_coeffs,\n                                                  gts=gts,\n                                                  ax=ax)\n        if success:\n            # update the refined solution\n            output_arr[idx] = refined_prediction.T\n            output_flags[idx] = True\n            # # convert to the center-relative shape representation\n            # p3d_pred_refined[idx][1:, :] -= p3d_pred_refined[idx][[0]]    \n    return\n\ndef gather_lifting_results(record,\n                           data,\n                           prediction, \n                           target=None,\n                           pose_vecs=None,\n                           intrinsics=None, \n                           refine=False, \n                           visualize=False,\n                           template=None,\n                           dist_coeffs=np.zeros((4,1)),\n                           color='r',\n                           get_str=False,\n                           alpha_mode='trans'\n                
           ):\n    \"\"\"\n    Lift screen coordinates to a 3D representation; an optimization-based \n    refinement is optional.\n    \"\"\"\n    if target is not None:\n        p3d_gt = target.reshape(len(target), -1, 3)\n    else:\n        p3d_gt = None\n    p3d_pred = prediction.reshape(len(prediction), -1, 3)\n    # only for visualizing the prediction of shape using gt bboxes\n    if \"kpts_3d_SMOKE\" in record:\n        p3d_pred = np.concatenate([record['kpts_3d_SMOKE'][:, [0], :], p3d_pred], axis=1)\n    elif p3d_gt is not None and p3d_gt.shape[1] == p3d_pred.shape[1] + 1:\n        if len(p3d_pred) != len(p3d_gt):\n            print('debug')\n        assert len(p3d_pred) == len(p3d_gt)\n        p3d_pred = np.concatenate([p3d_gt[:, [0], :], p3d_pred], axis=1) \n    else:\n        raise NotImplementedError\n    # this object will be updated if one prediction is refined \n    p3d_pred_refined = p3d_pred.copy()\n    refine_flags = [False for i in range(len(p3d_pred_refined))]\n    # similar object but using a different refinement strategy\n    p3d_pred_refined2 = p3d_pred.copy()\n    refine_flags2 = [False for i in range(len(p3d_pred_refined2))]\n    # input 2D keypoints\n    data = data.reshape(len(data), -1, 2)\n    if visualize:\n        if 'plots' in record and 'ax3d' in record['plots']:\n            ax = record['plots']['ax3d']\n            ax = vp.plot_scene_3dbox(p3d_pred, p3d_gt, ax=ax, color=color)\n        elif 'plots' in record:\n        # plotting the 3D scene\n            ax = vp.plot_scene_3dbox(p3d_pred, p3d_gt, color=color)\n            draw_pose_vecs(ax, pose_vecs)\n            record['plots']['ax3d'] = ax\n        else:\n            raise ValueError\n    else:\n        ax = None\n    if refine:\n        assert intrinsics is not None         \n        # refine 3D point prediction by minimizing re-projection errors        \n        refine_solution(p3d_pred, \n                        data, \n                        intrinsics, \n                  
      dist_coeffs, \n                        refine_with_predicted_bbox, \n                        p3d_pred_refined, \n                        refine_flags,\n                        ax=ax\n                        )\n        if target is not None:\n            # refine with ground truth bounding box size for debugging purpose\n            refine_solution(p3d_pred, \n                            data, \n                            intrinsics, \n                            dist_coeffs, \n                            refine_with_perfect_size, \n                            p3d_pred_refined2, \n                            refine_flags2,\n                            gts=p3d_gt,\n                            ax=ax\n                            )        \n    record['kpts_3d_refined'] = p3d_pred_refined  \n    # prepare the prediction string for submission\n    # compute the roll, pitch and yaw angle of the predicted bounding box\n    record['euler_angles'], record['translation'] = \\\n        get_6d_rep(record['kpts_3d_refined'], ax, color=color) # the predicted pose vectors are also drawn here\n    if alpha_mode == 'trans':\n        record['alphas'] = get_observation_angle_trans(record['euler_angles'], \n                                                       record['translation'])\n    elif alpha_mode == 'proj':\n        record['alphas'] = get_observation_angle_proj(record['euler_angles'],\n                                                      record['kpts_2d_pred'],\n                                                      record['K'])        \n    else:\n         raise NotImplementedError   \n    if get_str:\n        record['pred_str'] = get_pred_str(record)      \n    return record\n\ndef save_txt_file(img_path, prediction, params):\n    \"\"\"\n    Save a txt file for predictions of an image.\n    \"\"\"    \n    if not params['flag']:\n        return\n    file_name = img_path.split('/')[-1][:-3] + 'txt'\n    save_path = os.path.join(params['save_dir'], file_name) \n    with 
open(save_path, 'w') as f:\n        f.write(prediction['pred_str'])\n    return\n\ndef refine_one_image(img_path, \n                     record, \n                     add_3d_bbox=True, \n                     camera=None, \n                     template=None,\n                     visualize=False,\n                     color_dict={'bbox_2d':'r',\n                                 'bbox_3d':'r',\n                                 'kpts':['rx', 'b']\n                                 },\n                     save_dict={'flag':False,\n                                'save_dir':None\n                                },\n                     alpha_mode='trans'\n                     ):\n    \"\"\"\n    Refine the predictions from a single image.\n    \"\"\"\n    # plot 2D predictions \n    if visualize:\n        if 'plots' in record:\n            fig = record['plots']['fig2d']\n            ax = record['plots']['ax2d']\n        else:\n            fig = plt.figure(figsize=(11.3, 9))\n            ax = plt.subplot(111)\n            record['plots'] = {}\n            record['plots']['fig2d'] = fig\n            record['plots']['ax2d'] = ax\n            image = cv2.imread(img_path, cv2.IMREAD_UNCHANGED)[:, :, ::-1]\n            height, width, _ = image.shape\n            ax.imshow(image) \n            ax.set_xlim([0, width])\n            ax.set_ylim([0, height])\n            ax.invert_yaxis()\n            \n        num_instances = len(record['kpts_2d_pred'])\n        for idx in range(num_instances):\n            kpts = record['kpts_2d_pred'][idx].reshape(-1, 2)\n            # kpts_3d = record['kpts_3d'][idx]\n            bbox = record['bbox_resize'][idx]\n            label = record['label'][idx]\n            score = record['score'][idx]\n            vp.plot_2d_bbox(ax, bbox, color_dict['bbox_2d'], score, label)\n            # predicted key-points\n            ax.plot(kpts[:, 0], kpts[:, 1], color_dict['kpts'][0])\n            # if add_3d_bbox:\n            #     vp.plot_3d_bbox(ax, 
kpts[1:,], color_dict['kpts'][1])  \n            # bbox_3d_projected = project_3d_to_2d(kpts_3d)\n            # vp.plot_3d_bbox(ax, bbox_3d_projected[:2, :].T)      \n        # plot ground truth\n        if 'kpts_2d_gt' in record:\n            for idx, kpts_gt in enumerate(record['kpts_2d_gt']):\n                kpts_gt = kpts_gt.reshape(-1, 3)\n                # ax.plot(kpts_gt[:, 0], kpts_gt[:, 1], 'gx')\n                vp.plot_3d_bbox(ax, kpts_gt[1:, :2], color='g', linestyle='-.')\n        if 'arrow' in record:\n            for idx in range(len(record['arrow'])):\n                start = record['arrow'][idx][:,0]\n                end = record['arrow'][idx][:,1]\n                x, y = start\n                dx, dy = end - start\n                ax.arrow(x, y, dx, dy, color='r', lw=4, head_width=5, alpha=0.5)         \n            # save intermediate results\n            plt.gca().set_axis_off()\n            plt.subplots_adjust(top = 1, bottom = 0, right = 1, left = 0, \n            hspace = 0, wspace = 0)\n            plt.margins(0,0)\n            plt.gca().xaxis.set_major_locator(plt.NullLocator())\n            plt.gca().yaxis.set_major_locator(plt.NullLocator())\n            img_name = img_path.split('/')[-1]\n            save_dir = './debug/qualitative_results/'\n            plt.savefig(save_dir + img_name, dpi=100, bbox_inches = 'tight', pad_inches = 0)\n    # plot 3d bounding boxes\n    all_kpts_2d = np.concatenate(record['kpts_2d_pred'])\n    all_kpts_3d_pred = record['kpts_3d_pred'].reshape(len(record['kpts_3d_pred']), -1)\n    if 'kpts_3d_gt' in record:\n        all_kpts_3d_gt = record['kpts_3d_gt']\n        all_pose_vecs_gt = record['pose_vecs_gt']\n    else:\n        all_kpts_3d_gt = None\n        all_pose_vecs_gt = None\n    refine_args = {'visualize':visualize, 'get_str':save_dict['flag']}\n    if camera is not None:\n        refine_args['intrinsics'] = camera\n        refine_args['refine'] = True\n        refine_args['template'] = template\n    # 
refine and gather the prediction strings\n    record = gather_lifting_results(record,\n                                    all_kpts_2d,\n                                    all_kpts_3d_pred, \n                                    all_kpts_3d_gt,\n                                    all_pose_vecs_gt,\n                                    color=color_dict['bbox_3d'],\n                                    alpha_mode=alpha_mode,\n                                    **refine_args\n                                    )\n    # plot 3D bounding box generated by SMOKE\n    if 'kpts_3d_SMOKE' in record:\n        kpts_3d_SMOKE = record['kpts_3d_SMOKE']\n        if 'plots' in record:\n            # update drawings\n            ax = record['plots']['ax3d']\n            vp.plot_scene_3dbox(kpts_3d_SMOKE, ax=ax, color='m')    \n            pose_vecs = np.zeros((len(kpts_3d_SMOKE), 6))\n            for idx in range(len(pose_vecs)):\n                pose_vecs[idx][0:3] = record['raw_txt_format'][idx]['locations']\n                pose_vecs[idx][4] = record['raw_txt_format'][idx]['rot_y']\n            # plot pose vectors\n            draw_pose_vecs(ax, pose_vecs, color='m')\n    # save KITTI-style prediction file in .txt format\n    save_txt_file(img_path, record, save_dict)\n    return record\n\ndef post_process(records, \n                 camera=None, \n                 template=None, \n                 visualize=False, \n                 color_dict={'bbox_2d':'r',\n                             'kpts':['ro', 'b'],\n                             },\n                 save_dict={'flag':False,\n                            'save_dir':None\n                            },\n                 alpha_mode='trans'\n                 ):\n    for img_path in records.keys():\n        print(img_path)\n        records[img_path] = refine_one_image(img_path, \n                                             records[img_path], \n                                             camera=camera,\n                     
                        template=template,\n                                             visualize=visualize,\n                                             color_dict=color_dict,\n                                             save_dict=save_dict,\n                                             alpha_mode=alpha_mode\n                                             )      \n    return records\n\ndef merge(dict_a, dict_b):\n    for key in dict_b.keys():\n        dict_a[key] = dict_b[key]\n    return\n\ndef collate_dict(dict_list):\n    ret = {}\n    for key in dict_list[0]:\n        ret[key] = [d[key] for d in dict_list]\n    return ret\n\ndef my_collate_fn(batch):\n    # the collate function for 2d pose training\n    imgs, meta = list(zip(*batch))\n    meta = collate_dict(meta)\n    return imgs, meta\n\ndef filter_conf(record, thres=0.0):\n    \"\"\"\n    Filter the proposals with a confidence threshold.\n    \"\"\"\n    annots = record['raw_txt_format']\n    indices = [i for i in range(len(annots)) if annots[i]['score'] >= thres]\n    if len(indices) == 0:\n        return False, record\n    filterd_record = {\n        'bbox_2d': record['bbox_2d'][indices],\n        'kpts_3d': record['kpts_3d'][indices],\n        'raw_txt_format': [annots[i] for i in indices],\n        'scores': [annots[i]['score'] for i in indices],\n        'K':record['K']\n        }\n    return True, filterd_record\n\ndef gather_dict(request, references, filter_c=True):\n    \"\"\"\n    Gather a dict from reference as requsted.\n    \"\"\"\n    assert 'path' in request\n    ret = {'path':[], \n           'boxes':[], \n           'kpts_3d_SMOKE':[], \n           'raw_txt_format':[],\n           'scores':[],\n           'K':[]}\n    for img_path in request['path']:\n        img_name = img_path.split('/')[-1]\n        if img_name not in references:\n            print('Warning: ' + img_name + ' not included in detected images!')\n            continue\n        ref = references[img_name]\n        if 
filter_c:\n            success, ref = filter_conf(ref)\n        if filter_c and not success:\n            continue\n        ret['path'].append(img_path)\n        # ret['boxes'].append(ref['bbox_2d'])\n        # temporary hack: enlarge the bounding box from the stage 1 model\n        bbox = ref['bbox_2d']\n        for instance_id in range(len(bbox)):\n            bbox[instance_id] = np.array(modify_bbox(bbox[instance_id], target_ar=1, enlarge=1.2)['bbox'])\n        # temporary hack 2: use the gt bounding box for analysis\n        ret['boxes'].append(bbox)\n        # 3D bounding box produced by SMOKE\n        ret['kpts_3d_SMOKE'].append(ref['kpts_3d'])\n        ret['raw_txt_format'].append(ref['raw_txt_format'])\n        ret['scores'].append(ref['scores'])\n        ret['K'].append(ref['K'])\n    #ret['kpts_3d_gt'] = request['kpts_3d_gt']\n    if 'pose_vecs_gt' in request:\n        ret['pose_vecs_gt'] = request['pose_vecs_gt']\n    return ret\n    \n@torch.no_grad()\ndef inference(testset, model_settings, results, cfgs):\n    \"\"\"\n    The main inference function.\n    \"\"\"\n    # visualize to plot the 2D detection and 3D scene reconstruction\n    data_loader = get_loader(testset, cfgs, 'testing', collate_fn=my_collate_fn)          \n    hm_regressor = model_settings['heatmap_regression']\n    lifter = model_settings['lifting']\n    # statistics for the FC model\n    stats = model_settings['FC_stats']\n    #template = testset.instance_stats['ref_box3d']\n    template = None\n    pth_trans = testset.pth_trans\n    if 'add_xy' in cfgs['heatmapModel']:\n        xy_dict = {'flag':cfgs['heatmapModel']['add_xy']}\n    else:\n        xy_dict = None\n    all_records = {}\n    camera = None\n    flags = results['flags']\n    visualize = cfgs['visualize']\n    batch_to_show = cfgs['batch_to_show']\n    for batch_idx, (images, meta) in enumerate(data_loader):\n        if flags['gt']:\n            save_dir = os.path.join(cfgs['dirs']['output'], 'gt_box_test', 'data')\n        
    if not os.path.exists(save_dir):\n                os.makedirs(save_dir)            \n            # ground truth bounding box for comparison\n            record = process_batch(images,\n                                   hm_regressor, \n                                   lifter, \n                                   stats, \n                                   template,\n                                   annot_dict=meta,\n                                   pth_trans=pth_trans, \n                                   threshold=None,\n                                   xy_dict=xy_dict\n                                   )\n            record = post_process(record, \n                                  camera, \n                                  template, \n                                  visualize=visualize,\n                                  color_dict={'bbox_2d':'y',\n                                              'bbox_3d':'y',\n                                              'kpts':['yx', 'y'],\n                                              },\n                                  save_dict={\n                                      'flag':True,\n                                      'save_dir':save_dir\n                                      }\n                                  )\n            merge(all_records, record)\n        if flags['pred']:\n            # use detected bounding box from an anchor-free model\n            annot_dict = gather_dict(meta, results['pred'])\n            if len(annot_dict['path']) == 0:\n                continue\n            record2 = process_batch(images,\n                                    hm_regressor, \n                                    lifter, \n                                    stats, \n                                    template,\n                                    annot_dict,\n                                    pth_trans=pth_trans, \n                                    threshold=None,\n                                    
xy_dict=xy_dict\n                                    )\n            for key in record2:\n                if 'record' in locals() and 'plots' in record[key]:\n                    record2[key]['plots'] = record[key]['plots']\n            save_dir = os.path.join(cfgs['dirs']['output'], 'submission', 'data')\n            if not os.path.exists(save_dir):\n                os.makedirs(save_dir)\n            record2 = post_process(record2,\n                                   camera, \n                                   template, \n                                   visualize=visualize,\n                                   color_dict={'bbox_2d':'r',\n                                               'bbox_3d':'r',\n                                               'kpts':['rx', 'r'],\n                                               },\n                                   save_dict={'flag':True,\n                                              'save_dir':save_dir\n                                              },\n                                  alpha_mode=cfgs['testing_settings']['alpha_mode']\n                                  )   \n            del images, record2, meta\n        if batch_idx >= batch_to_show - 1:\n            break\n    # produce a csv file\n    # csv_output_path = cfgs['dirs']['csv_output']\n    # save_csv(all_records, csv_output_path)        \n    return\n\ndef generate_empty_file(output_dir, label_dir):\n    \"\"\"\n    Generate empty files for images without any predictions.\n    \"\"\"    \n    all_files = os.listdir(label_dir)\n    detected = os.listdir(os.path.join(output_dir, 'data'))\n    for file_name in all_files:\n        if file_name[:-4] + \".txt\" not in detected:\n            file = open(os.path.join(output_dir, 'data', file_name[:-4] + '.txt'), 'w')\n            file.close()\n    return\n\ndef main():\n    # experiment configurations\n    cfgs = parse.parse_args()\n    \n    # logging\n    logger, final_output_dir = liblogger.get_logger(cfgs)   \n   
 shutil.copyfile(cfgs['config_path'], os.path.join(final_output_dir, 'saved_config.yml'))\n    # Set GPU\n    if cfgs['use_gpu'] and torch.cuda.is_available():\n        GPUs = cfgs['gpu_id']\n    else:\n        logger.info(\"GPU acceleration is disabled.\")\n        \n    # cudnn related setting\n    torch.backends.cudnn.benchmark = cfgs['cudnn']['benchmark']\n    torch.backends.cudnn.deterministic = cfgs['cudnn']['deterministic']\n    torch.backends.cudnn.enabled = cfgs['cudnn']['enabled']\n\n    data_cfgs = cfgs['dataset']\n    \n    # which split to show\n    split = 'valid'\n    dataset_inf = eval('dataset.' + data_cfgs['name']  \n                        + '.car_instance').get_dataset(cfgs, logger, split)\n    # set to inference mode but does not read image\n    dataset_inf.inference([True, False])\n    \n    # some temporary testing\n    # test_angle_conversion(dataset_inf, dataset_inf.instance_stats['ref_box3d'])\n    \n    # read annotations\n    input_file_path = cfgs['dirs']['load_prediction_file']\n    # the record for 2D and 3D predictions\n    # key->value: name of the approach->dictionary storing the predictions\n    results = {}\n    confidence_thres = cfgs['conf_thres']\n    \n    # flags: use predicted bounding boxes as well as the ground truth boxes\n    # for comparison\n    results['flags'] = {}\n    results['flags']['pred'] = cfgs['use_pred_box']\n    if results['flags']['pred']:\n        results['pred'] = dataset_inf.read_predictions(input_file_path)\n    results['flags']['gt'] = cfgs['use_gt_box']\n    \n    # load checkpoints\n    model_dict = prepare_models(cfgs)\n    \n    # inference and update prediction\n    inference(dataset_inf, model_dict, results, cfgs)       \n    \n    # then you can run kitti-eval for evaluation\n    evaluator = cfgs['dirs']['kitti_evaluator']\n    label_dir = cfgs['dirs']['kitti_label']\n    output_dir = os.path.join(cfgs['dirs']['output'], 'submission')\n    \n    # if no detections are produced, generate an 
empty file\n    #generate_empty_file(output_dir, label_dir)\n    command = \"{} {} {}\".format(evaluator, label_dir, output_dir)\n    # e.g.\n    # ~/Documents/Github/SMOKE/smoke/data/datasets/evaluation/kitti/kitti_eval/evaluate_object_3d_offline /home/nicholas/Documents/Github/SMOKE/datasets/kitti/training/label_2 /media/nicholas/Database/experiments/3DLearning/0826\n    # /media/nicholas/Database/Github/M3D-RPN/data/kitti_split1/devkit/cpp/evaluate_object /home/nicholas/Documents/Github/SMOKE/datasets/kitti/training/label_2 /media/nicholas/Database/Github/M3D-RPN/output/tmp_results\n    return\n\nif __name__ == \"__main__\":\n    main()\n"
  },
  {
    "path": "tools/kitti-eval/README.md",
    "content": "# kitti_eval\n\n`evaluate_object_3d_offline.cpp`evaluates your KITTI detection locally on your own computer using your validation data selected from KITTI training dataset, with the following metrics:\n\n- overlap on image (AP)\n- oriented overlap on image (AOS)\n- overlap on ground-plane (AP)\n- overlap in 3D (AP)\n\nCompile `evaluate_object_3d_offline.cpp` (or `evaluate_object_3d_offline_r40.cpp` for the updated metric) with dependency of Boost and Linux `dirent.h` (You should already have it under most Linux).\n\nRun the evalutaion by:\n\n    ./evaluate_object_3d_offline groundtruth_dir result_dir\n    \nNote that you don't have to detect over all KITTI training data. The evaluator only evaluates samples whose result files exist.\n\n\n### Updates\n\n- June, 2017:\n  * Fixed the bug of detection box filtering based on min height according to KITTI's note on 25.04.2017.\n"
  },
  {
    "path": "tools/kitti-eval/evaluate_object_3d.cpp",
    "content": "#include <iostream>\n#include <algorithm>\n#include <stdio.h>\n#include <math.h>\n#include <vector>\n#include <numeric>\n#include <strings.h>\n#include <assert.h>\n\n#include <dirent.h>\n\n#include <boost/numeric/ublas/matrix.hpp>\n#include <boost/numeric/ublas/io.hpp>\n\n#include <boost/geometry.hpp>\n#include <boost/geometry/geometries/point_xy.hpp>\n#include <boost/geometry/geometries/polygon.hpp>\n#include <boost/geometry/geometries/adapted/c_array.hpp>\n\n#include \"mail.h\"\n\nBOOST_GEOMETRY_REGISTER_C_ARRAY_CS(cs::cartesian)\n\ntypedef boost::geometry::model::polygon<boost::geometry::model::d2::point_xy<double> > Polygon;\n\n\nusing namespace std;\n\n/*=======================================================================\nSTATIC EVALUATION PARAMETERS\n=======================================================================*/\n\n// holds the number of test images on the server\nconst int32_t N_TESTIMAGES = 7518;\n\n// easy, moderate and hard evaluation level\nenum DIFFICULTY{EASY=0, MODERATE=1, HARD=2};\n\n// evaluation metrics: image, ground or 3D\nenum METRIC{IMAGE=0, GROUND=1, BOX3D=2};\n\n// evaluation parameter\nconst int32_t MIN_HEIGHT[3]     = {40, 25, 25};     // minimum height for evaluated groundtruth/detections\nconst int32_t MAX_OCCLUSION[3]  = {0, 1, 2};        // maximum occlusion level of the groundtruth used for evaluation\nconst double  MAX_TRUNCATION[3] = {0.15, 0.3, 0.5}; // maximum truncation level of the groundtruth used for evaluation\n\n// evaluated object classes\nenum CLASSES{CAR=0, PEDESTRIAN=1, CYCLIST=2};\nconst int NUM_CLASS = 3;\n\n// parameters varying per class\nvector<string> CLASS_NAMES;\n// the minimum overlap required for 2D evaluation on the image/ground plane and 3D evaluation\nconst double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.5, 0.25, 0.25}, {0.5, 0.25, 0.25}};\n\n// no. 
of recall steps that should be evaluated (discretized)\nconst double N_SAMPLE_PTS = 41;\n\n\n// initialize class names\nvoid initGlobals () {\n  CLASS_NAMES.push_back(\"car\");\n  CLASS_NAMES.push_back(\"pedestrian\");\n  CLASS_NAMES.push_back(\"cyclist\");\n}\n\n/*=======================================================================\nDATA TYPES FOR EVALUATION\n=======================================================================*/\n\n// holding data needed for precision-recall and precision-aos\nstruct tPrData {\n  vector<double> v;           // detection score for computing score thresholds\n  double         similarity;  // orientation similarity\n  int32_t        tp;          // true positives\n  int32_t        fp;          // false positives\n  int32_t        fn;          // false negatives\n  tPrData () :\n    similarity(0), tp(0), fp(0), fn(0) {}\n};\n\n// holding bounding boxes for ground truth and detections\nstruct tBox {\n  string  type;     // object type as car, pedestrian or cyclist,...\n  double   x1;      // left corner\n  double   y1;      // top corner\n  double   x2;      // right corner\n  double   y2;      // bottom corner\n  double   alpha;   // image orientation\n  tBox (string type, double x1,double y1,double x2,double y2,double alpha) :\n    type(type),x1(x1),y1(y1),x2(x2),y2(y2),alpha(alpha) {}\n};\n\n// holding ground truth data\nstruct tGroundtruth {\n  tBox    box;        // object type, box, orientation\n  double  truncation; // truncation 0..1\n  int32_t occlusion;  // occlusion 0,1,2 (non, partly, fully)\n  double ry;\n  double  t1, t2, t3;\n  double h, w, l;\n  tGroundtruth () :\n    box(tBox(\"invalild\",-1,-1,-1,-1,-10)),truncation(-1),occlusion(-1) {}\n  tGroundtruth (tBox box,double truncation,int32_t occlusion) :\n    box(box),truncation(truncation),occlusion(occlusion) {}\n  tGroundtruth (string type,double x1,double y1,double x2,double y2,double alpha,double truncation,int32_t occlusion) :\n    
box(tBox(type,x1,y1,x2,y2,alpha)),truncation(truncation),occlusion(occlusion) {}\n};\n\n// holding detection data\nstruct tDetection {\n  tBox    box;    // object type, box, orientation\n  double  thresh; // detection score\n  double  ry;\n  double  t1, t2, t3;\n  double  h, w, l;\n  tDetection ():\n    box(tBox(\"invalid\",-1,-1,-1,-1,-10)),thresh(-1000) {}\n  tDetection (tBox box,double thresh) :\n    box(box),thresh(thresh) {}\n  tDetection (string type,double x1,double y1,double x2,double y2,double alpha,double thresh) :\n    box(tBox(type,x1,y1,x2,y2,alpha)),thresh(thresh) {}\n};\n\n\n/*=======================================================================\nFUNCTIONS TO LOAD DETECTION AND GROUND TRUTH DATA ONCE, SAVE RESULTS\n=======================================================================*/\nvector<int32_t> indices;\n\nvector<tDetection> loadDetections(string file_name, bool &compute_aos,\n        vector<bool> &eval_image, vector<bool> &eval_ground,\n        vector<bool> &eval_3d, bool &success) {\n\n  // holds all detections (ignored detections are indicated by an index vector\n  vector<tDetection> detections;\n  FILE *fp = fopen(file_name.c_str(),\"r\");\n  if (!fp) {\n    success = false;\n    return detections;\n  }\n  while (!feof(fp)) {\n    tDetection d;\n    double trash;\n    char str[255];\n    if (fscanf(fp, \"%s %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf\",\n                   str, &trash, &trash, &d.box.alpha, &d.box.x1, &d.box.y1,\n                   &d.box.x2, &d.box.y2, &d.h, &d.w, &d.l, &d.t1, &d.t2, &d.t3,\n                   &d.ry, &d.thresh)==16) {\n\n        // d.thresh = 1;\n      d.box.type = str;\n      detections.push_back(d);\n\n      // orientation=-10 is invalid, AOS is not evaluated if at least one orientation is invalid\n      if(d.box.alpha == -10)\n        compute_aos = false;\n\n      // a class is only evaluated if it is detected at least once\n      for (int c = 0; c < NUM_CLASS; c++) {\n        if 
(!strcasecmp(d.box.type.c_str(), CLASS_NAMES[c].c_str())) {\n          if (!eval_image[c] && d.box.x1 >= 0)\n            eval_image[c] = true;\n          if (!eval_ground[c] && d.t1 != -1000)\n            eval_ground[c] = true;\n          if (!eval_3d[c] && d.t2 != -1000)\n            eval_3d[c] = true;\n          break;\n        }\n      }\n    }\n  }\n  fclose(fp);\n  success = true;\n  return detections;\n}\n\nvector<tGroundtruth> loadGroundtruth(string file_name,bool &success) {\n\n  // holds all ground truth (ignored ground truth is indicated by an index vector\n  vector<tGroundtruth> groundtruth;\n  FILE *fp = fopen(file_name.c_str(),\"r\");\n  if (!fp) {\n    success = false;\n    return groundtruth;\n  }\n  while (!feof(fp)) {\n    tGroundtruth g;\n    char str[255];\n    if (fscanf(fp, \"%s %lf %d %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf\",\n                   str, &g.truncation, &g.occlusion, &g.box.alpha,\n                   &g.box.x1,   &g.box.y1,     &g.box.x2,    &g.box.y2,\n                   &g.h,      &g.w,        &g.l,       &g.t1,\n                   &g.t2,      &g.t3,        &g.ry )==15) {\n      g.box.type = str;\n      groundtruth.push_back(g);\n    }\n  }\n  fclose(fp);\n  success = true;\n  return groundtruth;\n}\n\nvoid saveStats (const vector<double> &precision, const vector<double> &aos, FILE *fp_det, FILE *fp_ori) {\n\n  // save precision to file\n  if(precision.empty())\n    return;\n  for (int32_t i=0; i<precision.size(); i++)\n    fprintf(fp_det,\"%f \",precision[i]);\n  fprintf(fp_det,\"\\n\");\n\n  // save orientation similarity, only if there were no invalid orientation entries in submission (alpha=-10)\n  if(aos.empty())\n    return;\n  for (int32_t i=0; i<aos.size(); i++)\n    fprintf(fp_ori,\"%f \",aos[i]);\n  fprintf(fp_ori,\"\\n\");\n}\n\n/*=======================================================================\nEVALUATION HELPER FUNCTIONS\n=======================================================================*/\n\n// 
criterion defines whether the overlap is computed with respect to both areas (ground truth and detection)\n// or with respect to box a or b (detection and \"dontcare\" areas)\ninline double imageBoxOverlap(tBox a, tBox b, int32_t criterion=-1){\n\n  // overlap is invalid in the beginning\n  double o = -1;\n\n  // get overlapping area\n  double x1 = max(a.x1, b.x1);\n  double y1 = max(a.y1, b.y1);\n  double x2 = min(a.x2, b.x2);\n  double y2 = min(a.y2, b.y2);\n\n  // compute width and height of overlapping area\n  double w = x2-x1;\n  double h = y2-y1;\n\n  // set invalid entries to 0 overlap\n  if(w<=0 || h<=0)\n    return 0;\n\n  // get overlapping areas\n  double inter = w*h;\n  double a_area = (a.x2-a.x1) * (a.y2-a.y1);\n  double b_area = (b.x2-b.x1) * (b.y2-b.y1);\n\n  // intersection over union overlap depending on users choice\n  if(criterion==-1)     // union\n    o = inter / (a_area+b_area-inter);\n  else if(criterion==0) // bbox_a\n    o = inter / a_area;\n  else if(criterion==1) // bbox_b\n    o = inter / b_area;\n\n  // overlap\n  return o;\n}\n\ninline double imageBoxOverlap(tDetection a, tGroundtruth b, int32_t criterion=-1){\n  return imageBoxOverlap(a.box, b.box, criterion);\n}\n\n// compute polygon of an oriented bounding box\ntemplate <typename T>\nPolygon toPolygon(const T& g) {\n    using namespace boost::numeric::ublas;\n    using namespace boost::geometry;\n    matrix<double> mref(2, 2);\n    mref(0, 0) = cos(g.ry); mref(0, 1) = sin(g.ry);\n    mref(1, 0) = -sin(g.ry); mref(1, 1) = cos(g.ry);\n\n    static int count = 0;\n    matrix<double> corners(2, 4);\n    double data[] = {g.l / 2, g.l / 2, -g.l / 2, -g.l / 2,\n                     g.w / 2, -g.w / 2, -g.w / 2, g.w / 2};\n    std::copy(data, data + 8, corners.data().begin());\n    matrix<double> gc = prod(mref, corners);\n    for (int i = 0; i < 4; ++i) {\n        gc(0, i) += g.t1;\n        gc(1, i) += g.t3;\n    }\n\n    double points[][2] = {{gc(0, 0), gc(1, 0)},{gc(0, 1), gc(1, 
1)},{gc(0, 2), gc(1, 2)},{gc(0, 3), gc(1, 3)},{gc(0, 0), gc(1, 0)}};\n    Polygon poly;\n    append(poly, points);\n    return poly;\n}\n\n// measure overlap between bird's eye view bounding boxes, parametrized by (ry, l, w, tx, tz)\ninline double groundBoxOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {\n    using namespace boost::geometry;\n    Polygon gp = toPolygon(g);\n    Polygon dp = toPolygon(d);\n\n    std::vector<Polygon> in, un;\n    intersection(gp, dp, in);\n    union_(gp, dp, un);\n\n    double inter_area = in.empty() ? 0 : area(in.front());\n    double union_area = area(un.front());\n    double o;\n    if(criterion==-1)     // union\n        o = inter_area / union_area;\n    else if(criterion==0) // bbox_a\n        o = inter_area / area(dp);\n    else if(criterion==1) // bbox_b\n        o = inter_area / area(gp);\n\n    return o;\n}\n\n// measure overlap between 3D bounding boxes, parametrized by (ry, h, w, l, tx, ty, tz)\ninline double box3DOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {\n    using namespace boost::geometry;\n    Polygon gp = toPolygon(g);\n    Polygon dp = toPolygon(d);\n\n    std::vector<Polygon> in, un;\n    intersection(gp, dp, in);\n    union_(gp, dp, un);\n\n    double ymax = min(d.t2, g.t2);\n    double ymin = max(d.t2 - d.h, g.t2 - g.h);\n\n    double inter_area = in.empty() ? 
0 : area(in.front());\n    double inter_vol = inter_area * max(0.0, ymax - ymin);\n\n    double det_vol = d.h * d.l * d.w;\n    double gt_vol = g.h * g.l * g.w;\n\n    double o;\n    if(criterion==-1)     // union\n        o = inter_vol / (det_vol + gt_vol - inter_vol);\n    else if(criterion==0) // bbox_a\n        o = inter_vol / det_vol;\n    else if(criterion==1) // bbox_b\n        o = inter_vol / gt_vol;\n\n    return o;\n}\n\nvector<double> getThresholds(vector<double> &v, double n_groundtruth){\n\n  // holds scores needed to compute N_SAMPLE_PTS recall values\n  vector<double> t;\n\n  // sort scores in descending order\n  // (highest score is assumed to give best/most confident detections)\n  sort(v.begin(), v.end(), greater<double>());\n\n  // get scores for linearly spaced recall\n  double current_recall = 0;\n  for(int32_t i=0; i<v.size(); i++){\n\n    // check if the right-hand-side recall is closer to the current recall than the left-hand-side one;\n    // in this case, skip the current detection score\n    double l_recall, r_recall, recall;\n    l_recall = (double)(i+1)/n_groundtruth;\n    if(i<(v.size()-1))\n      r_recall = (double)(i+2)/n_groundtruth;\n    else\n      r_recall = l_recall;\n\n    if( (r_recall-current_recall) < (current_recall-l_recall) && i<(v.size()-1))\n      continue;\n\n    // left recall is the best approximation, so use it and go to the next recall step\n    recall = l_recall;\n\n    // the next recall step was reached\n    t.push_back(v[i]);\n    current_recall += 1.0/(N_SAMPLE_PTS-1.0);\n  }\n  return t;\n}\n\nvoid cleanData(CLASSES current_class, const vector<tGroundtruth> &gt, const vector<tDetection> &det, vector<int32_t> &ignored_gt, vector<tGroundtruth> &dc, vector<int32_t> &ignored_det, int32_t &n_gt, DIFFICULTY difficulty){\n\n  // extract ground truth bounding boxes for current evaluation class\n  for(int32_t i=0;i<gt.size(); i++){\n\n    // only bounding boxes with a minimum height are used for 
evaluation\n    double height = gt[i].box.y2 - gt[i].box.y1;\n\n    // neighboring classes are ignored (\"van\" for \"car\" and \"person_sitting\" for \"pedestrian\")\n    // (lower/upper cases are ignored)\n    int32_t valid_class;\n\n    // all classes without a neighboring class\n    if(!strcasecmp(gt[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))\n      valid_class = 1;\n\n    // classes with a neighboring class\n    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), \"Pedestrian\") && !strcasecmp(\"Person_sitting\", gt[i].box.type.c_str()))\n      valid_class = 0;\n    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), \"Car\") && !strcasecmp(\"Van\", gt[i].box.type.c_str()))\n      valid_class = 0;\n\n    // classes not used for evaluation\n    else\n      valid_class = -1;\n\n    // ground truth is ignored, if occlusion, truncation exceeds the difficulty or ground truth is too small\n    // (doesn't count as FN nor TP, although detections may be assigned)\n    bool ignore = false;\n    if(gt[i].occlusion>MAX_OCCLUSION[difficulty] || gt[i].truncation>MAX_TRUNCATION[difficulty] || height<MIN_HEIGHT[difficulty])\n      ignore = true;\n\n    // set ignored vector for ground truth\n    // current class and not ignored (total no. 
of ground truth is detected for recall denominator)\n    if(valid_class==1 && !ignore){\n      ignored_gt.push_back(0);\n      n_gt++;\n    }\n\n    // neighboring class, or current class but ignored\n    else if(valid_class==0 || (ignore && valid_class==1))\n      ignored_gt.push_back(1);\n\n    // all other classes which are FN in the evaluation\n    else\n      ignored_gt.push_back(-1);\n  }\n\n  // extract dontcare areas\n  for(int32_t i=0;i<gt.size(); i++)\n    if(!strcasecmp(\"DontCare\", gt[i].box.type.c_str()))\n      dc.push_back(gt[i]);\n\n  // extract detections bounding boxes of the current class\n  for(int32_t i=0;i<det.size(); i++){\n\n    // neighboring classes are not evaluated\n    int32_t valid_class;\n    if(!strcasecmp(det[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))\n      valid_class = 1;\n    else\n      valid_class = -1;\n\n    int32_t height = fabs(det[i].box.y1 - det[i].box.y2);\n    // set ignored vector for detections\n    if(height<MIN_HEIGHT[difficulty])\n      ignored_det.push_back(1);\n    else if(valid_class==1)\n      ignored_det.push_back(0);\n    else\n      ignored_det.push_back(-1);\n  }\n}\n\ntPrData computeStatistics(CLASSES current_class, const vector<tGroundtruth> &gt,\n        const vector<tDetection> &det, const vector<tGroundtruth> &dc,\n        const vector<int32_t> &ignored_gt, const vector<int32_t>  &ignored_det,\n        bool compute_fp, double (*boxoverlap)(tDetection, tGroundtruth, int32_t),\n        METRIC metric, bool compute_aos=false, double thresh=0, bool debug=false){\n\n  tPrData stat = tPrData();\n  const double NO_DETECTION = -10000000;\n  vector<double> delta;            // holds angular difference for TPs (needed for AOS evaluation)\n  vector<bool> assigned_detection; // holds whether a detection was assigned to a valid or ignored ground truth\n  assigned_detection.assign(det.size(), false);\n  vector<bool> ignored_threshold;\n  ignored_threshold.assign(det.size(), false); // holds detections 
with a threshold lower than thresh if FP are computed\n\n  // detections with a low score are ignored for computing precision (needs FP)\n  if(compute_fp)\n    for(int32_t i=0; i<det.size(); i++)\n      if(det[i].thresh<thresh)\n        ignored_threshold[i] = true;\n\n  // evaluate all ground truth boxes\n  for(int32_t i=0; i<gt.size(); i++){\n\n    // this ground truth is not of the current or a neighboring class and therefore ignored\n    if(ignored_gt[i]==-1)\n      continue;\n\n    /*=======================================================================\n    find candidates (overlap with ground truth > 0.5) (logical len(det))\n    =======================================================================*/\n    int32_t det_idx          = -1;\n    double valid_detection = NO_DETECTION;\n    double max_overlap     = 0;\n\n    // search for a possible detection\n    bool assigned_ignored_det = false;\n    for(int32_t j=0; j<det.size(); j++){\n\n      // detections not of the current class, already assigned or with a low threshold are ignored\n      if(ignored_det[j]==-1)\n        continue;\n      if(assigned_detection[j])\n        continue;\n      if(ignored_threshold[j])\n        continue;\n\n      // find the maximum score for the candidates and get idx of respective detection\n      double overlap = boxoverlap(det[j], gt[i], -1);\n\n      // for computing recall thresholds, the candidate with highest score is considered\n      if(!compute_fp && overlap>MIN_OVERLAP[metric][current_class] && det[j].thresh>valid_detection){\n        det_idx         = j;\n        valid_detection = det[j].thresh;\n      }\n\n      // for computing pr curve values, the candidate with the greatest overlap is considered\n      // if the greatest overlap is an ignored detection (min_height), the overlapping detection is used\n      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && (overlap>max_overlap || assigned_ignored_det) && ignored_det[j]==0){\n        max_overlap   
  = overlap;\n        det_idx         = j;\n        valid_detection = 1;\n        assigned_ignored_det = false;\n      }\n      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && valid_detection==NO_DETECTION && ignored_det[j]==1){\n        det_idx              = j;\n        valid_detection      = 1;\n        assigned_ignored_det = true;\n      }\n    }\n\n    /*=======================================================================\n    compute TP, FP and FN\n    =======================================================================*/\n\n    // nothing was assigned to this valid ground truth\n    if(valid_detection==NO_DETECTION && ignored_gt[i]==0) {\n      stat.fn++;\n    }\n\n    // only evaluate valid ground truth <=> detection assignments (considering difficulty level)\n    else if(valid_detection!=NO_DETECTION && (ignored_gt[i]==1 || ignored_det[det_idx]==1))\n      assigned_detection[det_idx] = true;\n\n    // found a valid true positive\n    else if(valid_detection!=NO_DETECTION){\n\n      // write highest score to threshold vector\n      stat.tp++;\n      stat.v.push_back(det[det_idx].thresh);\n\n      // compute angular difference of detection and ground truth if valid detection orientation was provided\n      if(compute_aos)\n        delta.push_back(gt[i].box.alpha - det[det_idx].box.alpha);\n\n      // clean up\n      assigned_detection[det_idx] = true;\n    }\n  }\n\n  // if FP are requested, consider stuff area\n  if(compute_fp){\n\n    // count fp\n    for(int32_t i=0; i<det.size(); i++){\n\n      // count false positives if required (height smaller than required is ignored (ignored_det==1)\n      if(!(assigned_detection[i] || ignored_det[i]==-1 || ignored_det[i]==1 || ignored_threshold[i]))\n        stat.fp++;\n    }\n\n    // do not consider detections overlapping with stuff area\n    int32_t nstuff = 0;\n    for(int32_t i=0; i<dc.size(); i++){\n      for(int32_t j=0; j<det.size(); j++){\n\n        // detections not of the 
current class, already assigned, with a low threshold or a low minimum height are ignored\n        if(assigned_detection[j])\n          continue;\n        if(ignored_det[j]==-1 || ignored_det[j]==1)\n          continue;\n        if(ignored_threshold[j])\n          continue;\n\n        // compute overlap and assign to stuff area, if overlap exceeds class specific value\n        double overlap = boxoverlap(det[j], dc[i], 0);\n        if(overlap>MIN_OVERLAP[metric][current_class]){\n          assigned_detection[j] = true;\n          nstuff++;\n        }\n      }\n    }\n\n    // FP = no. of all not to ground truth assigned detections - detections assigned to stuff areas\n    stat.fp -= nstuff;\n\n    // if all orientation values are valid, the AOS is computed\n    if(compute_aos){\n      vector<double> tmp;\n\n      // FP have a similarity of 0, for all TP compute AOS\n      tmp.assign(stat.fp, 0);\n      for(int32_t i=0; i<delta.size(); i++)\n        tmp.push_back((1.0+cos(delta[i]))/2.0);\n\n      // be sure, that all orientation deltas are computed\n      assert(tmp.size()==stat.fp+stat.tp);\n      assert(delta.size()==stat.tp);\n\n      // get the mean orientation similarity for this image\n      if(stat.tp>0 || stat.fp>0)\n        stat.similarity = accumulate(tmp.begin(), tmp.end(), 0.0);\n\n      // there was neither a FP nor a TP, so the similarity is ignored in the evaluation\n      else\n        stat.similarity = -1;\n    }\n  }\n  return stat;\n}\n\n/*=======================================================================\nEVALUATE CLASS-WISE\n=======================================================================*/\n\nbool eval_class (FILE *fp_det, FILE *fp_ori, CLASSES current_class,\n        const vector< vector<tGroundtruth> > &groundtruth,\n        const vector< vector<tDetection> > &detections, bool compute_aos,\n        double (*boxoverlap)(tDetection, tGroundtruth, int32_t),\n        vector<double> &precision, vector<double> &aos,\n        DIFFICULTY 
difficulty, METRIC metric) {\n    assert(groundtruth.size() == detections.size());\n\n  // init\n  int32_t n_gt=0;                                     // total no. of gt (denominator of recall)\n  vector<double> v, thresholds;                       // detection scores, evaluated for recall discretization\n  vector< vector<int32_t> > ignored_gt, ignored_det;  // index of ignored gt detection for current class/difficulty\n  vector< vector<tGroundtruth> > dontcare;            // index of dontcare areas, included in ground truth\n\n  // for all test images do\n  for (int32_t i=0; i<groundtruth.size(); i++){\n\n    // holds ignored ground truth, ignored detections and dontcare areas for current frame\n    vector<int32_t> i_gt, i_det;\n    vector<tGroundtruth> dc;\n\n    // only evaluate objects of current class and ignore occluded, truncated objects\n    cleanData(current_class, groundtruth[i], detections[i], i_gt, dc, i_det, n_gt, difficulty);\n    ignored_gt.push_back(i_gt);\n    ignored_det.push_back(i_det);\n    dontcare.push_back(dc);\n\n    // compute statistics to get recall values\n    tPrData pr_tmp = tPrData();\n    pr_tmp = computeStatistics(current_class, groundtruth[i], detections[i], dc, i_gt, i_det, false, boxoverlap, metric);\n\n    // add detection scores to vector over all images\n    for(int32_t j=0; j<pr_tmp.v.size(); j++)\n      v.push_back(pr_tmp.v[j]);\n  }\n\n  // get scores that must be evaluated for recall discretization\n  thresholds = getThresholds(v, n_gt);\n\n  // compute TP,FP,FN for relevant scores\n  vector<tPrData> pr;\n  pr.assign(thresholds.size(),tPrData());\n  for (int32_t i=0; i<groundtruth.size(); i++){\n\n    // for all scores/recall thresholds do:\n    for(int32_t t=0; t<thresholds.size(); t++){\n      tPrData tmp = tPrData();\n      tmp = computeStatistics(current_class, groundtruth[i], detections[i], dontcare[i],\n                              ignored_gt[i], ignored_det[i], true, boxoverlap, metric,\n                           
   compute_aos, thresholds[t], t==38);\n\n      // add no. of TP, FP, FN, AOS for current frame to total evaluation for current threshold\n      pr[t].tp += tmp.tp;\n      pr[t].fp += tmp.fp;\n      pr[t].fn += tmp.fn;\n      if(tmp.similarity!=-1)\n        pr[t].similarity += tmp.similarity;\n    }\n  }\n\n  // compute recall, precision and AOS\n  vector<double> recall;\n  precision.assign(N_SAMPLE_PTS, 0);\n  if(compute_aos)\n    aos.assign(N_SAMPLE_PTS, 0);\n  double r=0;\n  for (int32_t i=0; i<thresholds.size(); i++){\n    r = pr[i].tp/(double)(pr[i].tp + pr[i].fn);\n    recall.push_back(r);\n    precision[i] = pr[i].tp/(double)(pr[i].tp + pr[i].fp);\n    if(compute_aos)\n      aos[i] = pr[i].similarity/(double)(pr[i].tp + pr[i].fp);\n  }\n\n  // filter precision and AOS using max_{i..end}(precision)\n  for (int32_t i=0; i<thresholds.size(); i++){\n    precision[i] = *max_element(precision.begin()+i, precision.end());\n    if(compute_aos)\n      aos[i] = *max_element(aos.begin()+i, aos.end());\n  }\n\n  // save statistics and finish with success\n  saveStats(precision, aos, fp_det, fp_ori);\n  return true;\n}\n\nvoid saveAndPlotPlots(string dir_name,string file_name,string obj_type,vector<double> vals[],bool is_aos){\n\n  char command[1024];\n\n  // save plot data to file\n  FILE *fp = fopen((dir_name + \"/\" + file_name + \".txt\").c_str(),\"w\");\n  printf(\"save %s\\n\", (dir_name + \"/\" + file_name + \".txt\").c_str());\n  for (int32_t i=0; i<(int)N_SAMPLE_PTS; i++)\n    fprintf(fp,\"%f %f %f %f\\n\",(double)i/(N_SAMPLE_PTS-1.0),vals[0][i],vals[1][i],vals[2][i]);\n  fclose(fp);\n\n  // create png + eps\n  for (int32_t j=0; j<2; j++) {\n\n    // open file\n    FILE *fp = fopen((dir_name + \"/\" + file_name + \".gp\").c_str(),\"w\");\n\n    // save gnuplot instructions\n    if (j==0) {\n      fprintf(fp,\"set term png size 450,315 font \\\"Helvetica\\\" 11\\n\");\n      fprintf(fp,\"set output \\\"%s.png\\\"\\n\",file_name.c_str());\n    } else {\n      
fprintf(fp,\"set term postscript eps enhanced color font \\\"Helvetica\\\" 20\\n\");\n      fprintf(fp,\"set output \\\"%s.eps\\\"\\n\",file_name.c_str());\n    }\n\n    // set labels and ranges\n    fprintf(fp,\"set size ratio 0.7\\n\");\n    fprintf(fp,\"set xrange [0:1]\\n\");\n    fprintf(fp,\"set yrange [0:1]\\n\");\n    fprintf(fp,\"set xlabel \\\"Recall\\\"\\n\");\n    if (!is_aos) fprintf(fp,\"set ylabel \\\"Precision\\\"\\n\");\n    else         fprintf(fp,\"set ylabel \\\"Orientation Similarity\\\"\\n\");\n    obj_type[0] = toupper(obj_type[0]);\n    fprintf(fp,\"set title \\\"%s\\\"\\n\",obj_type.c_str());\n\n    // line width\n    int32_t   lw = 5;\n    if (j==0) lw = 3;\n\n    // plot error curve\n    fprintf(fp,\"plot \");\n    fprintf(fp,\"\\\"%s.txt\\\" using 1:2 title 'Easy' with lines ls 1 lw %d,\",file_name.c_str(),lw);\n    fprintf(fp,\"\\\"%s.txt\\\" using 1:3 title 'Moderate' with lines ls 2 lw %d,\",file_name.c_str(),lw);\n    fprintf(fp,\"\\\"%s.txt\\\" using 1:4 title 'Hard' with lines ls 3 lw %d\",file_name.c_str(),lw);\n\n    // close file\n    fclose(fp);\n\n    // run gnuplot => create png + eps\n    sprintf(command,\"cd %s; gnuplot %s\",dir_name.c_str(),(file_name + \".gp\").c_str());\n    system(command);\n  }\n\n  // create pdf and crop\n  sprintf(command,\"cd %s; ps2pdf %s.eps %s_large.pdf\",dir_name.c_str(),file_name.c_str(),file_name.c_str());\n  system(command);\n  sprintf(command,\"cd %s; pdfcrop %s_large.pdf %s.pdf\",dir_name.c_str(),file_name.c_str(),file_name.c_str());\n  system(command);\n  sprintf(command,\"cd %s; rm %s_large.pdf\",dir_name.c_str(),file_name.c_str());\n  system(command);\n}\n\nbool eval(string result_sha,Mail* mail){\n\n  // set some global parameters\n  initGlobals();\n\n  // ground truth and result directories\n  string gt_dir         = \"data/object/label_2\";\n  string result_dir     = \"results/\" + result_sha;\n  string plot_dir       = result_dir + \"/plot\";\n\n  // create output directories\n  
system((\"mkdir \" + plot_dir).c_str());\n\n  // hold detections and ground truth in memory\n  vector< vector<tGroundtruth> > groundtruth;\n  vector< vector<tDetection> >   detections;\n\n  // holds whether orientation similarity shall be computed (might be set to false while loading detections)\n  // and which labels were provided by this submission\n  bool compute_aos=true;\n  vector<bool> eval_image(NUM_CLASS, false);\n  vector<bool> eval_ground(NUM_CLASS, false);\n  vector<bool> eval_3d(NUM_CLASS, false);\n\n  // for all images read groundtruth and detections\n  mail->msg(\"Loading detections...\");\n  for (int32_t i=0; i<N_TESTIMAGES; i++) {\n\n    // file name\n    char file_name[256];\n    sprintf(file_name,\"%06d.txt\",indices.at(i));\n\n    // read ground truth and result poses\n    bool gt_success,det_success;\n    vector<tGroundtruth> gt   = loadGroundtruth(gt_dir + \"/\" + file_name,gt_success);\n    vector<tDetection>   det  = loadDetections(result_dir + \"/data/\" + file_name,\n            compute_aos, eval_image, eval_ground, eval_3d, det_success);\n    groundtruth.push_back(gt);\n    detections.push_back(det);\n\n    // check for errors\n    if (!gt_success) {\n      mail->msg(\"ERROR: Couldn't read: %s of ground truth. 
Please write me an email!\", file_name);\n      return false;\n    }\n    if (!det_success) {\n      mail->msg(\"ERROR: Couldn't read: %s\", file_name);\n      return false;\n    }\n  }\n  mail->msg(\"  done.\");\n\n  // holds pointers for result files\n  FILE *fp_det=0, *fp_ori=0;\n\n  // eval image 2D bounding boxes\n  for (int c = 0; c < NUM_CLASS; c++) {\n    CLASSES cls = (CLASSES)c;\n    if (eval_image[c]) {\n      fp_det = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_detection.txt\").c_str(), \"w\");\n      if(compute_aos)\n        fp_ori = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_orientation.txt\").c_str(),\"w\");\n      vector<double> precision[3], aos[3];\n      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[0], aos[0], EASY, IMAGE)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[1], aos[1], MODERATE, IMAGE)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[2], aos[2], HARD, IMAGE)) {\n        mail->msg(\"%s evaluation failed.\", CLASS_NAMES[c].c_str());\n        return false;\n      }\n      fclose(fp_det);\n      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_detection\", CLASS_NAMES[c], precision, 0);\n      if(compute_aos){\n        saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_orientation\", CLASS_NAMES[c], aos, 1);\n        fclose(fp_ori);\n      }\n    }\n  }\n\n  // don't evaluate AOS for birdview boxes and 3D boxes\n  compute_aos = false;\n\n  // eval bird's eye view bounding boxes\n  for (int c = 0; c < NUM_CLASS; c++) {\n    CLASSES cls = (CLASSES)c;\n    if (eval_ground[c]) {\n      fp_det = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_detection_ground.txt\").c_str(), \"w\");\n      vector<double> precision[3], aos[3];\n      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[0], 
aos[0], EASY, GROUND)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[1], aos[1], MODERATE, GROUND)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[2], aos[2], HARD, GROUND)) {\n        mail->msg(\"%s evaluation failed.\", CLASS_NAMES[c].c_str());\n        return false;\n      }\n      fclose(fp_det);\n      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_detection_ground\", CLASS_NAMES[c], precision, 0);\n    }\n  }\n\n  // eval 3D bounding boxes\n  for (int c = 0; c < NUM_CLASS; c++) {\n    CLASSES cls = (CLASSES)c;\n    if (eval_3d[c]) {\n      fp_det = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_detection_3d.txt\").c_str(), \"w\");\n      vector<double> precision[3], aos[3];\n      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[0], aos[0], EASY, BOX3D)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[1], aos[1], MODERATE, BOX3D)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[2], aos[2], HARD, BOX3D)) {\n        mail->msg(\"%s evaluation failed.\", CLASS_NAMES[c].c_str());\n        return false;\n      }\n      fclose(fp_det);\n      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_detection_3d\", CLASS_NAMES[c], precision, 0);\n    }\n  }\n\n  // success\n  return true;\n}\n\nint32_t main (int32_t argc,char *argv[]) {\n\n  // we need 2 or 4 arguments!\n  if (argc!=2 && argc!=4) {\n    cout << \"Usage: ./eval_detection result_sha [user_sha email]\" << endl;\n    return 1;\n  }\n\n  // read arguments\n  string result_sha = argv[1];\n\n  // init notification mail\n  Mail *mail;\n  if (argc==4) mail = new Mail(argv[3]);\n  else         mail = new Mail();\n  mail->msg(\"Thank you for participating in our evaluation!\");\n\n  // run evaluation\n  if 
(eval(result_sha,mail)) {\n    mail->msg(\"Your evaluation results are available at:\");\n    mail->msg(\"http://www.cvlibs.net/datasets/kitti/user_submit_check_login.php?benchmark=object&user=%s&result=%s\",argv[2], result_sha.c_str());\n  } else {\n    system((\"rm -r results/\" + result_sha).c_str());\n    mail->msg(\"An error occurred while processing your results.\");\n    mail->msg(\"Please make sure that the data in your zip archive has the right format!\");\n  }\n\n  // send mail and exit\n  delete mail;\n\n  return 0;\n}\n\n\n"
  },
  {
    "path": "tools/kitti-eval/evaluate_object_3d_offline.cpp",
    "content": "#include <iostream>\n#include <algorithm>\n#include <stdio.h>\n#include <math.h>\n#include <vector>\n#include <numeric>\n#include <strings.h>\n#include <assert.h>\n\n#include <dirent.h>\n\n#include <boost/numeric/ublas/matrix.hpp>\n#include <boost/numeric/ublas/io.hpp>\n\n#include <boost/geometry.hpp>\n#include <boost/geometry/geometries/point_xy.hpp>\n#include <boost/geometry/geometries/polygon.hpp>\n#include <boost/geometry/geometries/adapted/c_array.hpp>\n\n#include \"mail.h\"\n\nBOOST_GEOMETRY_REGISTER_C_ARRAY_CS(cs::cartesian)\n\ntypedef boost::geometry::model::polygon<boost::geometry::model::d2::point_xy<double> > Polygon;\n\n\nusing namespace std;\n\n/*=======================================================================\nSTATIC EVALUATION PARAMETERS\n=======================================================================*/\n\n// holds the number of test images on the server\nconst int32_t N_TESTIMAGES = 7518;\n\n// easy, moderate and hard evaluation level\nenum DIFFICULTY{EASY=0, MODERATE=1, HARD=2};\n\n// evaluation metrics: image, ground or 3D\nenum METRIC{IMAGE=0, GROUND=1, BOX3D=2};\n\n// evaluation parameter\nconst int32_t MIN_HEIGHT[3]     = {40, 25, 25};     // minimum height for evaluated groundtruth/detections\nconst int32_t MAX_OCCLUSION[3]  = {0, 1, 2};        // maximum occlusion level of the groundtruth used for evaluation\nconst double  MAX_TRUNCATION[3] = {0.15, 0.3, 0.5}; // maximum truncation level of the groundtruth used for evaluation\n\n// evaluated object classes\nenum CLASSES{CAR=0, PEDESTRIAN=1, CYCLIST=2};\nconst int NUM_CLASS = 3;\n\n// parameters varying per class\nvector<string> CLASS_NAMES;\n// the minimum overlap required for 2D evaluation on the image/ground plane and 3D evaluation\n// const double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.5, 0.25, 0.25}, {0.5, 0.25, 0.25}};\nconst double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.7, 0.5, 0.5}, {0.7, 0.5, 0.5}};\n\n// no. 
of recall steps that should be evaluated (discretized)\nconst double N_SAMPLE_PTS = 41;\n\n// initialize class names\nvoid initGlobals () {\n  CLASS_NAMES.push_back(\"car\");\n  CLASS_NAMES.push_back(\"pedestrian\");\n  CLASS_NAMES.push_back(\"cyclist\");\n}\n\n/*=======================================================================\nDATA TYPES FOR EVALUATION\n=======================================================================*/\n\n// holding data needed for precision-recall and precision-aos\nstruct tPrData {\n  vector<double> v;           // detection score for computing score thresholds\n  double         similarity;  // orientation similarity\n  int32_t        tp;          // true positives\n  int32_t        fp;          // false positives\n  int32_t        fn;          // false negatives\n  tPrData () :\n    similarity(0), tp(0), fp(0), fn(0) {}\n};\n\n// holding bounding boxes for ground truth and detections\nstruct tBox {\n  string  type;     // object type as car, pedestrian or cyclist,...\n  double   x1;      // left corner\n  double   y1;      // top corner\n  double   x2;      // right corner\n  double   y2;      // bottom corner\n  double   alpha;   // image orientation\n  tBox (string type, double x1,double y1,double x2,double y2,double alpha) :\n    type(type),x1(x1),y1(y1),x2(x2),y2(y2),alpha(alpha) {}\n};\n\n// holding ground truth data\nstruct tGroundtruth {\n  tBox    box;        // object type, box, orientation\n  double  truncation; // truncation 0..1\n  int32_t occlusion;  // occlusion 0,1,2 (non, partly, fully)\n  double ry;\n  double  t1, t2, t3;\n  double h, w, l;\n  tGroundtruth () :\n    box(tBox(\"invalid\",-1,-1,-1,-1,-10)),truncation(-1),occlusion(-1) {}\n  tGroundtruth (tBox box,double truncation,int32_t occlusion) :\n    box(box),truncation(truncation),occlusion(occlusion) {}\n  tGroundtruth (string type,double x1,double y1,double x2,double y2,double alpha,double truncation,int32_t occlusion) :\n    
box(tBox(type,x1,y1,x2,y2,alpha)),truncation(truncation),occlusion(occlusion) {}\n};\n\n// holding detection data\nstruct tDetection {\n  tBox    box;    // object type, box, orientation\n  double  thresh; // detection score\n  double  ry;\n  double  t1, t2, t3;\n  double  h, w, l;\n  tDetection ():\n    box(tBox(\"invalid\",-1,-1,-1,-1,-10)),thresh(-1000) {}\n  tDetection (tBox box,double thresh) :\n    box(box),thresh(thresh) {}\n  tDetection (string type,double x1,double y1,double x2,double y2,double alpha,double thresh) :\n    box(tBox(type,x1,y1,x2,y2,alpha)),thresh(thresh) {}\n};\n\n\n/*=======================================================================\nFUNCTIONS TO LOAD DETECTION AND GROUND TRUTH DATA ONCE, SAVE RESULTS\n=======================================================================*/\nvector<int32_t> indices;\n\nvector<tDetection> loadDetections(string file_name, bool &compute_aos,\n        vector<bool> &eval_image, vector<bool> &eval_ground,\n        vector<bool> &eval_3d, bool &success) {\n\n  // holds all detections (ignored detections are indicated by an index vector\n  vector<tDetection> detections;\n  FILE *fp = fopen(file_name.c_str(),\"r\");\n  if (!fp) {\n    success = false;\n    return detections;\n  }\n  while (!feof(fp)) {\n    tDetection d;\n    double trash;\n    char str[255];\n    if (fscanf(fp, \"%s %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf\",\n                   str, &trash, &trash, &d.box.alpha, &d.box.x1, &d.box.y1,\n                   &d.box.x2, &d.box.y2, &d.h, &d.w, &d.l, &d.t1, &d.t2, &d.t3,\n                   &d.ry, &d.thresh)==16) {\n\n        // d.thresh = 1;\n      d.box.type = str;\n      detections.push_back(d);\n\n      // orientation=-10 is invalid, AOS is not evaluated if at least one orientation is invalid\n      if(d.box.alpha == -10)\n        compute_aos = false;\n\n      // a class is only evaluated if it is detected at least once\n      for (int c = 0; c < NUM_CLASS; c++) {\n        if 
(!strcasecmp(d.box.type.c_str(), CLASS_NAMES[c].c_str())) {\n          if (!eval_image[c] && d.box.x1 >= 0)\n            eval_image[c] = true;\n          if (!eval_ground[c] && d.t1 != -1000)\n            eval_ground[c] = true;\n          if (!eval_3d[c] && d.t2 != -1000)\n            eval_3d[c] = true;\n          break;\n        }\n      }\n    }\n  }\n  fclose(fp);\n  success = true;\n  return detections;\n}\n\nvector<tGroundtruth> loadGroundtruth(string file_name,bool &success) {\n\n  // holds all ground truth (ignored ground truth is indicated by an index vector\n  vector<tGroundtruth> groundtruth;\n  FILE *fp = fopen(file_name.c_str(),\"r\");\n  if (!fp) {\n    success = false;\n    return groundtruth;\n  }\n  while (!feof(fp)) {\n    tGroundtruth g;\n    char str[255];\n    if (fscanf(fp, \"%s %lf %d %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf\",\n                   str, &g.truncation, &g.occlusion, &g.box.alpha,\n                   &g.box.x1,   &g.box.y1,     &g.box.x2,    &g.box.y2,\n                   &g.h,      &g.w,        &g.l,       &g.t1,\n                   &g.t2,      &g.t3,        &g.ry )==15) {\n      g.box.type = str;\n      groundtruth.push_back(g);\n    }\n  }\n  fclose(fp);\n  success = true;\n  return groundtruth;\n}\n\nvoid saveStats (const vector<double> &precision, const vector<double> &aos, FILE *fp_det, FILE *fp_ori) {\n\n  // save precision to file\n  if(precision.empty())\n    return;\n  for (int32_t i=0; i<precision.size(); i++)\n    fprintf(fp_det,\"%f \",precision[i]);\n  fprintf(fp_det,\"\\n\");\n\n  // save orientation similarity, only if there were no invalid orientation entries in submission (alpha=-10)\n  if(aos.empty())\n    return;\n  for (int32_t i=0; i<aos.size(); i++)\n    fprintf(fp_ori,\"%f \",aos[i]);\n  fprintf(fp_ori,\"\\n\");\n}\n\n/*=======================================================================\nEVALUATION HELPER FUNCTIONS\n=======================================================================*/\n\n// 
criterion defines whether the overlap is computed with respect to both areas (ground truth and detection)\n// or with respect to box a or b (detection and \"dontcare\" areas)\ninline double imageBoxOverlap(tBox a, tBox b, int32_t criterion=-1){\n\n  // overlap is invalid in the beginning\n  double o = -1;\n\n  // get overlapping area\n  double x1 = max(a.x1, b.x1);\n  double y1 = max(a.y1, b.y1);\n  double x2 = min(a.x2, b.x2);\n  double y2 = min(a.y2, b.y2);\n\n  // compute width and height of overlapping area\n  double w = x2-x1;\n  double h = y2-y1;\n\n  // set invalid entries to 0 overlap\n  if(w<=0 || h<=0)\n    return 0;\n\n  // get overlapping areas\n  double inter = w*h;\n  double a_area = (a.x2-a.x1) * (a.y2-a.y1);\n  double b_area = (b.x2-b.x1) * (b.y2-b.y1);\n\n  // intersection over union overlap depending on users choice\n  if(criterion==-1)     // union\n    o = inter / (a_area+b_area-inter);\n  else if(criterion==0) // bbox_a\n    o = inter / a_area;\n  else if(criterion==1) // bbox_b\n    o = inter / b_area;\n\n  // overlap\n  return o;\n}\n\ninline double imageBoxOverlap(tDetection a, tGroundtruth b, int32_t criterion=-1){\n  return imageBoxOverlap(a.box, b.box, criterion);\n}\n\n// compute polygon of an oriented bounding box\ntemplate <typename T>\nPolygon toPolygon(const T& g) {\n    using namespace boost::numeric::ublas;\n    using namespace boost::geometry;\n    matrix<double> mref(2, 2);\n    mref(0, 0) = cos(g.ry); mref(0, 1) = sin(g.ry);\n    mref(1, 0) = -sin(g.ry); mref(1, 1) = cos(g.ry);\n\n    static int count = 0;\n    matrix<double> corners(2, 4);\n    double data[] = {g.l / 2, g.l / 2, -g.l / 2, -g.l / 2,\n                     g.w / 2, -g.w / 2, -g.w / 2, g.w / 2};\n    std::copy(data, data + 8, corners.data().begin());\n    matrix<double> gc = prod(mref, corners);\n    for (int i = 0; i < 4; ++i) {\n        gc(0, i) += g.t1;\n        gc(1, i) += g.t3;\n    }\n\n    double points[][2] = {{gc(0, 0), gc(1, 0)},{gc(0, 1), gc(1, 
1)},{gc(0, 2), gc(1, 2)},{gc(0, 3), gc(1, 3)},{gc(0, 0), gc(1, 0)}};\n    Polygon poly;\n    append(poly, points);\n    return poly;\n}\n\n// measure overlap between bird's eye view bounding boxes, parametrized by (ry, l, w, tx, tz)\ninline double groundBoxOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {\n    using namespace boost::geometry;\n    Polygon gp = toPolygon(g);\n    Polygon dp = toPolygon(d);\n\n    std::vector<Polygon> in, un;\n    intersection(gp, dp, in);\n    union_(gp, dp, un);\n\n    double inter_area = in.empty() ? 0 : area(in.front());\n    double union_area = area(un.front());\n    double o;\n    if(criterion==-1)     // union\n        o = inter_area / union_area;\n    else if(criterion==0) // bbox_a\n        o = inter_area / area(dp);\n    else if(criterion==1) // bbox_b\n        o = inter_area / area(gp);\n\n    return o;\n}\n\n// measure overlap between 3D bounding boxes, parametrized by (ry, h, w, l, tx, ty, tz)\ninline double box3DOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {\n    using namespace boost::geometry;\n    Polygon gp = toPolygon(g);\n    Polygon dp = toPolygon(d);\n\n    std::vector<Polygon> in, un;\n    intersection(gp, dp, in);\n    union_(gp, dp, un);\n\n    double ymax = min(d.t2, g.t2);\n    double ymin = max(d.t2 - d.h, g.t2 - g.h);\n\n    double inter_area = in.empty() ? 
0 : area(in.front());\n    double inter_vol = inter_area * max(0.0, ymax - ymin);\n\n    double det_vol = d.h * d.l * d.w;\n    double gt_vol = g.h * g.l * g.w;\n\n    // overlap is invalid until one of the criteria below matches\n    double o = -1;\n    if(criterion==-1)     // union\n        o = inter_vol / (det_vol + gt_vol - inter_vol);\n    else if(criterion==0) // bbox_a\n        o = inter_vol / det_vol;\n    else if(criterion==1) // bbox_b\n        o = inter_vol / gt_vol;\n\n    return o;\n}\n\nvector<double> getThresholds(vector<double> &v, double n_groundtruth){\n\n  // holds scores needed to compute N_SAMPLE_PTS recall values\n  vector<double> t;\n\n  // sort scores in descending order\n  // (highest score is assumed to give best/most confident detections)\n  sort(v.begin(), v.end(), greater<double>());\n\n  // get scores for linearly spaced recall\n  double current_recall = 0;\n  for(int32_t i=0; i<v.size(); i++){\n\n    // check if right-hand-side recall with respect to current recall is closer than left-hand-side one\n    // in this case, skip the current detection score\n    double l_recall, r_recall, recall;\n    l_recall = (double)(i+1)/n_groundtruth;\n    if(i<(v.size()-1))\n      r_recall = (double)(i+2)/n_groundtruth;\n    else\n      r_recall = l_recall;\n\n    if( (r_recall-current_recall) < (current_recall-l_recall) && i<(v.size()-1))\n      continue;\n\n    // left recall is the best approximation, so use this and go to the next recall step for approximation\n    recall = l_recall;\n\n    // the next recall step was reached\n    t.push_back(v[i]);\n    current_recall += 1.0/(N_SAMPLE_PTS-1.0);\n  }\n  return t;\n}\n\nvoid cleanData(CLASSES current_class, const vector<tGroundtruth> &gt, const vector<tDetection> &det, vector<int32_t> &ignored_gt, vector<tGroundtruth> &dc, vector<int32_t> &ignored_det, int32_t &n_gt, DIFFICULTY difficulty){\n\n  // extract ground truth bounding boxes for current evaluation class\n  for(int32_t i=0;i<gt.size(); i++){\n\n    // only bounding boxes with a minimum height are used for 
evaluation\n    double height = gt[i].box.y2 - gt[i].box.y1;\n\n    // neighboring classes are ignored (\"van\" for \"car\" and \"person_sitting\" for \"pedestrian\")\n    // (lower/upper cases are ignored)\n    int32_t valid_class;\n\n    // all classes without a neighboring class\n    if(!strcasecmp(gt[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))\n      valid_class = 1;\n\n    // classes with a neighboring class\n    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), \"Pedestrian\") && !strcasecmp(\"Person_sitting\", gt[i].box.type.c_str()))\n      valid_class = 0;\n    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), \"Car\") && !strcasecmp(\"Van\", gt[i].box.type.c_str()))\n      valid_class = 0;\n\n    // classes not used for evaluation\n    else\n      valid_class = -1;\n\n    // ground truth is ignored, if occlusion, truncation exceeds the difficulty or ground truth is too small\n    // (doesn't count as FN nor TP, although detections may be assigned)\n    bool ignore = false;\n    if(gt[i].occlusion>MAX_OCCLUSION[difficulty] || gt[i].truncation>MAX_TRUNCATION[difficulty] || height<MIN_HEIGHT[difficulty])\n      ignore = true;\n\n    // set ignored vector for ground truth\n    // current class and not ignored (total no. 
of ground truth is detected for recall denominator)\n    if(valid_class==1 && !ignore){\n      ignored_gt.push_back(0);\n      n_gt++;\n    }\n\n    // neighboring class, or current class but ignored\n    else if(valid_class==0 || (ignore && valid_class==1))\n      ignored_gt.push_back(1);\n\n    // all other classes which are FN in the evaluation\n    else\n      ignored_gt.push_back(-1);\n  }\n\n  // extract dontcare areas\n  for(int32_t i=0;i<gt.size(); i++)\n    if(!strcasecmp(\"DontCare\", gt[i].box.type.c_str()))\n      dc.push_back(gt[i]);\n\n  // extract detections bounding boxes of the current class\n  for(int32_t i=0;i<det.size(); i++){\n\n    // neighboring classes are not evaluated\n    int32_t valid_class;\n    if(!strcasecmp(det[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))\n      valid_class = 1;\n    else\n      valid_class = -1;\n\n    int32_t height = fabs(det[i].box.y1 - det[i].box.y2);\n\n    // set ignored vector for detections\n    if(height<MIN_HEIGHT[difficulty])\n      ignored_det.push_back(1);\n    else if(valid_class==1)\n      ignored_det.push_back(0);\n    else\n      ignored_det.push_back(-1);\n  }\n}\n\ntPrData computeStatistics(CLASSES current_class, const vector<tGroundtruth> &gt,\n        const vector<tDetection> &det, const vector<tGroundtruth> &dc,\n        const vector<int32_t> &ignored_gt, const vector<int32_t>  &ignored_det,\n        bool compute_fp, double (*boxoverlap)(tDetection, tGroundtruth, int32_t),\n        METRIC metric, bool compute_aos=false, double thresh=0, bool debug=false){\n\n  tPrData stat = tPrData();\n  const double NO_DETECTION = -10000000;\n  vector<double> delta;            // holds angular difference for TPs (needed for AOS evaluation)\n  vector<bool> assigned_detection; // holds whether a detection was assigned to a valid or ignored ground truth\n  assigned_detection.assign(det.size(), false);\n  vector<bool> ignored_threshold;\n  ignored_threshold.assign(det.size(), false); // holds 
detections with a threshold lower than thresh if FP are computed\n\n  // detections with a low score are ignored for computing precision (needs FP)\n  if(compute_fp)\n    for(int32_t i=0; i<det.size(); i++)\n      if(det[i].thresh<thresh)\n        ignored_threshold[i] = true;\n\n  // evaluate all ground truth boxes\n  for(int32_t i=0; i<gt.size(); i++){\n\n    // this ground truth is not of the current or a neighboring class and therefore ignored\n    if(ignored_gt[i]==-1)\n      continue;\n\n    /*=======================================================================\n    find candidates (overlap with ground truth > 0.5) (logical len(det))\n    =======================================================================*/\n    int32_t det_idx          = -1;\n    double valid_detection = NO_DETECTION;\n    double max_overlap     = 0;\n\n    // search for a possible detection\n    bool assigned_ignored_det = false;\n    for(int32_t j=0; j<det.size(); j++){\n\n      // detections not of the current class, already assigned or with a low threshold are ignored\n      if(ignored_det[j]==-1)\n        continue;\n      if(assigned_detection[j])\n        continue;\n      if(ignored_threshold[j])\n        continue;\n\n      // find the maximum score for the candidates and get idx of respective detection\n      double overlap = boxoverlap(det[j], gt[i], -1);\n\n      // for computing recall thresholds, the candidate with highest score is considered\n      if(!compute_fp && overlap>MIN_OVERLAP[metric][current_class] && det[j].thresh>valid_detection){\n        det_idx         = j;\n        valid_detection = det[j].thresh;\n      }\n\n      // for computing pr curve values, the candidate with the greatest overlap is considered\n      // if the greatest overlap is an ignored detection (min_height), the overlapping detection is used\n      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && (overlap>max_overlap || assigned_ignored_det) && ignored_det[j]==0){\n        
max_overlap     = overlap;\n        det_idx         = j;\n        valid_detection = 1;\n        assigned_ignored_det = false;\n      }\n      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && valid_detection==NO_DETECTION && ignored_det[j]==1){\n        det_idx              = j;\n        valid_detection      = 1;\n        assigned_ignored_det = true;\n      }\n    }\n\n    /*=======================================================================\n    compute TP, FP and FN\n    =======================================================================*/\n\n    // nothing was assigned to this valid ground truth\n    if(valid_detection==NO_DETECTION && ignored_gt[i]==0) {\n      stat.fn++;\n    }\n\n    // only evaluate valid ground truth <=> detection assignments (considering difficulty level)\n    else if(valid_detection!=NO_DETECTION && (ignored_gt[i]==1 || ignored_det[det_idx]==1))\n      assigned_detection[det_idx] = true;\n\n    // found a valid true positive\n    else if(valid_detection!=NO_DETECTION){\n\n      // write highest score to threshold vector\n      stat.tp++;\n      stat.v.push_back(det[det_idx].thresh);\n\n      // compute angular difference of detection and ground truth if valid detection orientation was provided\n      if(compute_aos)\n        delta.push_back(gt[i].box.alpha - det[det_idx].box.alpha);\n      // test use ry as the error measure\n\t//delta.push_back(gt[i].ry - det[det_idx].ry);\n\n      // clean up\n      assigned_detection[det_idx] = true;\n    }\n  }\n\n  // if FP are requested, consider stuff area\n  if(compute_fp){\n\n    // count fp\n    for(int32_t i=0; i<det.size(); i++){\n\n      // count false positives if required (height smaller than required is ignored (ignored_det==1)\n      if(!(assigned_detection[i] || ignored_det[i]==-1 || ignored_det[i]==1 || ignored_threshold[i]))\n        stat.fp++;\n    }\n\n    // do not consider detections overlapping with stuff area\n    int32_t nstuff = 0;\n    for(int32_t 
i=0; i<dc.size(); i++){\n      for(int32_t j=0; j<det.size(); j++){\n\n        // detections not of the current class, already assigned, with a low threshold or a low minimum height are ignored\n        if(assigned_detection[j])\n          continue;\n        if(ignored_det[j]==-1 || ignored_det[j]==1)\n          continue;\n        if(ignored_threshold[j])\n          continue;\n\n        // compute overlap and assign to stuff area, if overlap exceeds class specific value\n        double overlap = boxoverlap(det[j], dc[i], 0);\n        if(overlap>MIN_OVERLAP[metric][current_class]){\n          assigned_detection[j] = true;\n          nstuff++;\n        }\n      }\n    }\n\n    // FP = no. of all not to ground truth assigned detections - detections assigned to stuff areas\n    stat.fp -= nstuff;\n\n    // if all orientation values are valid, the AOS is computed\n    if(compute_aos){\n      vector<double> tmp;\n\n      // FP have a similarity of 0, for all TP compute AOS\n      tmp.assign(stat.fp, 0);\n      for(int32_t i=0; i<delta.size(); i++)\n        tmp.push_back((1.0+cos(delta[i]))/2.0);\n\n      // be sure, that all orientation deltas are computed\n      assert(tmp.size()==stat.fp+stat.tp);\n      assert(delta.size()==stat.tp);\n\n      // get the mean orientation similarity for this image\n      if(stat.tp>0 || stat.fp>0)\n        stat.similarity = accumulate(tmp.begin(), tmp.end(), 0.0);\n\n      // there was neither a FP nor a TP, so the similarity is ignored in the evaluation\n      else\n        stat.similarity = -1;\n    }\n  }\n  return stat;\n}\n\n/*=======================================================================\nEVALUATE CLASS-WISE\n=======================================================================*/\n\nbool eval_class (FILE *fp_det, FILE *fp_ori, CLASSES current_class,\n        const vector< vector<tGroundtruth> > &groundtruth,\n        const vector< vector<tDetection> > &detections, bool compute_aos,\n        double 
(*boxoverlap)(tDetection, tGroundtruth, int32_t),\n        vector<double> &precision, vector<double> &aos,\n        DIFFICULTY difficulty, METRIC metric) {\n    assert(groundtruth.size() == detections.size());\n\n  // init\n  int32_t n_gt=0;                                     // total no. of gt (denominator of recall)\n  vector<double> v, thresholds;                       // detection scores, evaluated for recall discretization\n  vector< vector<int32_t> > ignored_gt, ignored_det;  // index of ignored gt detection for current class/difficulty\n  vector< vector<tGroundtruth> > dontcare;            // index of dontcare areas, included in ground truth\n\n  // for all test images do\n  for (int32_t i=0; i<groundtruth.size(); i++){\n\n    // holds ignored ground truth, ignored detections and dontcare areas for current frame\n    vector<int32_t> i_gt, i_det;\n    vector<tGroundtruth> dc;\n\n    // only evaluate objects of current class and ignore occluded, truncated objects\n    cleanData(current_class, groundtruth[i], detections[i], i_gt, dc, i_det, n_gt, difficulty);\n    ignored_gt.push_back(i_gt);\n    ignored_det.push_back(i_det);\n    dontcare.push_back(dc);\n\n    // compute statistics to get recall values\n    tPrData pr_tmp = tPrData();\n    pr_tmp = computeStatistics(current_class, groundtruth[i], detections[i], dc, i_gt, i_det, false, boxoverlap, metric);\n\n    // add detection scores to vector over all images\n    for(int32_t j=0; j<pr_tmp.v.size(); j++)\n      v.push_back(pr_tmp.v[j]);\n  }\n\n  // get scores that must be evaluated for recall discretization\n  thresholds = getThresholds(v, n_gt);\n\n  // compute TP,FP,FN for relevant scores\n  vector<tPrData> pr;\n  pr.assign(thresholds.size(),tPrData());\n  for (int32_t i=0; i<groundtruth.size(); i++){\n\n    // for all scores/recall thresholds do:\n    for(int32_t t=0; t<thresholds.size(); t++){\n      tPrData tmp = tPrData();\n      tmp = computeStatistics(current_class, groundtruth[i], detections[i], 
dontcare[i],\n                              ignored_gt[i], ignored_det[i], true, boxoverlap, metric,\n                              compute_aos, thresholds[t], t==38);\n\n      // add no. of TP, FP, FN, AOS for current frame to total evaluation for current threshold\n      pr[t].tp += tmp.tp;\n      pr[t].fp += tmp.fp;\n      pr[t].fn += tmp.fn;\n      if(tmp.similarity!=-1)\n        pr[t].similarity += tmp.similarity;\n    }\n  }\n\n  // compute recall, precision and AOS\n  vector<double> recall;\n  precision.assign(N_SAMPLE_PTS, 0);\n  if(compute_aos)\n    aos.assign(N_SAMPLE_PTS, 0);\n  double r=0;\n  for (int32_t i=0; i<thresholds.size(); i++){\n    r = pr[i].tp/(double)(pr[i].tp + pr[i].fn);\n    recall.push_back(r);\n    precision[i] = pr[i].tp/(double)(pr[i].tp + pr[i].fp);\n    if(compute_aos)\n      aos[i] = pr[i].similarity/(double)(pr[i].tp + pr[i].fp);\n  }\n\n  // filter precision and AOS using max_{i..end}(precision)\n  for (int32_t i=0; i<thresholds.size(); i++){\n    precision[i] = *max_element(precision.begin()+i, precision.end());\n    if(compute_aos)\n      aos[i] = *max_element(aos.begin()+i, aos.end());\n  }\n\n  // save statistics and finish with success\n  saveStats(precision, aos, fp_det, fp_ori);\n  return true;\n}\n\nvoid saveAndPlotPlots(string dir_name,string file_name,string obj_type,vector<double> vals[],bool is_aos){\n\n  char command[1024];\n\n  // save plot data to file\n  FILE *fp = fopen((dir_name + \"/\" + file_name + \".txt\").c_str(),\"w\");\n  printf(\"save %s\\n\", (dir_name + \"/\" + file_name + \".txt\").c_str());\n  for (int32_t i=0; i<(int)N_SAMPLE_PTS; i++)\n    fprintf(fp,\"%f %f %f %f\\n\",(double)i/(N_SAMPLE_PTS-1.0),vals[0][i],vals[1][i],vals[2][i]);\n  fclose(fp);\n\n  float sum[3] = {0, 0, 0};\n  for (int v = 0; v < 3; ++v)\n      for (int i = 0; i < vals[v].size(); i = i + 4)\n          sum[v] += vals[v][i];\n  printf(\"%s AP: %f %f %f\\n\", file_name.c_str(), sum[0] / 11 * 100, sum[1] / 11 * 100, sum[2] / 11 * 
100);\n\n\n  // create png + eps\n  for (int32_t j=0; j<2; j++) {\n\n    // open file\n    FILE *fp = fopen((dir_name + \"/\" + file_name + \".gp\").c_str(),\"w\");\n\n    // save gnuplot instructions\n    if (j==0) {\n      fprintf(fp,\"set term png size 450,315 font \\\"Helvetica\\\" 11\\n\");\n      fprintf(fp,\"set output \\\"%s.png\\\"\\n\",file_name.c_str());\n    } else {\n      fprintf(fp,\"set term postscript eps enhanced color font \\\"Helvetica\\\" 20\\n\");\n      fprintf(fp,\"set output \\\"%s.eps\\\"\\n\",file_name.c_str());\n    }\n\n    // set labels and ranges\n    fprintf(fp,\"set size ratio 0.7\\n\");\n    fprintf(fp,\"set xrange [0:1]\\n\");\n    fprintf(fp,\"set yrange [0:1]\\n\");\n    fprintf(fp,\"set xlabel \\\"Recall\\\"\\n\");\n    if (!is_aos) fprintf(fp,\"set ylabel \\\"Precision\\\"\\n\");\n    else         fprintf(fp,\"set ylabel \\\"Orientation Similarity\\\"\\n\");\n    obj_type[0] = toupper(obj_type[0]);\n    fprintf(fp,\"set title \\\"%s\\\"\\n\",obj_type.c_str());\n\n    // line width\n    int32_t   lw = 5;\n    if (j==0) lw = 3;\n\n    // plot error curve\n    fprintf(fp,\"plot \");\n    fprintf(fp,\"\\\"%s.txt\\\" using 1:2 title 'Easy' with lines ls 1 lw %d,\",file_name.c_str(),lw);\n    fprintf(fp,\"\\\"%s.txt\\\" using 1:3 title 'Moderate' with lines ls 2 lw %d,\",file_name.c_str(),lw);\n    fprintf(fp,\"\\\"%s.txt\\\" using 1:4 title 'Hard' with lines ls 3 lw %d\",file_name.c_str(),lw);\n\n    // close file\n    fclose(fp);\n\n    // run gnuplot => create png + eps\n    sprintf(command,\"cd %s; gnuplot %s\",dir_name.c_str(),(file_name + \".gp\").c_str());\n    system(command);\n  }\n\n  // create pdf and crop\n  sprintf(command,\"cd %s; ps2pdf %s.eps %s_large.pdf\",dir_name.c_str(),file_name.c_str(),file_name.c_str());\n  system(command);\n  sprintf(command,\"cd %s; pdfcrop %s_large.pdf %s.pdf\",dir_name.c_str(),file_name.c_str(),file_name.c_str());\n  system(command);\n  sprintf(command,\"cd %s; rm 
%s_large.pdf\",dir_name.c_str(),file_name.c_str());\n  system(command);\n}\n\nvector<int32_t> getEvalIndices(const string& result_dir) {\n\n    DIR* dir;\n    dirent* entity;\n    dir = opendir(result_dir.c_str());\n    if (dir) {\n        // the last 10 characters of each entry are parsed as the frame index (e.g. 000123.txt);\n        // shorter names such as \".\" and \"..\" are skipped\n        while ((entity = readdir(dir))) {\n            string path(entity->d_name);\n            int32_t len = path.size();\n            if (len < 10) continue;\n            int32_t index = atoi(path.substr(len - 10, 10).c_str());\n            indices.push_back(index);\n        }\n        // release the directory handle once all entries are read\n        closedir(dir);\n    }\n    return indices;\n}\n\nbool eval(string gt_dir, string result_dir, Mail* mail){\n\n  // set some global parameters\n  initGlobals();\n\n  // ground truth and result directories\n  // string gt_dir         = \"data/object/label_2\";\n  // string result_dir     = \"results/\" + result_sha;\n  string plot_dir       = result_dir + \"/plot\";\n\n  // create output directories\n  system((\"mkdir \" + plot_dir).c_str());\n\n  // hold detections and ground truth in memory\n  vector< vector<tGroundtruth> > groundtruth;\n  vector< vector<tDetection> >   detections;\n\n  // holds whether orientation similarity shall be computed (might be set to false while loading detections)\n  // and which labels were provided by this submission\n  bool compute_aos=true;\n  vector<bool> eval_image(NUM_CLASS, false);\n  vector<bool> eval_ground(NUM_CLASS, false);\n  vector<bool> eval_3d(NUM_CLASS, false);\n\n  // for all images read groundtruth and detections\n  mail->msg(\"Loading detections...\");\n  std::vector<int32_t> indices = getEvalIndices(result_dir + \"/data/\");\n  printf(\"number of files for evaluation: %d\\n\", (int)indices.size());\n\n  for (int32_t i=0; i<indices.size(); i++) {\n\n    // file name\n    char file_name[256];\n    sprintf(file_name,\"%06d.txt\",indices.at(i));\n\n    // read ground truth and result poses\n    bool gt_success,det_success;\n    vector<tGroundtruth> gt   = loadGroundtruth(gt_dir + \"/\" + file_name,gt_success);\n    vector<tDetection>   
det  = loadDetections(result_dir + \"/data/\" + file_name,\n            compute_aos, eval_image, eval_ground, eval_3d, det_success);\n    groundtruth.push_back(gt);\n    detections.push_back(det);\n\n    // check for errors\n    if (!gt_success) {\n      mail->msg(\"ERROR: Couldn't read: %s of ground truth. Please write me an email!\", file_name);\n      return false;\n    }\n    if (!det_success) {\n      mail->msg(\"ERROR: Couldn't read: %s\", file_name);\n      return false;\n    }\n  }\n  mail->msg(\"  done.\");\n\n  // holds pointers for result files\n  FILE *fp_det=0, *fp_ori=0;\n\n  // eval image 2D bounding boxes\n  for (int c = 0; c < NUM_CLASS; c++) {\n    CLASSES cls = (CLASSES)c;\n    if (eval_image[c]) {\n      fp_det = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_detection.txt\").c_str(), \"w\");\n      if(compute_aos)\n        fp_ori = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_orientation.txt\").c_str(),\"w\");\n      vector<double> precision[3], aos[3];\n      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[0], aos[0], EASY, IMAGE)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[1], aos[1], MODERATE, IMAGE)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[2], aos[2], HARD, IMAGE)) {\n        mail->msg(\"%s evaluation failed.\", CLASS_NAMES[c].c_str());\n        return false;\n      }\n      fclose(fp_det);\n      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_detection\", CLASS_NAMES[c], precision, 0);\n      if(compute_aos){\n        saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_orientation\", CLASS_NAMES[c], aos, 1);\n        fclose(fp_ori);\n      }\n    }\n  }\n\n  // don't evaluate AOS for birdview boxes and 3D boxes\n  compute_aos = false;\n\n  // eval bird's eye view bounding boxes\n  for (int c = 0; c < NUM_CLASS; c++) {\n    CLASSES cls = 
(CLASSES)c;\n    if (eval_ground[c]) {\n      fp_det = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_detection_ground.txt\").c_str(), \"w\");\n      vector<double> precision[3], aos[3];\n      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[0], aos[0], EASY, GROUND)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[1], aos[1], MODERATE, GROUND)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[2], aos[2], HARD, GROUND)) {\n        mail->msg(\"%s evaluation failed.\", CLASS_NAMES[c].c_str());\n        return false;\n      }\n      fclose(fp_det);\n      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_detection_ground\", CLASS_NAMES[c], precision, 0);\n    }\n  }\n\n  // eval 3D bounding boxes\n  for (int c = 0; c < NUM_CLASS; c++) {\n    CLASSES cls = (CLASSES)c;\n    if (eval_3d[c]) {\n      fp_det = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_detection_3d.txt\").c_str(), \"w\");\n      vector<double> precision[3], aos[3];\n      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[0], aos[0], EASY, BOX3D)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[1], aos[1], MODERATE, BOX3D)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[2], aos[2], HARD, BOX3D)) {\n        mail->msg(\"%s evaluation failed.\", CLASS_NAMES[c].c_str());\n        return false;\n      }\n      fclose(fp_det);\n      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_detection_3d\", CLASS_NAMES[c], precision, 0);\n    }\n  }\n\n  // success\n  return true;\n}\n\nint32_t main (int32_t argc,char *argv[]) {\n\n  // we need exactly 2 arguments: gt_dir and result_dir\n  if (argc!=3) {\n    cout << \"Usage: ./eval_detection_3d_offline gt_dir result_dir\" << endl;\n    
return 1;\n  }\n\n  // read arguments\n  string gt_dir = argv[1];\n  string result_dir = argv[2];\n\n  // init notification mail\n  Mail *mail;\n  mail = new Mail();\n  mail->msg(\"Thank you for participating in our evaluation!\");\n\n  // run evaluation\n  if (eval(gt_dir, result_dir, mail)) {\n    mail->msg(\"Your evaluation results are available at:\");\n    mail->msg(result_dir.c_str());\n  } else {\n    system((\"rm -r \" + result_dir + \"/plot\").c_str());\n    mail->msg(\"An error occurred while processing your results.\");\n  }\n\n  // send mail and exit\n  delete mail;\n\n  return 0;\n}\n\n\n"
  },
  {
    "path": "tools/kitti-eval/evaluate_object_3d_offline_r40.cpp",
    "content": "#include <iostream>\n#include <algorithm>\n#include <stdio.h>\n#include <math.h>\n#include <vector>\n#include <numeric>\n#include <strings.h>\n#include <assert.h>\n\n#include <dirent.h>\n\n#include <boost/numeric/ublas/matrix.hpp>\n#include <boost/numeric/ublas/io.hpp>\n\n#include <boost/geometry.hpp>\n#include <boost/geometry/geometries/point_xy.hpp>\n#include <boost/geometry/geometries/polygon.hpp>\n#include <boost/geometry/geometries/adapted/c_array.hpp>\n\n#include \"mail.h\"\n\nBOOST_GEOMETRY_REGISTER_C_ARRAY_CS(cs::cartesian)\n\ntypedef boost::geometry::model::polygon<boost::geometry::model::d2::point_xy<double> > Polygon;\n\n\nusing namespace std;\n\n/*=======================================================================\nSTATIC EVALUATION PARAMETERS\n=======================================================================*/\n\n// holds the number of test images on the server\nconst int32_t N_TESTIMAGES = 7518;\n\n// easy, moderate and hard evaluation level\nenum DIFFICULTY{EASY=0, MODERATE=1, HARD=2};\n\n// evaluation metrics: image, ground or 3D\nenum METRIC{IMAGE=0, GROUND=1, BOX3D=2};\n\n// evaluation parameter\nconst int32_t MIN_HEIGHT[3]     = {40, 25, 25};     // minimum height for evaluated groundtruth/detections\nconst int32_t MAX_OCCLUSION[3]  = {0, 1, 2};        // maximum occlusion level of the groundtruth used for evaluation\nconst double  MAX_TRUNCATION[3] = {0.15, 0.3, 0.5}; // maximum truncation level of the groundtruth used for evaluation\n\n// evaluated object classes\nenum CLASSES{CAR=0, PEDESTRIAN=1, CYCLIST=2};\nconst int NUM_CLASS = 3;\n\n// parameters varying per class\nvector<string> CLASS_NAMES;\n// the minimum overlap required for 2D evaluation on the image/ground plane and 3D evaluation\n// const double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.5, 0.25, 0.25}, {0.5, 0.25, 0.25}};\nconst double MIN_OVERLAP[3][3] = {{0.7, 0.5, 0.5}, {0.7, 0.5, 0.5}, {0.7, 0.5, 0.5}};\n\n// no. 
of recall steps that should be evaluated (discretized)\nconst double N_SAMPLE_PTS = 41;\n\n// initialize class names\nvoid initGlobals () {\n  CLASS_NAMES.push_back(\"car\");\n  CLASS_NAMES.push_back(\"pedestrian\");\n  CLASS_NAMES.push_back(\"cyclist\");\n}\n\n/*=======================================================================\nDATA TYPES FOR EVALUATION\n=======================================================================*/\n\n// holding data needed for precision-recall and precision-aos\nstruct tPrData {\n  vector<double> v;           // detection score for computing score thresholds\n  double         similarity;  // orientation similarity\n  int32_t        tp;          // true positives\n  int32_t        fp;          // false positives\n  int32_t        fn;          // false negatives\n  tPrData () :\n    similarity(0), tp(0), fp(0), fn(0) {}\n};\n\n// holding bounding boxes for ground truth and detections\nstruct tBox {\n  string  type;     // object type as car, pedestrian or cyclist,...\n  double   x1;      // left corner\n  double   y1;      // top corner\n  double   x2;      // right corner\n  double   y2;      // bottom corner\n  double   alpha;   // image orientation\n  tBox (string type, double x1,double y1,double x2,double y2,double alpha) :\n    type(type),x1(x1),y1(y1),x2(x2),y2(y2),alpha(alpha) {}\n};\n\n// holding ground truth data\nstruct tGroundtruth {\n  tBox    box;        // object type, box, orientation\n  double  truncation; // truncation 0..1\n  int32_t occlusion;  // occlusion 0,1,2 (none, partly, largely)\n  double ry;          // rotation around the camera Y axis\n  double  t1, t2, t3; // 3D location (x, y, z) in camera coordinates\n  double h, w, l;     // 3D dimensions: height, width, length\n  tGroundtruth () :\n    box(tBox(\"invalid\",-1,-1,-1,-1,-10)),truncation(-1),occlusion(-1) {}\n  tGroundtruth (tBox box,double truncation,int32_t occlusion) :\n    box(box),truncation(truncation),occlusion(occlusion) {}\n  tGroundtruth (string type,double x1,double y1,double x2,double y2,double alpha,double truncation,int32_t occlusion) :\n    
box(tBox(type,x1,y1,x2,y2,alpha)),truncation(truncation),occlusion(occlusion) {}\n};\n\n// holding detection data\nstruct tDetection {\n  tBox    box;    // object type, box, orientation\n  double  thresh; // detection score\n  double  ry;\n  double  t1, t2, t3;\n  double  h, w, l;\n  tDetection ():\n    box(tBox(\"invalid\",-1,-1,-1,-1,-10)),thresh(-1000) {}\n  tDetection (tBox box,double thresh) :\n    box(box),thresh(thresh) {}\n  tDetection (string type,double x1,double y1,double x2,double y2,double alpha,double thresh) :\n    box(tBox(type,x1,y1,x2,y2,alpha)),thresh(thresh) {}\n};\n\n\n/*=======================================================================\nFUNCTIONS TO LOAD DETECTION AND GROUND TRUTH DATA ONCE, SAVE RESULTS\n=======================================================================*/\nvector<int32_t> indices;\n\nvector<tDetection> loadDetections(string file_name, bool &compute_aos,\n        vector<bool> &eval_image, vector<bool> &eval_ground,\n        vector<bool> &eval_3d, bool &success) {\n\n  // holds all detections (ignored detections are indicated by an index vector\n  vector<tDetection> detections;\n  FILE *fp = fopen(file_name.c_str(),\"r\");\n  if (!fp) {\n    success = false;\n    return detections;\n  }\n  while (!feof(fp)) {\n    tDetection d;\n    double trash;\n    char str[255];\n    if (fscanf(fp, \"%s %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf\",\n                   str, &trash, &trash, &d.box.alpha, &d.box.x1, &d.box.y1,\n                   &d.box.x2, &d.box.y2, &d.h, &d.w, &d.l, &d.t1, &d.t2, &d.t3,\n                   &d.ry, &d.thresh)==16) {\n\n        // d.thresh = 1;\n      d.box.type = str;\n      detections.push_back(d);\n\n      // orientation=-10 is invalid, AOS is not evaluated if at least one orientation is invalid\n      if(d.box.alpha == -10)\n        compute_aos = false;\n\n      // a class is only evaluated if it is detected at least once\n      for (int c = 0; c < NUM_CLASS; c++) {\n        if 
(!strcasecmp(d.box.type.c_str(), CLASS_NAMES[c].c_str())) {\n          if (!eval_image[c] && d.box.x1 >= 0)\n            eval_image[c] = true;\n          if (!eval_ground[c] && d.t1 != -1000)\n            eval_ground[c] = true;\n          if (!eval_3d[c] && d.t2 != -1000)\n            eval_3d[c] = true;\n          break;\n        }\n      }\n    }\n  }\n  fclose(fp);\n  success = true;\n  return detections;\n}\n\nvector<tGroundtruth> loadGroundtruth(string file_name,bool &success) {\n\n  // holds all ground truth (ignored ground truth is indicated by an index vector\n  vector<tGroundtruth> groundtruth;\n  FILE *fp = fopen(file_name.c_str(),\"r\");\n  if (!fp) {\n    success = false;\n    return groundtruth;\n  }\n  while (!feof(fp)) {\n    tGroundtruth g;\n    char str[255];\n    if (fscanf(fp, \"%s %lf %d %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf %lf\",\n                   str, &g.truncation, &g.occlusion, &g.box.alpha,\n                   &g.box.x1,   &g.box.y1,     &g.box.x2,    &g.box.y2,\n                   &g.h,      &g.w,        &g.l,       &g.t1,\n                   &g.t2,      &g.t3,        &g.ry )==15) {\n      g.box.type = str;\n      groundtruth.push_back(g);\n    }\n  }\n  fclose(fp);\n  success = true;\n  return groundtruth;\n}\n\nvoid saveStats (const vector<double> &precision, const vector<double> &aos, FILE *fp_det, FILE *fp_ori) {\n\n  // save precision to file\n  if(precision.empty())\n    return;\n  for (int32_t i=0; i<precision.size(); i++)\n    fprintf(fp_det,\"%f \",precision[i]);\n  fprintf(fp_det,\"\\n\");\n\n  // save orientation similarity, only if there were no invalid orientation entries in submission (alpha=-10)\n  if(aos.empty())\n    return;\n  for (int32_t i=0; i<aos.size(); i++)\n    fprintf(fp_ori,\"%f \",aos[i]);\n  fprintf(fp_ori,\"\\n\");\n}\n\n/*=======================================================================\nEVALUATION HELPER FUNCTIONS\n=======================================================================*/\n\n// 
criterion defines whether the overlap is computed with respect to both areas (ground truth and detection)\n// or with respect to box a or b (detection and \"dontcare\" areas)\ninline double imageBoxOverlap(tBox a, tBox b, int32_t criterion=-1){\n\n  // overlap is invalid in the beginning\n  double o = -1;\n\n  // get overlapping area\n  double x1 = max(a.x1, b.x1);\n  double y1 = max(a.y1, b.y1);\n  double x2 = min(a.x2, b.x2);\n  double y2 = min(a.y2, b.y2);\n\n  // compute width and height of overlapping area\n  double w = x2-x1;\n  double h = y2-y1;\n\n  // set invalid entries to 0 overlap\n  if(w<=0 || h<=0)\n    return 0;\n\n  // get overlapping areas\n  double inter = w*h;\n  double a_area = (a.x2-a.x1) * (a.y2-a.y1);\n  double b_area = (b.x2-b.x1) * (b.y2-b.y1);\n\n  // intersection over union overlap depending on users choice\n  if(criterion==-1)     // union\n    o = inter / (a_area+b_area-inter);\n  else if(criterion==0) // bbox_a\n    o = inter / a_area;\n  else if(criterion==1) // bbox_b\n    o = inter / b_area;\n\n  // overlap\n  return o;\n}\n\ninline double imageBoxOverlap(tDetection a, tGroundtruth b, int32_t criterion=-1){\n  return imageBoxOverlap(a.box, b.box, criterion);\n}\n\n// compute polygon of an oriented bounding box\ntemplate <typename T>\nPolygon toPolygon(const T& g) {\n    using namespace boost::numeric::ublas;\n    using namespace boost::geometry;\n    matrix<double> mref(2, 2);\n    mref(0, 0) = cos(g.ry); mref(0, 1) = sin(g.ry);\n    mref(1, 0) = -sin(g.ry); mref(1, 1) = cos(g.ry);\n\n    static int count = 0;\n    matrix<double> corners(2, 4);\n    double data[] = {g.l / 2, g.l / 2, -g.l / 2, -g.l / 2,\n                     g.w / 2, -g.w / 2, -g.w / 2, g.w / 2};\n    std::copy(data, data + 8, corners.data().begin());\n    matrix<double> gc = prod(mref, corners);\n    for (int i = 0; i < 4; ++i) {\n        gc(0, i) += g.t1;\n        gc(1, i) += g.t3;\n    }\n\n    double points[][2] = {{gc(0, 0), gc(1, 0)},{gc(0, 1), gc(1, 
1)},{gc(0, 2), gc(1, 2)},{gc(0, 3), gc(1, 3)},{gc(0, 0), gc(1, 0)}};\n    Polygon poly;\n    append(poly, points);\n    return poly;\n}\n\n// measure overlap between bird's eye view bounding boxes, parametrized by (ry, l, w, tx, tz)\ninline double groundBoxOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {\n    using namespace boost::geometry;\n    Polygon gp = toPolygon(g);\n    Polygon dp = toPolygon(d);\n\n    std::vector<Polygon> in, un;\n    intersection(gp, dp, in);\n    union_(gp, dp, un);\n\n    double inter_area = in.empty() ? 0 : area(in.front());\n    double union_area = area(un.front());\n    double o;\n    if(criterion==-1)     // union\n        o = inter_area / union_area;\n    else if(criterion==0) // bbox_a\n        o = inter_area / area(dp);\n    else if(criterion==1) // bbox_b\n        o = inter_area / area(gp);\n\n    return o;\n}\n\n// measure overlap between 3D bounding boxes, parametrized by (ry, h, w, l, tx, ty, tz)\ninline double box3DOverlap(tDetection d, tGroundtruth g, int32_t criterion = -1) {\n    using namespace boost::geometry;\n    Polygon gp = toPolygon(g);\n    Polygon dp = toPolygon(d);\n\n    std::vector<Polygon> in, un;\n    intersection(gp, dp, in);\n    union_(gp, dp, un);\n\n    double ymax = min(d.t2, g.t2);\n    double ymin = max(d.t2 - d.h, g.t2 - g.h);\n\n    double inter_area = in.empty() ? 
0 : area(in.front());\n    double inter_vol = inter_area * max(0.0, ymax - ymin);\n\n    double det_vol = d.h * d.l * d.w;\n    double gt_vol = g.h * g.l * g.w;\n\n    double o;\n    if(criterion==-1)     // union\n        o = inter_vol / (det_vol + gt_vol - inter_vol);\n    else if(criterion==0) // bbox_a\n        o = inter_vol / det_vol;\n    else if(criterion==1) // bbox_b\n        o = inter_vol / gt_vol;\n\n    return o;\n}\n\nvector<double> getThresholds(vector<double> &v, double n_groundtruth){\n\n  // holds scores needed to compute N_SAMPLE_PTS recall values\n  vector<double> t;\n\n  // sort scores in descending order\n  // (highest score is assumed to give best/most confident detections)\n  sort(v.begin(), v.end(), greater<double>());\n\n  // get scores for linearly spaced recall\n  double current_recall = 0;\n  for(int32_t i=0; i<v.size(); i++){\n\n    // check if right-hand-side recall with respect to current recall is close than left-hand-side one\n    // in this case, skip the current detection score\n    double l_recall, r_recall, recall;\n    l_recall = (double)(i+1)/n_groundtruth;\n    if(i<(v.size()-1))\n      r_recall = (double)(i+2)/n_groundtruth;\n    else\n      r_recall = l_recall;\n\n    if( (r_recall-current_recall) < (current_recall-l_recall) && i<(v.size()-1))\n      continue;\n\n    // left recall is the best approximation, so use this and goto next recall step for approximation\n    recall = l_recall;\n\n    // the next recall step was reached\n    t.push_back(v[i]);\n    current_recall += 1.0/(N_SAMPLE_PTS-1.0);\n  }\n  return t;\n}\n\nvoid cleanData(CLASSES current_class, const vector<tGroundtruth> &gt, const vector<tDetection> &det, vector<int32_t> &ignored_gt, vector<tGroundtruth> &dc, vector<int32_t> &ignored_det, int32_t &n_gt, DIFFICULTY difficulty){\n\n  // extract ground truth bounding boxes for current evaluation class\n  for(int32_t i=0;i<gt.size(); i++){\n\n    // only bounding boxes with a minimum height are used for 
evaluation\n    double height = gt[i].box.y2 - gt[i].box.y1;\n\n    // neighboring classes are ignored (\"van\" for \"car\" and \"person_sitting\" for \"pedestrian\")\n    // (lower/upper cases are ignored)\n    int32_t valid_class;\n\n    // all classes without a neighboring class\n    if(!strcasecmp(gt[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))\n      valid_class = 1;\n\n    // classes with a neighboring class\n    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), \"Pedestrian\") && !strcasecmp(\"Person_sitting\", gt[i].box.type.c_str()))\n      valid_class = 0;\n    else if(!strcasecmp(CLASS_NAMES[current_class].c_str(), \"Car\") && !strcasecmp(\"Van\", gt[i].box.type.c_str()))\n      valid_class = 0;\n\n    // classes not used for evaluation\n    else\n      valid_class = -1;\n\n    // ground truth is ignored, if occlusion, truncation exceeds the difficulty or ground truth is too small\n    // (doesn't count as FN nor TP, although detections may be assigned)\n    bool ignore = false;\n    if(gt[i].occlusion>MAX_OCCLUSION[difficulty] || gt[i].truncation>MAX_TRUNCATION[difficulty] || height<MIN_HEIGHT[difficulty])\n      ignore = true;\n\n    // set ignored vector for ground truth\n    // current class and not ignored (total no. 
of ground truth is detected for recall denominator)\n    if(valid_class==1 && !ignore){\n      ignored_gt.push_back(0);\n      n_gt++;\n    }\n\n    // neighboring class, or current class but ignored\n    else if(valid_class==0 || (ignore && valid_class==1))\n      ignored_gt.push_back(1);\n\n    // all other classes which are FN in the evaluation\n    else\n      ignored_gt.push_back(-1);\n  }\n\n  // extract dontcare areas\n  for(int32_t i=0;i<gt.size(); i++)\n    if(!strcasecmp(\"DontCare\", gt[i].box.type.c_str()))\n      dc.push_back(gt[i]);\n\n  // extract detections bounding boxes of the current class\n  for(int32_t i=0;i<det.size(); i++){\n\n    // neighboring classes are not evaluated\n    int32_t valid_class;\n    if(!strcasecmp(det[i].box.type.c_str(), CLASS_NAMES[current_class].c_str()))\n      valid_class = 1;\n    else\n      valid_class = -1;\n\n    int32_t height = fabs(det[i].box.y1 - det[i].box.y2);\n\n    // set ignored vector for detections\n    if(height<MIN_HEIGHT[difficulty])\n      ignored_det.push_back(1);\n    else if(valid_class==1)\n      ignored_det.push_back(0);\n    else\n      ignored_det.push_back(-1);\n  }\n}\n\ntPrData computeStatistics(CLASSES current_class, const vector<tGroundtruth> &gt,\n        const vector<tDetection> &det, const vector<tGroundtruth> &dc,\n        const vector<int32_t> &ignored_gt, const vector<int32_t>  &ignored_det,\n        bool compute_fp, double (*boxoverlap)(tDetection, tGroundtruth, int32_t),\n        METRIC metric, bool compute_aos=false, double thresh=0, bool debug=false){\n\n  tPrData stat = tPrData();\n  const double NO_DETECTION = -10000000;\n  vector<double> delta;            // holds angular difference for TPs (needed for AOS evaluation)\n  vector<bool> assigned_detection; // holds wether a detection was assigned to a valid or ignored ground truth\n  assigned_detection.assign(det.size(), false);\n  vector<bool> ignored_threshold;\n  ignored_threshold.assign(det.size(), false); // holds 
detections with a threshold lower than thresh if FP are computed\n\n  // detections with a low score are ignored for computing precision (needs FP)\n  if(compute_fp)\n    for(int32_t i=0; i<det.size(); i++)\n      if(det[i].thresh<thresh)\n        ignored_threshold[i] = true;\n\n  // evaluate all ground truth boxes\n  for(int32_t i=0; i<gt.size(); i++){\n\n    // this ground truth is not of the current or a neighboring class and therefore ignored\n    if(ignored_gt[i]==-1)\n      continue;\n\n    /*=======================================================================\n    find candidates (overlap with ground truth > 0.5) (logical len(det))\n    =======================================================================*/\n    int32_t det_idx          = -1;\n    double valid_detection = NO_DETECTION;\n    double max_overlap     = 0;\n\n    // search for a possible detection\n    bool assigned_ignored_det = false;\n    for(int32_t j=0; j<det.size(); j++){\n\n      // detections not of the current class, already assigned or with a low threshold are ignored\n      if(ignored_det[j]==-1)\n        continue;\n      if(assigned_detection[j])\n        continue;\n      if(ignored_threshold[j])\n        continue;\n\n      // find the maximum score for the candidates and get idx of respective detection\n      double overlap = boxoverlap(det[j], gt[i], -1);\n\n      // for computing recall thresholds, the candidate with highest score is considered\n      if(!compute_fp && overlap>MIN_OVERLAP[metric][current_class] && det[j].thresh>valid_detection){\n        det_idx         = j;\n        valid_detection = det[j].thresh;\n      }\n\n      // for computing pr curve values, the candidate with the greatest overlap is considered\n      // if the greatest overlap is an ignored detection (min_height), the overlapping detection is used\n      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && (overlap>max_overlap || assigned_ignored_det) && ignored_det[j]==0){\n        
max_overlap     = overlap;\n        det_idx         = j;\n        valid_detection = 1;\n        assigned_ignored_det = false;\n      }\n      else if(compute_fp && overlap>MIN_OVERLAP[metric][current_class] && valid_detection==NO_DETECTION && ignored_det[j]==1){\n        det_idx              = j;\n        valid_detection      = 1;\n        assigned_ignored_det = true;\n      }\n    }\n\n    /*=======================================================================\n    compute TP, FP and FN\n    =======================================================================*/\n\n    // nothing was assigned to this valid ground truth\n    if(valid_detection==NO_DETECTION && ignored_gt[i]==0) {\n      stat.fn++;\n    }\n\n    // only evaluate valid ground truth <=> detection assignments (considering difficulty level)\n    else if(valid_detection!=NO_DETECTION && (ignored_gt[i]==1 || ignored_det[det_idx]==1))\n      assigned_detection[det_idx] = true;\n\n    // found a valid true positive\n    else if(valid_detection!=NO_DETECTION){\n\n      // write highest score to threshold vector\n      stat.tp++;\n      stat.v.push_back(det[det_idx].thresh);\n\n      // compute angular difference of detection and ground truth if valid detection orientation was provided\n      if(compute_aos)\n        delta.push_back(gt[i].box.alpha - det[det_idx].box.alpha);\n      // test use ry as the error measure\n\t//delta.push_back(gt[i].ry - det[det_idx].ry);\n\n      // clean up\n      assigned_detection[det_idx] = true;\n    }\n  }\n\n  // if FP are requested, consider stuff area\n  if(compute_fp){\n\n    // count fp\n    for(int32_t i=0; i<det.size(); i++){\n\n      // count false positives if required (height smaller than required is ignored (ignored_det==1)\n      if(!(assigned_detection[i] || ignored_det[i]==-1 || ignored_det[i]==1 || ignored_threshold[i]))\n        stat.fp++;\n    }\n\n    // do not consider detections overlapping with stuff area\n    int32_t nstuff = 0;\n    for(int32_t 
i=0; i<dc.size(); i++){\n      for(int32_t j=0; j<det.size(); j++){\n\n        // detections not of the current class, already assigned, with a low threshold or a low minimum height are ignored\n        if(assigned_detection[j])\n          continue;\n        if(ignored_det[j]==-1 || ignored_det[j]==1)\n          continue;\n        if(ignored_threshold[j])\n          continue;\n\n        // compute overlap and assign to stuff area, if overlap exceeds class specific value\n        double overlap = boxoverlap(det[j], dc[i], 0);\n        if(overlap>MIN_OVERLAP[metric][current_class]){\n          assigned_detection[j] = true;\n          nstuff++;\n        }\n      }\n    }\n\n    // FP = no. of all not to ground truth assigned detections - detections assigned to stuff areas\n    stat.fp -= nstuff;\n\n    // if all orientation values are valid, the AOS is computed\n    if(compute_aos){\n      vector<double> tmp;\n\n      // FP have a similarity of 0, for all TP compute AOS\n      tmp.assign(stat.fp, 0);\n      for(int32_t i=0; i<delta.size(); i++)\n        tmp.push_back((1.0+cos(delta[i]))/2.0);\n\n      // be sure, that all orientation deltas are computed\n      assert(tmp.size()==stat.fp+stat.tp);\n      assert(delta.size()==stat.tp);\n\n      // get the mean orientation similarity for this image\n      if(stat.tp>0 || stat.fp>0)\n        stat.similarity = accumulate(tmp.begin(), tmp.end(), 0.0);\n\n      // there was neither a FP nor a TP, so the similarity is ignored in the evaluation\n      else\n        stat.similarity = -1;\n    }\n  }\n  return stat;\n}\n\n/*=======================================================================\nEVALUATE CLASS-WISE\n=======================================================================*/\n\nbool eval_class (FILE *fp_det, FILE *fp_ori, CLASSES current_class,\n        const vector< vector<tGroundtruth> > &groundtruth,\n        const vector< vector<tDetection> > &detections, bool compute_aos,\n        double 
(*boxoverlap)(tDetection, tGroundtruth, int32_t),\n        vector<double> &precision, vector<double> &aos,\n        DIFFICULTY difficulty, METRIC metric) {\n    assert(groundtruth.size() == detections.size());\n\n  // init\n  int32_t n_gt=0;                                     // total no. of gt (denominator of recall)\n  vector<double> v, thresholds;                       // detection scores, evaluated for recall discretization\n  vector< vector<int32_t> > ignored_gt, ignored_det;  // index of ignored gt detection for current class/difficulty\n  vector< vector<tGroundtruth> > dontcare;            // index of dontcare areas, included in ground truth\n\n  // for all test images do\n  for (int32_t i=0; i<groundtruth.size(); i++){\n\n    // holds ignored ground truth, ignored detections and dontcare areas for current frame\n    vector<int32_t> i_gt, i_det;\n    vector<tGroundtruth> dc;\n\n    // only evaluate objects of current class and ignore occluded, truncated objects\n    cleanData(current_class, groundtruth[i], detections[i], i_gt, dc, i_det, n_gt, difficulty);\n    ignored_gt.push_back(i_gt);\n    ignored_det.push_back(i_det);\n    dontcare.push_back(dc);\n\n    // compute statistics to get recall values\n    tPrData pr_tmp = tPrData();\n    pr_tmp = computeStatistics(current_class, groundtruth[i], detections[i], dc, i_gt, i_det, false, boxoverlap, metric);\n\n    // add detection scores to vector over all images\n    for(int32_t j=0; j<pr_tmp.v.size(); j++)\n      v.push_back(pr_tmp.v[j]);\n  }\n\n  // get scores that must be evaluated for recall discretization\n  thresholds = getThresholds(v, n_gt);\n\n  // compute TP,FP,FN for relevant scores\n  vector<tPrData> pr;\n  pr.assign(thresholds.size(),tPrData());\n  for (int32_t i=0; i<groundtruth.size(); i++){\n\n    // for all scores/recall thresholds do:\n    for(int32_t t=0; t<thresholds.size(); t++){\n      tPrData tmp = tPrData();\n      tmp = computeStatistics(current_class, groundtruth[i], detections[i], 
dontcare[i],\n                              ignored_gt[i], ignored_det[i], true, boxoverlap, metric,\n                              compute_aos, thresholds[t], t==38);\n\n      // add no. of TP, FP, FN, AOS for current frame to total evaluation for current threshold\n      pr[t].tp += tmp.tp;\n      pr[t].fp += tmp.fp;\n      pr[t].fn += tmp.fn;\n      if(tmp.similarity!=-1)\n        pr[t].similarity += tmp.similarity;\n    }\n  }\n\n  // compute recall, precision and AOS\n  vector<double> recall;\n  precision.assign(N_SAMPLE_PTS, 0);\n  if(compute_aos)\n    aos.assign(N_SAMPLE_PTS, 0);\n  double r=0;\n  for (int32_t i=0; i<thresholds.size(); i++){\n    r = pr[i].tp/(double)(pr[i].tp + pr[i].fn);\n    recall.push_back(r);\n    precision[i] = pr[i].tp/(double)(pr[i].tp + pr[i].fp);\n    if(compute_aos)\n      aos[i] = pr[i].similarity/(double)(pr[i].tp + pr[i].fp);\n  }\n\n  // filter precision and AOS using max_{i..end}(precision)\n  for (int32_t i=0; i<thresholds.size(); i++){\n    precision[i] = *max_element(precision.begin()+i, precision.end());\n    if(compute_aos)\n      aos[i] = *max_element(aos.begin()+i, aos.end());\n  }\n\n  // save statisics and finish with success\n  saveStats(precision, aos, fp_det, fp_ori);\n    return true;\n}\n\nvoid saveAndPlotPlots(string dir_name,string file_name,string obj_type,vector<double> vals[],bool is_aos){\n\n  char command[1024];\n\n  // save plot data to file\n  FILE *fp = fopen((dir_name + \"/\" + file_name + \".txt\").c_str(),\"w\");\n  printf(\"save %s\\n\", (dir_name + \"/\" + file_name + \".txt\").c_str());\n  for (int32_t i=0; i<(int)N_SAMPLE_PTS; i++)\n    fprintf(fp,\"%f %f %f %f\\n\",(double)i/(N_SAMPLE_PTS-1.0),vals[0][i],vals[1][i],vals[2][i]);\n  fclose(fp);\n\n  float sum[3] = {0, 0, 0};\n  for (int v = 0; v < 3; ++v)\n      for (int i = 1; i < vals[v].size(); i = i + 1)\n          sum[v] += vals[v][i];\n  printf(\"%s AP: %f %f %f\\n\", file_name.c_str(), sum[0] / 40 * 100, sum[1] / 40 * 100, sum[2] / 40 * 
100);\n\n\n  // create png + eps\n  for (int32_t j=0; j<2; j++) {\n\n    // open file\n    FILE *fp = fopen((dir_name + \"/\" + file_name + \".gp\").c_str(),\"w\");\n\n    // save gnuplot instructions\n    if (j==0) {\n      fprintf(fp,\"set term png size 450,315 font \\\"Helvetica\\\" 11\\n\");\n      fprintf(fp,\"set output \\\"%s.png\\\"\\n\",file_name.c_str());\n    } else {\n      fprintf(fp,\"set term postscript eps enhanced color font \\\"Helvetica\\\" 20\\n\");\n      fprintf(fp,\"set output \\\"%s.eps\\\"\\n\",file_name.c_str());\n    }\n\n    // set labels and ranges\n    fprintf(fp,\"set size ratio 0.7\\n\");\n    fprintf(fp,\"set xrange [0:1]\\n\");\n    fprintf(fp,\"set yrange [0:1]\\n\");\n    fprintf(fp,\"set xlabel \\\"Recall\\\"\\n\");\n    if (!is_aos) fprintf(fp,\"set ylabel \\\"Precision\\\"\\n\");\n    else         fprintf(fp,\"set ylabel \\\"Orientation Similarity\\\"\\n\");\n    obj_type[0] = toupper(obj_type[0]);\n    fprintf(fp,\"set title \\\"%s\\\"\\n\",obj_type.c_str());\n\n    // line width\n    int32_t   lw = 5;\n    if (j==0) lw = 3;\n\n    // plot error curve\n    fprintf(fp,\"plot \");\n    fprintf(fp,\"\\\"%s.txt\\\" using 1:2 title 'Easy' with lines ls 1 lw %d,\",file_name.c_str(),lw);\n    fprintf(fp,\"\\\"%s.txt\\\" using 1:3 title 'Moderate' with lines ls 2 lw %d,\",file_name.c_str(),lw);\n    fprintf(fp,\"\\\"%s.txt\\\" using 1:4 title 'Hard' with lines ls 3 lw %d\",file_name.c_str(),lw);\n\n    // close file\n    fclose(fp);\n\n    // run gnuplot => create png + eps\n    sprintf(command,\"cd %s; gnuplot %s\",dir_name.c_str(),(file_name + \".gp\").c_str());\n    system(command);\n  }\n\n  // create pdf and crop\n  sprintf(command,\"cd %s; ps2pdf %s.eps %s_large.pdf\",dir_name.c_str(),file_name.c_str(),file_name.c_str());\n  system(command);\n  sprintf(command,\"cd %s; pdfcrop %s_large.pdf %s.pdf\",dir_name.c_str(),file_name.c_str(),file_name.c_str());\n  system(command);\n  sprintf(command,\"cd %s; rm 
%s_large.pdf\",dir_name.c_str(),file_name.c_str());\n  system(command);\n}\n\nvector<int32_t> getEvalIndices(const string& result_dir) {\n\n    DIR* dir;\n    dirent* entity;\n    dir = opendir(result_dir.c_str());\n    if (dir) {\n        while ((entity = readdir(dir)) != NULL) {\n            string path(entity->d_name);\n            int32_t len = path.size();\n            // result files are named like 000123.txt (10 characters); skip shorter entries such as . and ..\n            if (len < 10) continue;\n            int32_t index = atoi(path.substr(len - 10, 10).c_str());\n            indices.push_back(index);\n        }\n        // release the directory handle\n        closedir(dir);\n    }\n    return indices;\n}\n\nbool eval(string gt_dir, string result_dir, Mail* mail){\n\n  // set some global parameters\n  initGlobals();\n\n  // ground truth and result directories\n  // string gt_dir         = \"data/object/label_2\";\n  // string result_dir     = \"results/\" + result_sha;\n  string plot_dir       = result_dir + \"/plot\";\n\n  // create output directories\n  system((\"mkdir \" + plot_dir).c_str());\n\n  // hold detections and ground truth in memory\n  vector< vector<tGroundtruth> > groundtruth;\n  vector< vector<tDetection> >   detections;\n\n  // holds whether orientation similarity shall be computed (might be set to false while loading detections)\n  // and which labels were provided by this submission\n  bool compute_aos=true;\n  vector<bool> eval_image(NUM_CLASS, false);\n  vector<bool> eval_ground(NUM_CLASS, false);\n  vector<bool> eval_3d(NUM_CLASS, false);\n\n  // for all images read groundtruth and detections\n  mail->msg(\"Loading detections...\");\n  std::vector<int32_t> indices = getEvalIndices(result_dir + \"/data/\");\n  printf(\"number of files for evaluation: %d\\n\", (int)indices.size());\n\n  for (int32_t i=0; i<indices.size(); i++) {\n\n    // file name\n    char file_name[256];\n    sprintf(file_name,\"%06d.txt\",indices.at(i));\n\n    // read ground truth and result poses\n    bool gt_success,det_success;\n    vector<tGroundtruth> gt   = loadGroundtruth(gt_dir + \"/\" + file_name,gt_success);\n    vector<tDetection>   
det  = loadDetections(result_dir + \"/data/\" + file_name,\n            compute_aos, eval_image, eval_ground, eval_3d, det_success);\n    groundtruth.push_back(gt);\n    detections.push_back(det);\n\n    // check for errors\n    if (!gt_success) {\n      mail->msg(\"ERROR: Couldn't read: %s of ground truth. Please write me an email!\", file_name);\n      return false;\n    }\n    if (!det_success) {\n      mail->msg(\"ERROR: Couldn't read: %s\", file_name);\n      return false;\n    }\n  }\n  mail->msg(\"  done.\");\n\n  // holds pointers for result files\n  FILE *fp_det=0, *fp_ori=0;\n\n  // eval image 2D bounding boxes\n  for (int c = 0; c < NUM_CLASS; c++) {\n    CLASSES cls = (CLASSES)c;\n    if (eval_image[c]) {\n      fp_det = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_detection.txt\").c_str(), \"w\");\n      if(compute_aos)\n        fp_ori = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_orientation.txt\").c_str(),\"w\");\n      vector<double> precision[3], aos[3];\n      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[0], aos[0], EASY, IMAGE)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[1], aos[1], MODERATE, IMAGE)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, imageBoxOverlap, precision[2], aos[2], HARD, IMAGE)) {\n        mail->msg(\"%s evaluation failed.\", CLASS_NAMES[c].c_str());\n        return false;\n      }\n      fclose(fp_det);\n      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_detection\", CLASS_NAMES[c], precision, 0);\n      if(compute_aos){\n        saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_orientation\", CLASS_NAMES[c], aos, 1);\n        fclose(fp_ori);\n      }\n    }\n  }\n\n  // don't evaluate AOS for birdview boxes and 3D boxes\n  compute_aos = false;\n\n  // eval bird's eye view bounding boxes\n  for (int c = 0; c < NUM_CLASS; c++) {\n    CLASSES cls = 
(CLASSES)c;\n    if (eval_ground[c]) {\n      fp_det = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_detection_ground.txt\").c_str(), \"w\");\n      vector<double> precision[3], aos[3];\n      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[0], aos[0], EASY, GROUND)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[1], aos[1], MODERATE, GROUND)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, groundBoxOverlap, precision[2], aos[2], HARD, GROUND)) {\n        mail->msg(\"%s evaluation failed.\", CLASS_NAMES[c].c_str());\n        return false;\n      }\n      fclose(fp_det);\n      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_detection_ground\", CLASS_NAMES[c], precision, 0);\n    }\n  }\n\n  // eval 3D bounding boxes\n  for (int c = 0; c < NUM_CLASS; c++) {\n    CLASSES cls = (CLASSES)c;\n    if (eval_3d[c]) {\n      fp_det = fopen((result_dir + \"/stats_\" + CLASS_NAMES[c] + \"_detection_3d.txt\").c_str(), \"w\");\n      vector<double> precision[3], aos[3];\n      if(   !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[0], aos[0], EASY, BOX3D)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[1], aos[1], MODERATE, BOX3D)\n         || !eval_class(fp_det, fp_ori, cls, groundtruth, detections, compute_aos, box3DOverlap, precision[2], aos[2], HARD, BOX3D)) {\n        mail->msg(\"%s evaluation failed.\", CLASS_NAMES[c].c_str());\n        return false;\n      }\n      fclose(fp_det);\n      saveAndPlotPlots(plot_dir, CLASS_NAMES[c] + \"_detection_3d\", CLASS_NAMES[c], precision, 0);\n    }\n  }\n\n  // success\n  return true;\n}\n\nint32_t main (int32_t argc,char *argv[]) {\n\n  // we need 2 or 4 arguments!\n  if (argc!=3) {\n    cout << \"Usage: ./eval_detection_3d_offline gt_dir result_dir\" << endl;\n    
    return 1;\n  }\n\n  // read arguments\n  string gt_dir = argv[1];\n  string result_dir = argv[2];\n\n  // init notification mail\n  Mail *mail;\n  mail = new Mail();\n  mail->msg(\"Thank you for participating in our evaluation!\");\n\n  // run evaluation\n  if (eval(gt_dir, result_dir, mail)) {\n    mail->msg(\"Your evaluation results are available at:\");\n    // pass the path as an argument, never as the format string itself\n    mail->msg(\"%s\", result_dir.c_str());\n  } else {\n    system((\"rm -r \" + result_dir + \"/plot\").c_str());\n    mail->msg(\"An error occurred while processing your results.\");\n  }\n\n  // send mail and exit\n  delete mail;\n\n  return 0;\n}\n\n\n"
  },
  {
    "path": "tools/kitti-eval/mail.h",
    "content": "#ifndef MAIL_H\n#define MAIL_H\n\n#include <stdio.h>\n#include <stdarg.h>\n#include <string.h>\n\nclass Mail {\n\npublic:\n\n  Mail (std::string email = \"\") {\n    if (email.compare(\"\")) {\n      mail = popen(\"/usr/lib/sendmail -t -f noreply@cvlibs.net\",\"w\");\n      fprintf(mail,\"To: %s\\n\", email.c_str());\n      fprintf(mail,\"From: noreply@cvlibs.net\\n\");\n      fprintf(mail,\"Subject: KITTI Evaluation Benchmark\\n\");\n      fprintf(mail,\"\\n\\n\");\n    } else {\n      mail = 0;\n    }\n  }\n  \n  ~Mail() {\n    if (mail) {\n      pclose(mail);\n    }\n  }\n  \n  void msg (const char *format, ...) {\n    va_list args;\n    va_start(args,format);\n    if (mail) {\n      vfprintf(mail,format,args);\n      fprintf(mail,\"\\n\");\n    }\n    vprintf(format,args);\n    printf(\"\\n\");\n    va_end(args);\n  }\n    \nprivate:\n\n  FILE *mail;\n  \n};\n\n#endif\n"
  },
  {
    "path": "tools/train_IGRs.py",
    "content": "\"\"\"\nTraining the coordinate localization sub-network.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport sys\nsys.path.append('../')\nimport torch\nimport os\n\nimport libs.arguments.parse as parse\nimport libs.logger.logger as liblogger\nimport libs.dataset as dataset\n# import libs.dataset.ApolloScape.car_instance\nimport libs.dataset.KITTI.car_instance\nimport libs.trainer.trainer as trainer\nimport libs.model as models\nimport libs.optimizer.optimizer as optimizer\nimport libs.loss.function as loss_func\n\nfrom libs.common.utils import get_model_summary\nfrom libs.metric.criterions import get_distance_src, get_angle_error\nfrom libs.metric.criterions import Evaluator\n\ndef choose_loss_func(model_settings, cfgs):\n    \"\"\"\n    Initialize the loss function used for training. \n    \"\"\"\n    loss_type = model_settings['loss_type']\n    if loss_type == 'JointsCompositeLoss':\n        spec_list = model_settings['loss_spec_list']\n        loss_weights = model_settings['loss_weight_list']\n        func = loss_func.JointsCompositeLoss(spec_list=spec_list,\n                                             img_size=model_settings['input_size'],\n                                             hm_size=model_settings['heatmap_size'],\n                                             cr_loss_thres=model_settings['cr_loss_threshold'],\n                                             loss_weights=loss_weights\n                                             )\n    else:\n        func = eval('loss_func.' 
+ loss_type)(use_target_weight=cfgs['training_settings']['use_target_weight'])    \n    # the order of the points are needed when computing the cross-ratio loss\n    if model_settings['loss_spec_list'][2] != 'None':\n        func.cr_indices = libs.dataset.KITTI.car_instance.cr_indices_dict['bbox12']\n        func.target_cr = 4/3\n    return func.cuda()\n\ndef train(model, model_settings, GPUs, cfgs, logger, final_output_dir):\n    \"\"\"\n    The training method.\n    \"\"\"\n    # get model summary\n    input_size = model_settings['input_size']\n    input_channels = 5 if cfgs['heatmapModel']['add_xy'] else 3\n    dump_input = torch.rand((1, input_channels, input_size[1], input_size[0]))\n    logger.info(get_model_summary(model, dump_input))\n    \n    model = torch.nn.DataParallel(model, device_ids=GPUs).cuda()\n\n    # get forward-pass time if you need \n    # import time\n    # dump_input = torch.rand((64, input_channels, input_size[1], input_size[0])).cuda()\n    # t1 = time.clock()\n    # out = model(dump_input)\n    # l = out[0].sum()\n    # l.backward()\n    # torch.cuda.synchronize()\n    # print(time.clock() - t1)\n\n    # specify loss function \n    func = choose_loss_func(model_settings, cfgs)\n    \n    # dataset preparation\n    data_cfgs = cfgs['dataset']\n    train_dataset, valid_dataset = eval('dataset.' 
+ data_cfgs['name'] + \n                                        '.car_instance').prepare_data(cfgs, logger)\n    \n    # get the optimizer and learning rate scheduler    \n    optim, sche = optimizer.prepare_optim(model, cfgs)\n\n    # metrics used for training error\n    if cfgs['exp_type'] in ['baselinealpha', 'baselinetheta']:\n        metric_function = get_angle_error\n        save_debug_images = False\n    elif cfgs['exp_type'] == 'instanceto2d':\n        metric_function = get_distance_src\n        save_debug_images = cfgs['training_settings']['debug']['save']\n    collate_fn = train_dataset.get_collate_fn()\n    trainer.train(train_dataset=train_dataset, \n                  valid_dataset=valid_dataset,\n                  model=model,                   \n                  loss_func=func,\n                  optim=optim, \n                  sche=sche, \n                  metric_func=metric_function,\n                  cfgs=cfgs, \n                  logger=logger,\n                  collate_fn=collate_fn,\n                  save_debug=save_debug_images\n                  )\n\n    final_model_state_file = os.path.join(final_output_dir, 'HC.pth')\n    logger.info('=> saving final model state to {}'.format(final_model_state_file))\n    torch.save(model.module.state_dict(), final_model_state_file)\n    return\n\ndef evaluate(model, model_settings, GPUs, cfgs, logger, final_output_dir, eval_train=False):\n    saved_path = cfgs['dirs']['load_hm_model']\n    model.load_state_dict(torch.load(saved_path))\n    model = torch.nn.DataParallel(model, device_ids=GPUs).cuda()\n    evaluator = Evaluator(cfgs['testing_settings']['eval_metrics'], cfgs)\n    # define loss function (criterion) and optimizer\n    loss_func = choose_loss_func(model_settings, cfgs)\n    # dataset preparation\n    data_cfgs = cfgs['dataset']\n    train_dataset, valid_dataset = eval('dataset.' 
+ data_cfgs['name'] + \n                                        '.car_instance').prepare_data(cfgs, logger)\n    collate_fn = valid_dataset.get_collate_fn()\n    logger.info(\"Evaluation on the validation split:\")\n    trainer.evaluate(valid_dataset, model, loss_func, cfgs, logger, evaluator, collate_fn=collate_fn)    \n    if eval_train:\n        logger.info(\"Evaluation on the training split:\")\n        trainer.evaluate(train_dataset, model, loss_func, cfgs, logger, evaluator, collate_fn=collate_fn)\n    return\n\ndef main():\n    # experiment configurations\n    cfgs = parse.parse_args()\n    \n    # logging\n    logger, final_output_dir = liblogger.get_logger(cfgs)   \n    \n    # Set GPU\n    if cfgs['use_gpu'] and torch.cuda.is_available():\n        GPUs = cfgs['gpu_id']\n    else:\n        logger.info(\"GPU acceleration is disabled.\")\n    \n    if len(GPUs) == 1:\n        torch.cuda.set_device(GPUs[0])\n        \n    # cudnn related setting\n    torch.backends.cudnn.benchmark = cfgs['cudnn']['benchmark']\n    torch.backends.cudnn.deterministic = cfgs['cudnn']['deterministic']\n    torch.backends.cudnn.enabled = cfgs['cudnn']['enabled']\n    \n    # model initialization\n    model_settings = cfgs['heatmapModel']\n    model_name = model_settings['name']\n    method_str = 'models.heatmapModel' + '.' + model_name + '.get_pose_net'\n    model = eval(method_str)(cfgs, is_train=cfgs['train'])\n\n    if cfgs['train']:\n        train(model, model_settings, GPUs, cfgs, logger, final_output_dir)\n    elif cfgs['evaluate']:\n        evaluate(model, model_settings, GPUs, cfgs, logger, final_output_dir)\n    \nif __name__ == '__main__':\n    main()\n    torch.cuda.empty_cache()"
  },
  {
    "path": "tools/train_lifting.py",
    "content": "\"\"\"\nTraining the sub-network \\mathcal{L}() that predicts 3D cuboid \ngiven 2D screen coordinates as input.\n\nAuthor: Shichao Li\nContact: nicholas.li@connect.ust.hk\n\"\"\"\n\nimport sys\nsys.path.append('../')\n\nimport libs.arguments.parse as parse\nimport libs.logger.logger as liblogger\n# Deprecated: Apolloscape dataset\n#import libs.dataset.ApolloScape.car_instance as car_instance\n# KITTI dataset\nimport libs.dataset.KITTI.car_instance as car_instance\nimport libs.trainer.trainer as trainer\n\nimport torch\nimport numpy as np\nimport os\n\ndef main():\n    # experiment configurations\n    cfgs = parse.parse_args()\n\n    # logging\n    logger, final_output_dir = liblogger.get_logger(cfgs)\n    \n    # Set GPU\n    if cfgs['use_gpu'] and torch.cuda.is_available():\n        GPUs = cfgs['gpu_id']\n    else:\n        logger.info(\"GPU acceleration is disabled.\")\n\n    # load datasets\n    train_dataset, eval_dataset = car_instance.prepare_data(cfgs, logger)\n    logger.info(\"Finished preparing datasets...\")\n    \n    # training\n    if cfgs['train']:\n        record = trainer.train_cascade(train_dataset, eval_dataset, cfgs, logger)\n        cascade = record['cascade']\n\n    if cfgs['save'] and 'cascade' in locals():\n        save_path = os.path.join(cfgs['dirs']['output'], \"KITTI\")\n        if not os.path.exists(save_path):\n            os.mkdir(save_path)\n        # save the model and the normalization statistics\n        torch.save(cascade[0].cpu().state_dict(), \n                   os.path.join(save_path, 'L.pth')\n                   )\n        np.save(os.path.join(save_path, 'LS.npy'), train_dataset.statistics)\n        logger.info('=> saving final model state to {}'.format(save_path))        \n        # save loss history\n        #np.save(os.path.join(save_path, 'record.npy'), record['record'])\n        \n    if cfgs['visualize'] or cfgs['evaluate']:\n        # visualize the predictions\n        cascade = 
torch.load(cfgs['load_model_path'])        \n        if cfgs['use_gpu']:\n            cascade.cuda()\n            \n    if cfgs['evaluate']:   \n        trainer.evaluate_cascade(cascade, eval_dataset, cfgs) \n        \n    # record is only defined when training ran in this invocation\n    return record if cfgs['train'] else None\n\nif __name__ == \"__main__\":\n    record = main()\n    torch.cuda.empty_cache()"
  }
]