Repository: SangbumChoi/MobileHumanPose
Branch: master
Commit: a359dd9798e0
Files: 51
Total size: 172.2 KB

Directory structure:
gitextract_l8_g4mzf/

├── .gitignore
├── LICENSE
├── README.md
├── common/
│   ├── backbone/
│   │   ├── __init__.py
│   │   ├── lpnet_res_concat.py
│   │   ├── lpnet_ski_concat.py
│   │   └── lpnet_wo_concat.py
│   ├── base.py
│   ├── logger.py
│   ├── timer.py
│   └── utils/
│       ├── __init__.py
│       ├── dir_utils.py
│       ├── pose_utils.py
│       └── vis.py
├── data/
│   ├── Dummy/
│   │   ├── Dummy.py
│   │   ├── annotations/
│   │   │   ├── Dummy_subject1_camera.json
│   │   │   ├── Dummy_subject1_data.json
│   │   │   └── Dummy_subject1_joint_3d.json
│   │   └── bbox_root/
│   │       └── bbox_dummy_output.json
│   ├── Human36M/
│   │   └── Human36M.py
│   ├── MPII/
│   │   └── MPII.py
│   ├── MSCOCO/
│   │   └── MSCOCO.py
│   ├── MuCo/
│   │   └── MuCo.py
│   ├── MuPoTS/
│   │   ├── MuPoTS.py
│   │   └── mpii_mupots_multiperson_eval.m
│   ├── dataset.py
│   └── multiple_datasets.py
├── demo/
│   └── demo.py
├── main/
│   ├── config.py
│   ├── intermediate.py
│   ├── model.py
│   ├── pytorch2coreml.py
│   ├── pytorch2onnx.py
│   ├── summary.py
│   ├── test.py
│   ├── time.py
│   └── train.py
├── requirements.txt
├── tool/
│   └── Human36M/
│       ├── README.MD
│       ├── h36m2coco.py
│       └── preprocess_h36m.m
└── vis/
    ├── coco_img_name.py
    ├── multi/
    │   ├── draw_2Dskeleton.m
    │   ├── draw_3Dpose_coco.m
    │   ├── draw_3Dpose_mupots.m
    │   └── draw_3Dskeleton.m
    ├── mupots_img_name.py
    └── single/
        ├── draw_2Dskeleton.m
        ├── draw_3Dpose_coco.m
        ├── draw_3Dpose_mupots.m
        └── draw_3Dskeleton.m

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
# virtualenv setting
venv_3DMPPE

# output result
output

# demo output
demo/*.pth.tar

# byte-compiled
/__pycache_/
*/__pycache/*
*/*/__pycache/
*/*/*/__pycache/
*.py[cod]
*.pyc

# nohup process
*.out

# idea
.DS_Store
.idea


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2019 Gyeongsik Moon

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# Github Code of "MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices"

#### [2021.11.23] There will be massive refactoring and optimization expected. It will be released as soon as possible including new model.pth, Please wait for the model!(expecting end of December)
#### [2022.05.19] Dummy dataloader is added. This will make reduce about to 100x faster that user to generate dummy pth.tar file of MobileHumanPose model for their PoC.

## Introduction

This repo is official **[PyTorch](https://pytorch.org)** implementation of **[MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices(CVPRW 2021)](https://openaccess.thecvf.com/content/CVPR2021W/MAI/html/Choi_MobileHumanPose_Toward_Real-Time_3D_Human_Pose_Estimation_in_Mobile_Devices_CVPRW_2021_paper.html)**.

## Dependencies
* [PyTorch](https://pytorch.org)
* [CUDA](https://developer.nvidia.com/cuda-downloads)
* [cuDNN](https://developer.nvidia.com/cudnn)
* [Anaconda](https://www.anaconda.com/download/)
* [COCO API](https://github.com/cocodataset/cocoapi)

This code is tested under Ubuntu 16.04, CUDA 11.2 environment with two NVIDIA RTX or V100 GPUs.

Python 3.6.5 version with virtualenv is used for development.

## Directory

### Root
The `${ROOT}` is described as below.
```
${ROOT}
|-- data
|-- demo
|-- common
|-- main
|-- tool
|-- vis
`-- output
```
* `data` contains data loading codes and soft links to images and annotations directories.
* `demo` contains demo codes.
* `common` contains kernel codes for 3d multi-person pose estimation system. Also custom backbone is implemented in this repo
* `main` contains high-level codes for training or testing the network.
* `tool` contains data pre-processing codes. You don't have to run this code. I provide pre-processed data below.
* `vis` contains scripts for 3d visualization.
* `output` contains log, trained models, visualized outputs, and test result.

### Data
You need to follow directory structure of the `data` as below.
```
${POSE_ROOT}
|-- data
|   |-- Human36M
|   |   |-- bbox_root
|   |   |   |-- bbox_root_human36m_output.json
|   |   |-- images
|   |   |-- annotations
|   |-- MPII
|   |   |-- images
|   |   |-- annotations
|   |-- MSCOCO
|   |   |-- bbox_root
|   |   |   |-- bbox_root_coco_output.json
|   |   |-- images
|   |   |   |-- train2017
|   |   |   |-- val2017
|   |   |-- annotations
|   |-- MuCo
|   |   |-- data
|   |   |   |-- augmented_set
|   |   |   |-- unaugmented_set
|   |   |   |-- MuCo-3DHP.json
|   |-- MuPoTS
|   |   |-- bbox_root
|   |   |   |-- bbox_mupots_output.json
|   |   |-- data
|   |   |   |-- MultiPersonTestSet
|   |   |   |-- MuPoTS-3D.json
```
* Download Human3.6M parsed data [[data](https://drive.google.com/drive/folders/1kgVH-GugrLoc9XyvP6nRoaFpw3TmM5xK?usp=sharing)]
* Download MPII parsed data [[images](http://human-pose.mpi-inf.mpg.de/)][[annotations](https://drive.google.com/drive/folders/1MmQ2FRP0coxHGk0Ntj0JOGv9OxSNuCfK?usp=sharing)]
* Download MuCo parsed and composited data [[data](https://drive.google.com/drive/folders/1yL2ey3aWHJnh8f_nhWP--IyC9krAPsQN?usp=sharing)]
* Download MuPoTS parsed data [[images](http://gvv.mpi-inf.mpg.de/projects/SingleShotMultiPerson/)][[annotations](https://drive.google.com/drive/folders/1WmfQ8UEj6nuamMfAdkxmrNcsQTrTfKK_?usp=sharing)]
* All annotation files follow [MS COCO format](http://cocodataset.org/#format-data).
* If you want to add your own dataset, you have to convert it to [MS COCO format](http://cocodataset.org/#format-data).

### Output
You need to follow the directory structure of the `output` folder as below.
```
${POSE_ROOT}
|-- output
|-- |-- log
|-- |-- model_dump
|-- |-- result
`-- |-- vis
```
* Creating `output` folder as soft link form is recommended instead of folder form because it would take large storage capacity.
* `log` folder contains training log file.
* `model_dump` folder contains saved checkpoints for each epoch.
* `result` folder contains final estimation files generated in the testing stage.
* `vis` folder contains visualized results.

### 3D visualization
* Run `$DB_NAME_img_name.py` to get image file names in `.txt` format.
* Place your test result files (`preds_2d_kpt_$DB_NAME.mat`, `preds_3d_kpt_$DB_NAME.mat`) in `single` or `multi` folder.
* Run `draw_3Dpose_$DB_NAME.m`

<p align="middle">
<img src="assets/test.JPG">
</p>

## Running 3DMPPE_POSENET

### Requirements

```shell
cd main
pip install -r requirements.txt
```

### Setup Training
* In the `main/config.py`, you can change settings of the model including dataset to use, network backbone, and input size and so on.

### Train
In the `main` folder, run
```bash
python train.py --gpu 0-1 --backbone LPSKI
```
to train the network on the GPU 0,1. 

If you want to continue experiment, run 
```bash
python train.py --gpu 0-1 --backbone LPSKI --continue
```
`--gpu 0,1` can be used instead of `--gpu 0-1`.

### Test
Place trained model at the `output/model_dump/`.

In the `main` folder, run 
```bash
python test.py --gpu 0-1 --test_epoch 20-21 --backbone LPSKI
```
to test the network on the GPU 0,1 with 20th and 21th epoch trained model. `--gpu 0,1` can be used instead of `--gpu 0-1`. For the backbone you can either choose 
BACKBONE_DICT = {
    'LPRES':LpNetResConcat,
    'LPSKI':LpNetSkiConcat,
    'LPWO':LpNetWoConcat
    }

#### Human3.6M dataset using protocol 1
For the evaluation, you can run `test.py` or there are evaluation codes in `Human36M`.
<p align="center">
<img src="assets/protocol1.JPG">
</p>

#### Human3.6M dataset using protocol 2
For the evaluation, you can run `test.py` or there are evaluation codes in `Human36M`.
<p align="center">
<img src="assets/protocol2.JPG">
</p>

#### MuPoTS-3D dataset
For the evaluation, run `test.py`.  After that, move `data/MuPoTS/mpii_mupots_multiperson_eval.m` in `data/MuPoTS/data`. Also, move the test result files (`preds_2d_kpt_mupots.mat` and `preds_3d_kpt_mupots.mat`) in `data/MuPoTS/data`. Then run `mpii_mupots_multiperson_eval.m` with your evaluation mode arguments.
<p align="center">
<img src="assets/mupots.JPG">
</p>

#### TFLite inference
For the inference in mobile devices we also tested in mobile devices which converting PyTorch implementation through onnx and finally serving into TFlite.
Official demo app is available in [here](https://github.com/tucan9389/PoseEstimation-TFLiteSwift)

## Reference

**What this repo cames from:**
Training section and is based on following paper and github
* [PyTorch](https://pytorch.org) implementation of [Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image (ICCV 2019)](https://arxiv.org/abs/1907.11346).
* Flexible and simple code.
* Compatibility for most of the publicly available 2D and 3D, single and multi-person pose estimation datasets including **[Human3.6M](http://vision.imar.ro/human3.6m/description.php), [MPII](http://human-pose.mpi-inf.mpg.de/), [MS COCO 2017](http://cocodataset.org/#home), [MuCo-3DHP](http://gvv.mpi-inf.mpg.de/projects/SingleShotMultiPerson/) and [MuPoTS-3D](http://gvv.mpi-inf.mpg.de/projects/SingleShotMultiPerson/)**.
* Human pose estimation visualization code.

```
@InProceedings{Choi_2021_CVPR,
    author    = {Choi, Sangbum and Choi, Seokeon and Kim, Changick},
    title     = {MobileHumanPose: Toward Real-Time 3D Human Pose Estimation in Mobile Devices},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2021},
    pages     = {2328-2338}
}
```


================================================
FILE: common/backbone/__init__.py
================================================
from backbone.lpnet_res_concat import *
from backbone.lpnet_ski_concat import *
from backbone.lpnet_wo_concat import *


================================================
FILE: common/backbone/lpnet_res_concat.py
================================================
import torch.nn as nn
import torch
from torchsummary import summary

def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo. It ensures that all layers have a channel number that is divisible by 8
    It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class DoubleConv(nn.Sequential):
    def __init__(self, in_ch, out_ch, norm_layer=None, activation_layer=None):
        super(DoubleConv, self).__init__(
            nn.Conv2d(in_ch , out_ch, kernel_size=1),
            norm_layer(out_ch),
            activation_layer(out_ch),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            norm_layer(out_ch),
            activation_layer(out_ch),
            nn.UpsamplingBilinear2d(scale_factor=2)
        )

class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None, activation_layer=None):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            norm_layer(out_planes),
            activation_layer(out_planes)
        )

class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None, activation_layer=None):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # pw
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer))
        layers.extend([
            # dw
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer, activation_layer=activation_layer),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            norm_layer(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)

class LpNetResConcat(nn.Module):
    def __init__(self,
                 input_size,
                 joint_num,
                 input_channel = 48,
                 embedding_size = 2048,
                 width_mult=1.0,
                 round_nearest=8,
                 block=None,
                 norm_layer=None,
                 activation_layer=None,
                 inverted_residual_setting=None):

        super(LpNetResConcat, self).__init__()

        assert input_size[1] in [256]

        if block is None:
            block = InvertedResidual
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.PReLU # PReLU does not have inplace True
        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 64, 1, 1],  #[-1, 48, 256, 256]
                [6, 48, 2, 2],  #[-1, 48, 128, 128]
                [6, 48, 3, 2],  #[-1, 48, 64, 64]
                [6, 64, 4, 2],  #[-1, 64, 32, 32]
                [6, 96, 3, 2],  #[-1, 96, 16, 16]
                [6, 160, 3, 2], #[-1, 160, 8, 8]
                [6, 320, 1, 1], #[-1, 320, 8, 8]
            ]

        # building first layer
        inp_channel = [_make_divisible(input_channel * width_mult, round_nearest),
                         _make_divisible(input_channel * width_mult, round_nearest) + inverted_residual_setting[0][1],
                         inverted_residual_setting[0][1] + inverted_residual_setting[1][1],
                         inverted_residual_setting[1][1] + inverted_residual_setting[2][1],
                         inverted_residual_setting[2][1] + inverted_residual_setting[3][1],
                         inverted_residual_setting[3][1] + inverted_residual_setting[4][1],
                         inverted_residual_setting[4][1] + inverted_residual_setting[5][1],
                         inverted_residual_setting[5][1] + inverted_residual_setting[6][1],
                         inverted_residual_setting[6][1] + embedding_size,
                         256 + embedding_size,
                       ]
        self.first_conv = ConvBNReLU(3, inp_channel[0], stride=1, norm_layer=norm_layer, activation_layer=activation_layer)

        inv_residual = []
        # building inverted residual blocks
        j = 0
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                input_channel = inp_channel[j] if i == 0 else output_channel
                inv_residual.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer, activation_layer=activation_layer))
            j += 1
        # make it nn.Sequential
        self.inv_residual = nn.Sequential(*inv_residual)

        self.last_conv = ConvBNReLU(inp_channel[j], embedding_size, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer)

        self.deonv0 = DoubleConv(inp_channel[j+1], 256, norm_layer=norm_layer, activation_layer=activation_layer)
        self.deonv1 = DoubleConv(2304, 256, norm_layer=norm_layer, activation_layer=activation_layer)
        self.deonv2 = DoubleConv(512, 256, norm_layer=norm_layer, activation_layer=activation_layer)

        self.final_layer = nn.Conv2d(
            in_channels=256,
            out_channels= joint_num * 64,
            kernel_size=1,
            stride=1,
            padding=0
        )

        self.avgpool = nn.AvgPool2d(3, stride=2, padding=1, count_include_pad=False)
        self.upsample = nn.UpsamplingBilinear2d(scale_factor=2)

    def forward(self, x):
        x0 = self.first_conv(x)
        x1 = self.inv_residual[0:1](x0)
        x2 = self.inv_residual[1:3](torch.cat([x0, x1], dim=1))
        x0 = self.inv_residual[3:6](torch.cat([self.avgpool(x1), x2], dim=1))
        x1 = self.inv_residual[6:10](torch.cat([self.avgpool(x2), x0], dim=1))
        x2 = self.inv_residual[10:13](torch.cat([self.avgpool(x0), x1], dim=1))
        x0 = self.inv_residual[13:16](torch.cat([self.avgpool(x1), x2], dim=1))
        x1 = self.inv_residual[16:17](torch.cat([self.avgpool(x2), x0], dim=1))
        x2 = self.last_conv(torch.cat([x0, x1], dim=1))
        x0 = self.deonv0(torch.cat([x1, x2], dim=1))
        x1 = self.deonv1(torch.cat([self.upsample(x2), x0], dim=1))
        x2 = self.deonv2(torch.cat([self.upsample(x0), x1], dim=1))
        x0 = self.final_layer(x2)
        return x0

    def init_weights(self):
        for i in [self.deconv0, self.deconv1, self.deconv2]:
            for name, m in i.named_modules():
                if isinstance(m, nn.ConvTranspose2d):
                    nn.init.normal_(m.weight, std=0.001)
                elif isinstance(m, nn.BatchNorm2d):
                    nn.init.constant_(m.weight, 1)
                    nn.init.constant_(m.bias, 0)
        for j in [self.first_conv, self.inv_residual, self.last_conv, self.final_layer]:
            for m in j.modules():
                if isinstance(m, nn.Conv2d):
                    nn.init.normal_(m.weight, std=0.001)
                    if hasattr(m, 'bias'):
                        if m.bias is not None:
                            nn.init.constant_(m.bias, 0)

if __name__ == "__main__":
    model = LpNetResConcat((256, 256), 18)
    test_data = torch.rand(1, 3, 256, 256)
    test_outputs = model(test_data)
    # print(test_outputs.size())
    summary(model, (3, 256, 256))

================================================
FILE: common/backbone/lpnet_ski_concat.py
================================================
import torch.nn as nn
import torch
from torchsummary import summary

def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo. It ensures that all layers have a channel number that is divisible by 8
    It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class DeConv(nn.Sequential):
    def __init__(self, in_ch, mid_ch, out_ch, norm_layer=None, activation_layer=None):
        super(DeConv, self).__init__(
            nn.Conv2d(in_ch + mid_ch, mid_ch, kernel_size=1),
            norm_layer(mid_ch),
            activation_layer(mid_ch),
            nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1),
            norm_layer(out_ch),
            activation_layer(out_ch),
            nn.UpsamplingBilinear2d(scale_factor=2)
        )

class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None, activation_layer=None):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            norm_layer(out_planes),
            activation_layer(out_planes)
        )

class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None, activation_layer=None):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # pw
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer))
        layers.extend([
            # dw
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer, activation_layer=activation_layer),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            norm_layer(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)

class LpNetSkiConcat(nn.Module):
    def __init__(self,
                 input_size,
                 joint_num,
                 input_channel = 48,
                 embedding_size = 2048,
                 width_mult=1.0,
                 round_nearest=8,
                 block=None,
                 norm_layer=None,
                 activation_layer=None,
                 inverted_residual_setting=None):

        super(LpNetSkiConcat, self).__init__()

        assert input_size[1] in [256]

        if block is None:
            block = InvertedResidual
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.PReLU # PReLU does not have inplace True
        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 64, 1, 2],  #[-1, 48, 256, 256]
                [6, 48, 2, 2],  #[-1, 48, 128, 128]
                [6, 48, 3, 2],  #[-1, 48, 64, 64]
                [6, 64, 4, 2],  #[-1, 64, 32, 32]
                [6, 96, 3, 2],  #[-1, 96, 16, 16]
                [6, 160, 3, 1], #[-1, 160, 8, 8]
                [6, 320, 1, 1], #[-1, 320, 8, 8]
            ]

        # building first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)

        self.first_conv = ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer, activation_layer=activation_layer)

        inv_residual = []
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                inv_residual.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer, activation_layer=activation_layer))
                input_channel = output_channel
        # make it nn.Sequential
        self.inv_residual = nn.Sequential(*inv_residual)

        self.last_conv = ConvBNReLU(input_channel, embedding_size, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer)

        self.deconv0 = DeConv(embedding_size, _make_divisible(inverted_residual_setting[-3][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)
        self.deconv1 = DeConv(256, _make_divisible(inverted_residual_setting[-4][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)
        self.deconv2 = DeConv(256, _make_divisible(inverted_residual_setting[-5][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)

        self.final_layer = nn.Conv2d(
            in_channels=256,
            out_channels= joint_num * 32,
            kernel_size=1,
            stride=1,
            padding=0
        )

    def forward(self, x):
        x = self.first_conv(x)
        x = self.inv_residual[0:6](x)
        x2 = x
        x = self.inv_residual[6:10](x)
        x1 = x
        x = self.inv_residual[10:13](x)
        x0 = x
        x = self.inv_residual[13:16](x)
        x = self.inv_residual[16:](x)
        z = self.last_conv(x)
        z = torch.cat([x0, z], dim=1)
        z = self.deconv0(z)
        z = torch.cat([x1, z], dim=1)
        z = self.deconv1(z)
        z = torch.cat([x2, z], dim=1)
        z = self.deconv2(z)
        z = self.final_layer(z)
        return z

    def init_weights(self):
        for i in [self.deconv0, self.deconv1, self.deconv2]:
            for name, m in i.named_modules():
                if isinstance(m, nn.ConvTranspose2d):
                    nn.init.normal_(m.weight, std=0.001)
                elif isinstance(m, nn.BatchNorm2d):
                    nn.init.constant_(m.weight, 1)
                    nn.init.constant_(m.bias, 0)
        for j in [self.first_conv, self.inv_residual, self.last_conv, self.final_layer]:
            for m in j.modules():
                if isinstance(m, nn.Conv2d):
                    nn.init.normal_(m.weight, std=0.001)
                    if hasattr(m, 'bias'):
                        if m.bias is not None:
                            nn.init.constant_(m.bias, 0)

if __name__ == "__main__":
    LpNetSkiConcat((256, 256), 18).init_weights()
    model = LpNetSkiConcat((256, 256), 18)
    test_data = torch.rand(1, 3, 256, 256)
    test_outputs = model(test_data)
    print(test_outputs.size())
    summary(model, (3, 256, 256))


================================================
FILE: common/backbone/lpnet_wo_concat.py
================================================
import torch.nn as nn
import torch
from torchsummary import summary

def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo. It ensures that all layers have a channel number that is divisible by 8
    It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v

class DeConv(nn.Sequential):
    def __init__(self, in_ch, mid_ch, out_ch, norm_layer=None, activation_layer=None):
        super(DeConv, self).__init__(
            nn.Conv2d(in_ch, mid_ch, kernel_size=1),
            norm_layer(mid_ch),
            activation_layer(mid_ch),
            nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1),
            norm_layer(out_ch),
            activation_layer(out_ch),
            nn.UpsamplingBilinear2d(scale_factor=2)
        )

class ConvBNReLU(nn.Sequential):
    def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None, activation_layer=None):
        padding = (kernel_size - 1) // 2
        super(ConvBNReLU, self).__init__(
            nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
            norm_layer(out_planes),
            activation_layer(out_planes)
        )

class InvertedResidual(nn.Module):
    def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None, activation_layer=None):
        super(InvertedResidual, self).__init__()
        self.stride = stride
        assert stride in [1, 2]

        hidden_dim = int(round(inp * expand_ratio))
        self.use_res_connect = self.stride == 1 and inp == oup

        layers = []
        if expand_ratio != 1:
            # pw
            layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer))
        layers.extend([
            # dw
            ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer, activation_layer=activation_layer),
            # pw-linear
            nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
            norm_layer(oup),
        ])
        self.conv = nn.Sequential(*layers)

    def forward(self, x):
        if self.use_res_connect:
            return x + self.conv(x)
        else:
            return self.conv(x)

class LpNetWoConcat(nn.Module):
    def __init__(self,
                 input_size,
                 joint_num,
                 input_channel = 48,
                 embedding_size = 2048,
                 width_mult=1.0,
                 round_nearest=8,
                 block=None,
                 norm_layer=None,
                 activation_layer=None,
                 inverted_residual_setting=None):

        super(LpNetWoConcat, self).__init__()

        assert input_size[1] in [256]

        if block is None:
            block = InvertedResidual
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.PReLU # PReLU does not have inplace True
        if inverted_residual_setting is None:
            inverted_residual_setting = [
                # t, c, n, s
                [1, 64, 1, 1],  #[-1, 48, 256, 256]
                [6, 48, 2, 2],  #[-1, 48, 128, 128]
                [6, 48, 3, 2],  #[-1, 48, 64, 64]
                [6, 64, 4, 2],  #[-1, 64, 32, 32]
                [6, 96, 3, 2],  #[-1, 96, 16, 16]
                [6, 160, 3, 2], #[-1, 160, 8, 8]
                [6, 320, 1, 1], #[-1, 320, 8, 8]
            ]

        # building first layer
        input_channel = _make_divisible(input_channel * width_mult, round_nearest)
        self.first_conv = ConvBNReLU(3, input_channel, stride=1, norm_layer=norm_layer, activation_layer=activation_layer)

        inv_residual = []
        # building inverted residual blocks
        for t, c, n, s in inverted_residual_setting:
            output_channel = _make_divisible(c * width_mult, round_nearest)
            for i in range(n):
                stride = s if i == 0 else 1
                inv_residual.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer, activation_layer=activation_layer))
                input_channel = output_channel
        # make it nn.Sequential
        self.inv_residual = nn.Sequential(*inv_residual)

        self.last_conv = ConvBNReLU(input_channel, embedding_size, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer)

        self.deconv0 = DeConv(embedding_size, _make_divisible(inverted_residual_setting[-2][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)
        self.deconv1 = DeConv(256, _make_divisible(inverted_residual_setting[-3][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)
        self.deconv2 = DeConv(256, _make_divisible(inverted_residual_setting[-4][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)

        self.final_layer = nn.Conv2d(
            in_channels=256,
            out_channels= joint_num * 64,
            kernel_size=1,
            stride=1,
            padding=0
        )

    def forward(self, x):
        x = self.first_conv(x)
        x = self.inv_residual(x)
        x = self.last_conv(x)
        x = self.deconv0(x)
        x = self.deconv1(x)
        x = self.deconv2(x)
        x = self.final_layer(x)
        return x

    def init_weights(self):
        for i in [self.deconv0, self.deconv1, self.deconv2]:
            for name, m in i.named_modules():
                if isinstance(m, nn.ConvTranspose2d):
                    nn.init.normal_(m.weight, std=0.001)
                elif isinstance(m, nn.BatchNorm2d):
                    nn.init.constant_(m.weight, 1)
                    nn.init.constant_(m.bias, 0)
        for j in [self.first_conv, self.inv_residual, self.last_conv, self.final_layer]:
            for m in j.modules():
                if isinstance(m, nn.Conv2d):
                    nn.init.normal_(m.weight, std=0.001)
                    if hasattr(m, 'bias'):
                        if m.bias is not None:
                            nn.init.constant_(m.bias, 0)

if __name__ == "__main__":
    model = LpNetWoConcat((256, 256), 18)
    test_data = torch.rand(1, 3, 256, 256)
    test_outputs = model(test_data)
    summary(model, (3, 256, 256))

================================================
FILE: common/base.py
================================================
import os
import os.path as osp
import math
import time
import glob
import abc
from torch.utils.data import DataLoader
import torch.optim
import torchvision.transforms as transforms
from timer import Timer
from logger import colorlogger
from torch.nn.parallel.data_parallel import DataParallel
from config import cfg
from model import get_pose_net
from dataset import DatasetLoader
from multiple_datasets import MultipleDatasets

# dynamic dataset import
for i in range(len(cfg.trainset_3d)):
    exec('from ' + cfg.trainset_3d[i] + ' import ' + cfg.trainset_3d[i])
for i in range(len(cfg.trainset_2d)):
    exec('from ' + cfg.trainset_2d[i] + ' import ' + cfg.trainset_2d[i])
exec('from ' + cfg.testset + ' import ' + cfg.testset)

class Base(object):
    __metaclass__ = abc.ABCMeta

    def __init__(self, log_name='logs.txt'):
        
        self.cur_epoch = 0

        # timer
        self.tot_timer = Timer()
        self.gpu_timer = Timer()
        self.read_timer = Timer()

        # logger
        self.logger = colorlogger(cfg.log_dir, log_name=log_name)

    @abc.abstractmethod
    def _make_batch_generator(self):
        return

    @abc.abstractmethod
    def _make_model(self):
        return

    def save_model(self, state, epoch):
        file_path = osp.join(cfg.model_dir,'snapshot_{}.pth.tar'.format(str(epoch)))
        torch.save(state, file_path)
        self.logger.info("Write snapshot into {}".format(file_path))

    def load_model(self, model, optimizer):
        model_file_list = glob.glob(osp.join(cfg.model_dir,'*.pth.tar'))
        cur_epoch = max([int(file_name[file_name.find('snapshot_') + 9 : file_name.find('.pth.tar')]) for file_name in model_file_list])
        ckpt = torch.load(osp.join(cfg.model_dir, 'snapshot_' + str(cur_epoch) + '.pth.tar')) 
        start_epoch = ckpt['epoch'] + 1
        model.load_state_dict(ckpt['network'])
        optimizer.load_state_dict(ckpt['optimizer'])

        return start_epoch, model, optimizer

class Trainer(Base):
    
    def __init__(self, cfg):
        super(Trainer, self).__init__(log_name = 'train_logs.txt')
        self.backbone = cfg.backbone

    def get_optimizer(self, model):
        
        optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)
        return optimizer

    def set_lr(self, epoch):
        for e in cfg.lr_dec_epoch:
            if epoch < e:
                break
        if epoch < cfg.lr_dec_epoch[-1]:
            idx = cfg.lr_dec_epoch.index(e)
            for g in self.optimizer.param_groups:
                g['lr'] = cfg.lr / (cfg.lr_dec_factor ** idx)
        else:
            for g in self.optimizer.param_groups:
                g['lr'] = cfg.lr / (cfg.lr_dec_factor ** len(cfg.lr_dec_epoch))

    def get_lr(self):
        for g in self.optimizer.param_groups:
            cur_lr = g['lr']

        return cur_lr

    def _make_batch_generator(self):
        # data load and construct batch generator
        self.logger.info("Creating dataset...")
        trainset3d_loader = []
        for i in range(len(cfg.trainset_3d)):
            if i > 0:
                ref_joints_name = trainset3d_loader[0].joints_name
            else:
                ref_joints_name = None
            trainset3d_loader.append(DatasetLoader(eval(cfg.trainset_3d[i])("train"), ref_joints_name, True, transforms.Compose([\
                                                                                                        transforms.ToTensor(),
                                                                                                        transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]\
                                                                                                        )))
        ref_joints_name = trainset3d_loader[0].joints_name
        trainset2d_loader = []
        for i in range(len(cfg.trainset_2d)):
            trainset2d_loader.append(DatasetLoader(eval(cfg.trainset_2d[i])("train"), ref_joints_name, True, transforms.Compose([\
                                                                                                        transforms.ToTensor(),
                                                                                                        transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]\
                                                                                                        )))

        self.joint_num = trainset3d_loader[0].joint_num

        trainset3d_loader = MultipleDatasets(trainset3d_loader, make_same_len=False)
        if trainset2d_loader != []:
            trainset2d_loader = MultipleDatasets(trainset2d_loader, make_same_len=False)
            trainset_loader = MultipleDatasets([trainset3d_loader, trainset2d_loader], make_same_len=True)
        else:
            trainset_loader = MultipleDatasets([trainset3d_loader, ], make_same_len=True)

        self.itr_per_epoch = math.ceil(len(trainset_loader) / cfg.num_gpus / cfg.batch_size)
        self.batch_generator = DataLoader(dataset=trainset_loader, batch_size=cfg.num_gpus*cfg.batch_size, shuffle=True, num_workers=cfg.num_thread, pin_memory=True)

    def _make_model(self):
        # prepare network
        self.logger.info("Creating graph and optimizer...")
        model = get_pose_net(self.backbone, True, self.joint_num)
        if torch.cuda.is_available():
            model = DataParallel(model).cuda()
        optimizer = self.get_optimizer(model)
        if cfg.continue_train:
            start_epoch, model, optimizer = self.load_model(model, optimizer)
        else:
            start_epoch = 0
        model.train()

        self.start_epoch = start_epoch
        self.model = model
        self.optimizer = optimizer

class Tester(Base):
    
    def __init__(self, backbone):
        self.backbone = backbone
        super(Tester, self).__init__(log_name = 'test_logs.txt')

    def _make_batch_generator(self):
        # data load and construct batch generator
        # self.logger.info("Creating dataset...")
        testset = eval(cfg.testset)("test")
        testset_loader = DatasetLoader(testset, None, False, transforms.Compose([\
                                                                                                        transforms.ToTensor(),
                                                                                                        transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]\
                                                                                                        ))
        batch_generator = DataLoader(dataset=testset_loader, batch_size=cfg.num_gpus*cfg.test_batch_size, shuffle=False, num_workers=cfg.num_thread, pin_memory=True)
        
        self.testset = testset
        self.joint_num = testset_loader.joint_num
        self.skeleton = testset_loader.skeleton
        self.flip_pairs = testset.flip_pairs
        self.batch_generator = batch_generator
    
    def _make_model(self, test_epoch):
        self.test_epoch = test_epoch
        model_path = os.path.join(cfg.model_dir, 'snapshot_%d.pth.tar' % self.test_epoch)
        assert os.path.exists(model_path), 'Cannot find model at ' + model_path
        # self.logger.info('Load checkpoint from {}'.format(model_path))
        
        # prepare network
        # self.logger.info("Creating graph...")
        model = get_pose_net(self.backbone, False, self.joint_num)
        model = DataParallel(model).cuda()
        ckpt = torch.load(model_path)
        model.load_state_dict(ckpt['network'])
        model.eval()

        self.model = model

    def _evaluate(self, preds, result_save_path):
        eval_summary = self.testset.evaluate(preds, result_save_path)
        self.logger.info('{}'.format(eval_summary))

class Transformer(Base):

    def __init__(self, backbone, jointnum, modelpath):
        super(Transformer, self).__init__(log_name='transformer_logs.txt')
        self.backbone = backbone
        self.jointnum = jointnum
        self.modelpath = modelpath

    def _make_model(self):
        # prepare network
        self.logger.info("Creating graph and optimizer...")
        model = get_pose_net(self.backbone, False, self.jointnum)
        model = DataParallel(model).cuda()
        model.load_state_dict(torch.load(self.modelpath)['network'])
        single_pytorch_model = model.module
        single_pytorch_model.eval()
        self.model = single_pytorch_model


================================================
FILE: common/logger.py
================================================
import logging
import os

OK = '\033[92m'
WARNING = '\033[93m'
FAIL = '\033[91m'
END = '\033[0m'

PINK = '\033[95m'
BLUE = '\033[94m'
GREEN = OK
RED = FAIL
WHITE = END
YELLOW = WARNING

class colorlogger():
    def __init__(self, log_dir, log_name='train_logs.txt'):
        # set log
        self._logger = logging.getLogger(log_name)
        self._logger.setLevel(logging.INFO)
        log_file = os.path.join(log_dir, log_name)
        if not os.path.exists(log_dir):
            os.makedirs(log_dir)
        file_log = logging.FileHandler(log_file, mode='a')
        file_log.setLevel(logging.INFO)
        console_log = logging.StreamHandler()
        console_log.setLevel(logging.INFO)
        formatter = logging.Formatter(
            "{}%(asctime)s{} %(message)s".format(GREEN, END),
            "%m-%d %H:%M:%S")
        file_log.setFormatter(formatter)
        console_log.setFormatter(formatter)
        self._logger.addHandler(file_log)
        self._logger.addHandler(console_log)

    def debug(self, msg):
        self._logger.debug(str(msg))

    def info(self, msg):
        self._logger.info(str(msg))

    def warning(self, msg):
        self._logger.warning(WARNING + 'WRN: ' + str(msg) + END)

    def critical(self, msg):
        self._logger.critical(RED + 'CRI: ' + str(msg) + END)

    def error(self, msg):
        self._logger.error(RED + 'ERR: ' + str(msg) + END)


================================================
FILE: common/timer.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------

import time

class Timer(object):
    """A simple timer."""
    def __init__(self):
        self.total_time = 0.
        self.calls = 0
        self.start_time = 0.
        self.diff = 0.
        self.average_time = 0.
        self.warm_up = 0

    def tic(self):
        # using time.time instead of time.clock because time time.clock
        # does not normalize for multithreading
        self.start_time = time.time()

    def toc(self, average=True):
        self.diff = time.time() - self.start_time
        if self.warm_up < 10:
            self.warm_up += 1
            return self.diff
        else:
            self.total_time += self.diff
            self.calls += 1
            self.average_time = self.total_time / self.calls

        if average:
            return self.average_time
        else:
            return self.diff


================================================
FILE: common/utils/__init__.py
================================================


================================================
FILE: common/utils/dir_utils.py
================================================
import os
import sys

def make_folder(folder_name):
    if not os.path.exists(folder_name):
        os.makedirs(folder_name)

def add_pypath(path):
    if path not in sys.path:
        sys.path.insert(0, path)


================================================
FILE: common/utils/pose_utils.py
================================================
import torch
import numpy as np
from config import cfg
import copy

def cam2pixel(cam_coord, f, c):
    x = cam_coord[:, 0] / (cam_coord[:, 2] + 1e-8) * f[0] + c[0]
    y = cam_coord[:, 1] / (cam_coord[:, 2] + 1e-8) * f[1] + c[1]
    z = cam_coord[:, 2]
    img_coord = np.concatenate((x[:,None], y[:,None], z[:,None]),1)
    return img_coord

def pixel2cam(pixel_coord, f, c):
    x = (pixel_coord[:, 0] - c[0]) / f[0] * pixel_coord[:, 2]
    y = (pixel_coord[:, 1] - c[1]) / f[1] * pixel_coord[:, 2]
    z = pixel_coord[:, 2]
    cam_coord = np.concatenate((x[:,None], y[:,None], z[:,None]),1)
    return cam_coord

def world2cam(world_coord, R, t):
    cam_coord = np.dot(R, world_coord.transpose(1,0)).transpose(1,0) + t.reshape(1,3)
    return cam_coord

def rigid_transform_3D(A, B):
    centroid_A = np.mean(A, axis = 0)
    centroid_B = np.mean(B, axis = 0)
    H = np.dot(np.transpose(A - centroid_A), B - centroid_B)
    U, s, V = np.linalg.svd(H)
    R = np.dot(np.transpose(V), np.transpose(U))
    if np.linalg.det(R) < 0:
        V[2] = -V[2]
        R = np.dot(np.transpose(V), np.transpose(U))
    t = -np.dot(R, np.transpose(centroid_A)) + np.transpose(centroid_B)
    return R, t

def rigid_align(A, B):
    R, t = rigid_transform_3D(A, B)
    A2 = np.transpose(np.dot(R, np.transpose(A))) + t
    return A2

def get_bbox(joint_img):
    # bbox extract from keypoint coordinates
    bbox = np.zeros((4))
    xmin = np.min(joint_img[:,0])
    ymin = np.min(joint_img[:,1])
    xmax = np.max(joint_img[:,0])
    ymax = np.max(joint_img[:,1])
    width = xmax - xmin - 1
    height = ymax - ymin - 1
    
    bbox[0] = (xmin + xmax)/2. - width/2*1.2
    bbox[1] = (ymin + ymax)/2. - height/2*1.2
    bbox[2] = width*1.2
    bbox[3] = height*1.2

    return bbox

def process_bbox(bbox, width, height):
    # sanitize bboxes
    x, y, w, h = bbox
    x1 = np.max((0, x))
    y1 = np.max((0, y))
    x2 = np.min((width - 1, x1 + np.max((0, w - 1))))
    y2 = np.min((height - 1, y1 + np.max((0, h - 1))))
    if w*h > 0 and x2 >= x1 and y2 >= y1:
        bbox = np.array([x1, y1, x2-x1, y2-y1])
    else:
        return None

    # aspect ratio preserving bbox
    w = bbox[2]
    h = bbox[3]
    c_x = bbox[0] + w/2.
    c_y = bbox[1] + h/2.
    aspect_ratio = cfg.input_shape[1]/cfg.input_shape[0]
    if w > aspect_ratio * h:
        h = w / aspect_ratio
    elif w < aspect_ratio * h:
        w = h * aspect_ratio
    bbox[2] = w*1.25
    bbox[3] = h*1.25
    bbox[0] = c_x - bbox[2]/2.
    bbox[1] = c_y - bbox[3]/2.
    return bbox

def transform_joint_to_other_db(src_joint, src_name, dst_name):
    src_joint_num = len(src_name)
    dst_joint_num = len(dst_name)

    new_joint = np.zeros(((dst_joint_num,) + src_joint.shape[1:]))

    for src_idx in range(len(src_name)):
        name = src_name[src_idx]
        if name in dst_name:
            dst_idx = dst_name.index(name)
            new_joint[dst_idx] = src_joint[src_idx]

    return new_joint


def fliplr_joints(_joints, width, matched_parts):
    """
    flip coords
    joints: numpy array, nJoints * dim, dim == 2 [x, y] or dim == 3  [x, y, z]
    width: image width
    matched_parts: list of pairs
    """
    joints = _joints.copy()
    # Flip horizontal
    joints[:, 0] = width - joints[:, 0] - 1

    # Change left-right parts
    for pair in matched_parts:
        joints[pair[0], :], joints[pair[1], :] = joints[pair[1], :], joints[pair[0], :].copy()

    return joints

def multi_meshgrid(*args):
    """
    Creates a meshgrid from possibly many
    elements (instead of only 2).
    Returns a nd tensor with as many dimensions
    as there are arguments
    """
    args = list(args)
    template = [1 for _ in args]
    for i in range(len(args)):
        n = args[i].shape[0]
        template_copy = template.copy()
        template_copy[i] = n
        args[i] = args[i].view(*template_copy)
        # there will be some broadcast magic going on
    return tuple(args)


def flip(tensor, dims):
    if not isinstance(dims, (tuple, list)):
        dims = [dims]
    indices = [torch.arange(tensor.shape[dim] - 1, -1, -1,
                            dtype=torch.int64) for dim in dims]
    multi_indices = multi_meshgrid(*indices)
    final_indices = [slice(i) for i in tensor.shape]
    for i, dim in enumerate(dims):
        final_indices[dim] = multi_indices[i]
    flipped = tensor[final_indices]
    assert flipped.device == tensor.device
    assert flipped.requires_grad == tensor.requires_grad
    return flipped


================================================
FILE: common/utils/vis.py
================================================
import os
import cv2
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import matplotlib as mpl
from config import cfg

def vis_keypoints(img, kps, kps_lines, kp_thresh=0.4, alpha=1):

    # Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv.
    cmap = plt.get_cmap('rainbow')
    colors = [cmap(i) for i in np.linspace(0, 1, len(kps_lines) + 2)]
    colors = [(c[2] * 255, c[1] * 255, c[0] * 255) for c in colors]

    # Perform the drawing on a copy of the image, to allow for blending.
    kp_mask = np.copy(img)

    # Draw the keypoints.
    for l in range(len(kps_lines)):
        i1 = kps_lines[l][0]
        i2 = kps_lines[l][1]
        p1 = kps[0, i1].astype(np.int32), kps[1, i1].astype(np.int32)
        p2 = kps[0, i2].astype(np.int32), kps[1, i2].astype(np.int32)
        if kps[2, i1] > kp_thresh and kps[2, i2] > kp_thresh:
            cv2.line(
                kp_mask, p1, p2,
                color=colors[l], thickness=2, lineType=cv2.LINE_AA)
        if kps[2, i1] > kp_thresh:
            cv2.circle(
                kp_mask, p1,
                radius=3, color=colors[l], thickness=-1, lineType=cv2.LINE_AA)
        if kps[2, i2] > kp_thresh:
            cv2.circle(
                kp_mask, p2,
                radius=3, color=colors[l], thickness=-1, lineType=cv2.LINE_AA)

    # Blend the keypoints.
    return cv2.addWeighted(img, 1.0 - alpha, kp_mask, alpha, 0)

def vis_3d_skeleton(kpt_3d, kpt_3d_vis, kps_lines, filename=None):

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')

    # Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv.
    cmap = plt.get_cmap('rainbow')
    colors = [cmap(i) for i in np.linspace(0, 1, len(kps_lines) + 2)]
    colors = [np.array((c[2], c[1], c[0])) for c in colors]

    for l in range(len(kps_lines)):
        i1 = kps_lines[l][0]
        i2 = kps_lines[l][1]
        x = np.array([kpt_3d[i1,0], kpt_3d[i2,0]])
        y = np.array([kpt_3d[i1,1], kpt_3d[i2,1]])
        z = np.array([kpt_3d[i1,2], kpt_3d[i2,2]])

        if kpt_3d_vis[i1,0] > 0 and kpt_3d_vis[i2,0] > 0:
            ax.plot(x, z, -y, c=colors[l], linewidth=2)
        if kpt_3d_vis[i1,0] > 0:
            ax.scatter(kpt_3d[i1,0], kpt_3d[i1,2], -kpt_3d[i1,1], c=colors[l], marker='o')
        if kpt_3d_vis[i2,0] > 0:
            ax.scatter(kpt_3d[i2,0], kpt_3d[i2,2], -kpt_3d[i2,1], c=colors[l], marker='o')

    if filename is None:
        ax.set_title('3D vis')
    else:
        ax.set_title(filename)

    ax.set_xlabel('X Label')
    ax.set_ylabel('Z Label')
    ax.set_zlabel('Y Label')
    ax.legend()
    
    plt.show()
    cv2.waitKey(0)

def vis_3d_multiple_skeleton(kpt_3d, kpt_3d_vis, kps_lines, filename=None):

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')

    # Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv.
    cmap = plt.get_cmap('rainbow')
    colors = [cmap(i) for i in np.linspace(0, 1, len(kps_lines) + 2)]
    colors = [np.array((c[2], c[1], c[0])) for c in colors]

    for l in range(len(kps_lines)):
        i1 = kps_lines[l][0]
        i2 = kps_lines[l][1]

        person_num = kpt_3d.shape[0]
        for n in range(person_num):
            x = np.array([kpt_3d[n,i1,0], kpt_3d[n,i2,0]])
            y = np.array([kpt_3d[n,i1,1], kpt_3d[n,i2,1]])
            z = np.array([kpt_3d[n,i1,2], kpt_3d[n,i2,2]])

            if kpt_3d_vis[n,i1,0] > 0 and kpt_3d_vis[n,i2,0] > 0:
                ax.plot(x, z, -y, c=colors[l], linewidth=2)
            if kpt_3d_vis[n,i1,0] > 0:
                ax.scatter(kpt_3d[n,i1,0], kpt_3d[n,i1,2], -kpt_3d[n,i1,1], c=colors[l], marker='o')
            if kpt_3d_vis[n,i2,0] > 0:
                ax.scatter(kpt_3d[n,i2,0], kpt_3d[n,i2,2], -kpt_3d[n,i2,1], c=colors[l], marker='o')

    if filename is None:
        ax.set_title('3D vis')
    else:
        ax.set_title(filename)

    ax.set_xlabel('X Label')
    ax.set_ylabel('Z Label')
    ax.set_zlabel('Y Label')
    ax.legend()
    
    plt.show()
    cv2.waitKey(0)


================================================
FILE: data/Dummy/Dummy.py
================================================
import os
import os.path as osp
from pycocotools.coco import COCO
import numpy as np
from config import cfg
from utils.pose_utils import world2cam, cam2pixel, pixel2cam, rigid_align, process_bbox
import cv2
import random
import json
from utils.vis import vis_keypoints, vis_3d_skeleton

class Dummy:
    def __init__(self, data_split):
        self.data_split = data_split
        self.img_dir = osp.join('data', 'Dummy', 'images')
        self.annot_path = osp.join('data', 'Dummy', 'annotations')
        self.human_bbox_root_dir = osp.join('data', 'Dummy', 'bbox_root', 'bbox_root_human36m_output.json')
        self.joint_num = 18 # original:17, but manually added 'Thorax'
        self.joints_name = ('Pelvis', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Torso', 'Neck', 'Nose', 'Head', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'Thorax')
        self.flip_pairs = ( (1, 4), (2, 5), (3, 6), (14, 11), (15, 12), (16, 13) )
        self.skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )
        self.joints_have_depth = True
        self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9,  10, 11, 12, 13, 14, 15, 16) # exclude Thorax

        self.action_name = ['Directions', 'Discussion', 'Eating', 'Greeting', 'Phoning', 'Posing', 'Purchases', 'Sitting', 'SittingDown', 'Smoking', 'Photo', 'Waiting', 'Walking', 'WalkDog', 'WalkTogether']
        self.root_idx = self.joints_name.index('Pelvis')
        self.lshoulder_idx = self.joints_name.index('L_Shoulder')
        self.rshoulder_idx = self.joints_name.index('R_Shoulder')
        self.data = self.load_data()

    def get_subsampling_ratio(self):
        if self.data_split == 'train':
            return 5
        elif self.data_split == 'test':
            return 64
        else:
            assert 0, print('Unknown subset')

    def get_subject(self):
        if self.data_split == 'train':
            subject = [1]
        elif self.data_split == 'test':
            subject = [2]
        else:
            assert 0, print("Unknown subset")

        return subject
    
    def add_thorax(self, joint_coord):
        thorax = (joint_coord[self.lshoulder_idx, :] + joint_coord[self.rshoulder_idx, :]) * 0.5
        thorax = thorax.reshape((1, 3))
        joint_coord = np.concatenate((joint_coord, thorax), axis=0)
        return joint_coord

    def load_data(self):
        print('Load data of Dummy')

        subject_list = self.get_subject()
        sampling_ratio = self.get_subsampling_ratio()
        
        # aggregate annotations from each subject
        db = COCO()
        cameras = {}
        joints = {}
        for subject in subject_list:
            # data load
            with open(osp.join(self.annot_path, 'Dummy_subject' + str(subject) + '_data.json'),'r') as f:
                annot = json.load(f)
            if len(db.dataset) == 0:
                for k,v in annot.items():
                    db.dataset[k] = v
            else:
                for k,v in annot.items():
                    db.dataset[k] += v
            # camera load
            with open(osp.join(self.annot_path, 'Dummy_subject' + str(subject) + '_camera.json'),'r') as f:
                cameras[str(subject)] = json.load(f)
            # joint coordinate load
            with open(osp.join(self.annot_path, 'Dummy_subject' + str(subject) + '_joint_3d.json'),'r') as f:
                joints[str(subject)] = json.load(f)
        db.createIndex()
       
        if self.data_split == 'test' and not cfg.use_gt_info:
            print("Get bounding box and root from " + self.human_bbox_root_dir)
            bbox_root_result = {}
            with open(self.human_bbox_root_dir) as f:
                annot = json.load(f)
            for i in range(len(annot)):
                bbox_root_result[str(annot[i]['image_id'])] = {'bbox': np.array(annot[i]['bbox']), 'root': np.array(annot[i]['root_cam'])}
        else:
            print("Get bounding box and root from groundtruth")

        data = []
        for aid in db.anns.keys():
            ann = db.anns[aid]
            image_id = ann['image_id']
            img = db.loadImgs(image_id)[0]
            img_path = osp.join(self.img_dir, img['file_name'])
            img_width, img_height = img['width'], img['height']
           
            # check subject and frame_idx
            subject = img['subject']; frame_idx = img['frame_idx'];
            if subject not in subject_list:
                continue
            if frame_idx % sampling_ratio != 0:
                continue

            # camera parameter
            cam_idx = img['cam_idx']
            cam_param = cameras[str(subject)][str(cam_idx)]
            R,t,f,c = np.array(cam_param['R'], dtype=np.float32), np.array(cam_param['t'], dtype=np.float32), np.array(cam_param['f'], dtype=np.float32), np.array(cam_param['c'], dtype=np.float32)
                
            # project world coordinate to cam, image coordinate space
            action_idx = img['action_idx']; subaction_idx = img['subaction_idx']; frame_idx = img['frame_idx'];
            joint_world = np.array(joints[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)], dtype=np.float32)
            joint_world = self.add_thorax(joint_world)
            joint_cam = world2cam(joint_world, R, t)
            joint_img = cam2pixel(joint_cam, f, c)
            joint_img[:,2] = joint_img[:,2] - joint_cam[self.root_idx,2]
            joint_vis = np.ones((self.joint_num,1))
            
            if self.data_split == 'test' and not cfg.use_gt_info:
                bbox = bbox_root_result[str(image_id)]['bbox'] # bbox should be aspect ratio preserved-extended. It is done in RootNet.
                root_cam = bbox_root_result[str(image_id)]['root']
            else:
                bbox = process_bbox(np.array(ann['bbox']), img_width, img_height)
                if bbox is None: continue
                root_cam = joint_cam[self.root_idx]
               
            data.append({
                'img_path': img_path,
                'img_id': image_id,
                'bbox': bbox,
                'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth]
                'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate
                'joint_vis': joint_vis,
                'root_cam': root_cam, # [X, Y, Z] in camera coordinate
                'f': f,
                'c': c})
           
        return data

    def evaluate(self, preds, result_dir):
        
        print('Evaluation start...')
        gts = self.data
        assert len(gts) == len(preds)
        sample_num = len(gts)
        
        pred_save = []
        error = np.zeros((sample_num, self.joint_num-1)) # joint error
        error_action = [ [] for _ in range(len(self.action_name)) ] # error for each sequence
        for n in range(sample_num):
            gt = gts[n]
            image_id = gt['img_id']
            f = gt['f']
            c = gt['c']
            bbox = gt['bbox']
            gt_3d_root = gt['root_cam']
            gt_3d_kpt = gt['joint_cam']
            gt_vis = gt['joint_vis']
            
            # restore coordinates to original space
            pred_2d_kpt = preds[n].copy()
            pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
            pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
            pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]

            vis = False
            if vis:
                cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
                filename = str(random.randrange(1,500))
                tmpimg = cvimg.copy().astype(np.uint8)
                tmpkps = np.zeros((3,self.joint_num))
                tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1]
                tmpkps[2,:] = 1
                tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton)
                cv2.imwrite(filename + '_output.jpg', tmpimg)

            # back project to camera coordinate system
            pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c)
 
            # root joint alignment
            pred_3d_kpt = pred_3d_kpt - pred_3d_kpt[self.root_idx]
            gt_3d_kpt  = gt_3d_kpt - gt_3d_kpt[self.root_idx]

            pred_3d_kpt = rigid_align(pred_3d_kpt, gt_3d_kpt)
            
            # exclude thorax
            pred_3d_kpt = np.take(pred_3d_kpt, self.eval_joint, axis=0)
            gt_3d_kpt = np.take(gt_3d_kpt, self.eval_joint, axis=0)
           
            # error calculate
            error[n] = np.sqrt(np.sum((pred_3d_kpt - gt_3d_kpt)**2,1))
            img_name = gt['img_path']
            action_idx = int(img_name[img_name.find('act')+4:img_name.find('act')+6]) - 2
            error_action[action_idx].append(error[n].copy())

            # prediction save
            pred_save.append({'image_id': image_id, 'joint_cam': pred_3d_kpt.tolist(), 'bbox': bbox.tolist(), 'root_cam': gt_3d_root.tolist()}) # joint_cam is root-relative coordinate

        # total error
        tot_err = np.mean(error)
        metric = 'PA MPJPE'
        eval_summary = 'Protocol 1' + ' error (' + metric + ') >> tot: %.2f\n' % (tot_err)

        # error for each action
        for i in range(len(error_action)):
            err = np.mean(np.array(error_action[i]))
            eval_summary += (self.action_name[i] + ': %.2f ' % err)

        print(eval_summary)

        # prediction save
        output_path = osp.join(result_dir, 'bbox_root_pose_dummy_output.json')
        with open(output_path, 'w') as f:
            json.dump(pred_save, f)
        print("Test result is saved at " + output_path)

        return eval_summary


================================================
FILE: data/Dummy/annotations/Dummy_subject1_camera.json
================================================
{"1": {"R": [[-0.9059013006181885, 0.4217144115102914, 0.038727105014486805], [0.044493184429779696, 0.1857199061874203, -0.9815948619389944], [-0.4211450938543295, -0.8875049698848251, -0.1870073216538954]], "t": [-234.7208032216618, 464.34018262882194, 5536.652631113797], "f": [1145.04940458804, 1143.78109572365], "c": [512.541504956548, 515.4514869776]}, "2": {"R": [[0.9216646531492915, 0.3879848687925067, -0.0014172943441045224], [0.07721054863099915, -0.18699239961454955, -0.979322405373477], [-0.3802272982247548, 0.9024974149959955, -0.20230080971229314]], "t": [-11.934348472090557, 449.4165893644565, 5541.113551868937], "f": [1149.67569986785, 1147.59161666764], "c": [508.848621645943, 508.064917088557]}, "3": {"R": [[-0.9063540572469627, -0.42053101768163204, -0.04093880896680188], [-0.0603212197838846, 0.22468715090881142, -0.9725620980997899], [0.4181909532208387, -0.8790161246439863, -0.2290130547809762]], "t": [781.127357651581, 235.3131620173424, 5576.37044019807], "f": [1149.14071676148, 1148.7989685676], "c": [519.815837182153, 501.402658888552]}, "4": {"R": [[0.91754082476548, -0.39226322025776267, 0.06517975852741943], [-0.04531905395586976, -0.26600517028098103, -0.9629057236990188], [0.395050652748768, 0.8805514269006645, -0.2618476013752581]], "t": [-155.13650339749012, 422.16256306729633, 4435.416222660868], "f": [1145.51133842318, 1144.77392807652], "c": [514.968197319863, 501.882018537695]}}

================================================
FILE: data/Dummy/annotations/Dummy_subject1_data.json
================================================
{"images": [{"id": 1877420, "file_name": "s_11_act_02_subact_01_ca_01/s_11_act_02_subact_01_ca_01_000001.jpg", "width": 1000, "height": 1002, "subject": 1, "action_name": "Directions", "action_idx": 2, "subaction_idx": 1, "cam_idx": 1, "frame_idx": 0}], "annotations": [{"id": 1877420, "image_id": 1877420, "keypoints_vis": [true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true], "bbox": [304.0201284041609, 222.305917169553, 328.1488619190915, 412.150330355609]}]}

================================================
FILE: data/Dummy/annotations/Dummy_subject1_joint_3d.json
================================================
{"2": {"1": {"0": [[-47.24769973754883, -81.04920196533203, 987.9080200195312], [-184.4625244140625, -69.55330657958984, 999.5223999023438], [-199.22152709960938, -72.29781341552734, 537.8258666992188], [-177.2645721435547, 44.52031326293945, 93.21685028076172], [89.96746063232422, -92.54512023925781, 976.2935791015625], [97.17977142333984, -81.16199493408203, 514.5499877929688], [82.85128784179688, 34.8104248046875, 69.40837097167969], [-52.695899963378906, -77.56897735595703, 1242.206298828125], [-49.09817886352539, -73.6445083618164, 1492.0970458984375], [-71.0900650024414, -139.2397003173828, 1579.0076904296875], [-71.68211364746094, -92.79254150390625, 1684.2078857421875], [116.02037811279297, -63.403587341308594, 1509.3262939453125], [396.226318359375, -72.48757934570312, 1469.46826171875], [633.7438354492188, -144.6726837158203, 1475.2344970703125], [-211.36859130859375, -37.4464111328125, 1487.2081298828125], [-487.9529724121094, -1.2391146421432495, 1438.4637451171875], [-727.43798828125, -60.458595275878906, 1466.75244140625]]}}}

================================================
FILE: data/Dummy/bbox_root/bbox_dummy_output.json
================================================
[{"image_id": 1877420, "category_id": 1, "bbox": [309.1705017089844, 252.84469604492188, 326.1686096191406, 368.1951599121094], "score": 0.9997870326042175}]

================================================
FILE: data/Human36M/Human36M.py
================================================
import os
import os.path as osp
from pycocotools.coco import COCO
import numpy as np
from config import cfg
from utils.pose_utils import world2cam, cam2pixel, pixel2cam, rigid_align, process_bbox
import cv2
import random
import json
from utils.vis import vis_keypoints, vis_3d_skeleton

class Human36M:
    def __init__(self, data_split):
        self.data_split = data_split
        self.img_dir = osp.join('/', 'data', 'Human36M', 'images')
        self.annot_path = osp.join('/', 'data', 'Human36M', 'annotations')
        self.human_bbox_root_dir = osp.join('/', 'data', 'Human36M', 'bbox_root', 'bbox_root_human36m_output.json')
        self.joint_num = 18 # original:17, but manually added 'Thorax'
        self.joints_name = ('Pelvis', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Torso', 'Neck', 'Nose', 'Head', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'Thorax')
        self.flip_pairs = ( (1, 4), (2, 5), (3, 6), (14, 11), (15, 12), (16, 13) )
        self.skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )
        self.joints_have_depth = True
        self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9,  10, 11, 12, 13, 14, 15, 16) # exclude Thorax

        self.action_name = ['Directions', 'Discussion', 'Eating', 'Greeting', 'Phoning', 'Posing', 'Purchases', 'Sitting', 'SittingDown', 'Smoking', 'Photo', 'Waiting', 'Walking', 'WalkDog', 'WalkTogether']
        self.root_idx = self.joints_name.index('Pelvis')
        self.lshoulder_idx = self.joints_name.index('L_Shoulder')
        self.rshoulder_idx = self.joints_name.index('R_Shoulder')
        self.protocol = 2
        self.data = self.load_data()

    def get_subsampling_ratio(self):
        if self.data_split == 'train':
            return 5
        elif self.data_split == 'test':
            return 64
        else:
            assert 0, print('Unknown subset')

    def get_subject(self):
        if self.data_split == 'train':
            if self.protocol == 1:
                subject = [1,5,6,7,8,9]
            elif self.protocol == 2:
                subject = [1,5,6,7,8]
        elif self.data_split == 'test':
            if self.protocol == 1:
                subject = [11]
            elif self.protocol == 2:
                subject = [9,11]
        else:
            assert 0, print("Unknown subset")

        return subject
    
    def add_thorax(self, joint_coord):
        thorax = (joint_coord[self.lshoulder_idx, :] + joint_coord[self.rshoulder_idx, :]) * 0.5
        thorax = thorax.reshape((1, 3))
        joint_coord = np.concatenate((joint_coord, thorax), axis=0)
        return joint_coord

    def load_data(self):
        print('Load data of H36M Protocol ' + str(self.protocol))

        subject_list = self.get_subject()
        sampling_ratio = self.get_subsampling_ratio()
        
        # aggregate annotations from each subject
        db = COCO()
        cameras = {}
        joints = {}
        for subject in subject_list:
            # data load
            with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_data.json'),'r') as f:
                annot = json.load(f)
            if len(db.dataset) == 0:
                for k,v in annot.items():
                    db.dataset[k] = v
            else:
                for k,v in annot.items():
                    db.dataset[k] += v
            # camera load
            with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_camera.json'),'r') as f:
                cameras[str(subject)] = json.load(f)
            # joint coordinate load
            with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_joint_3d.json'),'r') as f:
                joints[str(subject)] = json.load(f)
        db.createIndex()
       
        if self.data_split == 'test' and not cfg.use_gt_info:
            print("Get bounding box and root from " + self.human_bbox_root_dir)
            bbox_root_result = {}
            with open(self.human_bbox_root_dir) as f:
                annot = json.load(f)
            for i in range(len(annot)):
                bbox_root_result[str(annot[i]['image_id'])] = {'bbox': np.array(annot[i]['bbox']), 'root': np.array(annot[i]['root_cam'])}
        else:
            print("Get bounding box and root from groundtruth")

        data = []
        for aid in db.anns.keys():
            ann = db.anns[aid]
            image_id = ann['image_id']
            img = db.loadImgs(image_id)[0]
            img_path = osp.join(self.img_dir, img['file_name'])
            img_width, img_height = img['width'], img['height']
           
            # check subject and frame_idx
            subject = img['subject']; frame_idx = img['frame_idx'];
            if subject not in subject_list:
                continue
            if frame_idx % sampling_ratio != 0:
                continue

            # camera parameter
            cam_idx = img['cam_idx']
            cam_param = cameras[str(subject)][str(cam_idx)]
            R,t,f,c = np.array(cam_param['R'], dtype=np.float32), np.array(cam_param['t'], dtype=np.float32), np.array(cam_param['f'], dtype=np.float32), np.array(cam_param['c'], dtype=np.float32)
                
            # project world coordinate to cam, image coordinate space
            action_idx = img['action_idx']; subaction_idx = img['subaction_idx']; frame_idx = img['frame_idx'];
            joint_world = np.array(joints[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)], dtype=np.float32)
            joint_world = self.add_thorax(joint_world)
            joint_cam = world2cam(joint_world, R, t)
            joint_img = cam2pixel(joint_cam, f, c)
            joint_img[:,2] = joint_img[:,2] - joint_cam[self.root_idx,2]
            joint_vis = np.ones((self.joint_num,1))
            
            if self.data_split == 'test' and not cfg.use_gt_info:
                bbox = bbox_root_result[str(image_id)]['bbox'] # bbox should be aspect ratio preserved-extended. It is done in RootNet.
                root_cam = bbox_root_result[str(image_id)]['root']
            else:
                bbox = process_bbox(np.array(ann['bbox']), img_width, img_height)
                if bbox is None: continue
                root_cam = joint_cam[self.root_idx]
               
            data.append({
                'img_path': img_path,
                'img_id': image_id,
                'bbox': bbox,
                'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth]
                'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate
                'joint_vis': joint_vis,
                'root_cam': root_cam, # [X, Y, Z] in camera coordinate
                'f': f,
                'c': c})
           
        return data

    def evaluate(self, preds, result_dir):
        
        print('Evaluation start...')
        gts = self.data
        assert len(gts) == len(preds)
        sample_num = len(gts)
        
        pred_save = []
        error = np.zeros((sample_num, self.joint_num-1)) # joint error
        error_action = [ [] for _ in range(len(self.action_name)) ] # error for each sequence
        for n in range(sample_num):
            gt = gts[n]
            image_id = gt['img_id']
            f = gt['f']
            c = gt['c']
            bbox = gt['bbox']
            gt_3d_root = gt['root_cam']
            gt_3d_kpt = gt['joint_cam']
            gt_vis = gt['joint_vis']
            
            # restore coordinates to original space
            pred_2d_kpt = preds[n].copy()
            pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
            pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
            pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]

            vis = False
            if vis:
                cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
                filename = str(random.randrange(1,500))
                tmpimg = cvimg.copy().astype(np.uint8)
                tmpkps = np.zeros((3,self.joint_num))
                tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1]
                tmpkps[2,:] = 1
                tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton)
                cv2.imwrite(filename + '_output.jpg', tmpimg)

            # back project to camera coordinate system
            pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c)
 
            # root joint alignment
            pred_3d_kpt = pred_3d_kpt - pred_3d_kpt[self.root_idx]
            gt_3d_kpt  = gt_3d_kpt - gt_3d_kpt[self.root_idx]
           
            if self.protocol == 1:
                # rigid alignment for PA MPJPE (protocol #1)
                pred_3d_kpt = rigid_align(pred_3d_kpt, gt_3d_kpt)
            
            # exclude thorax
            pred_3d_kpt = np.take(pred_3d_kpt, self.eval_joint, axis=0)
            gt_3d_kpt = np.take(gt_3d_kpt, self.eval_joint, axis=0)
           
            # error calculate
            error[n] = np.sqrt(np.sum((pred_3d_kpt - gt_3d_kpt)**2,1))
            img_name = gt['img_path']
            action_idx = int(img_name[img_name.find('act')+4:img_name.find('act')+6]) - 2
            error_action[action_idx].append(error[n].copy())

            # prediction save
            pred_save.append({'image_id': image_id, 'joint_cam': pred_3d_kpt.tolist(), 'bbox': bbox.tolist(), 'root_cam': gt_3d_root.tolist()}) # joint_cam is root-relative coordinate

        # total error
        tot_err = np.mean(error)
        metric = 'PA MPJPE' if self.protocol == 1 else 'MPJPE'
        eval_summary = 'Protocol ' + str(self.protocol) + ' error (' + metric + ') >> tot: %.2f\n' % (tot_err)

        # error for each action
        for i in range(len(error_action)):
            err = np.mean(np.array(error_action[i]))
            eval_summary += (self.action_name[i] + ': %.2f ' % err)

        print(eval_summary)

        # prediction save
        output_path = osp.join(result_dir, 'bbox_root_pose_human36m_output.json')
        with open(output_path, 'w') as f:
            json.dump(pred_save, f)
        print("Test result is saved at " + output_path)

        return eval_summary


================================================
FILE: data/MPII/MPII.py
================================================
import os
import os.path as osp
import numpy as np
from pycocotools.coco import COCO
from utils.pose_utils import process_bbox
from config import cfg

class MPII:

    def __init__(self, data_split):
        self.data_split = data_split
        self.img_dir = osp.join('/', 'data', 'MPII')
        self.train_annot_path = osp.join('/', 'data', 'MPII', 'annotations', 'train.json')
        self.joint_num = 16
        self.joints_name = ('R_Ankle', 'R_Knee', 'R_Hip', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Thorax', 'Neck', 'Head', 'R_Wrist', 'R_Elbow', 'R_Shoulder', 'L_Shoulder', 'L_Elbow', 'L_Wrist')
        self.flip_pairs = ( (0, 5), (1, 4), (2, 3), (10, 15), (11, 14), (12, 13) )
        self.skeleton = ( (0, 1), (1, 2), (2, 6), (7, 12), (12, 11), (11, 10), (5, 4), (4, 3), (3, 6), (7, 13), (13, 14), (14, 15), (6, 7), (7, 8), (8, 9) )
        self.joints_have_depth = False
        self.data = self.load_data()

    def load_data(self):
        
        if self.data_split == 'train':
            db = COCO(self.train_annot_path)
        else:
            print('Unknown data subset')
            assert 0

        data = []
        for aid in db.anns.keys():
            ann = db.anns[aid]
            img = db.loadImgs(ann['image_id'])[0]
            width, height = img['width'], img['height']

            if ann['num_keypoints'] == 0:
                continue
            
            bbox = process_bbox(ann['bbox'], width, height)
            if bbox is None: continue

            # joints and vis
            joint_img = np.array(ann['keypoints']).reshape(self.joint_num,3)
            joint_vis = joint_img[:,2].copy().reshape(-1,1)
            joint_img[:,2] = 0

            imgname = img['file_name']
            img_path = osp.join(self.img_dir, imgname)
            data.append({
                'img_path': img_path,
                'bbox': bbox,
                'joint_img': joint_img, # [org_img_x, org_img_y, 0]
                'joint_vis': joint_vis,
            })

        return data


================================================
FILE: data/MSCOCO/MSCOCO.py
================================================
import os
import os.path as osp
import numpy as np
from pycocotools.coco import COCO
from config import cfg
import scipy.io as sio
import json
import cv2
import random
import math
from utils.pose_utils import pixel2cam, process_bbox
from utils.vis import vis_keypoints, vis_3d_skeleton


class MSCOCO:
    def __init__(self, data_split):
        self.data_split = data_split
        self.img_dir = osp.join('/','home', 'centos', 'datasets', 'coco', 'images')
        self.train_annot_path = osp.join('/','home', 'centos', 'datasets', 'coco', 'annotations', 'person_keypoints_train2017.json')
        self.test_annot_path = osp.join('/','home', 'centos', 'datasets', 'coco', 'annotations', 'person_keypoints_val2017.json')
        self.human_3d_bbox_root_dir = osp.join('/', 'home', 'centos','datasets', 'coco', 'bbox_root', 'bbox_root_coco_output.json')
        
        if self.data_split == 'train':
            self.joint_num = 19 # original: 17, but manually added 'Thorax', 'Pelvis'
            self.joints_name = ('Nose', 'L_Eye', 'R_Eye', 'L_Ear', 'R_Ear', 'L_Shoulder', 'R_Shoulder', 'L_Elbow', 'R_Elbow', 'L_Wrist', 'R_Wrist', 'L_Hip', 'R_Hip', 'L_Knee', 'R_Knee', 'L_Ankle', 'R_Ankle', 'Thorax', 'Pelvis')
            self.flip_pairs = ( (1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16) )
            self.skeleton = ( (1, 2), (0, 1), (0, 2), (2, 4), (1, 3), (6, 8), (8, 10), (5, 7), (7, 9), (12, 14), (14, 16), (11, 13), (13, 15), (5, 6), (11, 12) )
            self.joints_have_depth = False

            self.lshoulder_idx = self.joints_name.index('L_Shoulder')
            self.rshoulder_idx = self.joints_name.index('R_Shoulder')
            self.lhip_idx = self.joints_name.index('L_Hip')
            self.rhip_idx = self.joints_name.index('R_Hip')
       
        else:
            ## testing settings (when test model trained on the MuCo-3DHP dataset)
            self.joint_num = 21 # MuCo-3DHP
            self.joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe') # MuCo-3DHP
            self.original_joint_num = 17 # MuPoTS
            self.original_joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head') # MuPoTS
            self.flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13) )
            self.skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (11, 12), (12, 13), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7) )
            self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
            self.joints_have_depth = False

        self.data = self.load_data()

    def load_data(self):

        if self.data_split == 'train':
            db = COCO(self.train_annot_path)
            data = []
            for aid in db.anns.keys():
                ann = db.anns[aid]
                img = db.loadImgs(ann['image_id'])[0]
                width, height = img['width'], img['height']

                if (ann['image_id'] not in db.imgs) or ann['iscrowd'] or (ann['num_keypoints'] == 0):
                    continue
                
                bbox = process_bbox(ann['bbox'], width, height) 
                if bbox is None: continue

                # joints and vis
                joint_img = np.array(ann['keypoints']).reshape(-1,3)
                # add Thorax
                thorax = (joint_img[self.lshoulder_idx, :] + joint_img[self.rshoulder_idx, :]) * 0.5
                thorax[2] = joint_img[self.lshoulder_idx,2] * joint_img[self.rshoulder_idx,2]
                thorax = thorax.reshape((1, 3))
                # add Pelvis
                pelvis = (joint_img[self.lhip_idx, :] + joint_img[self.rhip_idx, :]) * 0.5
                pelvis[2] = joint_img[self.lhip_idx,2] * joint_img[self.rhip_idx,2]
                pelvis = pelvis.reshape((1, 3))

                joint_img = np.concatenate((joint_img, thorax, pelvis), axis=0)

                joint_vis = (joint_img[:,2].copy().reshape(-1,1) > 0)
                joint_img[:,2] = 0

                imgname = osp.join('train2017', db.imgs[ann['image_id']]['file_name'])
                img_path = osp.join(self.img_dir, imgname)
                data.append({
                    'img_path': img_path,
                    'bbox': bbox,
                    'joint_img': joint_img, # [org_img_x, org_img_y, 0]
                    'joint_vis': joint_vis,
                    'f': np.array([1500, 1500]), 
                    'c': np.array([width/2, height/2]) 
                })

        elif self.data_split == 'test':
            db = COCO(self.test_annot_path)
            with open(self.human_3d_bbox_root_dir) as f:
                annot = json.load(f)
            data = [] 
            for i in range(len(annot)):
                image_id = annot[i]['image_id']
                img = db.loadImgs(image_id)[0]
                img_path = osp.join(self.img_dir, 'val2017', img['file_name'])
                fx, fy, cx, cy = 1500, 1500, img['width']/2, img['height']/2
                f = np.array([fx, fy]); c = np.array([cx, cy]);
                root_cam = np.array(annot[i]['root_cam']).reshape(3)
                bbox = np.array(annot[i]['bbox']).reshape(4)

                data.append({
                    'img_path': img_path,
                    'bbox': bbox,
                    'joint_img': np.zeros((self.original_joint_num, 3)), # dummy
                    'joint_cam': np.zeros((self.original_joint_num, 3)), # dummy
                    'joint_vis': np.zeros((self.original_joint_num, 1)), # dummy
                    'root_cam': root_cam, # [X, Y, Z] in camera coordinate
                    'f': f,
                    'c': c,
                })

        else:
            print('Unknown data subset')
            assert 0


        return data

    def evaluate(self, preds, result_dir):
        
        print('Evaluation start...')
        gts = self.data
        sample_num = len(preds)
        joint_num = self.original_joint_num

        pred_2d_save = {}
        pred_3d_save = {}
        for n in range(sample_num):
            
            gt = gts[n]
            f = gt['f']
            c = gt['c']
            bbox = gt['bbox']
            gt_3d_root = gt['root_cam']
            img_name = gt['img_path'].split('/')
            img_name = 'coco_' + img_name[-1].split('.')[0] # e.g., coco_00000000
            
            # restore coordinates to original space
            pred_2d_kpt = preds[n].copy()
            # only consider eval_joint
            pred_2d_kpt = np.take(pred_2d_kpt, self.eval_joint, axis=0)
            pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
            pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
            pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]

            # 2d kpt save
            if img_name in pred_2d_save:
                pred_2d_save[img_name].append(pred_2d_kpt[:,:2])
            else:
                pred_2d_save[img_name] = [pred_2d_kpt[:,:2]]

            vis = False
            if vis:
                cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
                filename = str(random.randrange(1,500))
                tmpimg = cvimg.copy().astype(np.uint8)
                tmpkps = np.zeros((3,joint_num))
                tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1]
                tmpkps[2,:] = 1
                tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton)
                cv2.imwrite(filename + '_output.jpg', tmpimg)

            # back project to camera coordinate system
            pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c)
            
            # 3d kpt save
            if img_name in pred_3d_save:
                pred_3d_save[img_name].append(pred_3d_kpt)
            else:
                pred_3d_save[img_name] = [pred_3d_kpt]
        
        output_path = osp.join(result_dir,'preds_2d_kpt_coco.mat')
        sio.savemat(output_path, pred_2d_save)
        print("Testing result is saved at " + output_path)
        output_path = osp.join(result_dir,'preds_3d_kpt_coco.mat')
        sio.savemat(output_path, pred_3d_save)
        print("Testing result is saved at " + output_path)


================================================
FILE: data/MuCo/MuCo.py
================================================
import os
import os.path as osp
import numpy as np
import math
from utils.pose_utils import process_bbox
from pycocotools.coco import COCO
from config import cfg

class MuCo:
    def __init__(self, data_split):
        self.data_split = data_split
        self.img_dir = osp.join('/', 'home', 'centos', 'datasets', 'MuCo')
        self.train_annot_path = osp.join('/', 'home', 'centos', 'datasets', 'MuCo', 'MuCo-3DHP.json')
        self.joint_num = 21
        self.joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe')
        self.flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) )
        self.skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) )
        self.joints_have_depth = True
        self.root_idx = self.joints_name.index('Pelvis')
        self.data = self.load_data()

    def load_data(self):

        if self.data_split == 'train':
            db = COCO(self.train_annot_path)
        else:
            print('Unknown data subset')
            assert 0

        data = []
        for iid in db.imgs.keys():
            img = db.imgs[iid]
            img_id = img["id"]
            img_width, img_height = img['width'], img['height']
            imgname = img['file_name']
            img_path = osp.join(self.img_dir, imgname)
            f = img["f"]
            c = img["c"]

            # crop the closest person to the camera
            ann_ids = db.getAnnIds(img_id)
            anns = db.loadAnns(ann_ids)

            root_depths = [ann['keypoints_cam'][self.root_idx][2] for ann in anns]
            closest_pid = root_depths.index(min(root_depths))
            pid_list = [closest_pid]
            for i in range(len(anns)):
                if i == closest_pid:
                    continue
                picked = True
                for j in range(len(anns)):
                    if i == j:
                        continue
                    dist = (np.array(anns[i]['keypoints_cam'][self.root_idx]) - np.array(anns[j]['keypoints_cam'][self.root_idx])) ** 2
                    dist_2d = math.sqrt(np.sum(dist[:2]))
                    dist_3d = math.sqrt(np.sum(dist))
                    if dist_2d < 500 or dist_3d < 500:
                        picked = False
                if picked:
                    pid_list.append(i)
            
            for pid in pid_list:
                joint_cam = np.array(anns[pid]['keypoints_cam'])
                root_cam = joint_cam[self.root_idx]
                
                joint_img = np.array(anns[pid]['keypoints_img'])
                joint_img = np.concatenate([joint_img, joint_cam[:,2:]],1)
                joint_img[:,2] = joint_img[:,2] - root_cam[2]
                joint_vis = np.ones((self.joint_num,1))

                bbox = process_bbox(anns[pid]['bbox'], img_width, img_height)
                if bbox is None: continue

                data.append({
                    'img_path': img_path,
                    'bbox': bbox,
                    'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth]
                    'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate
                    'joint_vis': joint_vis,
                    'root_cam': root_cam, # [X, Y, Z] in camera coordinate
                    'f': f,
                    'c': c
                })


        return data


================================================
FILE: data/MuPoTS/MuPoTS.py
================================================
import os
import os.path as osp
import scipy.io as sio
import numpy as np
from pycocotools.coco import COCO
from config import cfg
import json
import cv2
import random
import math
from utils.pose_utils import pixel2cam, process_bbox
from utils.vis import vis_keypoints, vis_3d_skeleton

class MuPoTS:
    def __init__(self, data_split):
        self.data_split = data_split
        self.img_dir = osp.join('/', 'data', 'MuPoTS', 'data', 'MultiPersonTestSet')
        self.test_annot_path = osp.join('/', 'data', 'MuPoTS', 'data', 'MuPoTS-3D.json')
        self.human_bbox_root_dir = osp.join('/', 'data', 'MuPoTS', 'bbox_root', 'bbox_root_mupots_output.json')
        self.joint_num = 21 # MuCo-3DHP
        self.joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe') # MuCo-3DHP
        self.original_joint_num = 17 # MuPoTS
        self.original_joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head') # MuPoTS

        self.flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13) )
        self.skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (11, 12), (12, 13), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7) )
        self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
        self.joints_have_depth = True
        self.root_idx = self.joints_name.index('Pelvis')
        self.data = self.load_data()

    def load_data(self):
        
        if self.data_split != 'test':
            print('Unknown data subset')
            assert 0
        
        data = []
        db = COCO(self.test_annot_path)

        # use gt bbox and root
        if cfg.use_gt_info:
            print("Get bounding box and root from groundtruth")
            for aid in db.anns.keys():
                ann = db.anns[aid]
                if ann['is_valid'] == 0:
                    continue

                image_id = ann['image_id']
                img = db.loadImgs(image_id)[0]
                img_path = osp.join(self.img_dir, img['file_name'])
                fx, fy, cx, cy = img['intrinsic']
                f = np.array([fx, fy]); c = np.array([cx, cy]);

                joint_cam = np.array(ann['keypoints_cam'])
                root_cam = joint_cam[self.root_idx]

                joint_img = np.array(ann['keypoints_img'])
                joint_img = np.concatenate([joint_img, joint_cam[:,2:]],1)
                joint_img[:,2] = joint_img[:,2] - root_cam[2]
                joint_vis = np.ones((self.original_joint_num,1))
                
                bbox = np.array(ann['bbox'])
                img_width, img_height = img['width'], img['height']
                bbox = process_bbox(bbox, img_width, img_height)
                if bbox is None: continue
                
                data.append({
                    'img_path': img_path,
                    'bbox': bbox, 
                    'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth]
                    'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate
                    'joint_vis': joint_vis,
                    'root_cam': root_cam, # [X, Y, Z] in camera coordinate
                    'f': f,
                    'c': c,
                })
           
        else:
            print("Get bounding box and root from " + self.human_bbox_root_dir)
            with open(self.human_bbox_root_dir) as f:
                annot = json.load(f)
            
            for i in range(len(annot)):
                image_id = annot[i]['image_id']
                img = db.loadImgs(image_id)[0]
                img_width, img_height = img['width'], img['height']
                img_path = osp.join(self.img_dir, img['file_name'])
                fx, fy, cx, cy = img['intrinsic']
                f = np.array([fx, fy]); c = np.array([cx, cy]);
                root_cam = np.array(annot[i]['root_cam']).reshape(3)
                bbox = np.array(annot[i]['bbox']).reshape(4)

                data.append({
                    'img_path': img_path,
                    'bbox': bbox,
                    'joint_img': np.zeros((self.original_joint_num, 3)), # dummy
                    'joint_cam': np.zeros((self.original_joint_num, 3)), # dummy
                    'joint_vis': np.zeros((self.original_joint_num, 1)), # dummy
                    'root_cam': root_cam, # [X, Y, Z] in camera coordinate
                    'f': f,
                    'c': c,
                })

        return data

    def evaluate(self, preds, result_dir):
        
        print('Evaluation start...')
        gts = self.data
        sample_num = len(preds)
        joint_num = self.original_joint_num
 
        pred_2d_save = {}
        pred_3d_save = {}
        for n in range(sample_num):
            
            gt = gts[n]
            f = gt['f']
            c = gt['c']
            bbox = gt['bbox']
            gt_3d_root = gt['root_cam']
            img_name = gt['img_path'].split('/')
            img_name = img_name[-2] + '_' + img_name[-1].split('.')[0] # e.g., TS1_img_0001
            
            # restore coordinates to original space
            pred_2d_kpt = preds[n].copy()
            # only consider eval_joint
            pred_2d_kpt = np.take(pred_2d_kpt, self.eval_joint, axis=0)
            pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
            pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
            pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]

            # 2d kpt save
            if img_name in pred_2d_save:
                pred_2d_save[img_name].append(pred_2d_kpt[:,:2])
            else:
                pred_2d_save[img_name] = [pred_2d_kpt[:,:2]]

            vis = False
            if vis:
                cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
                filename = str(random.randrange(1,500))
                tmpimg = cvimg.copy().astype(np.uint8)
                tmpkps = np.zeros((3,joint_num))
                tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1]
                tmpkps[2,:] = 1
                tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton)
                cv2.imwrite(filename + '_output.jpg', tmpimg)

            # back project to camera coordinate system
            pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c)
            
            # 3d kpt save
            if img_name in pred_3d_save:
                pred_3d_save[img_name].append(pred_3d_kpt)
            else:
                pred_3d_save[img_name] = [pred_3d_kpt]
        
        output_path = osp.join(result_dir,'preds_2d_kpt_mupots.mat')
        sio.savemat(output_path, pred_2d_save)
        print("Testing result is saved at " + output_path)
        output_path = osp.join(result_dir,'preds_3d_kpt_mupots.mat')
        sio.savemat(output_path, pred_3d_save)
        print("Testing result is saved at " + output_path)


================================================
FILE: data/MuPoTS/mpii_mupots_multiperson_eval.m
================================================
function mpii_mupots_multiperson_eval(eval_mode, is_relative)

% eval_mode: EVLAUATION_MODE
% is_relative: 1: root-relative 3D multi-person pose estimation, 0: absolute 3D multi-person pose estimation

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Outline of the test eval procedure on MuPoTS-3D. 
% Plug in your predictions at the appropriate point
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
mpii_mupots_config;
addpath('./util');
[~,o1,o2,relevant_labels] = mpii_get_joints('relevant');  
num_joints = length(o1);

%Path to the test images and annotations
test_annot_base = mpii_mupots_path; %See mpii_mupots_config
%Path where results are written out 
results_output_path = './';

%If predicted joints have a different ordering, specify mapping to MPI joints here
%map_to_mpii_jointset = % [11 14 10 13 9 12 5 8 4 7 3 6 1];
%Order to process bones in to resize them to the GT
safe_traversal_order = [15, 16, 2, 1, 17, 3, 4, 5, 6, 7, 8, 9:14];

EVALUATION_MODE = eval_mode; % 0 = evaluate all annotated persons, 1 = evaluate only predictions matched to annotations

person_colors = {'red', 'yellow', 'green', 'blue', 'magenta', 'cyan', 'black', 'white'} ;

sequencewise_per_joint_error = {};
sequencewise_undetected_people = [];
sequencewise_visibility_mask = {};
sequencewise_occlusion_mask = {};
sequencewise_annotated_people = [];
sequencewise_frames = [];

%% load prdictions
preds_2d_kpt = load('preds_2d_kpt_mupots.mat');
preds_3d_kpt = load('preds_3d_kpt_mupots.mat');

for ts = 1:20
    person_ids = [];
    open_person_ids = 1:20;

    load( sprintf('%s/TS%d/annot.mat',test_annot_base, ts));
    load( sprintf('%s/TS%d/occlusion.mat',test_annot_base, ts));
    
    num_frames = size(annotations,1);
    
    undetected_people = 0;
    annotated_people = 0;
    pje_idx = 1;
    
    per_joint_error = []; %zeros(17,1,num_test_points);
    per_joint_occlusion_mask = [];
    per_joint_visibility_mask = [];
    sequencewise_frames(ts) = num_frames;

for i = 1:num_frames

     %Count valid annotations
     valid_annotations = 0;
     for k = 1:size(annotations,2)
         if(annotations{i,k}.isValidFrame)
             valid_annotations = valid_annotations + 1;
         end
     end
     annotated_people = annotated_people + valid_annotations;
     
     if(valid_annotations == 0)
         continue;
     end
     
     gt_pose_2d =  cell(valid_annotations,1);
     gt_pose_3d =  cell(valid_annotations,1);
     gt_visibility = cell(valid_annotations,1);
     gt_pose_occlusion_labels =  cell(valid_annotations,1);
     gt_pose_visibility_labels =  cell(valid_annotations,1);
     %The joint set to use for matching predictions to GT
     matching_joints = [2:14];
     %matching_joints = [2 3 6 9 12];

     idx = 1;
     for k = 1:size(annotations,2)
         if(annotations{i,k}.isValidFrame)
             
             gt_pose_2d{idx} = annotations{i,k}.annot2(:,matching_joints); 
             gt_pose_3d{idx} = annotations{i,k}.univ_annot3 ;
             gt_visibility{idx} = ones(1,length(matching_joints));
             gt_pose_occlusion_labels{idx} = occlusion_labels{i,k} ;
             gt_pose_visibility_labels{idx} = 1 - occlusion_labels{i,k} ;
             idx = idx + 1;
         end
     end

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %%%% Predictions here     
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    %img = imread(sprintf('%s/TS%d/img_%06d.jpg',test_annot_base, ts, i-1));
    
    % prediction of this image
    pred_2d_kpt = getfield(preds_2d_kpt,sprintf('TS%d_img_%06d',ts, i-1));
    pred_3d_kpt = getfield(preds_3d_kpt,sprintf('TS%d_img_%06d',ts, i-1));

    %Number of subjects predicted 
    num_pred = size(pred_2d_kpt,1);

    pred_pose_2d = cell(num_pred,1);
    pred_pose_3d = cell(num_pred,1);
    pred_visibility = cell(num_pred,1);
     for k = 1:num_pred
         
         pred_pose_2d{k} = zeros(2,14);
         %pred_pose_2d{k}(:,map_to_mpii_jointset) = % 2D Pose for person detected person k;
         pred_pose_2d{k} = transpose(squeeze(pred_2d_kpt(k,:,:))); % 2D Pose for person detected person k;

         % If some joints such as neck are missing, they can be estimated as the mean of shoulders
         %pred_pose_2d{k}(:,2) = mean(pred_pose_2d{k}(:,[3,6]),2);

         pred_pose_2d{k} = pred_pose_2d{k}(:,matching_joints);
         pred_visibility{k} = ~((pred_pose_2d{k}(1,:) == 0) & (pred_pose_2d{k}(2,:) == 0));
         
         pred_pose_3d{k} = zeros(3,num_joints);
         %pred_pose_3d{k}(:,map_to_mpii_jointset) = % 3D Pose for person detected person k;
         pred_pose_3d{k} = transpose(squeeze(pred_3d_kpt(k,:,:))); % 3D Pose for person detected person k;

         % If some joints such as neck or pelvis are missing, they can be estimated as 
         % the mean of shoulders or hips
         %pred_pose_3d{k}(:,2) = mean(pred_pose_3d{k}(:,[3,6]),2);
         %pred_pose_3d{k}(:,15) = mean(pred_pose_3d{k}(:,[9,12]),2);
         %Center the predictions at the pelvis
         if is_relative == 1
             pred_pose_3d{k} = pred_pose_3d{k} - repmat(pred_pose_3d{k}(:,15), 1, 17);
         else
             pred_pose_3d{k} = pred_pose_3d{k};
         end

         %Other mappings that may be needed to convert the predicted pose to match our coordinate system
         %pred_pose_3d{k} = 1000* pred_pose_3d{k}([2 3 1],:);
         %pred_pose_3d{k}(1:2,:) = -pred_pose_3d{k}(1:2,:);
     end
    
    %Match predictions to GT 
    [matching, old_matched] = mpii_multiperson_get_identity_matching(gt_pose_2d, gt_visibility, pred_pose_2d, pred_visibility, 40);
    
    undetected_people = undetected_people + sum(matching == 0);
    
    for k = 1:valid_annotations
        if is_relative == 1
            P = gt_pose_3d{k}(:,1:num_joints) - repmat(gt_pose_3d{k}(:,15),1 , num_joints);
        else
            P = gt_pose_3d{k}(:,1:num_joints);
        end

        pred_considered = 0;
        
        if(matching(k) ~= 0 )
            pred_p = pred_pose_3d{matching(k)}(:,1:num_joints);
            pred_p = mpii_map_to_gt_bone_lengths(pred_p, P, o1, safe_traversal_order(2:end));
            pred_considered = 1;
        else
            pred_p = 100000 * ones(size(P)); %So that the 3DPCK metric marks all these joints as 0!
            if(EVALUATION_MODE==0)
                pred_considered = 1;
            end
        end
        
        if (pred_considered == 1 )
            error_p = (pred_p - P).^2;
            error_p = sqrt(sum(error_p, 1));
            per_joint_error(1:num_joints,1,pje_idx) = error_p;     
            per_joint_occlusion_mask(1:num_joints,1,pje_idx) = gt_pose_occlusion_labels{k};
            per_joint_visibility_mask(1:num_joints,1,pje_idx) = gt_pose_visibility_labels{k};
            pje_idx = pje_idx + 1;
        end
        
    end

end
sequencewise_undetected_people(ts) = undetected_people;
sequencewise_annotated_people(ts) = annotated_people;
sequencewise_per_joint_error{ts} = per_joint_error;
sequencewise_visibility_mask{ts} =  per_joint_visibility_mask;
sequencewise_occlusion_mask{ts} =  per_joint_occlusion_mask;  
    
end


if(EVALUATION_MODE == 0)
    out_prefix = 'all_annotated_';
else
    out_prefix = 'only_matched_annotations_';
end

save([results_output_path filesep out_prefix 'multiperson_3dhp_evaluation.mat'], 'sequencewise_per_joint_error' );

[seq_table] = mpii_evaluate_multiperson_errors(sequencewise_per_joint_error );%fullfile(net_base, net_path{n,1}));
out_file = [results_output_path filesep out_prefix 'multiperson_3dhp_evaluation'];
writetable(cell2table(seq_table), [out_file '_sequencewise.csv']);

  
[seq_table] = mpii_evaluate_multiperson_errors_visibility_mask(sequencewise_per_joint_error , sequencewise_visibility_mask);
out_file = [results_output_path filesep [out_prefix 'visible_joints_'] 'multiperson_3dhp_evaluation'];
writetable(cell2table(seq_table), [out_file '_sequencewise.csv']);

[seq_table] = mpii_evaluate_multiperson_errors_visibility_mask(sequencewise_per_joint_error , sequencewise_occlusion_mask);
out_file = [results_output_path filesep [out_prefix 'occluded_joints_'] 'multiperson_3dhp_evaluation'];
writetable(cell2table(seq_table), [out_file '_sequencewise.csv']);
  
%
end


================================================
FILE: data/dataset.py
================================================
import numpy as np
import cv2
import random
import time
import torch
import copy
import math
from torch.utils.data.dataset import Dataset
from utils.vis import vis_keypoints, vis_3d_skeleton
from utils.pose_utils import fliplr_joints, transform_joint_to_other_db
from config import cfg

class DatasetLoader(Dataset):
    def __init__(self, db, ref_joints_name, is_train, transform):
        
        self.db = db.data
        self.joint_num = db.joint_num
        self.skeleton = db.skeleton
        self.flip_pairs = db.flip_pairs
        self.joints_have_depth = db.joints_have_depth
        self.joints_name = db.joints_name
        self.ref_joints_name = ref_joints_name
        
        self.transform = transform
        self.is_train = is_train

        if self.is_train:
            self.do_augment = True
        else:
            self.do_augment = False

    def __getitem__(self, index):
        
        joint_num = self.joint_num
        skeleton = self.skeleton
        flip_pairs = self.flip_pairs
        joints_have_depth = self.joints_have_depth

        data = copy.deepcopy(self.db[index])

        bbox = data['bbox']
        joint_img = data['joint_img']
        joint_vis = data['joint_vis']

        # 1. load image
        cvimg = cv2.imread(data['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
        if not isinstance(cvimg, np.ndarray):
            raise IOError("Fail to read %s" % data['img_path'])
        img_height, img_width, img_channels = cvimg.shape

        # 2. get augmentation params
        if self.do_augment:
            scale, rot, do_flip, color_scale, do_occlusion = get_aug_config()
        else:
            scale, rot, do_flip, color_scale, do_occlusion = 1.0, 0.0, False, [1.0, 1.0, 1.0], False

        # 3. crop patch from img and perform data augmentation (flip, rot, color scale, synthetic occlusion)
        img_patch, trans = generate_patch_image(cvimg, bbox, do_flip, scale, rot, do_occlusion)
        for i in range(img_channels):
            img_patch[:, :, i] = np.clip(img_patch[:, :, i] * color_scale[i], 0, 255)

        # 4. generate patch joint ground truth
        # flip joints and apply Affine Transform on joints
        if do_flip:
            joint_img[:, 0] = img_width - joint_img[:, 0] - 1
            for pair in flip_pairs:
                joint_img[pair[0], :], joint_img[pair[1], :] = joint_img[pair[1], :], joint_img[pair[0], :].copy()
                joint_vis[pair[0], :], joint_vis[pair[1], :] = joint_vis[pair[1], :], joint_vis[pair[0], :].copy()

        for i in range(len(joint_img)):
            joint_img[i, 0:2] = trans_point2d(joint_img[i, 0:2], trans)
            joint_img[i, 2] /= (cfg.bbox_3d_shape[0]/2.) # expect depth lies in -bbox_3d_shape[0]/2 ~ bbox_3d_shape[0]/2 -> -1.0 ~ 1.0
            joint_img[i, 2] = (joint_img[i,2] + 1.0)/2. # 0~1 normalize
            joint_vis[i] *= (
                            (joint_img[i,0] >= 0) & \
                            (joint_img[i,0] < cfg.input_shape[1]) & \
                            (joint_img[i,1] >= 0) & \
                            (joint_img[i,1] < cfg.input_shape[0]) & \
                            (joint_img[i,2] >= 0) & \
                            (joint_img[i,2] < 1)
                            )

        vis = False
        if vis:
            filename = str(random.randrange(1,500))
            tmpimg = img_patch.copy().astype(np.uint8)
            tmpkps = np.zeros((3,joint_num))
            tmpkps[:2,:] = joint_img[:,:2].transpose(1,0)
            tmpkps[2,:] = joint_vis[:,0]
            tmpimg = vis_keypoints(tmpimg, tmpkps, skeleton)
            cv2.imwrite(filename + '_gt.jpg', tmpimg)
        
        vis = False
        if vis:
            vis_3d_skeleton(joint_img, joint_vis, skeleton, filename)

        # change coordinates to output space
        joint_img[:, 0] = joint_img[:, 0] / cfg.input_shape[1] * cfg.output_shape[1]
        joint_img[:, 1] = joint_img[:, 1] / cfg.input_shape[0] * cfg.output_shape[0]
        joint_img[:, 2] = joint_img[:, 2] * cfg.depth_dim
        
        if self.is_train:
            img_patch = self.transform(img_patch)
            
            if self.ref_joints_name is not None:
                joint_img = transform_joint_to_other_db(joint_img, self.joints_name, self.ref_joints_name) 
                joint_vis = transform_joint_to_other_db(joint_vis, self.joints_name, self.ref_joints_name)

            joint_img = joint_img.astype(np.float32)
            joint_vis = (joint_vis > 0).astype(np.float32)
            joints_have_depth = np.array([joints_have_depth]).astype(np.float32)

            return img_patch, joint_img, joint_vis, joints_have_depth
        else:
            img_patch = self.transform(img_patch)
            return img_patch

    def __len__(self):
        return len(self.db)

# helper functions
def get_aug_config():
    
    scale_factor = 0.25
    rot_factor = 30
    color_factor = 0.2
    
    scale = np.clip(np.random.randn(), -1.0, 1.0) * scale_factor + 1.0
    rot = np.clip(np.random.randn(), -2.0,
                  2.0) * rot_factor if random.random() <= 0.6 else 0
    do_flip = random.random() <= 0.5
    c_up = 1.0 + color_factor
    c_low = 1.0 - color_factor
    color_scale = [random.uniform(c_low, c_up), random.uniform(c_low, c_up), random.uniform(c_low, c_up)]

    do_occlusion = random.random() <= 0.5

    return scale, rot, do_flip, color_scale, do_occlusion


def generate_patch_image(cvimg, bbox, do_flip, scale, rot, do_occlusion):
    img = cvimg.copy()
    img_height, img_width, img_channels = img.shape

    # synthetic occlusion
    if do_occlusion:
        while True:
            area_min = 0.0
            area_max = 0.7
            synth_area = (random.random() * (area_max - area_min) + area_min) * bbox[2] * bbox[3]

            ratio_min = 0.3
            ratio_max = 1/0.3
            synth_ratio = (random.random() * (ratio_max - ratio_min) + ratio_min)

            synth_h = math.sqrt(synth_area * synth_ratio)
            synth_w = math.sqrt(synth_area / synth_ratio)
            synth_xmin = random.random() * (bbox[2] - synth_w - 1) + bbox[0]
            synth_ymin = random.random() * (bbox[3] - synth_h - 1) + bbox[1]

            if synth_xmin >= 0 and synth_ymin >= 0 and synth_xmin + synth_w < img_width and synth_ymin + synth_h < img_height:
                xmin = int(synth_xmin)
                ymin = int(synth_ymin)
                w = int(synth_w)
                h = int(synth_h)
                img[ymin:ymin+h, xmin:xmin+w, :] = np.random.rand(h, w, 3) * 255
                break

    bb_c_x = float(bbox[0] + 0.5*bbox[2])
    bb_c_y = float(bbox[1] + 0.5*bbox[3])
    bb_width = float(bbox[2])
    bb_height = float(bbox[3])

    if do_flip:
        img = img[:, ::-1, :]
        bb_c_x = img_width - bb_c_x - 1
    
    trans = gen_trans_from_patch_cv(bb_c_x, bb_c_y, bb_width, bb_height, cfg.input_shape[1], cfg.input_shape[0], scale, rot, inv=False)
    img_patch = cv2.warpAffine(img, trans, (int(cfg.input_shape[1]), int(cfg.input_shape[0])), flags=cv2.INTER_LINEAR)

    img_patch = img_patch[:,:,::-1].copy()
    img_patch = img_patch.astype(np.float32)

    return img_patch, trans

def rotate_2d(pt_2d, rot_rad):
    x = pt_2d[0]
    y = pt_2d[1]
    sn, cs = np.sin(rot_rad), np.cos(rot_rad)
    xx = x * cs - y * sn
    yy = x * sn + y * cs
    return np.array([xx, yy], dtype=np.float32)

def gen_trans_from_patch_cv(c_x, c_y, src_width, src_height, dst_width, dst_height, scale, rot, inv=False):
    # augment size with scale
    src_w = src_width * scale
    src_h = src_height * scale
    src_center = np.array([c_x, c_y], dtype=np.float32)

    # augment rotation
    rot_rad = np.pi * rot / 180
    src_downdir = rotate_2d(np.array([0, src_h * 0.5], dtype=np.float32), rot_rad)
    src_rightdir = rotate_2d(np.array([src_w * 0.5, 0], dtype=np.float32), rot_rad)

    dst_w = dst_width
    dst_h = dst_height
    dst_center = np.array([dst_w * 0.5, dst_h * 0.5], dtype=np.float32)
    dst_downdir = np.array([0, dst_h * 0.5], dtype=np.float32)
    dst_rightdir = np.array([dst_w * 0.5, 0], dtype=np.float32)

    src = np.zeros((3, 2), dtype=np.float32)
    src[0, :] = src_center
    src[1, :] = src_center + src_downdir
    src[2, :] = src_center + src_rightdir

    dst = np.zeros((3, 2), dtype=np.float32)
    dst[0, :] = dst_center
    dst[1, :] = dst_center + dst_downdir
    dst[2, :] = dst_center + dst_rightdir

    if inv:
        trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
    else:
        trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))

    return trans

def trans_point2d(pt_2d, trans):
    src_pt = np.array([pt_2d[0], pt_2d[1], 1.]).T
    dst_pt = np.dot(trans, src_pt)
    return dst_pt[0:2]


================================================
FILE: data/multiple_datasets.py
================================================
import random
import numpy as np
from torch.utils.data.dataset import Dataset

class MultipleDatasets(Dataset):
    def __init__(self, dbs, make_same_len=True):
        self.dbs = dbs
        self.db_num = len(self.dbs)
        self.max_db_data_num = max([len(db) for db in dbs])
        self.db_len_cumsum = np.cumsum([len(db) for db in dbs])
        self.make_same_len = make_same_len

    def __len__(self):
        # all dbs have the same length
        if self.make_same_len:
            return self.max_db_data_num * self.db_num
        # each db has different length
        else:
            return sum([len(db) for db in self.dbs])

    def __getitem__(self, index):
        if self.make_same_len:
            db_idx = index // self.max_db_data_num
            data_idx = index % self.max_db_data_num 
            if data_idx >= len(self.dbs[db_idx]) * (self.max_db_data_num // len(self.dbs[db_idx])): # last batch: random sampling
                data_idx = random.randint(0,len(self.dbs[db_idx])-1)
            else: # before last batch: use modular
                data_idx = data_idx % len(self.dbs[db_idx])
        else:
            for i in range(self.db_num):
                if index < self.db_len_cumsum[i]:
                    db_idx = i
                    break
            if db_idx == 0:
                data_idx = index
            else:
                data_idx = index - self.db_len_cumsum[db_idx-1]

        return self.dbs[db_idx][data_idx]


================================================
FILE: demo/demo.py
================================================
import sys
import os
import os.path as osp
import argparse
import numpy as np
import cv2
import torch
import torchvision.transforms as transforms
from torch.nn.parallel.data_parallel import DataParallel
import torch.backends.cudnn as cudnn

sys.path.insert(0, osp.join('..', 'main'))
sys.path.insert(0, osp.join('..', 'data'))
sys.path.insert(0, osp.join('..', 'common'))
from config import cfg
from model import get_pose_net
from dataset import generate_patch_image
from utils.pose_utils import process_bbox, pixel2cam
from utils.vis import vis_keypoints, vis_3d_multiple_skeleton

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', type=str, dest='gpu_ids')
    parser.add_argument('--model_path', type=str, dest='model')
    parser.add_argument('--input_image', type=str, dest='image')
    parser.add_argument('--backbone', type=str, dest='backbone')
    args = parser.parse_args()

    # test gpus
    if not args.gpu_ids:
        assert 0, print("Please set proper gpu ids")

    if '-' in args.gpu_ids:
        gpus = args.gpu_ids.split('-')
        gpus[0] = 0 if not gpus[0].isdigit() else int(gpus[0])
        gpus[1] = len(mem_info()) if not gpus[1].isdigit() else int(gpus[1]) + 1
        args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
    return args

# argument parsing
args = parse_args()
cfg.set_args(args.gpu_ids)
cudnn.benchmark = True

# MuCo joint set
joint_num = 18
joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe')
# 'Pelvis' 'RHip' 'RKnee' 'RAnkle' 'LHip' 'LKnee' 'LAnkle' 'Spine1' 'Neck' 'Head' 'Site' 'LShoulder' 'LElbow' 'LWrist' 'RShoulder' 'RElbow' 'RWrist
flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) )
# skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) )
skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )

# snapshot load
model_path = args.model

# print('Load checkpoint from {}'.format(model_path))
model = get_pose_net(args.backbone, False, joint_num)
model = DataParallel(model).cuda()
# print("after DataParallel", model)
ckpt = torch.load(model_path)
# print("ckpt", ckpt['network'])
model.load_state_dict(ckpt['network'])
model.eval()

# prepare input image
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)])
img_path = args.image
assert osp.exists(img_path), 'Cannot find image at ' + img_path
original_img = cv2.imread(img_path)
original_img_height, original_img_width = original_img.shape[:2]

# prepare bbox
bbox_list = [
[139.41, 102.25, 222.39, 241.57],\
[287.17, 61.52, 74.88, 165.61],\
[540.04, 48.81, 99.96, 223.36],\
[372.58, 170.84, 266.63, 217.19],\
[0.5, 43.74, 90.1, 220.09]] # xmin, ymin, width, height
root_depth_list = [11250.5732421875, 15522.8701171875, 11831.3828125, 8852.556640625, 12572.5966796875] # obtain this from RootNet (https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE/tree/master/demo)
assert len(bbox_list) == len(root_depth_list)
person_num = len(bbox_list)

# normalized camera intrinsics
focal = [1500, 1500] # x-axis, y-axis
princpt = [original_img_width/2, original_img_height/2] # x-axis, y-axis
print('focal length: (' + str(focal[0]) + ', ' + str(focal[1]) + ')')
print('principal points: (' + str(princpt[0]) + ', ' + str(princpt[1]) + ')')

# for each cropped and resized human image, forward it to PoseNet
output_pose_2d_list = []
output_pose_3d_list = []
for n in range(person_num):
    bbox = process_bbox(np.array(bbox_list[n]), original_img_width, original_img_height)
    img, img2bb_trans = generate_patch_image(original_img, bbox, False, 1.0, 0.0, False) 
    img = transform(img).cuda()[None,:,:,:]

    # forward
    with torch.no_grad():
        pose_3d = model(img) # x,y: pixel, z: root-relative depth (mm)

    # inverse affine transform (restore the crop and resize)
    pose_3d = pose_3d[0].cpu().numpy()
    pose_3d[:,0] = pose_3d[:,0] / cfg.output_shape[1] * cfg.input_shape[1]
    pose_3d[:,1] = pose_3d[:,1] / cfg.output_shape[0] * cfg.input_shape[0]
    pose_3d_xy1 = np.concatenate((pose_3d[:,:2], np.ones_like(pose_3d[:,:1])),1)
    img2bb_trans_001 = np.concatenate((img2bb_trans, np.array([0,0,1]).reshape(1,3)))
    pose_3d[:,:2] = np.dot(np.linalg.inv(img2bb_trans_001), pose_3d_xy1.transpose(1,0)).transpose(1,0)[:,:2]
    output_pose_2d_list.append(pose_3d[:,:2].copy())
    
    # root-relative discretized depth -> absolute continuous depth
    pose_3d[:,2] = (pose_3d[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + root_depth_list[n]
    pose_3d = pixel2cam(pose_3d, focal, princpt)
    output_pose_3d_list.append(pose_3d.copy())

# visualize 2d poses
vis_img = original_img.copy()
for n in range(person_num):
    vis_kps = np.zeros((3,joint_num))
    vis_kps[0,:] = output_pose_2d_list[n][:,0]
    vis_kps[1,:] = output_pose_2d_list[n][:,1]
    vis_kps[2,:] = 1
    vis_img = vis_keypoints(vis_img, vis_kps, skeleton)
cv2.imwrite('output_pose_2d.jpg', vis_img)

# visualize 3d poses
vis_kps = np.array(output_pose_3d_list)
vis_3d_multiple_skeleton(vis_kps, np.ones_like(vis_kps), skeleton, 'output_pose_3d (x,y,z: camera-centered. mm.)')


================================================
FILE: main/config.py
================================================
import os
import os.path as osp
import sys
import numpy as np

class Config:

    ## model architecture
    backbone = 'LPSKI'
    
    ## dataset
    # training set
    # 3D: Human36M, MuCo
    # 2D: MSCOCO, MPII
    trainset_3d = ['Dummy']
    # trainset_3d = ['MuCo']
    trainset_2d = []
    # trainset_2d = ['MSCOCO']

    # testing set
    # Human36M, MuPoTS, MSCOCO
    testset = 'MuPoTS'

    ## directory
    cur_dir = osp.dirname(os.path.abspath(__file__))
    root_dir = osp.join(cur_dir, '..')
    data_dir = osp.join(root_dir, 'data')
    output_dir = osp.join(root_dir, 'output')
    model_dir = osp.join(output_dir, 'model_dump')
    pretrain_dir = osp.join(output_dir, 'pre_train')
    vis_dir = osp.join(output_dir, 'vis')
    log_dir = osp.join(output_dir, 'log')
    result_dir = osp.join(output_dir, 'result')
    
    ## input, output
    input_shape = (256, 256) 
    output_shape = (input_shape[0]//8, input_shape[1]//8)
    width_multiplier = 1.0
    depth_dim = 32
    bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
    pixel_mean = (0.485, 0.456, 0.406)
    pixel_std = (0.229, 0.224, 0.225)

    ## training config
    embedding_size = 2048
    lr_dec_epoch = [17, 21]
    end_epoch = 25
    lr = 1e-3
    lr_dec_factor = 10
    batch_size = 64

    ## testing config
    test_batch_size = 32
    flip_test = True
    use_gt_info = True

    ## others
    num_thread = 20
    gpu_ids = '0'
    num_gpus = 1
    continue_train = False

    if '-' in gpu_ids:
        gpus = gpu_ids.split('-')
        gpus[0] = int(gpus[0])
        gpus[1] = int(gpus[1]) + 1
        gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))

    os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids

cfg = Config()

sys.path.insert(0, osp.join(cfg.root_dir, 'common'))
from utils.dir_utils import add_pypath, make_folder
# adding path
add_pypath(osp.join(cfg.data_dir))
for i in range(len(cfg.trainset_3d)):
    add_pypath(osp.join(cfg.data_dir, cfg.trainset_3d[i]))
for i in range(len(cfg.trainset_2d)):
    add_pypath(osp.join(cfg.data_dir, cfg.trainset_2d[i]))
add_pypath(osp.join(cfg.data_dir, cfg.testset))
make_folder(cfg.model_dir)
make_folder(cfg.vis_dir)
make_folder(cfg.log_dir)
make_folder(cfg.result_dir)


================================================
FILE: main/intermediate.py
================================================
import torch
import argparse
import numpy as np
import os
import os.path as osp
import cv2
import matplotlib.pyplot as plt
import torch.backends.cudnn as cudnn
import torchvision.transforms as transforms
from torchsummary import summary
from torch.nn.parallel.data_parallel import DataParallel
from config import cfg
from model import get_pose_net
from utils.pose_utils import process_bbox, pixel2cam
from utils.vis import vis_keypoints, vis_3d_multiple_skeleton
from dataset import generate_patch_image

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', type=str, dest='gpu_ids')
    parser.add_argument('--epoch', type=int, dest='test_epoch')
    parser.add_argument('--input_image', type=str, dest='image')
    parser.add_argument('--jointnum', type=int, dest='joint')
    parser.add_argument('--backbone', type=str, dest='backbone')
    args = parser.parse_args()

    # test gpus
    if not args.gpu_ids:
        assert 0, print("Please set proper gpu ids")

    if not args.joint:
        assert print("please insert number of joint")

    if '-' in args.gpu_ids:
        gpus = args.gpu_ids.split('-')
        gpus[0] = 0 if not gpus[0].isdigit() else int(gpus[0])
        gpus[1] = len(mem_info()) if not gpus[1].isdigit() else int(gpus[1]) + 1
        args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
    return args

# argument parsing
args = parse_args()
cfg.set_args(args.gpu_ids)
cudnn.benchmark = True

# joint set
joint_num = args.joint
joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe')
flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) )
if joint_num == 18:
    skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )
if joint_num == 21:
    skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) )

# snapshot load
model_path = os.path.join(cfg.model_dir, 'snapshot_%d.pth.tar' % args.test_epoch)
assert osp.exists(model_path), 'Cannot find model at ' + model_path
model = get_pose_net(args.backbone, args.frontbone, False, joint_num)
model = DataParallel(model).cuda()
ckpt = torch.load(model_path)
model.load_state_dict(ckpt['network'])
model = model.module
model.eval()

# prepare input image
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)])
img_path = args.image
assert osp.exists(img_path), 'Cannot find image at ' + img_path
original_img = cv2.imread(img_path)
original_img_height, original_img_width = original_img.shape[:2]

# prepare bbox
bbox_list = [
    [139.41, 102.25, 222.39, 241.57],\
    [287.17, 61.52, 74.88, 165.61],\
    [540.04, 48.81, 99.96, 223.36],\
    [372.58, 170.84, 266.63, 217.19],\
    [0.5, 43.74, 90.1, 220.09]
] # xmin, ymin, width, height
root_depth_list = [11250.5732421875, 15522.8701171875, 11831.3828125, 8852.556640625, 12572.5966796875] # obtain this from RootNet (https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE/tree/master/demo)
assert len(bbox_list) == len(root_depth_list)
person_num = len(bbox_list)

# extractor
activation = {}
def get_activation(name):
    def hook(model, input, output):
        activation[name] = output.detach()
    return hook

for n in range(person_num):
    bbox = process_bbox(np.array(bbox_list[n]), original_img_width, original_img_height)
    img, img2bb_trans = generate_patch_image(original_img, bbox, False, 1.0, 0.0, False) 
    img = transform(img).cuda()[None,:,:,:]

    model.backbone.deonv1.register_forward_hook(get_activation('%d' % n))
    # forward
    with torch.no_grad():
        pose_3d = model(img) # x,y: pixel, z: root-relative depth (mm)

plt.figure(figsize=(32, 32))
a = activation['0'] - activation['1']
b = torch.sum(a, dim=1)
print(b)
for i in range(person_num):
    image = activation['%d'%i]
    print(image.size())
    sum_image = torch.sum(image[0], dim=0)
    print(sum_image.size())
    plt.subplot(1, person_num, i+1)
    plt.imshow(sum_image.cpu(), cmap='gray')
    plt.axis('off')

plt.show()
plt.close()


================================================
FILE: main/model.py
================================================
import torch
import torch.nn as nn
from torch.nn import functional as F
from backbone import *
from config import cfg
import os.path as osp

model_urls = {
    'MobileNetV2': 'https://download.pytorch.org/models/mobilenet_v2-b0353104.pth',
    'ResNet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
    'ResNet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
    'ResNet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
    'ResNet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
    'ResNet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
    'ResNext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth',
    'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth',
    'wide_resnet50_2': 'https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth',
    'wide_resnet101_2': 'https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth',
}

BACKBONE_DICT = {
    'LPRES':LpNetResConcat,
    'LPSKI':LpNetSkiConcat,
    'LPWO':LpNetWoConcat
    }

def soft_argmax(heatmaps, joint_num):

    heatmaps = heatmaps.reshape((-1, joint_num, cfg.depth_dim*cfg.output_shape[0]*cfg.output_shape[1]))
    heatmaps = F.softmax(heatmaps, 2)
    heatmaps = heatmaps.reshape((-1, joint_num, cfg.depth_dim, cfg.output_shape[0], cfg.output_shape[1]))

    accu_x = heatmaps.sum(dim=(2,3))
    accu_y = heatmaps.sum(dim=(2,4))
    accu_z = heatmaps.sum(dim=(3,4))

    # accu_x = accu_x * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[1]+1).type(torch.cuda.FloatTensor), devices=[accu_x.device.index])[0]
    # accu_y = accu_y * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[0]+1).type(torch.cuda.FloatTensor), devices=[accu_y.device.index])[0]
    # accu_z = accu_z * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.depth_dim+1).type(torch.cuda.FloatTensor), devices=[accu_z.device.index])[0]

    accu_x = accu_x * torch.arange(1,cfg.output_shape[1]+1)
    accu_y = accu_y * torch.arange(1,cfg.output_shape[0]+1)
    accu_z = accu_z * torch.arange(1,cfg.depth_dim+1)

    accu_x = accu_x.sum(dim=2, keepdim=True) -1
    accu_y = accu_y.sum(dim=2, keepdim=True) -1
    accu_z = accu_z.sum(dim=2, keepdim=True) -1

    coord_out = torch.cat((accu_x, accu_y, accu_z), dim=2)

    return coord_out

class CustomNet(nn.Module):
    def __init__(self, backbone, joint_num):
        super(CustomNet, self).__init__()
        self.backbone = backbone
        self.joint_num = joint_num

    def forward(self, input_img, target=None):
        fm = self.backbone(input_img)
        coord = soft_argmax(fm, self.joint_num)

        if target is None:
            return coord
        else:
            target_coord = target['coord']
            target_vis = target['vis']
            target_have_depth = target['have_depth']

            ## coordinate loss
            loss_coord = torch.abs(coord - target_coord) * target_vis
            loss_coord = (loss_coord[:,:,0] + loss_coord[:,:,1] + loss_coord[:,:,2] * target_have_depth)/3.
            return loss_coord

def get_pose_net(backbone_str, is_train, joint_num):
    INPUT_SIZE = cfg.input_shape
    EMBEDDING_SIZE = cfg.embedding_size # feature dimension
    WIDTH_MULTIPLIER = cfg.width_multiplier

    assert INPUT_SIZE == (256, 256)

    print("=" * 60)
    print("{} BackBone Generated".format(backbone_str))
    print("=" * 60)
    model = CustomNet(BACKBONE_DICT[backbone_str](input_size = INPUT_SIZE, joint_num = joint_num, embedding_size = EMBEDDING_SIZE, width_mult = WIDTH_MULTIPLIER), joint_num)
    if is_train == True:
        model.backbone.init_weights()
    return model


================================================
FILE: main/pytorch2coreml.py
================================================
import torch
import argparse
import coremltools as ct


from config import cfg
from torch.nn.parallel.data_parallel import DataParallel
from base import Transformer

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', type=str, dest='gpu_ids')
    parser.add_argument('--joint', type=int, dest='joint')
    parser.add_argument('--modelpath', type=str, dest='modelpath')
    parser.add_argument('--backbone', type=str, dest='backbone')
    args = parser.parse_args()

    # test gpus
    if not args.gpu_ids:
        assert 0, "Please set proper gpu ids"

    if '-' in args.gpu_ids:
        gpus = args.gpu_ids.split('-')
        gpus[0] = int(gpus[0])
        gpus[1] = int(gpus[1]) + 1
        args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))

    return args

args = parse_args()

# modelpath as definite path
transformer = Transformer(args.backbone, args.joint, args.modelpath)
transformer._make_model()

single_pytorch_model = transformer.model

device = torch.device('cpu')
single_pytorch_model.to(device)

dummy_input = torch.randn(1, 3, 256, 256)

traced_model = torch.jit.trace(single_pytorch_model, dummy_input)

# Convert to Core ML using the Unified Conversion API
model = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input_1", shape=dummy_input.shape)], #name "input_1" is used in 'quickstart'
)

model.save("test.mlmodel")


================================================
FILE: main/pytorch2onnx.py
================================================
import onnx
import torch
import argparse
import numpy
import imageio
import onnxruntime as ort
import tensorflow as tf

from config import cfg
from torchsummary import summary
from base import Transformer
from onnx_tf.backend import prepare

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', type=str, dest='gpu_ids')
    parser.add_argument('--joint', type=int, dest='joint')
    parser.add_argument('--modelpath', type=str, dest='modelpath')
    parser.add_argument('--backbone', type=str, dest='backbone')
    args = parser.parse_args()

    # test gpus
    if not args.gpu_ids:
        assert 0, "Please set proper gpu ids"

    if '-' in args.gpu_ids:
        gpus = args.gpu_ids.split('-')
        gpus[0] = int(gpus[0])
        gpus[1] = int(gpus[1]) + 1
        args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))

    return args

args = parse_args()

dummy_input = torch.randn(1, 3, 256, 256, device='cuda')

# modelpath as definite path
transformer = Transformer(args.backbone, args.joint, args.modelpath)
transformer._make_model()

single_pytorch_model = transformer.model

summary(single_pytorch_model, (3, 256, 256))

ONNX_PATH="../output/baseline.onnx"

torch.onnx.export(
    model=single_pytorch_model,
    args=dummy_input,
    f=ONNX_PATH, # where should it be saved
    verbose=False,
    export_params=True,
    do_constant_folding=False,  # fold constant values for optimization
    # do_constant_folding=True,   # fold constant values for optimization
    input_names=['input'],
    output_names=['output'],
    opset_version=11
)

onnx_model = onnx.load(ONNX_PATH)
onnx.checker.check_model(onnx_model)
onnx.helper.printable_graph(onnx_model.graph)

pytorch_result = single_pytorch_model(dummy_input)
pytorch_result = pytorch_result.cpu().detach().numpy()
print("pytorch_model output {}".format(pytorch_result.shape), pytorch_result)

ort_session = ort.InferenceSession(ONNX_PATH)
outputs = ort_session.run(None, {'input': dummy_input.cpu().numpy()})
outputs = numpy.array(outputs[0])
print("onnx_model ouput size{}".format(outputs.shape), outputs)

print("difference", numpy.linalg.norm(pytorch_result-outputs))

TF_PATH = "../output/baseline" # where the representation of tensorflow model will be stored

# prepare function converts an ONNX model to an internel representation
# of the computational graph called TensorflowRep and returns
# the converted representation.
tf_rep = prepare(onnx_model)  # creating TensorflowRep object

# export_graph function obtains the graph proto corresponding to the ONNX
# model associated with the backend representation and serializes
# to a protobuf file.
tf_rep.export_graph(TF_PATH)

TFLITE_PATH = "../output/baseline.tflite"

PB_PATH = "../output/baseline/saved_model.pb"

# make a converter object from the saved tensorflow file
# converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(PB_PATH, input_arrays=['input'], output_arrays=['output'])
converter = tf.lite.TFLiteConverter.from_saved_model(TF_PATH)

# tell converter which type of optimization techniques to use
# to view the best option for optimization read documentation of tflite about optimization
# go to this link https://www.tensorflow.org/lite/guide/get_started#4_optimize_your_model_optional
# converter.optimizations = [tf.compat.v1.lite.Optimize.DEFAULT]

# converter.experimental_new_converter = True
#
# # I had to explicitly state the ops
# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
#                                        tf.lite.OpsSet.SELECT_TF_OPS]

def representative_dataset():

    dataset_size = 10

    for i in range(dataset_size):
        print(i)
        data = imageio.imread("../sample_images/" + "00000" + str(i) + ".jpg")
        data = numpy.resize(data, [1, 3, 256, 256])
        yield [data.astype(numpy.float32)]


converter.experimental_new_converter = True
converter.experimental_new_quantizer = True

converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8

# input_arrays = converter.get_input_arrays()
# converter.quantized_input_stats = {input_arrays[0]: (0.0, 1.0)}

tf_lite_model = converter.convert()
# Save the model.
with open(TFLITE_PATH, 'wb') as f:
    f.write(tf_lite_model)


================================================
FILE: main/summary.py
================================================
import torch
import argparse
import os
import os.path as osp
import torch.backends.cudnn as cudnn
from torchsummary import summary
from torch.nn.parallel.data_parallel import DataParallel
from config import cfg
from model import get_pose_net
from thop import profile
from thop import clever_format
from ptflops import get_model_complexity_info

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', type=str, dest='gpu_ids')
    parser.add_argument('--epoch', type=int, dest='test_epoch')
    parser.add_argument('--jointnum', type=int, dest='joint')
    parser.add_argument('--backbone', type=str, dest='backbone')
    args = parser.parse_args()

    # test gpus
    if not args.gpu_ids:
        assert 0, print("Please set proper gpu ids")

    if not args.joint:
        assert print("please insert number of joint")

    if '-' in args.gpu_ids:
        gpus = args.gpu_ids.split('-')
        gpus[0] = 0 if not gpus[0].isdigit() else int(gpus[0])
        gpus[1] = len(mem_info()) if not gpus[1].isdigit() else int(gpus[1]) + 1
        args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
    return args

# argument parsing
args = parse_args()
cfg.set_args(args.gpu_ids)
cudnn.benchmark = True

# joint set
joint_num = args.joint
joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe')
flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) )
if joint_num == 18:
    skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )
if joint_num == 21:
    skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) )

# snapshot load
model_path = os.path.join(cfg.model_dir, 'snapshot_%d.pth.tar' % args.test_epoch)
assert osp.exists(model_path), 'Cannot find model at ' + model_path
model = get_pose_net(args.backbone, args.frontbone, False, joint_num)
model = DataParallel(model).cuda()
ckpt = torch.load(model_path)
model.load_state_dict(ckpt['network'])

single_model = model.module

summary(single_model, (3, 256, 256))

input = torch.randn(1, 3, 256, 256).cuda()
macs, params = profile(single_model, inputs=(input,))
macs, params = clever_format([macs, params], "%.3f")
flops, params1 = get_model_complexity_info(single_model, (3, 256, 256),as_strings=True, print_per_layer_stat=False)
print('{:<30}  {:<8}'.format('Computational complexity: ', flops))
print('{:<30}  {:<8}'.format('Computational complexity: ', macs))
print('{:<30}  {:<8}'.format('Number of parameters: ', params))
print('{:<30}  {:<8}'.format('Number of parameters: ', params1))


================================================
FILE: main/test.py
================================================
import argparse
from tqdm import tqdm
import numpy as np
import cv2
from config import cfg
import torch
from base import Tester
from utils.vis import vis_keypoints
from utils.pose_utils import flip
import torch.backends.cudnn as cudnn

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', type=str, dest='gpu_ids')
    parser.add_argument('--epochs', type=str, dest='model')
    parser.add_argument('--backbone', type=str, dest='backbone')
    args = parser.parse_args()

    # test gpus
    if not args.gpu_ids:
        assert 0, "Please set proper gpu ids"

    if '-' in args.gpu_ids:
        gpus = args.gpu_ids.split('-')
        gpus[0] = int(gpus[0])
        gpus[1] = int(gpus[1]) + 1
        args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))

    if '-' in args.model:
        model_epoch = args.model.split('-')
        model_epoch[0] = int(model_epoch[0])
        model_epoch[1] = int(model_epoch[1]) + 1
        args.model_epoch = model_epoch

    return args

def main():

    args = parse_args()
    cfg.set_args(args.gpu_ids)
    cudnn.fastest = True
    cudnn.benchmark = True
    cudnn.deterministic = False
    cudnn.enabled = True

    tester = Tester(args.backbone)
    tester._make_batch_generator()

    for epoch in range(args.model_epoch[0], args.model_epoch[1]):

        tester._make_model(epoch)

        preds = []

        with torch.no_grad():
            for itr, input_img in enumerate(tqdm(tester.batch_generator)):

                # forward
                coord_out = tester.model(input_img)

                if cfg.flip_test:
                    flipped_input_img = flip(input_img, dims=3)
                    flipped_coord_out = tester.model(flipped_input_img)
                    flipped_coord_out[:, :, 0] = cfg.output_shape[1] - flipped_coord_out[:, :, 0] - 1
                    for pair in tester.flip_pairs:
                        flipped_coord_out[:, pair[0], :], flipped_coord_out[:, pair[1], :] = flipped_coord_out[:, pair[1], :].clone(), flipped_coord_out[:, pair[0], :].clone()
                    coord_out = (coord_out + flipped_coord_out)/2.

                vis = False
                if vis:
                    filename = str(itr)
                    tmpimg = input_img[0].cpu().numpy()
                    tmpimg = tmpimg * np.array(cfg.pixel_std).reshape(3,1,1) + np.array(cfg.pixel_mean).reshape(3,1,1)
                    tmpimg = tmpimg.astype(np.uint8)
                    tmpimg = tmpimg[::-1, :, :]
                    tmpimg = np.transpose(tmpimg,(1,2,0)).copy()
                    tmpkps = np.zeros((3,tester.joint_num))
                    tmpkps[:2,:] = coord_out[0,:,:2].cpu().numpy().transpose(1,0) / cfg.output_shape[0] * cfg.input_shape[0]
                    tmpkps[2,:] = 1
                    tmpimg = vis_keypoints(tmpimg, tmpkps, tester.skeleton)
                    cv2.imwrite(filename + '_output.jpg', tmpimg)

                coord_out = coord_out.cpu().numpy()
                preds.append(coord_out)

        # evaluate
        preds = np.concatenate(preds, axis=0)
        tester._evaluate(preds, cfg.result_dir)

if __name__ == "__main__":
    main()


================================================
FILE: main/time.py
================================================
import torch
import argparse
from base import Transformer

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', type=str, dest='gpu_ids')
    parser.add_argument('--joint', type=int, dest='joint')
    parser.add_argument('--modelpath', type=str, dest='modelpath')
    parser.add_argument('--backbone', type=str, dest='backbone')
    args = parser.parse_args()

    # test gpus
    if not args.gpu_ids:
        assert 0, "Please set proper gpu ids"

    if '-' in args.gpu_ids:
        gpus = args.gpu_ids.split('-')
        gpus[0] = int(gpus[0])
        gpus[1] = int(gpus[1]) + 1
        args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))

    return args

args = parse_args()

optimal_batch_size = 64

transformer = Transformer(args.backbone, args.joint, args.modelpath)
transformer._make_model()

model = transformer.model

device = torch.device("cuda")

dummy_input = torch.randn(optimal_batch_size, 3, 256, 256, dtype=torch.float).to(device)

repetitions=100
total_time = 0

with torch.no_grad():
    for rep in range(repetitions):
        starter, ender = torch.cuda.Event(enable_timing=True),   torch.cuda.Event(enable_timing=True)
        starter.record()
        _ = model(dummy_input)
        ender.record()
        torch.cuda.synchronize()
        curr_time = starter.elapsed_time(ender)/1000
        total_time += curr_time
Throughput = (repetitions*optimal_batch_size)/total_time
print('Final Throughput:',Throughput)

================================================
FILE: main/train.py
================================================
import argparse
from config import cfg
from tqdm import tqdm
import os.path as osp
import numpy as np
import torch
from base import Trainer
from utils.pose_utils import flip
import torch.backends.cudnn as cudnn


def main():
    
    # argument parse and create log
    cudnn.fastest = True
    cudnn.benchmark = True

    trainer = Trainer(cfg)
    trainer._make_batch_generator()
    trainer._make_model()

    # train
    for epoch in range(trainer.start_epoch, cfg.end_epoch):
        
        trainer.set_lr(epoch)
        trainer.tot_timer.tic()
        trainer.read_timer.tic()

        for itr, (input_img, joint_img, joint_vis, joints_have_depth) in enumerate(trainer.batch_generator):
            trainer.read_timer.toc()
            trainer.gpu_timer.tic()

            # forward
            trainer.optimizer.zero_grad()
            target = {'coord': joint_img, 'vis': joint_vis, 'have_depth': joints_have_depth}
            loss_coord = trainer.model(input_img, target)
            loss_coord = loss_coord.mean()

            # backward
            loss = loss_coord
            loss.backward()
            trainer.optimizer.step()
            
            trainer.gpu_timer.toc()
            screen = [
                'Epoch %d/%d itr %d/%d:' % (epoch, cfg.end_epoch, itr, trainer.itr_per_epoch),
                'lr: %g' % (trainer.get_lr()),
                'speed: %.2f(%.2fs r%.2f)s/itr' % (
                    trainer.tot_timer.average_time, trainer.gpu_timer.average_time, trainer.read_timer.average_time),
                '%.2fh/epoch' % (trainer.tot_timer.average_time / 3600. * trainer.itr_per_epoch),
                '%s: %.4f' % ('loss_coord', loss_coord.detach()),
                ]
            trainer.logger.info(' '.join(screen))
            trainer.tot_timer.toc()
            trainer.tot_timer.tic()
            trainer.read_timer.tic()

        trainer.save_model({
            'epoch': epoch,
            'network': trainer.model.state_dict(),
            'optimizer': trainer.optimizer.state_dict(),
        }, epoch)

if __name__ == "__main__":
    main()


================================================
FILE: requirements.txt
================================================
numpy
tqdm
torch
torchvision
torchsummary
opencv-python
matplotlib
pycocotools
scipy


================================================
FILE: tool/Human36M/README.MD
================================================
## Human3.6M dataset pre-processing code

You should run the matlab code first, and the python code will convert the output of the matlab code to the json files.
**You don't have to run this when you downloaded json files from the google drive.** This is to make json files from raw data.


================================================
FILE: tool/Human36M/h36m2coco.py
================================================
import os
import os.path as osp
import scipy.io as sio
import numpy as np
import cv2
import random
import json
import math
from tqdm import tqdm

root_dir = './images' # define path here
save_dir = './annotations' # define path here

joint_num = 17
subject_list = [1, 5, 6, 7, 8, 9, 11]
action_idx = (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
subaction_idx = (1, 2)
camera_idx = (1, 2, 3, 4)
action_name = ['Directions', 'Discussion', 'Eating', 'Greeting', 'Phoning', 'Posing', 'Purchases', 'Sitting', 'SittingDown', 'Smoking', 'Photo', 'Waiting', 'Walking', 'WalkDog', 'WalkTogether']

def load_h36m_annot_file(annot_file):
    data = sio.loadmat(annot_file)
    joint_world = data['pose3d_world'] # 3D world coordinates of keypoints
    R = data['R'] # extrinsic
    T = np.reshape(data['T'],(3)) # extrinsic
    f = np.reshape(data['f'],(-1)) # focal legnth
    c = np.reshape(data['c'],(-1)) # principal points
    img_heights = np.reshape(data['img_height'],(-1))
    img_widths = np.reshape(data['img_width'],(-1))
   
    return joint_world, R, T, f, c, img_widths, img_heights

def _H36FolderName(subject_id, act_id, subact_id, camera_id):
    return "s_%02d_act_%02d_subact_%02d_ca_%02d" % \
           (subject_id, act_id, subact_id, camera_id)

def _H36ImageName(folder_name, frame_id):
    return "%s_%06d.jpg" % (folder_name, frame_id + 1)

def cam2pixel(cam_coord, f, c):
    x = cam_coord[..., 0] / cam_coord[..., 2] * f[0] + c[0]
    y = cam_coord[..., 1] / cam_coord[..., 2] * f[1] + c[1]
    return x,y

def world2cam(world_coord, R, t):
    cam_coord = np.dot(R, world_coord - t)
    return cam_coord

def get_bbox(joint_img):
    bbox = np.zeros((4))
    xmin = np.min(joint_img[:,0])
    ymin = np.min(joint_img[:,1])
    xmax = np.max(joint_img[:,0])
    ymax = np.max(joint_img[:,1])
    width = xmax - xmin - 1
    height = ymax - ymin - 1
    
    bbox[0] = (xmin + xmax)/2. - width/2*1.2
    bbox[1] = (ymin + ymax)/2. - height/2*1.2
    bbox[2] = width*1.2
    bbox[3] = height*1.2

    return bbox

img_id = 0; annot_id = 0
for subject in tqdm(subject_list):
    cam_param = {}
    joint_3d = {}
    images = []; annotations = [];
    for aid in tqdm(action_idx):
        for said in tqdm(subaction_idx):
            for cid in tqdm(camera_idx):
                folder = _H36FolderName(subject,aid,said,cid)
                if folder == 's_11_act_02_subact_02_ca_01':
                    continue
               
                joint_world, R, t, f, c, img_widths, img_heights = load_h36m_annot_file(osp.join(root_dir, folder, 'h36m_meta.mat'))

                if str(aid) not in joint_3d:
                    joint_3d[str(aid)] = {}
                if str(said) not in joint_3d[str(aid)]:
                    joint_3d[str(aid)][str(said)] = {}

                img_num = np.shape(joint_world)[0]
                for n in range(img_num):
                    img_dict = {}
                    img_dict['id'] = img_id
                    img_dict['file_name'] = osp.join(folder, _H36ImageName(folder, n))
                    img_dict['width'] = int(img_widths[n])
                    img_dict['height'] = int(img_heights[n])
                    img_dict['subject'] = subject
                    img_dict['action_name'] = action_name[aid-2]
                    img_dict['action_idx'] = aid
                    img_dict['subaction_idx'] = said
                    img_dict['cam_idx'] = cid
                    img_dict['frame_idx'] = n
                    images.append(img_dict)
                    
                    if str(cid) not in cam_param:
                        cam_param[str(cid)] = {'R': R.tolist(), 't': t.tolist(), 'f': f.tolist(), 'c': c.tolist()}
                    if str(n) not in joint_3d[str(aid)][str(said)]:
                        joint_3d[str(aid)][str(said)][str(n)] = joint_world[n].tolist()

                    annot_dict = {}
                    annot_dict['id'] = annot_id
                    annot_dict['image_id'] = img_id

                    # project world coordinate to cam, image coordinate space
                    joint_cam = np.zeros((joint_num,3))
                    for j in range(joint_num):
                        joint_cam[j] = world2cam(joint_world[n][j], R, t)
                    joint_img = np.zeros((joint_num,2))
                    joint_img[:,0], joint_img[:,1] = cam2pixel(joint_cam, f, c)
                    joint_vis = (joint_img[:,0] >= 0) * (joint_img[:,0] < img_widths[n]) * (joint_img[:,1] >= 0) * (joint_img[:,1] < img_heights[n])
                    annot_dict['keypoints_vis'] = joint_vis.tolist()
                    
                    bbox = get_bbox(joint_img)
                    annot_dict['bbox'] = bbox.tolist() # xmin, ymin, width, height
                    annotations.append(annot_dict)

                    img_id += 1
                    annot_id += 1
    
    data = {'images': images, 'annotations': annotations}
    with open(osp.join(save_dir, 'Human36M_subject' + str(subject) + '_data.json'), 'w') as f:
        json.dump(data, f)    
    with open(osp.join(save_dir, 'Human36M_subject' + str(subject) + '_camera.json'), 'w') as f:
        json.dump(cam_param, f)
    with open(osp.join(save_dir, 'Human36M_subject' + str(subject) + '_joint_3d.json'), 'w') as f:
        json.dump(joint_3d, f)


================================================
FILE: tool/Human36M/preprocess_h36m.m
================================================
% Preprocess human3.6m dataset
% Place this file to the Release-v1.1 folder and run it

function preprocess_h36m()

    close all;
    %clear;
    %clc;

    addpaths;

    %--------------------------------------------------------------------------
    % PARAMETERS

    % Subject (1, 5, 6, 7, 8, 9, 11)
    SUBJECT = [1 5 6 7 8 9 11];
     
    % Action (2 ~ 16)
    ACTION = 2:16;
    
    % Subaction (1 ~ 2)
    SUBACTION = 1:2;
    
    % Camera (1 ~ 4)
    CAMERA = 1:4;
    
    num_joint = 17;
    root_dir = '.'; % define path here
    
    % if rgb sequence is declared in the loop, it causes stuck (do not know
    % reason)
    rgb_sequence = cell(1,100000000);
    COUNT = 1;
    %--------------------------------------------------------------------------
    % MAIN LOOP
    % For each subject, action, subaction, and camera..
    for subject = SUBJECT
        for action = ACTION
            for subaction = SUBACTION
                for camera = CAMERA

                    fprintf('Processing subject %d, action %d, subaction %d, camera %d..\n', ...
                        subject, action, subaction, camera);

                    img_save_dir = sprintf('%s/images/s_%02d_act_%02d_subact_%02d_ca_%02d', ...
                        root_dir, subject, action, subaction, camera);
                    if ~exist(img_save_dir, 'dir')
                        mkdir(img_save_dir);
                    end

                    mask_save_dir = sprintf('%s/masks/s_%02d_act_%02d_subact_%02d_ca_%02d', ...
                        root_dir, subject, action, subaction, camera);
                    if ~exist(mask_save_dir, 'dir')
                        mkdir(mask_save_dir);
                    end

                    annot_save_dir = sprintf('%s/annotations/s_%02d_act_%02d_subact_%02d_ca_%02d', ...
                        root_dir, subject, action, subaction, camera);
                    if ~exist(annot_save_dir, 'dir')
                        mkdir(annot_save_dir);
                    end

                    if (subject==11) && (action==2) && (subaction==2) && (camera==1)
                        fprintf('There is an error in subject 11, action 2, subaction 2, and camera 1\n');
                        continue;
                    end
                    
                    % Select sequence
                    Sequence = H36MSequence(subject, action, subaction, camera);

                    % Get 3D pose and 2D pose
                    Features{1} = H36MPose3DPositionsFeature(); % 3D world coordinates
                    Features{1}.Part = 'body'; % Only consider 17 joints
                    Features{2} = H36MPose3DPositionsFeature('Monocular', true); % 3D camera coordinates
                    Features{2}.Part = 'body'; % Only consider 17 joints
                    Features{3} = H36MPose2DPositionsFeature(); % 2D image coordinates
                    Features{3}.Part = 'body'; % Only consider 17 joints
                    F = H36MComputeFeatures(Sequence, Features);
                    num_frame = Sequence.NumFrames;
                    pose3d_world = reshape(F{1}, num_frame, 3, num_joint);
                    pose3d = reshape(F{2}, num_frame, 3, num_joint);
                    pose2d = reshape(F{3}, num_frame, 2, num_joint);

                    % Camera (in global coordinate)
                    Camera = Sequence.getCamera();

                    % Sanity check
                    if false
                        R = Camera.R; % rotation matrix
                        T = Camera.T'; % origin of the world coord system
                        K = [Camera.f(1)    0           Camera.c(1);
                            0              Camera.f(2) Camera.c(2);
                            0              0           1]; % f: focal length, c: principal points
                        error = 0;
                        for i = 1:num_frame
                            X = squeeze(pose3d_global(i,:,:));
                            x = squeeze(pose2d(i,:,:));
                            px = K*R*(X-T);
                            px = px ./ px(3,:);
                            px = px(1:2,:);
                            error = error + mean(sqrt(sum((px-x).^2, 1)));
                        end
                        error = error / num_frame;
                        fprintf('reprojection error = %.2f (pixels)\n', error);
                        keyboard;
                    end

                    %% Image, bounding box for each sampled frame
                    fprintf('Load RGB video: ');
                    rgb_extractor = H36MRGBVideoFeature();
                    rgb_sequence{COUNT} = rgb_extractor.serializer(Sequence);
                    fprintf('Done!!\n');
                    img_height = zeros(num_frame,1);
                    img_width = zeros(num_frame,1);

                    fprintf('Load mask video: ');
                    mask_extractor = H36MMyBGMask();
                    mask_sequence = mask_extractor.serializer(Sequence);
                    fprintf('Done!!\n');


                    % For each frame,
                    for i = 1:num_frame
                        if mod(i,100) == 1
                            fprintf('.');
                        end
                       
                        % Save image
                        % Get data
                        img = rgb_sequence{COUNT}.getFrame(i);  
                        [h, w, c] = size(img);
                        img_height(i) = h;
                        img_width(i) = w;
                        img_name = sprintf('%s/s_%02d_act_%02d_subact_%02d_ca_%02d_%06d.jpg', ...
                            img_save_dir, subject, action, subaction, camera, i);
                        %imwrite(img, img_name);

                        mask = mask_sequence.Buffer{i};
                        mask_name = sprintf('%s/s_%02d_act_%02d_subact_%02d_ca_%02d_%06d.jpg', ...
                            mask_save_dir, subject, action, subaction, camera, i);
                        imwrite(mask, mask_name);
                        
                    end
                    
                    COUNT = COUNT + 1;
                    
                    % Save data
                    pose3d_world = permute(pose3d_world,[1,3,2]); % world coordinate 3D keypoint coordinates
                    R = Camera.R; % rotation matrix
                    T = Camera.T; % origin of the world coord system
                    f = Camera.f; % focal length
                    c = Camera.c; % principal points
                    filename = sprintf('%s/h36m_meta.mat', annot_save_dir);
                    %save(filename, 'pose3d_world', 'f', 'c', 'R', 'T', 'img_height', 'img_width');
                    
                    fprintf('\n');
                    
                end
            end
        end
    end

end


================================================
FILE: vis/coco_img_name.py
================================================
import os
import os.path as osp
import scipy.io as sio
import numpy as np
from pycocotools.coco import COCO
import json
import cv2
import random
import math

annot_path = osp.join('coco', 'person_keypoints_val2017.json')

data = []
db = COCO(annot_path)
fp = open('coco_img_name.txt','w') 
for iid in db.imgs.keys():
    img = db.imgs[iid]
    imgname = img['file_name']
    imgname = 'coco_' + imgname.split('.')[0]
    fp.write(imgname + '\n')
fp.close()


================================================
FILE: vis/multi/draw_2Dskeleton.m
================================================
function img = draw_2Dskeleton(img_name, pred_2d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton)
 
    img = imread(img_name);
    [imgHeight, imgWidth, dim] = size(image);

    f = figure;
    set(f, 'visible', 'off');
    imshow(img);
    hold on;
    line_width = 4;
    
    num_skeleton = size(skeleton,1);

    num_pred = size(pred_2d_kpt,1);
    for i = 1:num_pred
        for j =1:num_skeleton
            k1 = skeleton(j,1);
            k2 = skeleton(j,2);
            plot([pred_2d_kpt(i,k1,1),pred_2d_kpt(i,k2,1)],[pred_2d_kpt(i,k1,2),pred_2d_kpt(i,k2,2)],'Color',colorList_skeleton(j,:),'LineWidth',line_width);
        end
        for j=1:num_joint
            scatter(pred_2d_kpt(i,j,1),pred_2d_kpt(i,j,2),100,colorList_joint(j,:),'filled');
        end
    end
    
    set(gca,'Units','normalized','Position',[0 0 1 1]);  %# Modify axes size

    frame = getframe(gcf);
    img = frame.cdata;
    
    hold off;
    close(f); 

end


================================================
FILE: vis/multi/draw_3Dpose_coco.m
================================================
function draw_3Dpose_coco()
 
    root_path = '/mnt/hdd1/Data/Human_pose_estimation/COCO/2017/val2017/';
    save_path = './vis/';
    num_joint =  17;

    colorList_skeleton = [
    255/255 128/255 0/255;
    255/255 153/255 51/255;
    255/255 178/255 102/255;
    230/255 230/255 0/255;

    255/255 153/255 255/255;
    153/255 204/255 255/255;

    255/255 102/255 255/255;
    255/255 51/255 255/255;

    102/255 178/255 255/255;
    51/255 153/255 255/255;

    255/255 153/255 153/255;
    255/255 102/255 102/255;
    255/255 51/255 51/255;

    153/255 255/255 153/255;
    102/255 255/255 102/255;
    51/255 255/255 51/255;
    ];
    colorList_joint = [
    255/255 128/255 0/255;
    255/255 153/255 51/255;
    255/255 153/255 153/255;
    255/255 102/255 102/255;
    255/255 51/255 51/255;
    153/255 255/255 153/255;
    102/255 255/255 102/255;
    51/255 255/255 51/255;
    255/255 153/255 255/255;
    255/255 102/255 255/255;
    255/255 51/255 255/255;
    153/255 204/255 255/255;
    102/255 178/255 255/255;
    51/255 153/255 255/255;
    230/255 230/255 0/255;
    230/255 230/255 0/255;
    255/255 178/255 102/255;

    ];
    skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ];
    skeleton = transpose(reshape(skeleton,[2,16])) + 1;

    fp_img_name = fopen('../coco_img_name.txt');
    preds_2d_kpt = load('preds_2d_kpt_coco.mat');
    preds_3d_kpt = load('preds_3d_kpt_coco.mat');

    img_name = fgetl(fp_img_name);
    while ischar(img_name)
        
        if isfield(preds_2d_kpt,img_name)
            pred_2d_kpt = getfield(preds_2d_kpt,img_name);
            pred_3d_kpt = getfield(preds_3d_kpt,img_name);
            
            img_name = strsplit(img_name,'_'); 
            img_name = strcat(img_name{2},'.jpg');
            img_path = strcat(root_path,img_name);
            
            %img = draw_2Dskeleton(img_path,pred_2d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton);
            img = imread(img_path);
            f = draw_3Dskeleton(img,pred_3d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton);
            
            set(gcf, 'InvertHardCopy', 'off');
            set(gcf,'color','w');
            mkdir(save_path);
            saveas(f, strcat(save_path,img_name));
            close(f);
        end

        img_name = fgetl(fp_img_name);
    end
        
end


================================================
FILE: vis/multi/draw_3Dpose_mupots.m
================================================
function draw_3Dpose_mupots()
 
    root_path = '/mnt/hdd1/Data/Human_pose_estimation/MU/mupots-3d-eval/MultiPersonTestSet/';
    save_path = './vis/';
    num_joint =  17;

    colorList_skeleton = [
    255/255 128/255 0/255;
    255/255 153/255 51/255;
    255/255 178/255 102/255;
    230/255 230/255 0/255;

    255/255 153/255 255/255;
    153/255 204/255 255/255;

    255/255 102/255 255/255;
    255/255 51/255 255/255;

    102/255 178/255 255/255;
    51/255 153/255 255/255;

    255/255 153/255 153/255;
    255/255 102/255 102/255;
    255/255 51/255 51/255;

    153/255 255/255 153/255;
    102/255 255/255 102/255;
    51/255 255/255 51/255;
    ];
    colorList_joint = [
    255/255 128/255 0/255;
    255/255 153/255 51/255;
    255/255 153/255 153/255;
    255/255 102/255 102/255;
    255/255 51/255 51/255;
    153/255 255/255 153/255;
    102/255 255/255 102/255;
    51/255 255/255 51/255;
    255/255 153/255 255/255;
    255/255 102/255 255/255;
    255/255 51/255 255/255;
    153/255 204/255 255/255;
    102/255 178/255 255/255;
    51/255 153/255 255/255;
    230/255 230/255 0/255;
    230/255 230/255 0/255;
    255/255 178/255 102/255;

    ];
    skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ];
    skeleton = transpose(reshape(skeleton,[2,16])) + 1;

    fp_img_name = fopen('../mupots_img_name.txt');
    preds_2d_kpt = load('preds_2d_kpt_mupots.mat');
    preds_3d_kpt = load('preds_3d_kpt_mupots.mat');

    img_name = fgetl(fp_img_name);
    while ischar(img_name)
        img_name_split = strsplit(img_name);
        folder_id = str2double(img_name_split(1)); frame_id = str2double(img_name_split(2));
        img_name = sprintf('TS%d/img_%06d.jpg',folder_id, frame_id);
        img_path = strcat(root_path,img_name);

        pred_2d_kpt = getfield(preds_2d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id));
        pred_3d_kpt = getfield(preds_3d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id));

        %img = draw_2Dskeleton(img_path,pred_2d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton);
        img = imread(img_path);
        f = draw_3Dskeleton(img,pred_3d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton);

        set(gcf, 'InvertHardCopy', 'off');
        set(gcf,'color','w');
        mkdir(strcat(save_path,sprintf('TS%d',folder_id)));
        saveas(f, strcat(save_path,img_name));
        close(f);

        img_name = fgetl(fp_img_name);
    end
        
end


================================================
FILE: vis/multi/draw_3Dskeleton.m
================================================
function f = draw_3Dskeleton(img, pred_3d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton)
 
    x = pred_3d_kpt(:,:,1);
    y = pred_3d_kpt(:,:,2);
    z = pred_3d_kpt(:,:,3);
    pred_3d_kpt(:,:,1) = -z;
    pred_3d_kpt(:,:,2) = x;
    pred_3d_kpt(:,:,3) = -y;

    [imgHeight, imgWidth, dim] = size(img);
    
    figure_height = 450;
    figure_width = figure_height / imgHeight * imgWidth;
    f = figure('Position',[100 100 figure_width figure_height]);
    set(f, 'visible', 'off');
    hold on;
    grid on;
    line_width = 4;
    point_width = 50;
 
    num_skeleton = size(skeleton,1);

    num_pred = size(pred_3d_kpt,1);
    for i = 1:num_pred
        for j =1:num_skeleton
            k1 = skeleton(j,1);
            k2 = skeleton(j,2);

            plot3([pred_3d_kpt(i,k1,1),pred_3d_kpt(i,k2,1)],[pred_3d_kpt(i,k1,2),pred_3d_kpt(i,k2,2)],[pred_3d_kpt(i,k1,3),pred_3d_kpt(i,k2,3)],'Color',colorList_skeleton(j,:),'LineWidth',line_width);
        end
        for j=1:num_joint
            scatter3(pred_3d_kpt(i,j,1),pred_3d_kpt(i,j,2),pred_3d_kpt(i,j,3),point_width,colorList_joint(j,:),'filled');
        end
    end
   
    set(gca, 'color', [255/255 255/255 255/255]);
    set(gca,'XTickLabel',[]);
    set(gca,'YTickLabel',[]);
    set(gca,'ZTickLabel',[]);
    
    x = pred_3d_kpt(:,:,1);
    xmin = min(x(:)) - 120000;
    xmax = max(x(:)) + 6000;
    
    y = pred_3d_kpt(:,:,2);
    ymin = min(y(:));
    ymax = max(y(:));

    z = pred_3d_kpt(:,:,3);
    zmin = min(z(:));
    zmax = max(z(:));
    
    xlim([xmin xmax]);
    ylim([ymin ymax]);
    zlim([zmin zmax]);
    
    h_img = surf([xmin;xmin],[ymin ymax;ymin ymax],[zmax zmax;zmin zmin],'CData',img,'FaceColor','texturemap');
    set(h_img);
    
    view(62,27);
end


================================================
FILE: vis/mupots_img_name.py
================================================
import os
import os.path as osp
import scipy.io as sio
import numpy as np
from pycocotools.coco import COCO
import json
import cv2
import random
import math

annot_path = osp.join('mupots', 'MuPoTS-3D.json')

data = []
db = COCO(annot_path)
fp = open('mupots_img_name.txt','w') 
for iid in db.imgs.keys():
    img = db.imgs[iid]
    imgname = img['file_name'].split('/')
    folder_id = int(imgname[0][2:])
    frame_id = int(imgname[1].split('.')[0][4:])
    fp.write(str(folder_id) + ' ' + str(frame_id) + '\n')
fp.close()


================================================
FILE: vis/single/draw_2Dskeleton.m
================================================
function img = draw_2Dskeleton(img_name, pred_2d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton)
 
    img = imread(img_name);
    pred_2d_kpt = squeeze(pred_2d_kpt);

    f = figure;
    set(f, 'visible', 'off');
    imshow(img);
    hold on;
    line_width = 4;
    
    num_skeleton = size(skeleton,1);
    for j =1:num_skeleton
        k1 = skeleton(j,1);
        k2 = skeleton(j,2);
        plot([pred_2d_kpt(k1,1),pred_2d_kpt(k2,1)],[pred_2d_kpt(k1,2),pred_2d_kpt(k2,2)],'Color',colorList_skeleton(j,:),'LineWidth',line_width);
    end
    for j=1:num_joint
        scatter(pred_2d_kpt(j,1),pred_2d_kpt(j,2),100,colorList_joint(j,:),'filled');
    end
    
    set(gca,'Units','normalized','Position',[0 0 1 1]);  %# Modify axes size

    frame = getframe(gcf);
    img = frame.cdata;
    
    hold off;
    close(f); 

end


================================================
FILE: vis/single/draw_3Dpose_coco.m
================================================
function draw_3Dpose_coco()
    
    root_path = '/mnt/hdd1/Data/Human_pose_estimation/COCO/2017/val2017/';
    save_path = './vis/';
    num_joint =  17;
    mkdir(save_path);

    colorList_skeleton = [
    255/255 128/255 0/255;
    255/255 153/255 51/255;
    255/255 178/255 102/255;
    230/255 230/255 0/255;

    255/255 153/255 255/255;
    153/255 204/255 255/255;

    255/255 102/255 255/255;
    255/255 51/255 255/255;

    102/255 178/255 255/255;
    51/255 153/255 255/255;

    255/255 153/255 153/255;
    255/255 102/255 102/255;
    255/255 51/255 51/255;

    153/255 255/255 153/255;
    102/255 255/255 102/255;
    51/255 255/255 51/255;
    ];
    colorList_joint = [
    255/255 128/255 0/255;
    255/255 153/255 51/255;
    255/255 153/255 153/255;
    255/255 102/255 102/255;
    255/255 51/255 51/255;
    153/255 255/255 153/255;
    102/255 255/255 102/255;
    51/255 255/255 51/255;
    255/255 153/255 255/255;
    255/255 102/255 255/255;
    255/255 51/255 255/255;
    153/255 204/255 255/255;
    102/255 178/255 255/255;
    51/255 153/255 255/255;
    230/255 230/255 0/255;
    230/255 230/255 0/255;
    255/255 178/255 102/255;

    ];
    skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ];
    skeleton = transpose(reshape(skeleton,[2,16])) + 1;

    fp_img_name = fopen('../coco_img_name.txt');
    preds_2d_kpt = load('preds_2d_kpt_coco.mat');
    preds_3d_kpt = load('preds_3d_kpt_coco.mat');
    
    img_name = fgetl(fp_img_name);
    while ischar(img_name)
        if isfield(preds_2d_kpt,img_name)
            pred_2d_kpt = getfield(preds_2d_kpt,img_name);
            pred_3d_kpt = getfield(preds_3d_kpt,img_name);
            
            img_name = strsplit(img_name,'_');
            img_name = strcat(img_name{2},'.jpg');
            img_path = strcat(root_path,img_name);
 
            num_pred = size(pred_2d_kpt,1);
            for i = 1:num_pred

                img = draw_2Dskeleton(img_path,pred_2d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton);
                save_name = strsplit(img_name,'.');
                save_name = save_name{1};
                save_name = strcat(save_name,sprintf('_%d_2d.jpg',i));
                disp(strcat(save_path,save_name));
                imwrite(img,strcat(save_path,save_name));

                f = draw_3Dskeleton(pred_3d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton);
                set(gcf, 'InvertHardCopy', 'off');
                set(gcf,'color','w');
                save_name = strsplit(img_name,'.');
                save_name = save_name{1};
                save_name = strcat(save_name,sprintf('_%d_3d.jpg',i));
                saveas(f, strcat(save_path,save_name));
                close(f);
            end
           
        end

        img_name = fgetl(fp_img_name);
    end
        
end


================================================
FILE: vis/single/draw_3Dpose_mupots.m
================================================
function draw_3Dpose_mupots()
 
    root_path = '/mnt/hdd1/Data/Human_pose_estimation/MU/mupots-3d-eval/MultiPersonTestSet/';
    save_path = './vis/';
    num_joint =  17;

    colorList_skeleton = [
    255/255 128/255 0/255;
    255/255 153/255 51/255;
    255/255 178/255 102/255;
    230/255 230/255 0/255;

    255/255 153/255 255/255;
    153/255 204/255 255/255;

    255/255 102/255 255/255;
    255/255 51/255 255/255;

    102/255 178/255 255/255;
    51/255 153/255 255/255;

    255/255 153/255 153/255;
    255/255 102/255 102/255;
    255/255 51/255 51/255;

    153/255 255/255 153/255;
    102/255 255/255 102/255;
    51/255 255/255 51/255;
    ];
    colorList_joint = [
    255/255 128/255 0/255;
    255/255 153/255 51/255;
    255/255 153/255 153/255;
    255/255 102/255 102/255;
    255/255 51/255 51/255;
    153/255 255/255 153/255;
    102/255 255/255 102/255;
    51/255 255/255 51/255;
    255/255 153/255 255/255;
    255/255 102/255 255/255;
    255/255 51/255 255/255;
    153/255 204/255 255/255;
    102/255 178/255 255/255;
    51/255 153/255 255/255;
    230/255 230/255 0/255;
    230/255 230/255 0/255;
    255/255 178/255 102/255;

    ];
    skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ];
    skeleton = transpose(reshape(skeleton,[2,16])) + 1;

    fp_img_name = fopen('../mupots_img_name.txt');
    preds_2d_kpt = load('preds_2d_kpt_mupots.mat');
    preds_3d_kpt = load('preds_3d_kpt_mupots.mat');

    img_name = fgetl(fp_img_name);
    while ischar(img_name)
        img_name_split = strsplit(img_name);
        folder_id = str2double(img_name_split(1)); frame_id = str2double(img_name_split(2));
        img_name = sprintf('TS%d/img_%06d.jpg',folder_id, frame_id);
        img_path = strcat(root_path,img_name);
        mkdir(strcat(save_path,sprintf('TS%d',folder_id)));

        pred_2d_kpt = getfield(preds_2d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id));
        pred_3d_kpt = getfield(preds_3d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id));
        
        num_pred = size(pred_2d_kpt,1);
        for i = 1:num_pred

            img = draw_2Dskeleton(img_path,pred_2d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton);
            save_name = sprintf('TS%d/img_%06d_%d_2d.jpg',folder_id, frame_id, i);
            imwrite(img,strcat(save_path,save_name));

            f = draw_3Dskeleton(pred_3d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton);
            set(gcf, 'InvertHardCopy', 'off');
            set(gcf,'color','w');
            save_name = sprintf('TS%d/img_%06d_%d_3d.jpg',folder_id, frame_id, i);
            saveas(f, strcat(save_path,save_name));
            close(f);
        end

        img_name = fgetl(fp_img_name);
    end
        
end


================================================
FILE: vis/single/draw_3Dskeleton.m
================================================
function f = draw_3Dskeleton(pred_3d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton)
    
    pred_3d_kpt = squeeze(pred_3d_kpt);

    x = pred_3d_kpt(:,1);
    y = pred_3d_kpt(:,2);
    z = pred_3d_kpt(:,3);
    pred_3d_kpt(:,1) = -z;
    pred_3d_kpt(:,2) = x;
    pred_3d_kpt(:,3) = -y;

    
    f = figure;%('Position',[100 100 600 600]);
    set(f, 'visible', 'off');
    hold on;
    grid on;
    line_width = 6;
 
    num_skeleton = size(skeleton,1);
    for j =1:num_skeleton
        k1 = skeleton(j,1);
        k2 = skeleton(j,2);

        plot3([pred_3d_kpt(k1,1),pred_3d_kpt(k2,1)],[pred_3d_kpt(k1,2),pred_3d_kpt(k2,2)],[pred_3d_kpt(k1,3),pred_3d_kpt(k2,3)],'Color',colorList_skeleton(j,:),'LineWidth',line_width);
    end
    for j=1:num_joint
        scatter3(pred_3d_kpt(j,1),pred_3d_kpt(j,2),pred_3d_kpt(j,3),100,colorList_joint(j,:),'filled');
    end
   
    set(gca, 'color', [255/255 255/255 255/255]);
    set(gca,'XTickLabel',[]);
    set(gca,'YTickLabel',[]);
    set(gca,'ZTickLabel',[]);
    
    x = pred_3d_kpt(:,1);
    xmin = min(x(:)) - 100;
    xmax = max(x(:)) + 100;
    
    y = pred_3d_kpt(:,2);
    ymin = min(y(:)) - 100;
    ymax = max(y(:)) + 100;

    z = pred_3d_kpt(:,3);
    zmin = min(z(:));
    zmax = max(z(:)) + 100;

    xcenter = mean(pred_3d_kpt(:,1));
    ycenter = mean(pred_3d_kpt(:,2));
    zcenter = mean(pred_3d_kpt(:,3));
    xmin = xcenter - 1000;
    xmax = xcenter + 1000;
    ymin = ycenter - 1000;
    ymax = ycenter + 1000;
    zmin = zcenter - 1000;
    zmax = zcenter + 1000;
    
    xlim([xmin xmax]);
    ylim([ymin ymax]);
    zlim([zmin zmax]);
    
    view(62,7);
end