Repository: SangbumChoi/MobileHumanPose Branch: master Commit: a359dd9798e0 Files: 51 Total size: 172.2 KB Directory structure: gitextract_l8_g4mzf/ ├── .gitignore ├── LICENSE ├── README.md ├── common/ │ ├── backbone/ │ │ ├── __init__.py │ │ ├── lpnet_res_concat.py │ │ ├── lpnet_ski_concat.py │ │ └── lpnet_wo_concat.py │ ├── base.py │ ├── logger.py │ ├── timer.py │ └── utils/ │ ├── __init__.py │ ├── dir_utils.py │ ├── pose_utils.py │ └── vis.py ├── data/ │ ├── Dummy/ │ │ ├── Dummy.py │ │ ├── annotations/ │ │ │ ├── Dummy_subject1_camera.json │ │ │ ├── Dummy_subject1_data.json │ │ │ └── Dummy_subject1_joint_3d.json │ │ └── bbox_root/ │ │ └── bbox_dummy_output.json │ ├── Human36M/ │ │ └── Human36M.py │ ├── MPII/ │ │ └── MPII.py │ ├── MSCOCO/ │ │ └── MSCOCO.py │ ├── MuCo/ │ │ └── MuCo.py │ ├── MuPoTS/ │ │ ├── MuPoTS.py │ │ └── mpii_mupots_multiperson_eval.m │ ├── dataset.py │ └── multiple_datasets.py ├── demo/ │ └── demo.py ├── main/ │ ├── config.py │ ├── intermediate.py │ ├── model.py │ ├── pytorch2coreml.py │ ├── pytorch2onnx.py │ ├── summary.py │ ├── test.py │ ├── time.py │ └── train.py ├── requirements.txt ├── tool/ │ └── Human36M/ │ ├── README.MD │ ├── h36m2coco.py │ └── preprocess_h36m.m └── vis/ ├── coco_img_name.py ├── multi/ │ ├── draw_2Dskeleton.m │ ├── draw_3Dpose_coco.m │ ├── draw_3Dpose_mupots.m │ └── draw_3Dskeleton.m ├── mupots_img_name.py └── single/ ├── draw_2Dskeleton.m ├── draw_3Dpose_coco.m ├── draw_3Dpose_mupots.m └── draw_3Dskeleton.m ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ # virtualenv setting venv_3DMPPE # output result output # demo output demo/*.pth.tar # byte-compiled /__pycache_/ */__pycache/* */*/__pycache/ */*/*/__pycache/ *.py[cod] *.pyc # nohup process *.out # idea .DS_Store .idea ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2019 Gyeongsik Moon Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ================================================ FILE: README.md ================================================ # Github Code of "MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices" #### [2021.11.23] There will be massive refactoring and optimization expected. 
It will be released as soon as possible, including a new model.pth. Please wait for the model! (expected around the end of December) 

#### [2022.05.19] A Dummy dataloader has been added. It lets users generate a dummy pth.tar file of the MobileHumanPose model for their PoC roughly 100x faster. 

## Introduction 

This repo is the official **[PyTorch](https://pytorch.org)** implementation of **[MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices(CVPRW 2021)](https://openaccess.thecvf.com/content/CVPR2021W/MAI/html/Choi_MobileHumanPose_Toward_Real-Time_3D_Human_Pose_Estimation_in_Mobile_Devices_CVPRW_2021_paper.html)**. 

## Dependencies 

* [PyTorch](https://pytorch.org) 
* [CUDA](https://developer.nvidia.com/cuda-downloads) 
* [cuDNN](https://developer.nvidia.com/cudnn) 
* [Anaconda](https://www.anaconda.com/download/) 
* [COCO API](https://github.com/cocodataset/cocoapi) 

This code is tested on Ubuntu 16.04 with CUDA 11.2 and two NVIDIA RTX or V100 GPUs. Python 3.6.5 with virtualenv is used for development. 

## Directory 

### Root 

The `${ROOT}` directory is organized as below. 

``` 
${ROOT} 
|-- data 
|-- demo 
|-- common 
|-- main 
|-- tool 
|-- vis 
`-- output 
``` 

* `data` contains data loading code and soft links to the images and annotations directories. 
* `demo` contains demo code. 
* `common` contains kernel code for the 3D multi-person pose estimation system. The custom backbones are also implemented here. 
* `main` contains high-level code for training or testing the network. 
* `tool` contains data pre-processing code. You don't have to run it; pre-processed data is provided below. 
* `vis` contains scripts for 3D visualization. 
* `output` contains logs, trained models, visualized outputs, and test results. 

### Data 

You need to follow the directory structure of `data` as below. 

``` 
${POSE_ROOT} 
|-- data 
|   |-- Human36M 
|   |   |-- bbox_root 
|   |   |   |-- bbox_root_human36m_output.json 
|   |   |-- images 
|   |   |-- annotations 
|   |-- MPII 
|   |   |-- images 
|   |   |-- annotations 
|   |-- MSCOCO 
|   |   |-- bbox_root 
|   |   |   |-- bbox_root_coco_output.json 
|   |   |-- images 
|   |   |   |-- train2017 
|   |   |   |-- val2017 
|   |   |-- annotations 
|   |-- MuCo 
|   |   |-- data 
|   |   |   |-- augmented_set 
|   |   |   |-- unaugmented_set 
|   |   |   |-- MuCo-3DHP.json 
|   |-- MuPoTS 
|   |   |-- bbox_root 
|   |   |   |-- bbox_mupots_output.json 
|   |   |-- data 
|   |   |   |-- MultiPersonTestSet 
|   |   |   |-- MuPoTS-3D.json 
``` 

* Download Human3.6M parsed data [[data](https://drive.google.com/drive/folders/1kgVH-GugrLoc9XyvP6nRoaFpw3TmM5xK?usp=sharing)] 
* Download MPII parsed data [[images](http://human-pose.mpi-inf.mpg.de/)][[annotations](https://drive.google.com/drive/folders/1MmQ2FRP0coxHGk0Ntj0JOGv9OxSNuCfK?usp=sharing)] 
* Download MuCo parsed and composited data [[data](https://drive.google.com/drive/folders/1yL2ey3aWHJnh8f_nhWP--IyC9krAPsQN?usp=sharing)] 
* Download MuPoTS parsed data [[images](http://gvv.mpi-inf.mpg.de/projects/SingleShotMultiPerson/)][[annotations](https://drive.google.com/drive/folders/1WmfQ8UEj6nuamMfAdkxmrNcsQTrTfKK_?usp=sharing)] 
* All annotation files follow [MS COCO format](http://cocodataset.org/#format-data). 
* If you want to add your own dataset, you have to convert it to [MS COCO format](http://cocodataset.org/#format-data). 

### Output 

You need to follow the directory structure of the `output` folder as below. 

``` 
${POSE_ROOT} 
|-- output 
|-- |-- log 
|-- |-- model_dump 
|-- |-- result 
`-- |-- vis 
``` 

* Creating the `output` folder as a soft link rather than a regular folder is recommended, since it can take up a lot of storage capacity. 
* `log` folder contains the training log files. 
* `model_dump` folder contains saved checkpoints for each epoch. 
* `result` folder contains the final estimation files generated in the testing stage. 
* `vis` folder contains visualized results. 

### 3D visualization 

* Run `$DB_NAME_img_name.py` to get image file names in `.txt` format. 
* Place your test result files (`preds_2d_kpt_$DB_NAME.mat`, `preds_3d_kpt_$DB_NAME.mat`) in the `single` or `multi` folder. 
* Run `draw_3Dpose_$DB_NAME.m`.
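
Besides the MATLAB scripts, `common/utils/vis.py` provides the Python helpers `vis_keypoints` and `vis_3d_skeleton` that the evaluation code uses for quick visual checks. Below is a minimal sketch of calling them directly; the image path and the random predictions are placeholders, and it assumes the repo's `common`, `common/utils`, and `main` directories are importable (`vis.py` itself imports the config).

```python
# Hedged sketch (not part of the training/testing pipeline): calling the
# visualization helpers from common/utils/vis.py with placeholder inputs.
import cv2
import numpy as np
from utils.vis import vis_keypoints, vis_3d_skeleton

# 18-joint Human3.6M-style definition used throughout this repo
joint_num = 18
skeleton = ((0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13),
            (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6))

img = cv2.imread('input.jpg')                    # placeholder image path
pred_2d = np.random.rand(joint_num, 3) * 256     # placeholder (x, y, score) predictions

# vis_keypoints expects a (3, joint_num) array: x row, y row, confidence row
kps = np.zeros((3, joint_num))
kps[0, :], kps[1, :] = pred_2d[:, 0], pred_2d[:, 1]
kps[2, :] = 1
cv2.imwrite('vis_2d.jpg', vis_keypoints(img, kps, skeleton))

# vis_3d_skeleton expects (joint_num, 3) coordinates and a (joint_num, 1) visibility mask
pred_3d = np.random.rand(joint_num, 3) * 1000    # placeholder camera-space joints (mm)
vis_3d_skeleton(pred_3d, np.ones((joint_num, 1)), skeleton)
```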

## Running 3DMPPE_POSENET 

### Requirements 

```shell 
cd main 
pip install -r requirements.txt 
``` 

### Setup Training 

* In `main/config.py`, you can change model settings such as the dataset to use, the network backbone, and the input size. 

### Train 

In the `main` folder, run 

```bash 
python train.py --gpu 0-1 --backbone LPSKI 
``` 

to train the network on GPUs 0 and 1. If you want to continue a previous experiment, run 

```bash 
python train.py --gpu 0-1 --backbone LPSKI --continue 
``` 

`--gpu 0,1` can be used instead of `--gpu 0-1`. 

### Test 

Place the trained model at `output/model_dump/`. In the `main` folder, run 

```bash 
python test.py --gpu 0-1 --test_epoch 20-21 --backbone LPSKI 
``` 

to test the network on GPUs 0 and 1 with the checkpoints from the 20th and 21st epochs. `--gpu 0,1` can be used instead of `--gpu 0-1`. 

For the backbone, you can choose any of the following keys (a sketch of how they resolve to the backbone classes is shown just below): 

``` 
BACKBONE_DICT = { 
    'LPRES': LpNetResConcat, 
    'LPSKI': LpNetSkiConcat, 
    'LPWO': LpNetWoConcat 
} 
``` 

#### Human3.6M dataset using protocol 1 

For the evaluation, you can run `test.py`, or use the evaluation code in `Human36M`.
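
The `--backbone` keys above map one-to-one onto the classes in `common/backbone`. The snippet below is a minimal sketch of resolving a key and running a dummy forward pass, not how `train.py`/`test.py` wire things up internally; it assumes `common` and `common/backbone` are importable, and the 256x256 input and 18 joints mirror the Human3.6M configuration used in this repo.

```python
# Hedged sketch: resolving a --backbone key to a class from common/backbone
# and running a dummy forward pass.
import torch
from backbone.lpnet_res_concat import LpNetResConcat
from backbone.lpnet_ski_concat import LpNetSkiConcat
from backbone.lpnet_wo_concat import LpNetWoConcat

BACKBONE_DICT = {
    'LPRES': LpNetResConcat,
    'LPSKI': LpNetSkiConcat,
    'LPWO': LpNetWoConcat,
}

# The backbones assert a 256 input size; joint_num=18 matches Human3.6M (17 joints + Thorax)
backbone = BACKBONE_DICT['LPSKI'](input_size=(256, 256), joint_num=18)
heatmaps = backbone(torch.rand(1, 3, 256, 256))
print(heatmaps.shape)  # (1, joint_num * depth_bins, H, W) 3D heatmap volume
```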

#### Human3.6M dataset using protocol 2 

For the evaluation, you can run `test.py`, or use the evaluation code in `Human36M`.
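
The two protocols differ in the subject split and the error metric: protocol 1 reports PA MPJPE (with rigid alignment) on subject 11, while protocol 2 reports MPJPE on subjects 9 and 11. Both are controlled by `self.protocol`, which is hard-coded in `data/Human36M/Human36M.py`. The snippet below is a self-contained sketch mirroring that logic; `h36m_subjects` is an illustrative helper, not a function from the repo.

```python
# Sketch of the protocol handling mirrored from data/Human36M/Human36M.py
# (get_subject / evaluate). The protocol itself is set as self.protocol in
# Human36M.__init__; h36m_subjects below is illustrative only.
def h36m_subjects(protocol, data_split):
    if data_split == 'train':
        return [1, 5, 6, 7, 8, 9] if protocol == 1 else [1, 5, 6, 7, 8]
    return [11] if protocol == 1 else [9, 11]

print(h36m_subjects(1, 'test'))  # [11]    -> evaluated with PA MPJPE (rigid alignment)
print(h36m_subjects(2, 'test'))  # [9, 11] -> evaluated with MPJPE
```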

#### MuPoTS-3D dataset 

For the evaluation, run `test.py`. After that, move `data/MuPoTS/mpii_mupots_multiperson_eval.m` to `data/MuPoTS/data`. Also, move the test result files (`preds_2d_kpt_mupots.mat` and `preds_3d_kpt_mupots.mat`) to `data/MuPoTS/data`. Then run `mpii_mupots_multiperson_eval.m` with your evaluation mode arguments.

#### TFLite inference 

For inference on mobile devices, we also tested the model on-device by converting the PyTorch implementation to ONNX and finally serving it as TFLite. An official demo app is available [here](https://github.com/tucan9389/PoseEstimation-TFLiteSwift). 

## Reference 

**Where this repo comes from:** the training section is based on the following paper and GitHub repository 

* [PyTorch](https://pytorch.org) implementation of [Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image (ICCV 2019)](https://arxiv.org/abs/1907.11346). 
* Flexible and simple code. 
* Compatibility for most of the publicly available 2D and 3D, single and multi-person pose estimation datasets including **[Human3.6M](http://vision.imar.ro/human3.6m/description.php), [MPII](http://human-pose.mpi-inf.mpg.de/), [MS COCO 2017](http://cocodataset.org/#home), [MuCo-3DHP](http://gvv.mpi-inf.mpg.de/projects/SingleShotMultiPerson/) and [MuPoTS-3D](http://gvv.mpi-inf.mpg.de/projects/SingleShotMultiPerson/)**. 
* Human pose estimation visualization code. 

``` 
@InProceedings{Choi_2021_CVPR, 
    author    = {Choi, Sangbum and Choi, Seokeon and Kim, Changick}, 
    title     = {MobileHumanPose: Toward Real-Time 3D Human Pose Estimation in Mobile Devices}, 
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops}, 
    month     = {June}, 
    year      = {2021}, 
    pages     = {2328-2338} 
} 
``` 

================================================ 
FILE: common/backbone/__init__.py 
================================================ 
from backbone.lpnet_res_concat import * 
from backbone.lpnet_ski_concat import * 
from backbone.lpnet_wo_concat import * 

================================================ 
FILE: common/backbone/lpnet_res_concat.py 
================================================ 
import torch.nn as nn import torch from torchsummary import summary def _make_divisible(v, divisor, min_value=None): """ This function is taken from the original tf repo. It ensures that all layers have a channel number that is divisible by 8 It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py :param v: :param divisor: :param min_value: :return: """ if min_value is None: min_value = divisor new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) # Make sure that round down does not go down by more than 10%. 
if new_v < 0.9 * v: new_v += divisor return new_v class DoubleConv(nn.Sequential): def __init__(self, in_ch, out_ch, norm_layer=None, activation_layer=None): super(DoubleConv, self).__init__( nn.Conv2d(in_ch , out_ch, kernel_size=1), norm_layer(out_ch), activation_layer(out_ch), nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), norm_layer(out_ch), activation_layer(out_ch), nn.UpsamplingBilinear2d(scale_factor=2) ) class ConvBNReLU(nn.Sequential): def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None, activation_layer=None): padding = (kernel_size - 1) // 2 super(ConvBNReLU, self).__init__( nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False), norm_layer(out_planes), activation_layer(out_planes) ) class InvertedResidual(nn.Module): def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None, activation_layer=None): super(InvertedResidual, self).__init__() self.stride = stride assert stride in [1, 2] hidden_dim = int(round(inp * expand_ratio)) self.use_res_connect = self.stride == 1 and inp == oup layers = [] if expand_ratio != 1: # pw layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer)) layers.extend([ # dw ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer, activation_layer=activation_layer), # pw-linear nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), norm_layer(oup), ]) self.conv = nn.Sequential(*layers) def forward(self, x): if self.use_res_connect: return x + self.conv(x) else: return self.conv(x) class LpNetResConcat(nn.Module): def __init__(self, input_size, joint_num, input_channel = 48, embedding_size = 2048, width_mult=1.0, round_nearest=8, block=None, norm_layer=None, activation_layer=None, inverted_residual_setting=None): super(LpNetResConcat, self).__init__() assert input_size[1] in [256] if block is None: block = InvertedResidual if norm_layer is None: norm_layer = nn.BatchNorm2d if activation_layer is None: activation_layer = nn.PReLU # PReLU does not have inplace True if inverted_residual_setting is None: inverted_residual_setting = [ # t, c, n, s [1, 64, 1, 1], #[-1, 48, 256, 256] [6, 48, 2, 2], #[-1, 48, 128, 128] [6, 48, 3, 2], #[-1, 48, 64, 64] [6, 64, 4, 2], #[-1, 64, 32, 32] [6, 96, 3, 2], #[-1, 96, 16, 16] [6, 160, 3, 2], #[-1, 160, 8, 8] [6, 320, 1, 1], #[-1, 320, 8, 8] ] # building first layer inp_channel = [_make_divisible(input_channel * width_mult, round_nearest), _make_divisible(input_channel * width_mult, round_nearest) + inverted_residual_setting[0][1], inverted_residual_setting[0][1] + inverted_residual_setting[1][1], inverted_residual_setting[1][1] + inverted_residual_setting[2][1], inverted_residual_setting[2][1] + inverted_residual_setting[3][1], inverted_residual_setting[3][1] + inverted_residual_setting[4][1], inverted_residual_setting[4][1] + inverted_residual_setting[5][1], inverted_residual_setting[5][1] + inverted_residual_setting[6][1], inverted_residual_setting[6][1] + embedding_size, 256 + embedding_size, ] self.first_conv = ConvBNReLU(3, inp_channel[0], stride=1, norm_layer=norm_layer, activation_layer=activation_layer) inv_residual = [] # building inverted residual blocks j = 0 for t, c, n, s in inverted_residual_setting: output_channel = _make_divisible(c * width_mult, round_nearest) for i in range(n): stride = s if i == 0 else 1 input_channel = inp_channel[j] if i == 0 else output_channel inv_residual.append(block(input_channel, output_channel, stride, 
expand_ratio=t, norm_layer=norm_layer, activation_layer=activation_layer)) j += 1 # make it nn.Sequential self.inv_residual = nn.Sequential(*inv_residual) self.last_conv = ConvBNReLU(inp_channel[j], embedding_size, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer) self.deonv0 = DoubleConv(inp_channel[j+1], 256, norm_layer=norm_layer, activation_layer=activation_layer) self.deonv1 = DoubleConv(2304, 256, norm_layer=norm_layer, activation_layer=activation_layer) self.deonv2 = DoubleConv(512, 256, norm_layer=norm_layer, activation_layer=activation_layer) self.final_layer = nn.Conv2d( in_channels=256, out_channels= joint_num * 64, kernel_size=1, stride=1, padding=0 ) self.avgpool = nn.AvgPool2d(3, stride=2, padding=1, count_include_pad=False) self.upsample = nn.UpsamplingBilinear2d(scale_factor=2) def forward(self, x): x0 = self.first_conv(x) x1 = self.inv_residual[0:1](x0) x2 = self.inv_residual[1:3](torch.cat([x0, x1], dim=1)) x0 = self.inv_residual[3:6](torch.cat([self.avgpool(x1), x2], dim=1)) x1 = self.inv_residual[6:10](torch.cat([self.avgpool(x2), x0], dim=1)) x2 = self.inv_residual[10:13](torch.cat([self.avgpool(x0), x1], dim=1)) x0 = self.inv_residual[13:16](torch.cat([self.avgpool(x1), x2], dim=1)) x1 = self.inv_residual[16:17](torch.cat([self.avgpool(x2), x0], dim=1)) x2 = self.last_conv(torch.cat([x0, x1], dim=1)) x0 = self.deonv0(torch.cat([x1, x2], dim=1)) x1 = self.deonv1(torch.cat([self.upsample(x2), x0], dim=1)) x2 = self.deonv2(torch.cat([self.upsample(x0), x1], dim=1)) x0 = self.final_layer(x2) return x0 def init_weights(self): for i in [self.deconv0, self.deconv1, self.deconv2]: for name, m in i.named_modules(): if isinstance(m, nn.ConvTranspose2d): nn.init.normal_(m.weight, std=0.001) elif isinstance(m, nn.BatchNorm2d): nn.init.constant_(m.weight, 1) nn.init.constant_(m.bias, 0) for j in [self.first_conv, self.inv_residual, self.last_conv, self.final_layer]: for m in j.modules(): if isinstance(m, nn.Conv2d): nn.init.normal_(m.weight, std=0.001) if hasattr(m, 'bias'): if m.bias is not None: nn.init.constant_(m.bias, 0) if __name__ == "__main__": model = LpNetResConcat((256, 256), 18) test_data = torch.rand(1, 3, 256, 256) test_outputs = model(test_data) # print(test_outputs.size()) summary(model, (3, 256, 256)) ================================================ FILE: common/backbone/lpnet_ski_concat.py ================================================ import torch.nn as nn import torch from torchsummary import summary def _make_divisible(v, divisor, min_value=None): """ This function is taken from the original tf repo. It ensures that all layers have a channel number that is divisible by 8 It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py :param v: :param divisor: :param min_value: :return: """ if min_value is None: min_value = divisor new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) # Make sure that round down does not go down by more than 10%. 
if new_v < 0.9 * v: new_v += divisor return new_v class DeConv(nn.Sequential): def __init__(self, in_ch, mid_ch, out_ch, norm_layer=None, activation_layer=None): super(DeConv, self).__init__( nn.Conv2d(in_ch + mid_ch, mid_ch, kernel_size=1), norm_layer(mid_ch), activation_layer(mid_ch), nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1), norm_layer(out_ch), activation_layer(out_ch), nn.UpsamplingBilinear2d(scale_factor=2) ) class ConvBNReLU(nn.Sequential): def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None, activation_layer=None): padding = (kernel_size - 1) // 2 super(ConvBNReLU, self).__init__( nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False), norm_layer(out_planes), activation_layer(out_planes) ) class InvertedResidual(nn.Module): def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None, activation_layer=None): super(InvertedResidual, self).__init__() self.stride = stride assert stride in [1, 2] hidden_dim = int(round(inp * expand_ratio)) self.use_res_connect = self.stride == 1 and inp == oup layers = [] if expand_ratio != 1: # pw layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer)) layers.extend([ # dw ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer, activation_layer=activation_layer), # pw-linear nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), norm_layer(oup), ]) self.conv = nn.Sequential(*layers) def forward(self, x): if self.use_res_connect: return x + self.conv(x) else: return self.conv(x) class LpNetSkiConcat(nn.Module): def __init__(self, input_size, joint_num, input_channel = 48, embedding_size = 2048, width_mult=1.0, round_nearest=8, block=None, norm_layer=None, activation_layer=None, inverted_residual_setting=None): super(LpNetSkiConcat, self).__init__() assert input_size[1] in [256] if block is None: block = InvertedResidual if norm_layer is None: norm_layer = nn.BatchNorm2d if activation_layer is None: activation_layer = nn.PReLU # PReLU does not have inplace True if inverted_residual_setting is None: inverted_residual_setting = [ # t, c, n, s [1, 64, 1, 2], #[-1, 48, 256, 256] [6, 48, 2, 2], #[-1, 48, 128, 128] [6, 48, 3, 2], #[-1, 48, 64, 64] [6, 64, 4, 2], #[-1, 64, 32, 32] [6, 96, 3, 2], #[-1, 96, 16, 16] [6, 160, 3, 1], #[-1, 160, 8, 8] [6, 320, 1, 1], #[-1, 320, 8, 8] ] # building first layer input_channel = _make_divisible(input_channel * width_mult, round_nearest) self.first_conv = ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer, activation_layer=activation_layer) inv_residual = [] # building inverted residual blocks for t, c, n, s in inverted_residual_setting: output_channel = _make_divisible(c * width_mult, round_nearest) for i in range(n): stride = s if i == 0 else 1 inv_residual.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer, activation_layer=activation_layer)) input_channel = output_channel # make it nn.Sequential self.inv_residual = nn.Sequential(*inv_residual) self.last_conv = ConvBNReLU(input_channel, embedding_size, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer) self.deconv0 = DeConv(embedding_size, _make_divisible(inverted_residual_setting[-3][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer) self.deconv1 = DeConv(256, _make_divisible(inverted_residual_setting[-4][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, 
activation_layer=activation_layer) self.deconv2 = DeConv(256, _make_divisible(inverted_residual_setting[-5][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer) self.final_layer = nn.Conv2d( in_channels=256, out_channels= joint_num * 32, kernel_size=1, stride=1, padding=0 ) def forward(self, x): x = self.first_conv(x) x = self.inv_residual[0:6](x) x2 = x x = self.inv_residual[6:10](x) x1 = x x = self.inv_residual[10:13](x) x0 = x x = self.inv_residual[13:16](x) x = self.inv_residual[16:](x) z = self.last_conv(x) z = torch.cat([x0, z], dim=1) z = self.deconv0(z) z = torch.cat([x1, z], dim=1) z = self.deconv1(z) z = torch.cat([x2, z], dim=1) z = self.deconv2(z) z = self.final_layer(z) return z def init_weights(self): for i in [self.deconv0, self.deconv1, self.deconv2]: for name, m in i.named_modules(): if isinstance(m, nn.ConvTranspose2d): nn.init.normal_(m.weight, std=0.001) elif isinstance(m, nn.BatchNorm2d): nn.init.constant_(m.weight, 1) nn.init.constant_(m.bias, 0) for j in [self.first_conv, self.inv_residual, self.last_conv, self.final_layer]: for m in j.modules(): if isinstance(m, nn.Conv2d): nn.init.normal_(m.weight, std=0.001) if hasattr(m, 'bias'): if m.bias is not None: nn.init.constant_(m.bias, 0) if __name__ == "__main__": LpNetSkiConcat((256, 256), 18).init_weights() model = LpNetSkiConcat((256, 256), 18) test_data = torch.rand(1, 3, 256, 256) test_outputs = model(test_data) print(test_outputs.size()) summary(model, (3, 256, 256)) ================================================ FILE: common/backbone/lpnet_wo_concat.py ================================================ import torch.nn as nn import torch from torchsummary import summary def _make_divisible(v, divisor, min_value=None): """ This function is taken from the original tf repo. It ensures that all layers have a channel number that is divisible by 8 It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py :param v: :param divisor: :param min_value: :return: """ if min_value is None: min_value = divisor new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) # Make sure that round down does not go down by more than 10%. 
if new_v < 0.9 * v: new_v += divisor return new_v class DeConv(nn.Sequential): def __init__(self, in_ch, mid_ch, out_ch, norm_layer=None, activation_layer=None): super(DeConv, self).__init__( nn.Conv2d(in_ch, mid_ch, kernel_size=1), norm_layer(mid_ch), activation_layer(mid_ch), nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1), norm_layer(out_ch), activation_layer(out_ch), nn.UpsamplingBilinear2d(scale_factor=2) ) class ConvBNReLU(nn.Sequential): def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None, activation_layer=None): padding = (kernel_size - 1) // 2 super(ConvBNReLU, self).__init__( nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False), norm_layer(out_planes), activation_layer(out_planes) ) class InvertedResidual(nn.Module): def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None, activation_layer=None): super(InvertedResidual, self).__init__() self.stride = stride assert stride in [1, 2] hidden_dim = int(round(inp * expand_ratio)) self.use_res_connect = self.stride == 1 and inp == oup layers = [] if expand_ratio != 1: # pw layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer)) layers.extend([ # dw ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer, activation_layer=activation_layer), # pw-linear nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False), norm_layer(oup), ]) self.conv = nn.Sequential(*layers) def forward(self, x): if self.use_res_connect: return x + self.conv(x) else: return self.conv(x) class LpNetWoConcat(nn.Module): def __init__(self, input_size, joint_num, input_channel = 48, embedding_size = 2048, width_mult=1.0, round_nearest=8, block=None, norm_layer=None, activation_layer=None, inverted_residual_setting=None): super(LpNetWoConcat, self).__init__() assert input_size[1] in [256] if block is None: block = InvertedResidual if norm_layer is None: norm_layer = nn.BatchNorm2d if activation_layer is None: activation_layer = nn.PReLU # PReLU does not have inplace True if inverted_residual_setting is None: inverted_residual_setting = [ # t, c, n, s [1, 64, 1, 1], #[-1, 48, 256, 256] [6, 48, 2, 2], #[-1, 48, 128, 128] [6, 48, 3, 2], #[-1, 48, 64, 64] [6, 64, 4, 2], #[-1, 64, 32, 32] [6, 96, 3, 2], #[-1, 96, 16, 16] [6, 160, 3, 2], #[-1, 160, 8, 8] [6, 320, 1, 1], #[-1, 320, 8, 8] ] # building first layer input_channel = _make_divisible(input_channel * width_mult, round_nearest) self.first_conv = ConvBNReLU(3, input_channel, stride=1, norm_layer=norm_layer, activation_layer=activation_layer) inv_residual = [] # building inverted residual blocks for t, c, n, s in inverted_residual_setting: output_channel = _make_divisible(c * width_mult, round_nearest) for i in range(n): stride = s if i == 0 else 1 inv_residual.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer, activation_layer=activation_layer)) input_channel = output_channel # make it nn.Sequential self.inv_residual = nn.Sequential(*inv_residual) self.last_conv = ConvBNReLU(input_channel, embedding_size, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer) self.deconv0 = DeConv(embedding_size, _make_divisible(inverted_residual_setting[-2][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer) self.deconv1 = DeConv(256, _make_divisible(inverted_residual_setting[-3][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, 
activation_layer=activation_layer) self.deconv2 = DeConv(256, _make_divisible(inverted_residual_setting[-4][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer) self.final_layer = nn.Conv2d( in_channels=256, out_channels= joint_num * 64, kernel_size=1, stride=1, padding=0 ) def forward(self, x): x = self.first_conv(x) x = self.inv_residual(x) x = self.last_conv(x) x = self.deconv0(x) x = self.deconv1(x) x = self.deconv2(x) x = self.final_layer(x) return x def init_weights(self): for i in [self.deconv0, self.deconv1, self.deconv2]: for name, m in i.named_modules(): if isinstance(m, nn.ConvTranspose2d): nn.init.normal_(m.weight, std=0.001) elif isinstance(m, nn.BatchNorm2d): nn.init.constant_(m.weight, 1) nn.init.constant_(m.bias, 0) for j in [self.first_conv, self.inv_residual, self.last_conv, self.final_layer]: for m in j.modules(): if isinstance(m, nn.Conv2d): nn.init.normal_(m.weight, std=0.001) if hasattr(m, 'bias'): if m.bias is not None: nn.init.constant_(m.bias, 0) if __name__ == "__main__": model = LpNetWoConcat((256, 256), 18) test_data = torch.rand(1, 3, 256, 256) test_outputs = model(test_data) summary(model, (3, 256, 256)) ================================================ FILE: common/base.py ================================================ import os import os.path as osp import math import time import glob import abc from torch.utils.data import DataLoader import torch.optim import torchvision.transforms as transforms from timer import Timer from logger import colorlogger from torch.nn.parallel.data_parallel import DataParallel from config import cfg from model import get_pose_net from dataset import DatasetLoader from multiple_datasets import MultipleDatasets # dynamic dataset import for i in range(len(cfg.trainset_3d)): exec('from ' + cfg.trainset_3d[i] + ' import ' + cfg.trainset_3d[i]) for i in range(len(cfg.trainset_2d)): exec('from ' + cfg.trainset_2d[i] + ' import ' + cfg.trainset_2d[i]) exec('from ' + cfg.testset + ' import ' + cfg.testset) class Base(object): __metaclass__ = abc.ABCMeta def __init__(self, log_name='logs.txt'): self.cur_epoch = 0 # timer self.tot_timer = Timer() self.gpu_timer = Timer() self.read_timer = Timer() # logger self.logger = colorlogger(cfg.log_dir, log_name=log_name) @abc.abstractmethod def _make_batch_generator(self): return @abc.abstractmethod def _make_model(self): return def save_model(self, state, epoch): file_path = osp.join(cfg.model_dir,'snapshot_{}.pth.tar'.format(str(epoch))) torch.save(state, file_path) self.logger.info("Write snapshot into {}".format(file_path)) def load_model(self, model, optimizer): model_file_list = glob.glob(osp.join(cfg.model_dir,'*.pth.tar')) cur_epoch = max([int(file_name[file_name.find('snapshot_') + 9 : file_name.find('.pth.tar')]) for file_name in model_file_list]) ckpt = torch.load(osp.join(cfg.model_dir, 'snapshot_' + str(cur_epoch) + '.pth.tar')) start_epoch = ckpt['epoch'] + 1 model.load_state_dict(ckpt['network']) optimizer.load_state_dict(ckpt['optimizer']) return start_epoch, model, optimizer class Trainer(Base): def __init__(self, cfg): super(Trainer, self).__init__(log_name = 'train_logs.txt') self.backbone = cfg.backbone def get_optimizer(self, model): optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr) return optimizer def set_lr(self, epoch): for e in cfg.lr_dec_epoch: if epoch < e: break if epoch < cfg.lr_dec_epoch[-1]: idx = cfg.lr_dec_epoch.index(e) for g in self.optimizer.param_groups: g['lr'] = cfg.lr / (cfg.lr_dec_factor ** idx) else: 
for g in self.optimizer.param_groups: g['lr'] = cfg.lr / (cfg.lr_dec_factor ** len(cfg.lr_dec_epoch)) def get_lr(self): for g in self.optimizer.param_groups: cur_lr = g['lr'] return cur_lr def _make_batch_generator(self): # data load and construct batch generator self.logger.info("Creating dataset...") trainset3d_loader = [] for i in range(len(cfg.trainset_3d)): if i > 0: ref_joints_name = trainset3d_loader[0].joints_name else: ref_joints_name = None trainset3d_loader.append(DatasetLoader(eval(cfg.trainset_3d[i])("train"), ref_joints_name, True, transforms.Compose([\ transforms.ToTensor(), transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]\ ))) ref_joints_name = trainset3d_loader[0].joints_name trainset2d_loader = [] for i in range(len(cfg.trainset_2d)): trainset2d_loader.append(DatasetLoader(eval(cfg.trainset_2d[i])("train"), ref_joints_name, True, transforms.Compose([\ transforms.ToTensor(), transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]\ ))) self.joint_num = trainset3d_loader[0].joint_num trainset3d_loader = MultipleDatasets(trainset3d_loader, make_same_len=False) if trainset2d_loader != []: trainset2d_loader = MultipleDatasets(trainset2d_loader, make_same_len=False) trainset_loader = MultipleDatasets([trainset3d_loader, trainset2d_loader], make_same_len=True) else: trainset_loader = MultipleDatasets([trainset3d_loader, ], make_same_len=True) self.itr_per_epoch = math.ceil(len(trainset_loader) / cfg.num_gpus / cfg.batch_size) self.batch_generator = DataLoader(dataset=trainset_loader, batch_size=cfg.num_gpus*cfg.batch_size, shuffle=True, num_workers=cfg.num_thread, pin_memory=True) def _make_model(self): # prepare network self.logger.info("Creating graph and optimizer...") model = get_pose_net(self.backbone, True, self.joint_num) if torch.cuda.is_available(): model = DataParallel(model).cuda() optimizer = self.get_optimizer(model) if cfg.continue_train: start_epoch, model, optimizer = self.load_model(model, optimizer) else: start_epoch = 0 model.train() self.start_epoch = start_epoch self.model = model self.optimizer = optimizer class Tester(Base): def __init__(self, backbone): self.backbone = backbone super(Tester, self).__init__(log_name = 'test_logs.txt') def _make_batch_generator(self): # data load and construct batch generator # self.logger.info("Creating dataset...") testset = eval(cfg.testset)("test") testset_loader = DatasetLoader(testset, None, False, transforms.Compose([\ transforms.ToTensor(), transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]\ )) batch_generator = DataLoader(dataset=testset_loader, batch_size=cfg.num_gpus*cfg.test_batch_size, shuffle=False, num_workers=cfg.num_thread, pin_memory=True) self.testset = testset self.joint_num = testset_loader.joint_num self.skeleton = testset_loader.skeleton self.flip_pairs = testset.flip_pairs self.batch_generator = batch_generator def _make_model(self, test_epoch): self.test_epoch = test_epoch model_path = os.path.join(cfg.model_dir, 'snapshot_%d.pth.tar' % self.test_epoch) assert os.path.exists(model_path), 'Cannot find model at ' + model_path # self.logger.info('Load checkpoint from {}'.format(model_path)) # prepare network # self.logger.info("Creating graph...") model = get_pose_net(self.backbone, False, self.joint_num) model = DataParallel(model).cuda() ckpt = torch.load(model_path) model.load_state_dict(ckpt['network']) model.eval() self.model = model def _evaluate(self, preds, result_save_path): eval_summary = self.testset.evaluate(preds, result_save_path) 
self.logger.info('{}'.format(eval_summary)) class Transformer(Base): def __init__(self, backbone, jointnum, modelpath): super(Transformer, self).__init__(log_name='transformer_logs.txt') self.backbone = backbone self.jointnum = jointnum self.modelpath = modelpath def _make_model(self): # prepare network self.logger.info("Creating graph and optimizer...") model = get_pose_net(self.backbone, False, self.jointnum) model = DataParallel(model).cuda() model.load_state_dict(torch.load(self.modelpath)['network']) single_pytorch_model = model.module single_pytorch_model.eval() self.model = single_pytorch_model ================================================ FILE: common/logger.py ================================================ import logging import os OK = '\033[92m' WARNING = '\033[93m' FAIL = '\033[91m' END = '\033[0m' PINK = '\033[95m' BLUE = '\033[94m' GREEN = OK RED = FAIL WHITE = END YELLOW = WARNING class colorlogger(): def __init__(self, log_dir, log_name='train_logs.txt'): # set log self._logger = logging.getLogger(log_name) self._logger.setLevel(logging.INFO) log_file = os.path.join(log_dir, log_name) if not os.path.exists(log_dir): os.makedirs(log_dir) file_log = logging.FileHandler(log_file, mode='a') file_log.setLevel(logging.INFO) console_log = logging.StreamHandler() console_log.setLevel(logging.INFO) formatter = logging.Formatter( "{}%(asctime)s{} %(message)s".format(GREEN, END), "%m-%d %H:%M:%S") file_log.setFormatter(formatter) console_log.setFormatter(formatter) self._logger.addHandler(file_log) self._logger.addHandler(console_log) def debug(self, msg): self._logger.debug(str(msg)) def info(self, msg): self._logger.info(str(msg)) def warning(self, msg): self._logger.warning(WARNING + 'WRN: ' + str(msg) + END) def critical(self, msg): self._logger.critical(RED + 'CRI: ' + str(msg) + END) def error(self, msg): self._logger.error(RED + 'ERR: ' + str(msg) + END) ================================================ FILE: common/timer.py ================================================ # -------------------------------------------------------- # Fast R-CNN # Copyright (c) 2015 Microsoft # Licensed under The MIT License [see LICENSE for details] # Written by Ross Girshick # -------------------------------------------------------- import time class Timer(object): """A simple timer.""" def __init__(self): self.total_time = 0. self.calls = 0 self.start_time = 0. self.diff = 0. self.average_time = 0. 
self.warm_up = 0 def tic(self): # using time.time instead of time.clock because time time.clock # does not normalize for multithreading self.start_time = time.time() def toc(self, average=True): self.diff = time.time() - self.start_time if self.warm_up < 10: self.warm_up += 1 return self.diff else: self.total_time += self.diff self.calls += 1 self.average_time = self.total_time / self.calls if average: return self.average_time else: return self.diff ================================================ FILE: common/utils/__init__.py ================================================ ================================================ FILE: common/utils/dir_utils.py ================================================ import os import sys def make_folder(folder_name): if not os.path.exists(folder_name): os.makedirs(folder_name) def add_pypath(path): if path not in sys.path: sys.path.insert(0, path) ================================================ FILE: common/utils/pose_utils.py ================================================ import torch import numpy as np from config import cfg import copy def cam2pixel(cam_coord, f, c): x = cam_coord[:, 0] / (cam_coord[:, 2] + 1e-8) * f[0] + c[0] y = cam_coord[:, 1] / (cam_coord[:, 2] + 1e-8) * f[1] + c[1] z = cam_coord[:, 2] img_coord = np.concatenate((x[:,None], y[:,None], z[:,None]),1) return img_coord def pixel2cam(pixel_coord, f, c): x = (pixel_coord[:, 0] - c[0]) / f[0] * pixel_coord[:, 2] y = (pixel_coord[:, 1] - c[1]) / f[1] * pixel_coord[:, 2] z = pixel_coord[:, 2] cam_coord = np.concatenate((x[:,None], y[:,None], z[:,None]),1) return cam_coord def world2cam(world_coord, R, t): cam_coord = np.dot(R, world_coord.transpose(1,0)).transpose(1,0) + t.reshape(1,3) return cam_coord def rigid_transform_3D(A, B): centroid_A = np.mean(A, axis = 0) centroid_B = np.mean(B, axis = 0) H = np.dot(np.transpose(A - centroid_A), B - centroid_B) U, s, V = np.linalg.svd(H) R = np.dot(np.transpose(V), np.transpose(U)) if np.linalg.det(R) < 0: V[2] = -V[2] R = np.dot(np.transpose(V), np.transpose(U)) t = -np.dot(R, np.transpose(centroid_A)) + np.transpose(centroid_B) return R, t def rigid_align(A, B): R, t = rigid_transform_3D(A, B) A2 = np.transpose(np.dot(R, np.transpose(A))) + t return A2 def get_bbox(joint_img): # bbox extract from keypoint coordinates bbox = np.zeros((4)) xmin = np.min(joint_img[:,0]) ymin = np.min(joint_img[:,1]) xmax = np.max(joint_img[:,0]) ymax = np.max(joint_img[:,1]) width = xmax - xmin - 1 height = ymax - ymin - 1 bbox[0] = (xmin + xmax)/2. - width/2*1.2 bbox[1] = (ymin + ymax)/2. - height/2*1.2 bbox[2] = width*1.2 bbox[3] = height*1.2 return bbox def process_bbox(bbox, width, height): # sanitize bboxes x, y, w, h = bbox x1 = np.max((0, x)) y1 = np.max((0, y)) x2 = np.min((width - 1, x1 + np.max((0, w - 1)))) y2 = np.min((height - 1, y1 + np.max((0, h - 1)))) if w*h > 0 and x2 >= x1 and y2 >= y1: bbox = np.array([x1, y1, x2-x1, y2-y1]) else: return None # aspect ratio preserving bbox w = bbox[2] h = bbox[3] c_x = bbox[0] + w/2. c_y = bbox[1] + h/2. aspect_ratio = cfg.input_shape[1]/cfg.input_shape[0] if w > aspect_ratio * h: h = w / aspect_ratio elif w < aspect_ratio * h: w = h * aspect_ratio bbox[2] = w*1.25 bbox[3] = h*1.25 bbox[0] = c_x - bbox[2]/2. bbox[1] = c_y - bbox[3]/2. 
return bbox def transform_joint_to_other_db(src_joint, src_name, dst_name): src_joint_num = len(src_name) dst_joint_num = len(dst_name) new_joint = np.zeros(((dst_joint_num,) + src_joint.shape[1:])) for src_idx in range(len(src_name)): name = src_name[src_idx] if name in dst_name: dst_idx = dst_name.index(name) new_joint[dst_idx] = src_joint[src_idx] return new_joint def fliplr_joints(_joints, width, matched_parts): """ flip coords joints: numpy array, nJoints * dim, dim == 2 [x, y] or dim == 3 [x, y, z] width: image width matched_parts: list of pairs """ joints = _joints.copy() # Flip horizontal joints[:, 0] = width - joints[:, 0] - 1 # Change left-right parts for pair in matched_parts: joints[pair[0], :], joints[pair[1], :] = joints[pair[1], :], joints[pair[0], :].copy() return joints def multi_meshgrid(*args): """ Creates a meshgrid from possibly many elements (instead of only 2). Returns a nd tensor with as many dimensions as there are arguments """ args = list(args) template = [1 for _ in args] for i in range(len(args)): n = args[i].shape[0] template_copy = template.copy() template_copy[i] = n args[i] = args[i].view(*template_copy) # there will be some broadcast magic going on return tuple(args) def flip(tensor, dims): if not isinstance(dims, (tuple, list)): dims = [dims] indices = [torch.arange(tensor.shape[dim] - 1, -1, -1, dtype=torch.int64) for dim in dims] multi_indices = multi_meshgrid(*indices) final_indices = [slice(i) for i in tensor.shape] for i, dim in enumerate(dims): final_indices[dim] = multi_indices[i] flipped = tensor[final_indices] assert flipped.device == tensor.device assert flipped.requires_grad == tensor.requires_grad return flipped ================================================ FILE: common/utils/vis.py ================================================ import os import cv2 import numpy as np from mpl_toolkits.mplot3d import Axes3D import matplotlib.pyplot as plt import matplotlib as mpl from config import cfg def vis_keypoints(img, kps, kps_lines, kp_thresh=0.4, alpha=1): # Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv. cmap = plt.get_cmap('rainbow') colors = [cmap(i) for i in np.linspace(0, 1, len(kps_lines) + 2)] colors = [(c[2] * 255, c[1] * 255, c[0] * 255) for c in colors] # Perform the drawing on a copy of the image, to allow for blending. kp_mask = np.copy(img) # Draw the keypoints. for l in range(len(kps_lines)): i1 = kps_lines[l][0] i2 = kps_lines[l][1] p1 = kps[0, i1].astype(np.int32), kps[1, i1].astype(np.int32) p2 = kps[0, i2].astype(np.int32), kps[1, i2].astype(np.int32) if kps[2, i1] > kp_thresh and kps[2, i2] > kp_thresh: cv2.line( kp_mask, p1, p2, color=colors[l], thickness=2, lineType=cv2.LINE_AA) if kps[2, i1] > kp_thresh: cv2.circle( kp_mask, p1, radius=3, color=colors[l], thickness=-1, lineType=cv2.LINE_AA) if kps[2, i2] > kp_thresh: cv2.circle( kp_mask, p2, radius=3, color=colors[l], thickness=-1, lineType=cv2.LINE_AA) # Blend the keypoints. return cv2.addWeighted(img, 1.0 - alpha, kp_mask, alpha, 0) def vis_3d_skeleton(kpt_3d, kpt_3d_vis, kps_lines, filename=None): fig = plt.figure() ax = fig.add_subplot(111, projection='3d') # Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv. 
cmap = plt.get_cmap('rainbow') colors = [cmap(i) for i in np.linspace(0, 1, len(kps_lines) + 2)] colors = [np.array((c[2], c[1], c[0])) for c in colors] for l in range(len(kps_lines)): i1 = kps_lines[l][0] i2 = kps_lines[l][1] x = np.array([kpt_3d[i1,0], kpt_3d[i2,0]]) y = np.array([kpt_3d[i1,1], kpt_3d[i2,1]]) z = np.array([kpt_3d[i1,2], kpt_3d[i2,2]]) if kpt_3d_vis[i1,0] > 0 and kpt_3d_vis[i2,0] > 0: ax.plot(x, z, -y, c=colors[l], linewidth=2) if kpt_3d_vis[i1,0] > 0: ax.scatter(kpt_3d[i1,0], kpt_3d[i1,2], -kpt_3d[i1,1], c=colors[l], marker='o') if kpt_3d_vis[i2,0] > 0: ax.scatter(kpt_3d[i2,0], kpt_3d[i2,2], -kpt_3d[i2,1], c=colors[l], marker='o') if filename is None: ax.set_title('3D vis') else: ax.set_title(filename) ax.set_xlabel('X Label') ax.set_ylabel('Z Label') ax.set_zlabel('Y Label') ax.legend() plt.show() cv2.waitKey(0) def vis_3d_multiple_skeleton(kpt_3d, kpt_3d_vis, kps_lines, filename=None): fig = plt.figure() ax = fig.add_subplot(111, projection='3d') # Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv. cmap = plt.get_cmap('rainbow') colors = [cmap(i) for i in np.linspace(0, 1, len(kps_lines) + 2)] colors = [np.array((c[2], c[1], c[0])) for c in colors] for l in range(len(kps_lines)): i1 = kps_lines[l][0] i2 = kps_lines[l][1] person_num = kpt_3d.shape[0] for n in range(person_num): x = np.array([kpt_3d[n,i1,0], kpt_3d[n,i2,0]]) y = np.array([kpt_3d[n,i1,1], kpt_3d[n,i2,1]]) z = np.array([kpt_3d[n,i1,2], kpt_3d[n,i2,2]]) if kpt_3d_vis[n,i1,0] > 0 and kpt_3d_vis[n,i2,0] > 0: ax.plot(x, z, -y, c=colors[l], linewidth=2) if kpt_3d_vis[n,i1,0] > 0: ax.scatter(kpt_3d[n,i1,0], kpt_3d[n,i1,2], -kpt_3d[n,i1,1], c=colors[l], marker='o') if kpt_3d_vis[n,i2,0] > 0: ax.scatter(kpt_3d[n,i2,0], kpt_3d[n,i2,2], -kpt_3d[n,i2,1], c=colors[l], marker='o') if filename is None: ax.set_title('3D vis') else: ax.set_title(filename) ax.set_xlabel('X Label') ax.set_ylabel('Z Label') ax.set_zlabel('Y Label') ax.legend() plt.show() cv2.waitKey(0) ================================================ FILE: data/Dummy/Dummy.py ================================================ import os import os.path as osp from pycocotools.coco import COCO import numpy as np from config import cfg from utils.pose_utils import world2cam, cam2pixel, pixel2cam, rigid_align, process_bbox import cv2 import random import json from utils.vis import vis_keypoints, vis_3d_skeleton class Dummy: def __init__(self, data_split): self.data_split = data_split self.img_dir = osp.join('data', 'Dummy', 'images') self.annot_path = osp.join('data', 'Dummy', 'annotations') self.human_bbox_root_dir = osp.join('data', 'Dummy', 'bbox_root', 'bbox_root_human36m_output.json') self.joint_num = 18 # original:17, but manually added 'Thorax' self.joints_name = ('Pelvis', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Torso', 'Neck', 'Nose', 'Head', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'Thorax') self.flip_pairs = ( (1, 4), (2, 5), (3, 6), (14, 11), (15, 12), (16, 13) ) self.skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) ) self.joints_have_depth = True self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) # exclude Thorax self.action_name = ['Directions', 'Discussion', 'Eating', 'Greeting', 'Phoning', 'Posing', 'Purchases', 'Sitting', 'SittingDown', 'Smoking', 'Photo', 'Waiting', 'Walking', 'WalkDog', 'WalkTogether'] self.root_idx = self.joints_name.index('Pelvis') 
self.lshoulder_idx = self.joints_name.index('L_Shoulder') self.rshoulder_idx = self.joints_name.index('R_Shoulder') self.data = self.load_data() def get_subsampling_ratio(self): if self.data_split == 'train': return 5 elif self.data_split == 'test': return 64 else: assert 0, print('Unknown subset') def get_subject(self): if self.data_split == 'train': subject = [1] elif self.data_split == 'test': subject = [2] else: assert 0, print("Unknown subset") return subject def add_thorax(self, joint_coord): thorax = (joint_coord[self.lshoulder_idx, :] + joint_coord[self.rshoulder_idx, :]) * 0.5 thorax = thorax.reshape((1, 3)) joint_coord = np.concatenate((joint_coord, thorax), axis=0) return joint_coord def load_data(self): print('Load data of Dummy') subject_list = self.get_subject() sampling_ratio = self.get_subsampling_ratio() # aggregate annotations from each subject db = COCO() cameras = {} joints = {} for subject in subject_list: # data load with open(osp.join(self.annot_path, 'Dummy_subject' + str(subject) + '_data.json'),'r') as f: annot = json.load(f) if len(db.dataset) == 0: for k,v in annot.items(): db.dataset[k] = v else: for k,v in annot.items(): db.dataset[k] += v # camera load with open(osp.join(self.annot_path, 'Dummy_subject' + str(subject) + '_camera.json'),'r') as f: cameras[str(subject)] = json.load(f) # joint coordinate load with open(osp.join(self.annot_path, 'Dummy_subject' + str(subject) + '_joint_3d.json'),'r') as f: joints[str(subject)] = json.load(f) db.createIndex() if self.data_split == 'test' and not cfg.use_gt_info: print("Get bounding box and root from " + self.human_bbox_root_dir) bbox_root_result = {} with open(self.human_bbox_root_dir) as f: annot = json.load(f) for i in range(len(annot)): bbox_root_result[str(annot[i]['image_id'])] = {'bbox': np.array(annot[i]['bbox']), 'root': np.array(annot[i]['root_cam'])} else: print("Get bounding box and root from groundtruth") data = [] for aid in db.anns.keys(): ann = db.anns[aid] image_id = ann['image_id'] img = db.loadImgs(image_id)[0] img_path = osp.join(self.img_dir, img['file_name']) img_width, img_height = img['width'], img['height'] # check subject and frame_idx subject = img['subject']; frame_idx = img['frame_idx']; if subject not in subject_list: continue if frame_idx % sampling_ratio != 0: continue # camera parameter cam_idx = img['cam_idx'] cam_param = cameras[str(subject)][str(cam_idx)] R,t,f,c = np.array(cam_param['R'], dtype=np.float32), np.array(cam_param['t'], dtype=np.float32), np.array(cam_param['f'], dtype=np.float32), np.array(cam_param['c'], dtype=np.float32) # project world coordinate to cam, image coordinate space action_idx = img['action_idx']; subaction_idx = img['subaction_idx']; frame_idx = img['frame_idx']; joint_world = np.array(joints[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)], dtype=np.float32) joint_world = self.add_thorax(joint_world) joint_cam = world2cam(joint_world, R, t) joint_img = cam2pixel(joint_cam, f, c) joint_img[:,2] = joint_img[:,2] - joint_cam[self.root_idx,2] joint_vis = np.ones((self.joint_num,1)) if self.data_split == 'test' and not cfg.use_gt_info: bbox = bbox_root_result[str(image_id)]['bbox'] # bbox should be aspect ratio preserved-extended. It is done in RootNet. 
root_cam = bbox_root_result[str(image_id)]['root'] else: bbox = process_bbox(np.array(ann['bbox']), img_width, img_height) if bbox is None: continue root_cam = joint_cam[self.root_idx] data.append({ 'img_path': img_path, 'img_id': image_id, 'bbox': bbox, 'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth] 'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate 'joint_vis': joint_vis, 'root_cam': root_cam, # [X, Y, Z] in camera coordinate 'f': f, 'c': c}) return data def evaluate(self, preds, result_dir): print('Evaluation start...') gts = self.data assert len(gts) == len(preds) sample_num = len(gts) pred_save = [] error = np.zeros((sample_num, self.joint_num-1)) # joint error error_action = [ [] for _ in range(len(self.action_name)) ] # error for each sequence for n in range(sample_num): gt = gts[n] image_id = gt['img_id'] f = gt['f'] c = gt['c'] bbox = gt['bbox'] gt_3d_root = gt['root_cam'] gt_3d_kpt = gt['joint_cam'] gt_vis = gt['joint_vis'] # restore coordinates to original space pred_2d_kpt = preds[n].copy() pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0] pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1] pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2] vis = False if vis: cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) filename = str(random.randrange(1,500)) tmpimg = cvimg.copy().astype(np.uint8) tmpkps = np.zeros((3,self.joint_num)) tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1] tmpkps[2,:] = 1 tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton) cv2.imwrite(filename + '_output.jpg', tmpimg) # back project to camera coordinate system pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c) # root joint alignment pred_3d_kpt = pred_3d_kpt - pred_3d_kpt[self.root_idx] gt_3d_kpt = gt_3d_kpt - gt_3d_kpt[self.root_idx] pred_3d_kpt = rigid_align(pred_3d_kpt, gt_3d_kpt) # exclude thorax pred_3d_kpt = np.take(pred_3d_kpt, self.eval_joint, axis=0) gt_3d_kpt = np.take(gt_3d_kpt, self.eval_joint, axis=0) # error calculate error[n] = np.sqrt(np.sum((pred_3d_kpt - gt_3d_kpt)**2,1)) img_name = gt['img_path'] action_idx = int(img_name[img_name.find('act')+4:img_name.find('act')+6]) - 2 error_action[action_idx].append(error[n].copy()) # prediction save pred_save.append({'image_id': image_id, 'joint_cam': pred_3d_kpt.tolist(), 'bbox': bbox.tolist(), 'root_cam': gt_3d_root.tolist()}) # joint_cam is root-relative coordinate # total error tot_err = np.mean(error) metric = 'PA MPJPE' eval_summary = 'Protocol 1' + ' error (' + metric + ') >> tot: %.2f\n' % (tot_err) # error for each action for i in range(len(error_action)): err = np.mean(np.array(error_action[i])) eval_summary += (self.action_name[i] + ': %.2f ' % err) print(eval_summary) # prediction save output_path = osp.join(result_dir, 'bbox_root_pose_dummy_output.json') with open(output_path, 'w') as f: json.dump(pred_save, f) print("Test result is saved at " + output_path) return eval_summary ================================================ FILE: data/Dummy/annotations/Dummy_subject1_camera.json ================================================ {"1": {"R": [[-0.9059013006181885, 0.4217144115102914, 0.038727105014486805], [0.044493184429779696, 0.1857199061874203, -0.9815948619389944], [-0.4211450938543295, -0.8875049698848251, -0.1870073216538954]], "t": [-234.7208032216618, 464.34018262882194, 5536.652631113797], "f": [1145.04940458804, 1143.78109572365], "c": [512.541504956548, 
515.4514869776]}, "2": {"R": [[0.9216646531492915, 0.3879848687925067, -0.0014172943441045224], [0.07721054863099915, -0.18699239961454955, -0.979322405373477], [-0.3802272982247548, 0.9024974149959955, -0.20230080971229314]], "t": [-11.934348472090557, 449.4165893644565, 5541.113551868937], "f": [1149.67569986785, 1147.59161666764], "c": [508.848621645943, 508.064917088557]}, "3": {"R": [[-0.9063540572469627, -0.42053101768163204, -0.04093880896680188], [-0.0603212197838846, 0.22468715090881142, -0.9725620980997899], [0.4181909532208387, -0.8790161246439863, -0.2290130547809762]], "t": [781.127357651581, 235.3131620173424, 5576.37044019807], "f": [1149.14071676148, 1148.7989685676], "c": [519.815837182153, 501.402658888552]}, "4": {"R": [[0.91754082476548, -0.39226322025776267, 0.06517975852741943], [-0.04531905395586976, -0.26600517028098103, -0.9629057236990188], [0.395050652748768, 0.8805514269006645, -0.2618476013752581]], "t": [-155.13650339749012, 422.16256306729633, 4435.416222660868], "f": [1145.51133842318, 1144.77392807652], "c": [514.968197319863, 501.882018537695]}} ================================================ FILE: data/Dummy/annotations/Dummy_subject1_data.json ================================================ {"images": [{"id": 1877420, "file_name": "s_11_act_02_subact_01_ca_01/s_11_act_02_subact_01_ca_01_000001.jpg", "width": 1000, "height": 1002, "subject": 1, "action_name": "Directions", "action_idx": 2, "subaction_idx": 1, "cam_idx": 1, "frame_idx": 0}], "annotations": [{"id": 1877420, "image_id": 1877420, "keypoints_vis": [true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true], "bbox": [304.0201284041609, 222.305917169553, 328.1488619190915, 412.150330355609]}]} ================================================ FILE: data/Dummy/annotations/Dummy_subject1_joint_3d.json ================================================ {"2": {"1": {"0": [[-47.24769973754883, -81.04920196533203, 987.9080200195312], [-184.4625244140625, -69.55330657958984, 999.5223999023438], [-199.22152709960938, -72.29781341552734, 537.8258666992188], [-177.2645721435547, 44.52031326293945, 93.21685028076172], [89.96746063232422, -92.54512023925781, 976.2935791015625], [97.17977142333984, -81.16199493408203, 514.5499877929688], [82.85128784179688, 34.8104248046875, 69.40837097167969], [-52.695899963378906, -77.56897735595703, 1242.206298828125], [-49.09817886352539, -73.6445083618164, 1492.0970458984375], [-71.0900650024414, -139.2397003173828, 1579.0076904296875], [-71.68211364746094, -92.79254150390625, 1684.2078857421875], [116.02037811279297, -63.403587341308594, 1509.3262939453125], [396.226318359375, -72.48757934570312, 1469.46826171875], [633.7438354492188, -144.6726837158203, 1475.2344970703125], [-211.36859130859375, -37.4464111328125, 1487.2081298828125], [-487.9529724121094, -1.2391146421432495, 1438.4637451171875], [-727.43798828125, -60.458595275878906, 1466.75244140625]]}}} ================================================ FILE: data/Dummy/bbox_root/bbox_dummy_output.json ================================================ [{"image_id": 1877420, "category_id": 1, "bbox": [309.1705017089844, 252.84469604492188, 326.1686096191406, 368.1951599121094], "score": 0.9997870326042175}] ================================================ FILE: data/Human36M/Human36M.py ================================================ import os import os.path as osp from pycocotools.coco import COCO import numpy as np from config import cfg from utils.pose_utils import 
world2cam, cam2pixel, pixel2cam, rigid_align, process_bbox import cv2 import random import json from utils.vis import vis_keypoints, vis_3d_skeleton class Human36M: def __init__(self, data_split): self.data_split = data_split self.img_dir = osp.join('/', 'data', 'Human36M', 'images') self.annot_path = osp.join('/', 'data', 'Human36M', 'annotations') self.human_bbox_root_dir = osp.join('/', 'data', 'Human36M', 'bbox_root', 'bbox_root_human36m_output.json') self.joint_num = 18 # original:17, but manually added 'Thorax' self.joints_name = ('Pelvis', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Torso', 'Neck', 'Nose', 'Head', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'Thorax') self.flip_pairs = ( (1, 4), (2, 5), (3, 6), (14, 11), (15, 12), (16, 13) ) self.skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) ) self.joints_have_depth = True self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) # exclude Thorax self.action_name = ['Directions', 'Discussion', 'Eating', 'Greeting', 'Phoning', 'Posing', 'Purchases', 'Sitting', 'SittingDown', 'Smoking', 'Photo', 'Waiting', 'Walking', 'WalkDog', 'WalkTogether'] self.root_idx = self.joints_name.index('Pelvis') self.lshoulder_idx = self.joints_name.index('L_Shoulder') self.rshoulder_idx = self.joints_name.index('R_Shoulder') self.protocol = 2 self.data = self.load_data() def get_subsampling_ratio(self): if self.data_split == 'train': return 5 elif self.data_split == 'test': return 64 else: assert 0, print('Unknown subset') def get_subject(self): if self.data_split == 'train': if self.protocol == 1: subject = [1,5,6,7,8,9] elif self.protocol == 2: subject = [1,5,6,7,8] elif self.data_split == 'test': if self.protocol == 1: subject = [11] elif self.protocol == 2: subject = [9,11] else: assert 0, print("Unknown subset") return subject def add_thorax(self, joint_coord): thorax = (joint_coord[self.lshoulder_idx, :] + joint_coord[self.rshoulder_idx, :]) * 0.5 thorax = thorax.reshape((1, 3)) joint_coord = np.concatenate((joint_coord, thorax), axis=0) return joint_coord def load_data(self): print('Load data of H36M Protocol ' + str(self.protocol)) subject_list = self.get_subject() sampling_ratio = self.get_subsampling_ratio() # aggregate annotations from each subject db = COCO() cameras = {} joints = {} for subject in subject_list: # data load with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_data.json'),'r') as f: annot = json.load(f) if len(db.dataset) == 0: for k,v in annot.items(): db.dataset[k] = v else: for k,v in annot.items(): db.dataset[k] += v # camera load with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_camera.json'),'r') as f: cameras[str(subject)] = json.load(f) # joint coordinate load with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_joint_3d.json'),'r') as f: joints[str(subject)] = json.load(f) db.createIndex() if self.data_split == 'test' and not cfg.use_gt_info: print("Get bounding box and root from " + self.human_bbox_root_dir) bbox_root_result = {} with open(self.human_bbox_root_dir) as f: annot = json.load(f) for i in range(len(annot)): bbox_root_result[str(annot[i]['image_id'])] = {'bbox': np.array(annot[i]['bbox']), 'root': np.array(annot[i]['root_cam'])} else: print("Get bounding box and root from groundtruth") data = [] for aid in db.anns.keys(): ann = db.anns[aid] image_id = ann['image_id'] img = 
db.loadImgs(image_id)[0] img_path = osp.join(self.img_dir, img['file_name']) img_width, img_height = img['width'], img['height'] # check subject and frame_idx subject = img['subject']; frame_idx = img['frame_idx']; if subject not in subject_list: continue if frame_idx % sampling_ratio != 0: continue # camera parameter cam_idx = img['cam_idx'] cam_param = cameras[str(subject)][str(cam_idx)] R,t,f,c = np.array(cam_param['R'], dtype=np.float32), np.array(cam_param['t'], dtype=np.float32), np.array(cam_param['f'], dtype=np.float32), np.array(cam_param['c'], dtype=np.float32) # project world coordinate to cam, image coordinate space action_idx = img['action_idx']; subaction_idx = img['subaction_idx']; frame_idx = img['frame_idx']; joint_world = np.array(joints[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)], dtype=np.float32) joint_world = self.add_thorax(joint_world) joint_cam = world2cam(joint_world, R, t) joint_img = cam2pixel(joint_cam, f, c) joint_img[:,2] = joint_img[:,2] - joint_cam[self.root_idx,2] joint_vis = np.ones((self.joint_num,1)) if self.data_split == 'test' and not cfg.use_gt_info: bbox = bbox_root_result[str(image_id)]['bbox'] # bbox should be aspect ratio preserved-extended. It is done in RootNet. root_cam = bbox_root_result[str(image_id)]['root'] else: bbox = process_bbox(np.array(ann['bbox']), img_width, img_height) if bbox is None: continue root_cam = joint_cam[self.root_idx] data.append({ 'img_path': img_path, 'img_id': image_id, 'bbox': bbox, 'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth] 'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate 'joint_vis': joint_vis, 'root_cam': root_cam, # [X, Y, Z] in camera coordinate 'f': f, 'c': c}) return data def evaluate(self, preds, result_dir): print('Evaluation start...') gts = self.data assert len(gts) == len(preds) sample_num = len(gts) pred_save = [] error = np.zeros((sample_num, self.joint_num-1)) # joint error error_action = [ [] for _ in range(len(self.action_name)) ] # error for each sequence for n in range(sample_num): gt = gts[n] image_id = gt['img_id'] f = gt['f'] c = gt['c'] bbox = gt['bbox'] gt_3d_root = gt['root_cam'] gt_3d_kpt = gt['joint_cam'] gt_vis = gt['joint_vis'] # restore coordinates to original space pred_2d_kpt = preds[n].copy() pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0] pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1] pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2] vis = False if vis: cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) filename = str(random.randrange(1,500)) tmpimg = cvimg.copy().astype(np.uint8) tmpkps = np.zeros((3,self.joint_num)) tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1] tmpkps[2,:] = 1 tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton) cv2.imwrite(filename + '_output.jpg', tmpimg) # back project to camera coordinate system pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c) # root joint alignment pred_3d_kpt = pred_3d_kpt - pred_3d_kpt[self.root_idx] gt_3d_kpt = gt_3d_kpt - gt_3d_kpt[self.root_idx] if self.protocol == 1: # rigid alignment for PA MPJPE (protocol #1) pred_3d_kpt = rigid_align(pred_3d_kpt, gt_3d_kpt) # exclude thorax pred_3d_kpt = np.take(pred_3d_kpt, self.eval_joint, axis=0) gt_3d_kpt = np.take(gt_3d_kpt, self.eval_joint, axis=0) # error calculate error[n] = np.sqrt(np.sum((pred_3d_kpt - gt_3d_kpt)**2,1)) img_name = gt['img_path'] action_idx = 
int(img_name[img_name.find('act')+4:img_name.find('act')+6]) - 2 error_action[action_idx].append(error[n].copy()) # prediction save pred_save.append({'image_id': image_id, 'joint_cam': pred_3d_kpt.tolist(), 'bbox': bbox.tolist(), 'root_cam': gt_3d_root.tolist()}) # joint_cam is root-relative coordinate # total error tot_err = np.mean(error) metric = 'PA MPJPE' if self.protocol == 1 else 'MPJPE' eval_summary = 'Protocol ' + str(self.protocol) + ' error (' + metric + ') >> tot: %.2f\n' % (tot_err) # error for each action for i in range(len(error_action)): err = np.mean(np.array(error_action[i])) eval_summary += (self.action_name[i] + ': %.2f ' % err) print(eval_summary) # prediction save output_path = osp.join(result_dir, 'bbox_root_pose_human36m_output.json') with open(output_path, 'w') as f: json.dump(pred_save, f) print("Test result is saved at " + output_path) return eval_summary ================================================ FILE: data/MPII/MPII.py ================================================ import os import os.path as osp import numpy as np from pycocotools.coco import COCO from utils.pose_utils import process_bbox from config import cfg class MPII: def __init__(self, data_split): self.data_split = data_split self.img_dir = osp.join('/', 'data', 'MPII') self.train_annot_path = osp.join('/', 'data', 'MPII', 'annotations', 'train.json') self.joint_num = 16 self.joints_name = ('R_Ankle', 'R_Knee', 'R_Hip', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Thorax', 'Neck', 'Head', 'R_Wrist', 'R_Elbow', 'R_Shoulder', 'L_Shoulder', 'L_Elbow', 'L_Wrist') self.flip_pairs = ( (0, 5), (1, 4), (2, 3), (10, 15), (11, 14), (12, 13) ) self.skeleton = ( (0, 1), (1, 2), (2, 6), (7, 12), (12, 11), (11, 10), (5, 4), (4, 3), (3, 6), (7, 13), (13, 14), (14, 15), (6, 7), (7, 8), (8, 9) ) self.joints_have_depth = False self.data = self.load_data() def load_data(self): if self.data_split == 'train': db = COCO(self.train_annot_path) else: print('Unknown data subset') assert 0 data = [] for aid in db.anns.keys(): ann = db.anns[aid] img = db.loadImgs(ann['image_id'])[0] width, height = img['width'], img['height'] if ann['num_keypoints'] == 0: continue bbox = process_bbox(ann['bbox'], width, height) if bbox is None: continue # joints and vis joint_img = np.array(ann['keypoints']).reshape(self.joint_num,3) joint_vis = joint_img[:,2].copy().reshape(-1,1) joint_img[:,2] = 0 imgname = img['file_name'] img_path = osp.join(self.img_dir, imgname) data.append({ 'img_path': img_path, 'bbox': bbox, 'joint_img': joint_img, # [org_img_x, org_img_y, 0] 'joint_vis': joint_vis, }) return data ================================================ FILE: data/MSCOCO/MSCOCO.py ================================================ import os import os.path as osp import numpy as np from pycocotools.coco import COCO from config import cfg import scipy.io as sio import json import cv2 import random import math from utils.pose_utils import pixel2cam, process_bbox from utils.vis import vis_keypoints, vis_3d_skeleton class MSCOCO: def __init__(self, data_split): self.data_split = data_split self.img_dir = osp.join('/','home', 'centos', 'datasets', 'coco', 'images') self.train_annot_path = osp.join('/','home', 'centos', 'datasets', 'coco', 'annotations', 'person_keypoints_train2017.json') self.test_annot_path = osp.join('/','home', 'centos', 'datasets', 'coco', 'annotations', 'person_keypoints_val2017.json') self.human_3d_bbox_root_dir = osp.join('/', 'home', 'centos','datasets', 'coco', 'bbox_root', 'bbox_root_coco_output.json') if self.data_split 
== 'train': self.joint_num = 19 # original: 17, but manually added 'Thorax', 'Pelvis' self.joints_name = ('Nose', 'L_Eye', 'R_Eye', 'L_Ear', 'R_Ear', 'L_Shoulder', 'R_Shoulder', 'L_Elbow', 'R_Elbow', 'L_Wrist', 'R_Wrist', 'L_Hip', 'R_Hip', 'L_Knee', 'R_Knee', 'L_Ankle', 'R_Ankle', 'Thorax', 'Pelvis') self.flip_pairs = ( (1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16) ) self.skeleton = ( (1, 2), (0, 1), (0, 2), (2, 4), (1, 3), (6, 8), (8, 10), (5, 7), (7, 9), (12, 14), (14, 16), (11, 13), (13, 15), (5, 6), (11, 12) ) self.joints_have_depth = False self.lshoulder_idx = self.joints_name.index('L_Shoulder') self.rshoulder_idx = self.joints_name.index('R_Shoulder') self.lhip_idx = self.joints_name.index('L_Hip') self.rhip_idx = self.joints_name.index('R_Hip') else: ## testing settings (when test model trained on the MuCo-3DHP dataset) self.joint_num = 21 # MuCo-3DHP self.joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe') # MuCo-3DHP self.original_joint_num = 17 # MuPoTS self.original_joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head') # MuPoTS self.flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13) ) self.skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (11, 12), (12, 13), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7) ) self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) self.joints_have_depth = False self.data = self.load_data() def load_data(self): if self.data_split == 'train': db = COCO(self.train_annot_path) data = [] for aid in db.anns.keys(): ann = db.anns[aid] img = db.loadImgs(ann['image_id'])[0] width, height = img['width'], img['height'] if (ann['image_id'] not in db.imgs) or ann['iscrowd'] or (ann['num_keypoints'] == 0): continue bbox = process_bbox(ann['bbox'], width, height) if bbox is None: continue # joints and vis joint_img = np.array(ann['keypoints']).reshape(-1,3) # add Thorax thorax = (joint_img[self.lshoulder_idx, :] + joint_img[self.rshoulder_idx, :]) * 0.5 thorax[2] = joint_img[self.lshoulder_idx,2] * joint_img[self.rshoulder_idx,2] thorax = thorax.reshape((1, 3)) # add Pelvis pelvis = (joint_img[self.lhip_idx, :] + joint_img[self.rhip_idx, :]) * 0.5 pelvis[2] = joint_img[self.lhip_idx,2] * joint_img[self.rhip_idx,2] pelvis = pelvis.reshape((1, 3)) joint_img = np.concatenate((joint_img, thorax, pelvis), axis=0) joint_vis = (joint_img[:,2].copy().reshape(-1,1) > 0) joint_img[:,2] = 0 imgname = osp.join('train2017', db.imgs[ann['image_id']]['file_name']) img_path = osp.join(self.img_dir, imgname) data.append({ 'img_path': img_path, 'bbox': bbox, 'joint_img': joint_img, # [org_img_x, org_img_y, 0] 'joint_vis': joint_vis, 'f': np.array([1500, 1500]), 'c': np.array([width/2, height/2]) }) elif self.data_split == 'test': db = COCO(self.test_annot_path) with open(self.human_3d_bbox_root_dir) as f: annot = json.load(f) data = [] for i in range(len(annot)): image_id = annot[i]['image_id'] img = db.loadImgs(image_id)[0] img_path = osp.join(self.img_dir, 'val2017', img['file_name']) fx, fy, cx, cy = 1500, 1500, img['width']/2, img['height']/2 f = np.array([fx, fy]); c = np.array([cx, cy]); root_cam = np.array(annot[i]['root_cam']).reshape(3) bbox = 
np.array(annot[i]['bbox']).reshape(4) data.append({ 'img_path': img_path, 'bbox': bbox, 'joint_img': np.zeros((self.original_joint_num, 3)), # dummy 'joint_cam': np.zeros((self.original_joint_num, 3)), # dummy 'joint_vis': np.zeros((self.original_joint_num, 1)), # dummy 'root_cam': root_cam, # [X, Y, Z] in camera coordinate 'f': f, 'c': c, }) else: print('Unknown data subset') assert 0 return data def evaluate(self, preds, result_dir): print('Evaluation start...') gts = self.data sample_num = len(preds) joint_num = self.original_joint_num pred_2d_save = {} pred_3d_save = {} for n in range(sample_num): gt = gts[n] f = gt['f'] c = gt['c'] bbox = gt['bbox'] gt_3d_root = gt['root_cam'] img_name = gt['img_path'].split('/') img_name = 'coco_' + img_name[-1].split('.')[0] # e.g., coco_00000000 # restore coordinates to original space pred_2d_kpt = preds[n].copy() # only consider eval_joint pred_2d_kpt = np.take(pred_2d_kpt, self.eval_joint, axis=0) pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0] pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1] pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2] # 2d kpt save if img_name in pred_2d_save: pred_2d_save[img_name].append(pred_2d_kpt[:,:2]) else: pred_2d_save[img_name] = [pred_2d_kpt[:,:2]] vis = False if vis: cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) filename = str(random.randrange(1,500)) tmpimg = cvimg.copy().astype(np.uint8) tmpkps = np.zeros((3,joint_num)) tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1] tmpkps[2,:] = 1 tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton) cv2.imwrite(filename + '_output.jpg', tmpimg) # back project to camera coordinate system pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c) # 3d kpt save if img_name in pred_3d_save: pred_3d_save[img_name].append(pred_3d_kpt) else: pred_3d_save[img_name] = [pred_3d_kpt] output_path = osp.join(result_dir,'preds_2d_kpt_coco.mat') sio.savemat(output_path, pred_2d_save) print("Testing result is saved at " + output_path) output_path = osp.join(result_dir,'preds_3d_kpt_coco.mat') sio.savemat(output_path, pred_3d_save) print("Testing result is saved at " + output_path) ================================================ FILE: data/MuCo/MuCo.py ================================================ import os import os.path as osp import numpy as np import math from utils.pose_utils import process_bbox from pycocotools.coco import COCO from config import cfg class MuCo: def __init__(self, data_split): self.data_split = data_split self.img_dir = osp.join('/', 'home', 'centos', 'datasets', 'MuCo') self.train_annot_path = osp.join('/', 'home', 'centos', 'datasets', 'MuCo', 'MuCo-3DHP.json') self.joint_num = 21 self.joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe') self.flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) ) self.skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) ) self.joints_have_depth = True self.root_idx = self.joints_name.index('Pelvis') self.data = self.load_data() def load_data(self): if self.data_split == 'train': db = COCO(self.train_annot_path) else: print('Unknown 
data subset') assert 0 data = [] for iid in db.imgs.keys(): img = db.imgs[iid] img_id = img["id"] img_width, img_height = img['width'], img['height'] imgname = img['file_name'] img_path = osp.join(self.img_dir, imgname) f = img["f"] c = img["c"] # crop the closest person to the camera ann_ids = db.getAnnIds(img_id) anns = db.loadAnns(ann_ids) root_depths = [ann['keypoints_cam'][self.root_idx][2] for ann in anns] closest_pid = root_depths.index(min(root_depths)) pid_list = [closest_pid] for i in range(len(anns)): if i == closest_pid: continue picked = True for j in range(len(anns)): if i == j: continue dist = (np.array(anns[i]['keypoints_cam'][self.root_idx]) - np.array(anns[j]['keypoints_cam'][self.root_idx])) ** 2 dist_2d = math.sqrt(np.sum(dist[:2])) dist_3d = math.sqrt(np.sum(dist)) if dist_2d < 500 or dist_3d < 500: picked = False if picked: pid_list.append(i) for pid in pid_list: joint_cam = np.array(anns[pid]['keypoints_cam']) root_cam = joint_cam[self.root_idx] joint_img = np.array(anns[pid]['keypoints_img']) joint_img = np.concatenate([joint_img, joint_cam[:,2:]],1) joint_img[:,2] = joint_img[:,2] - root_cam[2] joint_vis = np.ones((self.joint_num,1)) bbox = process_bbox(anns[pid]['bbox'], img_width, img_height) if bbox is None: continue data.append({ 'img_path': img_path, 'bbox': bbox, 'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth] 'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate 'joint_vis': joint_vis, 'root_cam': root_cam, # [X, Y, Z] in camera coordinate 'f': f, 'c': c }) return data ================================================ FILE: data/MuPoTS/MuPoTS.py ================================================ import os import os.path as osp import scipy.io as sio import numpy as np from pycocotools.coco import COCO from config import cfg import json import cv2 import random import math from utils.pose_utils import pixel2cam, process_bbox from utils.vis import vis_keypoints, vis_3d_skeleton class MuPoTS: def __init__(self, data_split): self.data_split = data_split self.img_dir = osp.join('/', 'data', 'MuPoTS', 'data', 'MultiPersonTestSet') self.test_annot_path = osp.join('/', 'data', 'MuPoTS', 'data', 'MuPoTS-3D.json') self.human_bbox_root_dir = osp.join('/', 'data', 'MuPoTS', 'bbox_root', 'bbox_root_mupots_output.json') self.joint_num = 21 # MuCo-3DHP self.joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe') # MuCo-3DHP self.original_joint_num = 17 # MuPoTS self.original_joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head') # MuPoTS self.flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13) ) self.skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (11, 12), (12, 13), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7) ) self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) self.joints_have_depth = True self.root_idx = self.joints_name.index('Pelvis') self.data = self.load_data() def load_data(self): if self.data_split != 'test': print('Unknown data subset') assert 0 data = [] db = COCO(self.test_annot_path) # use gt bbox and root if cfg.use_gt_info: print("Get bounding box and root from groundtruth") for aid in db.anns.keys(): ann = db.anns[aid] if ann['is_valid'] == 0: 
continue image_id = ann['image_id'] img = db.loadImgs(image_id)[0] img_path = osp.join(self.img_dir, img['file_name']) fx, fy, cx, cy = img['intrinsic'] f = np.array([fx, fy]); c = np.array([cx, cy]); joint_cam = np.array(ann['keypoints_cam']) root_cam = joint_cam[self.root_idx] joint_img = np.array(ann['keypoints_img']) joint_img = np.concatenate([joint_img, joint_cam[:,2:]],1) joint_img[:,2] = joint_img[:,2] - root_cam[2] joint_vis = np.ones((self.original_joint_num,1)) bbox = np.array(ann['bbox']) img_width, img_height = img['width'], img['height'] bbox = process_bbox(bbox, img_width, img_height) if bbox is None: continue data.append({ 'img_path': img_path, 'bbox': bbox, 'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth] 'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate 'joint_vis': joint_vis, 'root_cam': root_cam, # [X, Y, Z] in camera coordinate 'f': f, 'c': c, }) else: print("Get bounding box and root from " + self.human_bbox_root_dir) with open(self.human_bbox_root_dir) as f: annot = json.load(f) for i in range(len(annot)): image_id = annot[i]['image_id'] img = db.loadImgs(image_id)[0] img_width, img_height = img['width'], img['height'] img_path = osp.join(self.img_dir, img['file_name']) fx, fy, cx, cy = img['intrinsic'] f = np.array([fx, fy]); c = np.array([cx, cy]); root_cam = np.array(annot[i]['root_cam']).reshape(3) bbox = np.array(annot[i]['bbox']).reshape(4) data.append({ 'img_path': img_path, 'bbox': bbox, 'joint_img': np.zeros((self.original_joint_num, 3)), # dummy 'joint_cam': np.zeros((self.original_joint_num, 3)), # dummy 'joint_vis': np.zeros((self.original_joint_num, 1)), # dummy 'root_cam': root_cam, # [X, Y, Z] in camera coordinate 'f': f, 'c': c, }) return data def evaluate(self, preds, result_dir): print('Evaluation start...') gts = self.data sample_num = len(preds) joint_num = self.original_joint_num pred_2d_save = {} pred_3d_save = {} for n in range(sample_num): gt = gts[n] f = gt['f'] c = gt['c'] bbox = gt['bbox'] gt_3d_root = gt['root_cam'] img_name = gt['img_path'].split('/') img_name = img_name[-2] + '_' + img_name[-1].split('.')[0] # e.g., TS1_img_0001 # restore coordinates to original space pred_2d_kpt = preds[n].copy() # only consider eval_joint pred_2d_kpt = np.take(pred_2d_kpt, self.eval_joint, axis=0) pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0] pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1] pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2] # 2d kpt save if img_name in pred_2d_save: pred_2d_save[img_name].append(pred_2d_kpt[:,:2]) else: pred_2d_save[img_name] = [pred_2d_kpt[:,:2]] vis = False if vis: cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) filename = str(random.randrange(1,500)) tmpimg = cvimg.copy().astype(np.uint8) tmpkps = np.zeros((3,joint_num)) tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1] tmpkps[2,:] = 1 tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton) cv2.imwrite(filename + '_output.jpg', tmpimg) # back project to camera coordinate system pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c) # 3d kpt save if img_name in pred_3d_save: pred_3d_save[img_name].append(pred_3d_kpt) else: pred_3d_save[img_name] = [pred_3d_kpt] output_path = osp.join(result_dir,'preds_2d_kpt_mupots.mat') sio.savemat(output_path, pred_2d_save) print("Testing result is saved at " + output_path) output_path = osp.join(result_dir,'preds_3d_kpt_mupots.mat') sio.savemat(output_path, 
pred_3d_save) print("Testing result is saved at " + output_path) ================================================ FILE: data/MuPoTS/mpii_mupots_multiperson_eval.m ================================================ function mpii_mupots_multiperson_eval(eval_mode, is_relative) % eval_mode: EVLAUATION_MODE % is_relative: 1: root-relative 3D multi-person pose estimation, 0: absolute 3D multi-person pose estimation %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% % Outline of the test eval procedure on MuPoTS-3D. % Plug in your predictions at the appropriate point %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% mpii_mupots_config; addpath('./util'); [~,o1,o2,relevant_labels] = mpii_get_joints('relevant'); num_joints = length(o1); %Path to the test images and annotations test_annot_base = mpii_mupots_path; %See mpii_mupots_config %Path where results are written out results_output_path = './'; %If predicted joints have a different ordering, specify mapping to MPI joints here %map_to_mpii_jointset = % [11 14 10 13 9 12 5 8 4 7 3 6 1]; %Order to process bones in to resize them to the GT safe_traversal_order = [15, 16, 2, 1, 17, 3, 4, 5, 6, 7, 8, 9:14]; EVALUATION_MODE = eval_mode; % 0 = evaluate all annotated persons, 1 = evaluate only predictions matched to annotations person_colors = {'red', 'yellow', 'green', 'blue', 'magenta', 'cyan', 'black', 'white'} ; sequencewise_per_joint_error = {}; sequencewise_undetected_people = []; sequencewise_visibility_mask = {}; sequencewise_occlusion_mask = {}; sequencewise_annotated_people = []; sequencewise_frames = []; %% load prdictions preds_2d_kpt = load('preds_2d_kpt_mupots.mat'); preds_3d_kpt = load('preds_3d_kpt_mupots.mat'); for ts = 1:20 person_ids = []; open_person_ids = 1:20; load( sprintf('%s/TS%d/annot.mat',test_annot_base, ts)); load( sprintf('%s/TS%d/occlusion.mat',test_annot_base, ts)); num_frames = size(annotations,1); undetected_people = 0; annotated_people = 0; pje_idx = 1; per_joint_error = []; %zeros(17,1,num_test_points); per_joint_occlusion_mask = []; per_joint_visibility_mask = []; sequencewise_frames(ts) = num_frames; for i = 1:num_frames %Count valid annotations valid_annotations = 0; for k = 1:size(annotations,2) if(annotations{i,k}.isValidFrame) valid_annotations = valid_annotations + 1; end end annotated_people = annotated_people + valid_annotations; if(valid_annotations == 0) continue; end gt_pose_2d = cell(valid_annotations,1); gt_pose_3d = cell(valid_annotations,1); gt_visibility = cell(valid_annotations,1); gt_pose_occlusion_labels = cell(valid_annotations,1); gt_pose_visibility_labels = cell(valid_annotations,1); %The joint set to use for matching predictions to GT matching_joints = [2:14]; %matching_joints = [2 3 6 9 12]; idx = 1; for k = 1:size(annotations,2) if(annotations{i,k}.isValidFrame) gt_pose_2d{idx} = annotations{i,k}.annot2(:,matching_joints); gt_pose_3d{idx} = annotations{i,k}.univ_annot3 ; gt_visibility{idx} = ones(1,length(matching_joints)); gt_pose_occlusion_labels{idx} = occlusion_labels{i,k} ; gt_pose_visibility_labels{idx} = 1 - occlusion_labels{i,k} ; idx = idx + 1; end end %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%% Predictions here %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %img = imread(sprintf('%s/TS%d/img_%06d.jpg',test_annot_base, ts, i-1)); % prediction of this image pred_2d_kpt = getfield(preds_2d_kpt,sprintf('TS%d_img_%06d',ts, i-1)); pred_3d_kpt = getfield(preds_3d_kpt,sprintf('TS%d_img_%06d',ts, i-1)); %Number of subjects predicted num_pred = size(pred_2d_kpt,1); pred_pose_2d = cell(num_pred,1); pred_pose_3d = cell(num_pred,1); 
pred_visibility = cell(num_pred,1); for k = 1:num_pred pred_pose_2d{k} = zeros(2,14); %pred_pose_2d{k}(:,map_to_mpii_jointset) = % 2D Pose for person detected person k; pred_pose_2d{k} = transpose(squeeze(pred_2d_kpt(k,:,:))); % 2D Pose for person detected person k; % If some joints such as neck are missing, they can be estimated as the mean of shoulders %pred_pose_2d{k}(:,2) = mean(pred_pose_2d{k}(:,[3,6]),2); pred_pose_2d{k} = pred_pose_2d{k}(:,matching_joints); pred_visibility{k} = ~((pred_pose_2d{k}(1,:) == 0) & (pred_pose_2d{k}(2,:) == 0)); pred_pose_3d{k} = zeros(3,num_joints); %pred_pose_3d{k}(:,map_to_mpii_jointset) = % 3D Pose for person detected person k; pred_pose_3d{k} = transpose(squeeze(pred_3d_kpt(k,:,:))); % 3D Pose for person detected person k; % If some joints such as neck or pelvis are missing, they can be estimated as % the mean of shoulders or hips %pred_pose_3d{k}(:,2) = mean(pred_pose_3d{k}(:,[3,6]),2); %pred_pose_3d{k}(:,15) = mean(pred_pose_3d{k}(:,[9,12]),2); %Center the predictions at the pelvis if is_relative == 1 pred_pose_3d{k} = pred_pose_3d{k} - repmat(pred_pose_3d{k}(:,15), 1, 17); else pred_pose_3d{k} = pred_pose_3d{k}; end %Other mappings that may be needed to convert the predicted pose to match our coordinate system %pred_pose_3d{k} = 1000* pred_pose_3d{k}([2 3 1],:); %pred_pose_3d{k}(1:2,:) = -pred_pose_3d{k}(1:2,:); end %Match predictions to GT [matching, old_matched] = mpii_multiperson_get_identity_matching(gt_pose_2d, gt_visibility, pred_pose_2d, pred_visibility, 40); undetected_people = undetected_people + sum(matching == 0); for k = 1:valid_annotations if is_relative == 1 P = gt_pose_3d{k}(:,1:num_joints) - repmat(gt_pose_3d{k}(:,15),1 , num_joints); else P = gt_pose_3d{k}(:,1:num_joints); end pred_considered = 0; if(matching(k) ~= 0 ) pred_p = pred_pose_3d{matching(k)}(:,1:num_joints); pred_p = mpii_map_to_gt_bone_lengths(pred_p, P, o1, safe_traversal_order(2:end)); pred_considered = 1; else pred_p = 100000 * ones(size(P)); %So that the 3DPCK metric marks all these joints as 0! 
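% Note on the dummy prediction above: with EVALUATION_MODE == 0 it is still scored, so every
% undetected (unmatched) person counts against the 3DPCK result; with EVALUATION_MODE == 1 the
% unmatched annotation is simply skipped.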
if(EVALUATION_MODE==0) pred_considered = 1; end end if (pred_considered == 1 ) error_p = (pred_p - P).^2; error_p = sqrt(sum(error_p, 1)); per_joint_error(1:num_joints,1,pje_idx) = error_p; per_joint_occlusion_mask(1:num_joints,1,pje_idx) = gt_pose_occlusion_labels{k}; per_joint_visibility_mask(1:num_joints,1,pje_idx) = gt_pose_visibility_labels{k}; pje_idx = pje_idx + 1; end end end sequencewise_undetected_people(ts) = undetected_people; sequencewise_annotated_people(ts) = annotated_people; sequencewise_per_joint_error{ts} = per_joint_error; sequencewise_visibility_mask{ts} = per_joint_visibility_mask; sequencewise_occlusion_mask{ts} = per_joint_occlusion_mask; end if(EVALUATION_MODE == 0) out_prefix = 'all_annotated_'; else out_prefix = 'only_matched_annotations_'; end save([results_output_path filesep out_prefix 'multiperson_3dhp_evaluation.mat'], 'sequencewise_per_joint_error' ); [seq_table] = mpii_evaluate_multiperson_errors(sequencewise_per_joint_error );%fullfile(net_base, net_path{n,1})); out_file = [results_output_path filesep out_prefix 'multiperson_3dhp_evaluation']; writetable(cell2table(seq_table), [out_file '_sequencewise.csv']); [seq_table] = mpii_evaluate_multiperson_errors_visibility_mask(sequencewise_per_joint_error , sequencewise_visibility_mask); out_file = [results_output_path filesep [out_prefix 'visible_joints_'] 'multiperson_3dhp_evaluation']; writetable(cell2table(seq_table), [out_file '_sequencewise.csv']); [seq_table] = mpii_evaluate_multiperson_errors_visibility_mask(sequencewise_per_joint_error , sequencewise_occlusion_mask); out_file = [results_output_path filesep [out_prefix 'occluded_joints_'] 'multiperson_3dhp_evaluation']; writetable(cell2table(seq_table), [out_file '_sequencewise.csv']); % end ================================================ FILE: data/dataset.py ================================================ import numpy as np import cv2 import random import time import torch import copy import math from torch.utils.data.dataset import Dataset from utils.vis import vis_keypoints, vis_3d_skeleton from utils.pose_utils import fliplr_joints, transform_joint_to_other_db from config import cfg class DatasetLoader(Dataset): def __init__(self, db, ref_joints_name, is_train, transform): self.db = db.data self.joint_num = db.joint_num self.skeleton = db.skeleton self.flip_pairs = db.flip_pairs self.joints_have_depth = db.joints_have_depth self.joints_name = db.joints_name self.ref_joints_name = ref_joints_name self.transform = transform self.is_train = is_train if self.is_train: self.do_augment = True else: self.do_augment = False def __getitem__(self, index): joint_num = self.joint_num skeleton = self.skeleton flip_pairs = self.flip_pairs joints_have_depth = self.joints_have_depth data = copy.deepcopy(self.db[index]) bbox = data['bbox'] joint_img = data['joint_img'] joint_vis = data['joint_vis'] # 1. load image cvimg = cv2.imread(data['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION) if not isinstance(cvimg, np.ndarray): raise IOError("Fail to read %s" % data['img_path']) img_height, img_width, img_channels = cvimg.shape # 2. get augmentation params if self.do_augment: scale, rot, do_flip, color_scale, do_occlusion = get_aug_config() else: scale, rot, do_flip, color_scale, do_occlusion = 1.0, 0.0, False, [1.0, 1.0, 1.0], False # 3. 
crop patch from img and perform data augmentation (flip, rot, color scale, synthetic occlusion) img_patch, trans = generate_patch_image(cvimg, bbox, do_flip, scale, rot, do_occlusion) for i in range(img_channels): img_patch[:, :, i] = np.clip(img_patch[:, :, i] * color_scale[i], 0, 255) # 4. generate patch joint ground truth # flip joints and apply Affine Transform on joints if do_flip: joint_img[:, 0] = img_width - joint_img[:, 0] - 1 for pair in flip_pairs: joint_img[pair[0], :], joint_img[pair[1], :] = joint_img[pair[1], :], joint_img[pair[0], :].copy() joint_vis[pair[0], :], joint_vis[pair[1], :] = joint_vis[pair[1], :], joint_vis[pair[0], :].copy() for i in range(len(joint_img)): joint_img[i, 0:2] = trans_point2d(joint_img[i, 0:2], trans) joint_img[i, 2] /= (cfg.bbox_3d_shape[0]/2.) # expect depth lies in -bbox_3d_shape[0]/2 ~ bbox_3d_shape[0]/2 -> -1.0 ~ 1.0 joint_img[i, 2] = (joint_img[i,2] + 1.0)/2. # 0~1 normalize joint_vis[i] *= ( (joint_img[i,0] >= 0) & \ (joint_img[i,0] < cfg.input_shape[1]) & \ (joint_img[i,1] >= 0) & \ (joint_img[i,1] < cfg.input_shape[0]) & \ (joint_img[i,2] >= 0) & \ (joint_img[i,2] < 1) ) vis = False if vis: filename = str(random.randrange(1,500)) tmpimg = img_patch.copy().astype(np.uint8) tmpkps = np.zeros((3,joint_num)) tmpkps[:2,:] = joint_img[:,:2].transpose(1,0) tmpkps[2,:] = joint_vis[:,0] tmpimg = vis_keypoints(tmpimg, tmpkps, skeleton) cv2.imwrite(filename + '_gt.jpg', tmpimg) vis = False if vis: vis_3d_skeleton(joint_img, joint_vis, skeleton, filename) # change coordinates to output space joint_img[:, 0] = joint_img[:, 0] / cfg.input_shape[1] * cfg.output_shape[1] joint_img[:, 1] = joint_img[:, 1] / cfg.input_shape[0] * cfg.output_shape[0] joint_img[:, 2] = joint_img[:, 2] * cfg.depth_dim if self.is_train: img_patch = self.transform(img_patch) if self.ref_joints_name is not None: joint_img = transform_joint_to_other_db(joint_img, self.joints_name, self.ref_joints_name) joint_vis = transform_joint_to_other_db(joint_vis, self.joints_name, self.ref_joints_name) joint_img = joint_img.astype(np.float32) joint_vis = (joint_vis > 0).astype(np.float32) joints_have_depth = np.array([joints_have_depth]).astype(np.float32) return img_patch, joint_img, joint_vis, joints_have_depth else: img_patch = self.transform(img_patch) return img_patch def __len__(self): return len(self.db) # helper functions def get_aug_config(): scale_factor = 0.25 rot_factor = 30 color_factor = 0.2 scale = np.clip(np.random.randn(), -1.0, 1.0) * scale_factor + 1.0 rot = np.clip(np.random.randn(), -2.0, 2.0) * rot_factor if random.random() <= 0.6 else 0 do_flip = random.random() <= 0.5 c_up = 1.0 + color_factor c_low = 1.0 - color_factor color_scale = [random.uniform(c_low, c_up), random.uniform(c_low, c_up), random.uniform(c_low, c_up)] do_occlusion = random.random() <= 0.5 return scale, rot, do_flip, color_scale, do_occlusion def generate_patch_image(cvimg, bbox, do_flip, scale, rot, do_occlusion): img = cvimg.copy() img_height, img_width, img_channels = img.shape # synthetic occlusion if do_occlusion: while True: area_min = 0.0 area_max = 0.7 synth_area = (random.random() * (area_max - area_min) + area_min) * bbox[2] * bbox[3] ratio_min = 0.3 ratio_max = 1/0.3 synth_ratio = (random.random() * (ratio_max - ratio_min) + ratio_min) synth_h = math.sqrt(synth_area * synth_ratio) synth_w = math.sqrt(synth_area / synth_ratio) synth_xmin = random.random() * (bbox[2] - synth_w - 1) + bbox[0] synth_ymin = random.random() * (bbox[3] - synth_h - 1) + bbox[1] if synth_xmin >= 0 and synth_ymin >= 
0 and synth_xmin + synth_w < img_width and synth_ymin + synth_h < img_height: xmin = int(synth_xmin) ymin = int(synth_ymin) w = int(synth_w) h = int(synth_h) img[ymin:ymin+h, xmin:xmin+w, :] = np.random.rand(h, w, 3) * 255 break bb_c_x = float(bbox[0] + 0.5*bbox[2]) bb_c_y = float(bbox[1] + 0.5*bbox[3]) bb_width = float(bbox[2]) bb_height = float(bbox[3]) if do_flip: img = img[:, ::-1, :] bb_c_x = img_width - bb_c_x - 1 trans = gen_trans_from_patch_cv(bb_c_x, bb_c_y, bb_width, bb_height, cfg.input_shape[1], cfg.input_shape[0], scale, rot, inv=False) img_patch = cv2.warpAffine(img, trans, (int(cfg.input_shape[1]), int(cfg.input_shape[0])), flags=cv2.INTER_LINEAR) img_patch = img_patch[:,:,::-1].copy() img_patch = img_patch.astype(np.float32) return img_patch, trans def rotate_2d(pt_2d, rot_rad): x = pt_2d[0] y = pt_2d[1] sn, cs = np.sin(rot_rad), np.cos(rot_rad) xx = x * cs - y * sn yy = x * sn + y * cs return np.array([xx, yy], dtype=np.float32) def gen_trans_from_patch_cv(c_x, c_y, src_width, src_height, dst_width, dst_height, scale, rot, inv=False): # augment size with scale src_w = src_width * scale src_h = src_height * scale src_center = np.array([c_x, c_y], dtype=np.float32) # augment rotation rot_rad = np.pi * rot / 180 src_downdir = rotate_2d(np.array([0, src_h * 0.5], dtype=np.float32), rot_rad) src_rightdir = rotate_2d(np.array([src_w * 0.5, 0], dtype=np.float32), rot_rad) dst_w = dst_width dst_h = dst_height dst_center = np.array([dst_w * 0.5, dst_h * 0.5], dtype=np.float32) dst_downdir = np.array([0, dst_h * 0.5], dtype=np.float32) dst_rightdir = np.array([dst_w * 0.5, 0], dtype=np.float32) src = np.zeros((3, 2), dtype=np.float32) src[0, :] = src_center src[1, :] = src_center + src_downdir src[2, :] = src_center + src_rightdir dst = np.zeros((3, 2), dtype=np.float32) dst[0, :] = dst_center dst[1, :] = dst_center + dst_downdir dst[2, :] = dst_center + dst_rightdir if inv: trans = cv2.getAffineTransform(np.float32(dst), np.float32(src)) else: trans = cv2.getAffineTransform(np.float32(src), np.float32(dst)) return trans def trans_point2d(pt_2d, trans): src_pt = np.array([pt_2d[0], pt_2d[1], 1.]).T dst_pt = np.dot(trans, src_pt) return dst_pt[0:2] ================================================ FILE: data/multiple_datasets.py ================================================ import random import numpy as np from torch.utils.data.dataset import Dataset class MultipleDatasets(Dataset): def __init__(self, dbs, make_same_len=True): self.dbs = dbs self.db_num = len(self.dbs) self.max_db_data_num = max([len(db) for db in dbs]) self.db_len_cumsum = np.cumsum([len(db) for db in dbs]) self.make_same_len = make_same_len def __len__(self): # all dbs have the same length if self.make_same_len: return self.max_db_data_num * self.db_num # each db has different length else: return sum([len(db) for db in self.dbs]) def __getitem__(self, index): if self.make_same_len: db_idx = index // self.max_db_data_num data_idx = index % self.max_db_data_num if data_idx >= len(self.dbs[db_idx]) * (self.max_db_data_num // len(self.dbs[db_idx])): # last batch: random sampling data_idx = random.randint(0,len(self.dbs[db_idx])-1) else: # before last batch: use modular data_idx = data_idx % len(self.dbs[db_idx]) else: for i in range(self.db_num): if index < self.db_len_cumsum[i]: db_idx = i break if db_idx == 0: data_idx = index else: data_idx = index - self.db_len_cumsum[db_idx-1] return self.dbs[db_idx][data_idx] ================================================ FILE: demo/demo.py 
================================================ import sys import os import os.path as osp import argparse import numpy as np import cv2 import torch import torchvision.transforms as transforms from torch.nn.parallel.data_parallel import DataParallel import torch.backends.cudnn as cudnn sys.path.insert(0, osp.join('..', 'main')) sys.path.insert(0, osp.join('..', 'data')) sys.path.insert(0, osp.join('..', 'common')) from config import cfg from model import get_pose_net from dataset import generate_patch_image from utils.pose_utils import process_bbox, pixel2cam from utils.vis import vis_keypoints, vis_3d_multiple_skeleton def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('--gpu', type=str, dest='gpu_ids') parser.add_argument('--model_path', type=str, dest='model') parser.add_argument('--input_image', type=str, dest='image') parser.add_argument('--backbone', type=str, dest='backbone') args = parser.parse_args() # test gpus if not args.gpu_ids: assert 0, print("Please set proper gpu ids") if '-' in args.gpu_ids: gpus = args.gpu_ids.split('-') gpus[0] = 0 if not gpus[0].isdigit() else int(gpus[0]) gpus[1] = len(mem_info()) if not gpus[1].isdigit() else int(gpus[1]) + 1 args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus)))) return args # argument parsing args = parse_args() cfg.set_args(args.gpu_ids) cudnn.benchmark = True # MuCo joint set joint_num = 18 joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe') # 'Pelvis' 'RHip' 'RKnee' 'RAnkle' 'LHip' 'LKnee' 'LAnkle' 'Spine1' 'Neck' 'Head' 'Site' 'LShoulder' 'LElbow' 'LWrist' 'RShoulder' 'RElbow' 'RWrist flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) ) # skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) ) skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) ) # snapshot load model_path = args.model # print('Load checkpoint from {}'.format(model_path)) model = get_pose_net(args.backbone, False, joint_num) model = DataParallel(model).cuda() # print("after DataParallel", model) ckpt = torch.load(model_path) # print("ckpt", ckpt['network']) model.load_state_dict(ckpt['network']) model.eval() # prepare input image transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]) img_path = args.image assert osp.exists(img_path), 'Cannot find image at ' + img_path original_img = cv2.imread(img_path) original_img_height, original_img_width = original_img.shape[:2] # prepare bbox bbox_list = [ [139.41, 102.25, 222.39, 241.57],\ [287.17, 61.52, 74.88, 165.61],\ [540.04, 48.81, 99.96, 223.36],\ [372.58, 170.84, 266.63, 217.19],\ [0.5, 43.74, 90.1, 220.09]] # xmin, ymin, width, height root_depth_list = [11250.5732421875, 15522.8701171875, 11831.3828125, 8852.556640625, 12572.5966796875] # obtain this from RootNet (https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE/tree/master/demo) assert len(bbox_list) == len(root_depth_list) person_num = len(bbox_list) # normalized camera intrinsics focal = [1500, 1500] # x-axis, y-axis princpt = [original_img_width/2, original_img_height/2] # x-axis, y-axis print('focal length: (' + 
str(focal[0]) + ', ' + str(focal[1]) + ')') print('principal points: (' + str(princpt[0]) + ', ' + str(princpt[1]) + ')') # for each cropped and resized human image, forward it to PoseNet output_pose_2d_list = [] output_pose_3d_list = [] for n in range(person_num): bbox = process_bbox(np.array(bbox_list[n]), original_img_width, original_img_height) img, img2bb_trans = generate_patch_image(original_img, bbox, False, 1.0, 0.0, False) img = transform(img).cuda()[None,:,:,:] # forward with torch.no_grad(): pose_3d = model(img) # x,y: pixel, z: root-relative depth (mm) # inverse affine transform (restore the crop and resize) pose_3d = pose_3d[0].cpu().numpy() pose_3d[:,0] = pose_3d[:,0] / cfg.output_shape[1] * cfg.input_shape[1] pose_3d[:,1] = pose_3d[:,1] / cfg.output_shape[0] * cfg.input_shape[0] pose_3d_xy1 = np.concatenate((pose_3d[:,:2], np.ones_like(pose_3d[:,:1])),1) img2bb_trans_001 = np.concatenate((img2bb_trans, np.array([0,0,1]).reshape(1,3))) pose_3d[:,:2] = np.dot(np.linalg.inv(img2bb_trans_001), pose_3d_xy1.transpose(1,0)).transpose(1,0)[:,:2] output_pose_2d_list.append(pose_3d[:,:2].copy()) # root-relative discretized depth -> absolute continuous depth pose_3d[:,2] = (pose_3d[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + root_depth_list[n] pose_3d = pixel2cam(pose_3d, focal, princpt) output_pose_3d_list.append(pose_3d.copy()) # visualize 2d poses vis_img = original_img.copy() for n in range(person_num): vis_kps = np.zeros((3,joint_num)) vis_kps[0,:] = output_pose_2d_list[n][:,0] vis_kps[1,:] = output_pose_2d_list[n][:,1] vis_kps[2,:] = 1 vis_img = vis_keypoints(vis_img, vis_kps, skeleton) cv2.imwrite('output_pose_2d.jpg', vis_img) # visualize 3d poses vis_kps = np.array(output_pose_3d_list) vis_3d_multiple_skeleton(vis_kps, np.ones_like(vis_kps), skeleton, 'output_pose_3d (x,y,z: camera-centered. 
mm.)') ================================================ FILE: main/config.py ================================================ import os import os.path as osp import sys import numpy as np class Config: ## model architecture backbone = 'LPSKI' ## dataset # training set # 3D: Human36M, MuCo # 2D: MSCOCO, MPII trainset_3d = ['Dummy'] # trainset_3d = ['MuCo'] trainset_2d = [] # trainset_2d = ['MSCOCO'] # testing set # Human36M, MuPoTS, MSCOCO testset = 'MuPoTS' ## directory cur_dir = osp.dirname(os.path.abspath(__file__)) root_dir = osp.join(cur_dir, '..') data_dir = osp.join(root_dir, 'data') output_dir = osp.join(root_dir, 'output') model_dir = osp.join(output_dir, 'model_dump') pretrain_dir = osp.join(output_dir, 'pre_train') vis_dir = osp.join(output_dir, 'vis') log_dir = osp.join(output_dir, 'log') result_dir = osp.join(output_dir, 'result') ## input, output input_shape = (256, 256) output_shape = (input_shape[0]//8, input_shape[1]//8) width_multiplier = 1.0 depth_dim = 32 bbox_3d_shape = (2000, 2000, 2000) # depth, height, width pixel_mean = (0.485, 0.456, 0.406) pixel_std = (0.229, 0.224, 0.225) ## training config embedding_size = 2048 lr_dec_epoch = [17, 21] end_epoch = 25 lr = 1e-3 lr_dec_factor = 10 batch_size = 64 ## testing config test_batch_size = 32 flip_test = True use_gt_info = True ## others num_thread = 20 gpu_ids = '0' num_gpus = 1 continue_train = False if '-' in gpu_ids: gpus = gpu_ids.split('-') gpus[0] = int(gpus[0]) gpus[1] = int(gpus[1]) + 1 gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus)))) os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids cfg = Config() sys.path.insert(0, osp.join(cfg.root_dir, 'common')) from utils.dir_utils import add_pypath, make_folder # adding path add_pypath(osp.join(cfg.data_dir)) for i in range(len(cfg.trainset_3d)): add_pypath(osp.join(cfg.data_dir, cfg.trainset_3d[i])) for i in range(len(cfg.trainset_2d)): add_pypath(osp.join(cfg.data_dir, cfg.trainset_2d[i])) add_pypath(osp.join(cfg.data_dir, cfg.testset)) make_folder(cfg.model_dir) make_folder(cfg.vis_dir) make_folder(cfg.log_dir) make_folder(cfg.result_dir) ================================================ FILE: main/intermediate.py ================================================ import torch import argparse import numpy as np import os import os.path as osp import cv2 import matplotlib.pyplot as plt import torch.backends.cudnn as cudnn import torchvision.transforms as transforms from torchsummary import summary from torch.nn.parallel.data_parallel import DataParallel from config import cfg from model import get_pose_net from utils.pose_utils import process_bbox, pixel2cam from utils.vis import vis_keypoints, vis_3d_multiple_skeleton from dataset import generate_patch_image def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('--gpu', type=str, dest='gpu_ids') parser.add_argument('--epoch', type=int, dest='test_epoch') parser.add_argument('--input_image', type=str, dest='image') parser.add_argument('--jointnum', type=int, dest='joint') parser.add_argument('--backbone', type=str, dest='backbone') args = parser.parse_args() # test gpus if not args.gpu_ids: assert 0, print("Please set proper gpu ids") if not args.joint: assert print("please insert number of joint") if '-' in args.gpu_ids: gpus = args.gpu_ids.split('-') gpus[0] = 0 if not gpus[0].isdigit() else int(gpus[0]) gpus[1] = len(mem_info()) if not gpus[1].isdigit() else int(gpus[1]) + 1 args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus)))) return args # argument parsing args = parse_args() 
cfg.set_args(args.gpu_ids) cudnn.benchmark = True # joint set joint_num = args.joint joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe') flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) ) if joint_num == 18: skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) ) if joint_num == 21: skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) ) # snapshot load model_path = os.path.join(cfg.model_dir, 'snapshot_%d.pth.tar' % args.test_epoch) assert osp.exists(model_path), 'Cannot find model at ' + model_path model = get_pose_net(args.backbone, args.frontbone, False, joint_num) model = DataParallel(model).cuda() ckpt = torch.load(model_path) model.load_state_dict(ckpt['network']) model = model.module model.eval() # prepare input image transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]) img_path = args.image assert osp.exists(img_path), 'Cannot find image at ' + img_path original_img = cv2.imread(img_path) original_img_height, original_img_width = original_img.shape[:2] # prepare bbox bbox_list = [ [139.41, 102.25, 222.39, 241.57],\ [287.17, 61.52, 74.88, 165.61],\ [540.04, 48.81, 99.96, 223.36],\ [372.58, 170.84, 266.63, 217.19],\ [0.5, 43.74, 90.1, 220.09] ] # xmin, ymin, width, height root_depth_list = [11250.5732421875, 15522.8701171875, 11831.3828125, 8852.556640625, 12572.5966796875] # obtain this from RootNet (https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE/tree/master/demo) assert len(bbox_list) == len(root_depth_list) person_num = len(bbox_list) # extractor activation = {} def get_activation(name): def hook(model, input, output): activation[name] = output.detach() return hook for n in range(person_num): bbox = process_bbox(np.array(bbox_list[n]), original_img_width, original_img_height) img, img2bb_trans = generate_patch_image(original_img, bbox, False, 1.0, 0.0, False) img = transform(img).cuda()[None,:,:,:] model.backbone.deonv1.register_forward_hook(get_activation('%d' % n)) # forward with torch.no_grad(): pose_3d = model(img) # x,y: pixel, z: root-relative depth (mm) plt.figure(figsize=(32, 32)) a = activation['0'] - activation['1'] b = torch.sum(a, dim=1) print(b) for i in range(person_num): image = activation['%d'%i] print(image.size()) sum_image = torch.sum(image[0], dim=0) print(sum_image.size()) plt.subplot(1, person_num, i+1) plt.imshow(sum_image.cpu(), cmap='gray') plt.axis('off') plt.show() plt.close() ================================================ FILE: main/model.py ================================================ import torch import torch.nn as nn from torch.nn import functional as F from backbone import * from config import cfg import os.path as osp model_urls = { 'MobileNetV2': 'https://download.pytorch.org/models/mobilenet_v2-b0353104.pth', 'ResNet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth', 'ResNet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth', 'ResNet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth', 'ResNet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth', 'ResNet152': 
'https://download.pytorch.org/models/resnet152-b121ed2d.pth', 'ResNext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth', 'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth', 'wide_resnet50_2': 'https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth', 'wide_resnet101_2': 'https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth', } BACKBONE_DICT = { 'LPRES':LpNetResConcat, 'LPSKI':LpNetSkiConcat, 'LPWO':LpNetWoConcat } def soft_argmax(heatmaps, joint_num): heatmaps = heatmaps.reshape((-1, joint_num, cfg.depth_dim*cfg.output_shape[0]*cfg.output_shape[1])) heatmaps = F.softmax(heatmaps, 2) heatmaps = heatmaps.reshape((-1, joint_num, cfg.depth_dim, cfg.output_shape[0], cfg.output_shape[1])) accu_x = heatmaps.sum(dim=(2,3)) accu_y = heatmaps.sum(dim=(2,4)) accu_z = heatmaps.sum(dim=(3,4)) # accu_x = accu_x * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[1]+1).type(torch.cuda.FloatTensor), devices=[accu_x.device.index])[0] # accu_y = accu_y * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[0]+1).type(torch.cuda.FloatTensor), devices=[accu_y.device.index])[0] # accu_z = accu_z * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.depth_dim+1).type(torch.cuda.FloatTensor), devices=[accu_z.device.index])[0] accu_x = accu_x * torch.arange(1,cfg.output_shape[1]+1) accu_y = accu_y * torch.arange(1,cfg.output_shape[0]+1) accu_z = accu_z * torch.arange(1,cfg.depth_dim+1) accu_x = accu_x.sum(dim=2, keepdim=True) -1 accu_y = accu_y.sum(dim=2, keepdim=True) -1 accu_z = accu_z.sum(dim=2, keepdim=True) -1 coord_out = torch.cat((accu_x, accu_y, accu_z), dim=2) return coord_out class CustomNet(nn.Module): def __init__(self, backbone, joint_num): super(CustomNet, self).__init__() self.backbone = backbone self.joint_num = joint_num def forward(self, input_img, target=None): fm = self.backbone(input_img) coord = soft_argmax(fm, self.joint_num) if target is None: return coord else: target_coord = target['coord'] target_vis = target['vis'] target_have_depth = target['have_depth'] ## coordinate loss loss_coord = torch.abs(coord - target_coord) * target_vis loss_coord = (loss_coord[:,:,0] + loss_coord[:,:,1] + loss_coord[:,:,2] * target_have_depth)/3. 
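# The loss above is a per-joint L1 distance masked by joint visibility; the z (depth) term is
# additionally gated by target_have_depth, so 2D-only datasets (e.g. MPII, MSCOCO) supervise
# only the x/y coordinates of the soft-argmax output.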
return loss_coord def get_pose_net(backbone_str, is_train, joint_num): INPUT_SIZE = cfg.input_shape EMBEDDING_SIZE = cfg.embedding_size # feature dimension WIDTH_MULTIPLIER = cfg.width_multiplier assert INPUT_SIZE == (256, 256) print("=" * 60) print("{} BackBone Generated".format(backbone_str)) print("=" * 60) model = CustomNet(BACKBONE_DICT[backbone_str](input_size = INPUT_SIZE, joint_num = joint_num, embedding_size = EMBEDDING_SIZE, width_mult = WIDTH_MULTIPLIER), joint_num) if is_train == True: model.backbone.init_weights() return model ================================================ FILE: main/pytorch2coreml.py ================================================ import torch import argparse import coremltools as ct from config import cfg from torch.nn.parallel.data_parallel import DataParallel from base import Transformer def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('--gpu', type=str, dest='gpu_ids') parser.add_argument('--joint', type=int, dest='joint') parser.add_argument('--modelpath', type=str, dest='modelpath') parser.add_argument('--backbone', type=str, dest='backbone') args = parser.parse_args() # test gpus if not args.gpu_ids: assert 0, "Please set proper gpu ids" if '-' in args.gpu_ids: gpus = args.gpu_ids.split('-') gpus[0] = int(gpus[0]) gpus[1] = int(gpus[1]) + 1 args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus)))) return args args = parse_args() # modelpath as definite path transformer = Transformer(args.backbone, args.joint, args.modelpath) transformer._make_model() single_pytorch_model = transformer.model device = torch.device('cpu') single_pytorch_model.to(device) dummy_input = torch.randn(1, 3, 256, 256) traced_model = torch.jit.trace(single_pytorch_model, dummy_input) # Convert to Core ML using the Unified Conversion API model = ct.convert( traced_model, inputs=[ct.ImageType(name="input_1", shape=dummy_input.shape)], #name "input_1" is used in 'quickstart' ) model.save("test.mlmodel") ================================================ FILE: main/pytorch2onnx.py ================================================ import onnx import torch import argparse import numpy import imageio import onnxruntime as ort import tensorflow as tf from config import cfg from torchsummary import summary from base import Transformer from onnx_tf.backend import prepare def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('--gpu', type=str, dest='gpu_ids') parser.add_argument('--joint', type=int, dest='joint') parser.add_argument('--modelpath', type=str, dest='modelpath') parser.add_argument('--backbone', type=str, dest='backbone') args = parser.parse_args() # test gpus if not args.gpu_ids: assert 0, "Please set proper gpu ids" if '-' in args.gpu_ids: gpus = args.gpu_ids.split('-') gpus[0] = int(gpus[0]) gpus[1] = int(gpus[1]) + 1 args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus)))) return args args = parse_args() dummy_input = torch.randn(1, 3, 256, 256, device='cuda') # modelpath as definite path transformer = Transformer(args.backbone, args.joint, args.modelpath) transformer._make_model() single_pytorch_model = transformer.model summary(single_pytorch_model, (3, 256, 256)) ONNX_PATH="../output/baseline.onnx" torch.onnx.export( model=single_pytorch_model, args=dummy_input, f=ONNX_PATH, # where should it be saved verbose=False, export_params=True, do_constant_folding=False, # fold constant values for optimization # do_constant_folding=True, # fold constant values for optimization input_names=['input'], 
================================================
FILE: main/pytorch2coreml.py
================================================
import torch
import argparse
import coremltools as ct

from config import cfg
from torch.nn.parallel.data_parallel import DataParallel
from base import Transformer

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', type=str, dest='gpu_ids')
    parser.add_argument('--joint', type=int, dest='joint')
    parser.add_argument('--modelpath', type=str, dest='modelpath')
    parser.add_argument('--backbone', type=str, dest='backbone')
    args = parser.parse_args()

    # test gpus
    if not args.gpu_ids:
        assert 0, "Please set proper gpu ids"

    if '-' in args.gpu_ids:
        gpus = args.gpu_ids.split('-')
        gpus[0] = int(gpus[0])
        gpus[1] = int(gpus[1]) + 1
        args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))

    return args

args = parse_args()

# modelpath as definite path
transformer = Transformer(args.backbone, args.joint, args.modelpath)
transformer._make_model()

single_pytorch_model = transformer.model
device = torch.device('cpu')
single_pytorch_model.to(device)

dummy_input = torch.randn(1, 3, 256, 256)
traced_model = torch.jit.trace(single_pytorch_model, dummy_input)

# Convert to Core ML using the Unified Conversion API
model = ct.convert(
    traced_model,
    inputs=[ct.ImageType(name="input_1", shape=dummy_input.shape)],  # name "input_1" is used in 'quickstart'
)
model.save("test.mlmodel")

================================================
FILE: main/pytorch2onnx.py
================================================
import onnx
import torch
import argparse
import numpy
import imageio
import onnxruntime as ort
import tensorflow as tf

from config import cfg
from torchsummary import summary
from base import Transformer
from onnx_tf.backend import prepare

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', type=str, dest='gpu_ids')
    parser.add_argument('--joint', type=int, dest='joint')
    parser.add_argument('--modelpath', type=str, dest='modelpath')
    parser.add_argument('--backbone', type=str, dest='backbone')
    args = parser.parse_args()

    # test gpus
    if not args.gpu_ids:
        assert 0, "Please set proper gpu ids"

    if '-' in args.gpu_ids:
        gpus = args.gpu_ids.split('-')
        gpus[0] = int(gpus[0])
        gpus[1] = int(gpus[1]) + 1
        args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))

    return args

args = parse_args()

dummy_input = torch.randn(1, 3, 256, 256, device='cuda')

# modelpath as definite path
transformer = Transformer(args.backbone, args.joint, args.modelpath)
transformer._make_model()

single_pytorch_model = transformer.model
summary(single_pytorch_model, (3, 256, 256))

ONNX_PATH = "../output/baseline.onnx"

torch.onnx.export(
    model=single_pytorch_model,
    args=dummy_input,
    f=ONNX_PATH,                 # where the exported model is saved
    verbose=False,
    export_params=True,
    do_constant_folding=False,   # fold constant values for optimization
    # do_constant_folding=True,  # fold constant values for optimization
    input_names=['input'],
    output_names=['output'],
    opset_version=11
)

onnx_model = onnx.load(ONNX_PATH)
onnx.checker.check_model(onnx_model)
onnx.helper.printable_graph(onnx_model.graph)

pytorch_result = single_pytorch_model(dummy_input)
pytorch_result = pytorch_result.cpu().detach().numpy()
print("pytorch_model output {}".format(pytorch_result.shape), pytorch_result)

ort_session = ort.InferenceSession(ONNX_PATH)
outputs = ort_session.run(None, {'input': dummy_input.cpu().numpy()})
outputs = numpy.array(outputs[0])
print("onnx_model output size {}".format(outputs.shape), outputs)

print("difference", numpy.linalg.norm(pytorch_result - outputs))

TF_PATH = "../output/baseline"  # where the representation of the tensorflow model will be stored

# prepare function converts an ONNX model to an internal representation
# of the computational graph called TensorflowRep and returns
# the converted representation.
tf_rep = prepare(onnx_model)  # creating TensorflowRep object

# export_graph function obtains the graph proto corresponding to the ONNX
# model associated with the backend representation and serializes
# to a protobuf file.
tf_rep.export_graph(TF_PATH)

TFLITE_PATH = "../output/baseline.tflite"
PB_PATH = "../output/baseline/saved_model.pb"

# make a converter object from the saved tensorflow file
# converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(PB_PATH, input_arrays=['input'], output_arrays=['output'])
converter = tf.lite.TFLiteConverter.from_saved_model(TF_PATH)

# tell the converter which type of optimization techniques to use
# to view the best option for optimization read the tflite documentation about optimization
# go to this link https://www.tensorflow.org/lite/guide/get_started#4_optimize_your_model_optional
# converter.optimizations = [tf.compat.v1.lite.Optimize.DEFAULT]

# converter.experimental_new_converter = True
# # I had to explicitly state the ops
# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
#                                        tf.lite.OpsSet.SELECT_TF_OPS]

def representative_dataset():
    dataset_size = 10
    for i in range(dataset_size):
        print(i)
        data = imageio.imread("../sample_images/" + "00000" + str(i) + ".jpg")
        data = numpy.resize(data, [1, 3, 256, 256])
        yield [data.astype(numpy.float32)]

converter.experimental_new_converter = True
converter.experimental_new_quantizer = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
# input_arrays = converter.get_input_arrays()
# converter.quantized_input_stats = {input_arrays[0]: (0.0, 1.0)}
tf_lite_model = converter.convert()

# Save the model.
with open(TFLITE_PATH, 'wb') as f:
    f.write(tf_lite_model)

================================================
FILE: main/summary.py
================================================
import torch
import argparse
import os
import os.path as osp
import torch.backends.cudnn as cudnn

from torchsummary import summary
from torch.nn.parallel.data_parallel import DataParallel
from config import cfg
from model import get_pose_net
from thop import profile
from thop import clever_format
from ptflops import get_model_complexity_info

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--gpu', type=str, dest='gpu_ids')
    parser.add_argument('--epoch', type=int, dest='test_epoch')
    parser.add_argument('--jointnum', type=int, dest='joint')
    parser.add_argument('--backbone', type=str, dest='backbone')
    args = parser.parse_args()

    # test gpus
    if not args.gpu_ids:
        assert 0, "Please set proper gpu ids"
    if not args.joint:
        assert 0, "Please insert the number of joints"

    if '-' in args.gpu_ids:
        gpus = args.gpu_ids.split('-')
        gpus[0] = 0 if not gpus[0].isdigit() else int(gpus[0])
        # NOTE: mem_info() is not defined or imported in this script; pass explicit
        # numeric gpu ids (e.g. "0-1") so that this fallback branch is never taken.
        gpus[1] = len(mem_info()) if not gpus[1].isdigit() else int(gpus[1]) + 1
        args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))

    return args

# argument parsing
args = parse_args()
cfg.set_args(args.gpu_ids)
cudnn.benchmark = True

# joint set
joint_num = args.joint
joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist',
               'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head',
               'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe')
flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) )
if joint_num == 18:
    skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )
if joint_num == 21:
    skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) )

# snapshot load
model_path = os.path.join(cfg.model_dir, 'snapshot_%d.pth.tar' % args.test_epoch)
assert osp.exists(model_path), 'Cannot find model at ' + model_path
model = get_pose_net(args.backbone, False, joint_num)
model = DataParallel(model).cuda()
ckpt = torch.load(model_path)
model.load_state_dict(ckpt['network'])

single_model = model.module

summary(single_model, (3, 256, 256))

input = torch.randn(1, 3, 256, 256).cuda()
macs, params = profile(single_model, inputs=(input,))
macs, params = clever_format([macs, params], "%.3f")
flops, params1 = get_model_complexity_info(single_model, (3, 256, 256), as_strings=True, print_per_layer_stat=False)

print('{:<30} {:<8}'.format('Computational complexity: ', flops))
print('{:<30} {:<8}'.format('Computational complexity: ', macs))
print('{:<30} {:<8}'.format('Number of parameters: ', params))
print('{:<30} {:<8}'.format('Number of parameters: ', params1))

================================================ FILE: main/test.py ================================================ import argparse from tqdm import tqdm import numpy as np import cv2 from config import cfg import torch from base import Tester from utils.vis import vis_keypoints from utils.pose_utils import flip import torch.backends.cudnn as cudnn def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('--gpu', type=str, dest='gpu_ids') parser.add_argument('--epochs', type=str, dest='model') parser.add_argument('--backbone',
type=str, dest='backbone') args = parser.parse_args() # test gpus if not args.gpu_ids: assert 0, "Please set proper gpu ids" if '-' in args.gpu_ids: gpus = args.gpu_ids.split('-') gpus[0] = int(gpus[0]) gpus[1] = int(gpus[1]) + 1 args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus)))) if '-' in args.model: model_epoch = args.model.split('-') model_epoch[0] = int(model_epoch[0]) model_epoch[1] = int(model_epoch[1]) + 1 args.model_epoch = model_epoch return args def main(): args = parse_args() cfg.set_args(args.gpu_ids) cudnn.fastest = True cudnn.benchmark = True cudnn.deterministic = False cudnn.enabled = True tester = Tester(args.backbone) tester._make_batch_generator() for epoch in range(args.model_epoch[0], args.model_epoch[1]): tester._make_model(epoch) preds = [] with torch.no_grad(): for itr, input_img in enumerate(tqdm(tester.batch_generator)): # forward coord_out = tester.model(input_img) if cfg.flip_test: flipped_input_img = flip(input_img, dims=3) flipped_coord_out = tester.model(flipped_input_img) flipped_coord_out[:, :, 0] = cfg.output_shape[1] - flipped_coord_out[:, :, 0] - 1 for pair in tester.flip_pairs: flipped_coord_out[:, pair[0], :], flipped_coord_out[:, pair[1], :] = flipped_coord_out[:, pair[1], :].clone(), flipped_coord_out[:, pair[0], :].clone() coord_out = (coord_out + flipped_coord_out)/2. vis = False if vis: filename = str(itr) tmpimg = input_img[0].cpu().numpy() tmpimg = tmpimg * np.array(cfg.pixel_std).reshape(3,1,1) + np.array(cfg.pixel_mean).reshape(3,1,1) tmpimg = tmpimg.astype(np.uint8) tmpimg = tmpimg[::-1, :, :] tmpimg = np.transpose(tmpimg,(1,2,0)).copy() tmpkps = np.zeros((3,tester.joint_num)) tmpkps[:2,:] = coord_out[0,:,:2].cpu().numpy().transpose(1,0) / cfg.output_shape[0] * cfg.input_shape[0] tmpkps[2,:] = 1 tmpimg = vis_keypoints(tmpimg, tmpkps, tester.skeleton) cv2.imwrite(filename + '_output.jpg', tmpimg) coord_out = coord_out.cpu().numpy() preds.append(coord_out) # evaluate preds = np.concatenate(preds, axis=0) tester._evaluate(preds, cfg.result_dir) if __name__ == "__main__": main() ================================================ FILE: main/time.py ================================================ import torch import argparse from base import Transformer def parse_args(): parser = argparse.ArgumentParser() parser.add_argument('--gpu', type=str, dest='gpu_ids') parser.add_argument('--joint', type=int, dest='joint') parser.add_argument('--modelpath', type=str, dest='modelpath') parser.add_argument('--backbone', type=str, dest='backbone') args = parser.parse_args() # test gpus if not args.gpu_ids: assert 0, "Please set proper gpu ids" if '-' in args.gpu_ids: gpus = args.gpu_ids.split('-') gpus[0] = int(gpus[0]) gpus[1] = int(gpus[1]) + 1 args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus)))) return args args = parse_args() optimal_batch_size = 64 transformer = Transformer(args.backbone, args.joint, args.modelpath) transformer._make_model() model = transformer.model device = torch.device("cuda") dummy_input = torch.randn(optimal_batch_size, 3, 256, 256, dtype=torch.float).to(device) repetitions=100 total_time = 0 with torch.no_grad(): for rep in range(repetitions): starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True) starter.record() _ = model(dummy_input) ender.record() torch.cuda.synchronize() curr_time = starter.elapsed_time(ender)/1000 total_time += curr_time Throughput = (repetitions*optimal_batch_size)/total_time print('Final Throughput:',Throughput) 
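
The evaluation utilities above (`summary.py`, `test.py`, `time.py`) all consume a checkpoint named `snapshot_<epoch>.pth.tar` whose `'network'` entry holds the weights of a `DataParallel`-wrapped model, which is why `summary.py` wraps the freshly built network in `DataParallel` before calling `load_state_dict`. The sketch below is a hypothetical, minimal way to load such a snapshot into a bare single-device model for quick experiments; the backbone string, joint count, and path are made up for illustration and the `'module.'`-prefix assumption only holds if the checkpoint really was saved from a `DataParallel` model.

```
import torch
from model import get_pose_net

# Hypothetical values; use your own backbone, joint count and snapshot path.
snapshot_path = '../output/model_dump/snapshot_24.pth.tar'
model = get_pose_net('LPSKI', False, joint_num=21)

ckpt = torch.load(snapshot_path, map_location='cpu')
# The snapshot stores the DataParallel-wrapped weights under 'network', so the
# parameter names carry a 'module.' prefix; strip it for a bare model.
state_dict = {k.replace('module.', '', 1): v for k, v in ckpt['network'].items()}
model.load_state_dict(state_dict)
model.eval()
```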
================================================ FILE: main/train.py ================================================ import argparse from config import cfg from tqdm import tqdm import os.path as osp import numpy as np import torch from base import Trainer from utils.pose_utils import flip import torch.backends.cudnn as cudnn def main(): # argument parse and create log cudnn.fastest = True cudnn.benchmark = True trainer = Trainer(cfg) trainer._make_batch_generator() trainer._make_model() # train for epoch in range(trainer.start_epoch, cfg.end_epoch): trainer.set_lr(epoch) trainer.tot_timer.tic() trainer.read_timer.tic() for itr, (input_img, joint_img, joint_vis, joints_have_depth) in enumerate(trainer.batch_generator): trainer.read_timer.toc() trainer.gpu_timer.tic() # forward trainer.optimizer.zero_grad() target = {'coord': joint_img, 'vis': joint_vis, 'have_depth': joints_have_depth} loss_coord = trainer.model(input_img, target) loss_coord = loss_coord.mean() # backward loss = loss_coord loss.backward() trainer.optimizer.step() trainer.gpu_timer.toc() screen = [ 'Epoch %d/%d itr %d/%d:' % (epoch, cfg.end_epoch, itr, trainer.itr_per_epoch), 'lr: %g' % (trainer.get_lr()), 'speed: %.2f(%.2fs r%.2f)s/itr' % ( trainer.tot_timer.average_time, trainer.gpu_timer.average_time, trainer.read_timer.average_time), '%.2fh/epoch' % (trainer.tot_timer.average_time / 3600. * trainer.itr_per_epoch), '%s: %.4f' % ('loss_coord', loss_coord.detach()), ] trainer.logger.info(' '.join(screen)) trainer.tot_timer.toc() trainer.tot_timer.tic() trainer.read_timer.tic() trainer.save_model({ 'epoch': epoch, 'network': trainer.model.state_dict(), 'optimizer': trainer.optimizer.state_dict(), }, epoch) if __name__ == "__main__": main() ================================================ FILE: requirements.txt ================================================ numpy tqdm torch torchvision torchsummary opencv-python matplotlib pycocotools scipy ================================================ FILE: tool/Human36M/README.MD ================================================ ## Human3.6M dataset pre-processing code You should run the matlab code first, and the python code will convert the output of the matlab code to the json files. **You don't have to run this when you downloaded json files from the google drive.** This is to make json files from raw data. 
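
As a quick sanity check on the output of the python step, the structure written by `h36m2coco.py` (below) can be inspected with a few lines of Python. This is only a sketch; the paths are hypothetical and should point at wherever the generated (or downloaded) annotation files live.

```
import json

# Hypothetical paths; point these at the files produced by h36m2coco.py
# (or downloaded from the Google Drive link in the top-level README).
with open('data/Human36M/annotations/Human36M_subject1_data.json') as f:
    data = json.load(f)
with open('data/Human36M/annotations/Human36M_subject1_camera.json') as f:
    cam_param = json.load(f)

print(len(data['images']), len(data['annotations']))   # one annotation per image
print(data['images'][0].keys())       # id, file_name, width, height, subject, action_name, ...
print(data['annotations'][0].keys())  # id, image_id, keypoints_vis, bbox
print(cam_param['1'].keys())          # R, t, f, c for camera 1
```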
================================================ FILE: tool/Human36M/h36m2coco.py ================================================ import os import os.path as osp import scipy.io as sio import numpy as np import cv2 import random import json import math from tqdm import tqdm root_dir = './images' # define path here save_dir = './annotations' # define path here joint_num = 17 subject_list = [1, 5, 6, 7, 8, 9, 11] action_idx = (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) subaction_idx = (1, 2) camera_idx = (1, 2, 3, 4) action_name = ['Directions', 'Discussion', 'Eating', 'Greeting', 'Phoning', 'Posing', 'Purchases', 'Sitting', 'SittingDown', 'Smoking', 'Photo', 'Waiting', 'Walking', 'WalkDog', 'WalkTogether'] def load_h36m_annot_file(annot_file): data = sio.loadmat(annot_file) joint_world = data['pose3d_world'] # 3D world coordinates of keypoints R = data['R'] # extrinsic T = np.reshape(data['T'],(3)) # extrinsic f = np.reshape(data['f'],(-1)) # focal legnth c = np.reshape(data['c'],(-1)) # principal points img_heights = np.reshape(data['img_height'],(-1)) img_widths = np.reshape(data['img_width'],(-1)) return joint_world, R, T, f, c, img_widths, img_heights def _H36FolderName(subject_id, act_id, subact_id, camera_id): return "s_%02d_act_%02d_subact_%02d_ca_%02d" % \ (subject_id, act_id, subact_id, camera_id) def _H36ImageName(folder_name, frame_id): return "%s_%06d.jpg" % (folder_name, frame_id + 1) def cam2pixel(cam_coord, f, c): x = cam_coord[..., 0] / cam_coord[..., 2] * f[0] + c[0] y = cam_coord[..., 1] / cam_coord[..., 2] * f[1] + c[1] return x,y def world2cam(world_coord, R, t): cam_coord = np.dot(R, world_coord - t) return cam_coord def get_bbox(joint_img): bbox = np.zeros((4)) xmin = np.min(joint_img[:,0]) ymin = np.min(joint_img[:,1]) xmax = np.max(joint_img[:,0]) ymax = np.max(joint_img[:,1]) width = xmax - xmin - 1 height = ymax - ymin - 1 bbox[0] = (xmin + xmax)/2. - width/2*1.2 bbox[1] = (ymin + ymax)/2. 
- height/2*1.2 bbox[2] = width*1.2 bbox[3] = height*1.2 return bbox img_id = 0; annot_id = 0 for subject in tqdm(subject_list): cam_param = {} joint_3d = {} images = []; annotations = []; for aid in tqdm(action_idx): for said in tqdm(subaction_idx): for cid in tqdm(camera_idx): folder = _H36FolderName(subject,aid,said,cid) if folder == 's_11_act_02_subact_02_ca_01': continue joint_world, R, t, f, c, img_widths, img_heights = load_h36m_annot_file(osp.join(root_dir, folder, 'h36m_meta.mat')) if str(aid) not in joint_3d: joint_3d[str(aid)] = {} if str(said) not in joint_3d[str(aid)]: joint_3d[str(aid)][str(said)] = {} img_num = np.shape(joint_world)[0] for n in range(img_num): img_dict = {} img_dict['id'] = img_id img_dict['file_name'] = osp.join(folder, _H36ImageName(folder, n)) img_dict['width'] = int(img_widths[n]) img_dict['height'] = int(img_heights[n]) img_dict['subject'] = subject img_dict['action_name'] = action_name[aid-2] img_dict['action_idx'] = aid img_dict['subaction_idx'] = said img_dict['cam_idx'] = cid img_dict['frame_idx'] = n images.append(img_dict) if str(cid) not in cam_param: cam_param[str(cid)] = {'R': R.tolist(), 't': t.tolist(), 'f': f.tolist(), 'c': c.tolist()} if str(n) not in joint_3d[str(aid)][str(said)]: joint_3d[str(aid)][str(said)][str(n)] = joint_world[n].tolist() annot_dict = {} annot_dict['id'] = annot_id annot_dict['image_id'] = img_id # project world coordinate to cam, image coordinate space joint_cam = np.zeros((joint_num,3)) for j in range(joint_num): joint_cam[j] = world2cam(joint_world[n][j], R, t) joint_img = np.zeros((joint_num,2)) joint_img[:,0], joint_img[:,1] = cam2pixel(joint_cam, f, c) joint_vis = (joint_img[:,0] >= 0) * (joint_img[:,0] < img_widths[n]) * (joint_img[:,1] >= 0) * (joint_img[:,1] < img_heights[n]) annot_dict['keypoints_vis'] = joint_vis.tolist() bbox = get_bbox(joint_img) annot_dict['bbox'] = bbox.tolist() # xmin, ymin, width, height annotations.append(annot_dict) img_id += 1 annot_id += 1 data = {'images': images, 'annotations': annotations} with open(osp.join(save_dir, 'Human36M_subject' + str(subject) + '_data.json'), 'w') as f: json.dump(data, f) with open(osp.join(save_dir, 'Human36M_subject' + str(subject) + '_camera.json'), 'w') as f: json.dump(cam_param, f) with open(osp.join(save_dir, 'Human36M_subject' + str(subject) + '_joint_3d.json'), 'w') as f: json.dump(joint_3d, f) ================================================ FILE: tool/Human36M/preprocess_h36m.m ================================================ % Preprocess human3.6m dataset % Place this file to the Release-v1.1 folder and run it function preprocess_h36m() close all; %clear; %clc; addpaths; %-------------------------------------------------------------------------- % PARAMETERS % Subject (1, 5, 6, 7, 8, 9, 11) SUBJECT = [1 5 6 7 8 9 11]; % Action (2 ~ 16) ACTION = 2:16; % Subaction (1 ~ 2) SUBACTION = 1:2; % Camera (1 ~ 4) CAMERA = 1:4; num_joint = 17; root_dir = '.'; % define path here % if rgb sequence is declared in the loop, it causes stuck (do not know % reason) rgb_sequence = cell(1,100000000); COUNT = 1; %-------------------------------------------------------------------------- % MAIN LOOP % For each subject, action, subaction, and camera.. for subject = SUBJECT for action = ACTION for subaction = SUBACTION for camera = CAMERA fprintf('Processing subject %d, action %d, subaction %d, camera %d..\n', ... subject, action, subaction, camera); img_save_dir = sprintf('%s/images/s_%02d_act_%02d_subact_%02d_ca_%02d', ... 
root_dir, subject, action, subaction, camera); if ~exist(img_save_dir, 'dir') mkdir(img_save_dir); end mask_save_dir = sprintf('%s/masks/s_%02d_act_%02d_subact_%02d_ca_%02d', ... root_dir, subject, action, subaction, camera); if ~exist(mask_save_dir, 'dir') mkdir(mask_save_dir); end annot_save_dir = sprintf('%s/annotations/s_%02d_act_%02d_subact_%02d_ca_%02d', ... root_dir, subject, action, subaction, camera); if ~exist(annot_save_dir, 'dir') mkdir(annot_save_dir); end if (subject==11) && (action==2) && (subaction==2) && (camera==1) fprintf('There is an error in subject 11, action 2, subaction 2, and camera 1\n'); continue; end % Select sequence Sequence = H36MSequence(subject, action, subaction, camera); % Get 3D pose and 2D pose Features{1} = H36MPose3DPositionsFeature(); % 3D world coordinates Features{1}.Part = 'body'; % Only consider 17 joints Features{2} = H36MPose3DPositionsFeature('Monocular', true); % 3D camera coordinates Features{2}.Part = 'body'; % Only consider 17 joints Features{3} = H36MPose2DPositionsFeature(); % 2D image coordinates Features{3}.Part = 'body'; % Only consider 17 joints F = H36MComputeFeatures(Sequence, Features); num_frame = Sequence.NumFrames; pose3d_world = reshape(F{1}, num_frame, 3, num_joint); pose3d = reshape(F{2}, num_frame, 3, num_joint); pose2d = reshape(F{3}, num_frame, 2, num_joint); % Camera (in global coordinate) Camera = Sequence.getCamera(); % Sanity check if false R = Camera.R; % rotation matrix T = Camera.T'; % origin of the world coord system K = [Camera.f(1) 0 Camera.c(1); 0 Camera.f(2) Camera.c(2); 0 0 1]; % f: focal length, c: principal points error = 0; for i = 1:num_frame X = squeeze(pose3d_global(i,:,:)); x = squeeze(pose2d(i,:,:)); px = K*R*(X-T); px = px ./ px(3,:); px = px(1:2,:); error = error + mean(sqrt(sum((px-x).^2, 1))); end error = error / num_frame; fprintf('reprojection error = %.2f (pixels)\n', error); keyboard; end %% Image, bounding box for each sampled frame fprintf('Load RGB video: '); rgb_extractor = H36MRGBVideoFeature(); rgb_sequence{COUNT} = rgb_extractor.serializer(Sequence); fprintf('Done!!\n'); img_height = zeros(num_frame,1); img_width = zeros(num_frame,1); fprintf('Load mask video: '); mask_extractor = H36MMyBGMask(); mask_sequence = mask_extractor.serializer(Sequence); fprintf('Done!!\n'); % For each frame, for i = 1:num_frame if mod(i,100) == 1 fprintf('.'); end % Save image % Get data img = rgb_sequence{COUNT}.getFrame(i); [h, w, c] = size(img); img_height(i) = h; img_width(i) = w; img_name = sprintf('%s/s_%02d_act_%02d_subact_%02d_ca_%02d_%06d.jpg', ... img_save_dir, subject, action, subaction, camera, i); %imwrite(img, img_name); mask = mask_sequence.Buffer{i}; mask_name = sprintf('%s/s_%02d_act_%02d_subact_%02d_ca_%02d_%06d.jpg', ... 
mask_save_dir, subject, action, subaction, camera, i); imwrite(mask, mask_name); end COUNT = COUNT + 1; % Save data pose3d_world = permute(pose3d_world,[1,3,2]); % world coordinate 3D keypoint coordinates R = Camera.R; % rotation matrix T = Camera.T; % origin of the world coord system f = Camera.f; % focal length c = Camera.c; % principal points filename = sprintf('%s/h36m_meta.mat', annot_save_dir); %save(filename, 'pose3d_world', 'f', 'c', 'R', 'T', 'img_height', 'img_width'); fprintf('\n'); end end end end end ================================================ FILE: vis/coco_img_name.py ================================================ import os import os.path as osp import scipy.io as sio import numpy as np from pycocotools.coco import COCO import json import cv2 import random import math annot_path = osp.join('coco', 'person_keypoints_val2017.json') data = [] db = COCO(annot_path) fp = open('coco_img_name.txt','w') for iid in db.imgs.keys(): img = db.imgs[iid] imgname = img['file_name'] imgname = 'coco_' + imgname.split('.')[0] fp.write(imgname + '\n') fp.close() ================================================ FILE: vis/multi/draw_2Dskeleton.m ================================================ function img = draw_2Dskeleton(img_name, pred_2d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton) img = imread(img_name); [imgHeight, imgWidth, dim] = size(image); f = figure; set(f, 'visible', 'off'); imshow(img); hold on; line_width = 4; num_skeleton = size(skeleton,1); num_pred = size(pred_2d_kpt,1); for i = 1:num_pred for j =1:num_skeleton k1 = skeleton(j,1); k2 = skeleton(j,2); plot([pred_2d_kpt(i,k1,1),pred_2d_kpt(i,k2,1)],[pred_2d_kpt(i,k1,2),pred_2d_kpt(i,k2,2)],'Color',colorList_skeleton(j,:),'LineWidth',line_width); end for j=1:num_joint scatter(pred_2d_kpt(i,j,1),pred_2d_kpt(i,j,2),100,colorList_joint(j,:),'filled'); end end set(gca,'Units','normalized','Position',[0 0 1 1]); %# Modify axes size frame = getframe(gcf); img = frame.cdata; hold off; close(f); end ================================================ FILE: vis/multi/draw_3Dpose_coco.m ================================================ function draw_3Dpose_coco() root_path = '/mnt/hdd1/Data/Human_pose_estimation/COCO/2017/val2017/'; save_path = './vis/'; num_joint = 17; colorList_skeleton = [ 255/255 128/255 0/255; 255/255 153/255 51/255; 255/255 178/255 102/255; 230/255 230/255 0/255; 255/255 153/255 255/255; 153/255 204/255 255/255; 255/255 102/255 255/255; 255/255 51/255 255/255; 102/255 178/255 255/255; 51/255 153/255 255/255; 255/255 153/255 153/255; 255/255 102/255 102/255; 255/255 51/255 51/255; 153/255 255/255 153/255; 102/255 255/255 102/255; 51/255 255/255 51/255; ]; colorList_joint = [ 255/255 128/255 0/255; 255/255 153/255 51/255; 255/255 153/255 153/255; 255/255 102/255 102/255; 255/255 51/255 51/255; 153/255 255/255 153/255; 102/255 255/255 102/255; 51/255 255/255 51/255; 255/255 153/255 255/255; 255/255 102/255 255/255; 255/255 51/255 255/255; 153/255 204/255 255/255; 102/255 178/255 255/255; 51/255 153/255 255/255; 230/255 230/255 0/255; 230/255 230/255 0/255; 255/255 178/255 102/255; ]; skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ]; skeleton = transpose(reshape(skeleton,[2,16])) + 1; fp_img_name = fopen('../coco_img_name.txt'); preds_2d_kpt = load('preds_2d_kpt_coco.mat'); preds_3d_kpt = load('preds_3d_kpt_coco.mat'); img_name = fgetl(fp_img_name); while ischar(img_name) if 
isfield(preds_2d_kpt,img_name) pred_2d_kpt = getfield(preds_2d_kpt,img_name); pred_3d_kpt = getfield(preds_3d_kpt,img_name); img_name = strsplit(img_name,'_'); img_name = strcat(img_name{2},'.jpg'); img_path = strcat(root_path,img_name); %img = draw_2Dskeleton(img_path,pred_2d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton); img = imread(img_path); f = draw_3Dskeleton(img,pred_3d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton); set(gcf, 'InvertHardCopy', 'off'); set(gcf,'color','w'); mkdir(save_path); saveas(f, strcat(save_path,img_name)); close(f); end img_name = fgetl(fp_img_name); end end ================================================ FILE: vis/multi/draw_3Dpose_mupots.m ================================================ function draw_3Dpose_mupots() root_path = '/mnt/hdd1/Data/Human_pose_estimation/MU/mupots-3d-eval/MultiPersonTestSet/'; save_path = './vis/'; num_joint = 17; colorList_skeleton = [ 255/255 128/255 0/255; 255/255 153/255 51/255; 255/255 178/255 102/255; 230/255 230/255 0/255; 255/255 153/255 255/255; 153/255 204/255 255/255; 255/255 102/255 255/255; 255/255 51/255 255/255; 102/255 178/255 255/255; 51/255 153/255 255/255; 255/255 153/255 153/255; 255/255 102/255 102/255; 255/255 51/255 51/255; 153/255 255/255 153/255; 102/255 255/255 102/255; 51/255 255/255 51/255; ]; colorList_joint = [ 255/255 128/255 0/255; 255/255 153/255 51/255; 255/255 153/255 153/255; 255/255 102/255 102/255; 255/255 51/255 51/255; 153/255 255/255 153/255; 102/255 255/255 102/255; 51/255 255/255 51/255; 255/255 153/255 255/255; 255/255 102/255 255/255; 255/255 51/255 255/255; 153/255 204/255 255/255; 102/255 178/255 255/255; 51/255 153/255 255/255; 230/255 230/255 0/255; 230/255 230/255 0/255; 255/255 178/255 102/255; ]; skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ]; skeleton = transpose(reshape(skeleton,[2,16])) + 1; fp_img_name = fopen('../mupots_img_name.txt'); preds_2d_kpt = load('preds_2d_kpt_mupots.mat'); preds_3d_kpt = load('preds_3d_kpt_mupots.mat'); img_name = fgetl(fp_img_name); while ischar(img_name) img_name_split = strsplit(img_name); folder_id = str2double(img_name_split(1)); frame_id = str2double(img_name_split(2)); img_name = sprintf('TS%d/img_%06d.jpg',folder_id, frame_id); img_path = strcat(root_path,img_name); pred_2d_kpt = getfield(preds_2d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id)); pred_3d_kpt = getfield(preds_3d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id)); %img = draw_2Dskeleton(img_path,pred_2d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton); img = imread(img_path); f = draw_3Dskeleton(img,pred_3d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton); set(gcf, 'InvertHardCopy', 'off'); set(gcf,'color','w'); mkdir(strcat(save_path,sprintf('TS%d',folder_id))); saveas(f, strcat(save_path,img_name)); close(f); img_name = fgetl(fp_img_name); end end ================================================ FILE: vis/multi/draw_3Dskeleton.m ================================================ function f = draw_3Dskeleton(img, pred_3d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton) x = pred_3d_kpt(:,:,1); y = pred_3d_kpt(:,:,2); z = pred_3d_kpt(:,:,3); pred_3d_kpt(:,:,1) = -z; pred_3d_kpt(:,:,2) = x; pred_3d_kpt(:,:,3) = -y; [imgHeight, imgWidth, dim] = size(img); figure_height = 450; figure_width = figure_height / imgHeight * imgWidth; f = figure('Position',[100 100 figure_width figure_height]); set(f, 'visible', 'off'); 
hold on; grid on; line_width = 4; point_width = 50; num_skeleton = size(skeleton,1); num_pred = size(pred_3d_kpt,1); for i = 1:num_pred for j =1:num_skeleton k1 = skeleton(j,1); k2 = skeleton(j,2); plot3([pred_3d_kpt(i,k1,1),pred_3d_kpt(i,k2,1)],[pred_3d_kpt(i,k1,2),pred_3d_kpt(i,k2,2)],[pred_3d_kpt(i,k1,3),pred_3d_kpt(i,k2,3)],'Color',colorList_skeleton(j,:),'LineWidth',line_width); end for j=1:num_joint scatter3(pred_3d_kpt(i,j,1),pred_3d_kpt(i,j,2),pred_3d_kpt(i,j,3),point_width,colorList_joint(j,:),'filled'); end end set(gca, 'color', [255/255 255/255 255/255]); set(gca,'XTickLabel',[]); set(gca,'YTickLabel',[]); set(gca,'ZTickLabel',[]); x = pred_3d_kpt(:,:,1); xmin = min(x(:)) - 120000; xmax = max(x(:)) + 6000; y = pred_3d_kpt(:,:,2); ymin = min(y(:)); ymax = max(y(:)); z = pred_3d_kpt(:,:,3); zmin = min(z(:)); zmax = max(z(:)); xlim([xmin xmax]); ylim([ymin ymax]); zlim([zmin zmax]); h_img = surf([xmin;xmin],[ymin ymax;ymin ymax],[zmax zmax;zmin zmin],'CData',img,'FaceColor','texturemap'); set(h_img); view(62,27); end ================================================ FILE: vis/mupots_img_name.py ================================================ import os import os.path as osp import scipy.io as sio import numpy as np from pycocotools.coco import COCO import json import cv2 import random import math annot_path = osp.join('mupots', 'MuPoTS-3D.json') data = [] db = COCO(annot_path) fp = open('mupots_img_name.txt','w') for iid in db.imgs.keys(): img = db.imgs[iid] imgname = img['file_name'].split('/') folder_id = int(imgname[0][2:]) frame_id = int(imgname[1].split('.')[0][4:]) fp.write(str(folder_id) + ' ' + str(frame_id) + '\n') fp.close() ================================================ FILE: vis/single/draw_2Dskeleton.m ================================================ function img = draw_2Dskeleton(img_name, pred_2d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton) img = imread(img_name); pred_2d_kpt = squeeze(pred_2d_kpt); f = figure; set(f, 'visible', 'off'); imshow(img); hold on; line_width = 4; num_skeleton = size(skeleton,1); for j =1:num_skeleton k1 = skeleton(j,1); k2 = skeleton(j,2); plot([pred_2d_kpt(k1,1),pred_2d_kpt(k2,1)],[pred_2d_kpt(k1,2),pred_2d_kpt(k2,2)],'Color',colorList_skeleton(j,:),'LineWidth',line_width); end for j=1:num_joint scatter(pred_2d_kpt(j,1),pred_2d_kpt(j,2),100,colorList_joint(j,:),'filled'); end set(gca,'Units','normalized','Position',[0 0 1 1]); %# Modify axes size frame = getframe(gcf); img = frame.cdata; hold off; close(f); end ================================================ FILE: vis/single/draw_3Dpose_coco.m ================================================ function draw_3Dpose_coco() root_path = '/mnt/hdd1/Data/Human_pose_estimation/COCO/2017/val2017/'; save_path = './vis/'; num_joint = 17; mkdir(save_path); colorList_skeleton = [ 255/255 128/255 0/255; 255/255 153/255 51/255; 255/255 178/255 102/255; 230/255 230/255 0/255; 255/255 153/255 255/255; 153/255 204/255 255/255; 255/255 102/255 255/255; 255/255 51/255 255/255; 102/255 178/255 255/255; 51/255 153/255 255/255; 255/255 153/255 153/255; 255/255 102/255 102/255; 255/255 51/255 51/255; 153/255 255/255 153/255; 102/255 255/255 102/255; 51/255 255/255 51/255; ]; colorList_joint = [ 255/255 128/255 0/255; 255/255 153/255 51/255; 255/255 153/255 153/255; 255/255 102/255 102/255; 255/255 51/255 51/255; 153/255 255/255 153/255; 102/255 255/255 102/255; 51/255 255/255 51/255; 255/255 153/255 255/255; 255/255 102/255 255/255; 255/255 51/255 255/255; 153/255 204/255 255/255; 102/255 
178/255 255/255; 51/255 153/255 255/255; 230/255 230/255 0/255; 230/255 230/255 0/255; 255/255 178/255 102/255; ]; skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ]; skeleton = transpose(reshape(skeleton,[2,16])) + 1; fp_img_name = fopen('../coco_img_name.txt'); preds_2d_kpt = load('preds_2d_kpt_coco.mat'); preds_3d_kpt = load('preds_3d_kpt_coco.mat'); img_name = fgetl(fp_img_name); while ischar(img_name) if isfield(preds_2d_kpt,img_name) pred_2d_kpt = getfield(preds_2d_kpt,img_name); pred_3d_kpt = getfield(preds_3d_kpt,img_name); img_name = strsplit(img_name,'_'); img_name = strcat(img_name{2},'.jpg'); img_path = strcat(root_path,img_name); num_pred = size(pred_2d_kpt,1); for i = 1:num_pred img = draw_2Dskeleton(img_path,pred_2d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton); save_name = strsplit(img_name,'.'); save_name = save_name{1}; save_name = strcat(save_name,sprintf('_%d_2d.jpg',i)); disp(strcat(save_path,save_name)); imwrite(img,strcat(save_path,save_name)); f = draw_3Dskeleton(pred_3d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton); set(gcf, 'InvertHardCopy', 'off'); set(gcf,'color','w'); save_name = strsplit(img_name,'.'); save_name = save_name{1}; save_name = strcat(save_name,sprintf('_%d_3d.jpg',i)); saveas(f, strcat(save_path,save_name)); close(f); end end img_name = fgetl(fp_img_name); end end ================================================ FILE: vis/single/draw_3Dpose_mupots.m ================================================ function draw_3Dpose_mupots() root_path = '/mnt/hdd1/Data/Human_pose_estimation/MU/mupots-3d-eval/MultiPersonTestSet/'; save_path = './vis/'; num_joint = 17; colorList_skeleton = [ 255/255 128/255 0/255; 255/255 153/255 51/255; 255/255 178/255 102/255; 230/255 230/255 0/255; 255/255 153/255 255/255; 153/255 204/255 255/255; 255/255 102/255 255/255; 255/255 51/255 255/255; 102/255 178/255 255/255; 51/255 153/255 255/255; 255/255 153/255 153/255; 255/255 102/255 102/255; 255/255 51/255 51/255; 153/255 255/255 153/255; 102/255 255/255 102/255; 51/255 255/255 51/255; ]; colorList_joint = [ 255/255 128/255 0/255; 255/255 153/255 51/255; 255/255 153/255 153/255; 255/255 102/255 102/255; 255/255 51/255 51/255; 153/255 255/255 153/255; 102/255 255/255 102/255; 51/255 255/255 51/255; 255/255 153/255 255/255; 255/255 102/255 255/255; 255/255 51/255 255/255; 153/255 204/255 255/255; 102/255 178/255 255/255; 51/255 153/255 255/255; 230/255 230/255 0/255; 230/255 230/255 0/255; 255/255 178/255 102/255; ]; skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ]; skeleton = transpose(reshape(skeleton,[2,16])) + 1; fp_img_name = fopen('../mupots_img_name.txt'); preds_2d_kpt = load('preds_2d_kpt_mupots.mat'); preds_3d_kpt = load('preds_3d_kpt_mupots.mat'); img_name = fgetl(fp_img_name); while ischar(img_name) img_name_split = strsplit(img_name); folder_id = str2double(img_name_split(1)); frame_id = str2double(img_name_split(2)); img_name = sprintf('TS%d/img_%06d.jpg',folder_id, frame_id); img_path = strcat(root_path,img_name); mkdir(strcat(save_path,sprintf('TS%d',folder_id))); pred_2d_kpt = getfield(preds_2d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id)); pred_3d_kpt = getfield(preds_3d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id)); num_pred = size(pred_2d_kpt,1); for i = 1:num_pred img = 
draw_2Dskeleton(img_path,pred_2d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton); save_name = sprintf('TS%d/img_%06d_%d_2d.jpg',folder_id, frame_id, i); imwrite(img,strcat(save_path,save_name)); f = draw_3Dskeleton(pred_3d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton); set(gcf, 'InvertHardCopy', 'off'); set(gcf,'color','w'); save_name = sprintf('TS%d/img_%06d_%d_3d.jpg',folder_id, frame_id, i); saveas(f, strcat(save_path,save_name)); close(f); end img_name = fgetl(fp_img_name); end end ================================================ FILE: vis/single/draw_3Dskeleton.m ================================================ function f = draw_3Dskeleton(pred_3d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton) pred_3d_kpt = squeeze(pred_3d_kpt); x = pred_3d_kpt(:,1); y = pred_3d_kpt(:,2); z = pred_3d_kpt(:,3); pred_3d_kpt(:,1) = -z; pred_3d_kpt(:,2) = x; pred_3d_kpt(:,3) = -y; f = figure;%('Position',[100 100 600 600]); set(f, 'visible', 'off'); hold on; grid on; line_width = 6; num_skeleton = size(skeleton,1); for j =1:num_skeleton k1 = skeleton(j,1); k2 = skeleton(j,2); plot3([pred_3d_kpt(k1,1),pred_3d_kpt(k2,1)],[pred_3d_kpt(k1,2),pred_3d_kpt(k2,2)],[pred_3d_kpt(k1,3),pred_3d_kpt(k2,3)],'Color',colorList_skeleton(j,:),'LineWidth',line_width); end for j=1:num_joint scatter3(pred_3d_kpt(j,1),pred_3d_kpt(j,2),pred_3d_kpt(j,3),100,colorList_joint(j,:),'filled'); end set(gca, 'color', [255/255 255/255 255/255]); set(gca,'XTickLabel',[]); set(gca,'YTickLabel',[]); set(gca,'ZTickLabel',[]); x = pred_3d_kpt(:,1); xmin = min(x(:)) - 100; xmax = max(x(:)) + 100; y = pred_3d_kpt(:,2); ymin = min(y(:)) - 100; ymax = max(y(:)) + 100; z = pred_3d_kpt(:,3); zmin = min(z(:)); zmax = max(z(:)) + 100; xcenter = mean(pred_3d_kpt(:,1)); ycenter = mean(pred_3d_kpt(:,2)); zcenter = mean(pred_3d_kpt(:,3)); xmin = xcenter - 1000; xmax = xcenter + 1000; ymin = ycenter - 1000; ymax = ycenter + 1000; zmin = zcenter - 1000; zmax = zcenter + 1000; xlim([xmin xmax]); ylim([ymin ymax]); zlim([zmin zmax]); view(62,7); end
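
The MATLAB helpers above remap camera-space keypoints into a plotting frame ((x, y, z) becomes (-z, x, -y)) and frame a fixed 2 m cube around the skeleton centre before rendering. For users without MATLAB, a rough matplotlib equivalent is sketched below; it is not part of the repository, the function name and argument layout are made up for illustration, and matplotlib's azimuth/elevation convention only approximates MATLAB's `view(62, 7)`.

```
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3d projection)

def draw_3d_skeleton(pred_3d_kpt, skeleton):
    """pred_3d_kpt: (num_joint, 3) array in camera space (millimetres);
    skeleton: iterable of 0-based (parent, child) joint index pairs."""
    # Same axis remapping as the MATLAB script: (x, y, z) -> (-z, x, -y)
    kpt = np.stack([-pred_3d_kpt[:, 2], pred_3d_kpt[:, 0], -pred_3d_kpt[:, 1]], axis=1)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection='3d')
    for k1, k2 in skeleton:
        ax.plot(kpt[[k1, k2], 0], kpt[[k1, k2], 1], kpt[[k1, k2], 2], linewidth=3)
    ax.scatter(kpt[:, 0], kpt[:, 1], kpt[:, 2], s=40)

    # Frame a 2 m cube around the skeleton centre, as the MATLAB version does.
    center = kpt.mean(axis=0)
    ax.set_xlim(center[0] - 1000, center[0] + 1000)
    ax.set_ylim(center[1] - 1000, center[1] + 1000)
    ax.set_zlim(center[2] - 1000, center[2] + 1000)
    ax.view_init(elev=7, azim=62)
    plt.show()
```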