Repository: SangbumChoi/MobileHumanPose
Branch: master
Commit: a359dd9798e0
Files: 51
Total size: 172.2 KB
Directory structure:
gitextract_l8_g4mzf/
├── .gitignore
├── LICENSE
├── README.md
├── common/
│ ├── backbone/
│ │ ├── __init__.py
│ │ ├── lpnet_res_concat.py
│ │ ├── lpnet_ski_concat.py
│ │ └── lpnet_wo_concat.py
│ ├── base.py
│ ├── logger.py
│ ├── timer.py
│ └── utils/
│ ├── __init__.py
│ ├── dir_utils.py
│ ├── pose_utils.py
│ └── vis.py
├── data/
│ ├── Dummy/
│ │ ├── Dummy.py
│ │ ├── annotations/
│ │ │ ├── Dummy_subject1_camera.json
│ │ │ ├── Dummy_subject1_data.json
│ │ │ └── Dummy_subject1_joint_3d.json
│ │ └── bbox_root/
│ │ └── bbox_dummy_output.json
│ ├── Human36M/
│ │ └── Human36M.py
│ ├── MPII/
│ │ └── MPII.py
│ ├── MSCOCO/
│ │ └── MSCOCO.py
│ ├── MuCo/
│ │ └── MuCo.py
│ ├── MuPoTS/
│ │ ├── MuPoTS.py
│ │ └── mpii_mupots_multiperson_eval.m
│ ├── dataset.py
│ └── multiple_datasets.py
├── demo/
│ └── demo.py
├── main/
│ ├── config.py
│ ├── intermediate.py
│ ├── model.py
│ ├── pytorch2coreml.py
│ ├── pytorch2onnx.py
│ ├── summary.py
│ ├── test.py
│ ├── time.py
│ └── train.py
├── requirements.txt
├── tool/
│ └── Human36M/
│ ├── README.MD
│ ├── h36m2coco.py
│ └── preprocess_h36m.m
└── vis/
├── coco_img_name.py
├── multi/
│ ├── draw_2Dskeleton.m
│ ├── draw_3Dpose_coco.m
│ ├── draw_3Dpose_mupots.m
│ └── draw_3Dskeleton.m
├── mupots_img_name.py
└── single/
├── draw_2Dskeleton.m
├── draw_3Dpose_coco.m
├── draw_3Dpose_mupots.m
└── draw_3Dskeleton.m
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# virtualenv setting
venv_3DMPPE
# output result
output
# demo output
demo/*.pth.tar
# byte-compiled
__pycache__/
*.py[cod]
*.pyc
# nohup process
*.out
# idea
.DS_Store
.idea
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2019 Gyeongsik Moon
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# GitHub Code of "MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices"
#### [2021.11.23] Massive refactoring and optimization are planned. They will be released as soon as possible, including a new model.pth (expected by the end of December). Please wait for the model!
#### [2022.05.19] A Dummy dataloader has been added. It makes generating a dummy pth.tar file of the MobileHumanPose model roughly 100x faster for users building a PoC.
## Introduction
This repo is the official **[PyTorch](https://pytorch.org)** implementation of **[MobileHumanPose: Toward real-time 3D human pose estimation in mobile devices (CVPRW 2021)](https://openaccess.thecvf.com/content/CVPR2021W/MAI/html/Choi_MobileHumanPose_Toward_Real-Time_3D_Human_Pose_Estimation_in_Mobile_Devices_CVPRW_2021_paper.html)**.
## Dependencies
* [PyTorch](https://pytorch.org)
* [CUDA](https://developer.nvidia.com/cuda-downloads)
* [cuDNN](https://developer.nvidia.com/cudnn)
* [Anaconda](https://www.anaconda.com/download/)
* [COCO API](https://github.com/cocodataset/cocoapi)
This code is tested on Ubuntu 16.04 with CUDA 11.2 and two NVIDIA RTX or V100 GPUs.
Python 3.6.5 with virtualenv is used for development.
## Directory
### Root
The `${ROOT}` is described as below.
```
${ROOT}
|-- data
|-- demo
|-- common
|-- main
|-- tool
|-- vis
`-- output
```
* `data` contains data loading codes and soft links to images and annotations directories.
* `demo` contains demo codes.
* `common` contains kernel codes for the 3D multi-person pose estimation system. The custom backbones are also implemented here.
* `main` contains high-level codes for training or testing the network.
* `tool` contains data pre-processing codes. You don't have to run this code. I provide pre-processed data below.
* `vis` contains scripts for 3d visualization.
* `output` contains logs, trained models, visualized outputs, and test results.
### Data
You need to follow directory structure of the `data` as below.
```
${POSE_ROOT}
|-- data
| |-- Human36M
| | |-- bbox_root
| | | |-- bbox_root_human36m_output.json
| | |-- images
| | |-- annotations
| |-- MPII
| | |-- images
| | |-- annotations
| |-- MSCOCO
| | |-- bbox_root
| | | |-- bbox_root_coco_output.json
| | |-- images
| | | |-- train2017
| | | |-- val2017
| | |-- annotations
| |-- MuCo
| | |-- data
| | | |-- augmented_set
| | | |-- unaugmented_set
| | | |-- MuCo-3DHP.json
| |-- MuPoTS
| | |-- bbox_root
| | | |-- bbox_mupots_output.json
| | |-- data
| | | |-- MultiPersonTestSet
| | | |-- MuPoTS-3D.json
```
* Download Human3.6M parsed data [[data](https://drive.google.com/drive/folders/1kgVH-GugrLoc9XyvP6nRoaFpw3TmM5xK?usp=sharing)]
* Download MPII parsed data [[images](http://human-pose.mpi-inf.mpg.de/)][[annotations](https://drive.google.com/drive/folders/1MmQ2FRP0coxHGk0Ntj0JOGv9OxSNuCfK?usp=sharing)]
* Download MuCo parsed and composited data [[data](https://drive.google.com/drive/folders/1yL2ey3aWHJnh8f_nhWP--IyC9krAPsQN?usp=sharing)]
* Download MuPoTS parsed data [[images](http://gvv.mpi-inf.mpg.de/projects/SingleShotMultiPerson/)][[annotations](https://drive.google.com/drive/folders/1WmfQ8UEj6nuamMfAdkxmrNcsQTrTfKK_?usp=sharing)]
* All annotation files follow [MS COCO format](http://cocodataset.org/#format-data).
* If you want to add your own dataset, you have to convert it to [MS COCO format](http://cocodataset.org/#format-data); a minimal sketch of the expected annotation structure is shown below.
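A minimal sketch of the annotation layout this codebase expects follows (compare `data/Dummy/annotations/Dummy_subject1_data.json`); field values are placeholders, and any extra keys (e.g. `subject`, `cam_idx`) are only needed if your dataset class reads them.

```python
# Hedged sketch of a COCO-style annotation file for a custom dataset.
# Values are placeholders; only the overall structure mirrors the Dummy dataset.
import json

annot = {
    "images": [{
        "id": 1,
        "file_name": "my_sequence/000001.jpg",
        "width": 1000,
        "height": 1002,
    }],
    "annotations": [{
        "id": 1,
        "image_id": 1,
        "bbox": [304.0, 222.3, 328.1, 412.2],   # [x, y, w, h] in pixels
        "keypoints_vis": [True] * 17,
    }],
}

with open("MyDataset_data.json", "w") as f:
    json.dump(annot, f)
```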
### Output
You need to follow the directory structure of the `output` folder as below.
```
${POSE_ROOT}
|-- output
|-- |-- log
|-- |-- model_dump
|-- |-- result
`-- |-- vis
```
* Creating the `output` folder as a soft link rather than a regular folder is recommended, because it can take up large storage capacity.
* `log` folder contains training log file.
* `model_dump` folder contains saved checkpoints for each epoch.
* `result` folder contains final estimation files generated in the testing stage.
* `vis` folder contains visualized results.
### 3D visualization
* Run `$DB_NAME_img_name.py` to get image file names in `.txt` format.
* Place your test result files (`preds_2d_kpt_$DB_NAME.mat`, `preds_3d_kpt_$DB_NAME.mat`) in `single` or `multi` folder.
* Run `draw_3Dpose_$DB_NAME.m` (a Python-based sketch for quickly inspecting the same `.mat` files is shown below).
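The MATLAB scripts above consume the saved `.mat` prediction files. If you only want to sanity-check them from Python, a hedged sketch follows; the variable name stored in the file is an assumption, so verify it first with `scipy.io.whosmat`.

```python
# Hedged sketch: quick 3D scatter of one saved prediction.
# 'preds_3d_kpt' is an assumed variable name inside the .mat file.
import scipy.io as sio
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3d projection)

mat = sio.loadmat('preds_3d_kpt_mupots.mat')
preds = mat['preds_3d_kpt']              # assumed shape: (num_samples, joint_num, 3)

kpt = preds[0]                           # first sample
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# same axis convention as common/utils/vis.py: plot (x, z, -y)
ax.scatter(kpt[:, 0], kpt[:, 2], -kpt[:, 1], marker='o')
plt.show()
```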
## Running 3DMPPE_POSENET
### Requirements
```shell
# run from ${ROOT}, where requirements.txt is located
pip install -r requirements.txt
```
### Setup Training
* In `main/config.py`, you can change settings of the model, including the dataset to use, the network backbone, the input size, and so on; a sketch of the `cfg` fields read elsewhere in the code is shown below.
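The contents of `main/config.py` are not reproduced in this extract, so the snippet below is only a stand-in listing the `cfg` fields that the rest of the repo reads (e.g. in `common/base.py` and the data loaders); all values are illustrative, not the defaults.

```python
from types import SimpleNamespace

# Stand-in namespace; in the real code these fields live on cfg in main/config.py.
cfg = SimpleNamespace(
    trainset_3d=['Human36M'],   # 3D training dataset class names under data/
    trainset_2d=['MSCOCO'],     # 2D training dataset class names
    testset='Human36M',         # test dataset class name
    backbone='LPSKI',           # 'LPRES' | 'LPSKI' | 'LPWO'
    input_shape=(256, 256),     # network input size (H, W)
    batch_size=32,              # per-GPU batch size (illustrative)
    lr=1e-3,                    # initial learning rate (illustrative)
    lr_dec_epoch=[17, 21],      # epochs at which the LR is decayed (illustrative)
    lr_dec_factor=10,
    continue_train=False,       # resume from the latest snapshot when True
)
```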
### Train
In the `main` folder, run
```bash
python train.py --gpu 0-1 --backbone LPSKI
```
to train the network on GPUs 0 and 1.
If you want to continue a previous experiment, run
```bash
python train.py --gpu 0-1 --backbone LPSKI --continue
```
`--gpu 0,1` can be used instead of `--gpu 0-1`.
### Test
Place trained model at the `output/model_dump/`.
In the `main` folder, run
```bash
python test.py --gpu 0-1 --test_epoch 20-21 --backbone LPSKI
```
to test the network on GPUs 0 and 1 with the models trained at epochs 20 and 21. `--gpu 0,1` can be used instead of `--gpu 0-1`. For the backbone, you can choose one of:
```python
BACKBONE_DICT = {
    'LPRES': LpNetResConcat,
    'LPSKI': LpNetSkiConcat,
    'LPWO': LpNetWoConcat
}
```
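The three backbone classes live under `common/backbone/` and share the same constructor arguments; the smoke test below mirrors the `__main__` blocks in those files and assumes the repo's `common/` directory is on `sys.path` (as the `main/` scripts arrange).

```python
import sys
sys.path.append('common')   # so that the 'backbone' package is importable

import torch
from backbone.lpnet_ski_concat import LpNetSkiConcat   # the 'LPSKI' entry of BACKBONE_DICT

model = LpNetSkiConcat((256, 256), 18)    # (input_size, joint_num)
out = model(torch.rand(1, 3, 256, 256))
print(out.size())                         # per-joint volumetric heatmaps along the channel axis
```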
#### Human3.6M dataset using protocol 1
For evaluation, you can run `test.py`, or use the evaluation code provided in `Human36M`.
#### Human3.6M dataset using protocol 2
For evaluation, you can run `test.py`, or use the evaluation code provided in `Human36M`.
#### MuPoTS-3D dataset
For evaluation, run `test.py`. After that, move `data/MuPoTS/mpii_mupots_multiperson_eval.m` to `data/MuPoTS/data`. Also move the test result files (`preds_2d_kpt_mupots.mat` and `preds_3d_kpt_mupots.mat`) to `data/MuPoTS/data`. Then run `mpii_mupots_multiperson_eval.m` with your evaluation mode arguments.
#### TFLite inference
For inference on mobile devices, we converted the PyTorch implementation to ONNX and then to TFLite, and tested the result on actual devices; a minimal export sketch is shown below.
An official demo app is available [here](https://github.com/tucan9389/PoseEstimation-TFLiteSwift).
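The actual conversion scripts are `main/pytorch2onnx.py` and `main/pytorch2coreml.py` (not reproduced in this extract); the sketch below only illustrates the generic ONNX export step, with the file name, tensor names, and opset chosen as assumptions rather than the scripts' real settings.

```python
# Hedged sketch of a PyTorch -> ONNX export; not the repo's pytorch2onnx.py.
import sys
sys.path.append('common')

import torch
from backbone.lpnet_ski_concat import LpNetSkiConcat

model = LpNetSkiConcat((256, 256), 18).eval()
dummy = torch.rand(1, 3, 256, 256)
torch.onnx.export(
    model, dummy, 'mobile_human_pose.onnx',   # output file name is illustrative
    input_names=['image'], output_names=['heatmaps'],
    opset_version=11,                          # assumed opset
)
# The ONNX graph can then be taken to TensorFlow / TFLite with external tools
# (e.g. onnx-tf plus the TFLite converter); those steps are not shown here.
```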
## Reference
**Where this repo comes from:**
The training section is based on the following paper and GitHub repository:
* [PyTorch](https://pytorch.org) implementation of [Camera Distance-aware Top-down Approach for 3D Multi-person Pose Estimation from a Single RGB Image (ICCV 2019)](https://arxiv.org/abs/1907.11346).
* Flexible and simple code.
* Compatibility for most of the publicly available 2D and 3D, single and multi-person pose estimation datasets including **[Human3.6M](http://vision.imar.ro/human3.6m/description.php), [MPII](http://human-pose.mpi-inf.mpg.de/), [MS COCO 2017](http://cocodataset.org/#home), [MuCo-3DHP](http://gvv.mpi-inf.mpg.de/projects/SingleShotMultiPerson/) and [MuPoTS-3D](http://gvv.mpi-inf.mpg.de/projects/SingleShotMultiPerson/)**.
* Human pose estimation visualization code.
```
@InProceedings{Choi_2021_CVPR,
author = {Choi, Sangbum and Choi, Seokeon and Kim, Changick},
title = {MobileHumanPose: Toward Real-Time 3D Human Pose Estimation in Mobile Devices},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2021},
pages = {2328-2338}
}
```
================================================
FILE: common/backbone/__init__.py
================================================
from backbone.lpnet_res_concat import *
from backbone.lpnet_ski_concat import *
from backbone.lpnet_wo_concat import *
================================================
FILE: common/backbone/lpnet_res_concat.py
================================================
import torch.nn as nn
import torch
from torchsummary import summary
def _make_divisible(v, divisor, min_value=None):
"""
This function is taken from the original tf repo. It ensures that all layers have a channel number that is divisible by 8
It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
:param v:
:param divisor:
:param min_value:
:return:
"""
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_v < 0.9 * v:
new_v += divisor
return new_v
class DoubleConv(nn.Sequential):
def __init__(self, in_ch, out_ch, norm_layer=None, activation_layer=None):
super(DoubleConv, self).__init__(
nn.Conv2d(in_ch , out_ch, kernel_size=1),
norm_layer(out_ch),
activation_layer(out_ch),
nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
norm_layer(out_ch),
activation_layer(out_ch),
nn.UpsamplingBilinear2d(scale_factor=2)
)
class ConvBNReLU(nn.Sequential):
def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None, activation_layer=None):
padding = (kernel_size - 1) // 2
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
norm_layer(out_planes),
activation_layer(out_planes)
)
class InvertedResidual(nn.Module):
def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None, activation_layer=None):
super(InvertedResidual, self).__init__()
self.stride = stride
assert stride in [1, 2]
hidden_dim = int(round(inp * expand_ratio))
self.use_res_connect = self.stride == 1 and inp == oup
layers = []
if expand_ratio != 1:
# pw
layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer))
layers.extend([
# dw
ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer, activation_layer=activation_layer),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
norm_layer(oup),
])
self.conv = nn.Sequential(*layers)
def forward(self, x):
if self.use_res_connect:
return x + self.conv(x)
else:
return self.conv(x)
class LpNetResConcat(nn.Module):
def __init__(self,
input_size,
joint_num,
input_channel = 48,
embedding_size = 2048,
width_mult=1.0,
round_nearest=8,
block=None,
norm_layer=None,
activation_layer=None,
inverted_residual_setting=None):
super(LpNetResConcat, self).__init__()
assert input_size[1] in [256]
if block is None:
block = InvertedResidual
if norm_layer is None:
norm_layer = nn.BatchNorm2d
if activation_layer is None:
activation_layer = nn.PReLU # PReLU does not have inplace True
if inverted_residual_setting is None:
inverted_residual_setting = [
# t, c, n, s
[1, 64, 1, 1], #[-1, 48, 256, 256]
[6, 48, 2, 2], #[-1, 48, 128, 128]
[6, 48, 3, 2], #[-1, 48, 64, 64]
[6, 64, 4, 2], #[-1, 64, 32, 32]
[6, 96, 3, 2], #[-1, 96, 16, 16]
[6, 160, 3, 2], #[-1, 160, 8, 8]
[6, 320, 1, 1], #[-1, 320, 8, 8]
]
# building first layer
inp_channel = [_make_divisible(input_channel * width_mult, round_nearest),
_make_divisible(input_channel * width_mult, round_nearest) + inverted_residual_setting[0][1],
inverted_residual_setting[0][1] + inverted_residual_setting[1][1],
inverted_residual_setting[1][1] + inverted_residual_setting[2][1],
inverted_residual_setting[2][1] + inverted_residual_setting[3][1],
inverted_residual_setting[3][1] + inverted_residual_setting[4][1],
inverted_residual_setting[4][1] + inverted_residual_setting[5][1],
inverted_residual_setting[5][1] + inverted_residual_setting[6][1],
inverted_residual_setting[6][1] + embedding_size,
256 + embedding_size,
]
self.first_conv = ConvBNReLU(3, inp_channel[0], stride=1, norm_layer=norm_layer, activation_layer=activation_layer)
inv_residual = []
# building inverted residual blocks
j = 0
for t, c, n, s in inverted_residual_setting:
output_channel = _make_divisible(c * width_mult, round_nearest)
for i in range(n):
stride = s if i == 0 else 1
input_channel = inp_channel[j] if i == 0 else output_channel
inv_residual.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer, activation_layer=activation_layer))
j += 1
# make it nn.Sequential
self.inv_residual = nn.Sequential(*inv_residual)
self.last_conv = ConvBNReLU(inp_channel[j], embedding_size, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer)
self.deconv0 = DoubleConv(inp_channel[j+1], 256, norm_layer=norm_layer, activation_layer=activation_layer)
self.deconv1 = DoubleConv(2304, 256, norm_layer=norm_layer, activation_layer=activation_layer)
self.deconv2 = DoubleConv(512, 256, norm_layer=norm_layer, activation_layer=activation_layer)
self.final_layer = nn.Conv2d(
in_channels=256,
out_channels= joint_num * 64,
kernel_size=1,
stride=1,
padding=0
)
self.avgpool = nn.AvgPool2d(3, stride=2, padding=1, count_include_pad=False)
self.upsample = nn.UpsamplingBilinear2d(scale_factor=2)
def forward(self, x):
x0 = self.first_conv(x)
x1 = self.inv_residual[0:1](x0)
x2 = self.inv_residual[1:3](torch.cat([x0, x1], dim=1))
x0 = self.inv_residual[3:6](torch.cat([self.avgpool(x1), x2], dim=1))
x1 = self.inv_residual[6:10](torch.cat([self.avgpool(x2), x0], dim=1))
x2 = self.inv_residual[10:13](torch.cat([self.avgpool(x0), x1], dim=1))
x0 = self.inv_residual[13:16](torch.cat([self.avgpool(x1), x2], dim=1))
x1 = self.inv_residual[16:17](torch.cat([self.avgpool(x2), x0], dim=1))
x2 = self.last_conv(torch.cat([x0, x1], dim=1))
x0 = self.deconv0(torch.cat([x1, x2], dim=1))
x1 = self.deconv1(torch.cat([self.upsample(x2), x0], dim=1))
x2 = self.deconv2(torch.cat([self.upsample(x0), x1], dim=1))
x0 = self.final_layer(x2)
return x0
def init_weights(self):
for i in [self.deconv0, self.deconv1, self.deconv2]:
for name, m in i.named_modules():
if isinstance(m, nn.ConvTranspose2d):
nn.init.normal_(m.weight, std=0.001)
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
for j in [self.first_conv, self.inv_residual, self.last_conv, self.final_layer]:
for m in j.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.001)
if hasattr(m, 'bias'):
if m.bias is not None:
nn.init.constant_(m.bias, 0)
if __name__ == "__main__":
model = LpNetResConcat((256, 256), 18)
test_data = torch.rand(1, 3, 256, 256)
test_outputs = model(test_data)
# print(test_outputs.size())
summary(model, (3, 256, 256))
================================================
FILE: common/backbone/lpnet_ski_concat.py
================================================
import torch.nn as nn
import torch
from torchsummary import summary
def _make_divisible(v, divisor, min_value=None):
"""
This function is taken from the original tf repo. It ensures that all layers have a channel number that is divisible by 8
It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
:param v:
:param divisor:
:param min_value:
:return:
"""
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_v < 0.9 * v:
new_v += divisor
return new_v
class DeConv(nn.Sequential):
def __init__(self, in_ch, mid_ch, out_ch, norm_layer=None, activation_layer=None):
super(DeConv, self).__init__(
nn.Conv2d(in_ch + mid_ch, mid_ch, kernel_size=1),
norm_layer(mid_ch),
activation_layer(mid_ch),
nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1),
norm_layer(out_ch),
activation_layer(out_ch),
nn.UpsamplingBilinear2d(scale_factor=2)
)
class ConvBNReLU(nn.Sequential):
def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None, activation_layer=None):
padding = (kernel_size - 1) // 2
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
norm_layer(out_planes),
activation_layer(out_planes)
)
class InvertedResidual(nn.Module):
def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None, activation_layer=None):
super(InvertedResidual, self).__init__()
self.stride = stride
assert stride in [1, 2]
hidden_dim = int(round(inp * expand_ratio))
self.use_res_connect = self.stride == 1 and inp == oup
layers = []
if expand_ratio != 1:
# pw
layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer))
layers.extend([
# dw
ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer, activation_layer=activation_layer),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
norm_layer(oup),
])
self.conv = nn.Sequential(*layers)
def forward(self, x):
if self.use_res_connect:
return x + self.conv(x)
else:
return self.conv(x)
class LpNetSkiConcat(nn.Module):
def __init__(self,
input_size,
joint_num,
input_channel = 48,
embedding_size = 2048,
width_mult=1.0,
round_nearest=8,
block=None,
norm_layer=None,
activation_layer=None,
inverted_residual_setting=None):
super(LpNetSkiConcat, self).__init__()
assert input_size[1] in [256]
if block is None:
block = InvertedResidual
if norm_layer is None:
norm_layer = nn.BatchNorm2d
if activation_layer is None:
activation_layer = nn.PReLU # PReLU does not have inplace True
if inverted_residual_setting is None:
inverted_residual_setting = [
# t, c, n, s
[1, 64, 1, 2], #[-1, 48, 256, 256]
[6, 48, 2, 2], #[-1, 48, 128, 128]
[6, 48, 3, 2], #[-1, 48, 64, 64]
[6, 64, 4, 2], #[-1, 64, 32, 32]
[6, 96, 3, 2], #[-1, 96, 16, 16]
[6, 160, 3, 1], #[-1, 160, 8, 8]
[6, 320, 1, 1], #[-1, 320, 8, 8]
]
# building first layer
input_channel = _make_divisible(input_channel * width_mult, round_nearest)
self.first_conv = ConvBNReLU(3, input_channel, stride=2, norm_layer=norm_layer, activation_layer=activation_layer)
inv_residual = []
# building inverted residual blocks
for t, c, n, s in inverted_residual_setting:
output_channel = _make_divisible(c * width_mult, round_nearest)
for i in range(n):
stride = s if i == 0 else 1
inv_residual.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer, activation_layer=activation_layer))
input_channel = output_channel
# make it nn.Sequential
self.inv_residual = nn.Sequential(*inv_residual)
self.last_conv = ConvBNReLU(input_channel, embedding_size, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer)
self.deconv0 = DeConv(embedding_size, _make_divisible(inverted_residual_setting[-3][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)
self.deconv1 = DeConv(256, _make_divisible(inverted_residual_setting[-4][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)
self.deconv2 = DeConv(256, _make_divisible(inverted_residual_setting[-5][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)
self.final_layer = nn.Conv2d(
in_channels=256,
out_channels= joint_num * 32,
kernel_size=1,
stride=1,
padding=0
)
def forward(self, x):
x = self.first_conv(x)
x = self.inv_residual[0:6](x)
x2 = x
x = self.inv_residual[6:10](x)
x1 = x
x = self.inv_residual[10:13](x)
x0 = x
x = self.inv_residual[13:16](x)
x = self.inv_residual[16:](x)
z = self.last_conv(x)
z = torch.cat([x0, z], dim=1)
z = self.deconv0(z)
z = torch.cat([x1, z], dim=1)
z = self.deconv1(z)
z = torch.cat([x2, z], dim=1)
z = self.deconv2(z)
z = self.final_layer(z)
return z
def init_weights(self):
for i in [self.deconv0, self.deconv1, self.deconv2]:
for name, m in i.named_modules():
if isinstance(m, nn.ConvTranspose2d):
nn.init.normal_(m.weight, std=0.001)
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
for j in [self.first_conv, self.inv_residual, self.last_conv, self.final_layer]:
for m in j.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.001)
if hasattr(m, 'bias'):
if m.bias is not None:
nn.init.constant_(m.bias, 0)
if __name__ == "__main__":
LpNetSkiConcat((256, 256), 18).init_weights()
model = LpNetSkiConcat((256, 256), 18)
test_data = torch.rand(1, 3, 256, 256)
test_outputs = model(test_data)
print(test_outputs.size())
summary(model, (3, 256, 256))
================================================
FILE: common/backbone/lpnet_wo_concat.py
================================================
import torch.nn as nn
import torch
from torchsummary import summary
def _make_divisible(v, divisor, min_value=None):
"""
This function is taken from the original tf repo. It ensures that all layers have a channel number that is divisible by 8
It can be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
:param v:
:param divisor:
:param min_value:
:return:
"""
if min_value is None:
min_value = divisor
new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
# Make sure that round down does not go down by more than 10%.
if new_v < 0.9 * v:
new_v += divisor
return new_v
class DeConv(nn.Sequential):
def __init__(self, in_ch, mid_ch, out_ch, norm_layer=None, activation_layer=None):
super(DeConv, self).__init__(
nn.Conv2d(in_ch, mid_ch, kernel_size=1),
norm_layer(mid_ch),
activation_layer(mid_ch),
nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1),
norm_layer(out_ch),
activation_layer(out_ch),
nn.UpsamplingBilinear2d(scale_factor=2)
)
class ConvBNReLU(nn.Sequential):
def __init__(self, in_planes, out_planes, kernel_size=3, stride=1, groups=1, norm_layer=None, activation_layer=None):
padding = (kernel_size - 1) // 2
super(ConvBNReLU, self).__init__(
nn.Conv2d(in_planes, out_planes, kernel_size, stride, padding, groups=groups, bias=False),
norm_layer(out_planes),
activation_layer(out_planes)
)
class InvertedResidual(nn.Module):
def __init__(self, inp, oup, stride, expand_ratio, norm_layer=None, activation_layer=None):
super(InvertedResidual, self).__init__()
self.stride = stride
assert stride in [1, 2]
hidden_dim = int(round(inp * expand_ratio))
self.use_res_connect = self.stride == 1 and inp == oup
layers = []
if expand_ratio != 1:
# pw
layers.append(ConvBNReLU(inp, hidden_dim, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer))
layers.extend([
# dw
ConvBNReLU(hidden_dim, hidden_dim, stride=stride, groups=hidden_dim, norm_layer=norm_layer, activation_layer=activation_layer),
# pw-linear
nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
norm_layer(oup),
])
self.conv = nn.Sequential(*layers)
def forward(self, x):
if self.use_res_connect:
return x + self.conv(x)
else:
return self.conv(x)
class LpNetWoConcat(nn.Module):
def __init__(self,
input_size,
joint_num,
input_channel = 48,
embedding_size = 2048,
width_mult=1.0,
round_nearest=8,
block=None,
norm_layer=None,
activation_layer=None,
inverted_residual_setting=None):
super(LpNetWoConcat, self).__init__()
assert input_size[1] in [256]
if block is None:
block = InvertedResidual
if norm_layer is None:
norm_layer = nn.BatchNorm2d
if activation_layer is None:
activation_layer = nn.PReLU # PReLU does not have inplace True
if inverted_residual_setting is None:
inverted_residual_setting = [
# t, c, n, s
[1, 64, 1, 1], #[-1, 48, 256, 256]
[6, 48, 2, 2], #[-1, 48, 128, 128]
[6, 48, 3, 2], #[-1, 48, 64, 64]
[6, 64, 4, 2], #[-1, 64, 32, 32]
[6, 96, 3, 2], #[-1, 96, 16, 16]
[6, 160, 3, 2], #[-1, 160, 8, 8]
[6, 320, 1, 1], #[-1, 320, 8, 8]
]
# building first layer
input_channel = _make_divisible(input_channel * width_mult, round_nearest)
self.first_conv = ConvBNReLU(3, input_channel, stride=1, norm_layer=norm_layer, activation_layer=activation_layer)
inv_residual = []
# building inverted residual blocks
for t, c, n, s in inverted_residual_setting:
output_channel = _make_divisible(c * width_mult, round_nearest)
for i in range(n):
stride = s if i == 0 else 1
inv_residual.append(block(input_channel, output_channel, stride, expand_ratio=t, norm_layer=norm_layer, activation_layer=activation_layer))
input_channel = output_channel
# make it nn.Sequential
self.inv_residual = nn.Sequential(*inv_residual)
self.last_conv = ConvBNReLU(input_channel, embedding_size, kernel_size=1, norm_layer=norm_layer, activation_layer=activation_layer)
self.deconv0 = DeConv(embedding_size, _make_divisible(inverted_residual_setting[-2][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)
self.deconv1 = DeConv(256, _make_divisible(inverted_residual_setting[-3][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)
self.deconv2 = DeConv(256, _make_divisible(inverted_residual_setting[-4][-3] * width_mult, round_nearest), 256, norm_layer=norm_layer, activation_layer=activation_layer)
self.final_layer = nn.Conv2d(
in_channels=256,
out_channels= joint_num * 64,
kernel_size=1,
stride=1,
padding=0
)
def forward(self, x):
x = self.first_conv(x)
x = self.inv_residual(x)
x = self.last_conv(x)
x = self.deconv0(x)
x = self.deconv1(x)
x = self.deconv2(x)
x = self.final_layer(x)
return x
def init_weights(self):
for i in [self.deconv0, self.deconv1, self.deconv2]:
for name, m in i.named_modules():
if isinstance(m, nn.ConvTranspose2d):
nn.init.normal_(m.weight, std=0.001)
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
for j in [self.first_conv, self.inv_residual, self.last_conv, self.final_layer]:
for m in j.modules():
if isinstance(m, nn.Conv2d):
nn.init.normal_(m.weight, std=0.001)
if hasattr(m, 'bias'):
if m.bias is not None:
nn.init.constant_(m.bias, 0)
if __name__ == "__main__":
model = LpNetWoConcat((256, 256), 18)
test_data = torch.rand(1, 3, 256, 256)
test_outputs = model(test_data)
summary(model, (3, 256, 256))
================================================
FILE: common/base.py
================================================
import os
import os.path as osp
import math
import time
import glob
import abc
from torch.utils.data import DataLoader
import torch.optim
import torchvision.transforms as transforms
from timer import Timer
from logger import colorlogger
from torch.nn.parallel.data_parallel import DataParallel
from config import cfg
from model import get_pose_net
from dataset import DatasetLoader
from multiple_datasets import MultipleDatasets
# dynamic dataset import
for i in range(len(cfg.trainset_3d)):
exec('from ' + cfg.trainset_3d[i] + ' import ' + cfg.trainset_3d[i])
for i in range(len(cfg.trainset_2d)):
exec('from ' + cfg.trainset_2d[i] + ' import ' + cfg.trainset_2d[i])
exec('from ' + cfg.testset + ' import ' + cfg.testset)
class Base(object):
__metaclass__ = abc.ABCMeta
def __init__(self, log_name='logs.txt'):
self.cur_epoch = 0
# timer
self.tot_timer = Timer()
self.gpu_timer = Timer()
self.read_timer = Timer()
# logger
self.logger = colorlogger(cfg.log_dir, log_name=log_name)
@abc.abstractmethod
def _make_batch_generator(self):
return
@abc.abstractmethod
def _make_model(self):
return
def save_model(self, state, epoch):
file_path = osp.join(cfg.model_dir,'snapshot_{}.pth.tar'.format(str(epoch)))
torch.save(state, file_path)
self.logger.info("Write snapshot into {}".format(file_path))
def load_model(self, model, optimizer):
model_file_list = glob.glob(osp.join(cfg.model_dir,'*.pth.tar'))
cur_epoch = max([int(file_name[file_name.find('snapshot_') + 9 : file_name.find('.pth.tar')]) for file_name in model_file_list])
ckpt = torch.load(osp.join(cfg.model_dir, 'snapshot_' + str(cur_epoch) + '.pth.tar'))
start_epoch = ckpt['epoch'] + 1
model.load_state_dict(ckpt['network'])
optimizer.load_state_dict(ckpt['optimizer'])
return start_epoch, model, optimizer
class Trainer(Base):
def __init__(self, cfg):
super(Trainer, self).__init__(log_name = 'train_logs.txt')
self.backbone = cfg.backbone
def get_optimizer(self, model):
optimizer = torch.optim.Adam(model.parameters(), lr=cfg.lr)
return optimizer
def set_lr(self, epoch):
for e in cfg.lr_dec_epoch:
if epoch < e:
break
if epoch < cfg.lr_dec_epoch[-1]:
idx = cfg.lr_dec_epoch.index(e)
for g in self.optimizer.param_groups:
g['lr'] = cfg.lr / (cfg.lr_dec_factor ** idx)
else:
for g in self.optimizer.param_groups:
g['lr'] = cfg.lr / (cfg.lr_dec_factor ** len(cfg.lr_dec_epoch))
def get_lr(self):
for g in self.optimizer.param_groups:
cur_lr = g['lr']
return cur_lr
def _make_batch_generator(self):
# data load and construct batch generator
self.logger.info("Creating dataset...")
trainset3d_loader = []
for i in range(len(cfg.trainset_3d)):
if i > 0:
ref_joints_name = trainset3d_loader[0].joints_name
else:
ref_joints_name = None
trainset3d_loader.append(DatasetLoader(eval(cfg.trainset_3d[i])("train"), ref_joints_name, True, transforms.Compose([\
transforms.ToTensor(),
transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]\
)))
ref_joints_name = trainset3d_loader[0].joints_name
trainset2d_loader = []
for i in range(len(cfg.trainset_2d)):
trainset2d_loader.append(DatasetLoader(eval(cfg.trainset_2d[i])("train"), ref_joints_name, True, transforms.Compose([\
transforms.ToTensor(),
transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]\
)))
self.joint_num = trainset3d_loader[0].joint_num
trainset3d_loader = MultipleDatasets(trainset3d_loader, make_same_len=False)
if trainset2d_loader != []:
trainset2d_loader = MultipleDatasets(trainset2d_loader, make_same_len=False)
trainset_loader = MultipleDatasets([trainset3d_loader, trainset2d_loader], make_same_len=True)
else:
trainset_loader = MultipleDatasets([trainset3d_loader, ], make_same_len=True)
self.itr_per_epoch = math.ceil(len(trainset_loader) / cfg.num_gpus / cfg.batch_size)
self.batch_generator = DataLoader(dataset=trainset_loader, batch_size=cfg.num_gpus*cfg.batch_size, shuffle=True, num_workers=cfg.num_thread, pin_memory=True)
def _make_model(self):
# prepare network
self.logger.info("Creating graph and optimizer...")
model = get_pose_net(self.backbone, True, self.joint_num)
if torch.cuda.is_available():
model = DataParallel(model).cuda()
optimizer = self.get_optimizer(model)
if cfg.continue_train:
start_epoch, model, optimizer = self.load_model(model, optimizer)
else:
start_epoch = 0
model.train()
self.start_epoch = start_epoch
self.model = model
self.optimizer = optimizer
class Tester(Base):
def __init__(self, backbone):
self.backbone = backbone
super(Tester, self).__init__(log_name = 'test_logs.txt')
def _make_batch_generator(self):
# data load and construct batch generator
# self.logger.info("Creating dataset...")
testset = eval(cfg.testset)("test")
testset_loader = DatasetLoader(testset, None, False, transforms.Compose([\
transforms.ToTensor(),
transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)]\
))
batch_generator = DataLoader(dataset=testset_loader, batch_size=cfg.num_gpus*cfg.test_batch_size, shuffle=False, num_workers=cfg.num_thread, pin_memory=True)
self.testset = testset
self.joint_num = testset_loader.joint_num
self.skeleton = testset_loader.skeleton
self.flip_pairs = testset.flip_pairs
self.batch_generator = batch_generator
def _make_model(self, test_epoch):
self.test_epoch = test_epoch
model_path = os.path.join(cfg.model_dir, 'snapshot_%d.pth.tar' % self.test_epoch)
assert os.path.exists(model_path), 'Cannot find model at ' + model_path
# self.logger.info('Load checkpoint from {}'.format(model_path))
# prepare network
# self.logger.info("Creating graph...")
model = get_pose_net(self.backbone, False, self.joint_num)
model = DataParallel(model).cuda()
ckpt = torch.load(model_path)
model.load_state_dict(ckpt['network'])
model.eval()
self.model = model
def _evaluate(self, preds, result_save_path):
eval_summary = self.testset.evaluate(preds, result_save_path)
self.logger.info('{}'.format(eval_summary))
class Transformer(Base):
def __init__(self, backbone, jointnum, modelpath):
super(Transformer, self).__init__(log_name='transformer_logs.txt')
self.backbone = backbone
self.jointnum = jointnum
self.modelpath = modelpath
def _make_model(self):
# prepare network
self.logger.info("Creating graph and optimizer...")
model = get_pose_net(self.backbone, False, self.jointnum)
model = DataParallel(model).cuda()
model.load_state_dict(torch.load(self.modelpath)['network'])
single_pytorch_model = model.module
single_pytorch_model.eval()
self.model = single_pytorch_model
================================================
FILE: common/logger.py
================================================
import logging
import os
OK = '\033[92m'
WARNING = '\033[93m'
FAIL = '\033[91m'
END = '\033[0m'
PINK = '\033[95m'
BLUE = '\033[94m'
GREEN = OK
RED = FAIL
WHITE = END
YELLOW = WARNING
class colorlogger():
def __init__(self, log_dir, log_name='train_logs.txt'):
# set log
self._logger = logging.getLogger(log_name)
self._logger.setLevel(logging.INFO)
log_file = os.path.join(log_dir, log_name)
if not os.path.exists(log_dir):
os.makedirs(log_dir)
file_log = logging.FileHandler(log_file, mode='a')
file_log.setLevel(logging.INFO)
console_log = logging.StreamHandler()
console_log.setLevel(logging.INFO)
formatter = logging.Formatter(
"{}%(asctime)s{} %(message)s".format(GREEN, END),
"%m-%d %H:%M:%S")
file_log.setFormatter(formatter)
console_log.setFormatter(formatter)
self._logger.addHandler(file_log)
self._logger.addHandler(console_log)
def debug(self, msg):
self._logger.debug(str(msg))
def info(self, msg):
self._logger.info(str(msg))
def warning(self, msg):
self._logger.warning(WARNING + 'WRN: ' + str(msg) + END)
def critical(self, msg):
self._logger.critical(RED + 'CRI: ' + str(msg) + END)
def error(self, msg):
self._logger.error(RED + 'ERR: ' + str(msg) + END)
================================================
FILE: common/timer.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import time
class Timer(object):
"""A simple timer."""
def __init__(self):
self.total_time = 0.
self.calls = 0
self.start_time = 0.
self.diff = 0.
self.average_time = 0.
self.warm_up = 0
def tic(self):
# using time.time instead of time.clock because time.clock
# does not normalize for multithreading
self.start_time = time.time()
def toc(self, average=True):
self.diff = time.time() - self.start_time
if self.warm_up < 10:
self.warm_up += 1
return self.diff
else:
self.total_time += self.diff
self.calls += 1
self.average_time = self.total_time / self.calls
if average:
return self.average_time
else:
return self.diff
================================================
FILE: common/utils/__init__.py
================================================
================================================
FILE: common/utils/dir_utils.py
================================================
import os
import sys
def make_folder(folder_name):
if not os.path.exists(folder_name):
os.makedirs(folder_name)
def add_pypath(path):
if path not in sys.path:
sys.path.insert(0, path)
================================================
FILE: common/utils/pose_utils.py
================================================
import torch
import numpy as np
from config import cfg
import copy
def cam2pixel(cam_coord, f, c):
x = cam_coord[:, 0] / (cam_coord[:, 2] + 1e-8) * f[0] + c[0]
y = cam_coord[:, 1] / (cam_coord[:, 2] + 1e-8) * f[1] + c[1]
z = cam_coord[:, 2]
img_coord = np.concatenate((x[:,None], y[:,None], z[:,None]),1)
return img_coord
def pixel2cam(pixel_coord, f, c):
x = (pixel_coord[:, 0] - c[0]) / f[0] * pixel_coord[:, 2]
y = (pixel_coord[:, 1] - c[1]) / f[1] * pixel_coord[:, 2]
z = pixel_coord[:, 2]
cam_coord = np.concatenate((x[:,None], y[:,None], z[:,None]),1)
return cam_coord
def world2cam(world_coord, R, t):
cam_coord = np.dot(R, world_coord.transpose(1,0)).transpose(1,0) + t.reshape(1,3)
return cam_coord
def rigid_transform_3D(A, B):
centroid_A = np.mean(A, axis = 0)
centroid_B = np.mean(B, axis = 0)
H = np.dot(np.transpose(A - centroid_A), B - centroid_B)
U, s, V = np.linalg.svd(H)
R = np.dot(np.transpose(V), np.transpose(U))
if np.linalg.det(R) < 0:
V[2] = -V[2]
R = np.dot(np.transpose(V), np.transpose(U))
t = -np.dot(R, np.transpose(centroid_A)) + np.transpose(centroid_B)
return R, t
def rigid_align(A, B):
R, t = rigid_transform_3D(A, B)
A2 = np.transpose(np.dot(R, np.transpose(A))) + t
return A2
def get_bbox(joint_img):
# bbox extract from keypoint coordinates
bbox = np.zeros((4))
xmin = np.min(joint_img[:,0])
ymin = np.min(joint_img[:,1])
xmax = np.max(joint_img[:,0])
ymax = np.max(joint_img[:,1])
width = xmax - xmin - 1
height = ymax - ymin - 1
bbox[0] = (xmin + xmax)/2. - width/2*1.2
bbox[1] = (ymin + ymax)/2. - height/2*1.2
bbox[2] = width*1.2
bbox[3] = height*1.2
return bbox
def process_bbox(bbox, width, height):
# sanitize bboxes
x, y, w, h = bbox
x1 = np.max((0, x))
y1 = np.max((0, y))
x2 = np.min((width - 1, x1 + np.max((0, w - 1))))
y2 = np.min((height - 1, y1 + np.max((0, h - 1))))
if w*h > 0 and x2 >= x1 and y2 >= y1:
bbox = np.array([x1, y1, x2-x1, y2-y1])
else:
return None
# aspect ratio preserving bbox
w = bbox[2]
h = bbox[3]
c_x = bbox[0] + w/2.
c_y = bbox[1] + h/2.
aspect_ratio = cfg.input_shape[1]/cfg.input_shape[0]
if w > aspect_ratio * h:
h = w / aspect_ratio
elif w < aspect_ratio * h:
w = h * aspect_ratio
bbox[2] = w*1.25
bbox[3] = h*1.25
bbox[0] = c_x - bbox[2]/2.
bbox[1] = c_y - bbox[3]/2.
return bbox
def transform_joint_to_other_db(src_joint, src_name, dst_name):
src_joint_num = len(src_name)
dst_joint_num = len(dst_name)
new_joint = np.zeros(((dst_joint_num,) + src_joint.shape[1:]))
for src_idx in range(len(src_name)):
name = src_name[src_idx]
if name in dst_name:
dst_idx = dst_name.index(name)
new_joint[dst_idx] = src_joint[src_idx]
return new_joint
def fliplr_joints(_joints, width, matched_parts):
"""
flip coords
joints: numpy array, nJoints * dim, dim == 2 [x, y] or dim == 3 [x, y, z]
width: image width
matched_parts: list of pairs
"""
joints = _joints.copy()
# Flip horizontal
joints[:, 0] = width - joints[:, 0] - 1
# Change left-right parts
for pair in matched_parts:
joints[pair[0], :], joints[pair[1], :] = joints[pair[1], :], joints[pair[0], :].copy()
return joints
def multi_meshgrid(*args):
"""
Creates a meshgrid from possibly many
elements (instead of only 2).
Returns a nd tensor with as many dimensions
as there are arguments
"""
args = list(args)
template = [1 for _ in args]
for i in range(len(args)):
n = args[i].shape[0]
template_copy = template.copy()
template_copy[i] = n
args[i] = args[i].view(*template_copy)
# there will be some broadcast magic going on
return tuple(args)
def flip(tensor, dims):
if not isinstance(dims, (tuple, list)):
dims = [dims]
indices = [torch.arange(tensor.shape[dim] - 1, -1, -1,
dtype=torch.int64) for dim in dims]
multi_indices = multi_meshgrid(*indices)
final_indices = [slice(i) for i in tensor.shape]
for i, dim in enumerate(dims):
final_indices[dim] = multi_indices[i]
flipped = tensor[final_indices]
assert flipped.device == tensor.device
assert flipped.requires_grad == tensor.requires_grad
return flipped
================================================
FILE: common/utils/vis.py
================================================
import os
import cv2
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import matplotlib as mpl
from config import cfg
def vis_keypoints(img, kps, kps_lines, kp_thresh=0.4, alpha=1):
# Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv.
cmap = plt.get_cmap('rainbow')
colors = [cmap(i) for i in np.linspace(0, 1, len(kps_lines) + 2)]
colors = [(c[2] * 255, c[1] * 255, c[0] * 255) for c in colors]
# Perform the drawing on a copy of the image, to allow for blending.
kp_mask = np.copy(img)
# Draw the keypoints.
for l in range(len(kps_lines)):
i1 = kps_lines[l][0]
i2 = kps_lines[l][1]
p1 = kps[0, i1].astype(np.int32), kps[1, i1].astype(np.int32)
p2 = kps[0, i2].astype(np.int32), kps[1, i2].astype(np.int32)
if kps[2, i1] > kp_thresh and kps[2, i2] > kp_thresh:
cv2.line(
kp_mask, p1, p2,
color=colors[l], thickness=2, lineType=cv2.LINE_AA)
if kps[2, i1] > kp_thresh:
cv2.circle(
kp_mask, p1,
radius=3, color=colors[l], thickness=-1, lineType=cv2.LINE_AA)
if kps[2, i2] > kp_thresh:
cv2.circle(
kp_mask, p2,
radius=3, color=colors[l], thickness=-1, lineType=cv2.LINE_AA)
# Blend the keypoints.
return cv2.addWeighted(img, 1.0 - alpha, kp_mask, alpha, 0)
def vis_3d_skeleton(kpt_3d, kpt_3d_vis, kps_lines, filename=None):
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv.
cmap = plt.get_cmap('rainbow')
colors = [cmap(i) for i in np.linspace(0, 1, len(kps_lines) + 2)]
colors = [np.array((c[2], c[1], c[0])) for c in colors]
for l in range(len(kps_lines)):
i1 = kps_lines[l][0]
i2 = kps_lines[l][1]
x = np.array([kpt_3d[i1,0], kpt_3d[i2,0]])
y = np.array([kpt_3d[i1,1], kpt_3d[i2,1]])
z = np.array([kpt_3d[i1,2], kpt_3d[i2,2]])
if kpt_3d_vis[i1,0] > 0 and kpt_3d_vis[i2,0] > 0:
ax.plot(x, z, -y, c=colors[l], linewidth=2)
if kpt_3d_vis[i1,0] > 0:
ax.scatter(kpt_3d[i1,0], kpt_3d[i1,2], -kpt_3d[i1,1], c=colors[l], marker='o')
if kpt_3d_vis[i2,0] > 0:
ax.scatter(kpt_3d[i2,0], kpt_3d[i2,2], -kpt_3d[i2,1], c=colors[l], marker='o')
if filename is None:
ax.set_title('3D vis')
else:
ax.set_title(filename)
ax.set_xlabel('X Label')
ax.set_ylabel('Z Label')
ax.set_zlabel('Y Label')
ax.legend()
plt.show()
cv2.waitKey(0)
def vis_3d_multiple_skeleton(kpt_3d, kpt_3d_vis, kps_lines, filename=None):
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
# Convert from plt 0-1 RGBA colors to 0-255 BGR colors for opencv.
cmap = plt.get_cmap('rainbow')
colors = [cmap(i) for i in np.linspace(0, 1, len(kps_lines) + 2)]
colors = [np.array((c[2], c[1], c[0])) for c in colors]
for l in range(len(kps_lines)):
i1 = kps_lines[l][0]
i2 = kps_lines[l][1]
person_num = kpt_3d.shape[0]
for n in range(person_num):
x = np.array([kpt_3d[n,i1,0], kpt_3d[n,i2,0]])
y = np.array([kpt_3d[n,i1,1], kpt_3d[n,i2,1]])
z = np.array([kpt_3d[n,i1,2], kpt_3d[n,i2,2]])
if kpt_3d_vis[n,i1,0] > 0 and kpt_3d_vis[n,i2,0] > 0:
ax.plot(x, z, -y, c=colors[l], linewidth=2)
if kpt_3d_vis[n,i1,0] > 0:
ax.scatter(kpt_3d[n,i1,0], kpt_3d[n,i1,2], -kpt_3d[n,i1,1], c=colors[l], marker='o')
if kpt_3d_vis[n,i2,0] > 0:
ax.scatter(kpt_3d[n,i2,0], kpt_3d[n,i2,2], -kpt_3d[n,i2,1], c=colors[l], marker='o')
if filename is None:
ax.set_title('3D vis')
else:
ax.set_title(filename)
ax.set_xlabel('X Label')
ax.set_ylabel('Z Label')
ax.set_zlabel('Y Label')
ax.legend()
plt.show()
cv2.waitKey(0)
================================================
FILE: data/Dummy/Dummy.py
================================================
import os
import os.path as osp
from pycocotools.coco import COCO
import numpy as np
from config import cfg
from utils.pose_utils import world2cam, cam2pixel, pixel2cam, rigid_align, process_bbox
import cv2
import random
import json
from utils.vis import vis_keypoints, vis_3d_skeleton
class Dummy:
def __init__(self, data_split):
self.data_split = data_split
self.img_dir = osp.join('data', 'Dummy', 'images')
self.annot_path = osp.join('data', 'Dummy', 'annotations')
self.human_bbox_root_dir = osp.join('data', 'Dummy', 'bbox_root', 'bbox_root_human36m_output.json')
self.joint_num = 18 # original:17, but manually added 'Thorax'
self.joints_name = ('Pelvis', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Torso', 'Neck', 'Nose', 'Head', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'Thorax')
self.flip_pairs = ( (1, 4), (2, 5), (3, 6), (14, 11), (15, 12), (16, 13) )
self.skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )
self.joints_have_depth = True
self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) # exclude Thorax
self.action_name = ['Directions', 'Discussion', 'Eating', 'Greeting', 'Phoning', 'Posing', 'Purchases', 'Sitting', 'SittingDown', 'Smoking', 'Photo', 'Waiting', 'Walking', 'WalkDog', 'WalkTogether']
self.root_idx = self.joints_name.index('Pelvis')
self.lshoulder_idx = self.joints_name.index('L_Shoulder')
self.rshoulder_idx = self.joints_name.index('R_Shoulder')
self.data = self.load_data()
def get_subsampling_ratio(self):
if self.data_split == 'train':
return 5
elif self.data_split == 'test':
return 64
else:
assert 0, print('Unknown subset')
def get_subject(self):
if self.data_split == 'train':
subject = [1]
elif self.data_split == 'test':
subject = [2]
else:
assert 0, print("Unknown subset")
return subject
def add_thorax(self, joint_coord):
thorax = (joint_coord[self.lshoulder_idx, :] + joint_coord[self.rshoulder_idx, :]) * 0.5
thorax = thorax.reshape((1, 3))
joint_coord = np.concatenate((joint_coord, thorax), axis=0)
return joint_coord
def load_data(self):
print('Load data of Dummy')
subject_list = self.get_subject()
sampling_ratio = self.get_subsampling_ratio()
# aggregate annotations from each subject
db = COCO()
cameras = {}
joints = {}
for subject in subject_list:
# data load
with open(osp.join(self.annot_path, 'Dummy_subject' + str(subject) + '_data.json'),'r') as f:
annot = json.load(f)
if len(db.dataset) == 0:
for k,v in annot.items():
db.dataset[k] = v
else:
for k,v in annot.items():
db.dataset[k] += v
# camera load
with open(osp.join(self.annot_path, 'Dummy_subject' + str(subject) + '_camera.json'),'r') as f:
cameras[str(subject)] = json.load(f)
# joint coordinate load
with open(osp.join(self.annot_path, 'Dummy_subject' + str(subject) + '_joint_3d.json'),'r') as f:
joints[str(subject)] = json.load(f)
db.createIndex()
if self.data_split == 'test' and not cfg.use_gt_info:
print("Get bounding box and root from " + self.human_bbox_root_dir)
bbox_root_result = {}
with open(self.human_bbox_root_dir) as f:
annot = json.load(f)
for i in range(len(annot)):
bbox_root_result[str(annot[i]['image_id'])] = {'bbox': np.array(annot[i]['bbox']), 'root': np.array(annot[i]['root_cam'])}
else:
print("Get bounding box and root from groundtruth")
data = []
for aid in db.anns.keys():
ann = db.anns[aid]
image_id = ann['image_id']
img = db.loadImgs(image_id)[0]
img_path = osp.join(self.img_dir, img['file_name'])
img_width, img_height = img['width'], img['height']
# check subject and frame_idx
subject = img['subject']; frame_idx = img['frame_idx'];
if subject not in subject_list:
continue
if frame_idx % sampling_ratio != 0:
continue
# camera parameter
cam_idx = img['cam_idx']
cam_param = cameras[str(subject)][str(cam_idx)]
R,t,f,c = np.array(cam_param['R'], dtype=np.float32), np.array(cam_param['t'], dtype=np.float32), np.array(cam_param['f'], dtype=np.float32), np.array(cam_param['c'], dtype=np.float32)
# project world coordinate to cam, image coordinate space
action_idx = img['action_idx']; subaction_idx = img['subaction_idx']; frame_idx = img['frame_idx'];
joint_world = np.array(joints[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)], dtype=np.float32)
joint_world = self.add_thorax(joint_world)
joint_cam = world2cam(joint_world, R, t)
joint_img = cam2pixel(joint_cam, f, c)
joint_img[:,2] = joint_img[:,2] - joint_cam[self.root_idx,2]
joint_vis = np.ones((self.joint_num,1))
if self.data_split == 'test' and not cfg.use_gt_info:
bbox = bbox_root_result[str(image_id)]['bbox'] # bbox should be aspect ratio preserved-extended. It is done in RootNet.
root_cam = bbox_root_result[str(image_id)]['root']
else:
bbox = process_bbox(np.array(ann['bbox']), img_width, img_height)
if bbox is None: continue
root_cam = joint_cam[self.root_idx]
data.append({
'img_path': img_path,
'img_id': image_id,
'bbox': bbox,
'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth]
'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate
'joint_vis': joint_vis,
'root_cam': root_cam, # [X, Y, Z] in camera coordinate
'f': f,
'c': c})
return data
def evaluate(self, preds, result_dir):
print('Evaluation start...')
gts = self.data
assert len(gts) == len(preds)
sample_num = len(gts)
pred_save = []
error = np.zeros((sample_num, self.joint_num-1)) # joint error
error_action = [ [] for _ in range(len(self.action_name)) ] # error for each sequence
for n in range(sample_num):
gt = gts[n]
image_id = gt['img_id']
f = gt['f']
c = gt['c']
bbox = gt['bbox']
gt_3d_root = gt['root_cam']
gt_3d_kpt = gt['joint_cam']
gt_vis = gt['joint_vis']
# restore coordinates to original space
pred_2d_kpt = preds[n].copy()
pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]
vis = False
if vis:
cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
filename = str(random.randrange(1,500))
tmpimg = cvimg.copy().astype(np.uint8)
tmpkps = np.zeros((3,self.joint_num))
tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1]
tmpkps[2,:] = 1
tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton)
cv2.imwrite(filename + '_output.jpg', tmpimg)
# back project to camera coordinate system
pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c)
# root joint alignment
pred_3d_kpt = pred_3d_kpt - pred_3d_kpt[self.root_idx]
gt_3d_kpt = gt_3d_kpt - gt_3d_kpt[self.root_idx]
pred_3d_kpt = rigid_align(pred_3d_kpt, gt_3d_kpt)
# exclude thorax
pred_3d_kpt = np.take(pred_3d_kpt, self.eval_joint, axis=0)
gt_3d_kpt = np.take(gt_3d_kpt, self.eval_joint, axis=0)
# error calculate
error[n] = np.sqrt(np.sum((pred_3d_kpt - gt_3d_kpt)**2,1))
img_name = gt['img_path']
action_idx = int(img_name[img_name.find('act')+4:img_name.find('act')+6]) - 2
error_action[action_idx].append(error[n].copy())
# prediction save
pred_save.append({'image_id': image_id, 'joint_cam': pred_3d_kpt.tolist(), 'bbox': bbox.tolist(), 'root_cam': gt_3d_root.tolist()}) # joint_cam is root-relative coordinate
# total error
tot_err = np.mean(error)
metric = 'PA MPJPE'
eval_summary = 'Protocol 1' + ' error (' + metric + ') >> tot: %.2f\n' % (tot_err)
# error for each action
for i in range(len(error_action)):
err = np.mean(np.array(error_action[i]))
eval_summary += (self.action_name[i] + ': %.2f ' % err)
print(eval_summary)
# prediction save
output_path = osp.join(result_dir, 'bbox_root_pose_dummy_output.json')
with open(output_path, 'w') as f:
json.dump(pred_save, f)
print("Test result is saved at " + output_path)
return eval_summary
================================================
FILE: data/Dummy/annotations/Dummy_subject1_camera.json
================================================
{"1": {"R": [[-0.9059013006181885, 0.4217144115102914, 0.038727105014486805], [0.044493184429779696, 0.1857199061874203, -0.9815948619389944], [-0.4211450938543295, -0.8875049698848251, -0.1870073216538954]], "t": [-234.7208032216618, 464.34018262882194, 5536.652631113797], "f": [1145.04940458804, 1143.78109572365], "c": [512.541504956548, 515.4514869776]}, "2": {"R": [[0.9216646531492915, 0.3879848687925067, -0.0014172943441045224], [0.07721054863099915, -0.18699239961454955, -0.979322405373477], [-0.3802272982247548, 0.9024974149959955, -0.20230080971229314]], "t": [-11.934348472090557, 449.4165893644565, 5541.113551868937], "f": [1149.67569986785, 1147.59161666764], "c": [508.848621645943, 508.064917088557]}, "3": {"R": [[-0.9063540572469627, -0.42053101768163204, -0.04093880896680188], [-0.0603212197838846, 0.22468715090881142, -0.9725620980997899], [0.4181909532208387, -0.8790161246439863, -0.2290130547809762]], "t": [781.127357651581, 235.3131620173424, 5576.37044019807], "f": [1149.14071676148, 1148.7989685676], "c": [519.815837182153, 501.402658888552]}, "4": {"R": [[0.91754082476548, -0.39226322025776267, 0.06517975852741943], [-0.04531905395586976, -0.26600517028098103, -0.9629057236990188], [0.395050652748768, 0.8805514269006645, -0.2618476013752581]], "t": [-155.13650339749012, 422.16256306729633, 4435.416222660868], "f": [1145.51133842318, 1144.77392807652], "c": [514.968197319863, 501.882018537695]}}
================================================
FILE: data/Dummy/annotations/Dummy_subject1_data.json
================================================
{"images": [{"id": 1877420, "file_name": "s_11_act_02_subact_01_ca_01/s_11_act_02_subact_01_ca_01_000001.jpg", "width": 1000, "height": 1002, "subject": 1, "action_name": "Directions", "action_idx": 2, "subaction_idx": 1, "cam_idx": 1, "frame_idx": 0}], "annotations": [{"id": 1877420, "image_id": 1877420, "keypoints_vis": [true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true, true], "bbox": [304.0201284041609, 222.305917169553, 328.1488619190915, 412.150330355609]}]}
================================================
FILE: data/Dummy/annotations/Dummy_subject1_joint_3d.json
================================================
{"2": {"1": {"0": [[-47.24769973754883, -81.04920196533203, 987.9080200195312], [-184.4625244140625, -69.55330657958984, 999.5223999023438], [-199.22152709960938, -72.29781341552734, 537.8258666992188], [-177.2645721435547, 44.52031326293945, 93.21685028076172], [89.96746063232422, -92.54512023925781, 976.2935791015625], [97.17977142333984, -81.16199493408203, 514.5499877929688], [82.85128784179688, 34.8104248046875, 69.40837097167969], [-52.695899963378906, -77.56897735595703, 1242.206298828125], [-49.09817886352539, -73.6445083618164, 1492.0970458984375], [-71.0900650024414, -139.2397003173828, 1579.0076904296875], [-71.68211364746094, -92.79254150390625, 1684.2078857421875], [116.02037811279297, -63.403587341308594, 1509.3262939453125], [396.226318359375, -72.48757934570312, 1469.46826171875], [633.7438354492188, -144.6726837158203, 1475.2344970703125], [-211.36859130859375, -37.4464111328125, 1487.2081298828125], [-487.9529724121094, -1.2391146421432495, 1438.4637451171875], [-727.43798828125, -60.458595275878906, 1466.75244140625]]}}}
================================================
FILE: data/Dummy/bbox_root/bbox_dummy_output.json
================================================
[{"image_id": 1877420, "category_id": 1, "bbox": [309.1705017089844, 252.84469604492188, 326.1686096191406, 368.1951599121094], "score": 0.9997870326042175}]
================================================
FILE: data/Human36M/Human36M.py
================================================
import os
import os.path as osp
from pycocotools.coco import COCO
import numpy as np
from config import cfg
from utils.pose_utils import world2cam, cam2pixel, pixel2cam, rigid_align, process_bbox
import cv2
import random
import json
from utils.vis import vis_keypoints, vis_3d_skeleton
class Human36M:
def __init__(self, data_split):
self.data_split = data_split
self.img_dir = osp.join('/', 'data', 'Human36M', 'images')
self.annot_path = osp.join('/', 'data', 'Human36M', 'annotations')
self.human_bbox_root_dir = osp.join('/', 'data', 'Human36M', 'bbox_root', 'bbox_root_human36m_output.json')
self.joint_num = 18 # original:17, but manually added 'Thorax'
self.joints_name = ('Pelvis', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Torso', 'Neck', 'Nose', 'Head', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'Thorax')
self.flip_pairs = ( (1, 4), (2, 5), (3, 6), (14, 11), (15, 12), (16, 13) )
self.skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )
self.joints_have_depth = True
self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16) # exclude Thorax
self.action_name = ['Directions', 'Discussion', 'Eating', 'Greeting', 'Phoning', 'Posing', 'Purchases', 'Sitting', 'SittingDown', 'Smoking', 'Photo', 'Waiting', 'Walking', 'WalkDog', 'WalkTogether']
self.root_idx = self.joints_name.index('Pelvis')
self.lshoulder_idx = self.joints_name.index('L_Shoulder')
self.rshoulder_idx = self.joints_name.index('R_Shoulder')
self.protocol = 2
self.data = self.load_data()
def get_subsampling_ratio(self):
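# temporal subsampling of the video frames: keep every 5th frame for training and every 64th for testing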
if self.data_split == 'train':
return 5
elif self.data_split == 'test':
return 64
else:
assert 0, print('Unknown subset')
def get_subject(self):
if self.data_split == 'train':
if self.protocol == 1:
subject = [1,5,6,7,8,9]
elif self.protocol == 2:
subject = [1,5,6,7,8]
elif self.data_split == 'test':
if self.protocol == 1:
subject = [11]
elif self.protocol == 2:
subject = [9,11]
else:
assert 0, print("Unknown subset")
return subject
def add_thorax(self, joint_coord):
thorax = (joint_coord[self.lshoulder_idx, :] + joint_coord[self.rshoulder_idx, :]) * 0.5
thorax = thorax.reshape((1, 3))
joint_coord = np.concatenate((joint_coord, thorax), axis=0)
return joint_coord
def load_data(self):
print('Load data of H36M Protocol ' + str(self.protocol))
subject_list = self.get_subject()
sampling_ratio = self.get_subsampling_ratio()
# aggregate annotations from each subject
db = COCO()
cameras = {}
joints = {}
for subject in subject_list:
# data load
with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_data.json'),'r') as f:
annot = json.load(f)
if len(db.dataset) == 0:
for k,v in annot.items():
db.dataset[k] = v
else:
for k,v in annot.items():
db.dataset[k] += v
# camera load
with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_camera.json'),'r') as f:
cameras[str(subject)] = json.load(f)
# joint coordinate load
with open(osp.join(self.annot_path, 'Human36M_subject' + str(subject) + '_joint_3d.json'),'r') as f:
joints[str(subject)] = json.load(f)
db.createIndex()
if self.data_split == 'test' and not cfg.use_gt_info:
print("Get bounding box and root from " + self.human_bbox_root_dir)
bbox_root_result = {}
with open(self.human_bbox_root_dir) as f:
annot = json.load(f)
for i in range(len(annot)):
bbox_root_result[str(annot[i]['image_id'])] = {'bbox': np.array(annot[i]['bbox']), 'root': np.array(annot[i]['root_cam'])}
else:
print("Get bounding box and root from groundtruth")
data = []
for aid in db.anns.keys():
ann = db.anns[aid]
image_id = ann['image_id']
img = db.loadImgs(image_id)[0]
img_path = osp.join(self.img_dir, img['file_name'])
img_width, img_height = img['width'], img['height']
# check subject and frame_idx
subject = img['subject']; frame_idx = img['frame_idx'];
if subject not in subject_list:
continue
if frame_idx % sampling_ratio != 0:
continue
# camera parameter
cam_idx = img['cam_idx']
cam_param = cameras[str(subject)][str(cam_idx)]
R,t,f,c = np.array(cam_param['R'], dtype=np.float32), np.array(cam_param['t'], dtype=np.float32), np.array(cam_param['f'], dtype=np.float32), np.array(cam_param['c'], dtype=np.float32)
# project world coordinate to cam, image coordinate space
action_idx = img['action_idx']; subaction_idx = img['subaction_idx']; frame_idx = img['frame_idx'];
joint_world = np.array(joints[str(subject)][str(action_idx)][str(subaction_idx)][str(frame_idx)], dtype=np.float32)
joint_world = self.add_thorax(joint_world)
joint_cam = world2cam(joint_world, R, t)
joint_img = cam2pixel(joint_cam, f, c)
joint_img[:,2] = joint_img[:,2] - joint_cam[self.root_idx,2]
joint_vis = np.ones((self.joint_num,1))
if self.data_split == 'test' and not cfg.use_gt_info:
bbox = bbox_root_result[str(image_id)]['bbox'] # bbox should be aspect ratio preserved-extended. It is done in RootNet.
root_cam = bbox_root_result[str(image_id)]['root']
else:
bbox = process_bbox(np.array(ann['bbox']), img_width, img_height)
if bbox is None: continue
root_cam = joint_cam[self.root_idx]
data.append({
'img_path': img_path,
'img_id': image_id,
'bbox': bbox,
'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth]
'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate
'joint_vis': joint_vis,
'root_cam': root_cam, # [X, Y, Z] in camera coordinate
'f': f,
'c': c})
return data
def evaluate(self, preds, result_dir):
print('Evaluation start...')
gts = self.data
assert len(gts) == len(preds)
sample_num = len(gts)
pred_save = []
error = np.zeros((sample_num, self.joint_num-1)) # joint error
error_action = [ [] for _ in range(len(self.action_name)) ] # error for each sequence
for n in range(sample_num):
gt = gts[n]
image_id = gt['img_id']
f = gt['f']
c = gt['c']
bbox = gt['bbox']
gt_3d_root = gt['root_cam']
gt_3d_kpt = gt['joint_cam']
gt_vis = gt['joint_vis']
# restore coordinates to original space
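# x,y: heatmap cell -> original image pixel via the bbox (coord / heatmap_size * bbox_size + bbox_min)
# z: discretized depth bin -> metric depth within +-bbox_3d_shape[0]/2 mm of the ground-truth root depth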
pred_2d_kpt = preds[n].copy()
pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]
vis = False
if vis:
cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
filename = str(random.randrange(1,500))
tmpimg = cvimg.copy().astype(np.uint8)
tmpkps = np.zeros((3,self.joint_num))
tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1]
tmpkps[2,:] = 1
tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton)
cv2.imwrite(filename + '_output.jpg', tmpimg)
# back project to camera coordinate system
pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c)
# root joint alignment
pred_3d_kpt = pred_3d_kpt - pred_3d_kpt[self.root_idx]
gt_3d_kpt = gt_3d_kpt - gt_3d_kpt[self.root_idx]
if self.protocol == 1:
# rigid alignment for PA MPJPE (protocol #1)
pred_3d_kpt = rigid_align(pred_3d_kpt, gt_3d_kpt)
# exclude thorax
pred_3d_kpt = np.take(pred_3d_kpt, self.eval_joint, axis=0)
gt_3d_kpt = np.take(gt_3d_kpt, self.eval_joint, axis=0)
# error calculate
error[n] = np.sqrt(np.sum((pred_3d_kpt - gt_3d_kpt)**2,1))
img_name = gt['img_path']
action_idx = int(img_name[img_name.find('act')+4:img_name.find('act')+6]) - 2
error_action[action_idx].append(error[n].copy())
# prediction save
pred_save.append({'image_id': image_id, 'joint_cam': pred_3d_kpt.tolist(), 'bbox': bbox.tolist(), 'root_cam': gt_3d_root.tolist()}) # joint_cam is root-relative coordinate
# total error
tot_err = np.mean(error)
metric = 'PA MPJPE' if self.protocol == 1 else 'MPJPE'
eval_summary = 'Protocol ' + str(self.protocol) + ' error (' + metric + ') >> tot: %.2f\n' % (tot_err)
# error for each action
for i in range(len(error_action)):
err = np.mean(np.array(error_action[i]))
eval_summary += (self.action_name[i] + ': %.2f ' % err)
print(eval_summary)
# prediction save
output_path = osp.join(result_dir, 'bbox_root_pose_human36m_output.json')
with open(output_path, 'w') as f:
json.dump(pred_save, f)
print("Test result is saved at " + output_path)
return eval_summary
================================================
FILE: data/MPII/MPII.py
================================================
import os
import os.path as osp
import numpy as np
from pycocotools.coco import COCO
from utils.pose_utils import process_bbox
from config import cfg
class MPII:
def __init__(self, data_split):
self.data_split = data_split
self.img_dir = osp.join('/', 'data', 'MPII')
self.train_annot_path = osp.join('/', 'data', 'MPII', 'annotations', 'train.json')
self.joint_num = 16
self.joints_name = ('R_Ankle', 'R_Knee', 'R_Hip', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Thorax', 'Neck', 'Head', 'R_Wrist', 'R_Elbow', 'R_Shoulder', 'L_Shoulder', 'L_Elbow', 'L_Wrist')
self.flip_pairs = ( (0, 5), (1, 4), (2, 3), (10, 15), (11, 14), (12, 13) )
self.skeleton = ( (0, 1), (1, 2), (2, 6), (7, 12), (12, 11), (11, 10), (5, 4), (4, 3), (3, 6), (7, 13), (13, 14), (14, 15), (6, 7), (7, 8), (8, 9) )
self.joints_have_depth = False
self.data = self.load_data()
def load_data(self):
if self.data_split == 'train':
db = COCO(self.train_annot_path)
else:
print('Unknown data subset')
assert 0
data = []
for aid in db.anns.keys():
ann = db.anns[aid]
img = db.loadImgs(ann['image_id'])[0]
width, height = img['width'], img['height']
if ann['num_keypoints'] == 0:
continue
bbox = process_bbox(ann['bbox'], width, height)
if bbox is None: continue
# joints and vis
joint_img = np.array(ann['keypoints']).reshape(self.joint_num,3)
joint_vis = joint_img[:,2].copy().reshape(-1,1)
joint_img[:,2] = 0
imgname = img['file_name']
img_path = osp.join(self.img_dir, imgname)
data.append({
'img_path': img_path,
'bbox': bbox,
'joint_img': joint_img, # [org_img_x, org_img_y, 0]
'joint_vis': joint_vis,
})
return data
================================================
FILE: data/MSCOCO/MSCOCO.py
================================================
import os
import os.path as osp
import numpy as np
from pycocotools.coco import COCO
from config import cfg
import scipy.io as sio
import json
import cv2
import random
import math
from utils.pose_utils import pixel2cam, process_bbox
from utils.vis import vis_keypoints, vis_3d_skeleton
class MSCOCO:
def __init__(self, data_split):
self.data_split = data_split
self.img_dir = osp.join('/','home', 'centos', 'datasets', 'coco', 'images')
self.train_annot_path = osp.join('/','home', 'centos', 'datasets', 'coco', 'annotations', 'person_keypoints_train2017.json')
self.test_annot_path = osp.join('/','home', 'centos', 'datasets', 'coco', 'annotations', 'person_keypoints_val2017.json')
self.human_3d_bbox_root_dir = osp.join('/', 'home', 'centos','datasets', 'coco', 'bbox_root', 'bbox_root_coco_output.json')
if self.data_split == 'train':
self.joint_num = 19 # original: 17, but manually added 'Thorax', 'Pelvis'
self.joints_name = ('Nose', 'L_Eye', 'R_Eye', 'L_Ear', 'R_Ear', 'L_Shoulder', 'R_Shoulder', 'L_Elbow', 'R_Elbow', 'L_Wrist', 'R_Wrist', 'L_Hip', 'R_Hip', 'L_Knee', 'R_Knee', 'L_Ankle', 'R_Ankle', 'Thorax', 'Pelvis')
self.flip_pairs = ( (1, 2), (3, 4), (5, 6), (7, 8), (9, 10), (11, 12), (13, 14), (15, 16) )
self.skeleton = ( (1, 2), (0, 1), (0, 2), (2, 4), (1, 3), (6, 8), (8, 10), (5, 7), (7, 9), (12, 14), (14, 16), (11, 13), (13, 15), (5, 6), (11, 12) )
self.joints_have_depth = False
self.lshoulder_idx = self.joints_name.index('L_Shoulder')
self.rshoulder_idx = self.joints_name.index('R_Shoulder')
self.lhip_idx = self.joints_name.index('L_Hip')
self.rhip_idx = self.joints_name.index('R_Hip')
else:
## testing settings (when test model trained on the MuCo-3DHP dataset)
self.joint_num = 21 # MuCo-3DHP
self.joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe') # MuCo-3DHP
self.original_joint_num = 17 # MuPoTS
self.original_joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head') # MuPoTS
self.flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13) )
self.skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (11, 12), (12, 13), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7) )
self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
self.joints_have_depth = False
self.data = self.load_data()
def load_data(self):
if self.data_split == 'train':
db = COCO(self.train_annot_path)
data = []
for aid in db.anns.keys():
ann = db.anns[aid]
img = db.loadImgs(ann['image_id'])[0]
width, height = img['width'], img['height']
if (ann['image_id'] not in db.imgs) or ann['iscrowd'] or (ann['num_keypoints'] == 0):
continue
bbox = process_bbox(ann['bbox'], width, height)
if bbox is None: continue
# joints and vis
joint_img = np.array(ann['keypoints']).reshape(-1,3)
# add Thorax
thorax = (joint_img[self.lshoulder_idx, :] + joint_img[self.rshoulder_idx, :]) * 0.5
thorax[2] = joint_img[self.lshoulder_idx,2] * joint_img[self.rshoulder_idx,2]
thorax = thorax.reshape((1, 3))
# add Pelvis
pelvis = (joint_img[self.lhip_idx, :] + joint_img[self.rhip_idx, :]) * 0.5
pelvis[2] = joint_img[self.lhip_idx,2] * joint_img[self.rhip_idx,2]
pelvis = pelvis.reshape((1, 3))
joint_img = np.concatenate((joint_img, thorax, pelvis), axis=0)
joint_vis = (joint_img[:,2].copy().reshape(-1,1) > 0)
joint_img[:,2] = 0
imgname = osp.join('train2017', db.imgs[ann['image_id']]['file_name'])
img_path = osp.join(self.img_dir, imgname)
data.append({
'img_path': img_path,
'bbox': bbox,
'joint_img': joint_img, # [org_img_x, org_img_y, 0]
'joint_vis': joint_vis,
'f': np.array([1500, 1500]),
'c': np.array([width/2, height/2])
})
elif self.data_split == 'test':
db = COCO(self.test_annot_path)
with open(self.human_3d_bbox_root_dir) as f:
annot = json.load(f)
data = []
for i in range(len(annot)):
image_id = annot[i]['image_id']
img = db.loadImgs(image_id)[0]
img_path = osp.join(self.img_dir, 'val2017', img['file_name'])
fx, fy, cx, cy = 1500, 1500, img['width']/2, img['height']/2
f = np.array([fx, fy]); c = np.array([cx, cy]);
root_cam = np.array(annot[i]['root_cam']).reshape(3)
bbox = np.array(annot[i]['bbox']).reshape(4)
data.append({
'img_path': img_path,
'bbox': bbox,
'joint_img': np.zeros((self.original_joint_num, 3)), # dummy
'joint_cam': np.zeros((self.original_joint_num, 3)), # dummy
'joint_vis': np.zeros((self.original_joint_num, 1)), # dummy
'root_cam': root_cam, # [X, Y, Z] in camera coordinate
'f': f,
'c': c,
})
else:
print('Unknown data subset')
assert 0
return data
def evaluate(self, preds, result_dir):
print('Evaluation start...')
gts = self.data
sample_num = len(preds)
joint_num = self.original_joint_num
pred_2d_save = {}
pred_3d_save = {}
for n in range(sample_num):
gt = gts[n]
f = gt['f']
c = gt['c']
bbox = gt['bbox']
gt_3d_root = gt['root_cam']
img_name = gt['img_path'].split('/')
img_name = 'coco_' + img_name[-1].split('.')[0] # e.g., coco_00000000
# restore coordinates to original space
pred_2d_kpt = preds[n].copy()
# only consider eval_joint
pred_2d_kpt = np.take(pred_2d_kpt, self.eval_joint, axis=0)
pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]
# 2d kpt save
if img_name in pred_2d_save:
pred_2d_save[img_name].append(pred_2d_kpt[:,:2])
else:
pred_2d_save[img_name] = [pred_2d_kpt[:,:2]]
vis = False
if vis:
cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
filename = str(random.randrange(1,500))
tmpimg = cvimg.copy().astype(np.uint8)
tmpkps = np.zeros((3,joint_num))
tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1]
tmpkps[2,:] = 1
tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton)
cv2.imwrite(filename + '_output.jpg', tmpimg)
# back project to camera coordinate system
pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c)
# 3d kpt save
if img_name in pred_3d_save:
pred_3d_save[img_name].append(pred_3d_kpt)
else:
pred_3d_save[img_name] = [pred_3d_kpt]
output_path = osp.join(result_dir,'preds_2d_kpt_coco.mat')
sio.savemat(output_path, pred_2d_save)
print("Testing result is saved at " + output_path)
output_path = osp.join(result_dir,'preds_3d_kpt_coco.mat')
sio.savemat(output_path, pred_3d_save)
print("Testing result is saved at " + output_path)
================================================
FILE: data/MuCo/MuCo.py
================================================
import os
import os.path as osp
import numpy as np
import math
from utils.pose_utils import process_bbox
from pycocotools.coco import COCO
from config import cfg
class MuCo:
def __init__(self, data_split):
self.data_split = data_split
self.img_dir = osp.join('/', 'home', 'centos', 'datasets', 'MuCo')
self.train_annot_path = osp.join('/', 'home', 'centos', 'datasets', 'MuCo', 'MuCo-3DHP.json')
self.joint_num = 21
self.joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe')
self.flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) )
self.skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) )
self.joints_have_depth = True
self.root_idx = self.joints_name.index('Pelvis')
self.data = self.load_data()
def load_data(self):
if self.data_split == 'train':
db = COCO(self.train_annot_path)
else:
print('Unknown data subset')
assert 0
data = []
for iid in db.imgs.keys():
img = db.imgs[iid]
img_id = img["id"]
img_width, img_height = img['width'], img['height']
imgname = img['file_name']
img_path = osp.join(self.img_dir, imgname)
f = img["f"]
c = img["c"]
# crop the closest person to the camera
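# besides the closest person, another person is kept only if its pelvis lies at least
# 500 (camera-space units, mm in MuCo) from every other person in both the XY plane and in 3D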
ann_ids = db.getAnnIds(img_id)
anns = db.loadAnns(ann_ids)
root_depths = [ann['keypoints_cam'][self.root_idx][2] for ann in anns]
closest_pid = root_depths.index(min(root_depths))
pid_list = [closest_pid]
for i in range(len(anns)):
if i == closest_pid:
continue
picked = True
for j in range(len(anns)):
if i == j:
continue
dist = (np.array(anns[i]['keypoints_cam'][self.root_idx]) - np.array(anns[j]['keypoints_cam'][self.root_idx])) ** 2
dist_2d = math.sqrt(np.sum(dist[:2]))
dist_3d = math.sqrt(np.sum(dist))
if dist_2d < 500 or dist_3d < 500:
picked = False
if picked:
pid_list.append(i)
for pid in pid_list:
joint_cam = np.array(anns[pid]['keypoints_cam'])
root_cam = joint_cam[self.root_idx]
joint_img = np.array(anns[pid]['keypoints_img'])
joint_img = np.concatenate([joint_img, joint_cam[:,2:]],1)
joint_img[:,2] = joint_img[:,2] - root_cam[2]
joint_vis = np.ones((self.joint_num,1))
bbox = process_bbox(anns[pid]['bbox'], img_width, img_height)
if bbox is None: continue
data.append({
'img_path': img_path,
'bbox': bbox,
'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth]
'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate
'joint_vis': joint_vis,
'root_cam': root_cam, # [X, Y, Z] in camera coordinate
'f': f,
'c': c
})
return data
================================================
FILE: data/MuPoTS/MuPoTS.py
================================================
import os
import os.path as osp
import scipy.io as sio
import numpy as np
from pycocotools.coco import COCO
from config import cfg
import json
import cv2
import random
import math
from utils.pose_utils import pixel2cam, process_bbox
from utils.vis import vis_keypoints, vis_3d_skeleton
class MuPoTS:
def __init__(self, data_split):
self.data_split = data_split
self.img_dir = osp.join('/', 'data', 'MuPoTS', 'data', 'MultiPersonTestSet')
self.test_annot_path = osp.join('/', 'data', 'MuPoTS', 'data', 'MuPoTS-3D.json')
self.human_bbox_root_dir = osp.join('/', 'data', 'MuPoTS', 'bbox_root', 'bbox_root_mupots_output.json')
self.joint_num = 21 # MuCo-3DHP
self.joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe') # MuCo-3DHP
self.original_joint_num = 17 # MuPoTS
self.original_joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head') # MuPoTS
self.flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13) )
self.skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (11, 12), (12, 13), (1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7) )
self.eval_joint = (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
self.joints_have_depth = True
self.root_idx = self.joints_name.index('Pelvis')
self.data = self.load_data()
def load_data(self):
if self.data_split != 'test':
print('Unknown data subset')
assert 0
data = []
db = COCO(self.test_annot_path)
# use gt bbox and root
if cfg.use_gt_info:
print("Get bounding box and root from groundtruth")
for aid in db.anns.keys():
ann = db.anns[aid]
if ann['is_valid'] == 0:
continue
image_id = ann['image_id']
img = db.loadImgs(image_id)[0]
img_path = osp.join(self.img_dir, img['file_name'])
fx, fy, cx, cy = img['intrinsic']
f = np.array([fx, fy]); c = np.array([cx, cy]);
joint_cam = np.array(ann['keypoints_cam'])
root_cam = joint_cam[self.root_idx]
joint_img = np.array(ann['keypoints_img'])
joint_img = np.concatenate([joint_img, joint_cam[:,2:]],1)
joint_img[:,2] = joint_img[:,2] - root_cam[2]
joint_vis = np.ones((self.original_joint_num,1))
bbox = np.array(ann['bbox'])
img_width, img_height = img['width'], img['height']
bbox = process_bbox(bbox, img_width, img_height)
if bbox is None: continue
data.append({
'img_path': img_path,
'bbox': bbox,
'joint_img': joint_img, # [org_img_x, org_img_y, depth - root_depth]
'joint_cam': joint_cam, # [X, Y, Z] in camera coordinate
'joint_vis': joint_vis,
'root_cam': root_cam, # [X, Y, Z] in camera coordinate
'f': f,
'c': c,
})
else:
print("Get bounding box and root from " + self.human_bbox_root_dir)
with open(self.human_bbox_root_dir) as f:
annot = json.load(f)
for i in range(len(annot)):
image_id = annot[i]['image_id']
img = db.loadImgs(image_id)[0]
img_width, img_height = img['width'], img['height']
img_path = osp.join(self.img_dir, img['file_name'])
fx, fy, cx, cy = img['intrinsic']
f = np.array([fx, fy]); c = np.array([cx, cy]);
root_cam = np.array(annot[i]['root_cam']).reshape(3)
bbox = np.array(annot[i]['bbox']).reshape(4)
data.append({
'img_path': img_path,
'bbox': bbox,
'joint_img': np.zeros((self.original_joint_num, 3)), # dummy
'joint_cam': np.zeros((self.original_joint_num, 3)), # dummy
'joint_vis': np.zeros((self.original_joint_num, 1)), # dummy
'root_cam': root_cam, # [X, Y, Z] in camera coordinate
'f': f,
'c': c,
})
return data
def evaluate(self, preds, result_dir):
print('Evaluation start...')
gts = self.data
sample_num = len(preds)
joint_num = self.original_joint_num
pred_2d_save = {}
pred_3d_save = {}
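# predictions are grouped per image name (e.g. TS1_img_0001) and saved as .mat files
# consumed by the MATLAB evaluation script mpii_mupots_multiperson_eval.m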
for n in range(sample_num):
gt = gts[n]
f = gt['f']
c = gt['c']
bbox = gt['bbox']
gt_3d_root = gt['root_cam']
img_name = gt['img_path'].split('/')
img_name = img_name[-2] + '_' + img_name[-1].split('.')[0] # e.g., TS1_img_0001
# restore coordinates to original space
pred_2d_kpt = preds[n].copy()
# only consider eval_joint
pred_2d_kpt = np.take(pred_2d_kpt, self.eval_joint, axis=0)
pred_2d_kpt[:,0] = pred_2d_kpt[:,0] / cfg.output_shape[1] * bbox[2] + bbox[0]
pred_2d_kpt[:,1] = pred_2d_kpt[:,1] / cfg.output_shape[0] * bbox[3] + bbox[1]
pred_2d_kpt[:,2] = (pred_2d_kpt[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + gt_3d_root[2]
# 2d kpt save
if img_name in pred_2d_save:
pred_2d_save[img_name].append(pred_2d_kpt[:,:2])
else:
pred_2d_save[img_name] = [pred_2d_kpt[:,:2]]
vis = False
if vis:
cvimg = cv2.imread(gt['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
filename = str(random.randrange(1,500))
tmpimg = cvimg.copy().astype(np.uint8)
tmpkps = np.zeros((3,joint_num))
tmpkps[0,:], tmpkps[1,:] = pred_2d_kpt[:,0], pred_2d_kpt[:,1]
tmpkps[2,:] = 1
tmpimg = vis_keypoints(tmpimg, tmpkps, self.skeleton)
cv2.imwrite(filename + '_output.jpg', tmpimg)
# back project to camera coordinate system
pred_3d_kpt = pixel2cam(pred_2d_kpt, f, c)
# 3d kpt save
if img_name in pred_3d_save:
pred_3d_save[img_name].append(pred_3d_kpt)
else:
pred_3d_save[img_name] = [pred_3d_kpt]
output_path = osp.join(result_dir,'preds_2d_kpt_mupots.mat')
sio.savemat(output_path, pred_2d_save)
print("Testing result is saved at " + output_path)
output_path = osp.join(result_dir,'preds_3d_kpt_mupots.mat')
sio.savemat(output_path, pred_3d_save)
print("Testing result is saved at " + output_path)
================================================
FILE: data/MuPoTS/mpii_mupots_multiperson_eval.m
================================================
function mpii_mupots_multiperson_eval(eval_mode, is_relative)
% eval_mode: EVALUATION_MODE
% is_relative: 1: root-relative 3D multi-person pose estimation, 0: absolute 3D multi-person pose estimation
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Outline of the test eval procedure on MuPoTS-3D.
% Plug in your predictions at the appropriate point
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
mpii_mupots_config;
addpath('./util');
[~,o1,o2,relevant_labels] = mpii_get_joints('relevant');
num_joints = length(o1);
%Path to the test images and annotations
test_annot_base = mpii_mupots_path; %See mpii_mupots_config
%Path where results are written out
results_output_path = './';
%If predicted joints have a different ordering, specify mapping to MPI joints here
%map_to_mpii_jointset = % [11 14 10 13 9 12 5 8 4 7 3 6 1];
%Order to process bones in to resize them to the GT
safe_traversal_order = [15, 16, 2, 1, 17, 3, 4, 5, 6, 7, 8, 9:14];
EVALUATION_MODE = eval_mode; % 0 = evaluate all annotated persons, 1 = evaluate only predictions matched to annotations
person_colors = {'red', 'yellow', 'green', 'blue', 'magenta', 'cyan', 'black', 'white'} ;
sequencewise_per_joint_error = {};
sequencewise_undetected_people = [];
sequencewise_visibility_mask = {};
sequencewise_occlusion_mask = {};
sequencewise_annotated_people = [];
sequencewise_frames = [];
%% load predictions
preds_2d_kpt = load('preds_2d_kpt_mupots.mat');
preds_3d_kpt = load('preds_3d_kpt_mupots.mat');
for ts = 1:20
person_ids = [];
open_person_ids = 1:20;
load( sprintf('%s/TS%d/annot.mat',test_annot_base, ts));
load( sprintf('%s/TS%d/occlusion.mat',test_annot_base, ts));
num_frames = size(annotations,1);
undetected_people = 0;
annotated_people = 0;
pje_idx = 1;
per_joint_error = []; %zeros(17,1,num_test_points);
per_joint_occlusion_mask = [];
per_joint_visibility_mask = [];
sequencewise_frames(ts) = num_frames;
for i = 1:num_frames
%Count valid annotations
valid_annotations = 0;
for k = 1:size(annotations,2)
if(annotations{i,k}.isValidFrame)
valid_annotations = valid_annotations + 1;
end
end
annotated_people = annotated_people + valid_annotations;
if(valid_annotations == 0)
continue;
end
gt_pose_2d = cell(valid_annotations,1);
gt_pose_3d = cell(valid_annotations,1);
gt_visibility = cell(valid_annotations,1);
gt_pose_occlusion_labels = cell(valid_annotations,1);
gt_pose_visibility_labels = cell(valid_annotations,1);
%The joint set to use for matching predictions to GT
matching_joints = [2:14];
%matching_joints = [2 3 6 9 12];
idx = 1;
for k = 1:size(annotations,2)
if(annotations{i,k}.isValidFrame)
gt_pose_2d{idx} = annotations{i,k}.annot2(:,matching_joints);
gt_pose_3d{idx} = annotations{i,k}.univ_annot3 ;
gt_visibility{idx} = ones(1,length(matching_joints));
gt_pose_occlusion_labels{idx} = occlusion_labels{i,k} ;
gt_pose_visibility_labels{idx} = 1 - occlusion_labels{i,k} ;
idx = idx + 1;
end
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%% Predictions here
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%img = imread(sprintf('%s/TS%d/img_%06d.jpg',test_annot_base, ts, i-1));
% prediction of this image
pred_2d_kpt = getfield(preds_2d_kpt,sprintf('TS%d_img_%06d',ts, i-1));
pred_3d_kpt = getfield(preds_3d_kpt,sprintf('TS%d_img_%06d',ts, i-1));
%Number of subjects predicted
num_pred = size(pred_2d_kpt,1);
pred_pose_2d = cell(num_pred,1);
pred_pose_3d = cell(num_pred,1);
pred_visibility = cell(num_pred,1);
for k = 1:num_pred
pred_pose_2d{k} = zeros(2,14);
%pred_pose_2d{k}(:,map_to_mpii_jointset) = % 2D pose for detected person k;
pred_pose_2d{k} = transpose(squeeze(pred_2d_kpt(k,:,:))); % 2D pose for detected person k;
% If some joints such as neck are missing, they can be estimated as the mean of shoulders
%pred_pose_2d{k}(:,2) = mean(pred_pose_2d{k}(:,[3,6]),2);
pred_pose_2d{k} = pred_pose_2d{k}(:,matching_joints);
pred_visibility{k} = ~((pred_pose_2d{k}(1,:) == 0) & (pred_pose_2d{k}(2,:) == 0));
pred_pose_3d{k} = zeros(3,num_joints);
%pred_pose_3d{k}(:,map_to_mpii_jointset) = % 3D pose for detected person k;
pred_pose_3d{k} = transpose(squeeze(pred_3d_kpt(k,:,:))); % 3D pose for detected person k;
% If some joints such as neck or pelvis are missing, they can be estimated as
% the mean of shoulders or hips
%pred_pose_3d{k}(:,2) = mean(pred_pose_3d{k}(:,[3,6]),2);
%pred_pose_3d{k}(:,15) = mean(pred_pose_3d{k}(:,[9,12]),2);
%Center the predictions at the pelvis
if is_relative == 1
pred_pose_3d{k} = pred_pose_3d{k} - repmat(pred_pose_3d{k}(:,15), 1, 17);
else
pred_pose_3d{k} = pred_pose_3d{k};
end
%Other mappings that may be needed to convert the predicted pose to match our coordinate system
%pred_pose_3d{k} = 1000* pred_pose_3d{k}([2 3 1],:);
%pred_pose_3d{k}(1:2,:) = -pred_pose_3d{k}(1:2,:);
end
%Match predictions to GT
[matching, old_matched] = mpii_multiperson_get_identity_matching(gt_pose_2d, gt_visibility, pred_pose_2d, pred_visibility, 40);
undetected_people = undetected_people + sum(matching == 0);
for k = 1:valid_annotations
if is_relative == 1
P = gt_pose_3d{k}(:,1:num_joints) - repmat(gt_pose_3d{k}(:,15),1 , num_joints);
else
P = gt_pose_3d{k}(:,1:num_joints);
end
pred_considered = 0;
if(matching(k) ~= 0 )
pred_p = pred_pose_3d{matching(k)}(:,1:num_joints);
pred_p = mpii_map_to_gt_bone_lengths(pred_p, P, o1, safe_traversal_order(2:end));
pred_considered = 1;
else
pred_p = 100000 * ones(size(P)); %So that the 3DPCK metric marks all these joints as 0!
if(EVALUATION_MODE==0)
pred_considered = 1;
end
end
if (pred_considered == 1 )
error_p = (pred_p - P).^2;
error_p = sqrt(sum(error_p, 1));
per_joint_error(1:num_joints,1,pje_idx) = error_p;
per_joint_occlusion_mask(1:num_joints,1,pje_idx) = gt_pose_occlusion_labels{k};
per_joint_visibility_mask(1:num_joints,1,pje_idx) = gt_pose_visibility_labels{k};
pje_idx = pje_idx + 1;
end
end
end
sequencewise_undetected_people(ts) = undetected_people;
sequencewise_annotated_people(ts) = annotated_people;
sequencewise_per_joint_error{ts} = per_joint_error;
sequencewise_visibility_mask{ts} = per_joint_visibility_mask;
sequencewise_occlusion_mask{ts} = per_joint_occlusion_mask;
end
if(EVALUATION_MODE == 0)
out_prefix = 'all_annotated_';
else
out_prefix = 'only_matched_annotations_';
end
save([results_output_path filesep out_prefix 'multiperson_3dhp_evaluation.mat'], 'sequencewise_per_joint_error' );
[seq_table] = mpii_evaluate_multiperson_errors(sequencewise_per_joint_error );%fullfile(net_base, net_path{n,1}));
out_file = [results_output_path filesep out_prefix 'multiperson_3dhp_evaluation'];
writetable(cell2table(seq_table), [out_file '_sequencewise.csv']);
[seq_table] = mpii_evaluate_multiperson_errors_visibility_mask(sequencewise_per_joint_error , sequencewise_visibility_mask);
out_file = [results_output_path filesep [out_prefix 'visible_joints_'] 'multiperson_3dhp_evaluation'];
writetable(cell2table(seq_table), [out_file '_sequencewise.csv']);
[seq_table] = mpii_evaluate_multiperson_errors_visibility_mask(sequencewise_per_joint_error , sequencewise_occlusion_mask);
out_file = [results_output_path filesep [out_prefix 'occluded_joints_'] 'multiperson_3dhp_evaluation'];
writetable(cell2table(seq_table), [out_file '_sequencewise.csv']);
%
end
================================================
FILE: data/dataset.py
================================================
import numpy as np
import cv2
import random
import time
import torch
import copy
import math
from torch.utils.data.dataset import Dataset
from utils.vis import vis_keypoints, vis_3d_skeleton
from utils.pose_utils import fliplr_joints, transform_joint_to_other_db
from config import cfg
class DatasetLoader(Dataset):
def __init__(self, db, ref_joints_name, is_train, transform):
self.db = db.data
self.joint_num = db.joint_num
self.skeleton = db.skeleton
self.flip_pairs = db.flip_pairs
self.joints_have_depth = db.joints_have_depth
self.joints_name = db.joints_name
self.ref_joints_name = ref_joints_name
self.transform = transform
self.is_train = is_train
if self.is_train:
self.do_augment = True
else:
self.do_augment = False
def __getitem__(self, index):
joint_num = self.joint_num
skeleton = self.skeleton
flip_pairs = self.flip_pairs
joints_have_depth = self.joints_have_depth
data = copy.deepcopy(self.db[index])
bbox = data['bbox']
joint_img = data['joint_img']
joint_vis = data['joint_vis']
# 1. load image
cvimg = cv2.imread(data['img_path'], cv2.IMREAD_COLOR | cv2.IMREAD_IGNORE_ORIENTATION)
if not isinstance(cvimg, np.ndarray):
raise IOError("Fail to read %s" % data['img_path'])
img_height, img_width, img_channels = cvimg.shape
# 2. get augmentation params
if self.do_augment:
scale, rot, do_flip, color_scale, do_occlusion = get_aug_config()
else:
scale, rot, do_flip, color_scale, do_occlusion = 1.0, 0.0, False, [1.0, 1.0, 1.0], False
# 3. crop patch from img and perform data augmentation (flip, rot, color scale, synthetic occlusion)
img_patch, trans = generate_patch_image(cvimg, bbox, do_flip, scale, rot, do_occlusion)
for i in range(img_channels):
img_patch[:, :, i] = np.clip(img_patch[:, :, i] * color_scale[i], 0, 255)
# 4. generate patch joint ground truth
# flip joints and apply Affine Transform on joints
if do_flip:
joint_img[:, 0] = img_width - joint_img[:, 0] - 1
for pair in flip_pairs:
joint_img[pair[0], :], joint_img[pair[1], :] = joint_img[pair[1], :], joint_img[pair[0], :].copy()
joint_vis[pair[0], :], joint_vis[pair[1], :] = joint_vis[pair[1], :], joint_vis[pair[0], :].copy()
for i in range(len(joint_img)):
joint_img[i, 0:2] = trans_point2d(joint_img[i, 0:2], trans)
joint_img[i, 2] /= (cfg.bbox_3d_shape[0]/2.) # expect depth lies in -bbox_3d_shape[0]/2 ~ bbox_3d_shape[0]/2 -> -1.0 ~ 1.0
joint_img[i, 2] = (joint_img[i,2] + 1.0)/2. # 0~1 normalize
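# joints falling outside the cropped patch or the normalized depth range are marked invisible below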
joint_vis[i] *= (
(joint_img[i,0] >= 0) & \
(joint_img[i,0] < cfg.input_shape[1]) & \
(joint_img[i,1] >= 0) & \
(joint_img[i,1] < cfg.input_shape[0]) & \
(joint_img[i,2] >= 0) & \
(joint_img[i,2] < 1)
)
vis = False
if vis:
filename = str(random.randrange(1,500))
tmpimg = img_patch.copy().astype(np.uint8)
tmpkps = np.zeros((3,joint_num))
tmpkps[:2,:] = joint_img[:,:2].transpose(1,0)
tmpkps[2,:] = joint_vis[:,0]
tmpimg = vis_keypoints(tmpimg, tmpkps, skeleton)
cv2.imwrite(filename + '_gt.jpg', tmpimg)
vis = False
if vis:
vis_3d_skeleton(joint_img, joint_vis, skeleton, filename)
# change coordinates to output space
joint_img[:, 0] = joint_img[:, 0] / cfg.input_shape[1] * cfg.output_shape[1]
joint_img[:, 1] = joint_img[:, 1] / cfg.input_shape[0] * cfg.output_shape[0]
joint_img[:, 2] = joint_img[:, 2] * cfg.depth_dim
if self.is_train:
img_patch = self.transform(img_patch)
if self.ref_joints_name is not None:
joint_img = transform_joint_to_other_db(joint_img, self.joints_name, self.ref_joints_name)
joint_vis = transform_joint_to_other_db(joint_vis, self.joints_name, self.ref_joints_name)
joint_img = joint_img.astype(np.float32)
joint_vis = (joint_vis > 0).astype(np.float32)
joints_have_depth = np.array([joints_have_depth]).astype(np.float32)
return img_patch, joint_img, joint_vis, joints_have_depth
else:
img_patch = self.transform(img_patch)
return img_patch
def __len__(self):
return len(self.db)
# helper functions
def get_aug_config():
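# sample augmentation parameters: scale in [0.75, 1.25], rotation up to +-60 degrees
# (applied ~60% of the time), horizontal flip and synthetic occlusion each with
# probability 0.5, and per-channel color scaling in [0.8, 1.2]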
scale_factor = 0.25
rot_factor = 30
color_factor = 0.2
scale = np.clip(np.random.randn(), -1.0, 1.0) * scale_factor + 1.0
rot = np.clip(np.random.randn(), -2.0,
2.0) * rot_factor if random.random() <= 0.6 else 0
do_flip = random.random() <= 0.5
c_up = 1.0 + color_factor
c_low = 1.0 - color_factor
color_scale = [random.uniform(c_low, c_up), random.uniform(c_low, c_up), random.uniform(c_low, c_up)]
do_occlusion = random.random() <= 0.5
return scale, rot, do_flip, color_scale, do_occlusion
def generate_patch_image(cvimg, bbox, do_flip, scale, rot, do_occlusion):
img = cvimg.copy()
img_height, img_width, img_channels = img.shape
# synthetic occlusion
if do_occlusion:
while True:
area_min = 0.0
area_max = 0.7
synth_area = (random.random() * (area_max - area_min) + area_min) * bbox[2] * bbox[3]
ratio_min = 0.3
ratio_max = 1/0.3
synth_ratio = (random.random() * (ratio_max - ratio_min) + ratio_min)
synth_h = math.sqrt(synth_area * synth_ratio)
synth_w = math.sqrt(synth_area / synth_ratio)
synth_xmin = random.random() * (bbox[2] - synth_w - 1) + bbox[0]
synth_ymin = random.random() * (bbox[3] - synth_h - 1) + bbox[1]
if synth_xmin >= 0 and synth_ymin >= 0 and synth_xmin + synth_w < img_width and synth_ymin + synth_h < img_height:
xmin = int(synth_xmin)
ymin = int(synth_ymin)
w = int(synth_w)
h = int(synth_h)
img[ymin:ymin+h, xmin:xmin+w, :] = np.random.rand(h, w, 3) * 255
break
bb_c_x = float(bbox[0] + 0.5*bbox[2])
bb_c_y = float(bbox[1] + 0.5*bbox[3])
bb_width = float(bbox[2])
bb_height = float(bbox[3])
if do_flip:
img = img[:, ::-1, :]
bb_c_x = img_width - bb_c_x - 1
trans = gen_trans_from_patch_cv(bb_c_x, bb_c_y, bb_width, bb_height, cfg.input_shape[1], cfg.input_shape[0], scale, rot, inv=False)
img_patch = cv2.warpAffine(img, trans, (int(cfg.input_shape[1]), int(cfg.input_shape[0])), flags=cv2.INTER_LINEAR)
img_patch = img_patch[:,:,::-1].copy()
img_patch = img_patch.astype(np.float32)
return img_patch, trans
def rotate_2d(pt_2d, rot_rad):
x = pt_2d[0]
y = pt_2d[1]
sn, cs = np.sin(rot_rad), np.cos(rot_rad)
xx = x * cs - y * sn
yy = x * sn + y * cs
return np.array([xx, yy], dtype=np.float32)
def gen_trans_from_patch_cv(c_x, c_y, src_width, src_height, dst_width, dst_height, scale, rot, inv=False):
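# build the bbox -> patch affine transform from three point correspondences:
# the scaled/rotated source box center and its down/right half-extent vectors are
# mapped to the center and down/right vectors of the destination patch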
# augment size with scale
src_w = src_width * scale
src_h = src_height * scale
src_center = np.array([c_x, c_y], dtype=np.float32)
# augment rotation
rot_rad = np.pi * rot / 180
src_downdir = rotate_2d(np.array([0, src_h * 0.5], dtype=np.float32), rot_rad)
src_rightdir = rotate_2d(np.array([src_w * 0.5, 0], dtype=np.float32), rot_rad)
dst_w = dst_width
dst_h = dst_height
dst_center = np.array([dst_w * 0.5, dst_h * 0.5], dtype=np.float32)
dst_downdir = np.array([0, dst_h * 0.5], dtype=np.float32)
dst_rightdir = np.array([dst_w * 0.5, 0], dtype=np.float32)
src = np.zeros((3, 2), dtype=np.float32)
src[0, :] = src_center
src[1, :] = src_center + src_downdir
src[2, :] = src_center + src_rightdir
dst = np.zeros((3, 2), dtype=np.float32)
dst[0, :] = dst_center
dst[1, :] = dst_center + dst_downdir
dst[2, :] = dst_center + dst_rightdir
if inv:
trans = cv2.getAffineTransform(np.float32(dst), np.float32(src))
else:
trans = cv2.getAffineTransform(np.float32(src), np.float32(dst))
return trans
def trans_point2d(pt_2d, trans):
src_pt = np.array([pt_2d[0], pt_2d[1], 1.]).T
dst_pt = np.dot(trans, src_pt)
return dst_pt[0:2]
================================================
FILE: data/multiple_datasets.py
================================================
import random
import numpy as np
from torch.utils.data.dataset import Dataset
class MultipleDatasets(Dataset):
def __init__(self, dbs, make_same_len=True):
self.dbs = dbs
self.db_num = len(self.dbs)
self.max_db_data_num = max([len(db) for db in dbs])
self.db_len_cumsum = np.cumsum([len(db) for db in dbs])
self.make_same_len = make_same_len
def __len__(self):
# all dbs have the same length
if self.make_same_len:
return self.max_db_data_num * self.db_num
# each db has different length
else:
return sum([len(db) for db in self.dbs])
def __getitem__(self, index):
if self.make_same_len:
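# every dataset contributes max_db_data_num samples per epoch: smaller datasets are
# tiled with the modulo below, and the leftover indices that cannot be tiled evenly
# fall back to uniform random sampling from that dataset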
db_idx = index // self.max_db_data_num
data_idx = index % self.max_db_data_num
if data_idx >= len(self.dbs[db_idx]) * (self.max_db_data_num // len(self.dbs[db_idx])): # last batch: random sampling
data_idx = random.randint(0,len(self.dbs[db_idx])-1)
else: # before last batch: use modular
data_idx = data_idx % len(self.dbs[db_idx])
else:
for i in range(self.db_num):
if index < self.db_len_cumsum[i]:
db_idx = i
break
if db_idx == 0:
data_idx = index
else:
data_idx = index - self.db_len_cumsum[db_idx-1]
return self.dbs[db_idx][data_idx]
================================================
FILE: demo/demo.py
================================================
import sys
import os
import os.path as osp
import argparse
import numpy as np
import cv2
import torch
import torchvision.transforms as transforms
from torch.nn.parallel.data_parallel import DataParallel
import torch.backends.cudnn as cudnn
sys.path.insert(0, osp.join('..', 'main'))
sys.path.insert(0, osp.join('..', 'data'))
sys.path.insert(0, osp.join('..', 'common'))
from config import cfg
from model import get_pose_net
from dataset import generate_patch_image
from utils.pose_utils import process_bbox, pixel2cam
from utils.vis import vis_keypoints, vis_3d_multiple_skeleton
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, dest='gpu_ids')
parser.add_argument('--model_path', type=str, dest='model')
parser.add_argument('--input_image', type=str, dest='image')
parser.add_argument('--backbone', type=str, dest='backbone')
args = parser.parse_args()
# test gpus
if not args.gpu_ids:
assert 0, print("Please set proper gpu ids")
if '-' in args.gpu_ids:
gpus = args.gpu_ids.split('-')
gpus[0] = 0 if not gpus[0].isdigit() else int(gpus[0])
gpus[1] = len(mem_info()) if not gpus[1].isdigit() else int(gpus[1]) + 1
args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
return args
# argument parsing
args = parse_args()
cfg.set_args(args.gpu_ids)
cudnn.benchmark = True
# MuCo joint set
joint_num = 18
joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe')
# 'Pelvis' 'RHip' 'RKnee' 'RAnkle' 'LHip' 'LKnee' 'LAnkle' 'Spine1' 'Neck' 'Head' 'Site' 'LShoulder' 'LElbow' 'LWrist' 'RShoulder' 'RElbow' 'RWrist'
flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) )
# skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) )
skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )
# snapshot load
model_path = args.model
# print('Load checkpoint from {}'.format(model_path))
model = get_pose_net(args.backbone, False, joint_num)
model = DataParallel(model).cuda()
# print("after DataParallel", model)
ckpt = torch.load(model_path)
# print("ckpt", ckpt['network'])
model.load_state_dict(ckpt['network'])
model.eval()
# prepare input image
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)])
img_path = args.image
assert osp.exists(img_path), 'Cannot find image at ' + img_path
original_img = cv2.imread(img_path)
original_img_height, original_img_width = original_img.shape[:2]
# prepare bbox
bbox_list = [
[139.41, 102.25, 222.39, 241.57],\
[287.17, 61.52, 74.88, 165.61],\
[540.04, 48.81, 99.96, 223.36],\
[372.58, 170.84, 266.63, 217.19],\
[0.5, 43.74, 90.1, 220.09]] # xmin, ymin, width, height
root_depth_list = [11250.5732421875, 15522.8701171875, 11831.3828125, 8852.556640625, 12572.5966796875] # obtain this from RootNet (https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE/tree/master/demo)
assert len(bbox_list) == len(root_depth_list)
person_num = len(bbox_list)
# normalized camera intrinsics
focal = [1500, 1500] # x-axis, y-axis
princpt = [original_img_width/2, original_img_height/2] # x-axis, y-axis
print('focal length: (' + str(focal[0]) + ', ' + str(focal[1]) + ')')
print('principal points: (' + str(princpt[0]) + ', ' + str(princpt[1]) + ')')
# for each cropped and resized human image, forward it to PoseNet
output_pose_2d_list = []
output_pose_3d_list = []
for n in range(person_num):
bbox = process_bbox(np.array(bbox_list[n]), original_img_width, original_img_height)
img, img2bb_trans = generate_patch_image(original_img, bbox, False, 1.0, 0.0, False)
img = transform(img).cuda()[None,:,:,:]
# forward
with torch.no_grad():
pose_3d = model(img) # x,y: pixel, z: root-relative depth (mm)
# inverse affine transform (restore the crop and resize)
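# heatmap cells are first rescaled to input-patch pixels, then mapped back to the original
# image by inverting the 2x3 crop/resize transform (lifted to a 3x3 homogeneous matrix)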
pose_3d = pose_3d[0].cpu().numpy()
pose_3d[:,0] = pose_3d[:,0] / cfg.output_shape[1] * cfg.input_shape[1]
pose_3d[:,1] = pose_3d[:,1] / cfg.output_shape[0] * cfg.input_shape[0]
pose_3d_xy1 = np.concatenate((pose_3d[:,:2], np.ones_like(pose_3d[:,:1])),1)
img2bb_trans_001 = np.concatenate((img2bb_trans, np.array([0,0,1]).reshape(1,3)))
pose_3d[:,:2] = np.dot(np.linalg.inv(img2bb_trans_001), pose_3d_xy1.transpose(1,0)).transpose(1,0)[:,:2]
output_pose_2d_list.append(pose_3d[:,:2].copy())
# root-relative discretized depth -> absolute continuous depth
pose_3d[:,2] = (pose_3d[:,2] / cfg.depth_dim * 2 - 1) * (cfg.bbox_3d_shape[0]/2) + root_depth_list[n]
pose_3d = pixel2cam(pose_3d, focal, princpt)
output_pose_3d_list.append(pose_3d.copy())
# visualize 2d poses
vis_img = original_img.copy()
for n in range(person_num):
vis_kps = np.zeros((3,joint_num))
vis_kps[0,:] = output_pose_2d_list[n][:,0]
vis_kps[1,:] = output_pose_2d_list[n][:,1]
vis_kps[2,:] = 1
vis_img = vis_keypoints(vis_img, vis_kps, skeleton)
cv2.imwrite('output_pose_2d.jpg', vis_img)
# visualize 3d poses
vis_kps = np.array(output_pose_3d_list)
vis_3d_multiple_skeleton(vis_kps, np.ones_like(vis_kps), skeleton, 'output_pose_3d (x,y,z: camera-centered. mm.)')
================================================
FILE: main/config.py
================================================
import os
import os.path as osp
import sys
import numpy as np
class Config:
## model architecture
backbone = 'LPSKI'
## dataset
# training set
# 3D: Human36M, MuCo
# 2D: MSCOCO, MPII
trainset_3d = ['Dummy']
# trainset_3d = ['MuCo']
trainset_2d = []
# trainset_2d = ['MSCOCO']
# testing set
# Human36M, MuPoTS, MSCOCO
testset = 'MuPoTS'
## directory
cur_dir = osp.dirname(os.path.abspath(__file__))
root_dir = osp.join(cur_dir, '..')
data_dir = osp.join(root_dir, 'data')
output_dir = osp.join(root_dir, 'output')
model_dir = osp.join(output_dir, 'model_dump')
pretrain_dir = osp.join(output_dir, 'pre_train')
vis_dir = osp.join(output_dir, 'vis')
log_dir = osp.join(output_dir, 'log')
result_dir = osp.join(output_dir, 'result')
## input, output
input_shape = (256, 256)
output_shape = (input_shape[0]//8, input_shape[1]//8)
width_multiplier = 1.0
depth_dim = 32
bbox_3d_shape = (2000, 2000, 2000) # depth, height, width
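# with the defaults above, each joint's 3D heatmap covers a 2000mm cube around the root joint at depth_dim x 32 x 32 resolution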
pixel_mean = (0.485, 0.456, 0.406)
pixel_std = (0.229, 0.224, 0.225)
## training config
embedding_size = 2048
lr_dec_epoch = [17, 21]
end_epoch = 25
lr = 1e-3
lr_dec_factor = 10
batch_size = 64
## testing config
test_batch_size = 32
flip_test = True
use_gt_info = True
## others
num_thread = 20
gpu_ids = '0'
num_gpus = 1
continue_train = False
if '-' in gpu_ids:
gpus = gpu_ids.split('-')
gpus[0] = int(gpus[0])
gpus[1] = int(gpus[1]) + 1
gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
os.environ["CUDA_VISIBLE_DEVICES"] = gpu_ids
cfg = Config()
sys.path.insert(0, osp.join(cfg.root_dir, 'common'))
from utils.dir_utils import add_pypath, make_folder
# adding path
add_pypath(osp.join(cfg.data_dir))
for i in range(len(cfg.trainset_3d)):
add_pypath(osp.join(cfg.data_dir, cfg.trainset_3d[i]))
for i in range(len(cfg.trainset_2d)):
add_pypath(osp.join(cfg.data_dir, cfg.trainset_2d[i]))
add_pypath(osp.join(cfg.data_dir, cfg.testset))
make_folder(cfg.model_dir)
make_folder(cfg.vis_dir)
make_folder(cfg.log_dir)
make_folder(cfg.result_dir)
================================================
FILE: main/intermediate.py
================================================
import torch
import argparse
import numpy as np
import os
import os.path as osp
import cv2
import matplotlib.pyplot as plt
import torch.backends.cudnn as cudnn
import torchvision.transforms as transforms
from torchsummary import summary
from torch.nn.parallel.data_parallel import DataParallel
from config import cfg
from model import get_pose_net
from utils.pose_utils import process_bbox, pixel2cam
from utils.vis import vis_keypoints, vis_3d_multiple_skeleton
from dataset import generate_patch_image
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, dest='gpu_ids')
parser.add_argument('--epoch', type=int, dest='test_epoch')
parser.add_argument('--input_image', type=str, dest='image')
parser.add_argument('--jointnum', type=int, dest='joint')
parser.add_argument('--backbone', type=str, dest='backbone')
args = parser.parse_args()
# test gpus
if not args.gpu_ids:
assert 0, print("Please set proper gpu ids")
if not args.joint:
assert print("please insert number of joint")
if '-' in args.gpu_ids:
gpus = args.gpu_ids.split('-')
gpus[0] = 0 if not gpus[0].isdigit() else int(gpus[0])
gpus[1] = len(mem_info()) if not gpus[1].isdigit() else int(gpus[1]) + 1
args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
return args
# argument parsing
args = parse_args()
cfg.set_args(args.gpu_ids)
cudnn.benchmark = True
# joint set
joint_num = args.joint
joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe')
flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) )
if joint_num == 18:
skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )
if joint_num == 21:
skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) )
# snapshot load
model_path = os.path.join(cfg.model_dir, 'snapshot_%d.pth.tar' % args.test_epoch)
assert osp.exists(model_path), 'Cannot find model at ' + model_path
model = get_pose_net(args.backbone, False, joint_num)
model = DataParallel(model).cuda()
ckpt = torch.load(model_path)
model.load_state_dict(ckpt['network'])
model = model.module
model.eval()
# prepare input image
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize(mean=cfg.pixel_mean, std=cfg.pixel_std)])
img_path = args.image
assert osp.exists(img_path), 'Cannot find image at ' + img_path
original_img = cv2.imread(img_path)
original_img_height, original_img_width = original_img.shape[:2]
# prepare bbox
bbox_list = [
[139.41, 102.25, 222.39, 241.57],\
[287.17, 61.52, 74.88, 165.61],\
[540.04, 48.81, 99.96, 223.36],\
[372.58, 170.84, 266.63, 217.19],\
[0.5, 43.74, 90.1, 220.09]
] # xmin, ymin, width, height
root_depth_list = [11250.5732421875, 15522.8701171875, 11831.3828125, 8852.556640625, 12572.5966796875] # obtain this from RootNet (https://github.com/mks0601/3DMPPE_ROOTNET_RELEASE/tree/master/demo)
assert len(bbox_list) == len(root_depth_list)
person_num = len(bbox_list)
# extractor
activation = {}
def get_activation(name):
def hook(model, input, output):
activation[name] = output.detach()
return hook
for n in range(person_num):
bbox = process_bbox(np.array(bbox_list[n]), original_img_width, original_img_height)
img, img2bb_trans = generate_patch_image(original_img, bbox, False, 1.0, 0.0, False)
img = transform(img).cuda()[None,:,:,:]
model.backbone.deonv1.register_forward_hook(get_activation('%d' % n))
# forward
with torch.no_grad():
pose_3d = model(img) # x,y: pixel, z: root-relative depth (mm)
plt.figure(figsize=(32, 32))
a = activation['0'] - activation['1']
b = torch.sum(a, dim=1)
print(b)
for i in range(person_num):
image = activation['%d'%i]
print(image.size())
sum_image = torch.sum(image[0], dim=0)
print(sum_image.size())
plt.subplot(1, person_num, i+1)
plt.imshow(sum_image.cpu(), cmap='gray')
plt.axis('off')
plt.show()
plt.close()
================================================
FILE: main/model.py
================================================
import torch
import torch.nn as nn
from torch.nn import functional as F
from backbone import *
from config import cfg
import os.path as osp
model_urls = {
'MobileNetV2': 'https://download.pytorch.org/models/mobilenet_v2-b0353104.pth',
'ResNet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
'ResNet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
'ResNet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
'ResNet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
'ResNet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
'ResNext50_32x4d': 'https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth',
'resnext101_32x8d': 'https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth',
'wide_resnet50_2': 'https://download.pytorch.org/models/wide_resnet50_2-95faca4d.pth',
'wide_resnet101_2': 'https://download.pytorch.org/models/wide_resnet101_2-32ee1156.pth',
}
BACKBONE_DICT = {
'LPRES':LpNetResConcat,
'LPSKI':LpNetSkiConcat,
'LPWO':LpNetWoConcat
}
def soft_argmax(heatmaps, joint_num):
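# differentiable soft-argmax over the per-joint 3D heatmap: softmax turns the
# depth x H x W volume into a probability distribution, the sums below compute its
# marginals along each axis, and the arange-weighted sums give the expected
# (sub-voxel) x, y, z coordinates; the trailing -1 converts the 1-based weights
# back to 0-based coordinates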
heatmaps = heatmaps.reshape((-1, joint_num, cfg.depth_dim*cfg.output_shape[0]*cfg.output_shape[1]))
heatmaps = F.softmax(heatmaps, 2)
heatmaps = heatmaps.reshape((-1, joint_num, cfg.depth_dim, cfg.output_shape[0], cfg.output_shape[1]))
accu_x = heatmaps.sum(dim=(2,3))
accu_y = heatmaps.sum(dim=(2,4))
accu_z = heatmaps.sum(dim=(3,4))
# accu_x = accu_x * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[1]+1).type(torch.cuda.FloatTensor), devices=[accu_x.device.index])[0]
# accu_y = accu_y * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.output_shape[0]+1).type(torch.cuda.FloatTensor), devices=[accu_y.device.index])[0]
# accu_z = accu_z * torch.nn.parallel.comm.broadcast(torch.arange(1,cfg.depth_dim+1).type(torch.cuda.FloatTensor), devices=[accu_z.device.index])[0]
accu_x = accu_x * torch.arange(1,cfg.output_shape[1]+1)
accu_y = accu_y * torch.arange(1,cfg.output_shape[0]+1)
accu_z = accu_z * torch.arange(1,cfg.depth_dim+1)
accu_x = accu_x.sum(dim=2, keepdim=True) -1
accu_y = accu_y.sum(dim=2, keepdim=True) -1
accu_z = accu_z.sum(dim=2, keepdim=True) -1
coord_out = torch.cat((accu_x, accu_y, accu_z), dim=2)
return coord_out
class CustomNet(nn.Module):
def __init__(self, backbone, joint_num):
super(CustomNet, self).__init__()
self.backbone = backbone
self.joint_num = joint_num
def forward(self, input_img, target=None):
fm = self.backbone(input_img)
coord = soft_argmax(fm, self.joint_num)
if target is None:
return coord
else:
target_coord = target['coord']
target_vis = target['vis']
target_have_depth = target['have_depth']
## coordinate loss
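# L1 distance on voxel coordinates, masked by joint visibility; the z (depth) term is
# additionally gated by target_have_depth so 2D-only datasets do not supervise depth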
loss_coord = torch.abs(coord - target_coord) * target_vis
loss_coord = (loss_coord[:,:,0] + loss_coord[:,:,1] + loss_coord[:,:,2] * target_have_depth)/3.
return loss_coord
def get_pose_net(backbone_str, is_train, joint_num):
INPUT_SIZE = cfg.input_shape
EMBEDDING_SIZE = cfg.embedding_size # feature dimension
WIDTH_MULTIPLIER = cfg.width_multiplier
assert INPUT_SIZE == (256, 256)
print("=" * 60)
print("{} BackBone Generated".format(backbone_str))
print("=" * 60)
model = CustomNet(BACKBONE_DICT[backbone_str](input_size = INPUT_SIZE, joint_num = joint_num, embedding_size = EMBEDDING_SIZE, width_mult = WIDTH_MULTIPLIER), joint_num)
if is_train == True:
model.backbone.init_weights()
return model
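# a minimal usage sketch (illustrative values, not part of the training pipeline):
# model = get_pose_net('LPSKI', False, 21)                      # 21-joint MuCo set
# coords = model(torch.randn(1, 3, *cfg.input_shape))           # -> (1, 21, 3) x,y,z voxel coordinates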
================================================
FILE: main/pytorch2coreml.py
================================================
import torch
import argparse
import coremltools as ct
from config import cfg
from torch.nn.parallel.data_parallel import DataParallel
from base import Transformer
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, dest='gpu_ids')
parser.add_argument('--joint', type=int, dest='joint')
parser.add_argument('--modelpath', type=str, dest='modelpath')
parser.add_argument('--backbone', type=str, dest='backbone')
args = parser.parse_args()
# test gpus
if not args.gpu_ids:
assert 0, "Please set proper gpu ids"
if '-' in args.gpu_ids:
gpus = args.gpu_ids.split('-')
gpus[0] = int(gpus[0])
gpus[1] = int(gpus[1]) + 1
args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
return args
args = parse_args()
# modelpath is given as a full path to the model snapshot
transformer = Transformer(args.backbone, args.joint, args.modelpath)
transformer._make_model()
single_pytorch_model = transformer.model
device = torch.device('cpu')
single_pytorch_model.to(device)
dummy_input = torch.randn(1, 3, 256, 256)
traced_model = torch.jit.trace(single_pytorch_model, dummy_input)
# Convert to Core ML using the Unified Conversion API
model = ct.convert(
traced_model,
inputs=[ct.ImageType(name="input_1", shape=dummy_input.shape)], # the input name "input_1" follows the coremltools quickstart example
)
model.save("test.mlmodel")
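# Example invocation (arguments and paths are illustrative, not fixed):
# python pytorch2coreml.py --gpu 0 --joint 21 --backbone LPSKI --modelpath /path/to/snapshot_24.pth.tar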
================================================
FILE: main/pytorch2onnx.py
================================================
import onnx
import torch
import argparse
import numpy
import imageio
import onnxruntime as ort
import tensorflow as tf
from config import cfg
from torchsummary import summary
from base import Transformer
from onnx_tf.backend import prepare
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, dest='gpu_ids')
parser.add_argument('--joint', type=int, dest='joint')
parser.add_argument('--modelpath', type=str, dest='modelpath')
parser.add_argument('--backbone', type=str, dest='backbone')
args = parser.parse_args()
# test gpus
if not args.gpu_ids:
assert 0, "Please set proper gpu ids"
if '-' in args.gpu_ids:
gpus = args.gpu_ids.split('-')
gpus[0] = int(gpus[0])
gpus[1] = int(gpus[1]) + 1
args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
return args
args = parse_args()
dummy_input = torch.randn(1, 3, 256, 256, device='cuda')
# modelpath is given as a full path to the model snapshot
transformer = Transformer(args.backbone, args.joint, args.modelpath)
transformer._make_model()
single_pytorch_model = transformer.model
summary(single_pytorch_model, (3, 256, 256))
ONNX_PATH="../output/baseline.onnx"
torch.onnx.export(
model=single_pytorch_model,
args=dummy_input,
f=ONNX_PATH, # where the ONNX model will be saved
verbose=False,
export_params=True,
do_constant_folding=False, # set to True to fold constant values for optimization
# do_constant_folding=True,
input_names=['input'],
output_names=['output'],
opset_version=11
)
onnx_model = onnx.load(ONNX_PATH)
onnx.checker.check_model(onnx_model)
onnx.helper.printable_graph(onnx_model.graph)
pytorch_result = single_pytorch_model(dummy_input)
pytorch_result = pytorch_result.cpu().detach().numpy()
print("pytorch_model output {}".format(pytorch_result.shape), pytorch_result)
ort_session = ort.InferenceSession(ONNX_PATH)
outputs = ort_session.run(None, {'input': dummy_input.cpu().numpy()})
outputs = numpy.array(outputs[0])
print("onnx_model ouput size{}".format(outputs.shape), outputs)
print("difference", numpy.linalg.norm(pytorch_result-outputs))
TF_PATH = "../output/baseline" # where the representation of tensorflow model will be stored
# prepare function converts an ONNX model to an internal representation
# of the computational graph called TensorflowRep and returns
# the converted representation.
tf_rep = prepare(onnx_model) # creating TensorflowRep object
# export_graph function obtains the graph proto corresponding to the ONNX
# model associated with the backend representation and serializes
# to a protobuf file.
tf_rep.export_graph(TF_PATH)
TFLITE_PATH = "../output/baseline.tflite"
PB_PATH = "../output/baseline/saved_model.pb"
# make a converter object from the saved tensorflow file
# converter = tf.compat.v1.lite.TFLiteConverter.from_frozen_graph(PB_PATH, input_arrays=['input'], output_arrays=['output'])
converter = tf.lite.TFLiteConverter.from_saved_model(TF_PATH)
# tell converter which type of optimization techniques to use
# to view the best option for optimization read documentation of tflite about optimization
# go to this link https://www.tensorflow.org/lite/guide/get_started#4_optimize_your_model_optional
# converter.optimizations = [tf.compat.v1.lite.Optimize.DEFAULT]
# converter.experimental_new_converter = True
#
# # I had to explicitly state the ops
# converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS,
# tf.lite.OpsSet.SELECT_TF_OPS]
def representative_dataset():
dataset_size = 10
for i in range(dataset_size):
print(i)
data = imageio.imread("../sample_images/" + "00000" + str(i) + ".jpg")
data = numpy.resize(data, [1, 3, 256, 256])
yield [data.astype(numpy.float32)]
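# Full-integer (int8) quantization: the representative_dataset above calibrates the activation
# ranges, and inference input/output are forced to uint8 below. Note that numpy.resize only
# reshapes/repeats the pixel buffer rather than resampling the image, so a realistic calibration
# set would normally use properly resized and normalized images.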
converter.experimental_new_converter = True
converter.experimental_new_quantizer = True
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.uint8
converter.inference_output_type = tf.uint8
# input_arrays = converter.get_input_arrays()
# converter.quantized_input_stats = {input_arrays[0]: (0.0, 1.0)}
tf_lite_model = converter.convert()
# Save the model.
with open(TFLITE_PATH, 'wb') as f:
f.write(tf_lite_model)
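# Example invocation (arguments and paths are illustrative, not fixed):
# python pytorch2onnx.py --gpu 0 --joint 21 --backbone LPSKI --modelpath /path/to/snapshot_24.pth.tar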
================================================
FILE: main/summary.py
================================================
import torch
import argparse
import os
import os.path as osp
import torch.backends.cudnn as cudnn
from torchsummary import summary
from torch.nn.parallel.data_parallel import DataParallel
from config import cfg
from model import get_pose_net
from thop import profile
from thop import clever_format
from ptflops import get_model_complexity_info
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, dest='gpu_ids')
parser.add_argument('--epoch', type=int, dest='test_epoch')
parser.add_argument('--jointnum', type=int, dest='joint')
parser.add_argument('--backbone', type=str, dest='backbone')
args = parser.parse_args()
# test gpus
if not args.gpu_ids:
assert 0, print("Please set proper gpu ids")
if not args.joint:
assert print("please insert number of joint")
if '-' in args.gpu_ids:
gpus = args.gpu_ids.split('-')
gpus[0] = 0 if not gpus[0].isdigit() else int(gpus[0])
gpus[1] = torch.cuda.device_count() if not gpus[1].isdigit() else int(gpus[1]) + 1 # mem_info() was undefined here; torch.cuda.device_count() gives the number of visible GPUs
args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
return args
# argument parsing
args = parse_args()
cfg.set_args(args.gpu_ids)
cudnn.benchmark = True
# joint set
joint_num = args.joint
joints_name = ('Head_top', 'Thorax', 'R_Shoulder', 'R_Elbow', 'R_Wrist', 'L_Shoulder', 'L_Elbow', 'L_Wrist', 'R_Hip', 'R_Knee', 'R_Ankle', 'L_Hip', 'L_Knee', 'L_Ankle', 'Pelvis', 'Spine', 'Head', 'R_Hand', 'L_Hand', 'R_Toe', 'L_Toe')
flip_pairs = ( (2, 5), (3, 6), (4, 7), (8, 11), (9, 12), (10, 13), (17, 18), (19, 20) )
if joint_num == 18:
skeleton = ( (0, 7), (7, 8), (8, 9), (9, 10), (8, 11), (11, 12), (12, 13), (8, 14), (14, 15), (15, 16), (0, 1), (1, 2), (2, 3), (0, 4), (4, 5), (5, 6) )
if joint_num == 21:
skeleton = ( (0, 16), (16, 1), (1, 15), (15, 14), (14, 8), (14, 11), (8, 9), (9, 10), (10, 19), (11, 12), (12, 13), (13, 20), (1, 2), (2, 3), (3, 4), (4, 17), (1, 5), (5, 6), (6, 7), (7, 18) )
# snapshot load
model_path = os.path.join(cfg.model_dir, 'snapshot_%d.pth.tar' % args.test_epoch)
assert osp.exists(model_path), 'Cannot find model at ' + model_path
model = get_pose_net(args.backbone, False, joint_num) # get_pose_net takes (backbone_str, is_train, joint_num); args.frontbone is not defined by this script's parser
model = DataParallel(model).cuda()
ckpt = torch.load(model_path)
model.load_state_dict(ckpt['network'])
single_model = model.module
summary(single_model, (3, 256, 256))
input = torch.randn(1, 3, 256, 256).cuda()
macs, params = profile(single_model, inputs=(input,))
macs, params = clever_format([macs, params], "%.3f")
flops, params1 = get_model_complexity_info(single_model, (3, 256, 256),as_strings=True, print_per_layer_stat=False)
print('{:<40} {:<8}'.format('Computational complexity (ptflops): ', flops))
print('{:<40} {:<8}'.format('MACs (thop): ', macs))
print('{:<40} {:<8}'.format('Number of parameters (thop): ', params))
print('{:<40} {:<8}'.format('Number of parameters (ptflops): ', params1))
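# Example invocation (arguments are illustrative, not fixed):
# python summary.py --gpu 0 --epoch 24 --jointnum 21 --backbone LPSKI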
================================================
FILE: main/test.py
================================================
import argparse
from tqdm import tqdm
import numpy as np
import cv2
from config import cfg
import torch
from base import Tester
from utils.vis import vis_keypoints
from utils.pose_utils import flip
import torch.backends.cudnn as cudnn
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, dest='gpu_ids')
parser.add_argument('--epochs', type=str, dest='model')
parser.add_argument('--backbone', type=str, dest='backbone')
args = parser.parse_args()
# test gpus
if not args.gpu_ids:
assert 0, "Please set proper gpu ids"
if '-' in args.gpu_ids:
gpus = args.gpu_ids.split('-')
gpus[0] = int(gpus[0])
gpus[1] = int(gpus[1]) + 1
args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
if '-' in args.model:
model_epoch = args.model.split('-')
model_epoch[0] = int(model_epoch[0])
model_epoch[1] = int(model_epoch[1]) + 1
args.model_epoch = model_epoch
else:
# a single epoch was given; evaluate just that snapshot
args.model_epoch = [int(args.model), int(args.model) + 1]
return args
def main():
args = parse_args()
cfg.set_args(args.gpu_ids)
cudnn.fastest = True
cudnn.benchmark = True
cudnn.deterministic = False
cudnn.enabled = True
tester = Tester(args.backbone)
tester._make_batch_generator()
for epoch in range(args.model_epoch[0], args.model_epoch[1]):
tester._make_model(epoch)
preds = []
with torch.no_grad():
for itr, input_img in enumerate(tqdm(tester.batch_generator)):
# forward
coord_out = tester.model(input_img)
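# Flip test: run the horizontally flipped image through the model, mirror the x
# coordinates back, swap left/right joint pairs, and average with the original prediction.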
if cfg.flip_test:
flipped_input_img = flip(input_img, dims=3)
flipped_coord_out = tester.model(flipped_input_img)
flipped_coord_out[:, :, 0] = cfg.output_shape[1] - flipped_coord_out[:, :, 0] - 1
for pair in tester.flip_pairs:
flipped_coord_out[:, pair[0], :], flipped_coord_out[:, pair[1], :] = flipped_coord_out[:, pair[1], :].clone(), flipped_coord_out[:, pair[0], :].clone()
coord_out = (coord_out + flipped_coord_out)/2.
vis = False
if vis:
filename = str(itr)
tmpimg = input_img[0].cpu().numpy()
tmpimg = tmpimg * np.array(cfg.pixel_std).reshape(3,1,1) + np.array(cfg.pixel_mean).reshape(3,1,1)
tmpimg = tmpimg.astype(np.uint8)
tmpimg = tmpimg[::-1, :, :]
tmpimg = np.transpose(tmpimg,(1,2,0)).copy()
tmpkps = np.zeros((3,tester.joint_num))
tmpkps[:2,:] = coord_out[0,:,:2].cpu().numpy().transpose(1,0) / cfg.output_shape[0] * cfg.input_shape[0]
tmpkps[2,:] = 1
tmpimg = vis_keypoints(tmpimg, tmpkps, tester.skeleton)
cv2.imwrite(filename + '_output.jpg', tmpimg)
coord_out = coord_out.cpu().numpy()
preds.append(coord_out)
# evaluate
preds = np.concatenate(preds, axis=0)
tester._evaluate(preds, cfg.result_dir)
if __name__ == "__main__":
main()
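# Example invocation (arguments are illustrative, not fixed):
# python test.py --gpu 0 --epochs 20-24 --backbone LPSKI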
================================================
FILE: main/time.py
================================================
import torch
import argparse
from base import Transformer
def parse_args():
parser = argparse.ArgumentParser()
parser.add_argument('--gpu', type=str, dest='gpu_ids')
parser.add_argument('--joint', type=int, dest='joint')
parser.add_argument('--modelpath', type=str, dest='modelpath')
parser.add_argument('--backbone', type=str, dest='backbone')
args = parser.parse_args()
# test gpus
if not args.gpu_ids:
assert 0, "Please set proper gpu ids"
if '-' in args.gpu_ids:
gpus = args.gpu_ids.split('-')
gpus[0] = int(gpus[0])
gpus[1] = int(gpus[1]) + 1
args.gpu_ids = ','.join(map(lambda x: str(x), list(range(*gpus))))
return args
args = parse_args()
optimal_batch_size = 64
transformer = Transformer(args.backbone, args.joint, args.modelpath)
transformer._make_model()
model = transformer.model
device = torch.device("cuda")
dummy_input = torch.randn(optimal_batch_size, 3, 256, 256, dtype=torch.float).to(device)
repetitions=100
total_time = 0
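# Timing with CUDA events: record start/end events around each forward pass, synchronize,
# and accumulate the elapsed time in seconds (elapsed_time() returns milliseconds).
# The throughput printed below is images per second over all repetitions.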
with torch.no_grad():
for rep in range(repetitions):
starter, ender = torch.cuda.Event(enable_timing=True), torch.cuda.Event(enable_timing=True)
starter.record()
_ = model(dummy_input)
ender.record()
torch.cuda.synchronize()
curr_time = starter.elapsed_time(ender)/1000
total_time += curr_time
Throughput = (repetitions*optimal_batch_size)/total_time
print('Final Throughput:',Throughput)
================================================
FILE: main/train.py
================================================
import argparse
from config import cfg
from tqdm import tqdm
import os.path as osp
import numpy as np
import torch
from base import Trainer
from utils.pose_utils import flip
import torch.backends.cudnn as cudnn
def main():
# build the trainer and training log (all settings come from config.py; no CLI arguments are parsed here)
cudnn.fastest = True
cudnn.benchmark = True
trainer = Trainer(cfg)
trainer._make_batch_generator()
trainer._make_model()
# train
for epoch in range(trainer.start_epoch, cfg.end_epoch):
trainer.set_lr(epoch)
trainer.tot_timer.tic()
trainer.read_timer.tic()
for itr, (input_img, joint_img, joint_vis, joints_have_depth) in enumerate(trainer.batch_generator):
trainer.read_timer.toc()
trainer.gpu_timer.tic()
# forward
trainer.optimizer.zero_grad()
target = {'coord': joint_img, 'vis': joint_vis, 'have_depth': joints_have_depth}
loss_coord = trainer.model(input_img, target)
loss_coord = loss_coord.mean()
# backward
loss = loss_coord
loss.backward()
trainer.optimizer.step()
trainer.gpu_timer.toc()
screen = [
'Epoch %d/%d itr %d/%d:' % (epoch, cfg.end_epoch, itr, trainer.itr_per_epoch),
'lr: %g' % (trainer.get_lr()),
'speed: %.2f(%.2fs r%.2f)s/itr' % (
trainer.tot_timer.average_time, trainer.gpu_timer.average_time, trainer.read_timer.average_time),
'%.2fh/epoch' % (trainer.tot_timer.average_time / 3600. * trainer.itr_per_epoch),
'%s: %.4f' % ('loss_coord', loss_coord.detach()),
]
trainer.logger.info(' '.join(screen))
trainer.tot_timer.toc()
trainer.tot_timer.tic()
trainer.read_timer.tic()
trainer.save_model({
'epoch': epoch,
'network': trainer.model.state_dict(),
'optimizer': trainer.optimizer.state_dict(),
}, epoch)
if __name__ == "__main__":
main()
================================================
FILE: requirements.txt
================================================
numpy
tqdm
torch
torchvision
torchsummary
opencv-python
matplotlib
pycocotools
scipy
================================================
FILE: tool/Human36M/README.MD
================================================
## Human3.6M dataset pre-processing code
Run the MATLAB code first; the Python code then converts the MATLAB output into JSON files.
**You do not need to run this if you downloaded the JSON files from Google Drive.** This is only for building the JSON files from the raw data.
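A minimal example of the workflow (paths are placeholders): place `preprocess_h36m.m` in the Human3.6M `Release-v1.1` folder and run it in MATLAB, then set `root_dir`/`save_dir` in `h36m2coco.py` and run `python h36m2coco.py` to produce the per-subject `Human36M_subject*_data.json`, `*_camera.json`, and `*_joint_3d.json` files.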
================================================
FILE: tool/Human36M/h36m2coco.py
================================================
import os
import os.path as osp
import scipy.io as sio
import numpy as np
import cv2
import random
import json
import math
from tqdm import tqdm
root_dir = './images' # define path here
save_dir = './annotations' # define path here
joint_num = 17
subject_list = [1, 5, 6, 7, 8, 9, 11]
action_idx = (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)
subaction_idx = (1, 2)
camera_idx = (1, 2, 3, 4)
action_name = ['Directions', 'Discussion', 'Eating', 'Greeting', 'Phoning', 'Posing', 'Purchases', 'Sitting', 'SittingDown', 'Smoking', 'Photo', 'Waiting', 'Walking', 'WalkDog', 'WalkTogether']
def load_h36m_annot_file(annot_file):
data = sio.loadmat(annot_file)
joint_world = data['pose3d_world'] # 3D world coordinates of keypoints
R = data['R'] # extrinsic
T = np.reshape(data['T'],(3)) # extrinsic
f = np.reshape(data['f'],(-1)) # focal length
c = np.reshape(data['c'],(-1)) # principal points
img_heights = np.reshape(data['img_height'],(-1))
img_widths = np.reshape(data['img_width'],(-1))
return joint_world, R, T, f, c, img_widths, img_heights
def _H36FolderName(subject_id, act_id, subact_id, camera_id):
return "s_%02d_act_%02d_subact_%02d_ca_%02d" % \
(subject_id, act_id, subact_id, camera_id)
def _H36ImageName(folder_name, frame_id):
return "%s_%06d.jpg" % (folder_name, frame_id + 1)
def cam2pixel(cam_coord, f, c):
x = cam_coord[..., 0] / cam_coord[..., 2] * f[0] + c[0]
y = cam_coord[..., 1] / cam_coord[..., 2] * f[1] + c[1]
return x,y
def world2cam(world_coord, R, t):
cam_coord = np.dot(R, world_coord - t)
return cam_coord
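# get_bbox: tight box around the projected 2D joints, expanded by 20% around its center;
# returned as (xmin, ymin, width, height).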
def get_bbox(joint_img):
bbox = np.zeros((4))
xmin = np.min(joint_img[:,0])
ymin = np.min(joint_img[:,1])
xmax = np.max(joint_img[:,0])
ymax = np.max(joint_img[:,1])
width = xmax - xmin - 1
height = ymax - ymin - 1
bbox[0] = (xmin + xmax)/2. - width/2*1.2
bbox[1] = (ymin + ymax)/2. - height/2*1.2
bbox[2] = width*1.2
bbox[3] = height*1.2
return bbox
img_id = 0; annot_id = 0
for subject in tqdm(subject_list):
cam_param = {}
joint_3d = {}
images = []; annotations = [];
for aid in tqdm(action_idx):
for said in tqdm(subaction_idx):
for cid in tqdm(camera_idx):
folder = _H36FolderName(subject,aid,said,cid)
if folder == 's_11_act_02_subact_02_ca_01':
continue
joint_world, R, t, f, c, img_widths, img_heights = load_h36m_annot_file(osp.join(root_dir, folder, 'h36m_meta.mat'))
if str(aid) not in joint_3d:
joint_3d[str(aid)] = {}
if str(said) not in joint_3d[str(aid)]:
joint_3d[str(aid)][str(said)] = {}
img_num = np.shape(joint_world)[0]
for n in range(img_num):
img_dict = {}
img_dict['id'] = img_id
img_dict['file_name'] = osp.join(folder, _H36ImageName(folder, n))
img_dict['width'] = int(img_widths[n])
img_dict['height'] = int(img_heights[n])
img_dict['subject'] = subject
img_dict['action_name'] = action_name[aid-2]
img_dict['action_idx'] = aid
img_dict['subaction_idx'] = said
img_dict['cam_idx'] = cid
img_dict['frame_idx'] = n
images.append(img_dict)
if str(cid) not in cam_param:
cam_param[str(cid)] = {'R': R.tolist(), 't': t.tolist(), 'f': f.tolist(), 'c': c.tolist()}
if str(n) not in joint_3d[str(aid)][str(said)]:
joint_3d[str(aid)][str(said)][str(n)] = joint_world[n].tolist()
annot_dict = {}
annot_dict['id'] = annot_id
annot_dict['image_id'] = img_id
# project world coordinate to cam, image coordinate space
joint_cam = np.zeros((joint_num,3))
for j in range(joint_num):
joint_cam[j] = world2cam(joint_world[n][j], R, t)
joint_img = np.zeros((joint_num,2))
joint_img[:,0], joint_img[:,1] = cam2pixel(joint_cam, f, c)
joint_vis = (joint_img[:,0] >= 0) * (joint_img[:,0] < img_widths[n]) * (joint_img[:,1] >= 0) * (joint_img[:,1] < img_heights[n])
annot_dict['keypoints_vis'] = joint_vis.tolist()
bbox = get_bbox(joint_img)
annot_dict['bbox'] = bbox.tolist() # xmin, ymin, width, height
annotations.append(annot_dict)
img_id += 1
annot_id += 1
data = {'images': images, 'annotations': annotations}
with open(osp.join(save_dir, 'Human36M_subject' + str(subject) + '_data.json'), 'w') as f:
json.dump(data, f)
with open(osp.join(save_dir, 'Human36M_subject' + str(subject) + '_camera.json'), 'w') as f:
json.dump(cam_param, f)
with open(osp.join(save_dir, 'Human36M_subject' + str(subject) + '_joint_3d.json'), 'w') as f:
json.dump(joint_3d, f)
================================================
FILE: tool/Human36M/preprocess_h36m.m
================================================
% Preprocess human3.6m dataset
% Place this file to the Release-v1.1 folder and run it
function preprocess_h36m()
close all;
%clear;
%clc;
addpaths;
%--------------------------------------------------------------------------
% PARAMETERS
% Subject (1, 5, 6, 7, 8, 9, 11)
SUBJECT = [1 5 6 7 8 9 11];
% Action (2 ~ 16)
ACTION = 2:16;
% Subaction (1 ~ 2)
SUBACTION = 1:2;
% Camera (1 ~ 4)
CAMERA = 1:4;
num_joint = 17;
root_dir = '.'; % define path here
% If the rgb sequence is declared inside the loop, processing gets stuck (reason unknown)
rgb_sequence = cell(1,100000000);
COUNT = 1;
%--------------------------------------------------------------------------
% MAIN LOOP
% For each subject, action, subaction, and camera..
for subject = SUBJECT
for action = ACTION
for subaction = SUBACTION
for camera = CAMERA
fprintf('Processing subject %d, action %d, subaction %d, camera %d..\n', ...
subject, action, subaction, camera);
img_save_dir = sprintf('%s/images/s_%02d_act_%02d_subact_%02d_ca_%02d', ...
root_dir, subject, action, subaction, camera);
if ~exist(img_save_dir, 'dir')
mkdir(img_save_dir);
end
mask_save_dir = sprintf('%s/masks/s_%02d_act_%02d_subact_%02d_ca_%02d', ...
root_dir, subject, action, subaction, camera);
if ~exist(mask_save_dir, 'dir')
mkdir(mask_save_dir);
end
annot_save_dir = sprintf('%s/annotations/s_%02d_act_%02d_subact_%02d_ca_%02d', ...
root_dir, subject, action, subaction, camera);
if ~exist(annot_save_dir, 'dir')
mkdir(annot_save_dir);
end
if (subject==11) && (action==2) && (subaction==2) && (camera==1)
fprintf('There is an error in subject 11, action 2, subaction 2, and camera 1\n');
continue;
end
% Select sequence
Sequence = H36MSequence(subject, action, subaction, camera);
% Get 3D pose and 2D pose
Features{1} = H36MPose3DPositionsFeature(); % 3D world coordinates
Features{1}.Part = 'body'; % Only consider 17 joints
Features{2} = H36MPose3DPositionsFeature('Monocular', true); % 3D camera coordinates
Features{2}.Part = 'body'; % Only consider 17 joints
Features{3} = H36MPose2DPositionsFeature(); % 2D image coordinates
Features{3}.Part = 'body'; % Only consider 17 joints
F = H36MComputeFeatures(Sequence, Features);
num_frame = Sequence.NumFrames;
pose3d_world = reshape(F{1}, num_frame, 3, num_joint);
pose3d = reshape(F{2}, num_frame, 3, num_joint);
pose2d = reshape(F{3}, num_frame, 2, num_joint);
% Camera (in global coordinate)
Camera = Sequence.getCamera();
% Sanity check
if false
R = Camera.R; % rotation matrix
T = Camera.T'; % origin of the world coord system
K = [Camera.f(1) 0 Camera.c(1);
0 Camera.f(2) Camera.c(2);
0 0 1]; % f: focal length, c: principal points
error = 0;
for i = 1:num_frame
X = squeeze(pose3d_world(i,:,:)); % pose3d_global was undefined; the world coordinates are stored in pose3d_world
x = squeeze(pose2d(i,:,:));
px = K*R*(X-T);
px = px ./ px(3,:);
px = px(1:2,:);
error = error + mean(sqrt(sum((px-x).^2, 1)));
end
error = error / num_frame;
fprintf('reprojection error = %.2f (pixels)\n', error);
keyboard;
end
%% Image, bounding box for each sampled frame
fprintf('Load RGB video: ');
rgb_extractor = H36MRGBVideoFeature();
rgb_sequence{COUNT} = rgb_extractor.serializer(Sequence);
fprintf('Done!!\n');
img_height = zeros(num_frame,1);
img_width = zeros(num_frame,1);
fprintf('Load mask video: ');
mask_extractor = H36MMyBGMask();
mask_sequence = mask_extractor.serializer(Sequence);
fprintf('Done!!\n');
% For each frame,
for i = 1:num_frame
if mod(i,100) == 1
fprintf('.');
end
% Save image
% Get data
img = rgb_sequence{COUNT}.getFrame(i);
[h, w, c] = size(img);
img_height(i) = h;
img_width(i) = w;
img_name = sprintf('%s/s_%02d_act_%02d_subact_%02d_ca_%02d_%06d.jpg', ...
img_save_dir, subject, action, subaction, camera, i);
%imwrite(img, img_name);
mask = mask_sequence.Buffer{i};
mask_name = sprintf('%s/s_%02d_act_%02d_subact_%02d_ca_%02d_%06d.jpg', ...
mask_save_dir, subject, action, subaction, camera, i);
imwrite(mask, mask_name);
end
COUNT = COUNT + 1;
% Save data
pose3d_world = permute(pose3d_world,[1,3,2]); % world coordinate 3D keypoint coordinates
R = Camera.R; % rotation matrix
T = Camera.T; % origin of the world coord system
f = Camera.f; % focal length
c = Camera.c; % principal points
filename = sprintf('%s/h36m_meta.mat', annot_save_dir);
%save(filename, 'pose3d_world', 'f', 'c', 'R', 'T', 'img_height', 'img_width');
fprintf('\n');
end
end
end
end
end
================================================
FILE: vis/coco_img_name.py
================================================
import os
import os.path as osp
import scipy.io as sio
import numpy as np
from pycocotools.coco import COCO
import json
import cv2
import random
import math
annot_path = osp.join('coco', 'person_keypoints_val2017.json')
data = []
db = COCO(annot_path)
fp = open('coco_img_name.txt','w')
for iid in db.imgs.keys():
img = db.imgs[iid]
imgname = img['file_name']
imgname = 'coco_' + imgname.split('.')[0]
fp.write(imgname + '\n')
fp.close()
================================================
FILE: vis/multi/draw_2Dskeleton.m
================================================
function img = draw_2Dskeleton(img_name, pred_2d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton)
img = imread(img_name);
[imgHeight, imgWidth, dim] = size(img); % size was taken of the undefined name "image"; use the loaded img
f = figure;
set(f, 'visible', 'off');
imshow(img);
hold on;
line_width = 4;
num_skeleton = size(skeleton,1);
num_pred = size(pred_2d_kpt,1);
for i = 1:num_pred
for j =1:num_skeleton
k1 = skeleton(j,1);
k2 = skeleton(j,2);
plot([pred_2d_kpt(i,k1,1),pred_2d_kpt(i,k2,1)],[pred_2d_kpt(i,k1,2),pred_2d_kpt(i,k2,2)],'Color',colorList_skeleton(j,:),'LineWidth',line_width);
end
for j=1:num_joint
scatter(pred_2d_kpt(i,j,1),pred_2d_kpt(i,j,2),100,colorList_joint(j,:),'filled');
end
end
set(gca,'Units','normalized','Position',[0 0 1 1]); %# Modify axes size
frame = getframe(gcf);
img = frame.cdata;
hold off;
close(f);
end
================================================
FILE: vis/multi/draw_3Dpose_coco.m
================================================
function draw_3Dpose_coco()
root_path = '/mnt/hdd1/Data/Human_pose_estimation/COCO/2017/val2017/';
save_path = './vis/';
num_joint = 17;
colorList_skeleton = [
255/255 128/255 0/255;
255/255 153/255 51/255;
255/255 178/255 102/255;
230/255 230/255 0/255;
255/255 153/255 255/255;
153/255 204/255 255/255;
255/255 102/255 255/255;
255/255 51/255 255/255;
102/255 178/255 255/255;
51/255 153/255 255/255;
255/255 153/255 153/255;
255/255 102/255 102/255;
255/255 51/255 51/255;
153/255 255/255 153/255;
102/255 255/255 102/255;
51/255 255/255 51/255;
];
colorList_joint = [
255/255 128/255 0/255;
255/255 153/255 51/255;
255/255 153/255 153/255;
255/255 102/255 102/255;
255/255 51/255 51/255;
153/255 255/255 153/255;
102/255 255/255 102/255;
51/255 255/255 51/255;
255/255 153/255 255/255;
255/255 102/255 255/255;
255/255 51/255 255/255;
153/255 204/255 255/255;
102/255 178/255 255/255;
51/255 153/255 255/255;
230/255 230/255 0/255;
230/255 230/255 0/255;
255/255 178/255 102/255;
];
skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ];
skeleton = transpose(reshape(skeleton,[2,16])) + 1;
fp_img_name = fopen('../coco_img_name.txt');
preds_2d_kpt = load('preds_2d_kpt_coco.mat');
preds_3d_kpt = load('preds_3d_kpt_coco.mat');
img_name = fgetl(fp_img_name);
while ischar(img_name)
if isfield(preds_2d_kpt,img_name)
pred_2d_kpt = getfield(preds_2d_kpt,img_name);
pred_3d_kpt = getfield(preds_3d_kpt,img_name);
img_name = strsplit(img_name,'_');
img_name = strcat(img_name{2},'.jpg');
img_path = strcat(root_path,img_name);
%img = draw_2Dskeleton(img_path,pred_2d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton);
img = imread(img_path);
f = draw_3Dskeleton(img,pred_3d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton);
set(gcf, 'InvertHardCopy', 'off');
set(gcf,'color','w');
mkdir(save_path);
saveas(f, strcat(save_path,img_name));
close(f);
end
img_name = fgetl(fp_img_name);
end
end
================================================
FILE: vis/multi/draw_3Dpose_mupots.m
================================================
function draw_3Dpose_mupots()
root_path = '/mnt/hdd1/Data/Human_pose_estimation/MU/mupots-3d-eval/MultiPersonTestSet/';
save_path = './vis/';
num_joint = 17;
colorList_skeleton = [
255/255 128/255 0/255;
255/255 153/255 51/255;
255/255 178/255 102/255;
230/255 230/255 0/255;
255/255 153/255 255/255;
153/255 204/255 255/255;
255/255 102/255 255/255;
255/255 51/255 255/255;
102/255 178/255 255/255;
51/255 153/255 255/255;
255/255 153/255 153/255;
255/255 102/255 102/255;
255/255 51/255 51/255;
153/255 255/255 153/255;
102/255 255/255 102/255;
51/255 255/255 51/255;
];
colorList_joint = [
255/255 128/255 0/255;
255/255 153/255 51/255;
255/255 153/255 153/255;
255/255 102/255 102/255;
255/255 51/255 51/255;
153/255 255/255 153/255;
102/255 255/255 102/255;
51/255 255/255 51/255;
255/255 153/255 255/255;
255/255 102/255 255/255;
255/255 51/255 255/255;
153/255 204/255 255/255;
102/255 178/255 255/255;
51/255 153/255 255/255;
230/255 230/255 0/255;
230/255 230/255 0/255;
255/255 178/255 102/255;
];
skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ];
skeleton = transpose(reshape(skeleton,[2,16])) + 1;
fp_img_name = fopen('../mupots_img_name.txt');
preds_2d_kpt = load('preds_2d_kpt_mupots.mat');
preds_3d_kpt = load('preds_3d_kpt_mupots.mat');
img_name = fgetl(fp_img_name);
while ischar(img_name)
img_name_split = strsplit(img_name);
folder_id = str2double(img_name_split(1)); frame_id = str2double(img_name_split(2));
img_name = sprintf('TS%d/img_%06d.jpg',folder_id, frame_id);
img_path = strcat(root_path,img_name);
pred_2d_kpt = getfield(preds_2d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id));
pred_3d_kpt = getfield(preds_3d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id));
%img = draw_2Dskeleton(img_path,pred_2d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton);
img = imread(img_path);
f = draw_3Dskeleton(img,pred_3d_kpt,num_joint,skeleton,colorList_joint,colorList_skeleton);
set(gcf, 'InvertHardCopy', 'off');
set(gcf,'color','w');
mkdir(strcat(save_path,sprintf('TS%d',folder_id)));
saveas(f, strcat(save_path,img_name));
close(f);
img_name = fgetl(fp_img_name);
end
end
================================================
FILE: vis/multi/draw_3Dskeleton.m
================================================
function f = draw_3Dskeleton(img, pred_3d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton)
x = pred_3d_kpt(:,:,1);
y = pred_3d_kpt(:,:,2);
z = pred_3d_kpt(:,:,3);
pred_3d_kpt(:,:,1) = -z;
pred_3d_kpt(:,:,2) = x;
pred_3d_kpt(:,:,3) = -y;
[imgHeight, imgWidth, dim] = size(img);
figure_height = 450;
figure_width = figure_height / imgHeight * imgWidth;
f = figure('Position',[100 100 figure_width figure_height]);
set(f, 'visible', 'off');
hold on;
grid on;
line_width = 4;
point_width = 50;
num_skeleton = size(skeleton,1);
num_pred = size(pred_3d_kpt,1);
for i = 1:num_pred
for j =1:num_skeleton
k1 = skeleton(j,1);
k2 = skeleton(j,2);
plot3([pred_3d_kpt(i,k1,1),pred_3d_kpt(i,k2,1)],[pred_3d_kpt(i,k1,2),pred_3d_kpt(i,k2,2)],[pred_3d_kpt(i,k1,3),pred_3d_kpt(i,k2,3)],'Color',colorList_skeleton(j,:),'LineWidth',line_width);
end
for j=1:num_joint
scatter3(pred_3d_kpt(i,j,1),pred_3d_kpt(i,j,2),pred_3d_kpt(i,j,3),point_width,colorList_joint(j,:),'filled');
end
end
set(gca, 'color', [255/255 255/255 255/255]);
set(gca,'XTickLabel',[]);
set(gca,'YTickLabel',[]);
set(gca,'ZTickLabel',[]);
x = pred_3d_kpt(:,:,1);
xmin = min(x(:)) - 120000;
xmax = max(x(:)) + 6000;
y = pred_3d_kpt(:,:,2);
ymin = min(y(:));
ymax = max(y(:));
z = pred_3d_kpt(:,:,3);
zmin = min(z(:));
zmax = max(z(:));
xlim([xmin xmax]);
ylim([ymin ymax]);
zlim([zmin zmax]);
h_img = surf([xmin;xmin],[ymin ymax;ymin ymax],[zmax zmax;zmin zmin],'CData',img,'FaceColor','texturemap');
set(h_img);
view(62,27);
end
================================================
FILE: vis/mupots_img_name.py
================================================
import os
import os.path as osp
import scipy.io as sio
import numpy as np
from pycocotools.coco import COCO
import json
import cv2
import random
import math
annot_path = osp.join('mupots', 'MuPoTS-3D.json')
data = []
db = COCO(annot_path)
fp = open('mupots_img_name.txt','w')
for iid in db.imgs.keys():
img = db.imgs[iid]
imgname = img['file_name'].split('/')
folder_id = int(imgname[0][2:])
frame_id = int(imgname[1].split('.')[0][4:])
fp.write(str(folder_id) + ' ' + str(frame_id) + '\n')
fp.close()
================================================
FILE: vis/single/draw_2Dskeleton.m
================================================
function img = draw_2Dskeleton(img_name, pred_2d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton)
img = imread(img_name);
pred_2d_kpt = squeeze(pred_2d_kpt);
f = figure;
set(f, 'visible', 'off');
imshow(img);
hold on;
line_width = 4;
num_skeleton = size(skeleton,1);
for j =1:num_skeleton
k1 = skeleton(j,1);
k2 = skeleton(j,2);
plot([pred_2d_kpt(k1,1),pred_2d_kpt(k2,1)],[pred_2d_kpt(k1,2),pred_2d_kpt(k2,2)],'Color',colorList_skeleton(j,:),'LineWidth',line_width);
end
for j=1:num_joint
scatter(pred_2d_kpt(j,1),pred_2d_kpt(j,2),100,colorList_joint(j,:),'filled');
end
set(gca,'Units','normalized','Position',[0 0 1 1]); %# Modify axes size
frame = getframe(gcf);
img = frame.cdata;
hold off;
close(f);
end
================================================
FILE: vis/single/draw_3Dpose_coco.m
================================================
function draw_3Dpose_coco()
root_path = '/mnt/hdd1/Data/Human_pose_estimation/COCO/2017/val2017/';
save_path = './vis/';
num_joint = 17;
mkdir(save_path);
colorList_skeleton = [
255/255 128/255 0/255;
255/255 153/255 51/255;
255/255 178/255 102/255;
230/255 230/255 0/255;
255/255 153/255 255/255;
153/255 204/255 255/255;
255/255 102/255 255/255;
255/255 51/255 255/255;
102/255 178/255 255/255;
51/255 153/255 255/255;
255/255 153/255 153/255;
255/255 102/255 102/255;
255/255 51/255 51/255;
153/255 255/255 153/255;
102/255 255/255 102/255;
51/255 255/255 51/255;
];
colorList_joint = [
255/255 128/255 0/255;
255/255 153/255 51/255;
255/255 153/255 153/255;
255/255 102/255 102/255;
255/255 51/255 51/255;
153/255 255/255 153/255;
102/255 255/255 102/255;
51/255 255/255 51/255;
255/255 153/255 255/255;
255/255 102/255 255/255;
255/255 51/255 255/255;
153/255 204/255 255/255;
102/255 178/255 255/255;
51/255 153/255 255/255;
230/255 230/255 0/255;
230/255 230/255 0/255;
255/255 178/255 102/255;
];
skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ];
skeleton = transpose(reshape(skeleton,[2,16])) + 1;
fp_img_name = fopen('../coco_img_name.txt');
preds_2d_kpt = load('preds_2d_kpt_coco.mat');
preds_3d_kpt = load('preds_3d_kpt_coco.mat');
img_name = fgetl(fp_img_name);
while ischar(img_name)
if isfield(preds_2d_kpt,img_name)
pred_2d_kpt = getfield(preds_2d_kpt,img_name);
pred_3d_kpt = getfield(preds_3d_kpt,img_name);
img_name = strsplit(img_name,'_');
img_name = strcat(img_name{2},'.jpg');
img_path = strcat(root_path,img_name);
num_pred = size(pred_2d_kpt,1);
for i = 1:num_pred
img = draw_2Dskeleton(img_path,pred_2d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton);
save_name = strsplit(img_name,'.');
save_name = save_name{1};
save_name = strcat(save_name,sprintf('_%d_2d.jpg',i));
disp(strcat(save_path,save_name));
imwrite(img,strcat(save_path,save_name));
f = draw_3Dskeleton(pred_3d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton);
set(gcf, 'InvertHardCopy', 'off');
set(gcf,'color','w');
save_name = strsplit(img_name,'.');
save_name = save_name{1};
save_name = strcat(save_name,sprintf('_%d_3d.jpg',i));
saveas(f, strcat(save_path,save_name));
close(f);
end
end
img_name = fgetl(fp_img_name);
end
end
================================================
FILE: vis/single/draw_3Dpose_mupots.m
================================================
function draw_3Dpose_mupots()
root_path = '/mnt/hdd1/Data/Human_pose_estimation/MU/mupots-3d-eval/MultiPersonTestSet/';
save_path = './vis/';
num_joint = 17;
colorList_skeleton = [
255/255 128/255 0/255;
255/255 153/255 51/255;
255/255 178/255 102/255;
230/255 230/255 0/255;
255/255 153/255 255/255;
153/255 204/255 255/255;
255/255 102/255 255/255;
255/255 51/255 255/255;
102/255 178/255 255/255;
51/255 153/255 255/255;
255/255 153/255 153/255;
255/255 102/255 102/255;
255/255 51/255 51/255;
153/255 255/255 153/255;
102/255 255/255 102/255;
51/255 255/255 51/255;
];
colorList_joint = [
255/255 128/255 0/255;
255/255 153/255 51/255;
255/255 153/255 153/255;
255/255 102/255 102/255;
255/255 51/255 51/255;
153/255 255/255 153/255;
102/255 255/255 102/255;
51/255 255/255 51/255;
255/255 153/255 255/255;
255/255 102/255 255/255;
255/255 51/255 255/255;
153/255 204/255 255/255;
102/255 178/255 255/255;
51/255 153/255 255/255;
230/255 230/255 0/255;
230/255 230/255 0/255;
255/255 178/255 102/255;
];
skeleton = [ [0, 16], [1, 16], [1, 15], [15, 14], [14, 8], [14, 11], [8, 9], [9, 10], [11, 12], [12, 13], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7] ];
skeleton = transpose(reshape(skeleton,[2,16])) + 1;
fp_img_name = fopen('../mupots_img_name.txt');
preds_2d_kpt = load('preds_2d_kpt_mupots.mat');
preds_3d_kpt = load('preds_3d_kpt_mupots.mat');
img_name = fgetl(fp_img_name);
while ischar(img_name)
img_name_split = strsplit(img_name);
folder_id = str2double(img_name_split(1)); frame_id = str2double(img_name_split(2));
img_name = sprintf('TS%d/img_%06d.jpg',folder_id, frame_id);
img_path = strcat(root_path,img_name);
mkdir(strcat(save_path,sprintf('TS%d',folder_id)));
pred_2d_kpt = getfield(preds_2d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id));
pred_3d_kpt = getfield(preds_3d_kpt,sprintf('TS%d_img_%06d',folder_id, frame_id));
num_pred = size(pred_2d_kpt,1);
for i = 1:num_pred
img = draw_2Dskeleton(img_path,pred_2d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton);
save_name = sprintf('TS%d/img_%06d_%d_2d.jpg',folder_id, frame_id, i);
imwrite(img,strcat(save_path,save_name));
f = draw_3Dskeleton(pred_3d_kpt(i,:,:),num_joint,skeleton,colorList_joint,colorList_skeleton);
set(gcf, 'InvertHardCopy', 'off');
set(gcf,'color','w');
save_name = sprintf('TS%d/img_%06d_%d_3d.jpg',folder_id, frame_id, i);
saveas(f, strcat(save_path,save_name));
close(f);
end
img_name = fgetl(fp_img_name);
end
end
================================================
FILE: vis/single/draw_3Dskeleton.m
================================================
function f = draw_3Dskeleton(pred_3d_kpt, num_joint, skeleton, colorList_joint, colorList_skeleton)
pred_3d_kpt = squeeze(pred_3d_kpt);
x = pred_3d_kpt(:,1);
y = pred_3d_kpt(:,2);
z = pred_3d_kpt(:,3);
pred_3d_kpt(:,1) = -z;
pred_3d_kpt(:,2) = x;
pred_3d_kpt(:,3) = -y;
f = figure;%('Position',[100 100 600 600]);
set(f, 'visible', 'off');
hold on;
grid on;
line_width = 6;
num_skeleton = size(skeleton,1);
for j =1:num_skeleton
k1 = skeleton(j,1);
k2 = skeleton(j,2);
plot3([pred_3d_kpt(k1,1),pred_3d_kpt(k2,1)],[pred_3d_kpt(k1,2),pred_3d_kpt(k2,2)],[pred_3d_kpt(k1,3),pred_3d_kpt(k2,3)],'Color',colorList_skeleton(j,:),'LineWidth',line_width);
end
for j=1:num_joint
scatter3(pred_3d_kpt(j,1),pred_3d_kpt(j,2),pred_3d_kpt(j,3),100,colorList_joint(j,:),'filled');
end
set(gca, 'color', [255/255 255/255 255/255]);
set(gca,'XTickLabel',[]);
set(gca,'YTickLabel',[]);
set(gca,'ZTickLabel',[]);
x = pred_3d_kpt(:,1);
xmin = min(x(:)) - 100;
xmax = max(x(:)) + 100;
y = pred_3d_kpt(:,2);
ymin = min(y(:)) - 100;
ymax = max(y(:)) + 100;
z = pred_3d_kpt(:,3);
zmin = min(z(:));
zmax = max(z(:)) + 100;
xcenter = mean(pred_3d_kpt(:,1));
ycenter = mean(pred_3d_kpt(:,2));
zcenter = mean(pred_3d_kpt(:,3));
xmin = xcenter - 1000;
xmax = xcenter + 1000;
ymin = ycenter - 1000;
ymax = ycenter + 1000;
zmin = zcenter - 1000;
zmax = zcenter + 1000;
xlim([xmin xmax]);
ylim([ymin ymax]);
zlim([zmin zmax]);
view(62,7);
end