Repository: fengju514/Face-Pose-Net
Branch: master
Commit: 088bba25a170
Files: 34
Total size: 187.5 KB
Directory structure:
gitextract_nt_altqw/
├── .gitmodules
├── BFM/
│ └── README
├── README.md
├── ResNet/
│ └── ThreeDMM_shape.py
├── get_Rts.py
├── input.csv
├── input_list.txt
├── input_samples/
│ └── README
├── kaffe/
│ ├── __init__.py
│ ├── errors.py
│ ├── graph.py
│ ├── layers.py
│ ├── shapes.py
│ ├── tensorflow/
│ │ ├── __init__.py
│ │ └── network_shape.py
│ └── transformers.py
├── main_fpn.py
├── main_predict_6DoF.py
├── main_predict_ProjMat.py
├── models/
│ └── README
├── myparse.py
├── output_render/
│ └── README.md
├── pose_model.py
├── pose_utils.py
├── renderer_fpn.py
├── tf_utils.py
├── train_stats/
│ ├── 3DMM_shape_mean.npy
│ ├── README
│ ├── train_label_mean_300WLP.npy
│ ├── train_label_mean_ProjMat.npy
│ ├── train_label_std_300WLP.npy
│ └── train_label_std_ProjMat.npy
└── utils/
├── README
└── pose_utils.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitmodules
================================================
[submodule "face_renderer"]
path = face_renderer
url = https://github.com/iacopomasi/face_specific_augm/
================================================
FILE: BFM/README
================================================
================================================
FILE: README.md
================================================
# Face-Pose-Net

<sub>**Extreme face alignment examples:** Faces rendered to a 45-degree yaw angle (aligned to half profile) using our FacePoseNet. Images were taken from the IJB-A collection and represent extreme viewing conditions, including near-profile views, occlusions, and low resolution. Such conditions are often too hard for existing face landmark detection methods to handle, yet they are easily aligned with our FacePoseNet.</sub>
<br/>
<br/>
This page contains a DCNN model and Python code to robustly estimate the 6 degrees of freedom of 3D face pose from an unconstrained image, without the use of face landmark detectors. The method is described in the paper:
_F.-J. Chang, A. Tran, T. Hassner, I. Masi, R. Nevatia, G. Medioni, "[FacePoseNet: Making a Case for Landmark-Free Face Alignment](https://arxiv.org/abs/1708.07517)", in 7th IEEE International Workshop on Analysis and Modeling of Faces and Gestures, ICCV Workshops, 2017_ [1].
This release bundles up our **FacePoseNet** (FPN) with the **Face Renderer** from Masi _et al._ [2,5], which is available separately from [this project page](https://github.com/iacopomasi/face_specific_augm).
The result is an end-to-end pipeline that seamlessly estimates facial pose and produces multiple rendered views to be used for face alignment and data augmentation.

## Updates (Modified and New features, 12/20/2018)
* FPN structure is changed to ResNet-101 for better pose prediction [fpn-resnet101](./ResNet/ThreeDMM_shape.py)
* **Two versions of FPNs (under the assumption of weak perspective transformation) are added**:
* (1) **Predict 6DoF head pose** (scale, pitch, yaw, roll, translation_x, translation_y): [main_predict_6DoF.py](./main_predict_6DoF.py)
* (2) **Predict 11 parameters of the 3x4 projection matrix**: [main_predict_ProjMat.py](./main_predict_ProjMat.py)
* The code to convert a 6DoF head pose to a 3x4 projection matrix is [here](https://github.com/fengju514/Face-Pose-Net/blob/fb733f358d9f633f6525a41f3a7a0a99e5c71647/main_predict_6DoF.py#L263-L268)
* The code to convert the 11 parameters / 3x4 projection matrix to a 6DoF head pose is [here](https://github.com/fengju514/Face-Pose-Net/blob/92bd65fa056d17065890e186ca2f2b376a5ab135/main_predict_ProjMat.py#L306-L308)
* The corresponding 3D shape and landmarks can be obtained from the predicted 6DoF head pose [3D shape from 6DoF](https://github.com/fengju514/Face-Pose-Net/blob/92bd65fa056d17065890e186ca2f2b376a5ab135/main_predict_6DoF.py#L271-L297) or from the predicted 11 parameters [3D shape from 11 parameters](https://github.com/fengju514/Face-Pose-Net/blob/92bd65fa056d17065890e186ca2f2b376a5ab135/main_predict_ProjMat.py#L272-L297)
* Download new FPN models: Please put all model files [here](https://www.dropbox.com/sh/lr9u4my1qrhmgik/AADQVUIHSJIUXqUAj1AoZMIGa?dl=0) in the folder `models`
* Download BFM models: Please put BFM shape and expression files [here](https://www.dropbox.com/sh/ru7ierl9516a9az/AABTP9hJj3dJnapicFFgHmOna?dl=0) in the folder `BFM`
* Run new FPN to predict 6DoF head pose:
```bash
$ python main_predict_6DoF.py <gpu_id> <input-list-path>
```
* Run new FPN to predict 11DoF parameters of the projection matrix:
```bash
$ python main_predict_ProjMat.py <gpu_id> <input-list-path>
```
We provide a sample input list available [here](./input_list.txt).
```bash
<FILE_NAME, FACE_X, FACE_Y, FACE_WIDTH, FACE_HEIGHT>
```
where `<FACE_X, FACE_Y, FACE_WIDTH, FACE_HEIGHT>` are the x,y coordinates of the upper-left corner, the width, and the height of the tight face bounding box, obtained manually, from a face detector, or from a landmark detector. The predicted 6DoF and 11DoF results are saved in the [output_6DoF folder](https://github.com/fengju514/Face-Pose-Net/blob/a7923b764f92892021297fd046065c22a41dc519/main_predict_6DoF.py#L232-L236) and the [output_ProjMat folder](https://github.com/fengju514/Face-Pose-Net/blob/a7923b764f92892021297fd046065c22a41dc519/main_predict_ProjMat.py#L235-L239) respectively. The output 3D shapes and landmarks from 6DoF and 11DoF are saved in the [output_6DoF folder](https://github.com/fengju514/Face-Pose-Net/blob/a7923b764f92892021297fd046065c22a41dc519/main_predict_6DoF.py#L301) and the [output_ProjMat folder](https://github.com/fengju514/Face-Pose-Net/blob/a7923b764f92892021297fd046065c22a41dc519/main_predict_ProjMat.py#L301) respectively. You can visualize the 3D shapes and landmarks via Matlab.
* The same renderer can be used. Instead of feeding in the 6DoF pose, feed in the predicted landmarks, obtained either from the 6DoF head pose or from the 3x4 projection matrix. See the example in demo.py of [this project page](https://github.com/iacopomasi/face_specific_augm)
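Under the weak-perspective assumption above, the 6DoF-to-projection-matrix conversion can be sketched roughly as follows. This is an illustrative sketch, not the repository's code: the exact Euler-angle convention and normalization live in main_predict_6DoF.py, and the function names and the Rx·Ry·Rz composition order here are assumptions.

```python
import numpy as np

def euler_to_rotation(pitch, yaw, roll):
    # Rotations about x (pitch), y (yaw), z (roll), angles in radians.
    # The composition order Rx @ Ry @ Rz is an assumed convention.
    cx, sx = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cz, sz = np.cos(roll), np.sin(roll)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def sixdof_to_projmat(scale, pitch, yaw, roll, tx, ty):
    # Weak-perspective 3x4 projection: a uniform scale times the rotation,
    # plus an in-plane translation; depth translation is unobservable.
    R = euler_to_rotation(pitch, yaw, roll)
    P = np.zeros((3, 4))
    P[:, :3] = scale * R
    P[0, 3], P[1, 3] = tx, ty
    return P
```

With zero rotation the matrix reduces to a scaled identity plus the 2D translation in the last column, so projecting a homogeneous 3D point gives `scale * (x, y) + (tx, ty)` in image coordinates.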
## Features
* **6DoF 3D Head Pose estimation** + **3D rendered facial views**.
* Does not use **fragile** landmark detectors
* Robust on images landmark detectors struggle with (low resolution, occlusions, etc.)
* Extremely fast pose estimation
* Both CPU and GPU supported
* Provides better face recognition through better face alignment than alignment using state-of-the-art landmark detectors [1]
## Dependencies
* [TensorFlow](https://www.tensorflow.org/)
* [OpenCV Python Wrapper](http://opencv.org/)
* [Numpy](http://www.numpy.org/)
* [Python 2.7](https://www.python.org/download/releases/2.7/)
The code has been tested on Linux only. On Linux you can either rely on the system Python and install the required packages from your package manager, or use Anaconda Python and install the required packages through `conda`.
**Note:** no landmarks are used in our method, although you can still project the landmarks on the input image using the estimated pose. See the paper for further details.
## Usage
* **Important:** In order to download **both** FPN code and the renderer use `git clone --recursive`
* **Important:** Please download the learned models from https://www.dropbox.com/s/r38psbq55y2yj4f/fpn_new_model.tar.gz?dl=0 and make sure that the FPN models are stored in the folder `fpn_new_model`.
### Run it
The alignment and rendering can be run from the command line in the following ways.
To run it directly on a list of images (the software will run FPN to estimate the pose and then render novel views based on the estimated pose):
```bash
$ python main_fpn.py <input-list-path>
```
We provide a sample input list available [here](input.csv).
```bash
<ID, FILE, FACE_X, FACE_Y, FACE_WIDTH, FACE_HEIGHT>
```
where `<FACE_X, FACE_Y, FACE_WIDTH, FACE_HEIGHT>` is the face bounding box, obtained either manually or from a face detector.
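For reference, a minimal Python 3 sketch of reading this CSV is shown below. The helper name `load_input_boxes` is hypothetical and the repository's own loader may differ (the codebase itself targets Python 2.7).

```python
import csv

def load_input_boxes(csv_path):
    # Parse rows of the form: ID,FILE,FACE_X,FACE_Y,FACE_WIDTH,FACE_HEIGHT
    boxes = []
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            boxes.append((row['ID'], row['FILE'],
                          float(row['FACE_X']), float(row['FACE_Y']),
                          float(row['FACE_WIDTH']), float(row['FACE_HEIGHT'])))
    return boxes
```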
## Sample Results
Please see the input images [here](images) and rendered outputs [here](output_render).
### input: ###

### rendering: ###

## Current Limitations
FPN is currently trained with a single generic 3D shape and does not account for facial expressions. Addressing these limitations is planned as future work.
## Citation
Please cite our paper with the following bibtex if you use our code:
```latex
@inproceedings{chang17fpn,
title={{F}ace{P}ose{N}et: Making a Case for Landmark-Free Face Alignment},
booktitle = {7th IEEE International Workshop on Analysis and Modeling of Faces and Gestures, ICCV Workshops},
author={
Feng-ju Chang
and Anh Tran
and Tal Hassner
and Iacopo Masi
and Ram Nevatia
and G\'{e}rard Medioni},
year={2017},
}
```
## References
[1] F.-J. Chang, A. Tran, T. Hassner, I. Masi, R. Nevatia, G. Medioni, "[FacePoseNet: Making a Case for Landmark-Free Face Alignment](https://arxiv.org/abs/1708.07517)", in 7th IEEE International Workshop on Analysis and Modeling of Faces and Gestures, ICCV Workshops, 2017
[2] I. Masi\*, A. Tran\*, T. Hassner\*, J. Leksut, G. Medioni, "Do We Really Need to Collect Millions of Faces for Effective Face Recognition?", ECCV 2016.
\* denotes equal authorship
[3] I. Masi, S. Rawls, G. Medioni, P. Natarajan "Pose-Aware Face Recognition in the Wild", CVPR 2016
[4] T. Hassner, S. Harel, E. Paz and R. Enbar "Effective Face Frontalization in Unconstrained Images", CVPR 2015
[5] I. Masi, T. Hassner, A. Tran, and G. Medioni, "Rapid Synthesis of Massive Face Sets for Improved Face Recognition", FG 2017
## Changelog
- August 2017, First Release
## Disclaimer
_The SOFTWARE PACKAGE provided in this page is provided "as is", without any guarantee made as to its suitability or fitness for any particular use. It may contain bugs, so use of this tool is at your own risk. We take no responsibility for any damage of any sort that may unintentionally be caused through its use._
## Contacts
If you have any questions, drop an email to _fengjuch@usc.edu_, _anhttran@usc.edu_, _iacopo.masi@usc.edu_ or _hassner@isi.edu_ or leave a message below with GitHub (log-in is needed).
================================================
FILE: ResNet/ThreeDMM_shape.py
================================================
import sys
sys.path.append('./kaffe')
sys.path.append('./kaffe/tensorflow')
#from kaffe.tensorflow.network_allNonTrain import Network
from network_shape import Network_Shape
class ResNet_101(Network_Shape):
def setup(self):
(self.feed('input')
.conv(7, 7, 64, 2, 2, biased=False, relu=False, name='conv1')
.batch_normalization(relu=True, name='bn_conv1')
.max_pool(3, 3, 2, 2, name='pool1')
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch1')
.batch_normalization(name='bn2a_branch1'))
(self.feed('pool1')
.conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2a_branch2a')
.batch_normalization(relu=True, name='bn2a_branch2a')
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2a_branch2b')
.batch_normalization(relu=True, name='bn2a_branch2b')
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch2c')
.batch_normalization(name='bn2a_branch2c'))
(self.feed('bn2a_branch1',
'bn2a_branch2c')
.add(name='res2a')
.relu(name='res2a_relu') # batch_size x 56 x 56 x 256
.conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2b_branch2a') # batch_size x 56 x 56 x 64
.batch_normalization(relu=True, name='bn2b_branch2a') # batch_size x 56 x 56 x 64
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2b_branch2b') # batch_size x 56 x 56 x 64
.batch_normalization(relu=True, name='bn2b_branch2b') # batch_size x 56 x 56 x 64
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2b_branch2c') # batch_size x 56 x 56 x 256
.batch_normalization(name='bn2b_branch2c')) # batch_size x 56 x 56 x 256
(self.feed('res2a_relu', # batch_size x 56 x 56 x 256
'bn2b_branch2c') # batch_size x 56 x 56 x 256
.add(name='res2b')
.relu(name='res2b_relu') # batch_size x 56 x 56 x 256
.conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2c_branch2a')
.batch_normalization(relu=True, name='bn2c_branch2a')
.conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2c_branch2b')
.batch_normalization(relu=True, name='bn2c_branch2b')
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2c_branch2c')
.batch_normalization(name='bn2c_branch2c'))
(self.feed('res2b_relu',
'bn2c_branch2c')
.add(name='res2c')
.relu(name='res2c_relu') # batch_size x 56 x 56 x 256
.conv(1, 1, 512, 2, 2, biased=False, relu=False, name='res3a_branch1')
.batch_normalization(name='bn3a_branch1'))
(self.feed('res2c_relu') # batch_size x 56 x 56 x 256
.conv(1, 1, 128, 2, 2, biased=False, relu=False, name='res3a_branch2a')
.batch_normalization(relu=True, name='bn3a_branch2a')
.conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3a_branch2b')
.batch_normalization(relu=True, name='bn3a_branch2b')
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3a_branch2c')
.batch_normalization(name='bn3a_branch2c'))
(self.feed('bn3a_branch1',
'bn3a_branch2c')
.add(name='res3a')
.relu(name='res3a_relu') # batch_size x 28 x 28 x 512
.conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3b1_branch2a')
.batch_normalization(relu=True, name='bn3b1_branch2a')
.conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3b1_branch2b')
.batch_normalization(relu=True, name='bn3b1_branch2b')
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3b1_branch2c')
.batch_normalization(name='bn3b1_branch2c'))
(self.feed('res3a_relu',
'bn3b1_branch2c')
.add(name='res3b1')
.relu(name='res3b1_relu') # batch_size x 28 x 28 x 512
.conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3b2_branch2a')
.batch_normalization(relu=True, name='bn3b2_branch2a')
.conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3b2_branch2b')
.batch_normalization(relu=True, name='bn3b2_branch2b')
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3b2_branch2c')
.batch_normalization(name='bn3b2_branch2c'))
(self.feed('res3b1_relu',
'bn3b2_branch2c')
.add(name='res3b2')
.relu(name='res3b2_relu') # batch_size x 28 x 28 x 512
.conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3b3_branch2a')
.batch_normalization(relu=True, name='bn3b3_branch2a')
.conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3b3_branch2b')
.batch_normalization(relu=True, name='bn3b3_branch2b')
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3b3_branch2c')
.batch_normalization(name='bn3b3_branch2c'))
(self.feed('res3b2_relu',
'bn3b3_branch2c')
.add(name='res3b3')
.relu(name='res3b3_relu') # batch_size x 28 x 28 x 512
.conv(1, 1, 1024, 2, 2, biased=False, relu=False, name='res4a_branch1')
.batch_normalization(name='bn4a_branch1'))
(self.feed('res3b3_relu') # batch_size x 28 x 28 x 512
.conv(1, 1, 256, 2, 2, biased=False, relu=False, name='res4a_branch2a')
.batch_normalization(relu=True, name='bn4a_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4a_branch2b')
.batch_normalization(relu=True, name='bn4a_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4a_branch2c')
.batch_normalization(name='bn4a_branch2c'))
(self.feed('bn4a_branch1',
'bn4a_branch2c')
.add(name='res4a')
.relu(name='res4a_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b1_branch2a')
.batch_normalization(relu=True, name='bn4b1_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b1_branch2b')
.batch_normalization(relu=True, name='bn4b1_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b1_branch2c')
.batch_normalization(name='bn4b1_branch2c'))
(self.feed('res4a_relu',
'bn4b1_branch2c')
.add(name='res4b1')
.relu(name='res4b1_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b2_branch2a')
.batch_normalization(relu=True, name='bn4b2_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b2_branch2b')
.batch_normalization(relu=True, name='bn4b2_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b2_branch2c')
.batch_normalization(name='bn4b2_branch2c'))
(self.feed('res4b1_relu',
'bn4b2_branch2c')
.add(name='res4b2')
.relu(name='res4b2_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b3_branch2a')
.batch_normalization(relu=True, name='bn4b3_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b3_branch2b')
.batch_normalization(relu=True, name='bn4b3_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b3_branch2c')
.batch_normalization(name='bn4b3_branch2c'))
(self.feed('res4b2_relu',
'bn4b3_branch2c')
.add(name='res4b3')
.relu(name='res4b3_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b4_branch2a')
.batch_normalization(relu=True, name='bn4b4_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b4_branch2b')
.batch_normalization(relu=True, name='bn4b4_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b4_branch2c')
.batch_normalization(name='bn4b4_branch2c'))
(self.feed('res4b3_relu',
'bn4b4_branch2c')
.add(name='res4b4')
.relu(name='res4b4_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b5_branch2a')
.batch_normalization(relu=True, name='bn4b5_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b5_branch2b')
.batch_normalization(relu=True, name='bn4b5_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b5_branch2c')
.batch_normalization(name='bn4b5_branch2c'))
(self.feed('res4b4_relu',
'bn4b5_branch2c')
.add(name='res4b5')
.relu(name='res4b5_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b6_branch2a')
.batch_normalization(relu=True, name='bn4b6_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b6_branch2b')
.batch_normalization(relu=True, name='bn4b6_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b6_branch2c')
.batch_normalization(name='bn4b6_branch2c'))
(self.feed('res4b5_relu',
'bn4b6_branch2c')
.add(name='res4b6')
.relu(name='res4b6_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b7_branch2a')
.batch_normalization(relu=True, name='bn4b7_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b7_branch2b')
.batch_normalization(relu=True, name='bn4b7_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b7_branch2c')
.batch_normalization(name='bn4b7_branch2c'))
(self.feed('res4b6_relu',
'bn4b7_branch2c')
.add(name='res4b7')
.relu(name='res4b7_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b8_branch2a')
.batch_normalization(relu=True, name='bn4b8_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b8_branch2b')
.batch_normalization(relu=True, name='bn4b8_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b8_branch2c')
.batch_normalization(name='bn4b8_branch2c'))
(self.feed('res4b7_relu',
'bn4b8_branch2c')
.add(name='res4b8')
.relu(name='res4b8_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b9_branch2a')
.batch_normalization(relu=True, name='bn4b9_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b9_branch2b')
.batch_normalization(relu=True, name='bn4b9_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b9_branch2c')
.batch_normalization(name='bn4b9_branch2c'))
(self.feed('res4b8_relu',
'bn4b9_branch2c')
.add(name='res4b9')
.relu(name='res4b9_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b10_branch2a')
.batch_normalization(relu=True, name='bn4b10_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b10_branch2b')
.batch_normalization(relu=True, name='bn4b10_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b10_branch2c')
.batch_normalization(name='bn4b10_branch2c'))
(self.feed('res4b9_relu',
'bn4b10_branch2c')
.add(name='res4b10')
.relu(name='res4b10_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b11_branch2a')
.batch_normalization(relu=True, name='bn4b11_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b11_branch2b')
.batch_normalization(relu=True, name='bn4b11_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b11_branch2c')
.batch_normalization(name='bn4b11_branch2c'))
(self.feed('res4b10_relu',
'bn4b11_branch2c')
.add(name='res4b11')
.relu(name='res4b11_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b12_branch2a')
.batch_normalization(relu=True, name='bn4b12_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b12_branch2b')
.batch_normalization(relu=True, name='bn4b12_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b12_branch2c')
.batch_normalization(name='bn4b12_branch2c'))
(self.feed('res4b11_relu',
'bn4b12_branch2c')
.add(name='res4b12')
.relu(name='res4b12_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b13_branch2a')
.batch_normalization(relu=True, name='bn4b13_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b13_branch2b')
.batch_normalization(relu=True, name='bn4b13_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b13_branch2c')
.batch_normalization(name='bn4b13_branch2c'))
(self.feed('res4b12_relu',
'bn4b13_branch2c')
.add(name='res4b13')
.relu(name='res4b13_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b14_branch2a')
.batch_normalization(relu=True, name='bn4b14_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b14_branch2b')
.batch_normalization(relu=True, name='bn4b14_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b14_branch2c')
.batch_normalization(name='bn4b14_branch2c'))
(self.feed('res4b13_relu',
'bn4b14_branch2c')
.add(name='res4b14')
.relu(name='res4b14_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b15_branch2a')
.batch_normalization(relu=True, name='bn4b15_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b15_branch2b')
.batch_normalization(relu=True, name='bn4b15_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b15_branch2c')
.batch_normalization(name='bn4b15_branch2c'))
(self.feed('res4b14_relu',
'bn4b15_branch2c')
.add(name='res4b15')
.relu(name='res4b15_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b16_branch2a')
.batch_normalization(relu=True, name='bn4b16_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b16_branch2b')
.batch_normalization(relu=True, name='bn4b16_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b16_branch2c')
.batch_normalization(name='bn4b16_branch2c'))
(self.feed('res4b15_relu',
'bn4b16_branch2c')
.add(name='res4b16')
.relu(name='res4b16_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b17_branch2a')
.batch_normalization(relu=True, name='bn4b17_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b17_branch2b')
.batch_normalization(relu=True, name='bn4b17_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b17_branch2c')
.batch_normalization(name='bn4b17_branch2c'))
(self.feed('res4b16_relu',
'bn4b17_branch2c')
.add(name='res4b17')
.relu(name='res4b17_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b18_branch2a')
.batch_normalization(relu=True, name='bn4b18_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b18_branch2b')
.batch_normalization(relu=True, name='bn4b18_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b18_branch2c')
.batch_normalization(name='bn4b18_branch2c'))
(self.feed('res4b17_relu',
'bn4b18_branch2c')
.add(name='res4b18')
.relu(name='res4b18_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b19_branch2a')
.batch_normalization(relu=True, name='bn4b19_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b19_branch2b')
.batch_normalization(relu=True, name='bn4b19_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b19_branch2c')
.batch_normalization(name='bn4b19_branch2c'))
(self.feed('res4b18_relu',
'bn4b19_branch2c')
.add(name='res4b19')
.relu(name='res4b19_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b20_branch2a')
.batch_normalization(relu=True, name='bn4b20_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b20_branch2b')
.batch_normalization(relu=True, name='bn4b20_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b20_branch2c')
.batch_normalization(name='bn4b20_branch2c'))
(self.feed('res4b19_relu',
'bn4b20_branch2c')
.add(name='res4b20')
.relu(name='res4b20_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b21_branch2a')
.batch_normalization(relu=True, name='bn4b21_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b21_branch2b')
.batch_normalization(relu=True, name='bn4b21_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b21_branch2c')
.batch_normalization(name='bn4b21_branch2c'))
(self.feed('res4b20_relu',
'bn4b21_branch2c')
.add(name='res4b21')
.relu(name='res4b21_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b22_branch2a')
.batch_normalization(relu=True, name='bn4b22_branch2a')
.conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b22_branch2b')
.batch_normalization(relu=True, name='bn4b22_branch2b')
.conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b22_branch2c')
.batch_normalization(name='bn4b22_branch2c'))
(self.feed('res4b21_relu',
'bn4b22_branch2c')
.add(name='res4b22')
.relu(name='res4b22_relu') # batch_size x 14 x 14 x 1024
.conv(1, 1, 2048, 2, 2, biased=False, relu=False, name='res5a_branch1')
.batch_normalization(name='bn5a_branch1'))
(self.feed('res4b22_relu')
.conv(1, 1, 512, 2, 2, biased=False, relu=False, name='res5a_branch2a')
.batch_normalization(relu=True, name='bn5a_branch2a')
.conv(3, 3, 512, 1, 1, biased=False, relu=False, name='res5a_branch2b')
.batch_normalization(relu=True, name='bn5a_branch2b')
.conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5a_branch2c')
.batch_normalization(name='bn5a_branch2c'))
(self.feed('bn5a_branch1',
'bn5a_branch2c')
.add(name='res5a')
.relu(name='res5a_relu') # batch_size x 7 x 7 x 2048
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5b_branch2a')
.batch_normalization(relu=True, name='bn5b_branch2a')
.conv(3, 3, 512, 1, 1, biased=False, relu=False, name='res5b_branch2b')
.batch_normalization(relu=True, name='bn5b_branch2b')
.conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5b_branch2c')
.batch_normalization(name='bn5b_branch2c'))
(self.feed('res5a_relu',
'bn5b_branch2c')
.add(name='res5b')
.relu(name='res5b_relu') # batch_size x 7 x 7 x 2048
.conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5c_branch2a')
.batch_normalization(relu=True, name='bn5c_branch2a')
.conv(3, 3, 512, 1, 1, biased=False, relu=False, name='res5c_branch2b')
.batch_normalization(relu=True, name='bn5c_branch2b')
.conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5c_branch2c')
.batch_normalization(name='bn5c_branch2c'))
(self.feed('res5b_relu',
'bn5c_branch2c')
.add(name='res5c')
.relu(name='res5c_relu') # batch_size x 7 x 7 x 2048
.avg_pool(7, 7, 1, 1, padding='VALID', name='pool5'))
#.fc(198, relu=False, name='fc_ftnew'))
================================================
FILE: get_Rts.py
================================================
"""3D pose estimation network: get R and ts
"""
import lmdb
import sys
import time
import csv
import numpy as np
import numpy.matlib
import os
import pose_model as Pose_model
import tf_utils as util
import tensorflow as tf
import scipy
from scipy import ndimage, misc
import os.path
import glob
import cv2
tf.logging.set_verbosity(tf.logging.INFO)
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_string('mode', 'valid', 'train or eval or valid.')
tf.app.flags.DEFINE_integer('image_size', 227, 'Image side length.')
tf.app.flags.DEFINE_string('log_root', '.', 'Directory to keep the checkpoints')
tf.app.flags.DEFINE_string('model_root', '.', 'Directory to keep the checkpoints')
tf.app.flags.DEFINE_integer('num_gpus', 0, 'Number of gpus used for training. (0 or 1)')
tf.app.flags.DEFINE_integer('gpu_id', 0, 'GPU ID to be used.')
tf.app.flags.DEFINE_string('input_csv', 'input.csv', 'input file to process')
tf.app.flags.DEFINE_string('output_lmdb', 'pose_lmdb', 'output lmdb')
tf.app.flags.DEFINE_integer('batch_size', 1, 'Batch Size')
def run_pose_estimation(root_model_path, inputFile, outputDB, model_used, lr_rate_scalar, if_dropout, keep_rate):
# Load training images mean: the values are in the range [0,1], so image pixel values must also be divided by 255
file = np.load(root_model_path + "perturb_Oxford_train_imgs_mean.npz")
train_mean_vec = file["train_mean_vec"]
del file
# Load training labels mean and std
file = np.load(root_model_path +"perturb_Oxford_train_labels_mean_std.npz")
mean_labels = file["mean_labels"]
std_labels = file["std_labels"]
del file
# placeholders for the batches
x = tf.placeholder(tf.float32, [FLAGS.batch_size, FLAGS.image_size, FLAGS.image_size, 3])
y = tf.placeholder(tf.float32, [FLAGS.batch_size, 6])
net_data = np.load(root_model_path +"PAM_frontal_ALexNet.npy").item()
pose_3D_model = Pose_model.ThreeD_Pose_Estimation(x, y, 'valid', if_dropout, keep_rate, keep_rate, lr_rate_scalar, net_data, FLAGS.batch_size, mean_labels, std_labels)
pose_3D_model._build_graph()
del net_data
# Add ops to save and restore all the variables.
saver = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.VARIABLES, scope='Spatial_Transformer'))
pose_lmdb_env = lmdb.Environment(outputDB, map_size=1e12)
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True )) as sess, \
pose_lmdb_env.begin(write=True) as pose_txn:
# Restore variables from disk.
load_path = root_model_path + model_used
saver.restore(sess, load_path)
print("Model restored.")
# Load cropped and scaled image file list (csv file)
with open(inputFile, 'r') as csvfile:
lines = csvfile.readlines() # each line: key,image_path
for lin in lines:
### THE file is of the form
### key1, image_path_key_1
mykey = lin.split(',')[0]
image_file_path = lin.split(',')[-1].rstrip('\n')
import cv2
image = cv2.imread(image_file_path)
image = np.asarray(image)
# Replicate single-channel (grayscale) images to 3 channels
if len(image.shape) < 3:
image_r = np.reshape(image, (image.shape[0], image.shape[1], 1))
image = np.append(image_r, image_r, axis=2)
image = np.append(image, image_r, axis=2)
label = np.array([0.,0.,0.,0.,0.,0.])
id_labels = np.array([0])
# Normalize images and labels
nr_image, nr_pose_label, id_label = util.input_processing(image, label, id_labels, train_mean_vec, mean_labels, std_labels, 1, FLAGS.image_size, 739)
del id_label
# Reshape the image and label to fit model
nr_image = nr_image.reshape(1, FLAGS.image_size, FLAGS.image_size, 3)
nr_pose_label = nr_pose_label.reshape(1,6)
# Get predicted R-ts
pred_Rts = sess.run(pose_3D_model.preds_unNormalized, feed_dict={x: nr_image, y: nr_pose_label})
print 'Predicted pose for: ' + mykey
pose_txn.put( mykey , pred_Rts[0].astype('float32') )
def esimatePose(root_model_path, inputFile, outputDB, model_used, lr_rate_scalar, if_dropout, keep_rate, use_gpu=False ):
## Set the visible GPU ID here; below we pick the CPU or GPU device string.
os.environ['CUDA_VISIBLE_DEVICES'] = '1' #e.g. str(FLAGS.gpu_id)
if use_gpu == False:
dev = '/cpu:0'
print "Using CPU"
elif use_gpu == True:
dev = '/gpu:0'
print "Using GPU " + os.environ['CUDA_VISIBLE_DEVICES']
else:
raise ValueError('use_gpu must be either True or False.')
run_pose_estimation( root_model_path, inputFile, outputDB, model_used, lr_rate_scalar, if_dropout, keep_rate )
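The per-channel `np.append` calls above replicate a single grayscale plane into a 3-channel image before normalization. A minimal standalone sketch of the same expansion using `np.repeat` (Python 3; `gray` is a made-up array, not a repo input):

```python
import numpy as np

# A 2x2 single-channel (grayscale) "image".
gray = np.array([[10, 20],
                 [30, 40]], dtype=np.uint8)

if gray.ndim < 3:
    # Equivalent to reshaping to (H, W, 1) and appending the plane twice:
    # every output channel holds the same grayscale values.
    rgb = np.repeat(gray[:, :, np.newaxis], 3, axis=2)

print(rgb.shape)  # (2, 2, 3)
```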
================================================
FILE: input.csv
================================================
ID,FILE,FACE_X,FACE_Y,FACE_WIDTH,FACE_HEIGHT
subject1_a,images/input1.jpg,108.2642,119.6774,170,179
subject2_a,images/input2.jpg,48.51129913,38.1551857,141.19125366,149.40893555
subject3_a,images/input3.jpg,47.94947433,26.95211983,126.64208984,169.57138062
subject4_a,images/input4.jpg,41.02483749,81.23366547,122.9382019,79.80832672
subject5_a,images/input5.jpg,44.65912247,30.22106934,138.8326416,156.31950378
subject6_a,images/input6.jpg,54.94252396,41.26684189,117.19006348,137.38693237
subject7_a,images/input7.jpg,63.90779114,54.21474075,159.63040161,90.42936707
subject8_a,images/input8.jpg,53.62681198,48.40485001,78.09403992,101.56494141
subject9_a,images/input9.jpg,55.74394226,72.12078094,76.75720215,114.19478607
subject10_a,images/input10.jpg,48.07297897,30.98786163,145.96961975,124.47624969
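As the header row states, each record carries a subject key, an image path, and a face box (`FACE_X, FACE_Y, FACE_WIDTH, FACE_HEIGHT`). A minimal sketch of reading such a file with the `csv` module (Python 3; the in-memory sample below is illustrative, not a repo file):

```python
import csv
import io

sample = """ID,FILE,FACE_X,FACE_Y,FACE_WIDTH,FACE_HEIGHT
subject1_a,images/input1.jpg,108.2642,119.6774,170,179
"""

rows = []
for row in csv.DictReader(io.StringIO(sample)):
    # Keep the key and path as strings; convert the face box to floats.
    box = tuple(float(row[k]) for k in ('FACE_X', 'FACE_Y', 'FACE_WIDTH', 'FACE_HEIGHT'))
    rows.append((row['ID'], row['FILE'], box))

print(rows[0][0])  # subject1_a
```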
================================================
FILE: input_list.txt
================================================
./input_samples/HELEN_30427236_2_0.jpg,132.8177,213.8680,183.1707,171.1925
./input_samples/LFPW_image_test_0001_0.jpg,132.3776,213.9545,178.3532,168.3772
./input_samples/LFPW_image_test_0008_0.jpg,138.6822,210.8271,172.3008,174.2025
================================================
FILE: input_samples/README
================================================
Three input sample images to run our new FPN
================================================
FILE: kaffe/__init__.py
================================================
from .graph import GraphBuilder, NodeMapper
from .errors import KaffeError, print_stderr
from . import tensorflow
================================================
FILE: kaffe/errors.py
================================================
import sys
class KaffeError(Exception):
pass
def print_stderr(msg):
sys.stderr.write('%s\n' % msg)
================================================
FILE: kaffe/graph.py
================================================
from google.protobuf import text_format
from .caffe import get_caffe_resolver
from .errors import KaffeError, print_stderr
from .layers import LayerAdapter, LayerType, NodeKind, NodeDispatch
from .shapes import TensorShape
class Node(object):
def __init__(self, name, kind, layer=None):
self.name = name
self.kind = kind
self.layer = LayerAdapter(layer, kind) if layer else None
self.parents = []
self.children = []
self.data = None
self.output_shape = None
self.metadata = {}
def add_parent(self, parent_node):
assert parent_node not in self.parents
self.parents.append(parent_node)
if self not in parent_node.children:
parent_node.children.append(self)
def add_child(self, child_node):
assert child_node not in self.children
self.children.append(child_node)
if self not in child_node.parents:
child_node.parents.append(self)
def get_only_parent(self):
if len(self.parents) != 1:
raise KaffeError('Node (%s) expected to have 1 parent. Found %s.' %
(self, len(self.parents)))
return self.parents[0]
@property
def parameters(self):
if self.layer is not None:
return self.layer.parameters
return None
def __str__(self):
return '[%s] %s' % (self.kind, self.name)
def __repr__(self):
return '%s (0x%x)' % (self.name, id(self))
class Graph(object):
def __init__(self, nodes=None, name=None):
self.nodes = nodes or []
self.node_lut = {node.name: node for node in self.nodes}
self.name = name
def add_node(self, node):
self.nodes.append(node)
self.node_lut[node.name] = node
def get_node(self, name):
try:
return self.node_lut[name]
except KeyError:
raise KaffeError('Layer not found: %s' % name)
def get_input_nodes(self):
return [node for node in self.nodes if len(node.parents) == 0]
def get_output_nodes(self):
return [node for node in self.nodes if len(node.children) == 0]
def topologically_sorted(self):
sorted_nodes = []
unsorted_nodes = list(self.nodes)
temp_marked = set()
perm_marked = set()
def visit(node):
if node in temp_marked:
raise KaffeError('Graph is not a DAG.')
if node in perm_marked:
return
temp_marked.add(node)
for child in node.children:
visit(child)
perm_marked.add(node)
temp_marked.remove(node)
sorted_nodes.insert(0, node)
while len(unsorted_nodes):
visit(unsorted_nodes.pop())
return sorted_nodes
def compute_output_shapes(self):
sorted_nodes = self.topologically_sorted()
for node in sorted_nodes:
node.output_shape = TensorShape(*NodeKind.compute_output_shape(node))
def replaced(self, new_nodes):
return Graph(nodes=new_nodes, name=self.name)
def transformed(self, transformers):
graph = self
for transformer in transformers:
graph = transformer(graph)
if graph is None:
raise KaffeError('Transformer failed: {}'.format(transformer))
assert isinstance(graph, Graph)
return graph
def __contains__(self, key):
return key in self.node_lut
def __str__(self):
hdr = '{:<20} {:<30} {:>20} {:>20}'.format('Type', 'Name', 'Param', 'Output')
s = [hdr, '-' * 94]
for node in self.topologically_sorted():
# If the node has learned parameters, display the first one's shape.
# In case of convolutions, this corresponds to the weights.
data_shape = node.data[0].shape if node.data else '--'
out_shape = node.output_shape or '--'
s.append('{:<20} {:<30} {:>20} {:>20}'.format(node.kind, node.name, data_shape,
tuple(out_shape)))
return '\n'.join(s)
class GraphBuilder(object):
'''Constructs a model graph from a Caffe protocol buffer definition.'''
def __init__(self, def_path, phase='test'):
'''
def_path: Path to the model definition (.prototxt)
data_path: Path to the model data (.caffemodel)
phase: Either 'test' or 'train'. Used for filtering phase-specific nodes.
'''
self.def_path = def_path
self.phase = phase
self.load()
def load(self):
'''Load the layer definitions from the prototxt.'''
self.params = get_caffe_resolver().NetParameter()
with open(self.def_path, 'rb') as def_file:
text_format.Merge(def_file.read(), self.params)
def filter_layers(self, layers):
'''Filter out layers based on the current phase.'''
phase_map = {0: 'train', 1: 'test'}
filtered_layer_names = set()
filtered_layers = []
for layer in layers:
phase = self.phase
if len(layer.include):
phase = phase_map[layer.include[0].phase]
if len(layer.exclude):
phase = phase_map[1 - layer.exclude[0].phase]
exclude = (phase != self.phase)
# Dropout layers appear in a fair number of Caffe
# test-time networks. These are just ignored. We'll
# filter them out here.
if (not exclude) and (phase == 'test'):
exclude = (layer.type == LayerType.Dropout)
if not exclude:
filtered_layers.append(layer)
# Guard against dupes.
assert layer.name not in filtered_layer_names
filtered_layer_names.add(layer.name)
return filtered_layers
def make_node(self, layer):
'''Create a graph node for the given layer.'''
kind = NodeKind.map_raw_kind(layer.type)
if kind is None:
raise KaffeError('Unknown layer type encountered: %s' % layer.type)
# We want to use the layer's top names (the "output" names), rather than the
# name attribute, which is more of a readability thing than a functional one.
# Other layers will refer to a node by its "top name".
return Node(layer.name, kind, layer=layer)
def make_input_nodes(self):
'''
Create data input nodes.
This method is for old-style inputs, where the input specification
was not treated as a first-class layer in the prototext.
Newer models use the "Input layer" type.
'''
nodes = [Node(name, NodeKind.Data) for name in self.params.input]
if len(nodes):
input_dim = map(int, self.params.input_dim)
if not input_dim:
if len(self.params.input_shape) > 0:
input_dim = map(int, self.params.input_shape[0].dim)
else:
raise KaffeError('Dimensions for input not specified.')
for node in nodes:
node.output_shape = tuple(input_dim)
return nodes
def build(self):
'''
Builds the graph from the Caffe layer definitions.
'''
# Get the layers
layers = self.params.layers or self.params.layer
# Filter out phase-excluded layers
layers = self.filter_layers(layers)
# Get any separately-specified input layers
nodes = self.make_input_nodes()
nodes += [self.make_node(layer) for layer in layers]
# Initialize the graph
graph = Graph(nodes=nodes, name=self.params.name)
# Connect the nodes
#
# A note on layers and outputs:
# In Caffe, each layer can produce multiple outputs ("tops") from a set of inputs
# ("bottoms"). The bottoms refer to other layers' tops. The top can rewrite a bottom
# (in case of in-place operations). Note that the layer's name is not used for establishing
# any connectivity. It's only used for data association. By convention, a layer with a
# single top will often use the same name (although this is not required).
#
# The current implementation only supports single-output nodes (note that a node can still
# have multiple children, since multiple child nodes can refer to the single top's name).
node_outputs = {}
for layer in layers:
node = graph.get_node(layer.name)
for input_name in layer.bottom:
assert input_name != layer.name
parent_node = node_outputs.get(input_name)
if (parent_node is None) or (parent_node == node):
parent_node = graph.get_node(input_name)
node.add_parent(parent_node)
if len(layer.top)>1:
raise KaffeError('Multiple top nodes are not supported.')
for output_name in layer.top:
if output_name == layer.name:
# Output is named the same as the node. No further action required.
continue
# There are two possibilities here:
#
# Case 1: output_name refers to another node in the graph.
# This is an "in-place operation" that overwrites an existing node.
# This would create a cycle in the graph. We'll undo the in-placing
# by substituting this node wherever the overwritten node is referenced.
#
# Case 2: output_name violates the convention layer.name == output_name.
# Since we are working in the single-output regime, we can rename it to
# match the layer name.
#
# In both cases, future references to this top re-route to this node.
node_outputs[output_name] = node
graph.compute_output_shapes()
return graph
class NodeMapper(NodeDispatch):
def __init__(self, graph):
self.graph = graph
def map(self):
nodes = self.graph.topologically_sorted()
# Remove input nodes - we'll handle them separately.
input_nodes = self.graph.get_input_nodes()
nodes = [t for t in nodes if t not in input_nodes]
# Decompose DAG into chains.
chains = []
for node in nodes:
attach_to_chain = None
if len(node.parents) == 1:
parent = node.get_only_parent()
for chain in chains:
if chain[-1] == parent:
# Node is part of an existing chain.
attach_to_chain = chain
break
if attach_to_chain is None:
# Start a new chain for this node.
attach_to_chain = []
chains.append(attach_to_chain)
attach_to_chain.append(node)
# Map each chain.
mapped_chains = []
for chain in chains:
mapped_chains.append(self.map_chain(chain))
return self.commit(mapped_chains)
def map_chain(self, chain):
return [self.map_node(node) for node in chain]
def map_node(self, node):
map_func = self.get_handler(node.kind, 'map')
mapped_node = map_func(node)
assert mapped_node is not None
mapped_node.node = node
return mapped_node
def commit(self, mapped_chains):
raise NotImplementedError('Must be implemented by subclass.')
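`Graph.topologically_sorted` above is a depth-first topological sort that uses a temporary-mark set to detect cycles. A compact standalone sketch of the same idea over a plain child-adjacency dict (hypothetical graph, Python 3):

```python
def topo_sort(children):
    """children: dict mapping node -> list of its child nodes."""
    order, temp, perm = [], set(), set()

    def visit(node):
        if node in temp:
            # A back-edge was found while this node is still on the stack.
            raise ValueError('Graph is not a DAG.')
        if node in perm:
            return
        temp.add(node)
        for child in children.get(node, []):
            visit(child)
        perm.add(node)
        temp.remove(node)
        order.insert(0, node)  # parents end up before their children

    for node in list(children):
        visit(node)
    return order

# 'data' feeds 'conv1', which feeds 'relu1'.
print(topo_sort({'data': ['conv1'], 'conv1': ['relu1'], 'relu1': []}))
```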
================================================
FILE: kaffe/layers.py
================================================
import re
import numbers
from collections import namedtuple
from .shapes import *
LAYER_DESCRIPTORS = {
# Caffe Types
'AbsVal': shape_identity,
'Accuracy': shape_scalar,
'ArgMax': shape_not_implemented,
'BatchNorm': shape_identity,
'BNLL': shape_not_implemented,
'Concat': shape_concat,
'ContrastiveLoss': shape_scalar,
'Convolution': shape_convolution,
'Deconvolution': shape_not_implemented,
'Data': shape_data,
'Dropout': shape_identity,
'DummyData': shape_data,
'EuclideanLoss': shape_scalar,
'Eltwise': shape_identity,
'Exp': shape_identity,
'Flatten': shape_not_implemented,
'HDF5Data': shape_data,
'HDF5Output': shape_identity,
'HingeLoss': shape_scalar,
'Im2col': shape_not_implemented,
'ImageData': shape_data,
'InfogainLoss': shape_scalar,
'InnerProduct': shape_inner_product,
'Input': shape_data,
'LRN': shape_identity,
'MemoryData': shape_mem_data,
'MultinomialLogisticLoss': shape_scalar,
'MVN': shape_not_implemented,
'Pooling': shape_pool,
'Power': shape_identity,
'ReLU': shape_identity,
'Scale': shape_identity,
'Sigmoid': shape_identity,
'SigmoidCrossEntropyLoss': shape_scalar,
'Silence': shape_not_implemented,
'Softmax': shape_identity,
'SoftmaxWithLoss': shape_scalar,
'Split': shape_not_implemented,
'Slice': shape_not_implemented,
'TanH': shape_identity,
'WindowData': shape_not_implemented,
'Threshold': shape_identity,
}
LAYER_TYPES = LAYER_DESCRIPTORS.keys()
LayerType = type('LayerType', (), {t: t for t in LAYER_TYPES})
class NodeKind(LayerType):
@staticmethod
def map_raw_kind(kind):
if kind in LAYER_TYPES:
return kind
return None
@staticmethod
def compute_output_shape(node):
try:
val = LAYER_DESCRIPTORS[node.kind](node)
return val
except NotImplementedError:
raise KaffeError('Output shape computation not implemented for type: %s' % node.kind)
class NodeDispatchError(KaffeError):
pass
class NodeDispatch(object):
@staticmethod
def get_handler_name(node_kind):
if len(node_kind) <= 4:
# A catch-all for things like ReLU and tanh
return node_kind.lower()
# Convert from CamelCase to under_scored
name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', node_kind)
return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()
def get_handler(self, node_kind, prefix):
name = self.get_handler_name(node_kind)
name = '_'.join((prefix, name))
try:
return getattr(self, name)
except AttributeError:
raise NodeDispatchError('No handler found for node kind: %s (expected: %s)' %
(node_kind, name))
class LayerAdapter(object):
def __init__(self, layer, kind):
self.layer = layer
self.kind = kind
@property
def parameters(self):
name = NodeDispatch.get_handler_name(self.kind)
name = '_'.join((name, 'param'))
try:
return getattr(self.layer, name)
except AttributeError:
raise NodeDispatchError('Caffe parameters not found for layer kind: %s' % (self.kind))
@staticmethod
def get_kernel_value(scalar, repeated, idx, default=None):
if scalar:
return scalar
if repeated:
if isinstance(repeated, numbers.Number):
return repeated
if len(repeated) == 1:
# Same value applies to all spatial dimensions
return int(repeated[0])
assert idx < len(repeated)
# Extract the value for the given spatial dimension
return repeated[idx]
if default is None:
raise ValueError('Unable to determine kernel parameter!')
return default
@property
def kernel_parameters(self):
assert self.kind in (NodeKind.Convolution, NodeKind.Pooling)
params = self.parameters
k_h = self.get_kernel_value(params.kernel_h, params.kernel_size, 0)
k_w = self.get_kernel_value(params.kernel_w, params.kernel_size, 1)
s_h = self.get_kernel_value(params.stride_h, params.stride, 0, default=1)
s_w = self.get_kernel_value(params.stride_w, params.stride, 1, default=1)
p_h = self.get_kernel_value(params.pad_h, params.pad, 0, default=0)
p_w = self.get_kernel_value(params.pad_w, params.pad, 1, default=0)
return KernelParameters(k_h, k_w, s_h, s_w, p_h, p_w)
KernelParameters = namedtuple('KernelParameters', ['kernel_h', 'kernel_w', 'stride_h', 'stride_w',
'pad_h', 'pad_w'])
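`NodeDispatch.get_handler_name` above turns a Caffe layer type into a handler suffix: short names (4 characters or fewer, e.g. `ReLU`, `TanH`) are simply lower-cased, while longer CamelCase names are converted to under_scored form. A standalone Python 3 re-implementation of the same two-pass regex:

```python
import re

def handler_name(node_kind):
    if len(node_kind) <= 4:
        # Catch-all for short types like ReLU, TanH, LRN.
        return node_kind.lower()
    # CamelCase -> under_scored, in two passes.
    name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', node_kind)
    return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower()

print(handler_name('InnerProduct'))  # inner_product
print(handler_name('ReLU'))          # relu
```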
================================================
FILE: kaffe/shapes.py
================================================
import math
from collections import namedtuple
from .errors import KaffeError
TensorShape = namedtuple('TensorShape', ['batch_size', 'channels', 'height', 'width'])
def get_filter_output_shape(i_h, i_w, params, round_func):
o_h = (i_h + 2 * params.pad_h - params.kernel_h) / float(params.stride_h) + 1
o_w = (i_w + 2 * params.pad_w - params.kernel_w) / float(params.stride_w) + 1
return (int(round_func(o_h)), int(round_func(o_w)))
def get_strided_kernel_output_shape(node, round_func):
assert node.layer is not None
input_shape = node.get_only_parent().output_shape
o_h, o_w = get_filter_output_shape(input_shape.height, input_shape.width,
node.layer.kernel_parameters, round_func)
params = node.layer.parameters
has_c_o = hasattr(params, 'num_output')
c = params.num_output if has_c_o else input_shape.channels
return TensorShape(input_shape.batch_size, c, o_h, o_w)
def shape_not_implemented(node):
raise NotImplementedError
def shape_identity(node):
assert len(node.parents) > 0
return node.parents[0].output_shape
def shape_scalar(node):
return TensorShape(1, 1, 1, 1)
def shape_data(node):
if node.output_shape:
# Old-style input specification
return node.output_shape
try:
# New-style input specification
return map(int, node.parameters.shape[0].dim)
except:
# We most likely have a data layer on our hands. The problem is,
# Caffe infers the dimensions of the data from the source (eg: LMDB).
# We want to avoid reading datasets here. Fail for now.
# This can be temporarily fixed by transforming the data layer to
# Caffe's "input" layer (as is usually used in the "deploy" version).
# TODO: Find a better solution for this.
raise KaffeError('Cannot determine dimensions of data layer.\n'
'See comments in function shape_data for more info.')
def shape_mem_data(node):
params = node.parameters
return TensorShape(params.batch_size, params.channels, params.height, params.width)
def shape_concat(node):
axis = node.layer.parameters.axis
output_shape = None
for parent in node.parents:
if output_shape is None:
output_shape = list(parent.output_shape)
else:
output_shape[axis] += parent.output_shape[axis]
return tuple(output_shape)
def shape_convolution(node):
return get_strided_kernel_output_shape(node, math.floor)
def shape_pool(node):
return get_strided_kernel_output_shape(node, math.ceil)
def shape_inner_product(node):
input_shape = node.get_only_parent().output_shape
return TensorShape(input_shape.batch_size, node.layer.parameters.num_output, 1, 1)
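The spatial output size above is the standard `(i + 2*pad - kernel) / stride + 1` formula, rounded down for `Convolution` and up for `Pooling` (matching Caffe's behavior). A quick standalone check of both roundings, with made-up layer sizes:

```python
import math

def filter_output_size(i, kernel, stride, pad, round_func):
    # Same arithmetic as get_filter_output_shape, for one spatial dimension.
    return int(round_func((i + 2 * pad - kernel) / float(stride) + 1))

# Convolution rounds down: a ResNet-style 7x7/2 conv with pad 3 on a 224 input.
print(filter_output_size(224, 7, 2, 3, math.floor))  # 112

# Pooling rounds up: a 3x3/2 max pool with no padding on the same 224 input.
print(filter_output_size(224, 3, 2, 0, math.ceil))   # 112
```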
================================================
FILE: kaffe/tensorflow/__init__.py
================================================
from .transformer import TensorFlowTransformer
from .network import Network
================================================
FILE: kaffe/tensorflow/network_shape.py
================================================
import numpy as np
import tensorflow as tf
DEFAULT_PADDING = 'SAME'
def layer(op):
'''Decorator for composable network layers.'''
def layer_decorated(self, *args, **kwargs):
# Automatically set a name if not provided.
name = kwargs.setdefault('name', self.get_unique_name(op.__name__))
# Figure out the layer inputs.
if len(self.terminals) == 0:
raise RuntimeError('No input variables found for layer %s.' % name)
elif len(self.terminals) == 1:
layer_input = self.terminals[0]
else:
layer_input = list(self.terminals)
# Perform the operation and get the output.
layer_output = op(self, layer_input, *args, **kwargs)
# Add to layer LUT.
self.layers[name] = layer_output
# This output is now the input for the next layer.
self.feed(layer_output)
# Return self for chained calls.
return self
return layer_decorated
class Network_Shape(object):
def __init__(self, inputs, trainable=True):
# The input nodes for this network
self.inputs = inputs
# The current list of terminal nodes
self.terminals = []
# Mapping from layer names to layers
self.layers = dict(inputs)
# If true, the resulting variables are set as trainable
self.trainable = trainable
print self.trainable
# Switch variable for dropout
self.use_dropout = tf.placeholder_with_default(tf.constant(1.0),
shape=[],
name='use_dropout')
self.setup()
def setup(self):
'''Construct the network. '''
raise NotImplementedError('Must be implemented by the subclass.')
def load(self, data_path, prefix_name, session, ignore_missing=False):
'''Load network weights.
data_path: The path to the numpy-serialized network weights
session: The current TensorFlow session
ignore_missing: If true, serialized weights for missing layers are ignored.
'''
data_dict = np.load(data_path).item()
print len(data_dict) #data_dict['res2b_branch2a']
for op_name in data_dict:
if op_name == 'fc_ftnew':
continue
with tf.variable_scope(prefix_name + '/' + op_name, reuse=True): # reuse=True
for param_name, data in data_dict[op_name].iteritems():
#if op_name == 'fc_ftnew':
# print param_name, data, data.shape
try:
#if op_name == "res2b_branch2a":
# var = tf.Variable(data_dict[op_name][param_name], trainable=False, name=param_name)
#else:
var = tf.get_variable(param_name)
session.run(var.assign(data))
except ValueError:
if not ignore_missing:
raise
"""
def load(self, data_path, ignore_missing=False):
'''Load network weights.
data_path: The path to the numpy-serialized network weights
session: The current TensorFlow session
ignore_missing: If true, serialized weights for missing layers are ignored.
'''
data_dict = np.load(data_path).item()
#print data_dict['res5c_branch2c'], data_dict['res5c_branch2c']['weights'], data_dict['res5c_branch2c']['weights'].shape
for op_name in data_dict:
with tf.variable_scope(op_name): # reuse=True
for param_name, data in data_dict[op_name].iteritems():
#print param_name, data
try:
if op_name == 'res5c_branch2c':
var = tf.Variable(data_dict[op_name][param_name], trainable=True, name=param_name)
else:
var = tf.Variable(data_dict[op_name][param_name], trainable=False, name=param_name)
#session.run(var.assign(data))
except ValueError:
if not ignore_missing:
raise
"""
def load_specific_vars(self, data_path, op_name, session, ignore_missing=False):
'''Load network weights.
data_path: The path to the numpy-serialized network weights
session: The current TensorFlow session
ignore_missing: If true, serialized weights for missing layers are ignored.
'''
data_dict = np.load(data_path).item()
with tf.variable_scope(op_name, reuse=True): # reuse=None
for param_name, data in data_dict[op_name].iteritems():
#print param_name, data
try:
var = tf.get_variable(param_name)
session.run(var.assign(data))
except ValueError:
if not ignore_missing:
raise
def feed(self, *args):
'''Set the input(s) for the next operation by replacing the terminal nodes.
The arguments can be either layer names or the actual layers.
'''
assert len(args) != 0
self.terminals = []
for fed_layer in args:
if isinstance(fed_layer, basestring):
try:
fed_layer = self.layers[fed_layer]
except KeyError:
raise KeyError('Unknown layer name fed: %s' % fed_layer)
self.terminals.append(fed_layer)
return self
def get_output(self):
'''Returns the current network output.'''
return self.terminals[-1]
def get_unique_name(self, prefix):
'''Returns an index-suffixed unique name for the given prefix.
This is used for auto-generating layer names based on the type-prefix.
'''
ident = sum(t.startswith(prefix) for t, _ in self.layers.items()) + 1
return '%s_%d' % (prefix, ident)
def make_var(self, name, shape):
'''Creates a new TensorFlow variable.'''
return tf.get_variable(name, shape, trainable=self.trainable) #self.trainable)
#tmp = tf.get_variable(name, shape=shape, trainable=False)
#return tf.Variable(tmp, trainable=False, name=name)
def make_var_fixed(self, name, shape):
'''Creates a new TensorFlow variable.'''
return tf.get_variable(name, shape, trainable=False)
#tmp = tf.get_variable(name, shape=shape, trainable=False)
#return tf.Variable(tmp, trainable=False, name=name)
def validate_padding(self, padding):
'''Verifies that the padding is one of the supported ones.'''
assert padding in ('SAME', 'VALID')
@layer
def conv(self,
input,
k_h,
k_w,
c_o,
s_h,
s_w,
name,
relu=True,
padding=DEFAULT_PADDING,
group=1,
biased=True):
# Verify that the padding is acceptable
self.validate_padding(padding)
# Get the number of channels in the input
c_i = input.get_shape()[-1]
# Verify that the grouping parameter is valid
assert c_i % group == 0
assert c_o % group == 0
# Convolution for a given input and kernel
convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding)
with tf.variable_scope(name) as scope:
if name == 'res5c_branch2c' or name == 'res5c_branch2b' or name == 'res5c_branch2a' or \
name == 'res5b_branch2c' or name == 'res5b_branch2b' or name == 'res5b_branch2a': # or \
#name == 'res5a_branch2c' or name == 'res5a_branch2b' or name == 'res5a_branch2a' or \
#name == 'res5a_branch1':
kernel = self.make_var('weights', shape=[k_h, k_w, c_i / group, c_o])
else:
#kernel = self.make_var_fixed('weights', shape=[k_h, k_w, c_i / group, c_o])
kernel = self.make_var('weights', shape=[k_h, k_w, c_i / group, c_o])
if group == 1:
# This is the common-case. Convolve the input without any further complications.
output = convolve(input, kernel)
else:
# Split the input into groups and then convolve each of them independently
input_groups = tf.split(3, group, input)
kernel_groups = tf.split(3, group, kernel)
output_groups = [convolve(i, k) for i, k in zip(input_groups, kernel_groups)]
# Concatenate the groups
output = tf.concat(3, output_groups)
# Add the biases
if biased:
if name == 'res5c_branch2c' or name == 'res5c_branch2b' or name == 'res5c_branch2a' or \
name == 'res5b_branch2c' or name == 'res5b_branch2b' or name == 'res5b_branch2a': # or \
#name == 'res5a_branch2c' or name == 'res5a_branch2b' or name == 'res5a_branch2a' or \
#name == 'res5a_branch1':
biases = self.make_var('biases', [c_o])
else:
#biases = self.make_var_fixed('biases', [c_o])
biases = self.make_var('biases', [c_o])
output = tf.nn.bias_add(output, biases)
if relu:
# ReLU non-linearity
output = tf.nn.relu(output, name=scope.name)
return output
@layer
def relu(self, input, name):
return tf.nn.relu(input, name=name)
@layer
def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING):
self.validate_padding(padding)
return tf.nn.max_pool(input,
ksize=[1, k_h, k_w, 1],
strides=[1, s_h, s_w, 1],
padding=padding,
name=name)
@layer
def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING):
self.validate_padding(padding)
return tf.nn.avg_pool(input,
ksize=[1, k_h, k_w, 1],
strides=[1, s_h, s_w, 1],
padding=padding,
name=name)
@layer
def lrn(self, input, radius, alpha, beta, name, bias=1.0):
return tf.nn.local_response_normalization(input,
depth_radius=radius,
alpha=alpha,
beta=beta,
bias=bias,
name=name)
@layer
def concat(self, inputs, axis, name):
return tf.concat(concat_dim=axis, values=inputs, name=name)
@layer
def add(self, inputs, name):
return tf.add_n(inputs, name=name)
@layer
def fc(self, input, num_out, name, relu=True):
with tf.variable_scope(name) as scope:
input_shape = input.get_shape()
if input_shape.ndims == 4:
# The input is spatial. Vectorize it first.
dim = 1
for d in input_shape[1:].as_list():
dim *= d
feed_in = tf.reshape(input, [-1, dim])
else:
feed_in, dim = (input, input_shape[-1].value)
weights = self.make_var('weights', shape=[dim, num_out])
biases = self.make_var('biases', [num_out])
op = tf.nn.relu_layer if relu else tf.nn.xw_plus_b
fc = op(feed_in, weights, biases, name=scope.name)
return fc
@layer
def softmax(self, input, name):
input_shape = map(lambda v: v.value, input.get_shape())
if len(input_shape) > 2:
# For certain models (like NiN), the singleton spatial dimensions
# need to be explicitly squeezed, since they're not broadcast-able
# in TensorFlow's NHWC ordering (unlike Caffe's NCHW).
if input_shape[1] == 1 and input_shape[2] == 1:
input = tf.squeeze(input, squeeze_dims=[1, 2])
else:
raise ValueError('Rank 2 tensor input expected for softmax!')
return tf.nn.softmax(input, name)
@layer
def batch_normalization(self, input, name, scale_offset=True, relu=False):
# NOTE: Currently, only inference is supported
with tf.variable_scope(name) as scope:
shape = [input.get_shape()[-1]]
if scale_offset:
scale = self.make_var_fixed('scale', shape=shape)
offset = self.make_var_fixed('offset', shape=shape)
#scale = self.make_var('scale', shape=shape)
#offset = self.make_var('offset', shape=shape)
else:
scale, offset = (None, None)
output = tf.nn.batch_normalization(
input,
mean=self.make_var_fixed('mean', shape=shape),
variance=self.make_var_fixed('variance', shape=shape),
#mean=self.make_var('mean', shape=shape),
#variance=self.make_var('variance', shape=shape),
offset=offset,
scale=scale,
# TODO: This is the default Caffe batch norm eps
# Get the actual eps from parameters
variance_epsilon=1e-5,
name=name)
if relu:
output = tf.nn.relu(output)
return output
@layer
def dropout(self, input, keep_prob, name):
keep = 1 - self.use_dropout + (self.use_dropout * keep_prob)
return tf.nn.dropout(input, keep, name=name)
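The `dropout` layer above blends the keep probability with the `use_dropout` switch: `keep = 1 - use_dropout + use_dropout * keep_prob`, so the switch interpolates between keeping everything (inference) and keeping `keep_prob` (training). A plain-number check of the two settings:

```python
def effective_keep(use_dropout, keep_prob):
    # use_dropout is 1.0 during training, 0.0 at inference time.
    return 1 - use_dropout + use_dropout * keep_prob

print(effective_keep(1.0, 0.5))  # 0.5  (training: keep half the units)
print(effective_keep(0.0, 0.5))  # 1.0  (inference: keep everything)
```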
================================================
FILE: kaffe/transformers.py
================================================
'''
A collection of graph transforms.
A transformer is a callable that accepts a graph and returns a transformed version.
'''
import numpy as np
from .caffe import get_caffe_resolver, has_pycaffe
from .errors import KaffeError, print_stderr
from .layers import NodeKind
class DataInjector(object):
'''
Associates parameters loaded from a .caffemodel file with their corresponding nodes.
'''
def __init__(self, def_path, data_path):
# The .prototxt file defining the graph
self.def_path = def_path
# The .caffemodel file containing the learned parameters
self.data_path = data_path
# Set to true if the fallback protocol-buffer based backend was used
self.did_use_pb = False
# A list containing (layer name, parameters) tuples
self.params = None
# Load the parameters
self.load()
def load(self):
if has_pycaffe():
self.load_using_caffe()
else:
self.load_using_pb()
def load_using_caffe(self):
caffe = get_caffe_resolver().caffe
net = caffe.Net(self.def_path, self.data_path, caffe.TEST)
data = lambda blob: blob.data
self.params = [(k, map(data, v)) for k, v in net.params.items()]
def load_using_pb(self):
data = get_caffe_resolver().NetParameter()
data.MergeFromString(open(self.data_path, 'rb').read())
pair = lambda layer: (layer.name, self.normalize_pb_data(layer))
layers = data.layers or data.layer
self.params = [pair(layer) for layer in layers if layer.blobs]
self.did_use_pb = True
def normalize_pb_data(self, layer):
transformed = []
for blob in layer.blobs:
if len(blob.shape.dim):
dims = blob.shape.dim
c_o, c_i, h, w = map(int, [1] * (4 - len(dims)) + list(dims))
else:
c_o = blob.num
c_i = blob.channels
h = blob.height
w = blob.width
data = np.array(blob.data, dtype=np.float32).reshape(c_o, c_i, h, w)
transformed.append(data)
return transformed
def adjust_parameters(self, node, data):
if not self.did_use_pb:
return data
# When using the protobuf-backend, each parameter initially has four dimensions.
# In certain cases (like FC layers), we want to eliminate the singleton dimensions.
# This implementation takes care of the common cases. However, it does leave the
# potential for future issues.
# The Caffe-backend does not suffer from this problem.
data = list(data)
squeeze_indices = [1] # Squeeze biases.
if node.kind == NodeKind.InnerProduct:
squeeze_indices.append(0) # Squeeze FC.
for idx in squeeze_indices:
data[idx] = np.squeeze(data[idx])
return data
def __call__(self, graph):
for layer_name, data in self.params:
if layer_name in graph:
node = graph.get_node(layer_name)
node.data = self.adjust_parameters(node, data)
else:
print_stderr('Ignoring parameters for non-existent layer: %s' % layer_name)
return graph
class DataReshaper(object):
def __init__(self, mapping, replace=True):
# A dictionary mapping NodeKind to the transposed order.
self.mapping = mapping
# The node kinds eligible for reshaping
self.reshaped_node_types = self.mapping.keys()
# If true, the reshaped data will replace the old one.
# Otherwise, it's set to the reshaped_data attribute.
self.replace = replace
def has_spatial_parent(self, node):
try:
parent = node.get_only_parent()
s = parent.output_shape
return s.height > 1 or s.width > 1
except KaffeError:
return False
def map(self, node_kind):
try:
return self.mapping[node_kind]
except KeyError:
raise KaffeError('Ordering not found for node kind: {}'.format(node_kind))
def __call__(self, graph):
for node in graph.nodes:
if node.data is None:
continue
if node.kind not in self.reshaped_node_types:
# Check for 2+ dimensional data
if any(len(tensor.shape) > 1 for tensor in node.data):
print_stderr('Warning: parameters not reshaped for node: {}'.format(node))
continue
transpose_order = self.map(node.kind)
weights = node.data[0]
if (node.kind == NodeKind.InnerProduct) and self.has_spatial_parent(node):
# The FC layer connected to the spatial layer needs to be
# re-wired to match the new spatial ordering.
in_shape = node.get_only_parent().output_shape
fc_shape = weights.shape
output_channels = fc_shape[0]
weights = weights.reshape((output_channels, in_shape.channels, in_shape.height,
in_shape.width))
weights = weights.transpose(self.map(NodeKind.Convolution))
node.reshaped_data = weights.reshape(fc_shape[transpose_order[0]],
fc_shape[transpose_order[1]])
else:
node.reshaped_data = weights.transpose(transpose_order)
if self.replace:
for node in graph.nodes:
if hasattr(node, 'reshaped_data'):
# Set the weights
node.data[0] = node.reshaped_data
del node.reshaped_data
return graph
class SubNodeFuser(object):
'''
An abstract helper for merging a single-child with its single-parent.
'''
def __call__(self, graph):
nodes = graph.nodes
fused_nodes = []
for node in nodes:
if len(node.parents) != 1:
# We're only fusing nodes with single parents
continue
parent = node.get_only_parent()
if len(parent.children) != 1:
# We can only fuse a node if its parent's
# value isn't used by any other node.
continue
if not self.is_eligible_pair(parent, node):
continue
# Rewrite the fused node's children to its parent.
for child in node.children:
child.parents.remove(node)
parent.add_child(child)
# Disconnect the fused node from the graph.
parent.children.remove(node)
fused_nodes.append(node)
# Let the sub-class merge the fused node in any arbitrary way.
self.merge(parent, node)
transformed_nodes = [node for node in nodes if node not in fused_nodes]
return graph.replaced(transformed_nodes)
def is_eligible_pair(self, parent, child):
'''Returns true if this parent/child pair is eligible for fusion.'''
raise NotImplementedError('Must be implemented by subclass.')
def merge(self, parent, child):
'''Merge the child node into the parent.'''
raise NotImplementedError('Must be implemented by subclass')
class ReLUFuser(SubNodeFuser):
'''
Fuses rectified linear units with their parent nodes.
'''
def __init__(self, allowed_parent_types=None):
# Fuse ReLUs when the parent node is one of the given types.
# If None, all node types are eligible.
self.allowed_parent_types = allowed_parent_types
def is_eligible_pair(self, parent, child):
return ((self.allowed_parent_types is None or parent.kind in self.allowed_parent_types) and
child.kind == NodeKind.ReLU)
def merge(self, parent, _):
parent.metadata['relu'] = True
class BatchNormScaleBiasFuser(SubNodeFuser):
'''
The original batch normalization paper includes two learned
parameters: a scaling factor \gamma and a bias \beta.
Caffe's implementation does not include these two. However, it is commonly
replicated by adding a scaling+bias layer immediately after the batch norm.
This fuser merges the scaling+bias layer with the batch norm.
'''
def is_eligible_pair(self, parent, child):
return (parent.kind == NodeKind.BatchNorm and child.kind == NodeKind.Scale and
child.parameters.axis == 1 and child.parameters.bias_term == True)
def merge(self, parent, child):
parent.scale_bias_node = child
class BatchNormPreprocessor(object):
'''
Prescale batch normalization parameters.
Concatenate gamma (scale) and beta (bias) terms if set.
'''
def __call__(self, graph):
for node in graph.nodes:
if node.kind != NodeKind.BatchNorm:
continue
assert node.data is not None
assert len(node.data) == 3
mean, variance, scale = node.data
# Prescale the stats
scaling_factor = 1.0 / scale if scale != 0 else 0
mean *= scaling_factor
variance *= scaling_factor
# Replace with the updated values
node.data = [mean, variance]
if hasattr(node, 'scale_bias_node'):
# Include the scale and bias terms
gamma, beta = node.scale_bias_node.data
node.data += [gamma, beta]
return graph
class NodeRenamer(object):
'''
Renames nodes in the graph using a given unary function that
accepts a node and returns its new name.
'''
def __init__(self, renamer):
self.renamer = renamer
def __call__(self, graph):
for node in graph.nodes:
node.name = self.renamer(node)
return graph
class ParameterNamer(object):
'''
Convert layer data arrays to a dictionary mapping parameter names to their values.
'''
def __call__(self, graph):
for node in graph.nodes:
if node.data is None:
continue
if node.kind in (NodeKind.Convolution, NodeKind.InnerProduct):
names = ('weights',)
if node.parameters.bias_term:
names += ('biases',)
elif node.kind == NodeKind.BatchNorm:
names = ('mean', 'variance')
if len(node.data) == 4:
names += ('scale', 'offset')
else:
print_stderr('WARNING: Unhandled parameters: {}'.format(node.kind))
continue
assert len(names) == len(node.data)
node.data = dict(zip(names, node.data))
return graph
================================================
FILE: main_fpn.py
================================================
import sys
import os
import csv
import numpy as np
import cv2
import math
import pose_utils
import os
import myparse
import renderer_fpn
## To make tensorflow print less (this can be useful for debug though)
#os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
#import ctypes;
print '> loading getRts'
import get_Rts as getRts
######## TMP FOLDER #####################
_tmpdir = './tmp/'#os.environ['TMPDIR'] + '/'
print '> make dir'
if not os.path.exists( _tmpdir):
os.makedirs( _tmpdir )
#########################################
##INPUT/OUTPUT
input_file = str(sys.argv[1]) #'input.csv'
outpu_proc = 'output_preproc.csv'
output_pose_db = './output_pose.lmdb'
output_render = './output_render'
#################################################
print '> network'
_alexNetSize = 227
_factor = 0.25 #0.1
# ***** please download the model in https://www.dropbox.com/s/r38psbq55y2yj4f/fpn_new_model.tar.gz?dl=0 ***** #
model_folder = './fpn_new_model/'
model_used = 'model_0_1.0_1.0_1e-07_1_16000.ckpt' #'model_0_1.0_1.0_1e-05_0_6000.ckpt'
lr_rate_scalar = 1.0
if_dropout = 0
keep_rate = 1
################################
data_dict = myparse.parse_input(input_file)
## Pre-processing the images
print '> preproc'
pose_utils.preProcessImage( _tmpdir, data_dict, './',\
_factor, _alexNetSize, outpu_proc )
## Running FacePoseNet
print '> run'
## Running the pose estimation
getRts.esimatePose( model_folder, outpu_proc, output_pose_db, model_used, lr_rate_scalar, if_dropout, keep_rate, use_gpu=False )
renderer_fpn.render_fpn(outpu_proc, output_pose_db, output_render)
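`main_fpn.py` sets `_factor = 0.25`, i.e. the tight detector box is padded by 25% before cropping to the 227x227 AlexNet input. The exact expansion implemented by `pose_utils.preProcessImage` may differ in detail; `expand_bbox` below is a hypothetical helper that only illustrates the idea of growing the box by a fixed fraction on every side:

```python
# Hedged sketch: pad a tight [x, y, w, h] face box by `factor` of the box
# size on each side, mirroring the 25% expansion mentioned above.
def expand_bbox(x, y, w, h, factor=0.25):
    """Return [x, y, w, h] grown by `factor` of the box size on each side."""
    pad_w, pad_h = w * factor, h * factor
    return [x - pad_w, y - pad_h, w + 2 * pad_w, h + 2 * pad_h]

print(expand_bbox(100, 100, 80, 80))  # -> [80.0, 80.0, 120.0, 120.0]
```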
================================================
FILE: main_predict_6DoF.py
================================================
import sys
import numpy as np
import tensorflow as tf
import cv2
import scipy.io as sio
sys.path.append('./utils')
import pose_utils as pu
import os
import os.path
from glob import glob
import time
import pickle
sys.path.append('./kaffe')
sys.path.append('./ResNet')
from ThreeDMM_shape import ResNet_101 as resnet101_shape
# Global parameters
factor = 0.25
_resNetSize = 224
n_hidden1 = 2048
n_hidden2 = 4096
ifdropout = 0
gpuID = int(sys.argv[1])
input_sample_list_path = str(sys.argv[2]) #'./input_list.txt' # You can change to your own image list
tf.logging.set_verbosity(tf.logging.INFO)
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('image_size', 224, 'Image side length.')
output_path = './output_6DoF'
tf.app.flags.DEFINE_string('save_output_path', output_path, 'Directory to save the predicted pose outputs')
tf.app.flags.DEFINE_integer('num_gpus', 1, 'Number of gpus used for training. (0 or 1)')
tf.app.flags.DEFINE_integer('batch_size', 1, 'Batch Size') # 60
if not os.path.exists(FLAGS.save_output_path):
os.makedirs(FLAGS.save_output_path)
def extract_3dmm_pose():
########################################
# Load train image mean, train label mean and std
########################################
# labels stats on 300W-LP
train_label_mean = np.load('./train_stats/train_label_mean_300WLP.npy')
train_label_std = np.load('./train_stats/train_label_std_300WLP.npy')
Pose_label_mean = train_label_mean[:6]
Pose_label_std = train_label_std[:6]
#ShapeExpr_label_mean_300WLP = train_label_mean[6:]
#ShapeExpr_label_std_300WLP = train_label_std[6:]
# Get training image mean from Anh's ShapeNet (CVPR2017)
mean_image_shape = np.load('./train_stats/3DMM_shape_mean.npy') # 3 x 224 x 224
train_image_mean = np.transpose(mean_image_shape, [1,2,0]) # 224 x 224 x 3, [0,255]
########################################
# Build CNN graph
########################################
# placeholders for the batches
x_img = tf.placeholder(tf.float32, [None, FLAGS.image_size, FLAGS.image_size, 3])
# Resize Image
x2 = tf.image.resize_bilinear(x_img, tf.constant([224,224], dtype=tf.int32))
x2 = tf.cast(x2, 'float32')
x2 = tf.reshape(x2, [-1, 224, 224, 3])
# Image normalization
mean = tf.reshape(train_image_mean, [1, 224, 224, 3])
mean = tf.cast(mean, 'float32')
x2 = x2 - mean
########################################
# New-FPN with ResNet structure
########################################
with tf.variable_scope('shapeCNN'):
net_shape = resnet101_shape({'input': x2}, trainable=True) # False: Freeze the ResNet Layers
pool5 = net_shape.layers['pool5']
pool5 = tf.squeeze(pool5)
pool5 = tf.reshape(pool5, [1, 2048])
print pool5.get_shape() # batch_size x 2048
with tf.variable_scope('Pose'):
with tf.variable_scope('fc1'):
fc1W = tf.Variable(tf.random_normal(tf.stack([pool5.get_shape()[1].value, n_hidden1]), mean=0.0, stddev=0.01), trainable=True, name='W')
fc1b = tf.Variable(tf.zeros([n_hidden1]), trainable=True, name='baises')
fc1 = tf.nn.relu_layer(tf.reshape(pool5, [-1, int(np.prod(pool5.get_shape()[1:]))]), fc1W, fc1b, name='fc1')
print "\nfc1 shape:"
print fc1.get_shape(), fc1W.get_shape(), fc1b.get_shape() # (batch_size, 2048) (2048, 2048) (2048,)
if ifdropout == 1:
fc1 = tf.nn.dropout(fc1, prob, name='fc1_dropout')
with tf.variable_scope('fc2'):
fc2W = tf.Variable(tf.random_normal([n_hidden1, n_hidden2], mean=0.0, stddev=0.01), trainable=True, name='W')
fc2b = tf.Variable(tf.zeros([n_hidden2]), trainable=True, name='baises')
fc2 = tf.nn.relu_layer(fc1, fc2W, fc2b, name='fc2')
print fc2.get_shape(), fc2W.get_shape(), fc2b.get_shape() # (batch_size, 4096) (2048, 4096) (4096,)
if ifdropout == 1:
fc2 = tf.nn.dropout(fc2, prob, name='fc2_dropout')
with tf.variable_scope('fc3'):
# Move everything into depth so we can perform a single matrix multiplication.
fc2 = tf.reshape(fc2, [FLAGS.batch_size, -1])
dim = fc2.get_shape()[1].value
print "\nfc2 dim:"
print fc2.get_shape(), dim
fc3W = tf.Variable(tf.random_normal(tf.stack([dim,6]), mean=0.0, stddev=0.01), trainable=True, name='W')
fc3b = tf.Variable(tf.zeros([6]), trainable=True, name='baises')
#print "*** label shape: " + str(len(train_label_mean))
Pose_params_ZNorm = tf.nn.xw_plus_b(fc2, fc3W, fc3b)
print "\nfc3 shape:"
print Pose_params_ZNorm.get_shape(), fc3W.get_shape(), fc3b.get_shape()
Pose_label_mean = tf.cast(tf.reshape(Pose_label_mean, [1, -1]), 'float32')
Pose_label_std = tf.cast(tf.reshape(Pose_label_std, [1, -1]), 'float32')
Pose_params = Pose_params_ZNorm * (Pose_label_std + 0.000000000000000001) + Pose_label_mean
########################################
# Start extracting 3dmm pose
########################################
init_op = tf.global_variables_initializer()
saver = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))
saver_ini_shape_net = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='shapeCNN'))
saver_shapeCNN = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='shapeCNN'))
saver_Pose = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='Pose'))
config = tf.ConfigProto(allow_soft_placement=True) #, log_device_placement=True)
#config.gpu_options.per_process_gpu_memory_fraction = 0.5
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
sess.run(init_op)
start_time = time.time()
# For non-trainable parameters such as the parameters for batch normalization
load_path = "./models/ini_shapeNet_model_L7L_trainable.ckpt"
saver_ini_shape_net.restore(sess, load_path)
# For other trainable parameters
load_path = "./models/model_0.0001_1_18_0.0_2048_4096.ckpt"
saver_shapeCNN.restore(sess, load_path)
saver_Pose.restore(sess, load_path)
load_model_time = time.time() - start_time
print("Model restored: " + str(load_model_time))
with open(input_sample_list_path, 'r') as fin:
for line in fin:
curr_line = line.strip().split(',')
image_path = curr_line[0]
bbox = np.array([float(curr_line[1]), float(curr_line[2]), float(curr_line[3]), float(curr_line[4])]) # [lt_x, lt_y, w, h]
image_key = image_path.split('/')[-1][:-4]
image = cv2.imread(image_path,1) # BGR
image = np.asarray(image)
# Fix the grey image
if len(image.shape) < 3:
image_r = np.reshape(image, (image.shape[0], image.shape[1], 1))
image = np.append(image_r, image_r, axis=2)
image = np.append(image, image_r, axis=2)
# Crop and expand (25%) the image based on the tight bbox (from the face detector or detected lmks)
factor = [1.9255, 2.2591, 1.9423, 1.6087]
img_new = pu.preProcessImage_v2(image.copy(), bbox.copy(), factor, _resNetSize, 1)
image_array = np.reshape(img_new, [1, _resNetSize, _resNetSize, 3])
(params_pose, pool5_feats) = sess.run([Pose_params, pool5], feed_dict={x_img: image_array}) # [scale, pitch, yaw, roll, translation_x, translation_y]
params_pose = params_pose[0]
print params_pose #, pool5_feats
# save the predicted pose
with open(FLAGS.save_output_path + '/' + image_key + '.txt', 'w') as fout:
for pp in params_pose:
fout.write(str(pp) + '\n')
# Convert the 6DoF predicted pose to 3x4 projection matrix (weak-perspective projection)
# Load BFM model
shape_mat = sio.loadmat('./BFM/Model_Shape.mat')
mu_shape = shape_mat['mu_shape'].astype('float32')
expr_mat = sio.loadmat('./BFM/Model_Exp.mat')
mu_exp = expr_mat['mu_exp'].astype('float32')
mu = mu_shape + mu_exp
len_mu = len(mu)
mu = np.reshape(mu, [-1,1])
keypoints = np.reshape(shape_mat['keypoints'], [-1]) - 1 # -1 for python index
keypoints = keypoints.astype('int32')
vertex = np.reshape(mu, [len_mu/3, 3]) # # of vertices x 3
# mean shape
mesh = vertex.T # 3 x # of vertices
mesh_1 = np.concatenate([mesh, np.ones([1,len_mu/3])], axis=0) # 4 x # of vertices
# Get projection matrix from 6DoF pose
scale, pitch, yaw, roll, tx, ty = params_pose
R = pu.RotationMatrix(pitch, yaw, roll)
ProjMat = np.zeros([3,4])
ProjMat[:,:3] = scale * R
ProjMat[:,3] = np.array([tx,ty,0])
# Get predicted shape
#print ProjMat, ProjMat.shape
#print mesh_1, mesh_1.shape
pred_shape = np.matmul(ProjMat, mesh_1) # 3 x # of vertices
pred_shape = pred_shape.T # # of vertices x 3
pred_shape_x = np.reshape(pred_shape[:,0], [len_mu/3, 1])
pred_shape_z = np.reshape(pred_shape[:,2], [len_mu/3, 1])
pred_shape_y = 224 + 1 - pred_shape[:,1]
pred_shape_y = np.reshape(pred_shape_y, [len_mu/3, 1])
pred_shape = np.concatenate([pred_shape_x, pred_shape_y, pred_shape_z], 1)
# Convert shape and lmks back to the original image scale
_, bbox_new, _, lmks_filling, old_h, old_w, img_new = pu.resize_crop_rescaleCASIA(image.copy(), bbox.copy(), pred_shape.copy(), factor)
#print lmks_filling
pred_shape[:,0] = pred_shape[:,0] * old_w / 224.
pred_shape[:,1] = pred_shape[:,1] * old_h / 224.
pred_shape[:,0] = pred_shape[:,0] + bbox_new[0]
pred_shape[:,1] = pred_shape[:,1] + bbox_new[1]
# Get predicted lmks
pred_lmks = pred_shape[keypoints]
sio.savemat(FLAGS.save_output_path + '/' + image_key + '.mat', {'shape_3D': pred_shape, 'lmks_3D': pred_lmks})
#cv2.imwrite(FLAGS.save_output_path + '/' + image_key + '.jpg', img_new)
def main(_):
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]=str(gpuID)
if FLAGS.num_gpus == 0:
dev = '/cpu:0'
elif FLAGS.num_gpus == 1:
dev = '/gpu:0'
else:
raise ValueError('Only support 0 or 1 gpu.')
print dev
with tf.device(dev):
extract_3dmm_pose()
if __name__ == '__main__':
tf.app.run()
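The script above turns the predicted 6DoF vector `[scale, pitch, yaw, roll, tx, ty]` into a 3x4 weak-perspective projection matrix (`ProjMat[:, :3] = scale * R`, `ProjMat[:, 3] = [tx, ty, 0]`). A self-contained numpy sketch of that step; note the exact Euler-angle convention of `pose_utils.RotationMatrix` is not shown in this file, so `rotation_matrix` below assumes a common X(pitch), Y(yaw), Z(roll) composition:

```python
import numpy as np

def rotation_matrix(pitch, yaw, roll):
    # Assumed convention: rotate about X (pitch), then Y (yaw), then Z (roll).
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rz = np.array([[np.cos(roll), -np.sin(roll), 0],
                   [np.sin(roll), np.cos(roll), 0],
                   [0, 0, 1]])
    return Rz.dot(Ry).dot(Rx)

def projmat_from_6dof(scale, pitch, yaw, roll, tx, ty):
    # Mirrors the ProjMat construction in extract_3dmm_pose().
    P = np.zeros((3, 4))
    P[:, :3] = scale * rotation_matrix(pitch, yaw, roll)
    P[:, 3] = [tx, ty, 0.0]
    return P

# Applying P to homogeneous mean-shape vertices (4 x N) yields image-plane
# coordinates, exactly as np.matmul(ProjMat, mesh_1) does in the script.
P = projmat_from_6dof(1.0, 0.0, 0.0, 0.0, 5.0, -3.0)
```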
================================================
FILE: main_predict_ProjMat.py
================================================
import sys
import numpy as np
import tensorflow as tf
import cv2
import scipy.io as sio
sys.path.append('./utils')
import pose_utils as pu
import os
import os.path
from glob import glob
import time
import pickle
sys.path.append('./kaffe')
sys.path.append('./ResNet')
from ThreeDMM_shape import ResNet_101 as resnet101_shape
# Global parameters
factor = 0.25
_resNetSize = 224
n_hidden1 = 2048
n_hidden2 = 4096
ifdropout = 0
gpuID = int(sys.argv[1])
input_sample_list_path = str(sys.argv[2]) #'./input_list.txt' # You can change to your own image list
tf.logging.set_verbosity(tf.logging.INFO)
FLAGS = tf.app.flags.FLAGS
tf.app.flags.DEFINE_integer('image_size', 224, 'Image side length.')
output_path = './output_ProjMat'
tf.app.flags.DEFINE_string('save_output_path', output_path, 'Directory to save the predicted projection-matrix outputs')
tf.app.flags.DEFINE_integer('num_gpus', 1, 'Number of gpus used for training. (0 or 1)')
tf.app.flags.DEFINE_integer('batch_size', 1, 'Batch Size') # 60
if not os.path.exists(FLAGS.save_output_path):
os.makedirs(FLAGS.save_output_path)
def extract_3dmm_ProjMat():
########################################
# Load train image mean, train label mean and std
########################################
# labels stats on 300W-LP
train_label_mean = np.load('./train_stats/train_label_mean_ProjMat.npy')
train_label_std = np.load('./train_stats/train_label_std_ProjMat.npy')
ProjMat_label_mean = train_label_mean[-12:-1]
ProjMat_label_std = train_label_std[-12:-1]
# Get training image mean from Anh's ShapeNet (CVPR2017)
mean_image_shape = np.load('./train_stats/3DMM_shape_mean.npy') # 3 x 224 x 224
train_image_mean = np.transpose(mean_image_shape, [1,2,0]) # 224 x 224 x 3, [0,255]
########################################
# Build CNN graph
########################################
# placeholders for the batches
x_img = tf.placeholder(tf.float32, [None, FLAGS.image_size, FLAGS.image_size, 3])
# Resize Image
x2 = tf.image.resize_bilinear(x_img, tf.constant([224,224], dtype=tf.int32))
x2 = tf.cast(x2, 'float32')
x2 = tf.reshape(x2, [-1, 224, 224, 3])
# Image normalization
mean = tf.reshape(train_image_mean, [1, 224, 224, 3])
mean = tf.cast(mean, 'float32')
x2 = x2 - mean
########################################
# New-FPN with ResNet structure
########################################
with tf.variable_scope('shapeCNN'):
net_shape = resnet101_shape({'input': x2}, trainable=True) # False: Freeze the ResNet Layers
pool5 = net_shape.layers['pool5']
pool5 = tf.squeeze(pool5)
pool5 = tf.reshape(pool5, [1, 2048])
print pool5.get_shape() # batch_size x 2048
with tf.variable_scope('Pose'):
with tf.variable_scope('fc1'):
fc1W = tf.Variable(tf.random_normal(tf.stack([pool5.get_shape()[1].value, n_hidden1]), mean=0.0, stddev=0.01), trainable=True, name='W')
fc1b = tf.Variable(tf.zeros([n_hidden1]), trainable=True, name='baises')
fc1 = tf.nn.relu_layer(tf.reshape(pool5, [-1, int(np.prod(pool5.get_shape()[1:]))]), fc1W, fc1b, name='fc1')
print "\nfc1 shape:"
print fc1.get_shape(), fc1W.get_shape(), fc1b.get_shape() # (batch_size, 2048) (2048, 2048) (2048,)
if ifdropout == 1:
fc1 = tf.nn.dropout(fc1, prob, name='fc1_dropout')
with tf.variable_scope('fc2'):
fc2W = tf.Variable(tf.random_normal([n_hidden1, n_hidden2], mean=0.0, stddev=0.01), trainable=True, name='W')
fc2b = tf.Variable(tf.zeros([n_hidden2]), trainable=True, name='baises')
fc2 = tf.nn.relu_layer(fc1, fc2W, fc2b, name='fc2')
print fc2.get_shape(), fc2W.get_shape(), fc2b.get_shape() # (batch_size, 4096) (2048, 4096) (4096,)
if ifdropout == 1:
fc2 = tf.nn.dropout(fc2, prob, name='fc2_dropout')
with tf.variable_scope('fc3'):
# Move everything into depth so we can perform a single matrix multiplication.
fc2 = tf.reshape(fc2, [FLAGS.batch_size, -1])
dim = fc2.get_shape()[1].value
print "\nfc2 dim:"
print fc2.get_shape(), dim
fc3W = tf.Variable(tf.random_normal(tf.stack([dim,11]), mean=0.0, stddev=0.01), trainable=True, name='W')
fc3b = tf.Variable(tf.zeros([11]), trainable=True, name='baises')
#print "*** label shape: " + str(len(train_label_mean))
ProjMat_preds_ZNorm = tf.nn.xw_plus_b(fc2, fc3W, fc3b)
print "\nfc3 shape:"
print ProjMat_preds_ZNorm.get_shape(), fc3W.get_shape(), fc3b.get_shape()
label_mean = tf.cast(tf.reshape(ProjMat_label_mean, [1, -1]), 'float32')
label_std = tf.cast(tf.reshape(ProjMat_label_std, [1, -1]), 'float32')
ProjMat_preds = ProjMat_preds_ZNorm * (label_std + 0.000000000000000001) + label_mean
ProjMat_preds = tf.concat([ProjMat_preds, tf.zeros([FLAGS.batch_size,1])], 1)
########################################
# Start extracting 3dmm pose
########################################
init_op = tf.global_variables_initializer()
saver = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES))
saver_ini_shape_net = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='shapeCNN'))
saver_shapeCNN = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='shapeCNN'))
saver_Pose = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='Pose'))
config = tf.ConfigProto(allow_soft_placement=True) #, log_device_placement=True)
#config.gpu_options.per_process_gpu_memory_fraction = 0.5
config.gpu_options.allow_growth = True
with tf.Session(config=config) as sess:
sess.run(init_op)
start_time = time.time()
load_path = "./models/ini_shapeNet_model_L7L_trainable.ckpt"
saver_ini_shape_net.restore(sess, load_path)
load_path = "./models/model_0.0001_1_18_0.0_2048_4096.ckpt"
saver_shapeCNN.restore(sess, load_path)
load_path = "./models/model_iniLR_0.001_wProjMat_1.0_wLmks_10.0_wd_0.0_do_1_122.ckpt"
saver_Pose.restore(sess, load_path)
load_model_time = time.time() - start_time
print("Model restored: " + str(load_model_time))
with open(input_sample_list_path, 'r') as fin:
for line in fin:
curr_line = line.strip().split(',')
image_path = curr_line[0]
bbox = np.array([float(curr_line[1]), float(curr_line[2]), float(curr_line[3]), float(curr_line[4])]) # [lt_x, lt_y, w, h]
image_key = image_path.split('/')[-1][:-4]
image = cv2.imread(image_path,1) # BGR
image = np.asarray(image)
# Fix the grey image
if len(image.shape) < 3:
image_r = np.reshape(image, (image.shape[0], image.shape[1], 1))
image = np.append(image_r, image_r, axis=2)
image = np.append(image, image_r, axis=2)
# Crop and expand (25%) the image based on the tight bbox (from the face detector or detected lmks)
factor = [1.9255, 2.2591, 1.9423, 1.6087]
img_new = pu.preProcessImage_v2(image.copy(), bbox.copy(), factor, _resNetSize, 1)
image_array = np.reshape(img_new, [1, _resNetSize, _resNetSize, 3])
#print image_array
(params_ProjMat, pool5_feats) = sess.run([ProjMat_preds, pool5], feed_dict={x_img: image_array}) # 11 predicted projection-matrix entries (the 12th is appended as 0)
params_ProjMat = params_ProjMat[0]
#print params_ProjMat, pool5_feats
# save the predicted pose
with open(FLAGS.save_output_path + '/' + image_key + '.txt', 'w') as fout:
for pp in params_ProjMat:
fout.write(str(pp) + '\n')
# Reshape the predicted parameters into a 3x4 projection matrix (weak-perspective projection)
# Load BFM model
shape_mat = sio.loadmat('./BFM/Model_Shape.mat')
mu_shape = shape_mat['mu_shape'].astype('float32')
expr_mat = sio.loadmat('./BFM/Model_Exp.mat')
mu_exp = expr_mat['mu_exp'].astype('float32')
mu = mu_shape + mu_exp
len_mu = len(mu)
mu = np.reshape(mu, [-1,1])
keypoints = np.reshape(shape_mat['keypoints'], [-1]) - 1 # -1 for python index
keypoints = keypoints.astype('int32')
vertex = np.reshape(mu, [len_mu/3, 3]) # # of vertices x 3
# mean shape
mesh = vertex.T # 3 x # of vertices
mesh_1 = np.concatenate([mesh, np.ones([1,len_mu/3])], axis=0) # 4 x # of vertices
# Get projection matrix from the predicted parameters
ProjMat = np.reshape(params_ProjMat, [4,3])
ProjMat = ProjMat.T
# Get predicted shape
#print ProjMat, ProjMat.shape
#print mesh_1, mesh_1.shape
pred_shape = np.matmul(ProjMat, mesh_1) # 3 x # of vertices
pred_shape = pred_shape.T # # of vertices x 3
pred_shape_x = np.reshape(pred_shape[:,0], [len_mu/3, 1])
pred_shape_z = np.reshape(pred_shape[:,2], [len_mu/3, 1])
pred_shape_y = 224 + 1 - pred_shape[:,1]
pred_shape_y = np.reshape(pred_shape_y, [len_mu/3, 1])
pred_shape = np.concatenate([pred_shape_x, pred_shape_y, pred_shape_z], 1)
# Convert shape and lmks back to the original image scale
_, bbox_new, _, _, old_h, old_w, _ = pu.resize_crop_rescaleCASIA(image.copy(), bbox.copy(), pred_shape.copy(), factor)
pred_shape[:,0] = pred_shape[:,0] * old_w / 224.
pred_shape[:,1] = pred_shape[:,1] * old_h / 224.
pred_shape[:,0] = pred_shape[:,0] + bbox_new[0]
pred_shape[:,1] = pred_shape[:,1] + bbox_new[1]
# Get predicted lmks
pred_lmks = pred_shape[keypoints]
sio.savemat(FLAGS.save_output_path + '/' + image_key + '.mat', {'shape_3D': pred_shape, 'lmks_3D': pred_lmks})
# Obtain pose from ProjMat
scale,R,t3d = pu.P2sRt(ProjMat) # decompose affine matrix to s, R, t
pose = pu.matrix2angle(R) # yaw, pitch, roll
# print scale, pitch, yaw , roll, translation_x, translation_y
print scale, pose[1], pose[0], pose[2], t3d[0], t3d[1]
def main(_):
os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"]=str(gpuID)
if FLAGS.num_gpus == 0:
dev = '/cpu:0'
elif FLAGS.num_gpus == 1:
dev = '/gpu:0'
else:
raise ValueError('Only support 0 or 1 gpu.')
print dev
with tf.device(dev):
extract_3dmm_ProjMat()
if __name__ == '__main__':
tf.app.run()
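At the end of `extract_3dmm_ProjMat()`, `pu.P2sRt` decomposes the affine matrix back into scale, rotation and translation. The utility itself is not shown in this dump; `decompose_projmat` below is a hedged sketch of the usual recipe (average the norms of the first two rotation rows for the scale, normalize them, and take their cross product for the third row):

```python
import numpy as np

def decompose_projmat(P):
    # P is a 3x4 weak-perspective matrix whose left 3x3 block is s * R.
    t3d = P[:, 3].copy()
    r1, r2 = P[0, :3], P[1, :3]
    scale = (np.linalg.norm(r1) + np.linalg.norm(r2)) / 2.0
    r1n = r1 / np.linalg.norm(r1)
    r2n = r2 / np.linalg.norm(r2)
    r3n = np.cross(r1n, r2n)  # orthogonal completion of the rotation
    R = np.stack([r1n, r2n, r3n])
    return scale, R, t3d

# Round trip: build a known s * R | t matrix, then recover its factors.
P = np.zeros((3, 4))
P[:, :3] = 2.0 * np.eye(3)
P[:, 3] = [5.0, -3.0, 0.0]
scale, R, t3d = decompose_projmat(P)
```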
================================================
FILE: models/README
================================================
Please download all the model files and put them in this folder
================================================
FILE: myparse.py
================================================
import csv
def parse_input(input_file):
data_dict = dict()
reader = csv.DictReader(open(input_file,'r'))
#### Reading the metadata into a DICT
for line in reader:
key = line['ID']
data_dict[key] = {'file' : line['FILE'] ,\
'x' : float( line['FACE_X'] ),\
'y' : float( line['FACE_Y'] ),\
'width' : float( line['FACE_WIDTH'] ),\
'height' : float( line['FACE_HEIGHT'] ),\
}
return data_dict
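For reference, a self-contained sketch of what `parse_input` produces for one row of `input.csv`. The column names (`ID`, `FILE`, `FACE_X`, ...) come straight from `myparse.py`; the sample values are made up for illustration:

```python
import csv
import io

# In-memory stand-in for input.csv (same header parse_input expects).
sample = io.StringIO(
    "ID,FILE,FACE_X,FACE_Y,FACE_WIDTH,FACE_HEIGHT\n"
    "img_01,images/img_01.jpg,20,30,100,120\n"
)
data_dict = {}
for line in csv.DictReader(sample):
    data_dict[line['ID']] = {
        'file': line['FILE'],
        'x': float(line['FACE_X']),
        'y': float(line['FACE_Y']),
        'width': float(line['FACE_WIDTH']),
        'height': float(line['FACE_HEIGHT']),
    }
print(data_dict['img_01']['width'])  # -> 100.0
```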
================================================
FILE: output_render/README.md
================================================
The rendered images will be saved here!
## Subject 1 ##
### input: ###

### rendering: ###





## Subject 2 ##
### input: ###

### rendering: ###



## Subject 3 ##
### input: ###

### rendering: ###



## Subject 4 ##
### input: ###

### rendering: ###



## Subject 5 ##
### input: ###

### rendering: ###



## Subject 6 ##
### input: ###

### rendering: ###



## Subject 7 ##
### input: ###

### rendering: ###





## Subject 8 ##
### input: ###

### rendering: ###



## Subject 9 ##
### input: ###

### rendering: ###



## Subject 10 ##
### input: ###

### rendering: ###





================================================
FILE: pose_model.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""ResNet model.
Related papers:
https://arxiv.org/pdf/1603.05027v2.pdf
https://arxiv.org/pdf/1512.03385v1.pdf
https://arxiv.org/pdf/1605.07146v1.pdf
"""
import numpy as np
import tensorflow as tf
from tensorflow.python.training import moving_averages
#import sys
#sys.path.append('/staging/pn/fengjuch/transformer')
#from spatial_transformer import transformer
#from tf_utils import weight_variable, bias_variable, dense_to_one_hot
"""
HParams = namedtuple('HParams',
'batch_size, num_classes, min_lrn_rate, lrn_rate, '
'num_residual_units, use_bottleneck, weight_decay_rate, '
'relu_leakiness, optimizer')
"""
class ThreeD_Pose_Estimation(object):
"""ResNet model."""
def __init__(self, images, labels, mode, ifdropout, keep_rate_fc6, keep_rate_fc7, lr_rate_fac, net_data, batch_size, mean_labels, std_labels):
"""ResNet constructor.
Args:
hps: Hyperparameters.
images: Batches of images. [batch_size, image_size, image_size, 3]
labels: Batches of labels. [batch_size, num_classes]
mode: One of 'train' and 'eval'.
"""
#self.hps = hps
self.batch_size = batch_size
self._images = images
self.labels = labels
self.mode = mode
self.ifdropout = ifdropout
self.keep_rate_fc6 = keep_rate_fc6
self.keep_rate_fc7 = keep_rate_fc7
self.ifadd_weight_decay = 0 #ifadd_weight_decay
self.net_data = net_data
self.lr_rate_fac = lr_rate_fac
self._extra_train_ops = []
self.optimizer = 'Adam'
self.mean_labels = mean_labels
self.std_labels = std_labels
#self.train_mean_vec = train_mean_vec
def _build_graph(self):
"""Build a whole graph for the model."""
self.global_step = tf.Variable(0, name='global_step', trainable=False)
self._build_model()
if self.mode == 'train':
self._build_train_op()
#self.summaries = tf.merge_all_summaries()
def _stride_arr(self, stride):
"""Map a stride scalar to the stride array for tf.nn.conv2d."""
return [1, stride, stride, 1]
def _build_model(self):
"""Build the core model within the graph."""
#with tf.variable_scope('init'):
# x = self._images
# print x, x.get_shape()
# x = self._conv('init_conv', x, 3, 3, 16, self._stride_arr(1))
# print x, x.get_shape()
with tf.variable_scope('Spatial_Transformer'):
x = self._images
x = tf.image.resize_bilinear(x, tf.constant([227,227], dtype=tf.int32)) # the image should be 227 x 227 x 3
print x.get_shape()
self.resized_img = x
theta = self._ST('ST2', x, 3, (16,16), 3, 16, self._stride_arr(1))
#print "*** ", x.get_shape()
#with tf.variable_scope('logit'):
# logits = self._fully_connected(theta, self.hps.num_classes)
# self.predictions = tf.nn.softmax(logits)
#print "*** ", logits, self.predictions
with tf.variable_scope('costs'):
self.predictions = theta
self.preds_unNormalized = theta * (self.std_labels + 0.000000000000000001) + self.mean_labels
pred_dim1 = theta.get_shape()[0]
pred_dim2 = theta.get_shape()[1]
del theta
#diff = self.predictions - self.labels
#print diff
#xent = tf.mul(diff, diff) #tf.nn.l2_loss(diff)
#print xent
#xent = tf.reduce_sum(xent, 1)
pow_res = tf.pow(self.predictions-self.labels, 2)
"""
print pow_res, pow_res.get_shape()
const1 = tf.constant(1.0,shape=[pred_dim1, 3],dtype=tf.float32)
const2 = tf.constant(1.0,shape=[pred_dim1, 3],dtype=tf.float32)
#print const1, const2, const1.get_shape(), const2.get_shape()
const = tf.concat(1,[const1, const2])
print const, const.get_shape()
cpow_res = tf.mul(const,pow_res)
xent = tf.reduce_sum(cpow_res,1)
print xent
"""
xent = tf.reduce_sum(pow_res,1)
self.cost = tf.reduce_mean(xent, name='xent')
#print self.cost
#self.cost = tf.nn.l2_loss(diff)
# Add weight decay if needed
if self.ifadd_weight_decay == 1:
self.cost += self._decay()
#self.train_step = tf.train.GradientDescentOptimizer(self.hps.lrn_rate).minimize(self.cost)
#tf.scalar_summary('cost', self.cost)
def conv(self, input, kernel, biases, k_h, k_w, c_o, s_h, s_w, padding="VALID", group=1):
'''From https://github.com/ethereon/caffe-tensorflow
'''
c_i = input.get_shape()[-1]
assert c_i%group==0
assert c_o%group==0
convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding)
if group==1:
conv = convolve(input, kernel)
else:
#input_groups = tf.split(3, group, input)
#kernel_groups = tf.split(3, group, kernel)
input_groups = tf.split(input, group, 3)
kernel_groups = tf.split(kernel, group, 3)
output_groups = [convolve(i, k) for i,k in zip(input_groups, kernel_groups)]
#conv = tf.concat(3, output_groups)
conv = tf.concat(output_groups, 3)
return tf.reshape(tf.nn.bias_add(conv, biases), [-1]+conv.get_shape().as_list()[1:])
def _ST(self, name, x, channel_x, out_size, filter_size, out_filters, strides):
""" Spatial Transformer. """
with tf.variable_scope(name):
# zero-mean input [B,G,R]: [93.5940, 104.7624, 129.1863] --> provided by vgg-face
"""
with tf.name_scope('preprocess') as scope:
mean = tf.constant(tf.reshape(self.train_mean_vec*255.0, [3]), dtype=tf.float32, shape=[1, 1, 1, 3], name='img_mean')
x = x - mean
"""
# conv1
with tf.name_scope('conv1') as scope:
#conv(11, 11, 96, 4, 4, padding='VALID', name='conv1')
k_h = 11; k_w = 11; c_o = 96; s_h = 4; s_w = 4
conv1W = tf.Variable(self.net_data["conv1"]["weights"], trainable=True, name='W')
conv1b = tf.Variable(self.net_data["conv1"]["biases"], trainable=True, name='baises')
conv1_in = self.conv(x, conv1W, conv1b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=1)
conv1 = tf.nn.relu(conv1_in, name='conv1')
print x.get_shape(), conv1.get_shape()
#maxpool1
#max_pool(3, 3, 2, 2, padding='VALID', name='pool1')
k_h = 3; k_w = 3; s_h = 2; s_w = 2; padding = 'VALID'
maxpool1 = tf.nn.max_pool(conv1, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding, name='pool1')
print maxpool1.get_shape()
#lrn1
#lrn(2, 2e-05, 0.75, name='norm1')
radius = 2; alpha = 2e-05; beta = 0.75; bias = 1.0
lrn1 = tf.nn.local_response_normalization(maxpool1,
depth_radius=radius,
alpha=alpha,
beta=beta,
bias=bias, name='norm1')
# conv2
with tf.name_scope('conv2') as scope:
#conv(5, 5, 256, 1, 1, group=2, name='conv2')
k_h = 5; k_w = 5; c_o = 256; s_h = 1; s_w = 1; group = 2
conv2W = tf.Variable(self.net_data["conv2"]["weights"], trainable=True, name='W')
conv2b = tf.Variable(self.net_data["conv2"]["biases"], trainable=True, name='baises')
conv2_in = self.conv(lrn1, conv2W, conv2b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group)
conv2 = tf.nn.relu(conv2_in, name='conv2')
print conv2.get_shape()
#maxpool2
#max_pool(3, 3, 2, 2, padding='VALID', name='pool2')
k_h = 3; k_w = 3; s_h = 2; s_w = 2; padding = 'VALID'
maxpool2 = tf.nn.max_pool(conv2, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding, name='pool2')
print maxpool2.get_shape()
#lrn2
#lrn(2, 2e-05, 0.75, name='norm2')
radius = 2; alpha = 2e-05; beta = 0.75; bias = 1.0
lrn2 = tf.nn.local_response_normalization(maxpool2,
depth_radius=radius,
alpha=alpha,
beta=beta,
bias=bias, name='norm2')
# conv3
with tf.name_scope('conv3') as scope:
#conv(3, 3, 384, 1, 1, name='conv3')
k_h = 3; k_w = 3; c_o = 384; s_h = 1; s_w = 1; group = 1
conv3W = tf.Variable(self.net_data["conv3"]["weights"], trainable=True, name='W')
conv3b = tf.Variable(self.net_data["conv3"]["biases"], trainable=True, name='baises')
conv3_in = self.conv(lrn2, conv3W, conv3b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group)
conv3 = tf.nn.relu(conv3_in, name='conv3')
print conv3.get_shape()
# conv4
with tf.name_scope('conv4') as scope:
#conv(3, 3, 384, 1, 1, group=2, name='conv4')
k_h = 3; k_w = 3; c_o = 384; s_h = 1; s_w = 1; group = 2
conv4W = tf.Variable(self.net_data["conv4"]["weights"], trainable=True, name='W')
conv4b = tf.Variable(self.net_data["conv4"]["biases"], trainable=True, name='baises')
conv4_in = self.conv(conv3, conv4W, conv4b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group)
conv4 = tf.nn.relu(conv4_in, name='conv4')
print conv4.get_shape()
# conv5
with tf.name_scope('conv5') as scope:
#conv(3, 3, 256, 1, 1, group=2, name='conv5')
k_h = 3; k_w = 3; c_o = 256; s_h = 1; s_w = 1; group = 2
conv5W = tf.Variable(self.net_data["conv5"]["weights"], trainable=True, name='W')
conv5b = tf.Variable(self.net_data["conv5"]["biases"], trainable=True, name='baises')
self.conv5b = conv5b
conv5_in = self.conv(conv4, conv5W, conv5b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group)
conv5 = tf.nn.relu(conv5_in, name='conv5')
print conv5.get_shape()
#maxpool5
#max_pool(3, 3, 2, 2, padding='VALID', name='pool5')
k_h = 3; k_w = 3; s_h = 2; s_w = 2; padding = 'VALID'
maxpool5 = tf.nn.max_pool(conv5, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding, name='pool5')
print maxpool5.get_shape(), maxpool5.get_shape()[1:], int(np.prod(maxpool5.get_shape()[1:]))
# fc6
with tf.variable_scope('fc6') as scope:
#fc(4096, name='fc6')
fc6W = tf.Variable(self.net_data["fc6"]["weights"], trainable=True, name='W')
fc6b = tf.Variable(self.net_data["fc6"]["biases"], trainable=True, name='baises')
self.fc6W = fc6W
self.fc6b = fc6b
fc6 = tf.nn.relu_layer(tf.reshape(maxpool5, [-1, int(np.prod(maxpool5.get_shape()[1:]))]), fc6W, fc6b, name='fc6')
print fc6.get_shape()
if self.ifdropout == 1:
fc6 = tf.nn.dropout(fc6, self.keep_rate_fc6, name='fc6_dropout')
# fc7
with tf.variable_scope('fc7') as scope:
#fc(4096, name='fc7')
fc7W = tf.Variable(self.net_data["fc7"]["weights"], trainable=True, name='W')
fc7b = tf.Variable(self.net_data["fc7"]["biases"], trainable=True, name='baises')
self.fc7b = fc7b
fc7 = tf.nn.relu_layer(fc6, fc7W, fc7b, name='fc7')
print fc7.get_shape()
if self.ifdropout == 1:
fc7 = tf.nn.dropout(fc7, self.keep_rate_fc7, name='fc7_dropout')
# fc8
with tf.variable_scope('fc8') as scope:
"""
#fc(6, relu=False, name='fc8')
fc8W = tf.Variable(net_data["fc8"][0])
fc8b = tf.Variable(net_data["fc8"][1])
fc8 = tf.nn.xw_plus_b(fc7, fc8W, fc8b)
"""
# Move everything into depth so we can perform a single matrix multiplication.
fc7 = tf.reshape(fc7, [self.batch_size, -1])
dim = fc7.get_shape()[1].value
#print "fc7 dim:\n"
#print fc7.get_shape(), dim
fc8W = tf.Variable(tf.random_normal([dim, 6], mean=0.0, stddev=0.01), trainable=True, name='W')
fc8b = tf.Variable(tf.zeros([6]), trainable=True, name='baises')
self.fc8b = fc8b
theta = tf.nn.xw_plus_b(fc7, fc8W, fc8b)
"""
weights = self._variable_with_weight_decay('weights', shape=[dim, 6],
stddev=0.04, wd=None) #wd=0.004)
biases = self._variable_on_cpu('biases', [6], tf.constant_initializer(0.1))
theta = tf.matmul(reshape, weights) + biases
print theta.get_shape()
"""
self.theta = theta
self.fc8W = fc8W
self.fc8b = fc8b
# %% We'll create a spatial transformer module to identify discriminative
# %% patches
#h_trans = self._transform(theta, x, out_size, channel_x)
#print h_trans.get_shape()
return theta
def _variable_with_weight_decay(self, name, shape, stddev, wd):
"""Helper to create an initialized Variable with weight decay.
Note that the Variable is initialized with a truncated normal distribution.
A weight decay is added only if one is specified.
Args:
name: name of the variable
shape: list of ints
stddev: standard deviation of a truncated Gaussian
wd: add L2Loss weight decay multiplied by this float. If None, weight
decay is not added for this Variable.
Returns:
Variable Tensor
"""
dtype = tf.float32 #if FLAGS.use_fp16 else tf.float32
var = self._variable_on_cpu(
name,
shape,
tf.truncated_normal_initializer(stddev=stddev, dtype=dtype))
if wd is not None:
weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss')
tf.add_to_collection('losses', weight_decay)
return var
def _variable_on_cpu(self, name, shape, initializer):
"""Helper to create a Variable stored on CPU memory.
Args: name: name of the variable
shape: list of ints
initializer: initializer for Variable
Returns:
Variable Tensor
"""
with tf.device('/cpu:0'):
dtype = tf.float32 # if FLAGS.use_fp16 else tf.float32
var = tf.get_variable(name, shape, initializer=initializer, dtype=dtype)
return var
def _build_train_op(self):
"""Build training specific ops for the graph."""
#self.lrn_rate = tf.constant(self.hps.lrn_rate, tf.float32)
#tf.scalar_summary('learning rate', self.lrn_rate)
"""
trainable_variables = tf.trainable_variables()
grads = tf.gradients(self.cost, trainable_variables)
"""
if self.optimizer == 'sgd':
optimizer = tf.train.GradientDescentOptimizer(self.lrn_rate)
elif self.optimizer == 'Adam':
optimizer = tf.train.AdamOptimizer(0.001 * self.lr_rate_fac)
elif self.optimizer == 'mom':
optimizer = tf.train.MomentumOptimizer(self.lrn_rate, 0.9)
"""
apply_op = optimizer.apply_gradients(
zip(grads, trainable_variables),
global_step=self.global_step, name='train_step')
train_ops = [apply_op] + self._extra_train_ops
self.train_op = tf.group(*train_ops)
"""
self.train_op = optimizer.minimize(self.cost)
# TODO(xpan): Consider batch_norm in contrib/layers/python/layers/layers.py
def _batch_norm(self, name, x):
"""Batch normalization."""
with tf.variable_scope(name):
params_shape = [x.get_shape()[-1]]
#print x.get_shape(), params_shape
beta = tf.get_variable(
'beta', params_shape, tf.float32,
initializer=tf.constant_initializer(0.0, tf.float32))
gamma = tf.get_variable(
'gamma', params_shape, tf.float32,
initializer=tf.constant_initializer(1.0, tf.float32))
if self.mode == 'train':
mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments')
moving_mean = tf.get_variable(
'moving_mean', params_shape, tf.float32,
initializer=tf.constant_initializer(0.0, tf.float32),
trainable=False)
moving_variance = tf.get_variable(
'moving_variance', params_shape, tf.float32,
initializer=tf.constant_initializer(1.0, tf.float32),
trainable=False)
self._extra_train_ops.append(moving_averages.assign_moving_average(
moving_mean, mean, 0.9))
self._extra_train_ops.append(moving_averages.assign_moving_average(
moving_variance, variance, 0.9))
else:
mean = tf.get_variable(
'moving_mean', params_shape, tf.float32,
initializer=tf.constant_initializer(0.0, tf.float32),
trainable=False)
variance = tf.get_variable(
'moving_variance', params_shape, tf.float32,
initializer=tf.constant_initializer(1.0, tf.float32),
trainable=False)
tf.summary.histogram(mean.op.name, mean)
tf.summary.histogram(variance.op.name, variance)
# epsilon used to be 1e-5. Maybe 0.001 solves the NaN problem in deeper nets.
y = tf.nn.batch_normalization(
x, mean, variance, beta, gamma, 0.001)
y.set_shape(x.get_shape())
return y
def _residual(self, x, in_filter, out_filter, stride,
activate_before_residual=False):
"""Residual unit with 2 sub layers."""
if activate_before_residual:
with tf.variable_scope('shared_activation'):
x = self._batch_norm('init_bn', x)
x = self._relu(x, self.hps.relu_leakiness)
orig_x = x
else:
with tf.variable_scope('residual_only_activation'):
orig_x = x
x = self._batch_norm('init_bn', x)
x = self._relu(x, self.hps.relu_leakiness)
with tf.variable_scope('sub1'):
x = self._conv('conv1', x, 3, in_filter, out_filter, stride)
with tf.variable_scope('sub2'):
x = self._batch_norm('bn2', x)
x = self._relu(x, self.hps.relu_leakiness)
x = self._conv('conv2', x, 3, out_filter, out_filter, [1, 1, 1, 1])
with tf.variable_scope('sub_add'):
if in_filter != out_filter:
orig_x = tf.nn.avg_pool(orig_x, stride, stride, 'VALID')
orig_x = tf.pad(
orig_x, [[0, 0], [0, 0], [0, 0],
[(out_filter-in_filter)//2, (out_filter-in_filter)//2]])
x += orig_x
tf.logging.info('image after unit %s', x.get_shape())
return x
def _bottleneck_residual(self, x, in_filter, out_filter, stride,
activate_before_residual=False):
"""Bottleneck resisual unit with 3 sub layers."""
if activate_before_residual:
with tf.variable_scope('common_bn_relu'):
x = self._batch_norm('init_bn', x)
x = self._relu(x, self.hps.relu_leakiness)
orig_x = x
else:
with tf.variable_scope('residual_bn_relu'):
orig_x = x
x = self._batch_norm('init_bn', x)
x = self._relu(x, self.hps.relu_leakiness)
with tf.variable_scope('sub1'):
x = self._conv('conv1', x, 1, in_filter, out_filter/4, stride)
with tf.variable_scope('sub2'):
x = self._batch_norm('bn2', x)
x = self._relu(x, self.hps.relu_leakiness)
x = self._conv('conv2', x, 3, out_filter/4, out_filter/4, [1, 1, 1, 1])
with tf.variable_scope('sub3'):
x = self._batch_norm('bn3', x)
x = self._relu(x, self.hps.relu_leakiness)
x = self._conv('conv3', x, 1, out_filter/4, out_filter, [1, 1, 1, 1])
with tf.variable_scope('sub_add'):
if in_filter != out_filter:
orig_x = self._conv('project', orig_x, 1, in_filter, out_filter, stride)
x += orig_x
tf.logging.info('image after unit %s', x.get_shape())
return x
def _decay(self):
"""L2 weight decay loss."""
costs = []
for var in tf.trainable_variables():
if var.op.name.find(r'DW') > 0:
costs.append(tf.nn.l2_loss(var))
# tf.histogram_summary(var.op.name, var)
return tf.multiply(self.hps.weight_decay_rate, tf.add_n(costs))
def _conv(self, name, x, filter_size, in_filters, out_filters, strides):
"""Convolution."""
with tf.variable_scope(name):
n = filter_size * filter_size * out_filters
kernel = tf.get_variable(
'DW', [filter_size, filter_size, in_filters, out_filters],
tf.float32, initializer=tf.random_normal_initializer(
stddev=np.sqrt(2.0/n)))
return tf.nn.conv2d(x, kernel, strides, padding='SAME')
def _relu(self, x, leakiness=0.0):
"""Relu, with optional leaky support."""
return tf.where(tf.less(x, 0.0), leakiness * x, x, name='leaky_relu')
def _fully_connected(self, x, out_dim):
"""FullyConnected layer for final output."""
x = tf.reshape(x, [self.hps.batch_size, -1])
#print "*** ", x.get_shape()
w = tf.get_variable(
'DW', [x.get_shape()[1], out_dim],
initializer=tf.uniform_unit_scaling_initializer(factor=1.0))
#print "*** ", w.get_shape()
b = tf.get_variable('biases', [out_dim],
initializer=tf.constant_initializer())
#print "*** ", b.get_shape()
return tf.nn.xw_plus_b(x, w, b)
def _fully_connected_ST(self, x, out_dim):
"""FullyConnected layer for final output of the localization network in the spatial transformer"""
x = tf.reshape(x, [self.hps.batch_size, -1])
w = tf.get_variable(
'DW2', [x.get_shape()[1], out_dim],
initializer=tf.uniform_unit_scaling_initializer(factor=1.0))
initial = np.array([[1., 0, 0], [0, 1., 0]])
initial = initial.astype('float32')
initial = initial.flatten()
b = tf.get_variable('biases2', [out_dim],
initializer=tf.constant_initializer(initial))
return tf.nn.xw_plus_b(x, w, b)
def _global_avg_pool(self, x):
assert x.get_shape().ndims == 4
return tf.reduce_mean(x, [1, 2])
def _repeat(self, x, n_repeats):
with tf.variable_scope('_repeat'):
rep = tf.transpose(
tf.expand_dims(tf.ones(shape=tf.stack([n_repeats, ])), 1), [1, 0])
rep = tf.cast(rep, 'int32')
x = tf.matmul(tf.reshape(x, (-1, 1)), rep)
return tf.reshape(x, [-1])
def _interpolate(self, im, x, y, out_size, channel_x):
with tf.variable_scope('_interpolate2'):
# constants
num_batch = self.hps.batch_size #tf.shape(im)[0]
print num_batch
height = tf.shape(im)[1]
width = tf.shape(im)[2]
channels = tf.shape(im)[3]
print channels
#channels = tf.cast(channels, tf.int32)
#print channels
x = tf.cast(x, 'float32')
y = tf.cast(y, 'float32')
height_f = tf.cast(height, 'float32')
width_f = tf.cast(width, 'float32')
out_height = out_size[0]
out_width = out_size[1]
zero = tf.zeros([], dtype='int32')
#max_y = tf.cast(tf.shape(im)[1] - 1, 'int32')
#max_x = tf.cast(tf.shape(im)[2] - 1, 'int32')
max_y = tf.cast(height - 1, 'int32')
max_x = tf.cast(width - 1, 'int32')
# scale indices from [-1, 1] to [0, width/height]
x = (x + 1.0)*(width_f) / 2.0
y = (y + 1.0)*(height_f) / 2.0
# do sampling
x0 = tf.cast(tf.floor(x), 'int32')
x1 = x0 + 1
y0 = tf.cast(tf.floor(y), 'int32')
y1 = y0 + 1
x0 = tf.clip_by_value(x0, zero, max_x)
x1 = tf.clip_by_value(x1, zero, max_x)
y0 = tf.clip_by_value(y0, zero, max_y)
y1 = tf.clip_by_value(y1, zero, max_y)
dim2 = width
dim1 = width*height
base = self._repeat(tf.range(num_batch)*dim1, out_height*out_width)
base_y0 = base + y0*dim2
base_y1 = base + y1*dim2
idx_a = base_y0 + x0
idx_b = base_y1 + x0
idx_c = base_y0 + x1
idx_d = base_y1 + x1
# use indices to lookup pixels in the flat image and restore
# channels dim
im_flat = tf.reshape(im, tf.stack([-1, channel_x]))
#aa = tf.pack([-1, channels])
#im_flat = tf.reshape(im, [-1, channels])
#print im.get_shape(), im_flat.get_shape() #, aa.get_shape()
im_flat = tf.cast(im_flat, 'float32')
Ia = tf.gather(im_flat, idx_a)
Ib = tf.gather(im_flat, idx_b)
Ic = tf.gather(im_flat, idx_c)
Id = tf.gather(im_flat, idx_d)
#print im_flat.get_shape(), idx_a.get_shape()
#print Ia.get_shape(), Ib.get_shape(), Ic.get_shape(), Id.get_shape()
# and finally calculate interpolated values
x0_f = tf.cast(x0, 'float32')
x1_f = tf.cast(x1, 'float32')
y0_f = tf.cast(y0, 'float32')
y1_f = tf.cast(y1, 'float32')
wa = tf.expand_dims(((x1_f-x) * (y1_f-y)), 1)
wb = tf.expand_dims(((x1_f-x) * (y-y0_f)), 1)
wc = tf.expand_dims(((x-x0_f) * (y1_f-y)), 1)
wd = tf.expand_dims(((x-x0_f) * (y-y0_f)), 1)
#print wa.get_shape(), wb.get_shape(), wc.get_shape(), wd.get_shape()
output = tf.add_n([wa*Ia, wb*Ib, wc*Ic, wd*Id])
#print output.get_shape()
return output
def _meshgrid(self, height, width):
with tf.variable_scope('_meshgrid'):
# This should be equivalent to:
# x_t, y_t = np.meshgrid(np.linspace(-1, 1, width),
# np.linspace(-1, 1, height))
# ones = np.ones(np.prod(x_t.shape))
# grid = np.vstack([x_t.flatten(), y_t.flatten(), ones])
x_t = tf.matmul(tf.ones(shape=tf.stack([height, 1])),
tf.transpose(tf.expand_dims(tf.linspace(-1.0, 1.0, width), 1), [1, 0]))
y_t = tf.matmul(tf.expand_dims(tf.linspace(-1.0, 1.0, height), 1),
tf.ones(shape=tf.stack([1, width])))
x_t_flat = tf.reshape(x_t, (1, -1))
y_t_flat = tf.reshape(y_t, (1, -1))
ones = tf.ones_like(x_t_flat)
grid = tf.concat([x_t_flat, y_t_flat, ones], 0)
return grid
def _transform(self, theta, input_dim, out_size, channel_input):
with tf.variable_scope('_transform'):
print input_dim.get_shape(), theta.get_shape(), out_size[0], out_size[1]
num_batch = self.hps.batch_size #tf.shape(input_dim)[0]
height = tf.shape(input_dim)[1]
width = tf.shape(input_dim)[2]
num_channels = tf.shape(input_dim)[3]
theta = tf.reshape(theta, (-1, 2, 3))
theta = tf.cast(theta, 'float32')
# grid of (x_t, y_t, 1), eq (1) in ref [1]
height_f = tf.cast(height, 'float32')
width_f = tf.cast(width, 'float32')
out_height = out_size[0]
out_width = out_size[1]
grid = self._meshgrid(out_height, out_width)
#print grid, grid.get_shape()
grid = tf.expand_dims(grid, 0)
grid = tf.reshape(grid, [-1])
grid = tf.tile(grid, tf.stack([num_batch]))
grid = tf.reshape(grid, tf.stack([num_batch, 3, -1]))
#print grid, grid.get_shape()
# Transform A x (x_t, y_t, 1)^T -> (x_s, y_s)
T_g = tf.matmul(theta, grid)
x_s = tf.slice(T_g, [0, 0, 0], [-1, 1, -1])
y_s = tf.slice(T_g, [0, 1, 0], [-1, 1, -1])
x_s_flat = tf.reshape(x_s, [-1])
y_s_flat = tf.reshape(y_s, [-1])
#print x_s_flat.get_shape(), y_s_flat.get_shape()
input_transformed = self._interpolate(input_dim, x_s_flat, y_s_flat, out_size, channel_input)
#print input_transformed.get_shape()
output = tf.reshape(input_transformed, tf.stack([num_batch, out_height, out_width, channel_input]))
return output
#return input_dim
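The `_meshgrid`/`_transform`/`_interpolate` trio above is the standard spatial-transformer sampler: build a normalized target grid, warp it by the 2x3 affine `theta`, then bilinearly interpolate the source pixels. The same arithmetic can be sketched in plain NumPy for one channel of one image (a standalone illustration, not code from this repository; `affine_grid_sample` is a hypothetical name):

```python
import numpy as np

def affine_grid_sample(im, theta, out_h, out_w):
    # Mirrors _meshgrid/_transform/_interpolate for a single-channel image.
    H, W = im.shape
    x_t, y_t = np.meshgrid(np.linspace(-1.0, 1.0, out_w),
                           np.linspace(-1.0, 1.0, out_h))
    grid = np.vstack([x_t.ravel(), y_t.ravel(), np.ones(out_h * out_w)])
    src = np.dot(theta, grid)          # A x (x_t, y_t, 1)^T -> (x_s, y_s)
    x = (src[0] + 1.0) * W / 2.0       # scale from [-1, 1] to pixel coords
    y = (src[1] + 1.0) * H / 2.0
    x0 = np.clip(np.floor(x).astype(int), 0, W - 1)
    x1 = np.clip(x0 + 1, 0, W - 1)
    y0 = np.clip(np.floor(y).astype(int), 0, H - 1)
    y1 = np.clip(y0 + 1, 0, H - 1)
    wa = (x1 - x) * (y1 - y)           # bilinear weights, as in _interpolate
    wb = (x1 - x) * (y - y0)
    wc = (x - x0) * (y1 - y)
    wd = (x - x0) * (y - y0)
    out = wa * im[y0, x0] + wb * im[y1, x0] + wc * im[y0, x1] + wd * im[y1, x1]
    return out.reshape(out_h, out_w)
```

With an identity `theta` the sampler resamples the image on a regular grid; scaling the diagonal of `theta` below 1 samples a centered crop, which is how a learned `theta` can focus on a discriminative patch.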
================================================
FILE: pose_utils.py
================================================
import sys
import os
#sys.path.append('+glaive_pylib+')
#import JanusUtils
import numpy as np
import cv2
import math
import fileinput
import shutil
def increaseBbox(bbox, factor):
tlx = bbox[0]
tly = bbox[1]
brx = bbox[2]
bry = bbox[3]
dx = factor
dy = factor
dw = 1 + factor
dh = 1 + factor
#Getting bbox height and width
w = brx - tlx
h = bry - tly
tlx2 = tlx - w * dx
tly2 = tly - h * dy
brx2 = tlx + w * dw
bry2 = tly + h * dh
nbbox = np.zeros( (4,1), dtype=np.float32 )
nbbox[0] = tlx2
nbbox[1] = tly2
nbbox[2] = brx2
nbbox[3] = bry2
return nbbox
def image_bbox_processing_v2(img, bbox):
img_h, img_w, img_c = img.shape
lt_x = bbox[0]
lt_y = bbox[1]
rb_x = bbox[2]
rb_y = bbox[3]
fillings = np.zeros( (4,1), dtype=np.int32)
if lt_x < 0: ## 0 for python
fillings[0] = math.ceil(-lt_x)
if lt_y < 0:
fillings[1] = math.ceil(-lt_y)
if rb_x > img_w-1:
fillings[2] = math.ceil(rb_x - img_w + 1)
if rb_y > img_h-1:
fillings[3] = math.ceil(rb_y - img_h + 1)
new_bbox = np.zeros( (4,1), dtype=np.float32 )
# img = [zeros(size(img,1),fillings(1),img_c), img]
# img = [zeros(fillings(2), size(img,2),img_c); img]
# img = [img, zeros(size(img,1), fillings(3),img_c)]
# new_img = [img; zeros(fillings(4), size(img,2),img_c)]
imgc = img.copy()
if fillings[0] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.hstack( [np.zeros( (img_h, fillings[0][0], img_c), dtype=np.uint8 ), imgc] )
if fillings[1] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.vstack( [np.zeros( (fillings[1][0], img_w, img_c), dtype=np.uint8 ), imgc] )
if fillings[2] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.hstack( [ imgc, np.zeros( (img_h, fillings[2][0], img_c), dtype=np.uint8 ) ] )
if fillings[3] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.vstack( [ imgc, np.zeros( (fillings[3][0], img_w, img_c), dtype=np.uint8) ] )
new_bbox[0] = lt_x + fillings[0]
new_bbox[1] = lt_y + fillings[1]
new_bbox[2] = rb_x + fillings[0]
new_bbox[3] = rb_y + fillings[1]
return imgc, new_bbox
def preProcessImage(_savingDir, data_dict, data_root, factor, _alexNetSize, _listFile):
#### Formatting the images as needed
file_output = _listFile
count = 1
fileIn = open(file_output , 'w' )
for key in data_dict.keys():
filename = data_dict[key]['file']
im = cv2.imread(data_root + filename)
if im is not None:
print 'Processing ' + filename + ' '+ str(count)
sys.stdout.flush()
lt_x = data_dict[key]['x']
lt_y = data_dict[key]['y']
rb_x = lt_x + data_dict[key]['width']
rb_y = lt_y + data_dict[key]['height']
w = data_dict[key]['width']
h = data_dict[key]['height']
center = ( (lt_x+rb_x)/2, (lt_y+rb_y)/2 )
side_length = max(w,h);
bbox = np.zeros( (4,1), dtype=np.float32 )
bbox[0] = center[0] - side_length/2
bbox[1] = center[1] - side_length/2
bbox[2] = center[0] + side_length/2
bbox[3] = center[1] + side_length/2
#img_2, bbox_green = image_bbox_processing_v2(im, bbox)
#%% Get the expanded square bbox
bbox_red = increaseBbox(bbox, factor)
#[img, bbox_red] = image_bbox_processing_v2(img, bbox_red);
img_3, bbox_new = image_bbox_processing_v2(im, bbox_red)
#%% Crop and resized
#bbox_red = ceil(bbox_red);
bbox_new = np.ceil( bbox_new )
#side_length = max(bbox_new(3) - bbox_new(1), bbox_new(4) - bbox_new(2));
side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] )
bbox_new[2:4] = bbox_new[0:2] + side_length
#crop_img = img(bbox_red(2):bbox_red(4), bbox_red(1):bbox_red(3), :);
#resized_crop_img = imresize(crop_img, [227, 227]);# % re-scaling to 227 x 227
bbox_new = bbox_new.astype(int)
crop_img = img_3[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :];
resized_crop_img = cv2.resize(crop_img, ( _alexNetSize, _alexNetSize ), interpolation = cv2.INTER_CUBIC)
cv2.imwrite(_savingDir + key + '.jpg', resized_crop_img )
# flip image for later use
img_flip = cv2.flip(resized_crop_img,1)
cv2.imwrite(_savingDir + key + '_flip.jpg', img_flip )
## Tracking pose image
fileIn.write(key + ',')
fileIn.write(_savingDir + key + '.jpg\n')
fileIn.write(key + '_flip,')
fileIn.write(_savingDir + key + '_flip.jpg\n')
else:
print ' '.join(['Skipping image:', filename, 'Image is None', str(count)])
count+=1
fileIn.close()
def replaceInFile(filep, before, after):
for line in fileinput.input(filep, inplace=True):
print line.replace(before,after),
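The box arithmetic in `increaseBbox` grows a `[tlx, tly, brx, bry]` box by `factor` times its size on the top/left and `(1 + factor)` times on the bottom/right, so each side ends up `(1 + 2*factor)` times longer. A minimal standalone recreation of that arithmetic (`expand_bbox` is a hypothetical name, not part of this repository):

```python
import numpy as np

def expand_bbox(bbox, factor):
    # Same math as increaseBbox above: shift the top-left corner out by
    # factor*w (resp. factor*h) and place the bottom-right corner at
    # tlx + w*(1 + factor), giving a box (1 + 2*factor) times as wide/tall.
    tlx, tly, brx, bry = bbox
    w, h = brx - tlx, bry - tly
    return np.array([tlx - w * factor,
                     tly - h * factor,
                     tlx + w * (1 + factor),
                     tly + h * (1 + factor)], dtype=np.float32)
```

For example, a 10x10 box at the origin with `factor=0.25` becomes the 15x15 box `[-2.5, -2.5, 12.5, 12.5]`; the extra margin is what `image_bbox_processing_v2` later zero-pads when it falls outside the image.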
================================================
FILE: renderer_fpn.py
================================================
import csv
import lmdb
import sys
import numpy as np
import cv2
import os
this_path = os.path.dirname(os.path.abspath(__file__))
render_path = this_path+'/face_renderer/'
sys.path.append(render_path)
try:
import myutil
except ImportError as ie:
print '****************************************************************'
print '**** Have you forgotten to "git clone --recursive"? ****'
print '**** You have to do that to also download the face renderer ****'
print '****************************************************************'
print ie.message
exit(0)
import config
opts = config.parse()
import camera_calibration as calib
import ThreeD_Model
import renderer as renderer_core
import get_Rts as getRts
#pose_models = ['model3D_aug_-00_00','model3D_aug_-22_00','model3D_aug_-40_00','model3D_aug_-55_00','model3D_aug_-75_00']
newModels = opts.getboolean('renderer', 'newRenderedViews')
if opts.getboolean('renderer', 'newRenderedViews'):
pose_models_folder = '/models3d_new/'
pose_models = ['model3D_aug_-00_00','model3D_aug_-22_00','model3D_aug_-40_00','model3D_aug_-55_00','model3D_aug_-75_00']
else:
pose_models_folder = '/models3d/'
pose_models = ['model3D_aug_-00','model3D_aug_-40','model3D_aug_-75',]
nSub = 10
allModels = myutil.preload(render_path,pose_models_folder,pose_models,nSub)
def render_fpn(inputFile, output_pose_db, outputFolder):
## Opening FPN pose db
pose_env = lmdb.open( output_pose_db, readonly=True )
pose_cnn_lmdb = pose_env.begin()
## looping over images
with open(inputFile, 'r') as csvfile:
lines = csvfile.readlines()
for lin in lines:
### key1, image_path_key_1
image_key = lin.split(',')[0]
if 'flip' in image_key:
continue
image_path = lin.split(',')[-1].rstrip('\n')
img = cv2.imread(image_path, 1)
pose_Rt_raw = pose_cnn_lmdb.get( image_key )
pose_Rt_flip_raw = pose_cnn_lmdb.get(image_key + '_flip')
if pose_Rt_raw is not None:
pose_Rt = np.frombuffer( pose_Rt_raw, np.float32 )
pose_Rt_flip = np.frombuffer( pose_Rt_flip_raw, np.float32 )
yaw = myutil.decideSide_from_db(img, pose_Rt, allModels)
if yaw < 0: # Flip image and get the corresponding pose
img = cv2.flip(img,1)
pose_Rt = pose_Rt_flip
listPose = myutil.decidePose(yaw, opts, newModels)
## Looping over the poses
for poseId in listPose:
posee = pose_models[poseId]
## Looping over the subjects
for subj in [10]:
pose = posee + '_' + str(subj).zfill(2) +'.mat'
print '> Looking at file: ' + image_path + ' with ' + pose
# load detections performed by dlib library on 3D model and Reference Image
print "> Using pose model in " + pose
## Indexing the right model instead of loading it each time from memory.
model3D = allModels[pose]
eyemask = model3D.eyemask
# perform camera calibration according to the first face detected
proj_matrix, camera_matrix, rmat, tvec = calib.estimate_camera(model3D, pose_Rt, pose_db_on=True)
## We use eyemask only for frontal
if not myutil.isFrontal(pose):
eyemask = None
##### Main part of the code: doing the rendering #############
rendered_raw, rendered_sym, face_proj, background_proj, temp_proj2_out_2, sym_weight = renderer_core.render(img, proj_matrix,\
model3D.ref_U, eyemask, model3D.facemask, opts)
########################################################
if myutil.isFrontal(pose):
rendered_raw = rendered_sym
## Cropping if required by crop_models
#rendered_raw = myutil.cropFunc(pose,rendered_raw,crop_models[poseId])
## Resizing if required
#if resizeCNN:
# rendered_raw = cv2.resize(rendered_raw, ( cnnSize, cnnSize ), interpolation=cv2.INTER_CUBIC )
## Saving if required
if opts.getboolean('general', 'saveON'):
subjFolder = outputFolder + '/'+ image_key.split('_')[0]
myutil.mymkdir(subjFolder)
savingString = subjFolder + '/' + image_key +'_rendered_'+ pose[8:-7]+'_'+str(subj).zfill(2)+'.jpg'
cv2.imwrite(savingString,rendered_raw)
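`render_fpn` recovers each pose from the raw LMDB value bytes with `np.frombuffer`. The byte round-trip can be illustrated without a database, assuming the producer stored the raw bytes of a float32 vector (the pose values below are made up for illustration):

```python
import numpy as np

# A made-up 6DoF pose: rotation (Rodrigues vector) plus translation.
pose = np.array([0.1, -0.3, 0.05, 12.0, -4.0, 950.0], dtype=np.float32)

# What a pose-db value would hold for one image key...
raw = pose.tobytes()

# ...and how render_fpn turns those bytes back into a pose vector.
recovered = np.frombuffer(raw, np.float32)
```

Note that `np.frombuffer` needs the dtype to match what was written; storing float64 on one side and reading float32 on the other silently doubles the element count.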
================================================
FILE: tf_utils.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# %% Borrowed utils from here: https://github.com/pkmital/tensorflow_tutorials/
import tensorflow as tf
import numpy as np
import csv
def conv2d(x, n_filters,
k_h=5, k_w=5,
stride_h=2, stride_w=2,
stddev=0.02,
activation=lambda x: x,
bias=True,
padding='SAME',
name="Conv2D"):
"""2D Convolution with options for kernel size, stride, and init deviation.
Parameters
----------
x : Tensor
Input tensor to convolve.
n_filters : int
Number of filters to apply.
k_h : int, optional
Kernel height.
k_w : int, optional
Kernel width.
stride_h : int, optional
Stride in rows.
stride_w : int, optional
Stride in cols.
stddev : float, optional
Initialization's standard deviation.
activation : arguments, optional
Function which applies a nonlinearity
bias : bool, optional
Whether to add a learned bias term.
padding : str, optional
'SAME' or 'VALID'
name : str, optional
Variable scope to use.
Returns
-------
x : Tensor
Convolved input.
"""
with tf.variable_scope(name):
w = tf.get_variable(
'w', [k_h, k_w, x.get_shape()[-1], n_filters],
initializer=tf.truncated_normal_initializer(stddev=stddev))
conv = tf.nn.conv2d(
x, w, strides=[1, stride_h, stride_w, 1], padding=padding)
if bias:
b = tf.get_variable(
'b', [n_filters],
initializer=tf.truncated_normal_initializer(stddev=stddev))
conv = conv + b
return activation(conv)
def linear(x, n_units, scope=None, stddev=0.02,
activation=lambda x: x):
"""Fully-connected network.
Parameters
----------
x : Tensor
Input tensor to the network.
n_units : int
Number of units to connect to.
scope : str, optional
Variable scope to use.
stddev : float, optional
Initialization's standard deviation.
activation : arguments, optional
Function which applies a nonlinearity
Returns
-------
x : Tensor
Fully-connected output.
"""
shape = x.get_shape().as_list()
with tf.variable_scope(scope or "Linear"):
matrix = tf.get_variable("Matrix", [shape[1], n_units], tf.float32,
tf.random_normal_initializer(stddev=stddev))
return activation(tf.matmul(x, matrix))
# %%
def weight_variable(shape):
'''Helper function to create a weight variable initialized to zeros
(a random-normal initialization is left commented out below).
Parameters
----------
shape : list
Size of weight variable
'''
#initial = tf.random_normal(shape, mean=0.0, stddev=0.01)
initial = tf.zeros(shape)
return tf.Variable(initial)
# %%
def bias_variable(shape):
'''Helper function to create a bias variable initialized with
a random normal distribution (mean 0, stddev 0.01).
Parameters
----------
shape : list
Size of weight variable
'''
initial = tf.random_normal(shape, mean=0.0, stddev=0.01)
return tf.Variable(initial)
# %%
def dense_to_one_hot(labels, n_classes=2):
"""Convert class labels from scalars to one-hot vectors."""
labels = np.array(labels).astype('int32')
n_labels = labels.shape[0]
index_offset = (np.arange(n_labels) * n_classes).astype('int32')
labels_one_hot = np.zeros((n_labels, n_classes), dtype=np.float32)
labels_one_hot.flat[index_offset + labels.ravel()] = 1
return labels_one_hot
def prepare_trainVal_img_list(img_list, num_subjs):
#num_imgs_per_subj =np.zeros([num_subjs])
id_label_list = []
for row in img_list:
id_label = int(row[8])
#num_imgs_per_subj[id_label] += 1
id_label_list.append(id_label)
id_label_list = np.asarray(id_label_list)
id_label_list = np.reshape(id_label_list, [-1])
train_indices_list = []
valid_indices_list= []
eval_train_indices_list = []
eval_valid_indices_list = []
for i in range(num_subjs):
print i
curr_subj_idx = np.nonzero(id_label_list == i)[0]
tmp = np.random.permutation(curr_subj_idx)
per80 = int(np.floor(len(curr_subj_idx) * 0.8)) # integer index for the 80/20 split
t_inds = tmp[0:per80]
v_inds = tmp[per80:]
train_indices_list.append(t_inds)
valid_indices_list.append(v_inds)
eval_train_indices_list.append(t_inds[0])
eval_valid_indices_list.append(v_inds[0])
train_indices_list = np.asarray(train_indices_list)
valid_indices_list = np.asarray(valid_indices_list)
eval_train_indices_list = np.asarray(eval_train_indices_list)
eval_valid_indices_list = np.asarray(eval_valid_indices_list)
#print train_indices_list, train_indices_list.shape
train_indices_list = np.hstack(train_indices_list).astype('int')
valid_indices_list = np.hstack(valid_indices_list).astype('int')
eval_train_indices_list = np.hstack(eval_train_indices_list).astype('int')
eval_valid_indices_list = np.hstack(eval_valid_indices_list).astype('int')
    print(train_indices_list.shape, valid_indices_list.shape, eval_train_indices_list.shape, eval_valid_indices_list.shape)
img_list = np.asarray(img_list)
    print(img_list.shape)
train_list = img_list[train_indices_list]
valid_list = img_list[valid_indices_list]
eval_train_list = img_list[eval_train_indices_list]
eval_valid_list = img_list[eval_valid_indices_list]
np.savez("Oxford_trainVal_data_3DSTN.npz", train_list=train_list, valid_list=valid_list, eval_train_list=eval_train_list, eval_valid_list=eval_valid_list)
def select_eval_img_list(img_list, num_subjs, save_file_name):
# number of validation subjects
id_label_list = []
for row in img_list:
id_label = int(row[8])
id_label_list.append(id_label)
id_label_list = np.asarray(id_label_list)
id_label_list = np.reshape(id_label_list, [-1])
eval_indices_list = []
for i in range(num_subjs):
        print(i)
curr_subj_idx = np.nonzero(id_label_list == i)[0]
tmp = np.random.permutation(curr_subj_idx)
inds = tmp[0:min(5, len(curr_subj_idx))]
eval_indices_list.append(inds)
    eval_indices_list = np.hstack(eval_indices_list).astype('int')
    print(eval_indices_list.shape)
img_list = np.asarray(img_list)
    print(img_list.shape)
eval_list = img_list[eval_indices_list]
np.savez(save_file_name, eval_list=eval_list)
"""
# Record the number of images per subject
num_imgs_per_subj =np.zeros([num_subjs])
for row in valid_img_list:
id_label = int(row[8])
num_imgs_per_subj[id_label] += 1
hist_subj = np.zeros([num_subjs])
idx = 0
count = 0
for row in valid_img_list:
count += 1
print count
image_key = row[0]
image_path = row[1]
id_label = int(row[8])
if idx >= num_subjs:
break
if hist_subj[idx] < min(1, num_imgs_per_subj[idx]):
if id_label == idx:
with open(save_file_name, "a") as f:
f.write(image_key + "," + image_path + "," + row[2] + "," + row[3] + "," + row[4] + "," + row[5] + "," + row[6] + "," + row[7] + "," + str(id_label) + "\n")
hist_subj[idx] += 1
else:
idx += 1
"""
def input_processing(images, pose_labels, id_labels, train_mean_vec, mean_labels, std_labels, num_imgs, image_size, num_classes):
images = images.reshape([num_imgs, image_size, image_size, 3])
pose_labels = pose_labels.reshape([num_imgs, 6])
id_labels = id_labels.reshape([num_imgs, 1])
id_labels = dense_to_one_hot(id_labels, num_classes)
# Subtract train image mean
images = images / 255.
train_mean_mat = train_mean_vec2mat(train_mean_vec, images)
normalized_images = images - train_mean_mat
# Normalize labels
    normalized_pose_labels = (pose_labels - mean_labels) / (std_labels + 1e-18)
return normalized_images, normalized_pose_labels, id_labels
def train_mean_vec2mat(train_mean, images_array):
height = images_array.shape[1]
width = images_array.shape[2]
#batch = images_array.shape[0]
    # np.tile avoids the separate `import numpy.matlib` that repmat would require
    train_mean_R = np.tile(train_mean[0], (height, width))
    train_mean_G = np.tile(train_mean[1], (height, width))
    train_mean_B = np.tile(train_mean[2], (height, width))
R = np.reshape(train_mean_R, (height,width,1))
G = np.reshape(train_mean_G, (height,width,1))
B = np.reshape(train_mean_B, (height,width,1))
train_mean_image = np.append(R, G, axis=2)
train_mean_image = np.append(train_mean_image, B, axis=2)
return train_mean_image
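As an aside, the per-channel mean expansion above can be written as a one-liner with `np.tile` (this is an equivalent sketch, not the repmat/append-based original):

```python
import numpy as np

def train_mean_vec2mat(train_mean, images_array):
    # Broadcast the per-channel RGB mean to an (H, W, 3) image.
    h, w = images_array.shape[1], images_array.shape[2]
    return np.tile(np.asarray(train_mean, dtype=np.float32), (h, w, 1))

mean_img = train_mean_vec2mat([0.5, 0.4, 0.3], np.zeros((8, 224, 224, 3)))
print(mean_img.shape)  # (224, 224, 3)
```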
def create_file_list(csv_file_path):
with open(csv_file_path, 'r') as csvfile:
csvreader = csv.reader(csvfile, delimiter=',')
csv_list = list(csvreader)
return csv_list
================================================
FILE: train_stats/README
================================================
Here are the precomputed training data statistics
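A minimal sketch of how these statistics are used (the mean/std values below are hypothetical stand-ins; in the repo they would come from `np.load("train_stats/train_label_mean_300WLP.npy")` and the matching std file, and the formula matches `input_processing` in tf_utils.py):

```python
import numpy as np

# Hypothetical 6-DoF label statistics (3 rotation, 3 translation components).
mean_labels = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 500.0])
std_labels  = np.array([0.5, 0.5, 0.5, 50.0, 50.0, 100.0])

pose = np.array([0.1, -0.2, 0.05, 10.0, -5.0, 620.0])
normalized = (pose - mean_labels) / (std_labels + 1e-18)  # training-time normalization
recovered  = normalized * std_labels + mean_labels        # inverse, applied to predictions
print(np.allclose(recovered, pose))  # True
```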
================================================
FILE: utils/README
================================================
Some utility functions are here
================================================
FILE: utils/pose_utils.py
================================================
import sys
import os
import numpy as np
import cv2
import math
from math import cos, sin, atan2, asin
import fileinput
## Index to remap landmarks in case we flip an image
repLand = [ 17,16,15,14,13,12,11,10, 9,8,7,6,5,4,3,2,1,27,26,25, \
24,23,22,21,20,19,18,28,29,30,31,36,35,34,33,32,46,45,44,43, \
48,47,40,39,38,37,42,41,55,54,53,52,51,50,49,60,59,58,57,56, \
65,64,63,62,61,68,67,66 ]
def increaseBbox(bbox, factor):
tlx = bbox[0]
tly = bbox[1]
brx = bbox[2]
bry = bbox[3]
dx = factor
dy = factor
dw = 1 + factor
dh = 1 + factor
#Getting bbox height and width
w = brx-tlx;
h = bry-tly;
tlx2 = tlx - w * dx
tly2 = tly - h * dy
brx2 = tlx + w * dw
bry2 = tly + h * dh
nbbox = np.zeros( (4,1), dtype=np.float32 )
nbbox[0] = tlx2
nbbox[1] = tly2
nbbox[2] = brx2
nbbox[3] = bry2
return nbbox
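A worked numeric example of the expansion above (the function is re-implemented in condensed form so the snippet is self-contained):

```python
import numpy as np

def increaseBbox(bbox, factor):
    # Expand [tlx, tly, brx, bry]: the top-left corner moves out by
    # factor * size, the bottom-right by (1 + factor) * size from top-left.
    tlx, tly, brx, bry = bbox
    w, h = brx - tlx, bry - tly
    return np.array([tlx - w * factor, tly - h * factor,
                     tlx + w * (1 + factor), tly + h * (1 + factor)],
                    dtype=np.float32)

print(increaseBbox([0, 0, 10, 10], 0.25))  # [-2.5 -2.5 12.5 12.5]
```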
def increaseBbox_rescaleCASIA(bbox, factor):
tlx = bbox[0]
tly = bbox[1]
brx = bbox[2]
bry = bbox[3]
ww = brx - tlx;
hh = bry - tly;
cx = tlx + ww/2;
cy = tly + hh/2;
tsize = max(ww,hh)/2;
bl = cx - factor[0]*tsize;
bt = cy - factor[1]*tsize;
br = cx + factor[2]*tsize;
bb = cy + factor[3]*tsize;
nbbox = np.zeros( (4,1), dtype=np.float32 )
nbbox[0] = bl;
nbbox[1] = bt;
nbbox[2] = br;
nbbox[3] = bb;
return nbbox
def increaseBbox_rescaleYOLO(bbox, im):
rescaleFrontal = [1.4421, 2.2853, 1.4421, 1.4286];
rescaleCS2 = [0.9775, 1.5074, 0.9563, 0.9436];
l = bbox[0]
t = bbox[1]
ww = bbox[2]
hh = bbox[3]
# Approximate LM tight BB
h = im.shape[0];
w = im.shape[1];
cx = l + ww/2;
cy = t + hh/2;
tsize = max(ww,hh)/2;
l = cx - tsize;
t = cy - tsize;
cx = l + (2*tsize)/(rescaleCS2[0]+rescaleCS2[2]) * rescaleCS2[0];
cy = t + (2*tsize)/(rescaleCS2[1]+rescaleCS2[3]) * rescaleCS2[1];
tsize = 2*tsize/(rescaleCS2[0]+rescaleCS2[2]);
"""
# Approximate inplane align (frontal)
nbbox = np.zeros( (4,1), dtype=np.float32 )
nbbox[0] = cx - rescaleFrontal[0]*tsize;
nbbox[1] = cy - rescaleFrontal[1]*tsize;
nbbox[2] = cx + rescaleFrontal[2]*tsize;
nbbox[3] = cy + rescaleFrontal[3]*tsize;
"""
nbbox = np.zeros( (4,1), dtype=np.float32 )
nbbox[0] = cx - tsize;
nbbox[1] = cy - tsize;
nbbox[2] = cx + tsize;
nbbox[3] = cy + tsize;
return nbbox
def image_bbox_processing_v2(img, bbox, landmarks=None):
img_h, img_w, img_c = img.shape
lt_x = bbox[0]
lt_y = bbox[1]
rb_x = bbox[2]
rb_y = bbox[3]
fillings = np.zeros( (4,1), dtype=np.int32)
if lt_x < 0: ## 0 for python
fillings[0] = math.ceil(-lt_x)
if lt_y < 0:
fillings[1] = math.ceil(-lt_y)
if rb_x > img_w-1:
fillings[2] = math.ceil(rb_x - img_w + 1)
if rb_y > img_h-1:
fillings[3] = math.ceil(rb_y - img_h + 1)
new_bbox = np.zeros( (4,1), dtype=np.float32 )
# img = [zeros(size(img,1),fillings(1),img_c), img]
# img = [zeros(fillings(2), size(img,2),img_c); img]
# img = [img, zeros(size(img,1), fillings(3),img_c)]
# new_img = [img; zeros(fillings(4), size(img,2),img_c)]
imgc = img.copy()
if fillings[0] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.hstack( [np.zeros( (img_h, fillings[0][0], img_c), dtype=np.uint8 ), imgc] )
if fillings[1] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.vstack( [np.zeros( (fillings[1][0], img_w, img_c), dtype=np.uint8 ), imgc] )
if fillings[2] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.hstack( [ imgc, np.zeros( (img_h, fillings[2][0], img_c), dtype=np.uint8 ) ] )
if fillings[3] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.vstack( [ imgc, np.zeros( (fillings[3][0], img_w, img_c), dtype=np.uint8) ] )
new_bbox[0] = lt_x + fillings[0]
new_bbox[1] = lt_y + fillings[1]
new_bbox[2] = rb_x + fillings[0]
new_bbox[3] = rb_y + fillings[1]
    if landmarks is None or len(landmarks) == 0:
return imgc, new_bbox
else:
landmarks_new = np.zeros([landmarks.shape[0], landmarks.shape[1]])
#print "landmarks_new's shape: \n"
#print landmarks_new.shape
landmarks_new[:,0] = landmarks[:,0] + fillings[0]
landmarks_new[:,1] = landmarks[:,1] + fillings[1]
return imgc, new_bbox, landmarks_new
#return imgc, new_bbox
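The padding logic above can be sketched more compactly with `np.pad` (an equivalent condensed version, not the hstack/vstack original): when the bbox spills outside the image, pad with black and shift the box into the new frame.

```python
import numpy as np

img = np.ones((100, 100, 3), dtype=np.uint8)
bbox = np.array([-10.0, -5.0, 90.0, 95.0])  # [lt_x, lt_y, rb_x, rb_y]

left = int(np.ceil(max(0.0, -bbox[0])))
top  = int(np.ceil(max(0.0, -bbox[1])))
padded = np.pad(img, ((top, 0), (left, 0), (0, 0)), mode='constant')
new_bbox = bbox + np.array([left, top, left, top])

print(padded.shape)  # (105, 110, 3)
print(new_bbox)      # [  0.   0. 100. 100.]
```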
def image_bbox_processing_v3(img, bbox):
img_h, img_w, img_c = img.shape
lt_x = bbox[0]
lt_y = bbox[1]
rb_x = bbox[2]
rb_y = bbox[3]
fillings = np.zeros( (4,1), dtype=np.int32)
if lt_x < 0: ## 0 for python
fillings[0] = math.ceil(-lt_x)
if lt_y < 0:
fillings[1] = math.ceil(-lt_y)
if rb_x > img_w-1:
fillings[2] = math.ceil(rb_x - img_w + 1)
if rb_y > img_h-1:
fillings[3] = math.ceil(rb_y - img_h + 1)
new_bbox = np.zeros( (4,1), dtype=np.float32 )
# img = [zeros(size(img,1),fillings(1),img_c), img]
# img = [zeros(fillings(2), size(img,2),img_c); img]
# img = [img, zeros(size(img,1), fillings(3),img_c)]
# new_img = [img; zeros(fillings(4), size(img,2),img_c)]
imgc = img.copy()
if fillings[0] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.hstack( [np.zeros( (img_h, fillings[0][0], img_c), dtype=np.uint8 ), imgc] )
if fillings[1] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.vstack( [np.zeros( (fillings[1][0], img_w, img_c), dtype=np.uint8 ), imgc] )
if fillings[2] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.hstack( [ imgc, np.zeros( (img_h, fillings[2][0], img_c), dtype=np.uint8 ) ] )
if fillings[3] > 0:
img_h, img_w, img_c = imgc.shape
imgc = np.vstack( [ imgc, np.zeros( (fillings[3][0], img_w, img_c), dtype=np.uint8) ] )
new_bbox[0] = lt_x + fillings[0]
new_bbox[1] = lt_y + fillings[1]
new_bbox[2] = rb_x + fillings[0]
new_bbox[3] = rb_y + fillings[1]
return imgc, new_bbox
def preProcessImage(im, lmks, bbox, factor, _alexNetSize, flipped):
sys.stdout.flush()
    if flipped == 1: # flip landmarks and indices if the image is flipped
lmks = flip_lmk_idx(im, lmks)
lmks_flip = lmks
lt_x = bbox[0]
lt_y = bbox[1]
rb_x = lt_x + bbox[2]
rb_y = lt_y + bbox[3]
w = bbox[2]
h = bbox[3]
center = ( (lt_x+rb_x)/2, (lt_y+rb_y)/2 )
side_length = max(w,h);
# make the bbox be square
bbox = np.zeros( (4,1), dtype=np.float32 )
bbox[0] = center[0] - side_length/2
bbox[1] = center[1] - side_length/2
bbox[2] = center[0] + side_length/2
bbox[3] = center[1] + side_length/2
img_2, bbox_green = image_bbox_processing_v2(im, bbox)
#%% Get the expanded square bbox
bbox_red = increaseBbox(bbox_green, factor)
bbox_red2 = increaseBbox(bbox, factor)
bbox_red2[2] = bbox_red2[2] - bbox_red2[0]
bbox_red2[3] = bbox_red2[3] - bbox_red2[1]
bbox_red2 = np.reshape(bbox_red2, [4])
img_3, bbox_new, lmks = image_bbox_processing_v2(img_2, bbox_red, lmks)
#%% Crop and resized
bbox_new = np.ceil( bbox_new )
side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] )
bbox_new[2:4] = bbox_new[0:2] + side_length
bbox_new = bbox_new.astype(int)
crop_img = img_3[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :];
lmks_new = np.zeros([lmks.shape[0],2])
lmks_new[:,0] = lmks[:,0] - bbox_new[0][0]
lmks_new[:,1] = lmks[:,1] - bbox_new[1][0]
resized_crop_img = cv2.resize(crop_img, ( _alexNetSize, _alexNetSize ), interpolation = cv2.INTER_CUBIC)
old_h, old_w, channels = crop_img.shape
lmks_new2 = np.zeros([lmks.shape[0],2])
lmks_new2[:,0] = lmks_new[:,0] * _alexNetSize / old_w
lmks_new2[:,1] = lmks_new[:,1] * _alexNetSize / old_h
#print _alexNetSize, old_w, old_h
return resized_crop_img, lmks_new2, bbox_red2, lmks_flip, side_length, center
def resize_crop_rescaleCASIA(im, bbox, lmks, factor):
lt_x = bbox[0]
lt_y = bbox[1]
rb_x = lt_x + bbox[2]
rb_y = lt_y + bbox[3]
bbox = np.reshape([lt_x, lt_y, rb_x, rb_y], [-1])
# Get the expanded square bbox
bbox_red = increaseBbox_rescaleCASIA(bbox, factor)
img_3, bbox_new, lmks = image_bbox_processing_v2(im, bbox_red, lmks);
lmks_filling = lmks.copy()
#%% Crop and resized
bbox_new = np.ceil( bbox_new )
side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] )
bbox_new[2:4] = bbox_new[0:2] + side_length
#bbox_new[0] = max(0, bbox_new[0])
#bbox_new[1] = max(0, bbox_new[1])
#bbox_new[2] = min(img_3.shape[1]-1, bbox_new[2])
#bbox_new[3] = min(img_3.shape[0]-1, bbox_new[3])
bbox_new = bbox_new.astype(int)
crop_img = img_3[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :];
lmks_new = np.zeros([lmks.shape[0],2])
lmks_new[:,0] = lmks[:,0] - bbox_new[0][0]
lmks_new[:,1] = lmks[:,1] - bbox_new[1][0]
old_h, old_w, channels = crop_img.shape
resized_crop_img = cv2.resize(crop_img, ( 224, 224 ), interpolation = cv2.INTER_CUBIC)
lmks_new2 = np.zeros([lmks.shape[0],2])
lmks_new2[:,0] = lmks_new[:,0] * 224 / old_w
lmks_new2[:,1] = lmks_new[:,1] * 224 / old_h
return resized_crop_img, bbox_new, lmks_new2, lmks_filling, old_h, old_w, img_3
def resize_crop_rescaleCASIA_v2(im, bbox, lmks, factor, bbox_type):
# Get the expanded square bbox
if bbox_type == "casia":
lt_x = bbox[0]
lt_y = bbox[1]
rb_x = lt_x + bbox[2]
rb_y = lt_y + bbox[3]
bbox = np.reshape([lt_x, lt_y, rb_x, rb_y], [-1])
bbox_red = increaseBbox_rescaleCASIA(bbox, factor)
elif bbox_type == "yolo":
lt_x = bbox[0]
lt_y = bbox[1]
rb_x = lt_x + bbox[2]
rb_y = lt_y + bbox[3]
w = bbox[2]
h = bbox[3]
center = ( (lt_x+rb_x)/2, (lt_y+rb_y)/2 )
side_length = max(w,h);
# make the bbox be square
bbox = np.zeros( (4,1), dtype=np.float32 )
bbox[0] = center[0] - side_length/2
bbox[1] = center[1] - side_length/2
bbox[2] = center[0] + side_length/2
bbox[3] = center[1] + side_length/2
img_2, bbox_green = image_bbox_processing_v3(im, bbox)
#%% Get the expanded square bbox
bbox_red = increaseBbox(bbox_green, factor)
img_3, bbox_new, lmks = image_bbox_processing_v2(im, bbox_red, lmks);
lmks_filling = lmks.copy()
#%% Crop and resized
bbox_new = np.ceil( bbox_new )
side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] )
bbox_new[2:4] = bbox_new[0:2] + side_length
#bbox_new[0] = max(0, bbox_new[0])
#bbox_new[1] = max(0, bbox_new[1])
#bbox_new[2] = min(img_3.shape[1]-1, bbox_new[2])
#bbox_new[3] = min(img_3.shape[0]-1, bbox_new[3])
bbox_new = bbox_new.astype(int)
crop_img = img_3[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :];
lmks_new = np.zeros([lmks.shape[0],2])
lmks_new[:,0] = lmks[:,0] - bbox_new[0][0]
lmks_new[:,1] = lmks[:,1] - bbox_new[1][0]
old_h, old_w, channels = crop_img.shape
resized_crop_img = cv2.resize(crop_img, ( 224, 224 ), interpolation = cv2.INTER_CUBIC)
lmks_new2 = np.zeros([lmks.shape[0],2])
lmks_new2[:,0] = lmks_new[:,0] * 224 / old_w
lmks_new2[:,1] = lmks_new[:,1] * 224 / old_h
return resized_crop_img, bbox_new, lmks_new2, lmks_filling, old_h, old_w, img_3
def resize_crop_AFLW(im, bbox, lmks):
lt_x = bbox[0]
lt_y = bbox[1]
rb_x = lt_x + bbox[2]
rb_y = lt_y + bbox[3]
    bbox = np.reshape([lt_x, lt_y, rb_x, rb_y], [-1]).astype(int)
    crop_img = im[bbox[1]:bbox[3], bbox[0]:bbox[2], :]
lmks_new = np.zeros([lmks.shape[0],2])
lmks_new[:,0] = lmks[:,0] - bbox[0]
lmks_new[:,1] = lmks[:,1] - bbox[1]
old_h, old_w, channels = crop_img.shape
resized_crop_img = cv2.resize(crop_img, ( 224, 224 ), interpolation = cv2.INTER_CUBIC)
lmks_new2 = np.zeros([lmks.shape[0],2])
lmks_new2[:,0] = lmks_new[:,0] * 224 / old_w
lmks_new2[:,1] = lmks_new[:,1] * 224 / old_h
bbox_new = np.zeros([4])
bbox_new[0] = bbox[0] * 224 / old_w
bbox_new[1] = bbox[1] * 224 / old_h
bbox_new[2] = bbox[2] * 224 / old_w
bbox_new[3] = bbox[3] * 224 / old_h
bbox_new[2] = bbox_new[2] - bbox_new[0] # box width
bbox_new[3] = bbox_new[3] - bbox_new[1] # box height
return resized_crop_img, bbox_new, lmks_new2
def preProcessImage_v2(im, bbox, factor, _resNetSize, if_cropbyLmks_rescaleCASIA):
sys.stdout.flush()
if if_cropbyLmks_rescaleCASIA == 0:
lt_x = bbox[0]
lt_y = bbox[1]
rb_x = lt_x + bbox[2]
rb_y = lt_y + bbox[3]
w = bbox[2]
h = bbox[3]
center = ( (lt_x+rb_x)/2, (lt_y+rb_y)/2 )
side_length = max(w,h);
# make the bbox be square
bbox = np.zeros( (4,1), dtype=np.float32 )
bbox[0] = center[0] - side_length/2
bbox[1] = center[1] - side_length/2
bbox[2] = center[0] + side_length/2
bbox[3] = center[1] + side_length/2
img_2, bbox_green = image_bbox_processing_v2(im, bbox)
#%% Get the expanded square bbox
bbox_red = increaseBbox(bbox_green, factor)
img_3, bbox_new = image_bbox_processing_v2(img_2, bbox_red)
elif if_cropbyLmks_rescaleCASIA == 1:
bbox[2] = bbox[0] + bbox[2]
bbox[3] = bbox[1] + bbox[3]
bbox_red = increaseBbox_rescaleCASIA(bbox, factor)
#print bbox_red
img_3, bbox_new = image_bbox_processing_v3(im, bbox_red)
else:
bbox2 = increaseBbox_rescaleYOLO(bbox, im)
bbox_red = increaseBbox_rescaleCASIA(bbox2, factor)
img_3, bbox_new = image_bbox_processing_v2(im, bbox_red)
#bbox_red2 = increaseBbox(bbox, factor)
#bbox_red2[2] = bbox_red2[2] - bbox_red2[0]
#bbox_red2[3] = bbox_red2[3] - bbox_red2[1]
#bbox_red2 = np.reshape(bbox_red2, [4])
#%% Crop and resized
bbox_new = np.ceil( bbox_new )
side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] )
bbox_new[2:4] = bbox_new[0:2] + side_length
bbox_new = bbox_new.astype(int)
crop_img = img_3[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :];
#print crop_img.shape
resized_crop_img = cv2.resize(crop_img, ( _resNetSize, _resNetSize ), interpolation = cv2.INTER_CUBIC)
return resized_crop_img
def preProcessImage_useGTBBox(im, lmks, bbox, factor, _alexNetSize, flipped, to_train_scale, yolo_bbox):
sys.stdout.flush()
#print bbox, yolo_bbox, to_train_scale
    if flipped == 1: # flip landmarks and indices if the image is flipped
lmks = flip_lmk_idx(im, lmks)
lmks_flip = lmks
lt_x = bbox[0]
lt_y = bbox[1]
rb_x = lt_x + bbox[2]
rb_y = lt_y + bbox[3]
w = bbox[2]
h = bbox[3]
center = ( (lt_x+rb_x)/2, (lt_y+rb_y)/2 )
side_length = max(w,h);
# make the bbox be square
bbox = np.zeros( (4,1), dtype=np.float32 )
#print bbox
bbox_red = np.zeros( (4,1), dtype=np.float32 )
if to_train_scale == 1:
_, _, _, _, side_length2, center2 = preProcessImage(im, lmks, yolo_bbox, factor, _alexNetSize, flipped)
center3 = ( (center[0]+center2[0])/2, (center[1]+center2[1])/2 )
bbox[0] = center3[0] - side_length2/2
bbox[1] = center3[1] - side_length2/2
bbox[2] = center3[0] + side_length2/2
bbox[3] = center3[1] + side_length2/2
bbox_red[0] = center3[0] - side_length2/2
bbox_red[1] = center3[1] - side_length2/2
bbox_red[2] = side_length2
bbox_red[3] = side_length2
else:
bbox[0] = center[0] - side_length/2
bbox[1] = center[1] - side_length/2
bbox[2] = center[0] + side_length/2
bbox[3] = center[1] + side_length/2
#print center, side_length, bbox[0], bbox[1], bbox[2], bbox[3]
bbox_red[0] = center[0] - side_length/2
bbox_red[1] = center[1] - side_length/2
bbox_red[2] = side_length
bbox_red[3] = side_length
bbox_red = np.reshape(bbox_red, [4])
#print bbox, bbox_red
img_2, bbox_green = image_bbox_processing_v2(im, bbox)
#print img_2.shape, bbox_green
#%% Crop and resized
bbox_new = np.ceil( bbox_green )
side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] )
bbox_new[2:4] = bbox_new[0:2] + side_length
bbox_new = bbox_new.astype(int)
#print bbox_new
crop_img = img_2[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :];
lmks_new = np.zeros([68,2])
lmks_new[:,0] = lmks[:,0] - bbox_new[0][0]
lmks_new[:,1] = lmks[:,1] - bbox_new[1][0]
#print crop_img.shape
resized_crop_img = cv2.resize(crop_img, ( _alexNetSize, _alexNetSize ), interpolation = cv2.INTER_CUBIC)
old_h, old_w, channels = crop_img.shape
lmks_new2 = np.zeros([68,2])
lmks_new2[:,0] = lmks_new[:,0] * _alexNetSize / old_w
lmks_new2[:,1] = lmks_new[:,1] * _alexNetSize / old_h
#print _alexNetSize, old_w, old_h
return resized_crop_img, lmks_new2, bbox_red, lmks_flip
def replaceInFile(filep, before, after):
for line in fileinput.input(filep, inplace=True):
        sys.stdout.write(line.replace(before, after))
def flip_lmk_idx(img, lmarks):
# Flipping X values for landmarks \
lmarks[:,0] = img.shape[1] - lmarks[:,0]
# Creating flipped landmarks with new indexing
lmarks_flip = np.zeros((68,2))
for i in range(len(repLand)):
lmarks_flip[i,:] = lmarks[repLand[i]-1,:]
return lmarks_flip
def pose_to_LMs(pose_Rt):
pose_Rt = np.reshape(pose_Rt, [6])
ref_lm = np.loadtxt('./lm_m10.txt', delimiter=',')
ref_lm_t = np.transpose(ref_lm)
numLM = ref_lm_t.shape[1]
#PI = np.array([[ 4.22519775e+03,0.00000000e+00,1.15000000e+02], [0.00000000e+00, 4.22519775e+03, 1.15000000e+02], [0, 0, 1]]);
PI = np.array([[ 2.88000000e+03, 0.00000000e+00, 1.12000000e+02], [0.00000000e+00, 2.88000000e+03, 1.12000000e+02], [0, 0, 1]]);
rvecs = pose_Rt[0:3]
tvec = np.reshape(pose_Rt[3:6], [3,1])
tsum = np.repeat(tvec,numLM,1)
rmat, jacobian = cv2.Rodrigues(rvecs, None)
transformed_lms = np.matmul(rmat, ref_lm_t) + tsum
transformed_lms = np.matmul(PI, transformed_lms)
transformed_lms[0,:] = transformed_lms[0,:]/transformed_lms[2,:]
transformed_lms[1,:] = transformed_lms[1,:]/transformed_lms[2,:]
lms = np.transpose(transformed_lms[:2,:])
return lms
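The projection step above is a standard pinhole model: K · (R · X + t), then division by depth. A checkable sketch with an identity rotation (the K values mirror the hard-coded PI matrix; the point and translation are made up for illustration):

```python
import numpy as np

K = np.array([[2880.0, 0.0, 112.0],
              [0.0, 2880.0, 112.0],
              [0.0, 0.0, 1.0]])          # same intrinsics as PI above
R = np.eye(3)                            # identity rotation for a checkable example
t = np.array([[0.0], [0.0], [1000.0]])   # push the model 1000 units along z

X = np.array([[10.0], [20.0], [0.0]])    # one 3D point (column vector)
x = K @ (R @ X + t)
u, v = x[0, 0] / x[2, 0], x[1, 0] / x[2, 0]
print(u, v)  # 140.8 169.6
```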
def RotationMatrix(angle_x, angle_y, angle_z):
# get rotation matrix by rotate angle
phi = angle_x; # pitch
gamma = angle_y; # yaw
theta = angle_z; # roll
R_x = np.array([ [1, 0, 0] , [0, np.cos(phi), np.sin(phi)] , [0, -np.sin(phi), np.cos(phi)] ]);
R_y = np.array([ [np.cos(gamma), 0, -np.sin(gamma)] , [0, 1, 0] , [np.sin(gamma), 0, np.cos(gamma)] ]);
R_z = np.array([ [np.cos(theta), np.sin(theta), 0] , [-np.sin(theta), np.cos(theta), 0] , [0, 0, 1] ]);
R = np.matmul( R_x , np.matmul(R_y , R_z) );
return R
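A quick sanity check (re-implemented so the snippet is self-contained): the composed matrix should be orthonormal with determinant 1 for any angles.

```python
import numpy as np

def RotationMatrix(angle_x, angle_y, angle_z):
    # Mirrors utils/pose_utils.RotationMatrix: R = R_x @ R_y @ R_z.
    p, g, t = angle_x, angle_y, angle_z
    R_x = np.array([[1, 0, 0], [0, np.cos(p), np.sin(p)], [0, -np.sin(p), np.cos(p)]])
    R_y = np.array([[np.cos(g), 0, -np.sin(g)], [0, 1, 0], [np.sin(g), 0, np.cos(g)]])
    R_z = np.array([[np.cos(t), np.sin(t), 0], [-np.sin(t), np.cos(t), 0], [0, 0, 1]])
    return R_x @ R_y @ R_z

R = RotationMatrix(0.1, -0.2, 0.3)
print(np.allclose(R @ R.T, np.eye(3)), np.isclose(np.linalg.det(R), 1.0))  # True True
```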
def matrix2angle(R):
''' compute three Euler angles from a Rotation Matrix. Ref: http://www.gregslabaugh.net/publications/euler.pdf
Args:
R: (3,3). rotation matrix
Returns:
x: yaw
y: pitch
z: roll
'''
# assert(isRotationMatrix(R))
    if R[2,0] != 1 and R[2,0] != -1:
#x = asin(R[2,0])
#y = atan2(R[2,1]/cos(x), R[2,2]/cos(x))
#z = atan2(R[1,0]/cos(x), R[0,0]/cos(x))
x = -asin(R[2,0])
#x = np.pi - x
y = atan2(R[2,1]/cos(x), R[2,2]/cos(x))
z = atan2(R[1,0]/cos(x), R[0,0]/cos(x))
else:# Gimbal lock
z = 0 #can be anything
if R[2,0] == -1:
x = np.pi/2
y = z + atan2(R[0,1], R[0,2])
else:
x = -np.pi/2
y = -z + atan2(-R[0,1], -R[0,2])
return x, y, z
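A round-trip check of the decomposition above (non-degenerate branch, re-implemented to be self-contained). Note the hand-built Rx/Ry/Rz here use the textbook sign convention for a Z·Y·X composition, which differs from `RotationMatrix` in this file:

```python
import numpy as np
from math import asin, atan2, cos, sin

def matrix2angle(R):
    # Mirrors utils/pose_utils.matrix2angle (non-gimbal-lock branch).
    x = -asin(R[2, 0])
    y = atan2(R[2, 1] / cos(x), R[2, 2] / cos(x))
    z = atan2(R[1, 0] / cos(x), R[0, 0] / cos(x))
    return x, y, z

a, b, c = 0.2, 0.3, -0.4   # rotations about X, Y, Z
Rx = np.array([[1, 0, 0], [0, cos(a), -sin(a)], [0, sin(a), cos(a)]])
Ry = np.array([[cos(b), 0, sin(b)], [0, 1, 0], [-sin(b), 0, cos(b)]])
Rz = np.array([[cos(c), -sin(c), 0], [sin(c), cos(c), 0], [0, 0, 1]])
R = Rz @ Ry @ Rx            # standard Z·Y·X composition

print(matrix2angle(R))      # ≈ (0.3, 0.2, -0.4)
```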
def P2sRt(P):
''' decompositing camera matrix P.
Args:
P: (3, 4). Affine Camera Matrix.
Returns:
s: scale factor.
R: (3, 3). rotation matrix.
t2d: (2,). 2d translation.
t3d: (3,). 3d translation.
'''
#t2d = P[:2, 3]
t3d = P[:, 3]
R1 = P[0:1, :3]
R2 = P[1:2, :3]
s = (np.linalg.norm(R1) + np.linalg.norm(R2))/2.0
r1 = R1/np.linalg.norm(R1)
r2 = R2/np.linalg.norm(R2)
r3 = np.cross(r1, r2)
R = np.concatenate((r1, r2, r3), 0)
return s, R, t3d
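A round-trip check of the decomposition above: build P = [sR | t] from a known scale, rotation, and translation, and recover them (the function is re-implemented so the snippet is self-contained).

```python
import numpy as np

def P2sRt(P):
    # Mirrors utils/pose_utils.P2sRt.
    t3d = P[:, 3]
    R1, R2 = P[0:1, :3], P[1:2, :3]
    s = (np.linalg.norm(R1) + np.linalg.norm(R2)) / 2.0
    r1, r2 = R1 / np.linalg.norm(R1), R2 / np.linalg.norm(R2)
    r3 = np.cross(r1, r2)
    return s, np.concatenate((r1, r2, r3), 0), t3d

co, si = np.cos(0.3), np.sin(0.3)
R_true = np.array([[co, -si, 0], [si, co, 0], [0, 0, 1]])  # rotation about z
P = np.hstack([2.0 * R_true, np.array([[5.0], [6.0], [7.0]])])
s, R, t3d = P2sRt(P)
print(s, np.allclose(R, R_true), t3d)  # 2.0 True [5. 6. 7.]
```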
SYMBOL INDEX (176 symbols across 16 files)
FILE: ResNet/ThreeDMM_shape.py
class ResNet_101 (line 9) | class ResNet_101(Network_Shape):
method setup (line 10) | def setup(self):
FILE: get_Rts.py
function run_pose_estimation (line 35) | def run_pose_estimation(root_model_path, inputFile, outputDB, model_used...
function esimatePose (line 110) | def esimatePose(root_model_path, inputFile, outputDB, model_used, lr_rat...
FILE: kaffe/errors.py
class KaffeError (line 3) | class KaffeError(Exception):
function print_stderr (line 6) | def print_stderr(msg):
FILE: kaffe/graph.py
class Node (line 8) | class Node(object):
method __init__ (line 10) | def __init__(self, name, kind, layer=None):
method add_parent (line 20) | def add_parent(self, parent_node):
method add_child (line 26) | def add_child(self, child_node):
method get_only_parent (line 32) | def get_only_parent(self):
method parameters (line 39) | def parameters(self):
method __str__ (line 44) | def __str__(self):
method __repr__ (line 47) | def __repr__(self):
class Graph (line 51) | class Graph(object):
method __init__ (line 53) | def __init__(self, nodes=None, name=None):
method add_node (line 58) | def add_node(self, node):
method get_node (line 62) | def get_node(self, name):
method get_input_nodes (line 68) | def get_input_nodes(self):
method get_output_nodes (line 71) | def get_output_nodes(self):
method topologically_sorted (line 74) | def topologically_sorted(self):
method compute_output_shapes (line 96) | def compute_output_shapes(self):
method replaced (line 101) | def replaced(self, new_nodes):
method transformed (line 104) | def transformed(self, transformers):
method __contains__ (line 113) | def __contains__(self, key):
method __str__ (line 116) | def __str__(self):
class GraphBuilder (line 129) | class GraphBuilder(object):
method __init__ (line 132) | def __init__(self, def_path, phase='test'):
method load (line 142) | def load(self):
method filter_layers (line 148) | def filter_layers(self, layers):
method make_node (line 172) | def make_node(self, layer):
method make_input_nodes (line 182) | def make_input_nodes(self):
method build (line 202) | def build(self):
class NodeMapper (line 259) | class NodeMapper(NodeDispatch):
method __init__ (line 261) | def __init__(self, graph):
method map (line 264) | def map(self):
method map_chain (line 291) | def map_chain(self, chain):
method map_node (line 294) | def map_node(self, node):
method commit (line 301) | def commit(self, mapped_chains):
FILE: kaffe/layers.py
class NodeKind (line 58) | class NodeKind(LayerType):
method map_raw_kind (line 61) | def map_raw_kind(kind):
method compute_output_shape (line 67) | def compute_output_shape(node):
class NodeDispatchError (line 75) | class NodeDispatchError(KaffeError):
class NodeDispatch (line 80) | class NodeDispatch(object):
method get_handler_name (line 83) | def get_handler_name(node_kind):
method get_handler (line 91) | def get_handler(self, node_kind, prefix):
class LayerAdapter (line 101) | class LayerAdapter(object):
method __init__ (line 103) | def __init__(self, layer, kind):
method parameters (line 108) | def parameters(self):
method get_kernel_value (line 117) | def get_kernel_value(scalar, repeated, idx, default=None):
method kernel_parameters (line 134) | def kernel_parameters(self):
FILE: kaffe/shapes.py
function get_filter_output_shape (line 9) | def get_filter_output_shape(i_h, i_w, params, round_func):
function get_strided_kernel_output_shape (line 15) | def get_strided_kernel_output_shape(node, round_func):
function shape_not_implemented (line 26) | def shape_not_implemented(node):
function shape_identity (line 30) | def shape_identity(node):
function shape_scalar (line 35) | def shape_scalar(node):
function shape_data (line 39) | def shape_data(node):
function shape_mem_data (line 57) | def shape_mem_data(node):
function shape_concat (line 62) | def shape_concat(node):
function shape_convolution (line 73) | def shape_convolution(node):
function shape_pool (line 77) | def shape_pool(node):
function shape_inner_product (line 81) | def shape_inner_product(node):
FILE: kaffe/tensorflow/network_shape.py
function layer (line 7) | def layer(op):
class Network_Shape (line 32) | class Network_Shape(object):
method __init__ (line 34) | def __init__(self, inputs, trainable=True):
method setup (line 50) | def setup(self):
method load (line 55) | def load(self, data_path, prefix_name, session, ignore_missing=False):
method load_specific_vars (line 113) | def load_specific_vars(self, data_path, op_name, session, ignore_missi...
method feed (line 133) | def feed(self, *args):
method get_output (line 148) | def get_output(self):
method get_unique_name (line 152) | def get_unique_name(self, prefix):
method make_var (line 159) | def make_var(self, name, shape):
method make_var_fixed (line 165) | def make_var_fixed(self, name, shape):
method validate_padding (line 171) | def validate_padding(self, padding):
method conv (line 176) | def conv(self,
method relu (line 234) | def relu(self, input, name):
method max_pool (line 238) | def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PA...
method avg_pool (line 247) | def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PA...
method lrn (line 256) | def lrn(self, input, radius, alpha, beta, name, bias=1.0):
method concat (line 265) | def concat(self, inputs, axis, name):
method add (line 269) | def add(self, inputs, name):
method fc (line 273) | def fc(self, input, num_out, name, relu=True):
method softmax (line 291) | def softmax(self, input, name):
method batch_normalization (line 304) | def batch_normalization(self, input, name, scale_offset=True, relu=Fal...
method dropout (line 332) | def dropout(self, input, keep_prob, name):
FILE: kaffe/transformers.py
class DataInjector (line 14) | class DataInjector(object):
method __init__ (line 19) | def __init__(self, def_path, data_path):
method load (line 31) | def load(self):
method load_using_caffe (line 37) | def load_using_caffe(self):
method load_using_pb (line 43) | def load_using_pb(self):
method normalize_pb_data (line 51) | def normalize_pb_data(self, layer):
method adjust_parameters (line 66) | def adjust_parameters(self, node, data):
method __call__ (line 82) | def __call__(self, graph):
class DataReshaper (line 92) | class DataReshaper(object):
method __init__ (line 94) | def __init__(self, mapping, replace=True):
method has_spatial_parent (line 103) | def has_spatial_parent(self, node):
method map (line 111) | def map(self, node_kind):
method __call__ (line 117) | def __call__(self, graph):
class SubNodeFuser (line 151) | class SubNodeFuser(object):
method __call__ (line 156) | def __call__(self, graph):
method is_eligible_pair (line 182) | def is_eligible_pair(self, parent, child):
method merge (line 186) | def merge(self, parent, child):
class ReLUFuser (line 191) | class ReLUFuser(SubNodeFuser):
method __init__ (line 196) | def __init__(self, allowed_parent_types=None):
method is_eligible_pair (line 201) | def is_eligible_pair(self, parent, child):
method merge (line 205) | def merge(self, parent, _):
class BatchNormScaleBiasFuser (line 209) | class BatchNormScaleBiasFuser(SubNodeFuser):
method is_eligible_pair (line 219) | def is_eligible_pair(self, parent, child):
method merge (line 223) | def merge(self, parent, child):
class BatchNormPreprocessor (line 227) | class BatchNormPreprocessor(object):
method __call__ (line 233) | def __call__(self, graph):
class NodeRenamer (line 253) | class NodeRenamer(object):
method __init__ (line 259) | def __init__(self, renamer):
method __call__ (line 262) | def __call__(self, graph):
class ParameterNamer (line 268) | class ParameterNamer(object):
method __call__ (line 273) | def __call__(self, graph):
FILE: main_predict_6DoF.py
function extract_3dmm_pose (line 52) | def extract_3dmm_pose():
function main (line 315) | def main(_):
FILE: main_predict_ProjMat.py
function extract_3dmm_ProjMat (line 52) | def extract_3dmm_ProjMat():
function main (line 322) | def main(_):
FILE: myparse.py
function parse_input (line 3) | def parse_input(input_file):
FILE: pose_model.py
class ThreeD_Pose_Estimation (line 40) | class ThreeD_Pose_Estimation(object):
method __init__ (line 43) | def __init__(self, images, labels, mode, ifdropout, keep_rate_fc6, kee...
method _build_graph (line 70) | def _build_graph(self):
method _stride_arr (line 80) | def _stride_arr(self, stride):
method _build_model (line 84) | def _build_model(self):
method conv (line 146) | def conv(self, input, kernel, biases, k_h, k_w, c_o, s_h, s_w, paddin...
method _ST (line 171) | def _ST(self, name, x, channel_x, out_size, filter_size, out_filters, ...
method _variable_with_weight_decay (line 343) | def _variable_with_weight_decay(self, name, shape, stddev, wd):
method _variable_on_cpu (line 368) | def _variable_on_cpu(self, name, shape, initializer):
method _build_train_op (line 385) | def _build_train_op(self):
method _batch_norm (line 412) | def _batch_norm(self, name, x):
method _residual (line 457) | def _residual(self, x, in_filter, out_filter, stride,
method _bottleneck_residual (line 490) | def _bottleneck_residual(self, x, in_filter, out_filter, stride,
method _decay (line 525) | def _decay(self):
method _conv (line 537) | def _conv(self, name, x, filter_size, in_filters, out_filters, strides):
method _relu (line 547) | def _relu(self, x, leakiness=0.0):
method _fully_connected (line 551) | def _fully_connected(self, x, out_dim):
method _fully_connected_ST (line 567) | def _fully_connected_ST(self, x, out_dim):
method _global_avg_pool (line 582) | def _global_avg_pool(self, x):
method _repeat (line 587) | def _repeat(self, x, n_repeats):
method _interpolate (line 596) | def _interpolate(self, im, x, y, out_size, channel_x):
method _meshgrid (line 670) | def _meshgrid(self, height, width):
method _transform (line 689) | def _transform(self, theta, input_dim, out_size, channel_input):
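The `_ST`, `_meshgrid`, `_interpolate`, and `_transform` methods follow the Spatial Transformer pattern: an affine `theta` maps a regular, `[-1, 1]`-normalized output grid into the input image, which is then bilinearly sampled. A minimal NumPy sketch of those two steps for a single-channel image (an assumption for illustration; the TensorFlow versions in `pose_model.py` operate on batched, multi-channel tensors):

```python
import numpy as np

def affine_grid(theta, h, w):
    """Map a regular (h, w) output grid through a 2x3 affine theta.

    Coordinates are normalized to [-1, 1], as in _meshgrid/_transform.
    Returns (xs, ys): sampling locations in the input image.
    """
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing='ij')
    grid = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])  # (3, h*w)
    return theta @ grid  # (2, h*w)

def bilinear_sample(im, xs, ys):
    """Bilinearly sample image `im` at normalized coords (xs, ys)."""
    h, w = im.shape
    # Un-normalize from [-1, 1] to pixel coordinates.
    x = (xs + 1.0) * (w - 1) / 2.0
    y = (ys + 1.0) * (h - 1) / 2.0
    x0 = np.clip(np.floor(x).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, h - 2)
    wx, wy = x - x0, y - y0  # fractional offsets inside the cell
    return ((1 - wx) * (1 - wy) * im[y0, x0] +
            wx * (1 - wy) * im[y0, x0 + 1] +
            (1 - wx) * wy * im[y0 + 1, x0] +
            wx * wy * im[y0 + 1, x0 + 1])

im = np.arange(16.0).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
sx, sy = affine_grid(identity, 4, 4)
out = bilinear_sample(im, sx, sy).reshape(4, 4)
# With the identity theta, `out` reproduces the input image.
```

Any 2x3 `theta` (rotation, scale, translation) plugs into the same two calls, which is what lets the network warp its input differentiably.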
FILE: pose_utils.py
function increaseBbox (line 11) | def increaseBbox(bbox, factor):
function image_bbox_processing_v2 (line 34) | def image_bbox_processing_v2(img, bbox):
function preProcessImage (line 80) | def preProcessImage(_savingDir, data_dict, data_root, factor, _alexNetSi...
function replaceInFile (line 136) | def replaceInFile(filep, before, after):
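`increaseBbox` enlarges a detector box before cropping so the whole head stays in frame. A hedged sketch of that idea (the repository's exact expansion and rounding rules may differ), with `bbox` as `(x, y, w, h)` and `factor` the fractional margin added on each side:

```python
def increase_bbox_sketch(bbox, factor):
    """Grow an (x, y, w, h) box by `factor` of its size on every side."""
    x, y, w, h = bbox
    dx, dy = w * factor, h * factor
    # Shift the corner out by one margin, widen by two (both sides).
    return (x - dx, y - dy, w + 2 * dx, h + 2 * dy)

grown = increase_bbox_sketch((100.0, 100.0, 50.0, 50.0), 0.25)
print(grown)  # (87.5, 87.5, 75.0, 75.0)
```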
FILE: renderer_fpn.py
function render_fpn (line 36) | def render_fpn(inputFile, output_pose_db, outputFolder):
FILE: tf_utils.py
function conv2d (line 21) | def conv2d(x, n_filters,
function linear (line 70) | def linear(x, n_units, scope=None, stddev=0.02,
function weight_variable (line 98) | def weight_variable(shape):
function bias_variable (line 111) | def bias_variable(shape):
function dense_to_one_hot (line 123) | def dense_to_one_hot(labels, n_classes=2):
function prepare_trainVal_img_list (line 133) | def prepare_trainVal_img_list(img_list, num_subjs):
function select_eval_img_list (line 184) | def select_eval_img_list(img_list, num_subjs, save_file_name):
function input_processing (line 249) | def input_processing(images, pose_labels, id_labels, train_mean_vec, mea...
function train_mean_vec2mat (line 271) | def train_mean_vec2mat(train_mean, images_array):
function create_file_list (line 288) | def create_file_list(csv_file_path):
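`dense_to_one_hot` turns integer class labels into one-hot rows. A minimal NumPy sketch with the same signature shape (the repository's version may differ in dtype or return layout):

```python
import numpy as np

def dense_to_one_hot_sketch(labels, n_classes=2):
    """Map integer labels of shape [n] to one-hot vectors [n, n_classes]."""
    labels = np.asarray(labels, dtype=int)
    one_hot = np.zeros((labels.size, n_classes))
    one_hot[np.arange(labels.size), labels] = 1.0  # one 1 per row
    return one_hot

oh = dense_to_one_hot_sketch([0, 1, 1], n_classes=2)
print(oh)
# [[1. 0.]
#  [0. 1.]
#  [0. 1.]]
```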
FILE: utils/pose_utils.py
function increaseBbox (line 16) | def increaseBbox(bbox, factor):
function increaseBbox_rescaleCASIA (line 40) | def increaseBbox_rescaleCASIA(bbox, factor):
function increaseBbox_rescaleYOLO (line 67) | def increaseBbox_rescaleYOLO(bbox, im):
function image_bbox_processing_v2 (line 112) | def image_bbox_processing_v2(img, bbox, landmarks=None):
function image_bbox_processing_v3 (line 171) | def image_bbox_processing_v3(img, bbox):
function preProcessImage (line 221) | def preProcessImage(im, lmks, bbox, factor, _alexNetSize, flipped):
function resize_crop_rescaleCASIA (line 280) | def resize_crop_rescaleCASIA(im, bbox, lmks, factor):
function resize_crop_rescaleCASIA_v2 (line 326) | def resize_crop_rescaleCASIA_v2(im, bbox, lmks, factor, bbox_type):
function resize_crop_AFLW (line 400) | def resize_crop_AFLW(im, bbox, lmks):
function preProcessImage_v2 (line 438) | def preProcessImage_v2(im, bbox, factor, _resNetSize, if_cropbyLmks_resc...
function preProcessImage_useGTBBox (line 504) | def preProcessImage_useGTBBox(im, lmks, bbox, factor, _alexNetSize, flip...
function replaceInFile (line 590) | def replaceInFile(filep, before, after):
function flip_lmk_idx (line 596) | def flip_lmk_idx(img, lmarks):
function pose_to_LMs (line 612) | def pose_to_LMs(pose_Rt):
function RotationMatrix (line 637) | def RotationMatrix(angle_x, angle_y, angle_z):
function matrix2angle (line 655) | def matrix2angle(R):
function P2sRt (line 689) | def P2sRt(P):
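`RotationMatrix`, `matrix2angle`, and `P2sRt` round-trip between Euler angles, a 3x3 rotation `R`, and a scaled projection `P = s[R | t]`. A hedged sketch of the rotation round trip, assuming the composition order `Rz @ Ry @ Rx` (the repository may compose the axes differently):

```python
import numpy as np
from math import cos, sin, atan2, asin

def rotation_matrix(ax, ay, az):
    """Compose rotations about x, y, z (radians) as Rz @ Ry @ Rx."""
    rx = np.array([[1, 0, 0], [0, cos(ax), -sin(ax)], [0, sin(ax), cos(ax)]])
    ry = np.array([[cos(ay), 0, sin(ay)], [0, 1, 0], [-sin(ay), 0, cos(ay)]])
    rz = np.array([[cos(az), -sin(az), 0], [sin(az), cos(az), 0], [0, 0, 1]])
    return rz @ ry @ rx

def matrix_to_angles(r):
    """Recover (ax, ay, az) from Rz @ Ry @ Rx, away from gimbal lock."""
    ay = asin(-r[2, 0])               # r[2,0] = -sin(ay)
    ax = atan2(r[2, 1], r[2, 2])      # cos(ay)*sin(ax) / cos(ay)*cos(ax)
    az = atan2(r[1, 0], r[0, 0])      # sin(az)*cos(ay) / cos(az)*cos(ay)
    return ax, ay, az

angles = (0.1, -0.2, 0.3)
recovered = matrix_to_angles(rotation_matrix(*angles))
# recovered is (0.1, -0.2, 0.3) up to floating-point error
```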
Condensed preview — 34 files, each showing path, character count, and a content snippet (full structured content: 198K chars).
[
{
"path": ".gitmodules",
"chars": 107,
"preview": "[submodule \"face_renderer\"]\n\tpath = face_renderer\n\turl = https://github.com/iacopomasi/face_specific_augm/\n"
},
{
"path": "BFM/README",
"chars": 1,
"preview": "\n"
},
{
"path": "README.md",
"chars": 9244,
"preview": "# Face-Pose-Net\n\n\n<sub>**Extreme face alignment examples:** Faces rendered to a 45"
},
{
"path": "ResNet/ThreeDMM_shape.py",
"chars": 22299,
"preview": "import sys\nsys.path.append('./kaffe')\nsys.path.append('./kaffe/tensorflow')\n#from kaffe.tensorflow.network_allNonTrain i"
},
{
"path": "get_Rts.py",
"chars": 5402,
"preview": "\"\"\"3D pose estimation network: get R and ts "
},
{
"path": "input.csv",
"chars": 817,
"preview": "ID,FILE,FACE_X,FACE_Y,FACE_WIDTH,FACE_HEIGHT\r\nsubject1_a,images/input1.jpg,108.2642,119.6774,170,179\r\nsubject2_a,images/"
},
{
"path": "input_list.txt",
"chars": 232,
"preview": "./input_samples/HELEN_30427236_2_0.jpg,132.8177,213.8680,183.1707,171.1925\n./input_samples/LFPW_image_test_0001_0.jpg,13"
},
{
"path": "input_samples/README",
"chars": 45,
"preview": "Three input sample images to run our new FPN\n"
},
{
"path": "kaffe/__init__.py",
"chars": 115,
"preview": "from .graph import GraphBuilder, NodeMapper\nfrom .errors import KaffeError, print_stderr\n\nfrom . import tensorflow\n"
},
{
"path": "kaffe/errors.py",
"chars": 109,
"preview": "import sys\n\nclass KaffeError(Exception):\n pass\n\ndef print_stderr(msg):\n sys.stderr.write('%s\\n' % msg)\n"
},
{
"path": "kaffe/graph.py",
"chars": 11653,
"preview": "from google.protobuf import text_format\n\nfrom .caffe import get_caffe_resolver\nfrom .errors import KaffeError, print_std"
},
{
"path": "kaffe/layers.py",
"chars": 4782,
"preview": "import re\nimport numbers\nfrom collections import namedtuple\n\nfrom .shapes import *\n\nLAYER_DESCRIPTORS = {\n\n # Caffe T"
},
{
"path": "kaffe/shapes.py",
"chars": 2792,
"preview": "import math\nfrom collections import namedtuple\n\nfrom .errors import KaffeError\n\nTensorShape = namedtuple('TensorShape', "
},
{
"path": "kaffe/tensorflow/__init__.py",
"chars": 76,
"preview": "from .transformer import TensorFlowTransformer\nfrom .network import Network\n"
},
{
"path": "kaffe/tensorflow/network_shape.py",
"chars": 14124,
"preview": "import numpy as np\nimport tensorflow as tf\n\nDEFAULT_PADDING = 'SAME'\n\n\ndef layer(op):\n '''Decorator for composable ne"
},
{
"path": "kaffe/transformers.py",
"chars": 10799,
"preview": "'''\nA collection of graph transforms.\n\nA transformer is a callable that accepts a graph and returns a transformed versio"
},
{
"path": "main_fpn.py",
"chars": 1593,
"preview": "import sys\nimport os\nimport csv\nimport numpy as np\nimport cv2\nimport math\nimport pose_utils\nimport os\nimport myparse\nimp"
},
{
"path": "main_predict_6DoF.py",
"chars": 14115,
"preview": "import sys\nimport numpy as np\nimport tensorflow as tf\nimport cv2\nimport scipy.io as sio\nsys.path.append('./utils')\nimpor"
},
{
"path": "main_predict_ProjMat.py",
"chars": 14203,
"preview": "import sys\nimport numpy as np\nimport tensorflow as tf\nimport cv2\nimport scipy.io as sio\nsys.path.append('./utils')\nimpor"
},
{
"path": "models/README",
"chars": 64,
"preview": "Please download all the model files and put them in this folder\n"
},
{
"path": "myparse.py",
"chars": 516,
"preview": "import csv\n\ndef parse_input(input_file):\n\tdata_dict = dict()\n\treader = csv.DictReader(open(input_file,'r'))\n\t#### Readin"
},
{
"path": "output_render/README.md",
"chars": 2992,
"preview": "The rendered images will be saved here!\n\n## Subject 1 ##\n### input: ### \n\n### rendering: ##"
},
{
"path": "pose_model.py",
"chars": 32330,
"preview": "# Copyright 2016 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
},
{
"path": "pose_utils.py",
"chars": 5123,
"preview": "import sys\nimport os\n#sys.path.append('+glaive_pylib+')\n#import JanusUtils\nimport numpy as np\nimport cv2\nimport math\nimp"
},
{
"path": "renderer_fpn.py",
"chars": 5125,
"preview": "import csv\nimport lmdb\nimport sys\nimport numpy as np\nimport cv2\nimport os\nthis_path = os.path.dirname(os.path.abspath(__"
},
{
"path": "tf_utils.py",
"chars": 10585,
"preview": "# Copyright 2016 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
},
{
"path": "train_stats/README",
"chars": 50,
"preview": "Here are the precomputed training data statistics\n"
},
{
"path": "utils/README",
"chars": 32,
"preview": "Some utility functions are here\n"
},
{
"path": "utils/pose_utils.py",
"chars": 22674,
"preview": "import sys\nimport os\nimport numpy as np\nimport cv2\nimport math\nfrom math import cos, sin, atan2, asin\nimport fileinput\n\n"
}
]
// ... and 5 more files omitted from this preview
About this extraction
This page contains the full source code of the fengju514/Face-Pose-Net GitHub repository, extracted and formatted as plain text. The extraction includes 34 files (187.5 KB), approximately 50.9k tokens, and a symbol index with 176 extracted functions, classes, methods, constants, and types.
Extracted by GitExtract, built by Nikandr Surkov.