Repository: fengju514/Face-Pose-Net Branch: master Commit: 088bba25a170 Files: 34 Total size: 187.5 KB Directory structure: gitextract_nt_altqw/ ├── .gitmodules ├── BFM/ │ └── README ├── README.md ├── ResNet/ │ └── ThreeDMM_shape.py ├── get_Rts.py ├── input.csv ├── input_list.txt ├── input_samples/ │ └── README ├── kaffe/ │ ├── __init__.py │ ├── errors.py │ ├── graph.py │ ├── layers.py │ ├── shapes.py │ ├── tensorflow/ │ │ ├── __init__.py │ │ └── network_shape.py │ └── transformers.py ├── main_fpn.py ├── main_predict_6DoF.py ├── main_predict_ProjMat.py ├── models/ │ └── README ├── myparse.py ├── output_render/ │ └── README.md ├── pose_model.py ├── pose_utils.py ├── renderer_fpn.py ├── tf_utils.py ├── train_stats/ │ ├── 3DMM_shape_mean.npy │ ├── README │ ├── train_label_mean_300WLP.npy │ ├── train_label_mean_ProjMat.npy │ ├── train_label_std_300WLP.npy │ └── train_label_std_ProjMat.npy └── utils/ ├── README └── pose_utils.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitmodules ================================================ [submodule "face_renderer"] path = face_renderer url = https://github.com/iacopomasi/face_specific_augm/ ================================================ FILE: BFM/README ================================================ ================================================ FILE: README.md ================================================ # Face-Pose-Net ![Teaser](./teasers/extreme_cases.jpg) **Extreme face alignment examples:** Faces rendered to a 45 degrees yaw angle (aligned to half profile) using our FacePoseNet. Images were taken from the IJB-A collection and represent extreme viewing conditions, including near profile views, occlusions, and low resolution. Such conditions are often too hard for existing face landmark detection methods to handle yet easily aligned with our FacePoseNet.

This page contains DCNN model and python code to robustly estimate 6 degrees of freedom, 3D face pose from an unconstrained image, without the use of face landmark detectors. The method is described in the paper: _F.-J. Chang, A. Tran, T. Hassner, I. Masi, R. Nevatia, G. Medioni, "[FacePoseNet: Making a Case for Landmark-Free Face Alignment](https://arxiv.org/abs/1708.07517)", in 7th IEEE International Workshop on Analysis and Modeling of Faces and Gestures, ICCV Workshops, 2017_ [1]. This release bundles up our **FacePoseNet** (FPN) with the **Face Renderer** from Masi _et al._ [2,5], which is available separately from [this project page](https://github.com/iacopomasi/face_specific_augm). The result is an end-to-end pipeline that seamlessly estimates facial pose and produces multiple rendered views to be used for face alignment and data augmentation. ![Teaser](./teasers/diagram.png) ## Updates (Modified and New features, 12/20/2018) * FPN structure is changed to ResNet-101 for better pose prediction [fpn-resnet101](./ResNet/ThreeDMM_shape.py) * **Two versions of FPNs (under the assumption of weak perspective transformation) are added**: * (1) **Predict 6DoF head pose** (scale, pitch, yaw, roll, translation_x, translation_y): [main_predict_6DoF.py](./main_predict_6DoF.py) * (2) **Predict 11 parameters of the 3x4 projection matrix**: [main_predict_ProjMat.py](./main_predict_ProjMat.py) * The codes to convert 6DoF head pose to 3x4 projection matrix is [here](https://github.com/fengju514/Face-Pose-Net/blob/fb733f358d9f633f6525a41f3a7a0a99e5c71647/main_predict_6DoF.py#L263-L268) * The codes to convert 11 parameters / 3x4 projection matrix to 6DoF head pose is [here](https://github.com/fengju514/Face-Pose-Net/blob/92bd65fa056d17065890e186ca2f2b376a5ab135/main_predict_ProjMat.py#L306-L308) * The corresponding 3D shape and landmarks can be obtained by predicted 6DoF head pose [3D shape from 6DoF](https://github.com/fengju514/Face-Pose-Net/blob/92bd65fa056d17065890e186ca2f2b376a5ab135/main_predict_6DoF.py#L271-L297) or by predicted 11 parameters [3D shape from 11 parameters](https://github.com/fengju514/Face-Pose-Net/blob/92bd65fa056d17065890e186ca2f2b376a5ab135/main_predict_ProjMat.py#L272-L297) * Download new FPN models: Please put all model files [here](https://www.dropbox.com/sh/lr9u4my1qrhmgik/AADQVUIHSJIUXqUAj1AoZMIGa?dl=0) in the folder `models` * Download BFM models: Please put BFM shape and expression files [here](https://www.dropbox.com/sh/ru7ierl9516a9az/AABTP9hJj3dJnapicFFgHmOna?dl=0) in the folder `BFM` * Run new FPN to predict 6DoF head pose: ```bash $ python main_predict_6DoF.py ``` * Run new FPN to predict 11DoF parameters of the projection matrix: ```bash $ python main_predict_ProjMat.py ``` We provide a sample input list available [here](./input_list.txt). ```bash ``` where `` is the x,y coordinates of the upper-left point, the width, and the height of the tight face bounding box, either obtained manually, by the face detector or by the landmark detector. The predicted 6DoF and 11DoF results would be saved in [output_6DoF folder](https://github.com/fengju514/Face-Pose-Net/blob/a7923b764f92892021297fd046065c22a41dc519/main_predict_6DoF.py#L232-L236) and [output_ProjMat folder](https://github.com/fengju514/Face-Pose-Net/blob/a7923b764f92892021297fd046065c22a41dc519/main_predict_ProjMat.py#L235-L239) respectively. The output 3D shapes and landmarks by 6DoF and 11DoF are saved in [output_6DoF folder](https://github.com/fengju514/Face-Pose-Net/blob/a7923b764f92892021297fd046065c22a41dc519/main_predict_6DoF.py#L301) and in [output_ProjMat folder](https://github.com/fengju514/Face-Pose-Net/blob/a7923b764f92892021297fd046065c22a41dc519/main_predict_ProjMat.py#L301) respectively. You can visualize the 3D shapes and landmarks via Matlab. * The same renderer can be used. Instead of feeding into the 6DoF pose, you need to feed into the predicted landmarks either from 6DoF head pose or from 3x4 projection matrix. Please see an example in demo.py of [this project page](https://github.com/iacopomasi/face_specific_augm) ## Features * **6DoF 3D Head Pose estimation** + **3D rendered facial views**. * Does not use **fragile** landmark detectors * Robustness on images landmark detectors struggle with (low rez., occluded, etc.) * Extremely fast pose estimation * Both CPU and GPU supported * Provides better face recognition through better face alignment than alignment using state of the art landmark detectors [1] ## Dependencies * [TensorFlow](https://www.tensorflow.org/) * [OpenCV Python Wrapper](http://opencv.org/) * [Numpy](http://www.numpy.org/) * [Python2.7](https://www.python.org/download/releases/2.7/) The code has been tested on Linux only. On Linux you can rely on the default version of python, installing all the packages needed from the package manager or on Anaconda Python and install required packages through `conda`. **Note:** no landmarks are used in our method, although you can still project the landmarks on the input image using the estimated pose. See the paper for further details. ## Usage * **Important:** In order to download **both** FPN code and the renderer use `git clone --recursive` * **Important:** Please download the learned models from https://www.dropbox.com/s/r38psbq55y2yj4f/fpn_new_model.tar.gz?dl=0 and make sure that the FPN models are stored in the folder `fpn_new_model`. ### Run it The alignment and rendering can be used from the command line in the following, different ways. To run it directly on a list of images (software will run FPN to estimate the pose and then render novel views based on the estimated pose): ```bash $ python main_fpn.py ``` We provide a sample input list available [here](input.csv). ```bash ``` where `` is the face bounding box information, either obtained manually or by the face detector. ## Sample Results Please see the input images [here](images) and rendered outputs [here](output_render). ### input: ### ![sbj10](./images/input10.jpg) ### rendering: ### ![sbj10](./output_render/subject10/subject10_a_rendered_aug_-00_00_10.jpg) ![sbj10](./output_render/subject10/subject10_a_rendered_aug_-22_00_10.jpg) ![sbj10](./output_render/subject10/subject10_a_rendered_aug_-40_00_10.jpg) ![sbj10](./output_render/subject10/subject10_a_rendered_aug_-55_00_10.jpg) ![sbj10](./output_render/subject10/subject10_a_rendered_aug_-75_00_10.jpg) ## Current Limitations FPN is currently trained with a single 3D generic shape, without accounting for facial expressions. Addressing these is planned as future work. ## Citation Please cite our paper with the following bibtex if you use our face renderer: ``` latex @inproceedings{chang17fpn, title={{F}ace{P}ose{N}et: Making a Case for Landmark-Free Face Alignment}, booktitle = {7th IEEE International Workshop on Analysis and Modeling of Faces and Gestures, ICCV Workshops}, author={ Feng-ju Chang and Anh Tran and Tal Hassner and Iacopo Masi and Ram Nevatia and G\'{e}rard Medioni}, year={2017}, } ``` ## References [1] F.-J. Chang, A. Tran, T. Hassner, I. Masi, R. Nevatia, G. Medioni, "[FacePoseNet: Making a Case for Landmark-Free Face Alignment](https://arxiv.org/abs/1708.07517)", in 7th IEEE International Workshop on Analysis and Modeling of Faces and Gestures, ICCV Workshops, 2017 [2] I. Masi\*, A. Tran\*, T. Hassner\*, J. Leksut, G. Medioni, "Do We Really Need to Collect Million of Faces for Effective Face Recognition? ", ECCV 2016, \* denotes equal authorship [3] I. Masi, S. Rawls, G. Medioni, P. Natarajan "Pose-Aware Face Recognition in the Wild", CVPR 2016 [4] T. Hassner, S. Harel, E. Paz and R. Enbar "Effective Face Frontalization in Unconstrained Images", CVPR 2015 [5] I. Masi, T. Hassner, A. Tran, and G. Medioni, "Rapid Synthesis of Massive Face Sets for Improved Face Recognition", FG 2017 ## Changelog - August 2017, First Release ## Disclaimer _The SOFTWARE PACKAGE provided in this page is provided "as is", without any guarantee made as to its suitability or fitness for any particular use. It may contain bugs, so use of this tool is at your own risk. We take no responsibility for any damage of any sort that may unintentionally be caused through its use._ ## Contacts If you have any questions, drop an email to _fengjuch@usc.edu_, _anhttran@usc.edu_, _iacopo.masi@usc.edu_ or _hassner@isi.edu_ or leave a message below with GitHub (log-in is needed). ================================================ FILE: ResNet/ThreeDMM_shape.py ================================================ import sys sys.path.append('./kaffe') sys.path.append('./kaffe/tensorflow') #from kaffe.tensorflow.network_allNonTrain import Network from network_shape import Network_Shape class ResNet_101(Network_Shape): def setup(self): (self.feed('input') .conv(7, 7, 64, 2, 2, biased=False, relu=False, name='conv1') .batch_normalization(relu=True, name='bn_conv1') .max_pool(3, 3, 2, 2, name='pool1') .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch1') .batch_normalization(name='bn2a_branch1')) (self.feed('pool1') .conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2a_branch2a') .batch_normalization(relu=True, name='bn2a_branch2a') .conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2a_branch2b') .batch_normalization(relu=True, name='bn2a_branch2b') .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2a_branch2c') .batch_normalization(name='bn2a_branch2c')) (self.feed('bn2a_branch1', 'bn2a_branch2c') .add(name='res2a') .relu(name='res2a_relu') # batch_size x 56 x 56 x 256 .conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2b_branch2a') # batch_size x 56 x 56 x 64 .batch_normalization(relu=True, name='bn2b_branch2a') # batch_size x 56 x 56 x 64 .conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2b_branch2b') # batch_size x 56 x 56 x 64 .batch_normalization(relu=True, name='bn2b_branch2b') # batch_size x 56 x 56 x 64 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2b_branch2c') # batch_size x 56 x 56 x 256 .batch_normalization(name='bn2b_branch2c')) # batch_size x 56 x 56 x 256 (self.feed('res2a_relu', # batch_size x 56 x 56 x 256 'bn2b_branch2c') # batch_size x 56 x 56 x 256 .add(name='res2b') .relu(name='res2b_relu') # batch_size x 56 x 56 x 256 .conv(1, 1, 64, 1, 1, biased=False, relu=False, name='res2c_branch2a') .batch_normalization(relu=True, name='bn2c_branch2a') .conv(3, 3, 64, 1, 1, biased=False, relu=False, name='res2c_branch2b') .batch_normalization(relu=True, name='bn2c_branch2b') .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res2c_branch2c') .batch_normalization(name='bn2c_branch2c')) (self.feed('res2b_relu', 'bn2c_branch2c') .add(name='res2c') .relu(name='res2c_relu') # batch_size x 56 x 56 x 256 .conv(1, 1, 512, 2, 2, biased=False, relu=False, name='res3a_branch1') .batch_normalization(name='bn3a_branch1')) (self.feed('res2c_relu') # batch_size x 56 x 56 x 256 .conv(1, 1, 128, 2, 2, biased=False, relu=False, name='res3a_branch2a') .batch_normalization(relu=True, name='bn3a_branch2a') .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3a_branch2b') .batch_normalization(relu=True, name='bn3a_branch2b') .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3a_branch2c') .batch_normalization(name='bn3a_branch2c')) (self.feed('bn3a_branch1', 'bn3a_branch2c') .add(name='res3a') .relu(name='res3a_relu') # batch_size x 28 x 28 x 512 .conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3b1_branch2a') .batch_normalization(relu=True, name='bn3b1_branch2a') .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3b1_branch2b') .batch_normalization(relu=True, name='bn3b1_branch2b') .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3b1_branch2c') .batch_normalization(name='bn3b1_branch2c')) (self.feed('res3a_relu', 'bn3b1_branch2c') .add(name='res3b1') .relu(name='res3b1_relu') # batch_size x 28 x 28 x 512 .conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3b2_branch2a') .batch_normalization(relu=True, name='bn3b2_branch2a') .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3b2_branch2b') .batch_normalization(relu=True, name='bn3b2_branch2b') .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3b2_branch2c') .batch_normalization(name='bn3b2_branch2c')) (self.feed('res3b1_relu', 'bn3b2_branch2c') .add(name='res3b2') .relu(name='res3b2_relu') # batch_size x 28 x 28 x 512 .conv(1, 1, 128, 1, 1, biased=False, relu=False, name='res3b3_branch2a') .batch_normalization(relu=True, name='bn3b3_branch2a') .conv(3, 3, 128, 1, 1, biased=False, relu=False, name='res3b3_branch2b') .batch_normalization(relu=True, name='bn3b3_branch2b') .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res3b3_branch2c') .batch_normalization(name='bn3b3_branch2c')) (self.feed('res3b2_relu', 'bn3b3_branch2c') .add(name='res3b3') .relu(name='res3b3_relu') # batch_size x 28 x 28 x 512 .conv(1, 1, 1024, 2, 2, biased=False, relu=False, name='res4a_branch1') .batch_normalization(name='bn4a_branch1')) (self.feed('res3b3_relu') # batch_size x 28 x 28 x 512 .conv(1, 1, 256, 2, 2, biased=False, relu=False, name='res4a_branch2a') .batch_normalization(relu=True, name='bn4a_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4a_branch2b') .batch_normalization(relu=True, name='bn4a_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4a_branch2c') .batch_normalization(name='bn4a_branch2c')) (self.feed('bn4a_branch1', 'bn4a_branch2c') .add(name='res4a') .relu(name='res4a_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b1_branch2a') .batch_normalization(relu=True, name='bn4b1_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b1_branch2b') .batch_normalization(relu=True, name='bn4b1_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b1_branch2c') .batch_normalization(name='bn4b1_branch2c')) (self.feed('res4a_relu', 'bn4b1_branch2c') .add(name='res4b1') .relu(name='res4b1_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b2_branch2a') .batch_normalization(relu=True, name='bn4b2_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b2_branch2b') .batch_normalization(relu=True, name='bn4b2_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b2_branch2c') .batch_normalization(name='bn4b2_branch2c')) (self.feed('res4b1_relu', 'bn4b2_branch2c') .add(name='res4b2') .relu(name='res4b2_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b3_branch2a') .batch_normalization(relu=True, name='bn4b3_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b3_branch2b') .batch_normalization(relu=True, name='bn4b3_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b3_branch2c') .batch_normalization(name='bn4b3_branch2c')) (self.feed('res4b2_relu', 'bn4b3_branch2c') .add(name='res4b3') .relu(name='res4b3_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b4_branch2a') .batch_normalization(relu=True, name='bn4b4_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b4_branch2b') .batch_normalization(relu=True, name='bn4b4_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b4_branch2c') .batch_normalization(name='bn4b4_branch2c')) (self.feed('res4b3_relu', 'bn4b4_branch2c') .add(name='res4b4') .relu(name='res4b4_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b5_branch2a') .batch_normalization(relu=True, name='bn4b5_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b5_branch2b') .batch_normalization(relu=True, name='bn4b5_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b5_branch2c') .batch_normalization(name='bn4b5_branch2c')) (self.feed('res4b4_relu', 'bn4b5_branch2c') .add(name='res4b5') .relu(name='res4b5_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b6_branch2a') .batch_normalization(relu=True, name='bn4b6_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b6_branch2b') .batch_normalization(relu=True, name='bn4b6_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b6_branch2c') .batch_normalization(name='bn4b6_branch2c')) (self.feed('res4b5_relu', 'bn4b6_branch2c') .add(name='res4b6') .relu(name='res4b6_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b7_branch2a') .batch_normalization(relu=True, name='bn4b7_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b7_branch2b') .batch_normalization(relu=True, name='bn4b7_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b7_branch2c') .batch_normalization(name='bn4b7_branch2c')) (self.feed('res4b6_relu', 'bn4b7_branch2c') .add(name='res4b7') .relu(name='res4b7_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b8_branch2a') .batch_normalization(relu=True, name='bn4b8_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b8_branch2b') .batch_normalization(relu=True, name='bn4b8_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b8_branch2c') .batch_normalization(name='bn4b8_branch2c')) (self.feed('res4b7_relu', 'bn4b8_branch2c') .add(name='res4b8') .relu(name='res4b8_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b9_branch2a') .batch_normalization(relu=True, name='bn4b9_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b9_branch2b') .batch_normalization(relu=True, name='bn4b9_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b9_branch2c') .batch_normalization(name='bn4b9_branch2c')) (self.feed('res4b8_relu', 'bn4b9_branch2c') .add(name='res4b9') .relu(name='res4b9_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b10_branch2a') .batch_normalization(relu=True, name='bn4b10_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b10_branch2b') .batch_normalization(relu=True, name='bn4b10_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b10_branch2c') .batch_normalization(name='bn4b10_branch2c')) (self.feed('res4b9_relu', 'bn4b10_branch2c') .add(name='res4b10') .relu(name='res4b10_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b11_branch2a') .batch_normalization(relu=True, name='bn4b11_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b11_branch2b') .batch_normalization(relu=True, name='bn4b11_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b11_branch2c') .batch_normalization(name='bn4b11_branch2c')) (self.feed('res4b10_relu', 'bn4b11_branch2c') .add(name='res4b11') .relu(name='res4b11_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b12_branch2a') .batch_normalization(relu=True, name='bn4b12_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b12_branch2b') .batch_normalization(relu=True, name='bn4b12_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b12_branch2c') .batch_normalization(name='bn4b12_branch2c')) (self.feed('res4b11_relu', 'bn4b12_branch2c') .add(name='res4b12') .relu(name='res4b12_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b13_branch2a') .batch_normalization(relu=True, name='bn4b13_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b13_branch2b') .batch_normalization(relu=True, name='bn4b13_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b13_branch2c') .batch_normalization(name='bn4b13_branch2c')) (self.feed('res4b12_relu', 'bn4b13_branch2c') .add(name='res4b13') .relu(name='res4b13_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b14_branch2a') .batch_normalization(relu=True, name='bn4b14_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b14_branch2b') .batch_normalization(relu=True, name='bn4b14_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b14_branch2c') .batch_normalization(name='bn4b14_branch2c')) (self.feed('res4b13_relu', 'bn4b14_branch2c') .add(name='res4b14') .relu(name='res4b14_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b15_branch2a') .batch_normalization(relu=True, name='bn4b15_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b15_branch2b') .batch_normalization(relu=True, name='bn4b15_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b15_branch2c') .batch_normalization(name='bn4b15_branch2c')) (self.feed('res4b14_relu', 'bn4b15_branch2c') .add(name='res4b15') .relu(name='res4b15_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b16_branch2a') .batch_normalization(relu=True, name='bn4b16_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b16_branch2b') .batch_normalization(relu=True, name='bn4b16_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b16_branch2c') .batch_normalization(name='bn4b16_branch2c')) (self.feed('res4b15_relu', 'bn4b16_branch2c') .add(name='res4b16') .relu(name='res4b16_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b17_branch2a') .batch_normalization(relu=True, name='bn4b17_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b17_branch2b') .batch_normalization(relu=True, name='bn4b17_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b17_branch2c') .batch_normalization(name='bn4b17_branch2c')) (self.feed('res4b16_relu', 'bn4b17_branch2c') .add(name='res4b17') .relu(name='res4b17_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b18_branch2a') .batch_normalization(relu=True, name='bn4b18_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b18_branch2b') .batch_normalization(relu=True, name='bn4b18_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b18_branch2c') .batch_normalization(name='bn4b18_branch2c')) (self.feed('res4b17_relu', 'bn4b18_branch2c') .add(name='res4b18') .relu(name='res4b18_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b19_branch2a') .batch_normalization(relu=True, name='bn4b19_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b19_branch2b') .batch_normalization(relu=True, name='bn4b19_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b19_branch2c') .batch_normalization(name='bn4b19_branch2c')) (self.feed('res4b18_relu', 'bn4b19_branch2c') .add(name='res4b19') .relu(name='res4b19_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b20_branch2a') .batch_normalization(relu=True, name='bn4b20_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b20_branch2b') .batch_normalization(relu=True, name='bn4b20_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b20_branch2c') .batch_normalization(name='bn4b20_branch2c')) (self.feed('res4b19_relu', 'bn4b20_branch2c') .add(name='res4b20') .relu(name='res4b20_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b21_branch2a') .batch_normalization(relu=True, name='bn4b21_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b21_branch2b') .batch_normalization(relu=True, name='bn4b21_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b21_branch2c') .batch_normalization(name='bn4b21_branch2c')) (self.feed('res4b20_relu', 'bn4b21_branch2c') .add(name='res4b21') .relu(name='res4b21_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 256, 1, 1, biased=False, relu=False, name='res4b22_branch2a') .batch_normalization(relu=True, name='bn4b22_branch2a') .conv(3, 3, 256, 1, 1, biased=False, relu=False, name='res4b22_branch2b') .batch_normalization(relu=True, name='bn4b22_branch2b') .conv(1, 1, 1024, 1, 1, biased=False, relu=False, name='res4b22_branch2c') .batch_normalization(name='bn4b22_branch2c')) (self.feed('res4b21_relu', 'bn4b22_branch2c') .add(name='res4b22') .relu(name='res4b22_relu') # batch_size x 14 x 14 x 1024 .conv(1, 1, 2048, 2, 2, biased=False, relu=False, name='res5a_branch1') .batch_normalization(name='bn5a_branch1')) (self.feed('res4b22_relu') .conv(1, 1, 512, 2, 2, biased=False, relu=False, name='res5a_branch2a') .batch_normalization(relu=True, name='bn5a_branch2a') .conv(3, 3, 512, 1, 1, biased=False, relu=False, name='res5a_branch2b') .batch_normalization(relu=True, name='bn5a_branch2b') .conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5a_branch2c') .batch_normalization(name='bn5a_branch2c')) (self.feed('bn5a_branch1', 'bn5a_branch2c') .add(name='res5a') .relu(name='res5a_relu') # batch_size x 7 x 7 x 2048 .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5b_branch2a') .batch_normalization(relu=True, name='bn5b_branch2a') .conv(3, 3, 512, 1, 1, biased=False, relu=False, name='res5b_branch2b') .batch_normalization(relu=True, name='bn5b_branch2b') .conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5b_branch2c') .batch_normalization(name='bn5b_branch2c')) (self.feed('res5a_relu', 'bn5b_branch2c') .add(name='res5b') .relu(name='res5b_relu') # batch_size x 7 x 7 x 2048 .conv(1, 1, 512, 1, 1, biased=False, relu=False, name='res5c_branch2a') .batch_normalization(relu=True, name='bn5c_branch2a') .conv(3, 3, 512, 1, 1, biased=False, relu=False, name='res5c_branch2b') .batch_normalization(relu=True, name='bn5c_branch2b') .conv(1, 1, 2048, 1, 1, biased=False, relu=False, name='res5c_branch2c') .batch_normalization(name='bn5c_branch2c')) (self.feed('res5b_relu', 'bn5c_branch2c') .add(name='res5c') .relu(name='res5c_relu') # batch_size x 7 x 7 x 2048 .avg_pool(7, 7, 1, 1, padding='VALID', name='pool5')) #.fc(198, relu=False, name='fc_ftnew')) ================================================ FILE: get_Rts.py ================================================ """3D pose estimation network: get R and ts """ import lmdb import sys import time import csv import numpy as np import numpy.matlib import os import pose_model as Pose_model import tf_utils as util import tensorflow as tf import scipy from scipy import ndimage, misc import os.path import glob tf.logging.set_verbosity(tf.logging.INFO) FLAGS = tf.app.flags.FLAGS tf.app.flags.DEFINE_string('mode', 'valid', 'train or eval or valid.') tf.app.flags.DEFINE_integer('image_size', 227, 'Image side length.') tf.app.flags.DEFINE_string('log_root', '.', 'Directory to keep the checkpoints') tf.app.flags.DEFINE_string('model_root', '.', 'Directory to keep the checkpoints') tf.app.flags.DEFINE_integer('num_gpus', 0, 'Number of gpus used for training. (0 or 1)') tf.app.flags.DEFINE_integer('gpu_id', 0, 'GPU ID to be used.') tf.app.flags.DEFINE_string('input_csv', 'input.csv', 'input file to process') tf.app.flags.DEFINE_string('output_lmdb', 'pose_lmdb', 'output lmdb') tf.app.flags.DEFINE_integer('batch_size', 1, 'Batch Size') def run_pose_estimation(root_model_path, inputFile, outputDB, model_used, lr_rate_scalar, if_dropout, keep_rate): # Load training images mean: The values are in the range of [0,1], so the image pixel values should also divided by 255 file = np.load(root_model_path + "perturb_Oxford_train_imgs_mean.npz") train_mean_vec = file["train_mean_vec"] del file # Load training labels mean and std file = np.load(root_model_path +"perturb_Oxford_train_labels_mean_std.npz") mean_labels = file["mean_labels"] std_labels = file["std_labels"] del file # placeholders for the batches x = tf.placeholder(tf.float32, [FLAGS.batch_size, FLAGS.image_size, FLAGS.image_size, 3]) y = tf.placeholder(tf.float32, [FLAGS.batch_size, 6]) net_data = np.load(root_model_path +"PAM_frontal_ALexNet.npy").item() pose_3D_model = Pose_model.ThreeD_Pose_Estimation(x, y, 'valid', if_dropout, keep_rate, keep_rate, lr_rate_scalar, net_data, FLAGS.batch_size, mean_labels, std_labels) pose_3D_model._build_graph() del net_data # #Add ops to save and restore all the variables. saver = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.VARIABLES, scope='Spatial_Transformer')) pose_lmdb_env = lmdb.Environment(outputDB, map_size=1e12) with tf.Session(config=tf.ConfigProto(allow_soft_placement=True )) as sess, \ pose_lmdb_env.begin(write=True) as pose_txn: # Restore variables from disk. load_path = root_model_path + model_used saver.restore(sess, load_path) print("Model restored.") # Load cropped and scaled image file list (csv file) with open(inputFile, 'r') as csvfile: csvreader = csv.reader(csvfile, delimiter=',') lines = csvfile.readlines() for lin in lines: ### THE file is of the form ### key1, image_path_key_1 mykey = lin.split(',')[0] image_file_path = lin.split(',')[-1].rstrip('\n') import cv2 image = cv2.imread(image_file_path) image = np.asarray(image) # Fix the 2D image if len(image.shape) < 3: image_r = np.reshape(image, (image.shape[0], image.shape[1], 1)) image = np.append(image_r, image_r, axis=2) image = np.append(image, image_r, axis=2) label = np.array([0.,0.,0.,0.,0.,0.]) id_labels = np.array([0]) # Normalize images and labels nr_image, nr_pose_label, id_label = util.input_processing(image, label, id_labels, train_mean_vec, mean_labels, std_labels, 1, FLAGS.image_size, 739) del id_label # Reshape the image and label to fit model nr_image = nr_image.reshape(1, FLAGS.image_size, FLAGS.image_size, 3) nr_pose_label = nr_pose_label.reshape(1,6) # Get predicted R-ts pred_Rts = sess.run(pose_3D_model.preds_unNormalized, feed_dict={x: nr_image, y: nr_pose_label}) print 'Predicted pose for: ' + mykey pose_txn.put( mykey , pred_Rts[0].astype('float32') ) def esimatePose(root_model_path, inputFile, outputDB, model_used, lr_rate_scalar, if_dropout, keep_rate, use_gpu=False ): ## Force TF to use CPU oterwise we set the ID of the string of GPU we wanna use but here we are going use CPU os.environ['CUDA_VISIBLE_DEVICES'] = '1' #e.g. str(FLAGS.gpu_id)# '7' if use_gpu == False: dev = '/cpu:0' print "Using CPU" elif usc_gpu == True: dev = '/gpu:0' print "Using GPU " + os.environ['CUDA_VISIBLE_DEVICES'] else: raise ValueError('Only support 0 or 1 gpu.') run_pose_estimation( root_model_path, inputFile, outputDB, model_used, lr_rate_scalar, if_dropout, keep_rate ) ================================================ FILE: input.csv ================================================ ID,FILE,FACE_X,FACE_Y,FACE_WIDTH,FACE_HEIGHT subject1_a,images/input1.jpg,108.2642,119.6774,170,179 subject2_a,images/input2.jpg,48.51129913,38.1551857,141.19125366,149.40893555 subject3_a,images/input3.jpg,47.94947433,26.95211983,126.64208984,169.57138062 subject4_a,images/input4.jpg,41.02483749,81.23366547,122.9382019,79.80832672 subject5_a,images/input5.jpg,44.65912247,30.22106934,138.8326416,156.31950378 subject6_a,images/input6.jpg,54.94252396,41.26684189,117.19006348,137.38693237 subject7_a,images/input7.jpg,63.90779114,54.21474075,159.63040161,90.42936707 subject8_a,images/input8.jpg,53.62681198,48.40485001,78.09403992,101.56494141 subject9_a,images/input9.jpg,55.74394226,72.12078094,76.75720215,114.19478607 subject10_a,images/input10.jpg,48.07297897,30.98786163,145.96961975,124.47624969 ================================================ FILE: input_list.txt ================================================ ./input_samples/HELEN_30427236_2_0.jpg,132.8177,213.8680,183.1707,171.1925 ./input_samples/LFPW_image_test_0001_0.jpg,132.3776,213.9545,178.3532,168.3772 ./input_samples/LFPW_image_test_0008_0.jpg,138.6822,210.8271,172.3008,174.2025 ================================================ FILE: input_samples/README ================================================ Three input sample images to run our new FPN ================================================ FILE: kaffe/__init__.py ================================================ from .graph import GraphBuilder, NodeMapper from .errors import KaffeError, print_stderr from . import tensorflow ================================================ FILE: kaffe/errors.py ================================================ import sys class KaffeError(Exception): pass def print_stderr(msg): sys.stderr.write('%s\n' % msg) ================================================ FILE: kaffe/graph.py ================================================ from google.protobuf import text_format from .caffe import get_caffe_resolver from .errors import KaffeError, print_stderr from .layers import LayerAdapter, LayerType, NodeKind, NodeDispatch from .shapes import TensorShape class Node(object): def __init__(self, name, kind, layer=None): self.name = name self.kind = kind self.layer = LayerAdapter(layer, kind) if layer else None self.parents = [] self.children = [] self.data = None self.output_shape = None self.metadata = {} def add_parent(self, parent_node): assert parent_node not in self.parents self.parents.append(parent_node) if self not in parent_node.children: parent_node.children.append(self) def add_child(self, child_node): assert child_node not in self.children self.children.append(child_node) if self not in child_node.parents: child_node.parents.append(self) def get_only_parent(self): if len(self.parents) != 1: raise KaffeError('Node (%s) expected to have 1 parent. Found %s.' % (self, len(self.parents))) return self.parents[0] @property def parameters(self): if self.layer is not None: return self.layer.parameters return None def __str__(self): return '[%s] %s' % (self.kind, self.name) def __repr__(self): return '%s (0x%x)' % (self.name, id(self)) class Graph(object): def __init__(self, nodes=None, name=None): self.nodes = nodes or [] self.node_lut = {node.name: node for node in self.nodes} self.name = name def add_node(self, node): self.nodes.append(node) self.node_lut[node.name] = node def get_node(self, name): try: return self.node_lut[name] except KeyError: raise KaffeError('Layer not found: %s' % name) def get_input_nodes(self): return [node for node in self.nodes if len(node.parents) == 0] def get_output_nodes(self): return [node for node in self.nodes if len(node.children) == 0] def topologically_sorted(self): sorted_nodes = [] unsorted_nodes = list(self.nodes) temp_marked = set() perm_marked = set() def visit(node): if node in temp_marked: raise KaffeError('Graph is not a DAG.') if node in perm_marked: return temp_marked.add(node) for child in node.children: visit(child) perm_marked.add(node) temp_marked.remove(node) sorted_nodes.insert(0, node) while len(unsorted_nodes): visit(unsorted_nodes.pop()) return sorted_nodes def compute_output_shapes(self): sorted_nodes = self.topologically_sorted() for node in sorted_nodes: node.output_shape = TensorShape(*NodeKind.compute_output_shape(node)) def replaced(self, new_nodes): return Graph(nodes=new_nodes, name=self.name) def transformed(self, transformers): graph = self for transformer in transformers: graph = transformer(graph) if graph is None: raise KaffeError('Transformer failed: {}'.format(transformer)) assert isinstance(graph, Graph) return graph def __contains__(self, key): return key in self.node_lut def __str__(self): hdr = '{:<20} {:<30} {:>20} {:>20}'.format('Type', 'Name', 'Param', 'Output') s = [hdr, '-' * 94] for node in self.topologically_sorted(): # If the node has learned parameters, display the first one's shape. # In case of convolutions, this corresponds to the weights. data_shape = node.data[0].shape if node.data else '--' out_shape = node.output_shape or '--' s.append('{:<20} {:<30} {:>20} {:>20}'.format(node.kind, node.name, data_shape, tuple(out_shape))) return '\n'.join(s) class GraphBuilder(object): '''Constructs a model graph from a Caffe protocol buffer definition.''' def __init__(self, def_path, phase='test'): ''' def_path: Path to the model definition (.prototxt) data_path: Path to the model data (.caffemodel) phase: Either 'test' or 'train'. Used for filtering phase-specific nodes. ''' self.def_path = def_path self.phase = phase self.load() def load(self): '''Load the layer definitions from the prototxt.''' self.params = get_caffe_resolver().NetParameter() with open(self.def_path, 'rb') as def_file: text_format.Merge(def_file.read(), self.params) def filter_layers(self, layers): '''Filter out layers based on the current phase.''' phase_map = {0: 'train', 1: 'test'} filtered_layer_names = set() filtered_layers = [] for layer in layers: phase = self.phase if len(layer.include): phase = phase_map[layer.include[0].phase] if len(layer.exclude): phase = phase_map[1 - layer.include[0].phase] exclude = (phase != self.phase) # Dropout layers appear in a fair number of Caffe # test-time networks. These are just ignored. We'll # filter them out here. if (not exclude) and (phase == 'test'): exclude = (layer.type == LayerType.Dropout) if not exclude: filtered_layers.append(layer) # Guard against dupes. assert layer.name not in filtered_layer_names filtered_layer_names.add(layer.name) return filtered_layers def make_node(self, layer): '''Create a graph node for the given layer.''' kind = NodeKind.map_raw_kind(layer.type) if kind is None: raise KaffeError('Unknown layer type encountered: %s' % layer.type) # We want to use the layer's top names (the "output" names), rather than the # name attribute, which is more of readability thing than a functional one. # Other layers will refer to a node by its "top name". return Node(layer.name, kind, layer=layer) def make_input_nodes(self): ''' Create data input nodes. This method is for old-style inputs, where the input specification was not treated as a first-class layer in the prototext. Newer models use the "Input layer" type. ''' nodes = [Node(name, NodeKind.Data) for name in self.params.input] if len(nodes): input_dim = map(int, self.params.input_dim) if not input_dim: if len(self.params.input_shape) > 0: input_dim = map(int, self.params.input_shape[0].dim) else: raise KaffeError('Dimensions for input not specified.') for node in nodes: node.output_shape = tuple(input_dim) return nodes def build(self): ''' Builds the graph from the Caffe layer definitions. ''' # Get the layers layers = self.params.layers or self.params.layer # Filter out phase-excluded layers layers = self.filter_layers(layers) # Get any separately-specified input layers nodes = self.make_input_nodes() nodes += [self.make_node(layer) for layer in layers] # Initialize the graph graph = Graph(nodes=nodes, name=self.params.name) # Connect the nodes # # A note on layers and outputs: # In Caffe, each layer can produce multiple outputs ("tops") from a set of inputs # ("bottoms"). The bottoms refer to other layers' tops. The top can rewrite a bottom # (in case of in-place operations). Note that the layer's name is not used for establishing # any connectivity. It's only used for data association. By convention, a layer with a # single top will often use the same name (although this is not required). # # The current implementation only supports single-output nodes (note that a node can still # have multiple children, since multiple child nodes can refer to the single top's name). node_outputs = {} for layer in layers: node = graph.get_node(layer.name) for input_name in layer.bottom: assert input_name != layer.name parent_node = node_outputs.get(input_name) if (parent_node is None) or (parent_node == node): parent_node = graph.get_node(input_name) node.add_parent(parent_node) if len(layer.top)>1: raise KaffeError('Multiple top nodes are not supported.') for output_name in layer.top: if output_name == layer.name: # Output is named the same as the node. No further action required. continue # There are two possibilities here: # # Case 1: output_name refers to another node in the graph. # This is an "in-place operation" that overwrites an existing node. # This would create a cycle in the graph. We'll undo the in-placing # by substituting this node wherever the overwritten node is referenced. # # Case 2: output_name violates the convention layer.name == output_name. # Since we are working in the single-output regime, we will can rename it to # match the layer name. # # For both cases, future references to this top re-routes to this node. node_outputs[output_name] = node graph.compute_output_shapes() return graph class NodeMapper(NodeDispatch): def __init__(self, graph): self.graph = graph def map(self): nodes = self.graph.topologically_sorted() # Remove input nodes - we'll handle them separately. input_nodes = self.graph.get_input_nodes() nodes = [t for t in nodes if t not in input_nodes] # Decompose DAG into chains. chains = [] for node in nodes: attach_to_chain = None if len(node.parents) == 1: parent = node.get_only_parent() for chain in chains: if chain[-1] == parent: # Node is part of an existing chain. attach_to_chain = chain break if attach_to_chain is None: # Start a new chain for this node. attach_to_chain = [] chains.append(attach_to_chain) attach_to_chain.append(node) # Map each chain. mapped_chains = [] for chain in chains: mapped_chains.append(self.map_chain(chain)) return self.commit(mapped_chains) def map_chain(self, chain): return [self.map_node(node) for node in chain] def map_node(self, node): map_func = self.get_handler(node.kind, 'map') mapped_node = map_func(node) assert mapped_node is not None mapped_node.node = node return mapped_node def commit(self, mapped_chains): raise NotImplementedError('Must be implemented by subclass.') ================================================ FILE: kaffe/layers.py ================================================ import re import numbers from collections import namedtuple from .shapes import * LAYER_DESCRIPTORS = { # Caffe Types 'AbsVal': shape_identity, 'Accuracy': shape_scalar, 'ArgMax': shape_not_implemented, 'BatchNorm': shape_identity, 'BNLL': shape_not_implemented, 'Concat': shape_concat, 'ContrastiveLoss': shape_scalar, 'Convolution': shape_convolution, 'Deconvolution': shape_not_implemented, 'Data': shape_data, 'Dropout': shape_identity, 'DummyData': shape_data, 'EuclideanLoss': shape_scalar, 'Eltwise': shape_identity, 'Exp': shape_identity, 'Flatten': shape_not_implemented, 'HDF5Data': shape_data, 'HDF5Output': shape_identity, 'HingeLoss': shape_scalar, 'Im2col': shape_not_implemented, 'ImageData': shape_data, 'InfogainLoss': shape_scalar, 'InnerProduct': shape_inner_product, 'Input': shape_data, 'LRN': shape_identity, 'MemoryData': shape_mem_data, 'MultinomialLogisticLoss': shape_scalar, 'MVN': shape_not_implemented, 'Pooling': shape_pool, 'Power': shape_identity, 'ReLU': shape_identity, 'Scale': shape_identity, 'Sigmoid': shape_identity, 'SigmoidCrossEntropyLoss': shape_scalar, 'Silence': shape_not_implemented, 'Softmax': shape_identity, 'SoftmaxWithLoss': shape_scalar, 'Split': shape_not_implemented, 'Slice': shape_not_implemented, 'TanH': shape_identity, 'WindowData': shape_not_implemented, 'Threshold': shape_identity, } LAYER_TYPES = LAYER_DESCRIPTORS.keys() LayerType = type('LayerType', (), {t: t for t in LAYER_TYPES}) class NodeKind(LayerType): @staticmethod def map_raw_kind(kind): if kind in LAYER_TYPES: return kind return None @staticmethod def compute_output_shape(node): try: val = LAYER_DESCRIPTORS[node.kind](node) return val except NotImplementedError: raise KaffeError('Output shape computation not implemented for type: %s' % node.kind) class NodeDispatchError(KaffeError): pass class NodeDispatch(object): @staticmethod def get_handler_name(node_kind): if len(node_kind) <= 4: # A catch-all for things like ReLU and tanh return node_kind.lower() # Convert from CamelCase to under_scored name = re.sub('(.)([A-Z][a-z]+)', r'\1_\2', node_kind) return re.sub('([a-z0-9])([A-Z])', r'\1_\2', name).lower() def get_handler(self, node_kind, prefix): name = self.get_handler_name(node_kind) name = '_'.join((prefix, name)) try: return getattr(self, name) except AttributeError: raise NodeDispatchError('No handler found for node kind: %s (expected: %s)' % (node_kind, name)) class LayerAdapter(object): def __init__(self, layer, kind): self.layer = layer self.kind = kind @property def parameters(self): name = NodeDispatch.get_handler_name(self.kind) name = '_'.join((name, 'param')) try: return getattr(self.layer, name) except AttributeError: raise NodeDispatchError('Caffe parameters not found for layer kind: %s' % (self.kind)) @staticmethod def get_kernel_value(scalar, repeated, idx, default=None): if scalar: return scalar if repeated: if isinstance(repeated, numbers.Number): return repeated if len(repeated) == 1: # Same value applies to all spatial dimensions return int(repeated[0]) assert idx < len(repeated) # Extract the value for the given spatial dimension return repeated[idx] if default is None: raise ValueError('Unable to determine kernel parameter!') return default @property def kernel_parameters(self): assert self.kind in (NodeKind.Convolution, NodeKind.Pooling) params = self.parameters k_h = self.get_kernel_value(params.kernel_h, params.kernel_size, 0) k_w = self.get_kernel_value(params.kernel_w, params.kernel_size, 1) s_h = self.get_kernel_value(params.stride_h, params.stride, 0, default=1) s_w = self.get_kernel_value(params.stride_w, params.stride, 1, default=1) p_h = self.get_kernel_value(params.pad_h, params.pad, 0, default=0) p_w = self.get_kernel_value(params.pad_h, params.pad, 1, default=0) return KernelParameters(k_h, k_w, s_h, s_w, p_h, p_w) KernelParameters = namedtuple('KernelParameters', ['kernel_h', 'kernel_w', 'stride_h', 'stride_w', 'pad_h', 'pad_w']) ================================================ FILE: kaffe/shapes.py ================================================ import math from collections import namedtuple from .errors import KaffeError TensorShape = namedtuple('TensorShape', ['batch_size', 'channels', 'height', 'width']) def get_filter_output_shape(i_h, i_w, params, round_func): o_h = (i_h + 2 * params.pad_h - params.kernel_h) / float(params.stride_h) + 1 o_w = (i_w + 2 * params.pad_w - params.kernel_w) / float(params.stride_w) + 1 return (int(round_func(o_h)), int(round_func(o_w))) def get_strided_kernel_output_shape(node, round_func): assert node.layer is not None input_shape = node.get_only_parent().output_shape o_h, o_w = get_filter_output_shape(input_shape.height, input_shape.width, node.layer.kernel_parameters, round_func) params = node.layer.parameters has_c_o = hasattr(params, 'num_output') c = params.num_output if has_c_o else input_shape.channels return TensorShape(input_shape.batch_size, c, o_h, o_w) def shape_not_implemented(node): raise NotImplementedError def shape_identity(node): assert len(node.parents) > 0 return node.parents[0].output_shape def shape_scalar(node): return TensorShape(1, 1, 1, 1) def shape_data(node): if node.output_shape: # Old-style input specification return node.output_shape try: # New-style input specification return map(int, node.parameters.shape[0].dim) except: # We most likely have a data layer on our hands. The problem is, # Caffe infers the dimensions of the data from the source (eg: LMDB). # We want to avoid reading datasets here. Fail for now. # This can be temporarily fixed by transforming the data layer to # Caffe's "input" layer (as is usually used in the "deploy" version). # TODO: Find a better solution for this. raise KaffeError('Cannot determine dimensions of data layer.\n' 'See comments in function shape_data for more info.') def shape_mem_data(node): params = node.parameters return TensorShape(params.batch_size, params.channels, params.height, params.width) def shape_concat(node): axis = node.layer.parameters.axis output_shape = None for parent in node.parents: if output_shape is None: output_shape = list(parent.output_shape) else: output_shape[axis] += parent.output_shape[axis] return tuple(output_shape) def shape_convolution(node): return get_strided_kernel_output_shape(node, math.floor) def shape_pool(node): return get_strided_kernel_output_shape(node, math.ceil) def shape_inner_product(node): input_shape = node.get_only_parent().output_shape return TensorShape(input_shape.batch_size, node.layer.parameters.num_output, 1, 1) ================================================ FILE: kaffe/tensorflow/__init__.py ================================================ from .transformer import TensorFlowTransformer from .network import Network ================================================ FILE: kaffe/tensorflow/network_shape.py ================================================ import numpy as np import tensorflow as tf DEFAULT_PADDING = 'SAME' def layer(op): '''Decorator for composable network layers.''' def layer_decorated(self, *args, **kwargs): # Automatically set a name if not provided. name = kwargs.setdefault('name', self.get_unique_name(op.__name__)) # Figure out the layer inputs. if len(self.terminals) == 0: raise RuntimeError('No input variables found for layer %s.' % name) elif len(self.terminals) == 1: layer_input = self.terminals[0] else: layer_input = list(self.terminals) # Perform the operation and get the output. layer_output = op(self, layer_input, *args, **kwargs) # Add to layer LUT. self.layers[name] = layer_output # This output is now the input for the next layer. self.feed(layer_output) # Return self for chained calls. return self return layer_decorated class Network_Shape(object): def __init__(self, inputs, trainable=True): # The input nodes for this network self.inputs = inputs # The current list of terminal nodes self.terminals = [] # Mapping from layer names to layers self.layers = dict(inputs) # If true, the resulting variables are set as trainable self.trainable = trainable print self.trainable # Switch variable for dropout self.use_dropout = tf.placeholder_with_default(tf.constant(1.0), shape=[], name='use_dropout') self.setup() def setup(self): '''Construct the network. ''' raise NotImplementedError('Must be implemented by the subclass.') def load(self, data_path, prefix_name, session, ignore_missing=False): '''Load network weights. data_path: The path to the numpy-serialized network weights session: The current TensorFlow session ignore_missing: If true, serialized weights for missing layers are ignored. ''' data_dict = np.load(data_path).item() print len(data_dict) #data_dict['res2b_branch2a'] for op_name in data_dict: #print op_name #if op_name == "res2b_branch2a": # REUSE = None #else: # REUSE = True if op_name == 'fc_ftnew': continue with tf.variable_scope(prefix_name + '/' + op_name, reuse=True): # reuse=True for param_name, data in data_dict[op_name].iteritems(): #if op_name == 'fc_ftnew': # print param_name, data, data.shape try: #if op_name == "res2b_branch2a": # var = tf.Variable(data_dict[op_name][param_name], trainable=False, name=param_name) #else: var = tf.get_variable(param_name) session.run(var.assign(data)) except ValueError: if not ignore_missing: raise """ def load(self, data_path, ignore_missing=False): '''Load network weights. data_path: The path to the numpy-serialized network weights session: The current TensorFlow session ignore_missing: If true, serialized weights for missing layers are ignored. ''' data_dict = np.load(data_path).item() #print data_dict['res5c_branch2c'], data_dict['res5c_branch2c']['weights'], data_dict['res5c_branch2c']['weights'].shape for op_name in data_dict: with tf.variable_scope(op_name): # reuse=True for param_name, data in data_dict[op_name].iteritems(): #print param_name, data try: if op_name == 'res5c_branch2c': var = tf.Variable(data_dict[op_name][param_name], trainable=True, name=param_name) else: var = tf.Variable(data_dict[op_name][param_name], trainable=False, name=param_name) #session.run(var.assign(data)) except ValueError: if not ignore_missing: raise """ def load_specific_vars(self, data_path, op_name, session, ignore_missing=False): '''Load network weights. data_path: The path to the numpy-serialized network weights session: The current TensorFlow session ignore_missing: If true, serialized weights for missing layers are ignored. ''' data_dict = np.load(data_path).item() with tf.variable_scope(op_name, reuse=True): # reuse=None for param_name, data in data_dict[op_name].iteritems(): #print param_name, data try: var = tf.get_variable(param_name) session.run(var.assign(data)) except ValueError: if not ignore_missing: raise def feed(self, *args): '''Set the input(s) for the next operation by replacing the terminal nodes. The arguments can be either layer names or the actual layers. ''' assert len(args) != 0 self.terminals = [] for fed_layer in args: if isinstance(fed_layer, basestring): try: fed_layer = self.layers[fed_layer] except KeyError: raise KeyError('Unknown layer name fed: %s' % fed_layer) self.terminals.append(fed_layer) return self def get_output(self): '''Returns the current network output.''' return self.terminals[-1] def get_unique_name(self, prefix): '''Returns an index-suffixed unique name for the given prefix. This is used for auto-generating layer names based on the type-prefix. ''' ident = sum(t.startswith(prefix) for t, _ in self.layers.items()) + 1 return '%s_%d' % (prefix, ident) def make_var(self, name, shape): '''Creates a new TensorFlow variable.''' return tf.get_variable(name, shape, trainable=self.trainable) #self.trainable) #tmp = tf.get_variable(name, shape=shape, trainable=False) #return tf.Variable(tmp, trainable=False, name=name) def make_var_fixed(self, name, shape): '''Creates a new TensorFlow variable.''' return tf.get_variable(name, shape, trainable=False) #tmp = tf.get_variable(name, shape=shape, trainable=False) #return tf.Variable(tmp, trainable=False, name=name) def validate_padding(self, padding): '''Verifies that the padding is one of the supported ones.''' assert padding in ('SAME', 'VALID') @layer def conv(self, input, k_h, k_w, c_o, s_h, s_w, name, relu=True, padding=DEFAULT_PADDING, group=1, biased=True): # Verify that the padding is acceptable self.validate_padding(padding) # Get the number of channels in the input c_i = input.get_shape()[-1] # Verify that the grouping parameter is valid assert c_i % group == 0 assert c_o % group == 0 # Convolution for a given input and kernel convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding) with tf.variable_scope(name) as scope: if name == 'res5c_branch2c' or name == 'res5c_branch2b' or name == 'res5c_branch2a' or \ name == 'res5b_branch2c' or name == 'res5b_branch2b' or name == 'res5b_branch2a': # or \ #name == 'res5a_branch2c' or name == 'res5a_branch2b' or name == 'res5a_branch2a' or \ #name == 'res5a_branch1': kernel = self.make_var('weights', shape=[k_h, k_w, c_i / group, c_o]) else: #kernel = self.make_var_fixed('weights', shape=[k_h, k_w, c_i / group, c_o]) kernel = self.make_var('weights', shape=[k_h, k_w, c_i / group, c_o]) if group == 1: # This is the common-case. Convolve the input without any further complications. output = convolve(input, kernel) else: # Split the input into groups and then convolve each of them independently input_groups = tf.split(3, group, input) kernel_groups = tf.split(3, group, kernel) output_groups = [convolve(i, k) for i, k in zip(input_groups, kernel_groups)] # Concatenate the groups output = tf.concat(3, output_groups) # Add the biases if biased: if name == 'res5c_branch2c' or name == 'res5c_branch2b' or name == 'res5c_branch2a' or \ name == 'res5b_branch2c' or name == 'res5b_branch2b' or name == 'res5b_branch2a': # or \ #name == 'res5a_branch2c' or name == 'res5a_branch2b' or name == 'res5a_branch2a' or \ #name == 'res5a_branch1': biases = self.make_var('biases', [c_o]) else: #biases = self.make_var_fixed('biases', [c_o]) biases = self.make_var('biases', [c_o]) output = tf.nn.bias_add(output, biases) if relu: # ReLU non-linearity output = tf.nn.relu(output, name=scope.name) return output @layer def relu(self, input, name): return tf.nn.relu(input, name=name) @layer def max_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING): self.validate_padding(padding) return tf.nn.max_pool(input, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding, name=name) @layer def avg_pool(self, input, k_h, k_w, s_h, s_w, name, padding=DEFAULT_PADDING): self.validate_padding(padding) return tf.nn.avg_pool(input, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding, name=name) @layer def lrn(self, input, radius, alpha, beta, name, bias=1.0): return tf.nn.local_response_normalization(input, depth_radius=radius, alpha=alpha, beta=beta, bias=bias, name=name) @layer def concat(self, inputs, axis, name): return tf.concat(concat_dim=axis, values=inputs, name=name) @layer def add(self, inputs, name): return tf.add_n(inputs, name=name) @layer def fc(self, input, num_out, name, relu=True): with tf.variable_scope(name) as scope: input_shape = input.get_shape() if input_shape.ndims == 4: # The input is spatial. Vectorize it first. dim = 1 for d in input_shape[1:].as_list(): dim *= d feed_in = tf.reshape(input, [-1, dim]) else: feed_in, dim = (input, input_shape[-1].value) weights = self.make_var('weights', shape=[dim, num_out]) biases = self.make_var('biases', [num_out]) op = tf.nn.relu_layer if relu else tf.nn.xw_plus_b fc = op(feed_in, weights, biases, name=scope.name) return fc @layer def softmax(self, input, name): input_shape = map(lambda v: v.value, input.get_shape()) if len(input_shape) > 2: # For certain models (like NiN), the singleton spatial dimensions # need to be explicitly squeezed, since they're not broadcast-able # in TensorFlow's NHWC ordering (unlike Caffe's NCHW). if input_shape[1] == 1 and input_shape[2] == 1: input = tf.squeeze(input, squeeze_dims=[1, 2]) else: raise ValueError('Rank 2 tensor input expected for softmax!') return tf.nn.softmax(input, name) @layer def batch_normalization(self, input, name, scale_offset=True, relu=False): # NOTE: Currently, only inference is supported with tf.variable_scope(name) as scope: shape = [input.get_shape()[-1]] if scale_offset: scale = self.make_var_fixed('scale', shape=shape) offset = self.make_var_fixed('offset', shape=shape) #scale = self.make_var('scale', shape=shape) #offset = self.make_var('offset', shape=shape) else: scale, offset = (None, None) output = tf.nn.batch_normalization( input, mean=self.make_var_fixed('mean', shape=shape), variance=self.make_var_fixed('variance', shape=shape), #mean=self.make_var('mean', shape=shape), #variance=self.make_var('variance', shape=shape), offset=offset, scale=scale, # TODO: This is the default Caffe batch norm eps # Get the actual eps from parameters variance_epsilon=1e-5, name=name) if relu: output = tf.nn.relu(output) return output @layer def dropout(self, input, keep_prob, name): keep = 1 - self.use_dropout + (self.use_dropout * keep_prob) return tf.nn.dropout(input, keep, name=name) ================================================ FILE: kaffe/transformers.py ================================================ ''' A collection of graph transforms. A transformer is a callable that accepts a graph and returns a transformed version. ''' import numpy as np from .caffe import get_caffe_resolver, has_pycaffe from .errors import KaffeError, print_stderr from .layers import NodeKind class DataInjector(object): ''' Associates parameters loaded from a .caffemodel file with their corresponding nodes. ''' def __init__(self, def_path, data_path): # The .prototxt file defining the graph self.def_path = def_path # The .caffemodel file containing the learned parameters self.data_path = data_path # Set to true if the fallback protocol-buffer based backend was used self.did_use_pb = False # A list containing (layer name, parameters) tuples self.params = None # Load the parameters self.load() def load(self): if has_pycaffe(): self.load_using_caffe() else: self.load_using_pb() def load_using_caffe(self): caffe = get_caffe_resolver().caffe net = caffe.Net(self.def_path, self.data_path, caffe.TEST) data = lambda blob: blob.data self.params = [(k, map(data, v)) for k, v in net.params.items()] def load_using_pb(self): data = get_caffe_resolver().NetParameter() data.MergeFromString(open(self.data_path, 'rb').read()) pair = lambda layer: (layer.name, self.normalize_pb_data(layer)) layers = data.layers or data.layer self.params = [pair(layer) for layer in layers if layer.blobs] self.did_use_pb = True def normalize_pb_data(self, layer): transformed = [] for blob in layer.blobs: if len(blob.shape.dim): dims = blob.shape.dim c_o, c_i, h, w = map(int, [1] * (4 - len(dims)) + list(dims)) else: c_o = blob.num c_i = blob.channels h = blob.height w = blob.width data = np.array(blob.data, dtype=np.float32).reshape(c_o, c_i, h, w) transformed.append(data) return transformed def adjust_parameters(self, node, data): if not self.did_use_pb: return data # When using the protobuf-backend, each parameter initially has four dimensions. # In certain cases (like FC layers), we want to eliminate the singleton dimensions. # This implementation takes care of the common cases. However, it does leave the # potential for future issues. # The Caffe-backend does not suffer from this problem. data = list(data) squeeze_indices = [1] # Squeeze biases. if node.kind == NodeKind.InnerProduct: squeeze_indices.append(0) # Squeeze FC. for idx in squeeze_indices: data[idx] = np.squeeze(data[idx]) return data def __call__(self, graph): for layer_name, data in self.params: if layer_name in graph: node = graph.get_node(layer_name) node.data = self.adjust_parameters(node, data) else: print_stderr('Ignoring parameters for non-existent layer: %s' % layer_name) return graph class DataReshaper(object): def __init__(self, mapping, replace=True): # A dictionary mapping NodeKind to the transposed order. self.mapping = mapping # The node kinds eligible for reshaping self.reshaped_node_types = self.mapping.keys() # If true, the reshaped data will replace the old one. # Otherwise, it's set to the reshaped_data attribute. self.replace = replace def has_spatial_parent(self, node): try: parent = node.get_only_parent() s = parent.output_shape return s.height > 1 or s.width > 1 except KaffeError: return False def map(self, node_kind): try: return self.mapping[node_kind] except KeyError: raise KaffeError('Ordering not found for node kind: {}'.format(node_kind)) def __call__(self, graph): for node in graph.nodes: if node.data is None: continue if node.kind not in self.reshaped_node_types: # Check for 2+ dimensional data if any(len(tensor.shape) > 1 for tensor in node.data): print_stderr('Warning: parmaters not reshaped for node: {}'.format(node)) continue transpose_order = self.map(node.kind) weights = node.data[0] if (node.kind == NodeKind.InnerProduct) and self.has_spatial_parent(node): # The FC layer connected to the spatial layer needs to be # re-wired to match the new spatial ordering. in_shape = node.get_only_parent().output_shape fc_shape = weights.shape output_channels = fc_shape[0] weights = weights.reshape((output_channels, in_shape.channels, in_shape.height, in_shape.width)) weights = weights.transpose(self.map(NodeKind.Convolution)) node.reshaped_data = weights.reshape(fc_shape[transpose_order[0]], fc_shape[transpose_order[1]]) else: node.reshaped_data = weights.transpose(transpose_order) if self.replace: for node in graph.nodes: if hasattr(node, 'reshaped_data'): # Set the weights node.data[0] = node.reshaped_data del node.reshaped_data return graph class SubNodeFuser(object): ''' An abstract helper for merging a single-child with its single-parent. ''' def __call__(self, graph): nodes = graph.nodes fused_nodes = [] for node in nodes: if len(node.parents) != 1: # We're only fusing nodes with single parents continue parent = node.get_only_parent() if len(parent.children) != 1: # We can only fuse a node if its parent's # value isn't used by any other node. continue if not self.is_eligible_pair(parent, node): continue # Rewrite the fused node's children to its parent. for child in node.children: child.parents.remove(node) parent.add_child(child) # Disconnect the fused node from the graph. parent.children.remove(node) fused_nodes.append(node) # Let the sub-class merge the fused node in any arbitrary way. self.merge(parent, node) transformed_nodes = [node for node in nodes if node not in fused_nodes] return graph.replaced(transformed_nodes) def is_eligible_pair(self, parent, child): '''Returns true if this parent/child pair is eligible for fusion.''' raise NotImplementedError('Must be implemented by subclass.') def merge(self, parent, child): '''Merge the child node into the parent.''' raise NotImplementedError('Must be implemented by subclass') class ReLUFuser(SubNodeFuser): ''' Fuses rectified linear units with their parent nodes. ''' def __init__(self, allowed_parent_types=None): # Fuse ReLUs when the parent node is one of the given types. # If None, all node types are eligible. self.allowed_parent_types = allowed_parent_types def is_eligible_pair(self, parent, child): return ((self.allowed_parent_types is None or parent.kind in self.allowed_parent_types) and child.kind == NodeKind.ReLU) def merge(self, parent, _): parent.metadata['relu'] = True class BatchNormScaleBiasFuser(SubNodeFuser): ''' The original batch normalization paper includes two learned parameters: a scaling factor \gamma and a bias \beta. Caffe's implementation does not include these two. However, it is commonly replicated by adding a scaling+bias layer immidiately after the batch norm. This fuser merges the scaling+bias layer with the batch norm. ''' def is_eligible_pair(self, parent, child): return (parent.kind == NodeKind.BatchNorm and child.kind == NodeKind.Scale and child.parameters.axis == 1 and child.parameters.bias_term == True) def merge(self, parent, child): parent.scale_bias_node = child class BatchNormPreprocessor(object): ''' Prescale batch normalization parameters. Concatenate gamma (scale) and beta (bias) terms if set. ''' def __call__(self, graph): for node in graph.nodes: if node.kind != NodeKind.BatchNorm: continue assert node.data is not None assert len(node.data) == 3 mean, variance, scale = node.data # Prescale the stats scaling_factor = 1.0 / scale if scale != 0 else 0 mean *= scaling_factor variance *= scaling_factor # Replace with the updated values node.data = [mean, variance] if hasattr(node, 'scale_bias_node'): # Include the scale and bias terms gamma, beta = node.scale_bias_node.data node.data += [gamma, beta] return graph class NodeRenamer(object): ''' Renames nodes in the graph using a given unary function that accepts a node and returns its new name. ''' def __init__(self, renamer): self.renamer = renamer def __call__(self, graph): for node in graph.nodes: node.name = self.renamer(node) return graph class ParameterNamer(object): ''' Convert layer data arrays to a dictionary mapping parameter names to their values. ''' def __call__(self, graph): for node in graph.nodes: if node.data is None: continue if node.kind in (NodeKind.Convolution, NodeKind.InnerProduct): names = ('weights',) if node.parameters.bias_term: names += ('biases',) elif node.kind == NodeKind.BatchNorm: names = ('mean', 'variance') if len(node.data) == 4: names += ('scale', 'offset') else: print_stderr('WARNING: Unhandled parameters: {}'.format(node.kind)) continue assert len(names) == len(node.data) node.data = dict(zip(names, node.data)) return graph ================================================ FILE: main_fpn.py ================================================ import sys import os import csv import numpy as np import cv2 import math import pose_utils import os import myparse import renderer_fpn ## To make tensorflow print less (this can be useful for debug though) #os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' #import ctypes; print '> loading getRts' import get_Rts as getRts ######## TMP FOLDER ##################### _tmpdir = './tmp/'#os.environ['TMPDIR'] + '/' print '> make dir' if not os.path.exists( _tmpdir): os.makedirs( _tmpdir ) ######################################### ##INPUT/OUTPUT input_file = str(sys.argv[1]) #'input.csv' outpu_proc = 'output_preproc.csv' output_pose_db = './output_pose.lmdb' output_render = './output_render' ################################################# print '> network' _alexNetSize = 227 _factor = 0.25 #0.1 # ***** please download the model in https://www.dropbox.com/s/r38psbq55y2yj4f/fpn_new_model.tar.gz?dl=0 ***** # model_folder = './fpn_new_model/' model_used = 'model_0_1.0_1.0_1e-07_1_16000.ckpt' #'model_0_1.0_1.0_1e-05_0_6000.ckpt' lr_rate_scalar = 1.0 if_dropout = 0 keep_rate = 1 ################################ data_dict = myparse.parse_input(input_file) ## Pre-processing the images print '> preproc' pose_utils.preProcessImage( _tmpdir, data_dict, './',\ _factor, _alexNetSize, outpu_proc ) ## Runnin FacePoseNet print '> run' ## Running the pose estimation getRts.esimatePose( model_folder, outpu_proc, output_pose_db, model_used, lr_rate_scalar, if_dropout, keep_rate, use_gpu=False ) renderer_fpn.render_fpn(outpu_proc, output_pose_db, output_render) ================================================ FILE: main_predict_6DoF.py ================================================ import sys import numpy as np import tensorflow as tf import cv2 import scipy.io as sio sys.path.append('./utils') import pose_utils as pu import os import os.path from glob import glob import time import pickle sys.path.append('./kaffe') sys.path.append('./ResNet') from ThreeDMM_shape import ResNet_101 as resnet101_shape # Global parameters factor = 0.25 _resNetSize = 224 n_hidden1 = 2048 n_hidden2 = 4096 ifdropout = 0 gpuID = int(sys.argv[1]) input_sample_list_path = str(sys.argv[2]) #'./input_list.txt' # You can change to your own image list tf.logging.set_verbosity(tf.logging.INFO) FLAGS = tf.app.flags.FLAGS tf.app.flags.DEFINE_integer('image_size', 224, 'Image side length.') output_path = './output_6DoF' tf.app.flags.DEFINE_string('save_output_path', output_path, 'Directory to keep the checkpoints') tf.app.flags.DEFINE_integer('num_gpus', 1, 'Number of gpus used for training. (0 or 1)') tf.app.flags.DEFINE_integer('batch_size', 1, 'Batch Size') # 60 if not os.path.exists(FLAGS.save_output_path): os.makedirs(FLAGS.save_output_path) def extract_3dmm_pose(): ######################################## # Load train image mean, train label mean and std ######################################## # labels stats on 300W-LP train_label_mean = np.load('./train_stats/train_label_mean_300WLP.npy') train_label_std = np.load('./train_stats/train_label_std_300WLP.npy') Pose_label_mean = train_label_mean[:6] Pose_label_std = train_label_std[:6] #ShapeExpr_label_mean_300WLP = train_label_mean[6:] #ShapeExpr_label_std_300WLP = train_label_std[6:] # Get training image mean from Anh's ShapeNet (CVPR2017) mean_image_shape = np.load('./train_stats/3DMM_shape_mean.npy') # 3 x 224 x 224 train_image_mean = np.transpose(mean_image_shape, [1,2,0]) # 224 x 224 x 3, [0,255] ######################################## # Build CNN graph ######################################## # placeholders for the batches x_img = tf.placeholder(tf.float32, [None, FLAGS.image_size, FLAGS.image_size, 3]) # Resize Image x2 = tf.image.resize_bilinear(x_img, tf.constant([224,224], dtype=tf.int32)) x2 = tf.cast(x2, 'float32') x2 = tf.reshape(x2, [-1, 224, 224, 3]) # Image normalization mean = tf.reshape(train_image_mean, [1, 224, 224, 3]) mean = tf.cast(mean, 'float32') x2 = x2 - mean ######################################## # New-FPN with ResNet structure ######################################## with tf.variable_scope('shapeCNN'): net_shape = resnet101_shape({'input': x2}, trainable=True) # False: Freeze the ResNet Layers pool5 = net_shape.layers['pool5'] pool5 = tf.squeeze(pool5) pool5 = tf.reshape(pool5, [1, 2048]) print pool5.get_shape() # batch_size x 2048 with tf.variable_scope('Pose'): with tf.variable_scope('fc1'): fc1W = tf.Variable(tf.random_normal(tf.stack([pool5.get_shape()[1].value, n_hidden1]), mean=0.0, stddev=0.01), trainable=True, name='W') fc1b = tf.Variable(tf.zeros([n_hidden1]), trainable=True, name='baises') fc1 = tf.nn.relu_layer(tf.reshape(pool5, [-1, int(np.prod(pool5.get_shape()[1:]))]), fc1W, fc1b, name='fc1') print "\nfc1 shape:" print fc1.get_shape(), fc1W.get_shape(), fc1b.get_shape() # (batch_size, 4096) (2048, 4096) (4096,) if ifdropout == 1: fc1 = tf.nn.dropout(fc1, prob, name='fc1_dropout') with tf.variable_scope('fc2'): fc2W = tf.Variable(tf.random_normal([n_hidden1, n_hidden2], mean=0.0, stddev=0.01), trainable=True, name='W') fc2b = tf.Variable(tf.zeros([n_hidden2]), trainable=True, name='baises') fc2 = tf.nn.relu_layer(fc1, fc2W, fc2b, name='fc2') print fc2.get_shape(), fc2W.get_shape(), fc2b.get_shape() # (batch_size, 29 (2048, 2048) (2048,) if ifdropout == 1: fc2 = tf.nn.dropout(fc2, prob, name='fc2_dropout') with tf.variable_scope('fc3'): # Move everything into depth so we can perform a single matrix multiplication. fc2 = tf.reshape(fc2, [FLAGS.batch_size, -1]) dim = fc2.get_shape()[1].value print "\nfc2 dim:" print fc2.get_shape(), dim fc3W = tf.Variable(tf.random_normal(tf.stack([dim,6]), mean=0.0, stddev=0.01), trainable=True, name='W') fc3b = tf.Variable(tf.zeros([6]), trainable=True, name='baises') #print "*** label shape: " + str(len(train_label_mean)) Pose_params_ZNorm = tf.nn.xw_plus_b(fc2, fc3W, fc3b) print "\nfc3 shape:" print Pose_params_ZNorm.get_shape(), fc3W.get_shape(), fc3b.get_shape() Pose_label_mean = tf.cast(tf.reshape(Pose_label_mean, [1, -1]), 'float32') Pose_label_std = tf.cast(tf.reshape(Pose_label_std, [1, -1]), 'float32') Pose_params = Pose_params_ZNorm * (Pose_label_std + 0.000000000000000001) + Pose_label_mean ######################################## # Start extracting 3dmm pose ######################################## init_op = tf.global_variables_initializer() saver = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)) saver_ini_shape_net = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='shapeCNN')) saver_shapeCNN = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='shapeCNN')) saver_Pose = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='Pose')) config = tf.ConfigProto(allow_soft_placement=True) #, log_device_placement=True) #config.gpu_options.per_process_gpu_memory_fraction = 0.5 config.gpu_options.allow_growth = True with tf.Session(config=config) as sess: sess.run(init_op) start_time = time.time() # For non-trainable parameters such as the parameters for batch normalization load_path = "./models/ini_shapeNet_model_L7L_trainable.ckpt" saver_ini_shape_net.restore(sess, load_path) # For other trainable parameters load_path = "./models/model_0.0001_1_18_0.0_2048_4096.ckpt" saver_shapeCNN.restore(sess, load_path) saver_Pose.restore(sess, load_path) load_model_time = time.time() - start_time print("Model restored: " + str(load_model_time)) with open(input_sample_list_path, 'r') as fin: for line in fin: curr_line = line.strip().split(',') image_path = curr_line[0] bbox = np.array([float(curr_line[1]), float(curr_line[2]), float(curr_line[3]), float(curr_line[4])]) # [lt_x, lt_y, w, h] image_key = image_path.split('/')[-1][:-4] image = cv2.imread(image_path,1) # BGR image = np.asarray(image) # Fix the grey image if len(image.shape) < 3: image_r = np.reshape(image, (image.shape[0], image.shape[1], 1)) image = np.append(image_r, image_r, axis=2) image = np.append(image, image_r, axis=2) # Crop and expand (25%) the image based on the tight bbox (from the face detector or detected lmks) factor = [1.9255, 2.2591, 1.9423, 1.6087]; img_new = pu.preProcessImage_v2(image.copy(), bbox.copy(), factor, _resNetSize, 1) image_array = np.reshape(img_new, [1, _resNetSize, _resNetSize, 3]) (params_pose, pool5_feats) = sess.run([Pose_params, pool5], feed_dict={x_img: image_array}) # [scale, pitch, yaw, roll, translation_x, translation_y] params_pose = params_pose[0] print params_pose #, pool5_feats # save the predicted pose with open(FLAGS.save_output_path + '/' + image_key + '.txt', 'w') as fout: for pp in params_pose: fout.write(str(pp) + '\n') # Convert the 6DoF predicted pose to 3x4 projection matrix (weak-perspective projection) # Load BFM model shape_mat = sio.loadmat('./BFM/Model_Shape.mat') mu_shape = shape_mat['mu_shape'].astype('float32') expr_mat = sio.loadmat('./BFM/Model_Exp.mat') mu_exp = expr_mat['mu_exp'].astype('float32') mu = mu_shape + mu_exp len_mu = len(mu) mu = np.reshape(mu, [-1,1]) keypoints = np.reshape(shape_mat['keypoints'], [-1]) - 1 # -1 for python index keypoints = keypoints.astype('int32') vertex = np.reshape(mu, [len_mu/3, 3]) # # of vertices x 3 # mean shape mesh = vertex.T # 3 x # of vertices mesh_1 = np.concatenate([mesh, np.ones([1,len_mu/3])], axis=0) # 4 x # of vertices # Get projection matrix from 6DoF pose scale, pitch, yaw, roll, tx, ty = params_pose R = pu.RotationMatrix(pitch, yaw, roll) ProjMat = np.zeros([3,4]) ProjMat[:,:3] = scale * R ProjMat[:,3] = np.array([tx,ty,0]) # Get predicted shape #print ProjMat, ProjMat.shape #print mesh_1, mesh_1.shape pred_shape = np.matmul(ProjMat, mesh_1) # 3 x # of vertices pred_shape = pred_shape.T # # of vertices x 3 pred_shape_x = np.reshape(pred_shape[:,0], [len_mu/3, 1]) pred_shape_z = np.reshape(pred_shape[:,2], [len_mu/3, 1]) pred_shape_y = 224 + 1 - pred_shape[:,1] pred_shape_y = np.reshape(pred_shape_y, [len_mu/3, 1]) pred_shape = np.concatenate([pred_shape_x, pred_shape_y, pred_shape_z], 1) # Convert shape and lmks back to the original image scale _, bbox_new, _, lmks_filling, old_h, old_w, img_new = pu.resize_crop_rescaleCASIA(image.copy(), bbox.copy(), pred_shape.copy(), factor) #print lmks_filling pred_shape[:,0] = pred_shape[:,0] * old_w / 224. pred_shape[:,1] = pred_shape[:,1] * old_h / 224. pred_shape[:,0] = pred_shape[:,0] + bbox_new[0] pred_shape[:,1] = pred_shape[:,1] + bbox_new[1] # Get predicted lmks pred_lmks = pred_shape[keypoints] sio.savemat(FLAGS.save_output_path + '/' + image_key + '.mat', {'shape_3D': pred_shape, 'lmks_3D': pred_lmks}) #cv2.imwrite(FLAGS.save_output_path + '/' + image_key + '.jpg', img_new) def main(_): os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" os.environ["CUDA_VISIBLE_DEVICES"]=str(gpuID) if FLAGS.num_gpus == 0: dev = '/cpu:0' elif FLAGS.num_gpus == 1: dev = '/gpu:0' else: raise ValueError('Only support 0 or 1 gpu.') print dev with tf.device(dev): extract_3dmm_pose() if __name__ == '__main__': tf.app.run() ================================================ FILE: main_predict_ProjMat.py ================================================ import sys import numpy as np import tensorflow as tf import cv2 import scipy.io as sio sys.path.append('./utils') import pose_utils as pu import os import os.path from glob import glob import time import pickle sys.path.append('./kaffe') sys.path.append('./ResNet') from ThreeDMM_shape import ResNet_101 as resnet101_shape # Global parameters factor = 0.25 _resNetSize = 224 n_hidden1 = 2048 n_hidden2 = 4096 ifdropout = 0 gpuID = int(sys.argv[1]) input_sample_list_path = str(sys.argv[2]) #'./input_list.txt' # You can change to your own image list tf.logging.set_verbosity(tf.logging.INFO) FLAGS = tf.app.flags.FLAGS tf.app.flags.DEFINE_integer('image_size', 224, 'Image side length.') output_path = './output_ProjMat' tf.app.flags.DEFINE_string('save_output_path', output_path, 'Directory to keep the checkpoints') tf.app.flags.DEFINE_integer('num_gpus', 1, 'Number of gpus used for training. (0 or 1)') tf.app.flags.DEFINE_integer('batch_size', 1, 'Batch Size') # 60 if not os.path.exists(FLAGS.save_output_path): os.makedirs(FLAGS.save_output_path) def extract_3dmm_ProjMat(): ######################################## # Load train image mean, train label mean and std ######################################## # labels stats on 300W-LP train_label_mean = np.load('./train_stats/train_label_mean_ProjMat.npy') train_label_std = np.load('./train_stats/train_label_std_ProjMat.npy') ProjMat_label_mean = train_label_mean[-12:-1] ProjMat_label_std = train_label_std[-12:-1] # Get training image mean from Anh's ShapeNet (CVPR2017) mean_image_shape = np.load('./train_stats/3DMM_shape_mean.npy') # 3 x 224 x 224 train_image_mean = np.transpose(mean_image_shape, [1,2,0]) # 224 x 224 x 3, [0,255] ######################################## # Build CNN graph ######################################## # placeholders for the batches x_img = tf.placeholder(tf.float32, [None, FLAGS.image_size, FLAGS.image_size, 3]) # Resize Image x2 = tf.image.resize_bilinear(x_img, tf.constant([224,224], dtype=tf.int32)) x2 = tf.cast(x2, 'float32') x2 = tf.reshape(x2, [-1, 224, 224, 3]) # Image normalization mean = tf.reshape(train_image_mean, [1, 224, 224, 3]) mean = tf.cast(mean, 'float32') x2 = x2 - mean ######################################## # New-FPN with ResNet structure ######################################## with tf.variable_scope('shapeCNN'): net_shape = resnet101_shape({'input': x2}, trainable=True) # False: Freeze the ResNet Layers pool5 = net_shape.layers['pool5'] pool5 = tf.squeeze(pool5) pool5 = tf.reshape(pool5, [1, 2048]) print pool5.get_shape() # batch_size x 2048 with tf.variable_scope('Pose'): with tf.variable_scope('fc1'): fc1W = tf.Variable(tf.random_normal(tf.stack([pool5.get_shape()[1].value, n_hidden1]), mean=0.0, stddev=0.01), trainable=True, name='W') fc1b = tf.Variable(tf.zeros([n_hidden1]), trainable=True, name='baises') fc1 = tf.nn.relu_layer(tf.reshape(pool5, [-1, int(np.prod(pool5.get_shape()[1:]))]), fc1W, fc1b, name='fc1') print "\nfc1 shape:" print fc1.get_shape(), fc1W.get_shape(), fc1b.get_shape() # (batch_size, 4096) (2048, 4096) (4096,) if ifdropout == 1: fc1 = tf.nn.dropout(fc1, prob, name='fc1_dropout') with tf.variable_scope('fc2'): fc2W = tf.Variable(tf.random_normal([n_hidden1, n_hidden2], mean=0.0, stddev=0.01), trainable=True, name='W') fc2b = tf.Variable(tf.zeros([n_hidden2]), trainable=True, name='baises') fc2 = tf.nn.relu_layer(fc1, fc2W, fc2b, name='fc2') print fc2.get_shape(), fc2W.get_shape(), fc2b.get_shape() # (batch_size, 29 (2048, 2048) (2048,) if ifdropout == 1: fc2 = tf.nn.dropout(fc2, prob, name='fc2_dropout') with tf.variable_scope('fc3'): # Move everything into depth so we can perform a single matrix multiplication. fc2 = tf.reshape(fc2, [FLAGS.batch_size, -1]) dim = fc2.get_shape()[1].value print "\nfc2 dim:" print fc2.get_shape(), dim fc3W = tf.Variable(tf.random_normal(tf.stack([dim,11]), mean=0.0, stddev=0.01), trainable=True, name='W') fc3b = tf.Variable(tf.zeros([11]), trainable=True, name='baises') #print "*** label shape: " + str(len(train_label_mean)) ProjMat_preds_ZNorm = tf.nn.xw_plus_b(fc2, fc3W, fc3b) print "\nfc3 shape:" print ProjMat_preds_ZNorm.get_shape(), fc3W.get_shape(), fc3b.get_shape() label_mean = tf.cast(tf.reshape(ProjMat_label_mean, [1, -1]), 'float32') label_std = tf.cast(tf.reshape(ProjMat_label_std, [1, -1]), 'float32') ProjMat_preds = ProjMat_preds_ZNorm * (label_std + 0.000000000000000001) + label_mean ProjMat_preds = tf.concat([ProjMat_preds, tf.zeros([FLAGS.batch_size,1])], 1) ######################################## # Start extracting 3dmm pose ######################################## init_op = tf.global_variables_initializer() saver = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)) saver_ini_shape_net = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope='shapeCNN')) saver_shapeCNN = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='shapeCNN')) saver_Pose = tf.train.Saver(var_list=tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='Pose')) config = tf.ConfigProto(allow_soft_placement=True) #, log_device_placement=True) #config.gpu_options.per_process_gpu_memory_fraction = 0.5 config.gpu_options.allow_growth = True with tf.Session(config=config) as sess: sess.run(init_op) start_time = time.time() load_path = "./models/ini_shapeNet_model_L7L_trainable.ckpt" saver_ini_shape_net.restore(sess, load_path) load_path = "./models/model_0.0001_1_18_0.0_2048_4096.ckpt" saver_shapeCNN.restore(sess, load_path) load_path = "./models/model_iniLR_0.001_wProjMat_1.0_wLmks_10.0_wd_0.0_do_1_122.ckpt" saver_Pose.restore(sess, load_path) load_model_time = time.time() - start_time print("Model restored: " + str(load_model_time)) with open(input_sample_list_path, 'r') as fin: for line in fin: curr_line = line.strip().split(',') image_path = curr_line[0] bbox = np.array([float(curr_line[1]), float(curr_line[2]), float(curr_line[3]), float(curr_line[4])]) # [lt_x, lt_y, w, h] image_key = image_path.split('/')[-1][:-4] image = cv2.imread(image_path,1) # BGR image = np.asarray(image) # Fix the grey image if len(image.shape) < 3: image_r = np.reshape(image, (image.shape[0], image.shape[1], 1)) image = np.append(image_r, image_r, axis=2) image = np.append(image, image_r, axis=2) # Crop and expand (25%) the image based on the tight bbox (from the face detector or detected lmks) factor = [1.9255, 2.2591, 1.9423, 1.6087]; img_new = pu.preProcessImage_v2(image.copy(), bbox.copy(), factor, _resNetSize, 1) image_array = np.reshape(img_new, [1, _resNetSize, _resNetSize, 3]) #print image_array (params_ProjMat, pool5_feats) = sess.run([ProjMat_preds, pool5], feed_dict={x_img: image_array}) # [scale, pitch, yaw, roll, translation_x, translation_y] params_ProjMat = params_ProjMat[0] #print params_ProjMat, pool5_feats # save the predicted pose with open(FLAGS.save_output_path + '/' + image_key + '.txt', 'w') as fout: for pp in params_ProjMat: fout.write(str(pp) + '\n') # Convert the 6DoF predicted pose to 3x4 projection matrix (weak-perspective projection) # Load BFM model shape_mat = sio.loadmat('./BFM/Model_Shape.mat') mu_shape = shape_mat['mu_shape'].astype('float32') expr_mat = sio.loadmat('./BFM/Model_Exp.mat') mu_exp = expr_mat['mu_exp'].astype('float32') mu = mu_shape + mu_exp len_mu = len(mu) mu = np.reshape(mu, [-1,1]) keypoints = np.reshape(shape_mat['keypoints'], [-1]) - 1 # -1 for python index keypoints = keypoints.astype('int32') vertex = np.reshape(mu, [len_mu/3, 3]) # # of vertices x 3 # mean shape mesh = vertex.T # 3 x # of vertices mesh_1 = np.concatenate([mesh, np.ones([1,len_mu/3])], axis=0) # 4 x # of vertices # Get projection matrix from 6DoF pose ProjMat = np.reshape(params_ProjMat, [4,3]) ProjMat = ProjMat.T # Get predicted shape #print ProjMat, ProjMat.shape #print mesh_1, mesh_1.shape pred_shape = np.matmul(ProjMat, mesh_1) # 3 x # of vertices pred_shape = pred_shape.T # # of vertices x 3 pred_shape_x = np.reshape(pred_shape[:,0], [len_mu/3, 1]) pred_shape_z = np.reshape(pred_shape[:,2], [len_mu/3, 1]) pred_shape_y = 224 + 1 - pred_shape[:,1] pred_shape_y = np.reshape(pred_shape_y, [len_mu/3, 1]) pred_shape = np.concatenate([pred_shape_x, pred_shape_y, pred_shape_z], 1) # Convert shape and lmks back to the original image scale _, bbox_new, _, _, old_h, old_w, _ = pu.resize_crop_rescaleCASIA(image.copy(), bbox.copy(), pred_shape.copy(), factor) pred_shape[:,0] = pred_shape[:,0] * old_w / 224. pred_shape[:,1] = pred_shape[:,1] * old_h / 224. pred_shape[:,0] = pred_shape[:,0] + bbox_new[0] pred_shape[:,1] = pred_shape[:,1] + bbox_new[1] # Get predicted lmks pred_lmks = pred_shape[keypoints] sio.savemat(FLAGS.save_output_path + '/' + image_key + '.mat', {'shape_3D': pred_shape, 'lmks_3D': pred_lmks}) # Obtain pose from ProjMat scale,R,t3d = pu.P2sRt(ProjMat) # decompose affine matrix to s, R, t pose = pu.matrix2angle(R) # yaw, pitch, roll # print scale, pitch, yaw , roll, translation_x, translation_y print scale, pose[1], pose[0], pose[2], t3d[0], t3d[1] def main(_): os.environ["CUDA_DEVICE_ORDER"]="PCI_BUS_ID" os.environ["CUDA_VISIBLE_DEVICES"]=str(gpuID) if FLAGS.num_gpus == 0: dev = '/cpu:0' elif FLAGS.num_gpus == 1: dev = '/gpu:0' else: raise ValueError('Only support 0 or 1 gpu.') print dev with tf.device(dev): extract_3dmm_ProjMat() if __name__ == '__main__': tf.app.run() ================================================ FILE: models/README ================================================ Please download all the model files and put them in this folder ================================================ FILE: myparse.py ================================================ import csv def parse_input(input_file): data_dict = dict() reader = csv.DictReader(open(input_file,'r')) #### Reading the metadata into a DICT for line in reader: key = line['ID'] data_dict[key] = {'file' : line['FILE'] ,\ 'x' : float( line['FACE_X'] ),\ 'y' : float( line['FACE_Y'] ),\ 'width' : float( line['FACE_WIDTH'] ),\ 'height' : float( line['FACE_HEIGHT'] ),\ } return data_dict ================================================ FILE: output_render/README.md ================================================ The rendered images will be saved here! ## Subject 1 ## ### input: ### ![sbj1](../images/input1.jpg) ### rendering: ### ![sbj1](./subject1/subject1_a_rendered_aug_-00_00_10.jpg) ![sbj1](./subject1/subject1_a_rendered_aug_-22_00_10.jpg) ![sbj1](./subject1/subject1_a_rendered_aug_-40_00_10.jpg) ![sbj1](./subject1/subject1_a_rendered_aug_-55_00_10.jpg) ![sbj1](./subject1/subject1_a_rendered_aug_-75_00_10.jpg) ## Subject 2 ## ### input: ### ![sbj2](../images/input2.jpg) ### rendering: ### ![sbj2](./subject2/subject2_a_rendered_aug_-40_00_10.jpg) ![sbj2](./subject2/subject2_a_rendered_aug_-55_00_10.jpg) ![sbj2](./subject2/subject2_a_rendered_aug_-75_00_10.jpg) ## Subject 3 ## ### input: ### ![sbj3](../images/input3.jpg) ### rendering: ### ![sbj3](./subject3/subject3_a_rendered_aug_-40_00_10.jpg) ![sbj3](./subject3/subject3_a_rendered_aug_-55_00_10.jpg) ![sbj3](./subject3/subject3_a_rendered_aug_-75_00_10.jpg) ## Subject 4 ## ### input: ### ![sbj4](../images/input4.jpg) ### rendering: ### ![sbj4](./subject4/subject4_a_rendered_aug_-40_00_10.jpg) ![sbj4](./subject4/subject4_a_rendered_aug_-55_00_10.jpg) ![sbj4](./subject4/subject4_a_rendered_aug_-75_00_10.jpg) ## Subject 5 ## ## input: ## ### input: ### ![sbj5](../images/input5.jpg) ### rendering: ### ![sbj5](./subject5/subject5_a_rendered_aug_-40_00_10.jpg) ![sbj5](./subject5/subject5_a_rendered_aug_-55_00_10.jpg) ![sbj5](./subject5/subject5_a_rendered_aug_-75_00_10.jpg) ## Subject 6 ## ### input: ### ![sbj6](../images/input6.jpg) ### rendering: ### ![sbj6](./subject6/subject6_a_rendered_aug_-40_00_10.jpg) ![sbj6](./subject6/subject6_a_rendered_aug_-55_00_10.jpg) ![sbj6](./subject6/subject6_a_rendered_aug_-75_00_10.jpg) ## Subject 7 ## ### input: ### ![sbj7](../images/input7.jpg) ### rendering: ### ![sbj7](./subject7/subject7_a_rendered_aug_-00_00_10.jpg) ![sbj7](./subject7/subject7_a_rendered_aug_-22_00_10.jpg) ![sbj7](./subject7/subject7_a_rendered_aug_-40_00_10.jpg) ![sbj7](./subject7/subject7_a_rendered_aug_-55_00_10.jpg) ![sbj7](./subject7/subject7_a_rendered_aug_-75_00_10.jpg) ## Subject 8 ## ### input: ### ![sbj8](../images/input8.jpg) ### rendering: ### ![sbj8](./subject8/subject8_a_rendered_aug_-40_00_10.jpg) ![sbj8](./subject8/subject8_a_rendered_aug_-55_00_10.jpg) ![sbj8](./subject8/subject8_a_rendered_aug_-75_00_10.jpg) ## Subject 9 ## ### input: ### ![sbj9](../images/input9.jpg) ### rendering: ### ![sbj9](./subject9/subject9_a_rendered_aug_-40_00_10.jpg) ![sbj9](./subject9/subject9_a_rendered_aug_-55_00_10.jpg) ![sbj9](./subject9/subject9_a_rendered_aug_-75_00_10.jpg) ## Subject 10 ## ### input: ### ![sbj10](../images/input10.jpg) ### rendering: ### ![sbj10](./subject10/subject10_a_rendered_aug_-00_00_10.jpg) ![sbj10](./subject10/subject10_a_rendered_aug_-22_00_10.jpg) ![sbj10](./subject10/subject10_a_rendered_aug_-40_00_10.jpg) ![sbj10](./subject10/subject10_a_rendered_aug_-55_00_10.jpg) ![sbj10](./subject10/subject10_a_rendered_aug_-75_00_10.jpg) ================================================ FILE: pose_model.py ================================================ # Copyright 2016 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== """ResNet model. Related papers: https://arxiv.org/pdf/1603.05027v2.pdf https://arxiv.org/pdf/1512.03385v1.pdf https://arxiv.org/pdf/1605.07146v1.pdf """ import numpy as np import tensorflow as tf from tensorflow.python.training import moving_averages #import sys #sys.path.append('/staging/pn/fengjuch/transformer') #from spatial_transformer import transformer #from tf_utils import weight_variable, bias_variable, dense_to_one_hot """ HParams = namedtuple('HParams', 'batch_size, num_classes, min_lrn_rate, lrn_rate, ' 'num_residual_units, use_bottleneck, weight_decay_rate, ' 'relu_leakiness, optimizer') """ class ThreeD_Pose_Estimation(object): """ResNet model.""" def __init__(self, images, labels, mode, ifdropout, keep_rate_fc6, keep_rate_fc7, lr_rate_fac, net_data, batch_size, mean_labels, std_labels): """ResNet constructor. Args: hps: Hyperparameters. images: Batches of images. [batch_size, image_size, image_size, 3] labels: Batches of labels. [batch_size, num_classes] mode: One of 'train' and 'eval'. """ #self.hps = hps self.batch_size = batch_size self._images = images self.labels = labels self.mode = mode self.ifdropout = ifdropout self.keep_rate_fc6 = keep_rate_fc6 self.keep_rate_fc7 = keep_rate_fc7 self.ifadd_weight_decay = 0 #ifadd_weight_decay self.net_data = net_data self.lr_rate_fac = lr_rate_fac self._extra_train_ops = [] self.optimizer = 'Adam' self.mean_labels = mean_labels self.std_labels = std_labels #self.train_mean_vec = train_mean_vec def _build_graph(self): """Build a whole graph for the model.""" self.global_step = tf.Variable(0, name='global_step', trainable=False) self._build_model() if self.mode == 'train': self._build_train_op() #self.summaries = tf.merge_all_summaries() def _stride_arr(self, stride): """Map a stride scalar to the stride array for tf.nn.conv2d.""" return [1, stride, stride, 1] def _build_model(self): """Build the core model within the graph.""" #with tf.variable_scope('init'): # x = self._images # print x, x.get_shape() # x = self._conv('init_conv', x, 3, 3, 16, self._stride_arr(1)) # print x, x.get_shape() with tf.variable_scope('Spatial_Transformer'): x = self._images x = tf.image.resize_bilinear(x, tf.constant([227,227], dtype=tf.int32)) # the image should be 227 x 227 x 3 print x.get_shape() self.resized_img = x theta = self._ST('ST2', x, 3, (16,16), 3, 16, self._stride_arr(1)) #print "*** ", x.get_shape() #with tf.variable_scope('logit'): # logits = self._fully_connected(theta, self.hps.num_classes) # self.predictions = tf.nn.softmax(logits) #print "*** ", logits, self.predictions with tf.variable_scope('costs'): self.predictions = theta self.preds_unNormalized = theta * (self.std_labels + 0.000000000000000001) + self.mean_labels pred_dim1 = theta.get_shape()[0] pred_dim2 = theta.get_shape()[1] del theta #diff = self.predictions - self.labels #print diff #xent = tf.mul(diff, diff) #tf.nn.l2_loss(diff) #print xent #xent = tf.reduce_sum(xent, 1) pow_res = tf.pow(self.predictions-self.labels, 2) """ print pow_res, pow_res.get_shape() const1 = tf.constant(1.0,shape=[pred_dim1, 3],dtype=tf.float32) const2 = tf.constant(1.0,shape=[pred_dim1, 3],dtype=tf.float32) #print const1, const2, const1.get_shape(), const2.get_shape() const = tf.concat(1,[const1, const2]) print const, const.get_shape() cpow_res = tf.mul(const,pow_res) xent = tf.reduce_sum(cpow_res,1) print xent """ xent = tf.reduce_sum(pow_res,1) self.cost = tf.reduce_mean(xent, name='xent') #print self.cost #self.cost = tf.nn.l2_loss(diff) # Add weight decay of needed if self.ifadd_weight_decay == 1: self.cost += self._decay() #self.train_step = tf.train.GradientDescentOptimizer(self.hps.lrn_rate).minimize(self.cost) #tf.scalar_summary('cost', self.cost) def conv(self, input, kernel, biases, k_h, k_w, c_o, s_h, s_w, padding="VALID", group=1): '''From https://github.com/ethereon/caffe-tensorflow ''' c_i = input.get_shape()[-1] assert c_i%group==0 assert c_o%group==0 convolve = lambda i, k: tf.nn.conv2d(i, k, [1, s_h, s_w, 1], padding=padding) if group==1: conv = convolve(input, kernel) else: #input_groups = tf.split(3, group, input) #kernel_groups = tf.split(3, group, kernel) input_groups = tf.split(input, group, 3) kernel_groups = tf.split(kernel, group, 3) output_groups = [convolve(i, k) for i,k in zip(input_groups, kernel_groups)] #conv = tf.concat(3, output_groups) conv = tf.concat(output_groups, 3) return tf.reshape(tf.nn.bias_add(conv, biases), [-1]+conv.get_shape().as_list()[1:]) def _ST(self, name, x, channel_x, out_size, filter_size, out_filters, strides): """ Spatial Transformer. """ with tf.variable_scope(name): # zero-mean input [B,G,R]: [93.5940, 104.7624, 129.1863] --> provided by vgg-face """ with tf.name_scope('preprocess') as scope: mean = tf.constant(tf.reshape(self.train_mean_vec*255.0, [3]), dtype=tf.float32, shape=[1, 1, 1, 3], name='img_mean') x = x - mean """ # conv1 with tf.name_scope('conv1') as scope: #conv(11, 11, 96, 4, 4, padding='VALID', name='conv1') k_h = 11; k_w = 11; c_o = 96; s_h = 4; s_w = 4 conv1W = tf.Variable(self.net_data["conv1"]["weights"], trainable=True, name='W') conv1b = tf.Variable(self.net_data["conv1"]["biases"], trainable=True, name='baises') conv1_in = self.conv(x, conv1W, conv1b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=1) conv1 = tf.nn.relu(conv1_in, name='conv1') print x.get_shape(), conv1.get_shape() #maxpool1 #max_pool(3, 3, 2, 2, padding='VALID', name='pool1') k_h = 3; k_w = 3; s_h = 2; s_w = 2; padding = 'VALID' maxpool1 = tf.nn.max_pool(conv1, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding, name='pool1') print maxpool1.get_shape() #lrn1 #lrn(2, 2e-05, 0.75, name='norm1') radius = 2; alpha = 2e-05; beta = 0.75; bias = 1.0 lrn1 = tf.nn.local_response_normalization(maxpool1, depth_radius=radius, alpha=alpha, beta=beta, bias=bias, name='norm1') # conv2 with tf.name_scope('conv2') as scope: #conv(5, 5, 256, 1, 1, group=2, name='conv2') k_h = 5; k_w = 5; c_o = 256; s_h = 1; s_w = 1; group = 2 conv2W = tf.Variable(self.net_data["conv2"]["weights"], trainable=True, name='W') conv2b = tf.Variable(self.net_data["conv2"]["biases"], trainable=True, name='baises') conv2_in = self.conv(lrn1, conv2W, conv2b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group) conv2 = tf.nn.relu(conv2_in, name='conv2') print conv2.get_shape() #maxpool2 #max_pool(3, 3, 2, 2, padding='VALID', name='pool2') k_h = 3; k_w = 3; s_h = 2; s_w = 2; padding = 'VALID' maxpool2 = tf.nn.max_pool(conv2, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding, name='pool2') print maxpool2.get_shape() #lrn2 #lrn(2, 2e-05, 0.75, name='norm2') radius = 2; alpha = 2e-05; beta = 0.75; bias = 1.0 lrn2 = tf.nn.local_response_normalization(maxpool2, depth_radius=radius, alpha=alpha, beta=beta, bias=bias, name='norm2') # conv3 with tf.name_scope('conv3') as scope: #conv(3, 3, 384, 1, 1, name='conv3') k_h = 3; k_w = 3; c_o = 384; s_h = 1; s_w = 1; group = 1 conv3W = tf.Variable(self.net_data["conv3"]["weights"], trainable=True, name='W') conv3b = tf.Variable(self.net_data["conv3"]["biases"], trainable=True, name='baises') conv3_in = self.conv(lrn2, conv3W, conv3b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group) conv3 = tf.nn.relu(conv3_in, name='conv3') print conv3.get_shape() # conv4 with tf.name_scope('conv4') as scope: #conv(3, 3, 384, 1, 1, group=2, name='conv4') k_h = 3; k_w = 3; c_o = 384; s_h = 1; s_w = 1; group = 2 conv4W = tf.Variable(self.net_data["conv4"]["weights"], trainable=True, name='W') conv4b = tf.Variable(self.net_data["conv4"]["biases"], trainable=True, name='baises') conv4_in = self.conv(conv3, conv4W, conv4b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group) conv4 = tf.nn.relu(conv4_in, name='conv4') print conv4.get_shape() # conv5 with tf.name_scope('conv5') as scope: #conv(3, 3, 256, 1, 1, group=2, name='conv5') k_h = 3; k_w = 3; c_o = 256; s_h = 1; s_w = 1; group = 2 conv5W = tf.Variable(self.net_data["conv5"]["weights"], trainable=True, name='W') conv5b = tf.Variable(self.net_data["conv5"]["biases"], trainable=True, name='baises') self.conv5b = conv5b conv5_in = self.conv(conv4, conv5W, conv5b, k_h, k_w, c_o, s_h, s_w, padding="SAME", group=group) conv5 = tf.nn.relu(conv5_in, name='conv5') print conv5.get_shape() #maxpool5 #max_pool(3, 3, 2, 2, padding='VALID', name='pool5') k_h = 3; k_w = 3; s_h = 2; s_w = 2; padding = 'VALID' maxpool5 = tf.nn.max_pool(conv5, ksize=[1, k_h, k_w, 1], strides=[1, s_h, s_w, 1], padding=padding, name='pool5') print maxpool5.get_shape(), maxpool5.get_shape()[1:], int(np.prod(maxpool5.get_shape()[1:])) # fc6 with tf.variable_scope('fc6') as scope: #fc(4096, name='fc6') fc6W = tf.Variable(self.net_data["fc6"]["weights"], trainable=True, name='W') fc6b = tf.Variable(self.net_data["fc6"]["biases"], trainable=True, name='baises') self.fc6W = fc6W self.fc6b = fc6b fc6 = tf.nn.relu_layer(tf.reshape(maxpool5, [-1, int(np.prod(maxpool5.get_shape()[1:]))]), fc6W, fc6b, name='fc6') print fc6.get_shape() if self.ifdropout == 1: fc6 = tf.nn.dropout(fc6, self.keep_rate_fc6, name='fc6_dropout') # fc7 with tf.variable_scope('fc7') as scope: #fc(4096, name='fc7') fc7W = tf.Variable(self.net_data["fc7"]["weights"], trainable=True, name='W') fc7b = tf.Variable(self.net_data["fc7"]["biases"], trainable=True, name='baises') self.fc7b = fc7b fc7 = tf.nn.relu_layer(fc6, fc7W, fc7b, name='fc7') print fc7.get_shape() if self.ifdropout == 1: fc7 = tf.nn.dropout(fc7, self.keep_rate_fc7, name='fc7_dropout') # fc8 with tf.variable_scope('fc8') as scope: """ #fc(6, relu=False, name='fc8') fc8W = tf.Variable(net_data["fc8"][0]) fc8b = tf.Variable(net_data["fc8"][1]) fc8 = tf.nn.xw_plus_b(fc7, fc8W, fc8b) """ # Move everything into depth so we can perform a single matrix multiplication. fc7 = tf.reshape(fc7, [self.batch_size, -1]) dim = fc7.get_shape()[1].value #print "fc7 dim:\n" #print fc7.get_shape(), dim fc8W = tf.Variable(tf.random_normal([dim, 6], mean=0.0, stddev=0.01), trainable=True, name='W') fc8b = tf.Variable(tf.zeros([6]), trainable=True, name='baises') self.fc8b = fc8b theta = tf.nn.xw_plus_b(fc7, fc8W, fc8b) """ weights = self._variable_with_weight_decay('weights', shape=[dim, 6], stddev=0.04, wd=None) #wd=0.004) biases = self._variable_on_cpu('biases', [6], tf.constant_initializer(0.1)) theta = tf.matmul(reshape, weights) + biases print theta.get_shape() """ self.theta = theta self.fc8W = fc8W self.fc8b = fc8b # %% We'll create a spatial transformer module to identify discriminative # %% patches #h_trans = self._transform(theta, x, out_size, channel_x) #print h_trans.get_shape() return theta def _variable_with_weight_decay(self, name, shape, stddev, wd): """Helper to create an initialized Variable with weight decay. Note that the Variable is initialized with a truncated normal distribution. A weight decay is added only if one is specified. Args: name: name of the variable shape: list of ints stddev: standard deviation of a truncated Gaussian wd: add L2Loss weight decay multiplied by this float. If None, weight decay is not added for this Variable. Returns: Variable Tensor """ dtype = tf.float32 #if FLAGS.use_fp16 else tf.float32 var = self._variable_on_cpu( name, shape, tf.truncated_normal_initializer(stddev=stddev, dtype=dtype)) if wd is not None: weight_decay = tf.mul(tf.nn.l2_loss(var), wd, name='weight_loss') tf.add_to_collection('losses', weight_decay) return var def _variable_on_cpu(self, name, shape, initializer): """Helper to create a Variable stored on CPU memory. Args: name: name of the variable shape: list of ints initializer: initializer for Variable Returns: Variable Tensor """ with tf.device('/cpu:0'): dtype = tf.float32 # if FLAGS.use_fp16 else tf.float32 var = tf.get_variable(name, shape, initializer=initializer, dtype=dtype) return var def _build_train_op(self): """Build training specific ops for the graph.""" #self.lrn_rate = tf.constant(self.hps.lrn_rate, tf.float32) #tf.scalar_summary('learning rate', self.lrn_rate) """ trainable_variables = tf.trainable_variables() grads = tf.gradients(self.cost, trainable_variables) """ if self.optimizer == 'sgd': optimizer = tf.train.GradientDescentOptimizer(self.lrn_rate) elif self.optimizer == 'Adam': optimizer = tf.train.AdamOptimizer(0.001 * self.lr_rate_fac) elif self.optimizer == 'mom': optimizer = tf.train.MomentumOptimizer(self.lrn_rate, 0.9) """ apply_op = optimizer.apply_gradients( zip(grads, trainable_variables), global_step=self.global_step, name='train_step') train_ops = [apply_op] + self._extra_train_ops self.train_op = tf.group(*train_ops) """ self.train_op = optimizer.minimize(self.cost) # TODO(xpan): Consider batch_norm in contrib/layers/python/layers/layers.py def _batch_norm(self, name, x): """Batch normalization.""" with tf.variable_scope(name): params_shape = [x.get_shape()[-1]] #print x.get_shape(), params_shape beta = tf.get_variable( 'beta', params_shape, tf.float32, initializer=tf.constant_initializer(0.0, tf.float32)) gamma = tf.get_variable( 'gamma', params_shape, tf.float32, initializer=tf.constant_initializer(1.0, tf.float32)) if self.mode == 'train': mean, variance = tf.nn.moments(x, [0, 1, 2], name='moments') moving_mean = tf.get_variable( 'moving_mean', params_shape, tf.float32, initializer=tf.constant_initializer(0.0, tf.float32), trainable=False) moving_variance = tf.get_variable( 'moving_variance', params_shape, tf.float32, initializer=tf.constant_initializer(1.0, tf.float32), trainable=False) self._extra_train_ops.append(moving_averages.assign_moving_average( moving_mean, mean, 0.9)) self._extra_train_ops.append(moving_averages.assign_moving_average( moving_variance, variance, 0.9)) else: mean = tf.get_variable( 'moving_mean', params_shape, tf.float32, initializer=tf.constant_initializer(0.0, tf.float32), trainable=False) variance = tf.get_variable( 'moving_variance', params_shape, tf.float32, initializer=tf.constant_initializer(1.0, tf.float32), trainable=False) tf.histogram_summary(mean.op.name, mean) tf.histogram_summary(variance.op.name, variance) # elipson used to be 1e-5. Maybe 0.001 solves NaN problem in deeper net. y = tf.nn.batch_normalization( x, mean, variance, beta, gamma, 0.001) y.set_shape(x.get_shape()) return y def _residual(self, x, in_filter, out_filter, stride, activate_before_residual=False): """Residual unit with 2 sub layers.""" if activate_before_residual: with tf.variable_scope('shared_activation'): x = self._batch_norm('init_bn', x) x = self._relu(x, self.hps.relu_leakiness) orig_x = x else: with tf.variable_scope('residual_only_activation'): orig_x = x x = self._batch_norm('init_bn', x) x = self._relu(x, self.hps.relu_leakiness) with tf.variable_scope('sub1'): x = self._conv('conv1', x, 3, in_filter, out_filter, stride) with tf.variable_scope('sub2'): x = self._batch_norm('bn2', x) x = self._relu(x, self.hps.relu_leakiness) x = self._conv('conv2', x, 3, out_filter, out_filter, [1, 1, 1, 1]) with tf.variable_scope('sub_add'): if in_filter != out_filter: orig_x = tf.nn.avg_pool(orig_x, stride, stride, 'VALID') orig_x = tf.pad( orig_x, [[0, 0], [0, 0], [0, 0], [(out_filter-in_filter)//2, (out_filter-in_filter)//2]]) x += orig_x tf.logging.info('image after unit %s', x.get_shape()) return x def _bottleneck_residual(self, x, in_filter, out_filter, stride, activate_before_residual=False): """Bottleneck resisual unit with 3 sub layers.""" if activate_before_residual: with tf.variable_scope('common_bn_relu'): x = self._batch_norm('init_bn', x) x = self._relu(x, self.hps.relu_leakiness) orig_x = x else: with tf.variable_scope('residual_bn_relu'): orig_x = x x = self._batch_norm('init_bn', x) x = self._relu(x, self.hps.relu_leakiness) with tf.variable_scope('sub1'): x = self._conv('conv1', x, 1, in_filter, out_filter/4, stride) with tf.variable_scope('sub2'): x = self._batch_norm('bn2', x) x = self._relu(x, self.hps.relu_leakiness) x = self._conv('conv2', x, 3, out_filter/4, out_filter/4, [1, 1, 1, 1]) with tf.variable_scope('sub3'): x = self._batch_norm('bn3', x) x = self._relu(x, self.hps.relu_leakiness) x = self._conv('conv3', x, 1, out_filter/4, out_filter, [1, 1, 1, 1]) with tf.variable_scope('sub_add'): if in_filter != out_filter: orig_x = self._conv('project', orig_x, 1, in_filter, out_filter, stride) x += orig_x tf.logging.info('image after unit %s', x.get_shape()) return x def _decay(self): """L2 weight decay loss.""" costs = [] for var in tf.trainable_variables(): if var.op.name.find(r'DW') > 0: costs.append(tf.nn.l2_loss(var)) aaa = tf.nn.l2_loss(var) #print aaa # tf.histogram_summary(var.op.name, var) return tf.mul(self.hps.weight_decay_rate, tf.add_n(costs)) def _conv(self, name, x, filter_size, in_filters, out_filters, strides): """Convolution.""" with tf.variable_scope(name): n = filter_size * filter_size * out_filters kernel = tf.get_variable( 'DW', [filter_size, filter_size, in_filters, out_filters], tf.float32, initializer=tf.random_normal_initializer( stddev=np.sqrt(2.0/n))) return tf.nn.conv2d(x, kernel, strides, padding='SAME') def _relu(self, x, leakiness=0.0): """Relu, with optional leaky support.""" return tf.select(tf.less(x, 0.0), leakiness * x, x, name='leaky_relu') def _fully_connected(self, x, out_dim): """FullyConnected layer for final output.""" x = tf.reshape(x, [self.hps.batch_size, -1]) #print "*** ", x.get_shape() w = tf.get_variable( 'DW', [x.get_shape()[1], out_dim], initializer=tf.uniform_unit_scaling_initializer(factor=1.0)) #print "*** ", w.get_shape() b = tf.get_variable('biases', [out_dim], initializer=tf.constant_initializer()) #print "*** ", b.get_shape() aaa = tf.nn.xw_plus_b(x, w, b) #print "*** ", aaa.get_shape() return tf.nn.xw_plus_b(x, w, b) def _fully_connected_ST(self, x, out_dim): """FullyConnected layer for final output of the localization network in the spatial transformer""" x = tf.reshape(x, [self.hps.batch_size, -1]) w = tf.get_variable( 'DW2', [x.get_shape()[1], out_dim], initializer=tf.uniform_unit_scaling_initializer(factor=1.0)) initial = np.array([[1., 0, 0], [0, 1., 0]]) initial = initial.astype('float32') initial = initial.flatten() b = tf.get_variable('biases2', [out_dim], initializer=tf.constant_initializer(initial)) return tf.nn.xw_plus_b(x, w, b) def _global_avg_pool(self, x): assert x.get_shape().ndims == 4 return tf.reduce_mean(x, [1, 2]) def _repeat(self, x, n_repeats): with tf.variable_scope('_repeat'): rep = tf.transpose( tf.expand_dims(tf.ones(shape=tf.pack([n_repeats, ])), 1), [1, 0]) rep = tf.cast(rep, 'int32') x = tf.matmul(tf.reshape(x, (-1, 1)), rep) return tf.reshape(x, [-1]) def _interpolate(self, im, x, y, out_size, channel_x): with tf.variable_scope('_interpolate2'): # constants num_batch = self.hps.batch_size #tf.shape(im)[0] print num_batch height = tf.shape(im)[1] width = tf.shape(im)[2] channels = tf.shape(im)[3] print channels #channels = tf.cast(channels, tf.int32) #print channels x = tf.cast(x, 'float32') y = tf.cast(y, 'float32') height_f = tf.cast(height, 'float32') width_f = tf.cast(width, 'float32') out_height = out_size[0] out_width = out_size[1] zero = tf.zeros([], dtype='int32') #max_y = tf.cast(tf.shape(im)[1] - 1, 'int32') #max_x = tf.cast(tf.shape(im)[2] - 1, 'int32') max_y = tf.cast(height - 1, 'int32') max_x = tf.cast(width - 1, 'int32') # scale indices from [-1, 1] to [0, width/height] x = (x + 1.0)*(width_f) / 2.0 y = (y + 1.0)*(height_f) / 2.0 # do sampling x0 = tf.cast(tf.floor(x), 'int32') x1 = x0 + 1 y0 = tf.cast(tf.floor(y), 'int32') y1 = y0 + 1 x0 = tf.clip_by_value(x0, zero, max_x) x1 = tf.clip_by_value(x1, zero, max_x) y0 = tf.clip_by_value(y0, zero, max_y) y1 = tf.clip_by_value(y1, zero, max_y) dim2 = width dim1 = width*height base = self._repeat(tf.range(num_batch)*dim1, out_height*out_width) base_y0 = base + y0*dim2 base_y1 = base + y1*dim2 idx_a = base_y0 + x0 idx_b = base_y1 + x0 idx_c = base_y0 + x1 idx_d = base_y1 + x1 # use indices to lookup pixels in the flat image and restore # channels dim im_flat = tf.reshape(im, tf.pack([-1, channel_x])) #aa = tf.pack([-1, channels]) #im_flat = tf.reshape(im, [-1, channels]) #print im.get_shape(), im_flat.get_shape() #, aa.get_shape() im_flat = tf.cast(im_flat, 'float32') Ia = tf.gather(im_flat, idx_a) Ib = tf.gather(im_flat, idx_b) Ic = tf.gather(im_flat, idx_c) Id = tf.gather(im_flat, idx_d) #print im_flat.get_shape(), idx_a.get_shape() #print Ia.get_shape(), Ib.get_shape(), Ic.get_shape(), Id.get_shape() # and finally calculate interpolated values x0_f = tf.cast(x0, 'float32') x1_f = tf.cast(x1, 'float32') y0_f = tf.cast(y0, 'float32') y1_f = tf.cast(y1, 'float32') wa = tf.expand_dims(((x1_f-x) * (y1_f-y)), 1) wb = tf.expand_dims(((x1_f-x) * (y-y0_f)), 1) wc = tf.expand_dims(((x-x0_f) * (y1_f-y)), 1) wd = tf.expand_dims(((x-x0_f) * (y-y0_f)), 1) #print wa.get_shape(), wb.get_shape(), wc.get_shape(), wd.get_shape() output = tf.add_n([wa*Ia, wb*Ib, wc*Ic, wd*Id]) #print output.get_shape() return output def _meshgrid(self, height, width): with tf.variable_scope('_meshgrid'): # This should be equivalent to: # x_t, y_t = np.meshgrid(np.linspace(-1, 1, width), # np.linspace(-1, 1, height)) # ones = np.ones(np.prod(x_t.shape)) # grid = np.vstack([x_t.flatten(), y_t.flatten(), ones]) x_t = tf.matmul(tf.ones(shape=tf.pack([height, 1])), tf.transpose(tf.expand_dims(tf.linspace(-1.0, 1.0, width), 1), [1, 0])) y_t = tf.matmul(tf.expand_dims(tf.linspace(-1.0, 1.0, height), 1), tf.ones(shape=tf.pack([1, width]))) x_t_flat = tf.reshape(x_t, (1, -1)) y_t_flat = tf.reshape(y_t, (1, -1)) ones = tf.ones_like(x_t_flat) grid = tf.concat(0, [x_t_flat, y_t_flat, ones]) return grid def _transform(self, theta, input_dim, out_size, channel_input): with tf.variable_scope('_transform'): print input_dim.get_shape(), theta.get_shape(), out_size[0], out_size[1] num_batch = self.hps.batch_size #tf.shape(input_dim)[0] height = tf.shape(input_dim)[1] width = tf.shape(input_dim)[2] num_channels = tf.shape(input_dim)[3] theta = tf.reshape(theta, (-1, 2, 3)) theta = tf.cast(theta, 'float32') # grid of (x_t, y_t, 1), eq (1) in ref [1] height_f = tf.cast(height, 'float32') width_f = tf.cast(width, 'float32') out_height = out_size[0] out_width = out_size[1] grid = self._meshgrid(out_height, out_width) #print grid, grid.get_shape() grid = tf.expand_dims(grid, 0) grid = tf.reshape(grid, [-1]) grid = tf.tile(grid, tf.pack([num_batch])) grid = tf.reshape(grid, tf.pack([num_batch, 3, -1])) #print grid, grid.get_shape() # Transform A x (x_t, y_t, 1)^T -> (x_s, y_s) T_g = tf.batch_matmul(theta, grid) x_s = tf.slice(T_g, [0, 0, 0], [-1, 1, -1]) y_s = tf.slice(T_g, [0, 1, 0], [-1, 1, -1]) x_s_flat = tf.reshape(x_s, [-1]) y_s_flat = tf.reshape(y_s, [-1]) #print x_s_flat.get_shape(), y_s_flat.get_shape() input_transformed = self._interpolate(input_dim, x_s_flat, y_s_flat, out_size, channel_input) #print input_transformed.get_shape() output = tf.reshape(input_transformed, tf.pack([num_batch, out_height, out_width, channel_input])) return output #return input_dim ================================================ FILE: pose_utils.py ================================================ import sys import os #sys.path.append('+glaive_pylib+') #import JanusUtils import numpy as np import cv2 import math import fileinput import shutil def increaseBbox(bbox, factor): tlx = bbox[0] tly = bbox[1] brx = bbox[2] bry = bbox[3] dx = factor dy = factor dw = 1 + factor dh = 1 + factor #Getting bbox height and width w = brx-tlx; h = bry-tly; tlx2 = tlx - w * dx tly2 = tly - h * dy brx2 = tlx + w * dw bry2 = tly + h * dh nbbox = np.zeros( (4,1), dtype=np.float32 ) nbbox[0] = tlx2 nbbox[1] = tly2 nbbox[2] = brx2 nbbox[3] = bry2 return nbbox def image_bbox_processing_v2(img, bbox): img_h, img_w, img_c = img.shape lt_x = bbox[0] lt_y = bbox[1] rb_x = bbox[2] rb_y = bbox[3] fillings = np.zeros( (4,1), dtype=np.int32) if lt_x < 0: ## 0 for python fillings[0] = math.ceil(-lt_x) if lt_y < 0: fillings[1] = math.ceil(-lt_y) if rb_x > img_w-1: fillings[2] = math.ceil(rb_x - img_w + 1) if rb_y > img_h-1: fillings[3] = math.ceil(rb_y - img_h + 1) new_bbox = np.zeros( (4,1), dtype=np.float32 ) # img = [zeros(size(img,1),fillings(1),img_c), img] # img = [zeros(fillings(2), size(img,2),img_c); img] # img = [img, zeros(size(img,1), fillings(3),img_c)] # new_img = [img; zeros(fillings(4), size(img,2),img_c)] imgc = img.copy() if fillings[0] > 0: img_h, img_w, img_c = imgc.shape imgc = np.hstack( [np.zeros( (img_h, fillings[0][0], img_c), dtype=np.uint8 ), imgc] ) if fillings[1] > 0: img_h, img_w, img_c = imgc.shape imgc = np.vstack( [np.zeros( (fillings[1][0], img_w, img_c), dtype=np.uint8 ), imgc] ) if fillings[2] > 0: img_h, img_w, img_c = imgc.shape imgc = np.hstack( [ imgc, np.zeros( (img_h, fillings[2][0], img_c), dtype=np.uint8 ) ] ) if fillings[3] > 0: img_h, img_w, img_c = imgc.shape imgc = np.vstack( [ imgc, np.zeros( (fillings[3][0], img_w, img_c), dtype=np.uint8) ] ) new_bbox[0] = lt_x + fillings[0] new_bbox[1] = lt_y + fillings[1] new_bbox[2] = rb_x + fillings[0] new_bbox[3] = rb_y + fillings[1] return imgc, new_bbox def preProcessImage(_savingDir, data_dict, data_root, factor, _alexNetSize, _listFile): #### Formatting the images as needed file_output = _listFile count = 1 fileIn = open(file_output , 'w' ) for key in data_dict.keys(): filename = data_dict[key]['file'] im = cv2.imread(data_root + filename) if im is not None: print 'Processing ' + filename + ' '+ str(count) sys.stdout.flush() lt_x = data_dict[key]['x'] lt_y = data_dict[key]['y'] rb_x = lt_x + data_dict[key]['width'] rb_y = lt_y + data_dict[key]['height'] w = data_dict[key]['width'] h = data_dict[key]['height'] center = ( (lt_x+rb_x)/2, (lt_y+rb_y)/2 ) side_length = max(w,h); bbox = np.zeros( (4,1), dtype=np.float32 ) bbox[0] = center[0] - side_length/2 bbox[1] = center[1] - side_length/2 bbox[2] = center[0] + side_length/2 bbox[3] = center[1] + side_length/2 #img_2, bbox_green = image_bbox_processing_v2(im, bbox) #%% Get the expanded square bbox bbox_red = increaseBbox(bbox, factor) #[img, bbox_red] = image_bbox_processing_v2(img, bbox_red); img_3, bbox_new = image_bbox_processing_v2(im, bbox_red) #%% Crop and resized #bbox_red = ceil(bbox_red); bbox_new = np.ceil( bbox_new ) #side_length = max(bbox_new(3) - bbox_new(1), bbox_new(4) - bbox_new(2)); side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] ) bbox_new[2:4] = bbox_new[0:2] + side_length #crop_img = img(bbox_red(2):bbox_red(4), bbox_red(1):bbox_red(3), :); #resized_crop_img = imresize(crop_img, [227, 227]);# % re-scaling to 227 x 227 bbox_new = bbox_new.astype(int) crop_img = img_3[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :]; resized_crop_img = cv2.resize(crop_img, ( _alexNetSize, _alexNetSize ), interpolation = cv2.INTER_CUBIC) cv2.imwrite(_savingDir + key + '.jpg', resized_crop_img ) # flip image for latter use img_flip = cv2.flip(resized_crop_img,1) cv2.imwrite(_savingDir + key + '_flip.jpg', img_flip ) ## Tracking pose image fileIn.write(key + ',') fileIn.write(_savingDir + key + '.jpg\n') fileIn.write(key + '_flip,') fileIn.write(_savingDir + key + '_flip.jpg\n') else: print ' '.join(['Skipping image:', filename, 'Image is None', str(count)]) count+=1 fileIn.close() def replaceInFile(filep, before, after): for line in fileinput.input(filep, inplace=True): print line.replace(before,after), ================================================ FILE: renderer_fpn.py ================================================ import csv import lmdb import sys import numpy as np import cv2 import os this_path = os.path.dirname(os.path.abspath(__file__)) render_path = this_path+'/face_renderer/' sys.path.append(render_path) try: import myutil except ImportError as ie: print '****************************************************************' print '**** Have you forgotten to "git clone --recursive"? ****' print '**** You have to do that to also download the face renderer ****' print '****************************************************************' print ie.message exit(0) import config opts = config.parse() import camera_calibration as calib import ThreeD_Model import renderer as renderer_core import get_Rts as getRts #pose_models = ['model3D_aug_-00_00','model3D_aug_-22_00','model3D_aug_-40_00','model3D_aug_-55_00','model3D_aug_-75_00'] newModels = opts.getboolean('renderer', 'newRenderedViews') if opts.getboolean('renderer', 'newRenderedViews'): pose_models_folder = '/models3d_new/' pose_models = ['model3D_aug_-00_00','model3D_aug_-22_00','model3D_aug_-40_00','model3D_aug_-55_00','model3D_aug_-75_00'] else: pose_models_folder = '/models3d/' pose_models = ['model3D_aug_-00','model3D_aug_-40','model3D_aug_-75',] nSub = 10 allModels = myutil.preload(render_path,pose_models_folder,pose_models,nSub) def render_fpn(inputFile, output_pose_db, outputFolder): ## Opening FPN pose db pose_env = lmdb.open( output_pose_db, readonly=True ) pose_cnn_lmdb = pose_env.begin() ## looping over images with open(inputFile, 'r') as csvfile: csvreader = csv.reader(csvfile, delimiter=',') lines = csvfile.readlines() for lin in lines: ### key1, image_path_key_1 image_key = lin.split(',')[0] if 'flip' in image_key: continue image_path = lin.split(',')[-1].rstrip('\n') img = cv2.imread(image_path, 1) pose_Rt_raw = pose_cnn_lmdb.get( image_key ) pose_Rt_flip_raw = pose_cnn_lmdb.get(image_key + '_flip') if pose_Rt_raw is not None: pose_Rt = np.frombuffer( pose_Rt_raw, np.float32 ) pose_Rt_flip = np.frombuffer( pose_Rt_flip_raw, np.float32 ) yaw = myutil.decideSide_from_db(img, pose_Rt, allModels) if yaw < 0: # Flip image and get the corresponsidng pose img = cv2.flip(img,1) pose_Rt = pose_Rt_flip listPose = myutil.decidePose(yaw, opts, newModels) ## Looping over the poses for poseId in listPose: posee = pose_models[poseId] ## Looping over the subjects for subj in [10]: pose = posee + '_' + str(subj).zfill(2) +'.mat' print '> Looking at file: ' + image_path + ' with ' + pose # load detections performed by dlib library on 3D model and Reference Image print "> Using pose model in " + pose ## Indexing the right model instead of loading it each time from memory. model3D = allModels[pose] eyemask = model3D.eyemask # perform camera calibration according to the first face detected proj_matrix, camera_matrix, rmat, tvec = calib.estimate_camera(model3D, pose_Rt, pose_db_on=True) ## We use eyemask only for frontal if not myutil.isFrontal(pose): eyemask = None ##### Main part of the code: doing the rendering ############# rendered_raw, rendered_sym, face_proj, background_proj, temp_proj2_out_2, sym_weight = renderer_core.render(img, proj_matrix,\ model3D.ref_U, eyemask, model3D.facemask, opts) ######################################################## if myutil.isFrontal(pose): rendered_raw = rendered_sym ## Cropping if required by crop_models #rendered_raw = myutil.cropFunc(pose,rendered_raw,crop_models[poseId]) ## Resizing if required #if resizeCNN: # rendered_raw = cv2.resize(rendered_raw, ( cnnSize, cnnSize ), interpolation=cv2.INTER_CUBIC ) ## Saving if required if opts.getboolean('general', 'saveON'): subjFolder = outputFolder + '/'+ image_key.split('_')[0] myutil.mymkdir(subjFolder) savingString = subjFolder + '/' + image_key +'_rendered_'+ pose[8:-7]+'_'+str(subj).zfill(2)+'.jpg' cv2.imwrite(savingString,rendered_raw) ================================================ FILE: tf_utils.py ================================================ # Copyright 2016 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== # %% Borrowed utils from here: https://github.com/pkmital/tensorflow_tutorials/ #import tensorflow as tf import numpy as np import csv def conv2d(x, n_filters, k_h=5, k_w=5, stride_h=2, stride_w=2, stddev=0.02, activation=lambda x: x, bias=True, padding='SAME', name="Conv2D"): """2D Convolution with options for kernel size, stride, and init deviation. Parameters ---------- x : Tensor Input tensor to convolve. n_filters : int Number of filters to apply. k_h : int, optional Kernel height. k_w : int, optional Kernel width. stride_h : int, optional Stride in rows. stride_w : int, optional Stride in cols. stddev : float, optional Initialization's standard deviation. activation : arguments, optional Function which applies a nonlinearity padding : str, optional 'SAME' or 'VALID' name : str, optional Variable scope to use. Returns ------- x : Tensor Convolved input. """ with tf.variable_scope(name): w = tf.get_variable( 'w', [k_h, k_w, x.get_shape()[-1], n_filters], initializer=tf.truncated_normal_initializer(stddev=stddev)) conv = tf.nn.conv2d( x, w, strides=[1, stride_h, stride_w, 1], padding=padding) if bias: b = tf.get_variable( 'b', [n_filters], initializer=tf.truncated_normal_initializer(stddev=stddev)) conv = conv + b return conv def linear(x, n_units, scope=None, stddev=0.02, activation=lambda x: x): """Fully-connected network. Parameters ---------- x : Tensor Input tensor to the network. n_units : int Number of units to connect to. scope : str, optional Variable scope to use. stddev : float, optional Initialization's standard deviation. activation : arguments, optional Function which applies a nonlinearity Returns ------- x : Tensor Fully-connected output. """ shape = x.get_shape().as_list() with tf.variable_scope(scope or "Linear"): matrix = tf.get_variable("Matrix", [shape[1], n_units], tf.float32, tf.random_normal_initializer(stddev=stddev)) return activation(tf.matmul(x, matrix)) # %% def weight_variable(shape): '''Helper function to create a weight variable initialized with a normal distribution Parameters ---------- shape : list Size of weight variable ''' #initial = tf.random_normal(shape, mean=0.0, stddev=0.01) initial = tf.zeros(shape) return tf.Variable(initial) # %% def bias_variable(shape): '''Helper function to create a bias variable initialized with a constant value. Parameters ---------- shape : list Size of weight variable ''' initial = tf.random_normal(shape, mean=0.0, stddev=0.01) return tf.Variable(initial) # %% def dense_to_one_hot(labels, n_classes=2): """Convert class labels from scalars to one-hot vectors.""" labels = np.array(labels).astype('int32') n_labels = labels.shape[0] index_offset = (np.arange(n_labels) * n_classes).astype('int32') labels_one_hot = np.zeros((n_labels, n_classes), dtype=np.float32) labels_one_hot.flat[index_offset + labels.ravel()] = 1 return labels_one_hot def prepare_trainVal_img_list(img_list, num_subjs): #num_imgs_per_subj =np.zeros([num_subjs]) id_label_list = [] for row in img_list: id_label = int(row[8]) #num_imgs_per_subj[id_label] += 1 id_label_list.append(id_label) id_label_list = np.asarray(id_label_list) id_label_list = np.reshape(id_label_list, [-1]) train_indices_list = [] valid_indices_list= [] eval_train_indices_list = [] eval_valid_indices_list = [] for i in range(num_subjs): print i curr_subj_idx = np.nonzero(id_label_list == i)[0] tmp = np.random.permutation(curr_subj_idx) per80 = np.floor(len(curr_subj_idx) * 0.8) t_inds = tmp[0:per80] v_inds = tmp[per80:] train_indices_list.append(t_inds) valid_indices_list.append(v_inds) eval_train_indices_list.append(t_inds[0]) eval_valid_indices_list.append(v_inds[0]) train_indices_list = np.asarray(train_indices_list) valid_indices_list = np.asarray(valid_indices_list) eval_train_indices_list = np.asarray(eval_train_indices_list) eval_valid_indices_list = np.asarray(eval_valid_indices_list) #print train_indices_list, train_indices_list.shape train_indices_list = np.hstack(train_indices_list).astype('int') valid_indices_list = np.hstack(valid_indices_list).astype('int') eval_train_indices_list = np.hstack(eval_train_indices_list).astype('int') eval_valid_indices_list = np.hstack(eval_valid_indices_list).astype('int') print train_indices_list.shape, valid_indices_list.shape, eval_train_indices_list.shape, eval_valid_indices_list.shape img_list = np.asarray(img_list) print img_list.shape train_list = img_list[train_indices_list] valid_list = img_list[valid_indices_list] eval_train_list = img_list[eval_train_indices_list] eval_valid_list = img_list[eval_valid_indices_list] np.savez("Oxford_trainVal_data_3DSTN.npz", train_list=train_list, valid_list=valid_list, eval_train_list=eval_train_list, eval_valid_list=eval_valid_list) def select_eval_img_list(img_list, num_subjs, save_file_name): # number of validation subjects id_label_list = [] for row in img_list: id_label = int(row[8]) id_label_list.append(id_label) id_label_list = np.asarray(id_label_list) id_label_list = np.reshape(id_label_list, [-1]) eval_indices_list = [] for i in range(num_subjs): print i curr_subj_idx = np.nonzero(id_label_list == i)[0] tmp = np.random.permutation(curr_subj_idx) inds = tmp[0:min(5, len(curr_subj_idx))] eval_indices_list.append(inds) eval_indices_list = np.asarray(eval_indices_list) eval_indices_list = np.hstack(eval_indices_list).astype('int') print eval_indices_list.shape img_list = np.asarray(img_list) print img_list.shape eval_list = img_list[eval_indices_list] np.savez(save_file_name, eval_list=eval_list) """ # Record the number of images per subject num_imgs_per_subj =np.zeros([num_subjs]) for row in valid_img_list: id_label = int(row[8]) num_imgs_per_subj[id_label] += 1 hist_subj = np.zeros([num_subjs]) idx = 0 count = 0 for row in valid_img_list: count += 1 print count image_key = row[0] image_path = row[1] id_label = int(row[8]) if idx >= num_subjs: break if hist_subj[idx] < min(1, num_imgs_per_subj[idx]): if id_label == idx: with open(save_file_name, "a") as f: f.write(image_key + "," + image_path + "," + row[2] + "," + row[3] + "," + row[4] + "," + row[5] + "," + row[6] + "," + row[7] + "," + str(id_label) + "\n") hist_subj[idx] += 1 else: idx += 1 """ def input_processing(images, pose_labels, id_labels, train_mean_vec, mean_labels, std_labels, num_imgs, image_size, num_classes): images = images.reshape([num_imgs, image_size, image_size, 3]) pose_labels = pose_labels.reshape([num_imgs, 6]) id_labels = id_labels.reshape([num_imgs, 1]) id_labels = dense_to_one_hot(id_labels, num_classes) # Subtract train image mean images = images / 255. train_mean_mat = train_mean_vec2mat(train_mean_vec, images) normalized_images = images - train_mean_mat # Normalize labels normalized_pose_labels = (pose_labels - mean_labels) / (std_labels + 0.000000000000000001) return normalized_images, normalized_pose_labels, id_labels def train_mean_vec2mat(train_mean, images_array): height = images_array.shape[1] width = images_array.shape[2] #batch = images_array.shape[0] train_mean_R = np.matlib.repmat(train_mean[0],height,width) train_mean_G = np.matlib.repmat(train_mean[1],height,width) train_mean_B = np.matlib.repmat(train_mean[2],height,width) R = np.reshape(train_mean_R, (height,width,1)) G = np.reshape(train_mean_G, (height,width,1)) B = np.reshape(train_mean_B, (height,width,1)) train_mean_image = np.append(R, G, axis=2) train_mean_image = np.append(train_mean_image, B, axis=2) return train_mean_image def create_file_list(csv_file_path): with open(csv_file_path, 'r') as csvfile: csvreader = csv.reader(csvfile, delimiter=',') csv_list = list(csvreader) return csv_list ================================================ FILE: train_stats/README ================================================ Here are the precomputed training data statistics ================================================ FILE: utils/README ================================================ Some utility functions are here ================================================ FILE: utils/pose_utils.py ================================================ import sys import os import numpy as np import cv2 import math from math import cos, sin, atan2, asin import fileinput ## Index to remap landmarks in case we flip an image repLand = [ 17,16,15,14,13,12,11,10, 9,8,7,6,5,4,3,2,1,27,26,25, \ 24,23,22,21,20,19,18,28,29,30,31,36,35,34,33,32,46,45,44,43, \ 48,47,40,39,38,37,42,41,55,54,53,52,51,50,49,60,59,58,57,56, \ 65,64,63,62,61,68,67,66 ] def increaseBbox(bbox, factor): tlx = bbox[0] tly = bbox[1] brx = bbox[2] bry = bbox[3] dx = factor dy = factor dw = 1 + factor dh = 1 + factor #Getting bbox height and width w = brx-tlx; h = bry-tly; tlx2 = tlx - w * dx tly2 = tly - h * dy brx2 = tlx + w * dw bry2 = tly + h * dh nbbox = np.zeros( (4,1), dtype=np.float32 ) nbbox[0] = tlx2 nbbox[1] = tly2 nbbox[2] = brx2 nbbox[3] = bry2 return nbbox def increaseBbox_rescaleCASIA(bbox, factor): tlx = bbox[0] tly = bbox[1] brx = bbox[2] bry = bbox[3] ww = brx - tlx; hh = bry - tly; cx = tlx + ww/2; cy = tly + hh/2; tsize = max(ww,hh)/2; bl = cx - factor[0]*tsize; bt = cy - factor[1]*tsize; br = cx + factor[2]*tsize; bb = cy + factor[3]*tsize; nbbox = np.zeros( (4,1), dtype=np.float32 ) nbbox[0] = bl; nbbox[1] = bt; nbbox[2] = br; nbbox[3] = bb; return nbbox def increaseBbox_rescaleYOLO(bbox, im): rescaleFrontal = [1.4421, 2.2853, 1.4421, 1.4286]; rescaleCS2 = [0.9775, 1.5074, 0.9563, 0.9436]; l = bbox[0] t = bbox[1] ww = bbox[2] hh = bbox[3] # Approximate LM tight BB h = im.shape[0]; w = im.shape[1]; cx = l + ww/2; cy = t + hh/2; tsize = max(ww,hh)/2; l = cx - tsize; t = cy - tsize; cx = l + (2*tsize)/(rescaleCS2[0]+rescaleCS2[2]) * rescaleCS2[0]; cy = t + (2*tsize)/(rescaleCS2[1]+rescaleCS2[3]) * rescaleCS2[1]; tsize = 2*tsize/(rescaleCS2[0]+rescaleCS2[2]); """ # Approximate inplane align (frontal) nbbox = np.zeros( (4,1), dtype=np.float32 ) nbbox[0] = cx - rescaleFrontal[0]*tsize; nbbox[1] = cy - rescaleFrontal[1]*tsize; nbbox[2] = cx + rescaleFrontal[2]*tsize; nbbox[3] = cy + rescaleFrontal[3]*tsize; """ nbbox = np.zeros( (4,1), dtype=np.float32 ) nbbox[0] = cx - tsize; nbbox[1] = cy - tsize; nbbox[2] = cx + tsize; nbbox[3] = cy + tsize; return nbbox def image_bbox_processing_v2(img, bbox, landmarks=None): img_h, img_w, img_c = img.shape lt_x = bbox[0] lt_y = bbox[1] rb_x = bbox[2] rb_y = bbox[3] fillings = np.zeros( (4,1), dtype=np.int32) if lt_x < 0: ## 0 for python fillings[0] = math.ceil(-lt_x) if lt_y < 0: fillings[1] = math.ceil(-lt_y) if rb_x > img_w-1: fillings[2] = math.ceil(rb_x - img_w + 1) if rb_y > img_h-1: fillings[3] = math.ceil(rb_y - img_h + 1) new_bbox = np.zeros( (4,1), dtype=np.float32 ) # img = [zeros(size(img,1),fillings(1),img_c), img] # img = [zeros(fillings(2), size(img,2),img_c); img] # img = [img, zeros(size(img,1), fillings(3),img_c)] # new_img = [img; zeros(fillings(4), size(img,2),img_c)] imgc = img.copy() if fillings[0] > 0: img_h, img_w, img_c = imgc.shape imgc = np.hstack( [np.zeros( (img_h, fillings[0][0], img_c), dtype=np.uint8 ), imgc] ) if fillings[1] > 0: img_h, img_w, img_c = imgc.shape imgc = np.vstack( [np.zeros( (fillings[1][0], img_w, img_c), dtype=np.uint8 ), imgc] ) if fillings[2] > 0: img_h, img_w, img_c = imgc.shape imgc = np.hstack( [ imgc, np.zeros( (img_h, fillings[2][0], img_c), dtype=np.uint8 ) ] ) if fillings[3] > 0: img_h, img_w, img_c = imgc.shape imgc = np.vstack( [ imgc, np.zeros( (fillings[3][0], img_w, img_c), dtype=np.uint8) ] ) new_bbox[0] = lt_x + fillings[0] new_bbox[1] = lt_y + fillings[1] new_bbox[2] = rb_x + fillings[0] new_bbox[3] = rb_y + fillings[1] if len(landmarks) == 0: #len(landmarks) == 0: #landmarks == None: return imgc, new_bbox else: landmarks_new = np.zeros([landmarks.shape[0], landmarks.shape[1]]) #print "landmarks_new's shape: \n" #print landmarks_new.shape landmarks_new[:,0] = landmarks[:,0] + fillings[0] landmarks_new[:,1] = landmarks[:,1] + fillings[1] return imgc, new_bbox, landmarks_new #return imgc, new_bbox def image_bbox_processing_v3(img, bbox): img_h, img_w, img_c = img.shape lt_x = bbox[0] lt_y = bbox[1] rb_x = bbox[2] rb_y = bbox[3] fillings = np.zeros( (4,1), dtype=np.int32) if lt_x < 0: ## 0 for python fillings[0] = math.ceil(-lt_x) if lt_y < 0: fillings[1] = math.ceil(-lt_y) if rb_x > img_w-1: fillings[2] = math.ceil(rb_x - img_w + 1) if rb_y > img_h-1: fillings[3] = math.ceil(rb_y - img_h + 1) new_bbox = np.zeros( (4,1), dtype=np.float32 ) # img = [zeros(size(img,1),fillings(1),img_c), img] # img = [zeros(fillings(2), size(img,2),img_c); img] # img = [img, zeros(size(img,1), fillings(3),img_c)] # new_img = [img; zeros(fillings(4), size(img,2),img_c)] imgc = img.copy() if fillings[0] > 0: img_h, img_w, img_c = imgc.shape imgc = np.hstack( [np.zeros( (img_h, fillings[0][0], img_c), dtype=np.uint8 ), imgc] ) if fillings[1] > 0: img_h, img_w, img_c = imgc.shape imgc = np.vstack( [np.zeros( (fillings[1][0], img_w, img_c), dtype=np.uint8 ), imgc] ) if fillings[2] > 0: img_h, img_w, img_c = imgc.shape imgc = np.hstack( [ imgc, np.zeros( (img_h, fillings[2][0], img_c), dtype=np.uint8 ) ] ) if fillings[3] > 0: img_h, img_w, img_c = imgc.shape imgc = np.vstack( [ imgc, np.zeros( (fillings[3][0], img_w, img_c), dtype=np.uint8) ] ) new_bbox[0] = lt_x + fillings[0] new_bbox[1] = lt_y + fillings[1] new_bbox[2] = rb_x + fillings[0] new_bbox[3] = rb_y + fillings[1] return imgc, new_bbox def preProcessImage(im, lmks, bbox, factor, _alexNetSize, flipped): sys.stdout.flush() if flipped == 1: # flip landmarks and indices if it's flipped imag lmks = flip_lmk_idx(im, lmks) lmks_flip = lmks lt_x = bbox[0] lt_y = bbox[1] rb_x = lt_x + bbox[2] rb_y = lt_y + bbox[3] w = bbox[2] h = bbox[3] center = ( (lt_x+rb_x)/2, (lt_y+rb_y)/2 ) side_length = max(w,h); # make the bbox be square bbox = np.zeros( (4,1), dtype=np.float32 ) bbox[0] = center[0] - side_length/2 bbox[1] = center[1] - side_length/2 bbox[2] = center[0] + side_length/2 bbox[3] = center[1] + side_length/2 img_2, bbox_green = image_bbox_processing_v2(im, bbox) #%% Get the expanded square bbox bbox_red = increaseBbox(bbox_green, factor) bbox_red2 = increaseBbox(bbox, factor) bbox_red2[2] = bbox_red2[2] - bbox_red2[0] bbox_red2[3] = bbox_red2[3] - bbox_red2[1] bbox_red2 = np.reshape(bbox_red2, [4]) img_3, bbox_new, lmks = image_bbox_processing_v2(img_2, bbox_red, lmks) #%% Crop and resized bbox_new = np.ceil( bbox_new ) side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] ) bbox_new[2:4] = bbox_new[0:2] + side_length bbox_new = bbox_new.astype(int) crop_img = img_3[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :]; lmks_new = np.zeros([lmks.shape[0],2]) lmks_new[:,0] = lmks[:,0] - bbox_new[0][0] lmks_new[:,1] = lmks[:,1] - bbox_new[1][0] resized_crop_img = cv2.resize(crop_img, ( _alexNetSize, _alexNetSize ), interpolation = cv2.INTER_CUBIC) old_h, old_w, channels = crop_img.shape lmks_new2 = np.zeros([lmks.shape[0],2]) lmks_new2[:,0] = lmks_new[:,0] * _alexNetSize / old_w lmks_new2[:,1] = lmks_new[:,1] * _alexNetSize / old_h #print _alexNetSize, old_w, old_h return resized_crop_img, lmks_new2, bbox_red2, lmks_flip, side_length, center def resize_crop_rescaleCASIA(im, bbox, lmks, factor): lt_x = bbox[0] lt_y = bbox[1] rb_x = lt_x + bbox[2] rb_y = lt_y + bbox[3] bbox = np.reshape([lt_x, lt_y, rb_x, rb_y], [-1]) # Get the expanded square bbox bbox_red = increaseBbox_rescaleCASIA(bbox, factor) img_3, bbox_new, lmks = image_bbox_processing_v2(im, bbox_red, lmks); lmks_filling = lmks.copy() #%% Crop and resized bbox_new = np.ceil( bbox_new ) side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] ) bbox_new[2:4] = bbox_new[0:2] + side_length #bbox_new[0] = max(0, bbox_new[0]) #bbox_new[1] = max(0, bbox_new[1]) #bbox_new[2] = min(img_3.shape[1]-1, bbox_new[2]) #bbox_new[3] = min(img_3.shape[0]-1, bbox_new[3]) bbox_new = bbox_new.astype(int) crop_img = img_3[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :]; lmks_new = np.zeros([lmks.shape[0],2]) lmks_new[:,0] = lmks[:,0] - bbox_new[0][0] lmks_new[:,1] = lmks[:,1] - bbox_new[1][0] old_h, old_w, channels = crop_img.shape resized_crop_img = cv2.resize(crop_img, ( 224, 224 ), interpolation = cv2.INTER_CUBIC) lmks_new2 = np.zeros([lmks.shape[0],2]) lmks_new2[:,0] = lmks_new[:,0] * 224 / old_w lmks_new2[:,1] = lmks_new[:,1] * 224 / old_h return resized_crop_img, bbox_new, lmks_new2, lmks_filling, old_h, old_w, img_3 def resize_crop_rescaleCASIA_v2(im, bbox, lmks, factor, bbox_type): # Get the expanded square bbox if bbox_type == "casia": lt_x = bbox[0] lt_y = bbox[1] rb_x = lt_x + bbox[2] rb_y = lt_y + bbox[3] bbox = np.reshape([lt_x, lt_y, rb_x, rb_y], [-1]) bbox_red = increaseBbox_rescaleCASIA(bbox, factor) elif bbox_type == "yolo": lt_x = bbox[0] lt_y = bbox[1] rb_x = lt_x + bbox[2] rb_y = lt_y + bbox[3] w = bbox[2] h = bbox[3] center = ( (lt_x+rb_x)/2, (lt_y+rb_y)/2 ) side_length = max(w,h); # make the bbox be square bbox = np.zeros( (4,1), dtype=np.float32 ) bbox[0] = center[0] - side_length/2 bbox[1] = center[1] - side_length/2 bbox[2] = center[0] + side_length/2 bbox[3] = center[1] + side_length/2 img_2, bbox_green = image_bbox_processing_v3(im, bbox) #%% Get the expanded square bbox bbox_red = increaseBbox(bbox_green, factor) img_3, bbox_new, lmks = image_bbox_processing_v2(im, bbox_red, lmks); lmks_filling = lmks.copy() #%% Crop and resized bbox_new = np.ceil( bbox_new ) side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] ) bbox_new[2:4] = bbox_new[0:2] + side_length #bbox_new[0] = max(0, bbox_new[0]) #bbox_new[1] = max(0, bbox_new[1]) #bbox_new[2] = min(img_3.shape[1]-1, bbox_new[2]) #bbox_new[3] = min(img_3.shape[0]-1, bbox_new[3]) bbox_new = bbox_new.astype(int) crop_img = img_3[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :]; lmks_new = np.zeros([lmks.shape[0],2]) lmks_new[:,0] = lmks[:,0] - bbox_new[0][0] lmks_new[:,1] = lmks[:,1] - bbox_new[1][0] old_h, old_w, channels = crop_img.shape resized_crop_img = cv2.resize(crop_img, ( 224, 224 ), interpolation = cv2.INTER_CUBIC) lmks_new2 = np.zeros([lmks.shape[0],2]) lmks_new2[:,0] = lmks_new[:,0] * 224 / old_w lmks_new2[:,1] = lmks_new[:,1] * 224 / old_h return resized_crop_img, bbox_new, lmks_new2, lmks_filling, old_h, old_w, img_3 def resize_crop_AFLW(im, bbox, lmks): lt_x = bbox[0] lt_y = bbox[1] rb_x = lt_x + bbox[2] rb_y = lt_y + bbox[3] bbox = np.reshape([lt_x, lt_y, rb_x, rb_y], [-1]) crop_img = img[bbox[1]:bbox[3], bbox[0]:bbox[2], :]; lmks_new = np.zeros([lmks.shape[0],2]) lmks_new[:,0] = lmks[:,0] - bbox[0] lmks_new[:,1] = lmks[:,1] - bbox[1] old_h, old_w, channels = crop_img.shape resized_crop_img = cv2.resize(crop_img, ( 224, 224 ), interpolation = cv2.INTER_CUBIC) lmks_new2 = np.zeros([lmks.shape[0],2]) lmks_new2[:,0] = lmks_new[:,0] * 224 / old_w lmks_new2[:,1] = lmks_new[:,1] * 224 / old_h bbox_new = np.zeros([4]) bbox_new[0] = bbox[0] * 224 / old_w bbox_new[1] = bbox[1] * 224 / old_h bbox_new[2] = bbox[2] * 224 / old_w bbox_new[3] = bbox[3] * 224 / old_h bbox_new[2] = bbox_new[2] - bbox_new[0] # box width bbox_new[3] = bbox_new[3] - bbox_new[1] # box height return resized_crop_img, bbox_new, lmks_new2 def preProcessImage_v2(im, bbox, factor, _resNetSize, if_cropbyLmks_rescaleCASIA): sys.stdout.flush() if if_cropbyLmks_rescaleCASIA == 0: lt_x = bbox[0] lt_y = bbox[1] rb_x = lt_x + bbox[2] rb_y = lt_y + bbox[3] w = bbox[2] h = bbox[3] center = ( (lt_x+rb_x)/2, (lt_y+rb_y)/2 ) side_length = max(w,h); # make the bbox be square bbox = np.zeros( (4,1), dtype=np.float32 ) bbox[0] = center[0] - side_length/2 bbox[1] = center[1] - side_length/2 bbox[2] = center[0] + side_length/2 bbox[3] = center[1] + side_length/2 img_2, bbox_green = image_bbox_processing_v2(im, bbox) #%% Get the expanded square bbox bbox_red = increaseBbox(bbox_green, factor) img_3, bbox_new = image_bbox_processing_v2(img_2, bbox_red) elif if_cropbyLmks_rescaleCASIA == 1: bbox[2] = bbox[0] + bbox[2] bbox[3] = bbox[1] + bbox[3] bbox_red = increaseBbox_rescaleCASIA(bbox, factor) #print bbox_red img_3, bbox_new = image_bbox_processing_v3(im, bbox_red) else: bbox2 = increaseBbox_rescaleYOLO(bbox, im) bbox_red = increaseBbox_rescaleCASIA(bbox2, factor) img_3, bbox_new = image_bbox_processing_v2(im, bbox_red) #bbox_red2 = increaseBbox(bbox, factor) #bbox_red2[2] = bbox_red2[2] - bbox_red2[0] #bbox_red2[3] = bbox_red2[3] - bbox_red2[1] #bbox_red2 = np.reshape(bbox_red2, [4]) #%% Crop and resized bbox_new = np.ceil( bbox_new ) side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] ) bbox_new[2:4] = bbox_new[0:2] + side_length bbox_new = bbox_new.astype(int) crop_img = img_3[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :]; #print crop_img.shape resized_crop_img = cv2.resize(crop_img, ( _resNetSize, _resNetSize ), interpolation = cv2.INTER_CUBIC) return resized_crop_img def preProcessImage_useGTBBox(im, lmks, bbox, factor, _alexNetSize, flipped, to_train_scale, yolo_bbox): sys.stdout.flush() #print bbox, yolo_bbox, to_train_scale if flipped == 1: # flip landmarks and indices if it's flipped imag lmks = flip_lmk_idx(im, lmks) lmks_flip = lmks lt_x = bbox[0] lt_y = bbox[1] rb_x = lt_x + bbox[2] rb_y = lt_y + bbox[3] w = bbox[2] h = bbox[3] center = ( (lt_x+rb_x)/2, (lt_y+rb_y)/2 ) side_length = max(w,h); # make the bbox be square bbox = np.zeros( (4,1), dtype=np.float32 ) #print bbox bbox_red = np.zeros( (4,1), dtype=np.float32 ) if to_train_scale == 1: _, _, _, _, side_length2, center2 = preProcessImage(im, lmks, yolo_bbox, factor, _alexNetSize, flipped) center3 = ( (center[0]+center2[0])/2, (center[1]+center2[1])/2 ) bbox[0] = center3[0] - side_length2/2 bbox[1] = center3[1] - side_length2/2 bbox[2] = center3[0] + side_length2/2 bbox[3] = center3[1] + side_length2/2 bbox_red[0] = center3[0] - side_length2/2 bbox_red[1] = center3[1] - side_length2/2 bbox_red[2] = side_length2 bbox_red[3] = side_length2 else: bbox[0] = center[0] - side_length/2 bbox[1] = center[1] - side_length/2 bbox[2] = center[0] + side_length/2 bbox[3] = center[1] + side_length/2 #print center, side_length, bbox[0], bbox[1], bbox[2], bbox[3] bbox_red[0] = center[0] - side_length/2 bbox_red[1] = center[1] - side_length/2 bbox_red[2] = side_length bbox_red[3] = side_length bbox_red = np.reshape(bbox_red, [4]) #print bbox, bbox_red img_2, bbox_green = image_bbox_processing_v2(im, bbox) #print img_2.shape, bbox_green #%% Crop and resized bbox_new = np.ceil( bbox_green ) side_length = max( bbox_new[2] - bbox_new[0], bbox_new[3] - bbox_new[1] ) bbox_new[2:4] = bbox_new[0:2] + side_length bbox_new = bbox_new.astype(int) #print bbox_new crop_img = img_2[bbox_new[1][0]:bbox_new[3][0], bbox_new[0][0]:bbox_new[2][0], :]; lmks_new = np.zeros([68,2]) lmks_new[:,0] = lmks[:,0] - bbox_new[0][0] lmks_new[:,1] = lmks[:,1] - bbox_new[1][0] #print crop_img.shape resized_crop_img = cv2.resize(crop_img, ( _alexNetSize, _alexNetSize ), interpolation = cv2.INTER_CUBIC) old_h, old_w, channels = crop_img.shape lmks_new2 = np.zeros([68,2]) lmks_new2[:,0] = lmks_new[:,0] * _alexNetSize / old_w lmks_new2[:,1] = lmks_new[:,1] * _alexNetSize / old_h #print _alexNetSize, old_w, old_h return resized_crop_img, lmks_new2, bbox_red, lmks_flip def replaceInFile(filep, before, after): for line in fileinput.input(filep, inplace=True): print line.replace(before,after), def flip_lmk_idx(img, lmarks): # Flipping X values for landmarks \ lmarks[:,0] = img.shape[1] - lmarks[:,0] # Creating flipped landmarks with new indexing lmarks_flip = np.zeros((68,2)) for i in range(len(repLand)): lmarks_flip[i,:] = lmarks[repLand[i]-1,:] return lmarks_flip def pose_to_LMs(pose_Rt): pose_Rt = np.reshape(pose_Rt, [6]) ref_lm = np.loadtxt('./lm_m10.txt', delimiter=',') ref_lm_t = np.transpose(ref_lm) numLM = ref_lm_t.shape[1] #PI = np.array([[ 4.22519775e+03,0.00000000e+00,1.15000000e+02], [0.00000000e+00, 4.22519775e+03, 1.15000000e+02], [0, 0, 1]]); PI = np.array([[ 2.88000000e+03, 0.00000000e+00, 1.12000000e+02], [0.00000000e+00, 2.88000000e+03, 1.12000000e+02], [0, 0, 1]]); rvecs = pose_Rt[0:3] tvec = np.reshape(pose_Rt[3:6], [3,1]) tsum = np.repeat(tvec,numLM,1) rmat, jacobian = cv2.Rodrigues(rvecs, None) transformed_lms = np.matmul(rmat, ref_lm_t) + tsum transformed_lms = np.matmul(PI, transformed_lms) transformed_lms[0,:] = transformed_lms[0,:]/transformed_lms[2,:] transformed_lms[1,:] = transformed_lms[1,:]/transformed_lms[2,:] lms = np.transpose(transformed_lms[:2,:]) return lms def RotationMatrix(angle_x, angle_y, angle_z): # get rotation matrix by rotate angle phi = angle_x; # pitch gamma = angle_y; # yaw theta = angle_z; # roll R_x = np.array([ [1, 0, 0] , [0, np.cos(phi), np.sin(phi)] , [0, -np.sin(phi), np.cos(phi)] ]); R_y = np.array([ [np.cos(gamma), 0, -np.sin(gamma)] , [0, 1, 0] , [np.sin(gamma), 0, np.cos(gamma)] ]); R_z = np.array([ [np.cos(theta), np.sin(theta), 0] , [-np.sin(theta), np.cos(theta), 0] , [0, 0, 1] ]); R = np.matmul( R_x , np.matmul(R_y , R_z) ); return R def matrix2angle(R): ''' compute three Euler angles from a Rotation Matrix. Ref: http://www.gregslabaugh.net/publications/euler.pdf Args: R: (3,3). rotation matrix Returns: x: yaw y: pitch z: roll ''' # assert(isRotationMatrix(R)) if R[2,0] !=1 or R[2,0] != -1: #x = asin(R[2,0]) #y = atan2(R[2,1]/cos(x), R[2,2]/cos(x)) #z = atan2(R[1,0]/cos(x), R[0,0]/cos(x)) x = -asin(R[2,0]) #x = np.pi - x y = atan2(R[2,1]/cos(x), R[2,2]/cos(x)) z = atan2(R[1,0]/cos(x), R[0,0]/cos(x)) else:# Gimbal lock z = 0 #can be anything if R[2,0] == -1: x = np.pi/2 y = z + atan2(R[0,1], R[0,2]) else: x = -np.pi/2 y = -z + atan2(-R[0,1], -R[0,2]) return x, y, z def P2sRt(P): ''' decompositing camera matrix P. Args: P: (3, 4). Affine Camera Matrix. Returns: s: scale factor. R: (3, 3). rotation matrix. t2d: (2,). 2d translation. t3d: (3,). 3d translation. ''' #t2d = P[:2, 3] t3d = P[:, 3] R1 = P[0:1, :3] R2 = P[1:2, :3] s = (np.linalg.norm(R1) + np.linalg.norm(R2))/2.0 r1 = R1/np.linalg.norm(R1) r2 = R2/np.linalg.norm(R2) r3 = np.cross(r1, r2) R = np.concatenate((r1, r2, r3), 0) return s, R, t3d