Repository: epiception/CalibNet
Branch: main
Commit: a6fed6de1805
Files: 83
Total size: 325.6 KB

Directory structure:

CalibNet/
├── LICENSE
├── README.md
├── code/
│   ├── common/
│   │   ├── Lie_functions.py
│   │   ├── __init__.py
│   │   ├── all_transformer.py
│   │   ├── cnn_utils_res.py
│   │   ├── global_agg_net.py
│   │   ├── resnet_depth_model.py
│   │   └── resnet_rgb_model.py
│   ├── config_res.py
│   ├── dataset_files/
│   │   ├── dataset_build_color.py
│   │   ├── dataset_build_color_2.py
│   │   ├── dataset_builder_parallel.sh
│   │   └── parser.py
│   ├── model_utils.py
│   ├── nw_loader_color.py
│   ├── tf_ops/
│   │   ├── CD/
│   │   │   ├── __init__.py
│   │   │   ├── makefile
│   │   │   ├── render_balls_so.cpp
│   │   │   ├── tf_nndistance.cpp
│   │   │   ├── tf_nndistance.py
│   │   │   ├── tf_nndistance_g.cu
│   │   │   └── tf_nndistance_g.cu.o
│   │   ├── __init__.py
│   │   ├── emd/
│   │   │   ├── __init__.py
│   │   │   ├── tf_auctionmatch.cpp
│   │   │   ├── tf_auctionmatch.py
│   │   │   ├── tf_auctionmatch_compile.sh
│   │   │   ├── tf_auctionmatch_g.cu
│   │   │   └── tf_auctionmatch_g.cu.o
│   │   ├── grouping/
│   │   │   ├── __init__.py
│   │   │   ├── compile.sh
│   │   │   ├── query_ball_point.cpp
│   │   │   ├── query_ball_point.cu
│   │   │   ├── query_ball_point_block.cu
│   │   │   ├── query_ball_point_grid.cu
│   │   │   ├── selection_sort
│   │   │   ├── selection_sort.cpp
│   │   │   ├── selection_sort.cu
│   │   │   ├── selection_sort_const.cu
│   │   │   ├── test_knn.py
│   │   │   ├── tf_grouping.cpp
│   │   │   ├── tf_grouping.py
│   │   │   ├── tf_grouping_compile.sh
│   │   │   ├── tf_grouping_g.cu
│   │   │   ├── tf_grouping_g.cu.o
│   │   │   └── tf_grouping_op_test.py
│   │   ├── interpolation/
│   │   │   ├── __init__.py
│   │   │   ├── interpolate.cpp
│   │   │   ├── tf_interpolate.cpp
│   │   │   ├── tf_interpolate.py
│   │   │   ├── tf_interpolate_compile.sh
│   │   │   ├── tf_interpolate_op_test.py
│   │   │   └── visu_interpolation.py
│   │   └── sampling/
│   │       ├── __init__.py
│   │       ├── tf_sampling.cpp
│   │       ├── tf_sampling.py
│   │       ├── tf_sampling_compile.sh
│   │       ├── tf_sampling_g.cu
│   │       └── tf_sampling_g.cu.o
│   ├── train_model_combined.py
│   └── utils/
│       ├── __init__.py
│       ├── data_prep_util.py
│       ├── eulerangles.py
│       ├── modelnet_data_prep.py
│       ├── off2obj.py
│       ├── pc_util.py
│       ├── plyfile.py
│       ├── pointnet_util.py
│       ├── provider.py
│       ├── show3d.py
│       ├── tf_util.py
│       ├── tf_util2.py
│       └── write_result2html.py
└── docs/
    ├── .nojekyll
    ├── _site/
    │   ├── css/
    │   │   ├── main.css
    │   │   ├── project.css
    │   │   └── project_file.css
    │   └── index.html
    ├── css/
    │   ├── main.css
    │   ├── project.css
    │   └── project_file.css
    └── index.html

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2018 Ganesh Iyer

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: README.md
================================================
# CalibNet

### [DEPRECATED] This repository is no longer actively supported.
While the authors work on an update, please check out this unofficial implementation: [CalibNet_pytorch](https://github.com/gitouni/CalibNet_pytorch) :fire: :slightly_smiling_face:

___

Code for our paper: [CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks](https://arxiv.org/pdf/1803.08181.pdf)

Check out our [project page](https://epiception.github.io/CalibNet/)!

![CalibNet_gif1](https://media.giphy.com/media/1zjOgLf7j4lHmeMubG/giphy.gif)

### Prerequisites

CalibNet is trained on Tensorflow 1.3, CUDA 8.0, and CUDNN 7.0.1.

##### Installation

The code for the point cloud distance loss is modified from [PU-NET](https://github.com/yulequan/PU-Net), PointNet++, and PointSetGeneration. This repository is therefore based on Tensorflow and the TF operators from PointNet++ and PU-NET. To install Tensorflow, please follow the official instructions. The code is tested under TF 1.3 and Python 2.7 on Ubuntu 16.04.

To compile the TF operators, check `tf_xxx_compile.sh` under each op subfolder in the `code/tf_ops` folder, and set the include path correctly to `../path/to/tensorflow/include`. Update the nvcc, python, and tensorflow include paths if necessary. If compilation still fails, you may also need to remove the `-D_GLIBCXX_USE_CXX11_ABI=0` flag from the g++ command. We are working to update the code and installation steps for the latest Tensorflow versions.

### Dataset Preparation

To prepare the dataset, run `dataset_files/dataset_builder_parallel.sh` in the directory where you wish to store the data. This also creates a parser file, `parsed_set.txt`, that contains the file names used for training.
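The op compile scripts mentioned under Installation all follow the same two-step pattern (nvcc for the CUDA kernel, then g++ to link the shared library). The sketch below is illustrative only: `tf_op.cpp`/`tf_op_g.cu` are placeholder names, and the paths assume CUDA 8.0 as stated above; adapt them to your machine before use.

```shell
#!/bin/bash
# Sketch of a tf_xxx_compile.sh-style script; all names and paths are placeholders.
TF_INC=/path/to/tensorflow/include    # e.g. python -c 'import tensorflow as tf; print(tf.sysconfig.get_include())'
CUDA_ROOT=/usr/local/cuda-8.0

# 1) Compile the CUDA kernel into an object file.
$CUDA_ROOT/bin/nvcc tf_op_g.cu -o tf_op_g.cu.o -c -O2 \
    -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

# 2) Link the op into a loadable shared library. Remove the
#    -D_GLIBCXX_USE_CXX11_ABI=0 flag if your TF build uses the new C++ ABI.
g++ -std=c++11 tf_op.cpp tf_op_g.cu.o -o tf_op_so.so -shared -fPIC \
    -I $TF_INC -I $CUDA_ROOT/include -L $CUDA_ROOT/lib64 -lcudart \
    -O2 -D_GLIBCXX_USE_CXX11_ABI=0
```

The resulting `.so` is then loaded at runtime via `tf.load_op_library`, which is how the `tf_ops` wrappers (e.g. `tf_sampling.py`) pick up their compiled kernels.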
```
git clone https://github.com/epiception/CalibNet.git
# or, for the code only:
svn checkout https://github.com/epiception/CalibNet/trunk/code

cd ../path/to/dataset_directory
bash ../path/to/code_folder/dataset_files/dataset_builder_parallel.sh
cd ../path/to/CalibNet/code
python dataset_files/parser.py ../dataset_directory/2011_09_26/
```

##### Resnet-18

Pretrained Resnet-18 parameters can be found [here](https://drive.google.com/open?id=1XGqdBH3A88m1LgUIe5tS7VjKjtQc1A6V).

### Training

Before training, be sure to make the requisite changes to the paths and training parameters in the config file `config_res.py`. We trained using 2 GPUs; the base code assigns ops assuming this same device configuration. To begin training:

```
CUDA_VISIBLE_DEVICES=0,1 python -B train_model_combined.py
```

##### Trained Weights

Trained weights for the base variant (non-iterative) model are available [here](https://drive.google.com/drive/folders/138hq7OgTEBmG-wK52h7gchg5ob1WqARn?usp=sharing). This model was trained for 44 epochs. As mentioned in the paper, the iterative realignment model for better translation outputs will be uploaded soon.

### Evaluation/Test

Code for the direct evaluation/testing pipeline for point cloud calibration will be uploaded soon.

================================================
FILE: code/common/Lie_functions.py
================================================
import numpy as np
import tensorflow as tf


def for_translation(Old_transform, t):
    # Keep rotation R from the old transform; swap in translation t
    R = Old_transform[:3, :3]
    t = tf.expand_dims(t, 1)
    trans_T = tf.concat([R, t], 1)
    return trans_T


def for_rotation(Old_transform):
    # Keep rotation R from the old transform; zero out the translation
    R = Old_transform[:3, :3]
    t = tf.expand_dims(tf.constant(np.array([0.0, 0.0, 0.0], dtype=np.float32)), 1)
    rots_T = tf.concat([R, t], 1)
    return rots_T


def exponential_map_single(vec):
    """Exponential Map Operation. Decoupled for SO(3) and translation t"""
    with tf.name_scope("Exponential_map"):
        u = vec[:3]
        omega = vec[3:]

        theta = tf.sqrt(omega[0]*omega[0] + omega[1]*omega[1] + omega[2]*omega[2])

        omega_cross = tf.stack([0.0, -omega[2], omega[1],
                                omega[2], 0.0, -omega[0],
                                -omega[1], omega[0], 0.0])
        omega_cross = tf.reshape(omega_cross, [3, 3])

        # Taylor approximations for A, B and C are not used currently;
        # the series forms are preferable for small values of theta
        # A = 1.0 - (tf.pow(theta,2)/factorial(3.0)) + (tf.pow(theta, 4)/factorial(5.0))
        # B = 1.0/factorial(2.0) - (tf.pow(theta,2)/factorial(4.0)) + (tf.pow(theta, 4)/factorial(6.0))
        # C = 1.0/factorial(3.0) - (tf.pow(theta,2)/factorial(5.0)) + (tf.pow(theta, 4)/factorial(7.0))

        A = tf.sin(theta)/theta
        B = (1.0 - tf.cos(theta))/(tf.pow(theta, 2))
        C = (1.0 - A)/(tf.pow(theta, 2))

        omega_cross_square = tf.matmul(omega_cross, omega_cross)

        R = tf.eye(3, 3) + A*omega_cross + B*omega_cross_square
        V = tf.eye(3, 3) + B*omega_cross + C*omega_cross_square

        Vu = tf.matmul(V, tf.expand_dims(u, 1))
        T = tf.concat([R, Vu], 1)
        return T


def transforms_mul(T1, T2):
    T1 = tf.concat([T1, tf.constant(np.array([[0.0, 0.0, 0.0, 1.0]]), dtype=tf.float32)], 0)
    T2 = tf.concat([T2, tf.constant(np.array([[0.0, 0.0, 0.0, 1.0]]), dtype=tf.float32)], 0)
    product = tf.matmul(T1, T2)
    return product

================================================
FILE: code/common/__init__.py
================================================

================================================
FILE: code/common/all_transformer.py
================================================
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import scipy.misc as smc

import config_res as config

IMG_HT = config.depth_img_params['IMG_HT']
IMG_WDT = config.depth_img_params['IMG_WDT']

batch_size = 1
shape = (IMG_HT, IMG_WDT)


def _simple_transformer(depth_map, t_mat, k_mat, small_transform):
    batch_grids, transformed_depth_map, sparse_cloud = _3D_meshgrid_batchwise_diff(IMG_HT, IMG_WDT,
depth_map, batch_size, t_mat, k_mat, small_transform) x_all = tf.reshape(batch_grids[:,0], (IMG_HT, IMG_WDT)) y_all = tf.reshape(batch_grids[:,1], (IMG_HT, IMG_WDT)) return _bilinear_sampling(transformed_depth_map, x_all, y_all), sparse_cloud def sparsify_cloud(S): """ Cluster centers of point clouds used to sparsify cloud for Earth Mover's Distance. Using 4096 centroids """ with tf.device('/cpu:0'): point_limit = 4096 no_points = tf.shape(S)[0] no_partitions = no_points/tf.constant(point_limit, dtype=tf.int32) saved_points = tf.gather_nd(S, [tf.expand_dims(tf.range(0, no_partitions*point_limit), 1)]) saved_points = tf.reshape(saved_points, [point_limit, no_partitions, 3]) saved_points_sparse = tf.reduce_mean(saved_points, 1) return saved_points_sparse def _3D_meshgrid_batchwise_diff(height, width, depth_img, num_batch, transformation_matrix, tf_K_mat, small_transform): """ Creates 3d sampling meshgrid """ x_index = tf.linspace(-1.0, 1.0, width) y_index = tf.linspace(-1.0, 1.0, height) z_index = tf.range(0, width*height) x_t, y_t = tf.meshgrid(x_index, y_index) # flatten x_t_flat = tf.reshape(x_t, [1,-1]) y_t_flat = tf.reshape(y_t, [1,-1]) ZZ = tf.reshape(depth_img, [-1]) zeros_target = tf.zeros_like(ZZ) mask = tf.not_equal(ZZ, zeros_target) ones = tf.ones_like(x_t_flat) sampling_grid_2d = tf.concat([x_t_flat, y_t_flat, ones], 0) sampling_grid_2d_sparse = tf.transpose(tf.boolean_mask(tf.transpose(sampling_grid_2d), mask)) ZZ_saved = tf.boolean_mask(ZZ, mask) ones_saved = tf.expand_dims(tf.ones_like(ZZ_saved), 0) projection_grid_3d = tf.matmul(tf.matrix_inverse(tf_K_mat), sampling_grid_2d_sparse*ZZ_saved) homog_points_3d = tf.concat([projection_grid_3d, ones_saved], 0) final_transformation_matrix = tf.matmul(transformation_matrix, small_transform)[:3,:] warped_sampling_grid = tf.matmul(final_transformation_matrix, homog_points_3d) points_2d = tf.matmul(tf_K_mat, warped_sampling_grid[:3,:]) Z = points_2d[2,:] x_dash_pred = points_2d[0,:] y_dash_pred = points_2d[1,:] 
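    # Note (added for clarity): points_2d above is the pinhole projection
    #   [x', y', Z]^T = K * ( [R | t] * [X, Y, Z_in, 1]^T )
    # applied to every retained depth pixel, so the pixel coordinates
    # recovered below are x = x'/Z and y = y'/Z (the perspective divide).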
point_cloud = tf.stack([x_dash_pred, y_dash_pred, Z], 1) sparse_point_cloud = sparsify_cloud(point_cloud) x = tf.transpose(points_2d[0,:]/Z) y = tf.transpose(points_2d[1,:]/Z) mask_int = tf.cast(mask, 'int32') updated_indices = tf.expand_dims(tf.boolean_mask(mask_int*z_index, mask), 1) updated_Z = tf.scatter_nd(updated_indices, Z, tf.constant([width*height])) updated_x = tf.scatter_nd(updated_indices, x, tf.constant([width*height])) neg_ones = tf.ones_like(updated_x)*-1.0 updated_x_fin = tf.where(tf.equal(updated_Z, zeros_target), neg_ones, updated_x) updated_y = tf.scatter_nd(updated_indices, y, tf.constant([width*height])) updated_y_fin = tf.where(tf.equal(updated_Z, zeros_target), neg_ones, updated_y) reprojected_grid = tf.stack([updated_x_fin, updated_y_fin], 1) transformed_depth = tf.reshape(updated_Z, (IMG_HT, IMG_WDT)) return reprojected_grid, transformed_depth, sparse_point_cloud def reverse_all(z): """Reversing from cantor function indices to correct indices""" z = tf.cast(z, 'float32') w = tf.floor((tf.sqrt(8.*z + 1.) - 1.)/2.0) t = (w**2 + w)/2.0 y = tf.clip_by_value(tf.expand_dims(z - t, 1), 0.0, IMG_HT - 1) x = tf.clip_by_value(tf.expand_dims(w - y[:,0], 1), 0.0, IMG_WDT - 1) return tf.concat([y,x], 1) def get_pixel_value(img, x, y): """Cantor pairing for removing non-unique updates and indices. At the time of implementation, unfixed issue with scatter_nd causes problems with int32 update values. 
Till resolution, implemented on cpu """ with tf.device('/cpu:0'): indices = tf.stack([y, x], 2) indices = tf.reshape(indices, (375*1242, 2)) values = tf.reshape(img, [-1]) Y = indices[:,0] X = indices[:,1] Z = (X + Y)*(X + Y + 1)/2 + Y filtered, idx = tf.unique(tf.squeeze(Z)) updated_values = tf.unsorted_segment_max(values, idx, tf.shape(filtered)[0]) # updated_indices = tf.map_fn(fn=lambda i: reverse(i), elems=filtered, dtype=tf.float32) updated_indices = reverse_all(filtered) updated_indices = tf.cast(updated_indices, 'int32') resolved_map = tf.scatter_nd(updated_indices, updated_values, img.shape) return resolved_map def _bilinear_sampling(img, x_func, y_func): """ Sampling from input image and performing bilinear interpolation """ max_y = tf.constant(IMG_HT - 1, dtype=tf.int32) max_x = tf.constant(IMG_WDT - 1, dtype=tf.int32) zero = tf.zeros([], dtype='int32') # rescale x and y to [0, W/H/D] x = 0.5 * ((x_func + 1.0) * tf.cast(IMG_WDT - 1, 'float32')) y = 0.5 * ((y_func + 1.0) * tf.cast(IMG_HT - 1, 'float32')) x = tf.clip_by_value(x, 0.0, tf.cast(max_x, 'float32')) y = tf.clip_by_value(y, 0.0, tf.cast(max_y, 'float32')) # grab 4 nearest corner points for each (x_i, y_i) # i.e. 
we need a rectangle around the point of interest
    x0 = tf.cast(tf.floor(x), 'int32')
    x1 = x0 + 1
    y0 = tf.cast(tf.floor(y), 'int32')
    y1 = y0 + 1

    # clip to range [0, H/W] to not violate img boundaries
    x0 = tf.clip_by_value(x0, 0, max_x)
    x1 = tf.clip_by_value(x1, 0, max_x)
    y0 = tf.clip_by_value(y0, 0, max_y)
    y1 = tf.clip_by_value(y1, 0, max_y)

    # find Ia, Ib, Ic, Id (the four corner intensities)
    Ia = get_pixel_value(img, x0, y0)
    Ib = get_pixel_value(img, x0, y1)
    Ic = get_pixel_value(img, x1, y0)
    Id = get_pixel_value(img, x1, y1)

    x0 = tf.cast(x0, 'float32')
    x1 = tf.cast(x1, 'float32')
    y0 = tf.cast(y0, 'float32')
    y1 = tf.cast(y1, 'float32')

    # calculate deltas (bilinear interpolation weights)
    wa = (x1-x) * (y1-y)
    wb = (x1-x) * (y-y0)
    wc = (x-x0) * (y1-y)
    wd = (x-x0) * (y-y0)

    loc = wa*Ia + wb*Ib + wc*Ic + wd*Id
    return loc

================================================
FILE: code/common/cnn_utils_res.py
================================================
import numpy as np
import tensorflow as tf


def weight_variable(shape, str):
    '''
    Helper function to create a weight variable with Xavier initialization

    Parameters
    ----------
    shape : list
        Size of weight variable
    '''
    W = tf.get_variable("weight" + str, shape=shape, initializer=tf.contrib.layers.xavier_initializer())
    return W


def weight_variable_fc(shape, str):
    '''
    Helper function to create a fully-connected weight variable with scaled
    (0.01x) Xavier initialization

    Parameters
    ----------
    shape : list
        Size of weight variable
    '''
    W = 0.01*tf.get_variable("weight" + str, shape=shape, initializer=tf.contrib.layers.xavier_initializer())
    return W


def bias_variable(shape, str):
    B = tf.Variable(tf.constant(0.0, shape=shape, dtype=tf.float32), name="bias" + str)
    return B


def init_weights(W, str, to_train):
    init = tf.constant_initializer(W)
    weight = tf.get_variable('weight'+str, shape=W.shape, dtype=tf.float32, initializer=init, trainable=to_train)
    return weight


def init_bias(B, layerno, to_train):
    init = tf.constant_initializer(B)
    bias = 
tf.get_variable('bias_%d'%layerno, shape=B.shape, dtype=tf.float32, initializer= init, trainable = to_train) return bias def conv2d_batchnorm(x, W, name, phase, beta_r, gamma_r, mean_r, variance_r, stride = [1,1,1,1], relu = True): beta = tf.constant_initializer(beta_r) gamma = tf.constant_initializer(gamma_r) moving_mean = tf.constant_initializer(mean_r) moving_variance = tf.constant_initializer(variance_r) with tf.name_scope(name): mid1 = tf.nn.conv2d(x, W, strides = stride, padding = "SAME") with tf.name_scope('batch_norm'): mid2 = tf.contrib.layers.batch_norm(mid1, param_initializers={'beta': beta, 'gamma': gamma, 'moving_mean': moving_mean,'moving_variance': moving_variance,}, is_training = phase, updates_collections = None, scale = True, decay = 0.9) if(relu == True): return tf.nn.relu(mid2) else: return mid2 def conv2d_batchnorm_init(x, W, name, phase, stride = [1,1,1,1], relu = True): with tf.name_scope(name): mid1 = tf.nn.conv2d(x, W, strides = stride, padding ="SAME") with tf.name_scope('batch_norm'): mid2 = tf.contrib.layers.batch_norm(mid1, is_training = phase, updates_collections = None) if(relu == True): return tf.nn.relu(mid2) else: return mid2 def conv2d_init(x, W, name, phase, stride, padding): with tf.name_scope(name): mid1 = tf.nn.conv2d(x, W, strides = stride, padding = padding) mid2 = tf.nn.relu(mid1) return mid2 def conv2d_bias_init(x, W, b, name): with tf.name_scope(name): mid1 = tf.nn.conv2d(x, W, strides = [1,1,1,1], padding = "SAME") + b mid2 = tf.nn.relu(mid1) return mid2 def max_pool(x, name): return tf.nn.max_pool(x, ksize = [1,2,2,1], strides = [1,2,2,1], padding = "SAME", name=name) def variable_summaries(var): """Attach a lot of summaries to a Tensor (for TensorBoard visualization).""" with tf.name_scope('summaries'): mean = tf.reduce_mean(var) sum_mean = tf.summary.scalar('mean', mean) with tf.name_scope('stddev'): stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean))) sum_stddev = tf.summary.scalar('stddev', stddev) 
#tf.summary.scalar('max', tf.reduce_max(var)) #tf.summary.scalar('min', tf.reduce_min(var)) sum_hist = tf.summary.histogram('histogram', var) return [sum_mean, sum_hist, sum_stddev] ================================================ FILE: code/common/global_agg_net.py ================================================ import numpy as np import tensorflow as tf import scipy.misc as smc import matplotlib.pyplot as plt import config_res as config from cnn_utils_res import * import resnet_rgb_model as model import resnet_depth_model as model_depth batch_size = config.net_params['batch_size'] current_epoch = config.net_params['load_epoch'] def End_Net_weights_init(): """ Initialize Aggregation Network Weights and Summaries """ W_ext1 = weight_variable([3,3,768,384], "_8") W_ext2 = weight_variable([3,3,384,384], "_9") W_ext3 = weight_variable([1,2,384,384], "_10") W_ext4_rot = weight_variable([1,1,384,384], "_11") W_fc_rot = weight_variable_fc([3840,3], "_12") W_ext4_tr = weight_variable([1,1,384,384], "_13") W_fc_tr = weight_variable_fc([3840,3], "_14") end_weights = [W_ext1, W_ext2, W_ext3, W_ext4_rot, W_fc_rot, W_ext4_tr, W_fc_tr] weight_summaries = [] for weight_index in range(len(end_weights)): with tf.name_scope('weight_%d'%weight_index): weight_summaries += variable_summaries(end_weights[weight_index]) return end_weights, weight_summaries def End_Net(input_x, phase_depth, keep_prob): """ Define Aggregation Network """ weights, summaries = End_Net_weights_init() layer8 = conv2d_batchnorm_init(input_x, weights[0], name="conv_9", phase= phase_depth, stride=[1,2,2,1]) layer9 = conv2d_batchnorm_init(layer8, weights[1], name="conv_10", phase= phase_depth, stride=[1,2,2,1]) layer10 = conv2d_batchnorm_init(layer9, weights[2], name="conv_11", phase= phase_depth, stride=[1,1,1,1]) layer11_rot = conv2d_batchnorm_init(layer10, weights[3], name="conv_12", phase= phase_depth, stride=[1,1,1,1]) layer11_m_rot = tf.reshape(layer11_rot, [batch_size, 3840]) layer11_drop_rot = 
tf.nn.dropout(layer11_m_rot, keep_prob) layer11_vec_rot = (tf.matmul(layer11_drop_rot, weights[4])) layer11_tr = conv2d_batchnorm_init(layer10, weights[5], name="conv_13", phase= phase_depth, stride=[1,1,1,1]) layer11_m_tr = tf.reshape(layer11_tr, [batch_size, 3840]) layer11_drop_tr = tf.nn.dropout(layer11_m_tr, keep_prob) layer11_vec_tr = (tf.matmul(layer11_drop_tr, weights[6])) output_vectors = tf.concat([layer11_vec_tr, layer11_vec_rot], 1) return output_vectors, summaries def End_Net_Out(X1, phase_rgb, pooled_input2, phase, keep_prob): """ Computation Graph """ RGB_Net_obj = model.Resnet(X1, phase_rgb) Depth_Net_obj = model_depth.Depthnet(pooled_input2, phase) with tf.variable_scope('ResNet'): with tf.device('/device:GPU:0'): output_rgb = RGB_Net_obj.Net() with tf.variable_scope('DepthNet'): with tf.device('/device:GPU:1'): output_depth = Depth_Net_obj.Net() layer_next = tf.concat([output_rgb, output_depth], 3) end_net_op = End_Net(layer_next, phase, keep_prob) return end_net_op ================================================ FILE: code/common/resnet_depth_model.py ================================================ import numpy as np import tensorflow as tf import json from cnn_utils_res import * import config_res with open(config_res.paths['resnet_params_path']) as f_in: parameters = json.load(f_in) class Depthnet: def __init__(self, input_y, phase, parameters = parameters): self.input_y = input_y self.phase = phase self.parameters = parameters self.layer_zero self.layer self.Net def Net(self): layer_zero_out = self.layer_zero(self.input_y) current_output = layer_zero_out for layer_idx in range(1,5): layer_out = self.layer(current_output, layer_idx) current_output = layer_out return current_output def layer_zero(self, layer_input): layer_dict =self.parameters['layer0'] bl_str = "block_1" W = np.array(layer_dict[bl_str]['conv1']['weight'], dtype = np.float32) bn_mov_mean = np.array(layer_dict[bl_str]['bn1']['running_mean'], dtype = np.float32) bn_mov_var = 
np.array(layer_dict[bl_str]['bn1']['running_var'], dtype = np.float32) bn_gamma = np.array(layer_dict[bl_str]['bn1']['weight'], dtype = np.float32) bn_beta = np.array(layer_dict[bl_str]['bn1']['bias'], dtype = np.float32) shapex = W.shape W_conv = weight_variable([shapex[0], shapex[1], 1, shapex[3]/2], "_depth_0") # out = conv2d_batchnorm(layer_input, W_conv, "layer_depth_0", self.phase, bn_beta, bn_gamma, bn_mov_mean, bn_mov_var, [1,2,2,1], True) out = conv2d_batchnorm_init(layer_input, W_conv, "layer_depth_0", self.phase, [1,2,2,1], True) out = tf.nn.max_pool(out, [1,3,3,1], strides=[1,2,2,1], padding="SAME") print('layer0', out.shape) return out def layer(self, layer_input, layer_no): layer_dict = self.parameters['layer%d'%layer_no] cur = layer_input res = layer_input for b_no in range(1,3): bl_str = "block_%d"%b_no stride = [0,0] if(b_no == 1): stride = [2,1] else: stride = [1,1] # for in_bno in range(1,3): W1 = np.array(layer_dict[bl_str]['conv1']['weight'], dtype = np.float32) bn_mov_mean1 = np.array(layer_dict[bl_str]['bn1']['running_mean'], dtype = np.float32) bn_mov_var1 = np.array(layer_dict[bl_str]['bn1']['running_var'], dtype = np.float32) bn_gamma1 = np.array(layer_dict[bl_str]['bn1']['weight'], dtype = np.float32) bn_beta1 = np.array(layer_dict[bl_str]['bn1']['bias'], dtype = np.float32) W2 = np.array(layer_dict[bl_str]['conv2']['weight'], dtype = np.float32) bn_mov_mean2 = np.array(layer_dict[bl_str]['bn2']['running_mean'], dtype = np.float32) bn_mov_var2 = np.array(layer_dict[bl_str]['bn2']['running_var'], dtype = np.float32) bn_gamma2 = np.array(layer_dict[bl_str]['bn2']['weight'], dtype = np.float32) bn_beta2 = np.array(layer_dict[bl_str]['bn2']['bias'], dtype = np.float32) # W_conv1 = init_weights(W1, "_l_%d_bl_%d_no_%d"%(layer_no,b_no, 1), False) # W_conv2 = init_weights(W2, "_l_%d_bl_%d_no_%d"%(layer_no,b_no, 2), False) shapex1 = W1.shape shapex2 = W2.shape W_conv1 = weight_variable([shapex1[0], shapex1[1], shapex1[2]/2, shapex1[3]/2], 
"dep_l_%d_bl_%d_no_%d"%(layer_no,b_no, 1)) W_conv2 = weight_variable([shapex2[0], shapex2[1], shapex2[2]/2, shapex2[3]/2], "dep_l_%d_bl_%d_no_%d"%(layer_no,b_no, 2)) # out1 = conv2d_batchnorm(cur, W_conv1, "layer_%d_%d_1"%(layer_no,b_no), self.phase, bn_beta1, bn_gamma1, bn_mov_mean1, bn_mov_var1, [1,stride[0],stride[0],1], False) out1 = conv2d_batchnorm_init(cur, W_conv1, "dep_layer_%d_%d_1"%(layer_no,b_no), self.phase, [1,stride[0],stride[0],1], False) print("layer_%d_%d_1"%(layer_no,b_no), out1.shape) """ if layer1 no downsample, so stride 2,1 then 1,1 """ """else stride 2,1 then downsample then 1,1 """ if(layer_no > 1 and b_no == 1): downsample_dict = self.parameters['layer%d_downsample'%layer_no] W_dn = np.array(downsample_dict['block_1']['conv']['weight'], dtype = np.float32) bn_mov_mean_dn = np.array(downsample_dict['block_1']['bn']['running_mean'], dtype = np.float32) bn_mov_var_dn = np.array(downsample_dict['block_1']['bn']['running_var'], dtype = np.float32) bn_gamma_dn = np.array(downsample_dict['block_1']['bn']['weight'], dtype = np.float32) bn_beta_dn = np.array(downsample_dict['block_1']['bn']['bias'], dtype = np.float32) # W_conv_dn = init_weights(W_dn, "downsample_%d"%(layer_no), False) # res = conv2d_batchnorm(res, W_conv_dn, "layer_dn_%d"%(layer_no), self.phase, bn_beta_dn, bn_gamma_dn, bn_mov_mean_dn, bn_mov_var_dn, [1,2,2,1], False) shapex = W_dn.shape W_conv_dn = weight_variable([shapex[0], shapex[1], shapex[2]/2, shapex[3]/2], "dep_downsample_%d"%(layer_no)) res = conv2d_batchnorm_init(res, W_conv_dn, "dep_layer_dn_%d"%(layer_no), self.phase, [1,2,2,1], False) print("downsample_layer_%d_%d_1"%(layer_no,b_no), res.shape) out1 = tf.nn.relu(out1 + res) else: out1 = tf.nn.relu(out1) out2 = conv2d_batchnorm_init(out1, W_conv2, "dep_layer_%d_%d_2"%(layer_no,b_no), self.phase, [1,stride[1],stride[1],1], True) print("layer_%d_%d_2"%(layer_no,b_no), out2.shape) cur = out2 return cur ================================================ FILE: 
code/common/resnet_rgb_model.py ================================================ import numpy as np import tensorflow as tf import json from cnn_utils_res import * import config_res with open(config_res.paths['resnet_params_path']) as f_in: parameters = json.load(f_in) class Resnet: def __init__(self, input_x, phase, parameters = parameters): self.input_x = input_x self.phase = phase self.parameters = parameters self.layer_zero self.layer self.Net def Net(self): layer_zero_out = self.layer_zero(self.input_x) current_output = layer_zero_out for layer_idx in range(1,5): layer_out = self.layer(current_output, layer_idx) current_output = layer_out return current_output def layer_zero(self, layer_input): layer_dict =self.parameters['layer0'] bl_str = "block_1" W = np.array(layer_dict[bl_str]['conv1']['weight'], dtype = np.float32) bn_mov_mean = np.array(layer_dict[bl_str]['bn1']['running_mean'], dtype = np.float32) bn_mov_var = np.array(layer_dict[bl_str]['bn1']['running_var'], dtype = np.float32) bn_gamma = np.array(layer_dict[bl_str]['bn1']['weight'], dtype = np.float32) bn_beta = np.array(layer_dict[bl_str]['bn1']['bias'], dtype = np.float32) W_conv = init_weights(W, "_0", False) # with tf.name_scope('layer_0'): out = conv2d_batchnorm(layer_input, W_conv, "layer_0", self.phase, bn_beta, bn_gamma, bn_mov_mean, bn_mov_var, [1,2,2,1], True) out = tf.nn.max_pool(out, [1,3,3,1], strides=[1,2,2,1], padding="SAME") print('layer0', out.shape) return out def layer(self, layer_input, layer_no): layer_dict = self.parameters['layer%d'%layer_no] cur = layer_input res = layer_input for b_no in range(1,3): bl_str = "block_%d"%b_no stride = [0,0] if(b_no == 1): stride = [2,1] else: stride = [1,1] # for in_bno in range(1,3): W1 = np.array(layer_dict[bl_str]['conv1']['weight'], dtype = np.float32) bn_mov_mean1 = np.array(layer_dict[bl_str]['bn1']['running_mean'], dtype = np.float32) bn_mov_var1 = np.array(layer_dict[bl_str]['bn1']['running_var'], dtype = np.float32) bn_gamma1 = 
np.array(layer_dict[bl_str]['bn1']['weight'], dtype = np.float32) bn_beta1 = np.array(layer_dict[bl_str]['bn1']['bias'], dtype = np.float32) W2 = np.array(layer_dict[bl_str]['conv2']['weight'], dtype = np.float32) bn_mov_mean2 = np.array(layer_dict[bl_str]['bn2']['running_mean'], dtype = np.float32) bn_mov_var2 = np.array(layer_dict[bl_str]['bn2']['running_var'], dtype = np.float32) bn_gamma2 = np.array(layer_dict[bl_str]['bn2']['weight'], dtype = np.float32) bn_beta2 = np.array(layer_dict[bl_str]['bn2']['bias'], dtype = np.float32) W_conv1 = init_weights(W1, "_l_%d_bl_%d_no_%d"%(layer_no,b_no, 1), False) W_conv2 = init_weights(W2, "_l_%d_bl_%d_no_%d"%(layer_no,b_no, 2), False) # with tf.name_scope("layer_%d_%d_1"%(layer_no,b_no)): out1 = conv2d_batchnorm(cur, W_conv1, "layer_%d_%d_1"%(layer_no,b_no), self.phase, bn_beta1, bn_gamma1, bn_mov_mean1, bn_mov_var1, [1,stride[0],stride[0],1], False) print("layer_%d_%d_1"%(layer_no,b_no), out1.shape) """ if layer1 no downsample, so stride 2,1 then 1,1 """ """else stride 2,1 then downsample then 1,1 """ if(layer_no > 1 and b_no == 1): downsample_dict = self.parameters['layer%d_downsample'%layer_no] W_dn = np.array(downsample_dict['block_1']['conv']['weight'], dtype = np.float32) bn_mov_mean_dn = np.array(downsample_dict['block_1']['bn']['running_mean'], dtype = np.float32) bn_mov_var_dn = np.array(downsample_dict['block_1']['bn']['running_var'], dtype = np.float32) bn_gamma_dn = np.array(downsample_dict['block_1']['bn']['weight'], dtype = np.float32) bn_beta_dn = np.array(downsample_dict['block_1']['bn']['bias'], dtype = np.float32) W_conv_dn = init_weights(W_dn, "downsample_%d"%(layer_no), False) # with tf.name_scope("downsample_layer_%d_%d"%(layer_no,b_no)): res = conv2d_batchnorm(res, W_conv_dn, "layer_dn_%d"%(layer_no), self.phase, bn_beta_dn, bn_gamma_dn, bn_mov_mean_dn, bn_mov_var_dn, [1,2,2,1], False) print("downsample_layer_%d_%d_1"%(layer_no,b_no), res.shape) out1 = tf.nn.relu(out1 + res) else: out1 = 
tf.nn.relu(out1)

            # with tf.name_scope("layer_%d_%d_2"%(layer_no,b_no)):
            out2 = conv2d_batchnorm(out1, W_conv2, "layer_%d_%d_2"%(layer_no,b_no), self.phase, bn_beta2, bn_gamma2, bn_mov_mean2, bn_mov_var2, [1,stride[1],stride[1],1], True)
            print("layer_%d_%d_2"%(layer_no,b_no), out2.shape)

            cur = out2

        return cur

================================================
FILE: code/config_res.py
================================================
import numpy as np

# Important paths
# resnet_params_path: path containing .pkl file of RESNET18_BN weights
# dataset_path_full: path to parser file
# checkpoint_path: dir where checkpoints are saved and loaded from
# training_imgs_path: dir to save training progress frames (spatial transformer outputs during training)
# validation_imgs_path: dir to save validation progress frames (spatial transformer outputs during validation)

paths = dict(
    resnet_params_path = "../Extrinsic_Calibration_2/parameters.json",
    dataset_path_full = "/home/ganeshiyer/Extrinsic_Calibration_1.5/model_with_emd/rotation_results/parsed_set.txt",
    checkpoint_path = "/tmp/ganesh_saved_Weights/Checkpoint_simple_transformer",
    training_imgs_path = "/tmp/ganesh_saved_Weights/training_imgs",
    validation_imgs_path = "/tmp/ganesh_saved_Weights/validation_imgs"
)

# Depth Map parameters
# IMG_HT: input image height
# IMG_WDT: input image width

depth_img_params = dict(
    IMG_HT = 375,
    IMG_WDT = 1242
)

# Camera parameters
# The current parameters are for the training set created from all the raw
# sequences recorded on 2011_09_26: http://www.cvlibs.net/datasets/kitti/raw_data.php
# Camera parameters change based on the date of the sequence, hence they are
# kept in the config
# fx: focal length x
# fy: focal length y
# cx: camera center cx
# cy: camera center cy
# cam_transform_02: Transform to color camera 02 (applied after transform)
# cam_transform_02_inv: Inverse translation of above

camera_params = dict(
    fx = 7.215377e+02,
    fy = 7.215377e+02,
    cx = 6.095593e+02,
    cy = 1.728540e+02,

    cam_transform_02 = np.array([1.0, 0.0, 0.0, (-4.485728e+01)/7.215377e+02,
                                 0.0, 1.0, 0.0, (-2.163791e-01)/7.215377e+02,
                                 0.0, 0.0, 1.0, -2.745884e-03,
                                 0.0, 0.0, 0.0, 1.0]).reshape(4,4),

    cam_transform_02_inv = np.array([1.0, 0.0, 0.0, (4.485728e+01)/7.215377e+02,
                                     0.0, 1.0, 0.0, (2.163791e-01)/7.215377e+02,
                                     0.0, 0.0, 1.0, 2.745884e-03,
                                     0.0, 0.0, 0.0, 1.0]).reshape(4,4)
)

# Network and Training Parameters
# batch_size: batch size used during training
# total_frames: total instances (check parsed_set.txt for total number of lines)
# total_frames_train: total training instances
# total_frames_validation: total validation instances
# partition_limit: partition size of the total dataset loaded into memory during training
# epochs: total number of epochs
# learning_rate
# beta1: momentum term for Adam Optimizer
# load_epoch: load checkpoint no. 0 at the start of training (can be changed to resume training)

net_params = dict(
    batch_size = 20,
    total_frames = 30000,
    total_frames_train = 24000,
    total_frames_validation = 6000,
    partition_limit = 1200,
    epochs = 40,
    learning_rate = 5e-4,
    beta1 = 0.9,
    load_epoch = 0
)

================================================
FILE: code/dataset_files/dataset_build_color.py
================================================
"""
For sequence: 2011_09_26
"""

import numpy as np
import matplotlib.pyplot as plt
import os
import glob
import argparse
from natsort import natsorted as ns
from skimage import io
import scipy.misc as smc

plt.ion()

IMG_HT = 375
IMG_WDT = 1242

fx = 7.215377e+02
fy = 7.215377e+02
cx = 6.095593e+02
cy = 1.728540e+02

K = np.array([7.215377e+02, 0.000000e+00, 6.095593e+02,
              0.000000e+00, 7.215377e+02, 1.728540e+02,
              0.000000e+00, 0.000000e+00, 1.000000e+00]).reshape(3,3)

velo_to_cam_R = np.array([7.533745e-03, -9.999714e-01, -6.166020e-04,
                          1.480249e-02, 7.280733e-04, -9.998902e-01,
                          9.998621e-01, 7.523790e-03, 1.480755e-02]).reshape(3,3)

velo_to_cam_T = np.array([-4.069766e-03, -7.631618e-02, -2.717806e-01]).reshape(3,1)

velo_to_cam = np.vstack((np.hstack((velo_to_cam_R, velo_to_cam_T)), np.array([[0,0,0,1]])))

R_rect_00 = np.array([9.999239e-01, 9.837760e-03, -7.445048e-03, 0.0,
                      -9.869795e-03, 9.999421e-01, -4.278459e-03, 0.0,
                      7.402527e-03, 4.351614e-03, 9.999631e-01, 0.0,
                      0.0, 0.0, 0.0, 1.0]).reshape(4,4)

cam_02_transform = np.array([1.0, 0.0, 0.0, 4.485728e+01/fx,
                             0.0, 1.0, 0.0, 2.163791e-01/fy,
                             0.0, 0.0, 1.0, 2.745884e-03,
                             0.0, 0.0, 0.0, 1.0]).reshape(4,4)

parser = argparse.ArgumentParser(description="Create Lidar Dataset")
parser.add_argument("path", help="path_to_folder, end with number", type=str)
args = parser.parse_args()

# main_path = "/tmp/lidar_calibration_dataset/2011_09_26/2011_09_26_drive_0104"
main_path = args.path

def timestamp_sync(path):
    txt1 = np.loadtxt(path + "_extract/velodyne_points/timestamps.txt", dtype=str)
    txt2 = np.loadtxt(path 
+ "_sync/velodyne_points/timestamps.txt", dtype = str) file_list = ns(glob.glob(path + "_extract/velodyne_points/data/*.txt")) times1 = txt1[:,1] times2 = txt2[:,1] for idx in range(times1.shape[0]): times1[idx] = times1[idx].split(":")[2] for idx in range(times2.shape[0]): times2[idx] = times2[idx].split(":")[2] start_pt = times2[0] end_pt = times2[-1] index_start = np.where(times1 == start_pt)[0][0] index_end = np.where(times1 == end_pt)[0][0] return file_list[index_start:index_end+1] if not os.path.exists(main_path + "_sync/depth_maps"): os.makedirs(main_path + "_sync/depth_maps") if not os.path.exists(main_path + "_sync/target_imgs"): os.makedirs(main_path + "_sync/target_imgs") if not os.path.exists(main_path + "_sync/depth_maps_transformed"): os.makedirs(main_path + "_sync/depth_maps_transformed") depth_maps_folder = main_path + "_sync/depth_maps" target_img_folder = main_path + "_sync/target_imgs" depth_maps_transformed_folder = main_path + "_sync/depth_maps_transformed" point_files = timestamp_sync(main_path) imgs_files = ns(glob.glob(main_path + "_sync/image_02/data/*.png")) angle_limit = 0.34722965035593395/1.25 tr_limit = 0.34722965035593395/1.25 angle_list = np.zeros((1,16), dtype = np.float32) for img_name, cloud_name in zip(imgs_files, point_files): print(img_name, cloud_name) omega_x = angle_limit*np.random.random_sample() - (angle_limit/2.0) omega_y = angle_limit*np.random.random_sample() - (angle_limit/2.0) omega_z = angle_limit*np.random.random_sample() - (angle_limit/2.0) tr_x = tr_limit*np.random.random_sample() - (tr_limit/2.0) tr_y = tr_limit*np.random.random_sample() - (tr_limit/2.0) tr_z = tr_limit*np.random.random_sample() - (tr_limit/2.0) theta = np.sqrt(omega_x**2 + omega_y**2 + omega_z**2) omega_cross = np.array([0.0, -omega_z, omega_y, omega_z, 0.0, -omega_x, -omega_y, omega_x, 0.0]).reshape(3,3) A = np.sin(theta)/theta B = (1.0 - np.cos(theta))/(theta**2) R = np.eye(3,3) + A*omega_cross + B*np.matmul(omega_cross, omega_cross) T = 
np.array([tr_x, tr_y, tr_z]).reshape(3,1) random_transform = np.vstack((np.hstack((R, T)), np.array([[0.0, 0.0, 0.0, 1.0]]))) to_write_tr = np.expand_dims(np.ndarray.flatten(random_transform), 0) angle_list = np.vstack((angle_list, to_write_tr)) points = np.loadtxt(cloud_name) points = points[:,:3] ones_col = np.ones(shape=(points.shape[0],1)) points = np.hstack((points,ones_col)) current_img = smc.imread(img_name) # current_img_color = smc.imread(color_name) img = smc.imread(img_name) img_ht = img.shape[0] img_wdt = img.shape[1] points_in_cam_axis = np.matmul(R_rect_00, (np.matmul(velo_to_cam, points.T))) transformed_points = np.matmul(random_transform, points_in_cam_axis) # transformed_points = transformed_points[:-1,:] points_2d = np.matmul(K, np.matmul(cam_02_transform, transformed_points)[:-1,:]) # points_2d = np.matmul(K, points_in_cam_axis[:-1,:]) Z = points_2d[2,:] x = (points_2d[0,:]/Z).T y = (points_2d[1,:]/Z).T x = np.clip(x, 0.0, img_wdt - 1) y = np.clip(y, 0.0, img_ht - 1) reprojected_img = np.zeros_like(img) for x_idx, y_idx,z_idx in zip(x,y,Z): if(z_idx>0): reprojected_img[int(y_idx), int(x_idx)] = z_idx smc.imsave(depth_maps_transformed_folder + "/" + img_name[-14:], reprojected_img) points_2d = np.matmul(K, np.matmul(cam_02_transform, points_in_cam_axis)[:-1,:]) Z = points_2d[2,:] x = (points_2d[0,:]/Z).T y = (points_2d[1,:]/Z).T x = np.clip(x, 0.0, img_wdt - 1) y = np.clip(y, 0.0, img_ht - 1) reprojected_img = np.zeros_like(img) for x_idx, y_idx,z_idx in zip(x,y,Z): if(z_idx>0): reprojected_img[int(y_idx), int(x_idx)] = z_idx pooled_img = reprojected_img print(img_name[-14:]) reconstructed_img = current_img*(pooled_img>0.) 
    smc.imsave(depth_maps_folder + "/" + img_name[-14:], pooled_img)
    smc.imsave(target_img_folder + "/" + img_name[-14:], reconstructed_img)

np.savetxt(depth_maps_transformed_folder + "/../angle_list.txt", angle_list[1:], fmt = "%.4f")

================================================ FILE: code/dataset_files/dataset_build_color_2.py ================================================

"""
For sequence: 2011_09_26
"""

import numpy as np
import matplotlib.pyplot as plt
import os
import glob
import argparse
from natsort import natsorted as ns
from skimage import io
import scipy.misc as smc

plt.ion()

IMG_HT = 375
IMG_WDT = 1242

fx = 7.215377e+02
fy = 7.215377e+02
cx = 6.095593e+02
cy = 1.728540e+02

K = np.array([7.215377e+02, 0.000000e+00, 6.095593e+02,
              0.000000e+00, 7.215377e+02, 1.728540e+02,
              0.000000e+00, 0.000000e+00, 1.000000e+00]).reshape(3,3)

velo_to_cam_R = np.array([7.533745e-03, -9.999714e-01, -6.166020e-04,
                          1.480249e-02, 7.280733e-04, -9.998902e-01,
                          9.998621e-01, 7.523790e-03, 1.480755e-02]).reshape(3,3)
velo_to_cam_T = np.array([-4.069766e-03, -7.631618e-02, -2.717806e-01]).reshape(3,1)
velo_to_cam = np.vstack((np.hstack((velo_to_cam_R, velo_to_cam_T)), np.array([[0,0,0,1]])))

R_rect_00 = np.array([9.999239e-01, 9.837760e-03, -7.445048e-03, 0.0,
                      -9.869795e-03, 9.999421e-01, -4.278459e-03, 0.0,
                      7.402527e-03, 4.351614e-03, 9.999631e-01, 0.0,
                      0.0, 0.0, 0.0, 1.0]).reshape(4,4)

cam_02_transform = np.array([1.0, 0.0, 0.0, 4.485728e+01/fx,
                             0.0, 1.0, 0.0, 2.163791e-01/fy,
                             0.0, 0.0, 1.0, 2.745884e-03,
                             0.0, 0.0, 0.0, 1.0]).reshape(4,4)

parser = argparse.ArgumentParser(description="Create Lidar Dataset")
parser.add_argument("path", help = "path_to_folder, end with number", type = str)
args = parser.parse_args()

# main_path = "/tmp/lidar_calibration_dataset/2011_09_26/2011_09_26_drive_0104"
main_path = args.path

def timestamp_sync(path):
    txt1 = np.loadtxt(path + "_extract/velodyne_points/timestamps.txt", dtype = str)
    txt2 = np.loadtxt(path + "_sync/velodyne_points/timestamps.txt", dtype = str)

    file_list = ns(glob.glob(path + "_extract/velodyne_points/data/*.txt"))

    times1 = txt1[:,1]
    times2 = txt2[:,1]

    for idx in range(times1.shape[0]):
        times1[idx] = times1[idx].split(":")[2]

    for idx in range(times2.shape[0]):
        times2[idx] = times2[idx].split(":")[2]

    start_pt = times2[0]
    end_pt = times2[-1]

    index_start = np.where(times1 == start_pt)[0][0]
    index_end = np.where(times1 == end_pt)[0][0]

    return file_list[index_start:index_end+1]

if not os.path.exists(main_path + "_sync/depth_maps_2"):
    os.makedirs(main_path + "_sync/depth_maps_2")
if not os.path.exists(main_path + "_sync/target_imgs_2"):
    os.makedirs(main_path + "_sync/target_imgs_2")
if not os.path.exists(main_path + "_sync/depth_maps_transformed_2"):
    os.makedirs(main_path + "_sync/depth_maps_transformed_2")

depth_maps_folder = main_path + "_sync/depth_maps_2"
target_img_folder = main_path + "_sync/target_imgs_2"
depth_maps_transformed_folder = main_path + "_sync/depth_maps_transformed_2"

point_files = timestamp_sync(main_path)
imgs_files = ns(glob.glob(main_path + "_sync/image_02/data/*.png"))

angle_limit = 0.34722965035593395/2.50
tr_limit = 0.34722965035593395*1.5
angle_list = np.zeros((1,16), dtype = np.float32)

for img_name, cloud_name in zip(imgs_files, point_files):
    print(img_name, cloud_name)

    # Sample a random perturbation: axis-angle vector (omega) and translation (tr)
    omega_x = angle_limit*np.random.random_sample() - (angle_limit/2.0)
    omega_y = angle_limit*np.random.random_sample() - (angle_limit/2.0)
    omega_z = angle_limit*np.random.random_sample() - (angle_limit/2.0)
    tr_x = tr_limit*np.random.random_sample() - (tr_limit/2.0)
    tr_y = tr_limit*np.random.random_sample() - (tr_limit/2.0)
    tr_z = tr_limit*np.random.random_sample() - (tr_limit/2.0)

    # Rodrigues formula: map the axis-angle vector to a rotation matrix
    theta = np.sqrt(omega_x**2 + omega_y**2 + omega_z**2)
    omega_cross = np.array([0.0, -omega_z, omega_y,
                            omega_z, 0.0, -omega_x,
                            -omega_y, omega_x, 0.0]).reshape(3,3)
    A = np.sin(theta)/theta
    B = (1.0 - np.cos(theta))/(theta**2)
    R = np.eye(3,3) + A*omega_cross + B*np.matmul(omega_cross, omega_cross)
    T = np.array([tr_x, tr_y, tr_z]).reshape(3,1)

    random_transform = np.vstack((np.hstack((R, T)), np.array([[0.0, 0.0, 0.0, 1.0]])))
    to_write_tr = np.expand_dims(np.ndarray.flatten(random_transform), 0)
    angle_list = np.vstack((angle_list, to_write_tr))

    points = np.loadtxt(cloud_name)
    points = points[:,:3]
    ones_col = np.ones(shape=(points.shape[0],1))
    points = np.hstack((points,ones_col))

    current_img = smc.imread(img_name)
    # current_img_color = smc.imread(color_name)
    img = smc.imread(img_name)
    img_ht = img.shape[0]
    img_wdt = img.shape[1]

    # Project the perturbed point cloud to the image plane
    points_in_cam_axis = np.matmul(R_rect_00, (np.matmul(velo_to_cam, points.T)))
    transformed_points = np.matmul(random_transform, points_in_cam_axis)
    # transformed_points = transformed_points[:-1,:]
    points_2d = np.matmul(K, np.matmul(cam_02_transform, transformed_points)[:-1,:])
    # points_2d = np.matmul(K, points_in_cam_axis[:-1,:])

    Z = points_2d[2,:]
    x = (points_2d[0,:]/Z).T
    y = (points_2d[1,:]/Z).T
    x = np.clip(x, 0.0, img_wdt - 1)
    y = np.clip(y, 0.0, img_ht - 1)

    reprojected_img = np.zeros_like(img)
    for x_idx, y_idx, z_idx in zip(x,y,Z):
        if(z_idx>0):
            reprojected_img[int(y_idx), int(x_idx)] = z_idx

    smc.imsave(depth_maps_transformed_folder + "/" + img_name[-14:], reprojected_img)

    # Project the unperturbed cloud for the target depth map
    points_2d = np.matmul(K, np.matmul(cam_02_transform, points_in_cam_axis)[:-1,:])

    Z = points_2d[2,:]
    x = (points_2d[0,:]/Z).T
    y = (points_2d[1,:]/Z).T
    x = np.clip(x, 0.0, img_wdt - 1)
    y = np.clip(y, 0.0, img_ht - 1)

    reprojected_img = np.zeros_like(img)
    for x_idx, y_idx, z_idx in zip(x,y,Z):
        if(z_idx>0):
            reprojected_img[int(y_idx), int(x_idx)] = z_idx

    pooled_img = reprojected_img
    print(img_name[-14:])
    print(np.max(pooled_img), np.min(pooled_img))

    reconstructed_img = current_img*(pooled_img>0.)

    smc.imsave(depth_maps_folder + "/" + img_name[-14:], pooled_img)
    smc.imsave(target_img_folder + "/" + img_name[-14:], reconstructed_img)

np.savetxt(depth_maps_transformed_folder + "/../angle_list_2.txt", angle_list[1:], fmt = "%.4f")

================================================ FILE: code/dataset_files/dataset_builder_parallel.sh ================================================

SCRIPTPATH="$( cd "$(dirname "$0")" ; pwd -P )"

array=( 0001 0002 0005 0009 0011 0013 0014 0015 0017 0018 0019 0020 0022 0023 0027 0028 0029 0032 0035 0036 0039 0046 0048 0051 0052 0056 0057 0059 0060 0061 0064 0070 0079 0084 0086 0087 0091 0093 0095 0096 0101 0104 0106 0113 0117 )

task() {
    wget -k -c https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_09_26_drive_$1/2011_09_26_drive_$1_extract.zip
    wget -k -c https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_09_26_drive_$1/2011_09_26_drive_$1_sync.zip
    echo "Downloading $1"

    unzip 2011_09_26_drive_$1_extract.zip
    unzip 2011_09_26_drive_$1_sync.zip
    echo "Extracting $1"

    python -B $SCRIPTPATH/dataset_build_color.py $PWD/2011_09_26/2011_09_26_drive_$1
    python -B $SCRIPTPATH/dataset_build_color_2.py $PWD/2011_09_26/2011_09_26_drive_$1
    echo "Perturbing Depth Maps for $1"
}

# Run at most N download/build jobs in parallel
N=4
(
for id in {0..44}
do
    ((i=i%N)); ((i++==0)) && wait
    task ${array[$id]} &
done
)
wait
echo "All processes done!"
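The two dataset builder scripts above sample a random miscalibration by drawing an axis-angle vector omega and a translation uniformly, then mapping omega to a rotation with the Rodrigues formula R = I + (sin(theta)/theta)*[omega]x + ((1-cos(theta))/theta^2)*[omega]x^2. A standalone NumPy sketch of that sampling step is below; the helper name `random_se3` and the `rng` argument are illustrative, not names from the repository:

```python
import numpy as np

def random_se3(angle_limit, tr_limit, rng=np.random):
    """Sample a random SE(3) perturbation the way the dataset builders do:
    a uniform axis-angle vector and translation, mapped to a 4x4 transform
    via the Rodrigues formula. Sketch only; `random_se3`/`rng` are not
    repository names. As in the original scripts, theta == 0 is not handled."""
    omega = angle_limit * rng.random_sample(3) - angle_limit / 2.0
    tr = tr_limit * rng.random_sample(3) - tr_limit / 2.0

    theta = np.linalg.norm(omega)
    # Skew-symmetric cross-product matrix [omega]x
    wx = np.array([[0.0, -omega[2], omega[1]],
                   [omega[2], 0.0, -omega[0]],
                   [-omega[1], omega[0], 0.0]])
    A = np.sin(theta) / theta
    B = (1.0 - np.cos(theta)) / (theta ** 2)
    R = np.eye(3) + A * wx + B * wx.dot(wx)  # Rodrigues rotation

    # Assemble the homogeneous 4x4 transform [R | t; 0 0 0 1]
    return np.vstack((np.hstack((R, tr.reshape(3, 1))),
                      np.array([[0.0, 0.0, 0.0, 1.0]])))
```

Because the exponential map always lands on SO(3), the sampled rotation block is orthonormal with determinant +1 regardless of the draw, which is what makes the inverse transform in the loader a simple `np.linalg.inv`.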
================================================ FILE: code/dataset_files/parser.py ================================================

import numpy as np
import scipy.misc as smc
from natsort import natsorted as ns
import glob, os
import argparse

parser = argparse.ArgumentParser(description="Create Lidar Dataset Parser file")
parser.add_argument("path", help = "path_to_folder", type = str)
args = parser.parse_args()

dataset_path = args.path

# Picking up all sync folders
folder_names = ns(glob.glob(dataset_path + "*_sync" + os.path.sep))

dataset_array = np.zeros(dtype = str, shape = (1,20))
dataset_array_2 = np.zeros(dtype = str, shape = (1,20))

for fn in folder_names:
    print(fn)

    file_names_source = ns(glob.glob(fn + "depth_maps_transformed/*.png"))
    file_names_target = ns(glob.glob(fn + "depth_maps/*.png"))
    img_source = ns(glob.glob(fn + "image_02/data/*.png"))
    img_target = ns(glob.glob(fn + "image_03/data/*.png"))
    transforms_list = np.loadtxt(fn + "angle_list.txt", dtype = str)

    file_names_source = np.array(file_names_source, dtype=str).reshape(-1,1)
    file_names_target = np.array(file_names_target, dtype=str).reshape(-1,1)
    img_source = np.array(img_source, dtype=str).reshape(-1,1)
    img_target = np.array(img_target, dtype=str).reshape(-1,1)

    dataset = np.hstack((file_names_source, file_names_target, img_source, img_target, transforms_list))
    print(dataset.shape)
    dataset_array = np.vstack((dataset_array, dataset))

    #######################################################################################

    file_names_source_2 = ns(glob.glob(fn + "depth_maps_transformed_2/*.png"))
    file_names_target_2 = ns(glob.glob(fn + "depth_maps_2/*.png"))
    transforms_list_2 = np.loadtxt(fn + "angle_list_2.txt", dtype = str)

    file_names_source_2 = np.array(file_names_source_2, dtype=str).reshape(-1,1)
    file_names_target_2 = np.array(file_names_target_2, dtype=str).reshape(-1,1)

    dataset_2 = np.hstack((file_names_source_2, file_names_target_2, img_source, img_target, transforms_list_2))
    print(dataset_2.shape)
    dataset_array_2 = np.vstack((dataset_array_2, dataset_2))

# Drop the all-zero seed rows
dataset_array = dataset_array[1:]
dataset_array_2 = dataset_array_2[1:]

final_array = np.vstack((dataset_array, dataset_array_2))
np.random.shuffle(final_array)

np.savetxt("parsed_set.txt", final_array, fmt = "%s", delimiter=' ')

================================================ FILE: code/model_utils.py ================================================

import os
import tensorflow as tf

from tf_ops.emd import tf_auctionmatch
from tf_ops.CD import tf_nndistance
from tf_ops.sampling import tf_sampling
from tf_ops.grouping.tf_grouping import query_ball_point, group_point

def pre_load_checkpoint(checkpoint_dir):
    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
        # print(" [*] Reading checkpoint from {}".format(ckpt.model_checkpoint_path))
        epoch_step = int(os.path.basename(ckpt.model_checkpoint_path).split('-')[1])
        return epoch_step, ckpt.model_checkpoint_path
    else:
        return 0, None

def get_repulsion_loss4(pred, nsample=20, radius=0.07):
    # pred: (batch_size, npoint, 3)
    idx, pts_cnt = query_ball_point(radius, nsample, pred, pred)
    tf.summary.histogram('smooth/unque_index', pts_cnt)

    grouped_pred = group_point(pred, idx)  # (batch_size, npoint, nsample, 3)
    grouped_pred -= tf.expand_dims(pred, 2)

    # get the uniform loss
    h = 0.03
    dist_square = tf.reduce_sum(grouped_pred ** 2, axis=-1)
    dist_square, idx = tf.nn.top_k(-dist_square, 5)
    dist_square = -dist_square[:, :, 1:]  # remove the first one (the point itself)
    dist_square = tf.maximum(1e-12, dist_square)
    dist = tf.sqrt(dist_square)
    weight = tf.exp(-dist_square/h**2)
    uniform_loss = tf.reduce_mean(radius - dist*weight)
    return uniform_loss

def get_emd_loss(pred, gt):
    """ pred: BxNxC, label: BxN """
    batch_size = pred.get_shape()[0].value
    matchl_out, matchr_out = tf_auctionmatch.auction_match(pred, gt)
    matched_out = tf_sampling.gather_point(gt, matchl_out)
    dist = tf.reshape((pred - matched_out) ** 2, shape=(batch_size, -1))
    emd_loss = tf.reduce_sum(dist)
    return emd_loss

def get_cd_loss(pred, gt):
    """ pred: BxNxC, label: BxN """
    # dists_forward is, for each element in gt, the closest distance to that element
    dists_forward, _, dists_backward, _ = tf_nndistance.nn_distance(gt, pred)
    CD_dist = tf.reduce_sum(dists_forward) + tf.reduce_sum(dists_backward)
    return CD_dist

if __name__ == '__main__':
    gt = tf.constant([[[1,0,0],[2,0,0],[3,0,0],[4,0,0]]], tf.float32)
    pred = tf.constant([[[-10,0,0], [1,0,0], [2,0,0], [3,0,0]]], tf.float32)
    dists_forward, idx1, dists_backward, idx2 = tf_nndistance.nn_distance(gt, pred)
    with tf.Session() as sess:
        print(idx1.eval())  # for each element in gt, the idx of the nearest element in pred
        print(idx2.eval())  # for each element in pred, the idx of the nearest element in gt
        print(dists_forward.eval())
        print(dists_backward.eval())

================================================ FILE: code/nw_loader_color.py ================================================

import numpy as np
import glob, os, argparse
import scipy.misc as smc
from tqdm import tqdm
import matplotlib.pyplot as plt

import config_res as config

total = config.net_params['total_frames']
total_train = config.net_params['total_frames_train']
total_validation = config.net_params['total_frames_validation']
partition_limit = config.net_params['partition_limit']

IMG_HT = config.depth_img_params['IMG_HT']
IMG_WDT = config.depth_img_params['IMG_WDT']
batch_size = config.net_params['batch_size']

dataset = np.loadtxt(config.paths['dataset_path_full'], dtype = str)
dataset_train = dataset[:total_train]
dataset_validation = dataset[total_train:total]

def shuffle():
    np.random.shuffle(dataset_train)
    np.random.shuffle(dataset_validation)

def load(p_no, mode):
    if(mode == "train"):
        dataset_part = dataset_train[p_no*partition_limit:(p_no + 1)*partition_limit]
    elif(mode == "validation"):
        dataset_part = dataset_validation[p_no*partition_limit:(p_no + 1)*partition_limit]

    source_file_names = dataset_part[:,0]
    target_file_names = dataset_part[:,1]
    source_image_names = dataset_part[:,2]
    target_image_names = dataset_part[:,3]
    transforms = np.float32(dataset_part[:,4:])

    target_container = np.zeros((partition_limit, IMG_HT, IMG_WDT, 1), dtype = np.float32)
    source_container = np.zeros((partition_limit, IMG_HT, IMG_WDT, 1), dtype = np.float32)
    source_img_container = np.zeros((partition_limit, IMG_HT, IMG_WDT, 3), dtype = np.float32)
    target_img_container = np.zeros((partition_limit, IMG_HT, IMG_WDT, 3), dtype = np.float32)
    transforms_container = np.zeros((partition_limit, 4, 4), dtype = np.float32)

    c_idx = 0
    for s_name, t_name, img_source_name, img_target_name, transform in tqdm(zip(source_file_names, target_file_names, source_image_names, target_image_names, transforms)):
        # Depth maps: zero out a 5-pixel border and normalise
        warped_ip = np.float32(smc.imread(s_name, True))
        warped_ip[0:5,:] = 0.0
        warped_ip[:,0:5] = 0.0
        warped_ip[IMG_HT - 5:,:] = 0.0
        warped_ip[:,IMG_WDT - 5:] = 0.0
        warped_ip = (warped_ip - 40.0)/40.0
        source_container[c_idx, :, :, 0] = warped_ip

        target_ip = np.float32(smc.imread(t_name, True))
        target_ip[0:5,:] = 0.0
        target_ip[:,0:5] = 0.0
        target_ip[IMG_HT - 5:,:] = 0.0
        target_ip[:,IMG_WDT - 5:] = 0.0
        target_ip = (target_ip - 40.0)/40.0
        target_container[c_idx, :, :, 0] = target_ip

        # RGB images: same border masking, normalised to [-1, 1]
        source_img = np.float32(smc.imread(img_source_name))
        source_img[0:5,:,:] = 0.0
        source_img[:,0:5,:] = 0.0
        source_img[IMG_HT - 5:,:,:] = 0.0
        source_img[:,IMG_WDT - 5:,:] = 0.0
        source_img = (source_img - 127.5)/127.5
        source_img_container[c_idx, :, :, :] = source_img

        target_img = np.float32(smc.imread(img_target_name))
        target_img[0:5,:,:] = 0.0
        target_img[:,0:5,:] = 0.0
        target_img[IMG_HT - 5:,:,:] = 0.0
        target_img[:,IMG_WDT - 5:,:] = 0.0
        target_img = (target_img - 127.5)/127.5
        target_img_container[c_idx, :, :, :] = target_img

        transforms_container[c_idx, :, :] = np.linalg.inv(transform.reshape(4,4))
        c_idx += 1

    source_container = source_container.reshape(partition_limit/batch_size, batch_size, IMG_HT, IMG_WDT, 1)
    target_container = target_container.reshape(partition_limit/batch_size, batch_size, IMG_HT, IMG_WDT, 1)
    source_img_container = source_img_container.reshape(partition_limit/batch_size, batch_size, IMG_HT, IMG_WDT, 3)
    target_img_container = target_img_container.reshape(partition_limit/batch_size, batch_size, IMG_HT, IMG_WDT, 3)
    transforms_container = transforms_container.reshape(partition_limit/batch_size, batch_size, 4, 4)

    return source_container, target_container, source_img_container, target_img_container, transforms_container

================================================ FILE: code/tf_ops/CD/__init__.py ================================================

================================================ FILE: code/tf_ops/CD/makefile ================================================

nvcc = /usr/local/cuda-8.0/bin/nvcc
cudalib = /usr/local/cuda-8.0/lib64/
tensorflow = /home/ganeshiyer/tensorflow/lib/python2.7/site-packages/tensorflow/include

all: tf_nndistance_so.so render_balls_so.so
.PHONY : all

tf_nndistance_so.so: tf_nndistance_g.cu.o tf_nndistance.cpp
	g++ -std=c++11 tf_nndistance.cpp tf_nndistance_g.cu.o -o tf_nndistance_so.so -shared -fPIC -I $(tensorflow) -lcudart -L $(cudalib) -O2 -D_GLIBCXX_USE_CXX11_ABI=0

tf_nndistance_g.cu.o: tf_nndistance_g.cu
	$(nvcc) -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11 -c -o tf_nndistance_g.cu.o tf_nndistance_g.cu -I $(tensorflow) -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -O2

render_balls_so.so: render_balls_so.cpp
	g++ -std=c++11 render_balls_so.cpp -o render_balls_so.so -shared -fPIC -O2 -D_GLIBCXX_USE_CXX11_ABI=0

================================================ FILE: code/tf_ops/CD/render_balls_so.cpp ================================================

#include #include #include #include
using namespace std;

struct PointInfo{
    int x,y,z;
    float r,g,b;
};

extern "C"{

void render_ball(int h,int w,unsigned char * show,int n,int * xyzs,float * c0,float * c1,float * c2,int r){
    r=max(r,1);
    vector depth(h*w,-2100000000);
    vector pattern;
    for (int dx=-r;dx<=r;dx++)
        for (int dy=-r;dy<=r;dy++)
            if (dx*dx+dy*dy=h || y2<0 || y2>=w) &&
depth[x2*w+y2]input(0); const Tensor& xyz2_tensor=context->input(1); OP_REQUIRES(context,xyz1_tensor.dims()==3,errors::InvalidArgument("NnDistance requires xyz1 be of shape (batch,#points,3)")); OP_REQUIRES(context,xyz1_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistance only accepts 3d point set xyz1")); int b=xyz1_tensor.shape().dim_size(0); int n=xyz1_tensor.shape().dim_size(1); OP_REQUIRES(context,xyz2_tensor.dims()==3,errors::InvalidArgument("NnDistance requires xyz2 be of shape (batch,#points,3)")); OP_REQUIRES(context,xyz2_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistance only accepts 3d point set xyz2")); int m=xyz2_tensor.shape().dim_size(1); OP_REQUIRES(context,xyz2_tensor.shape().dim_size(0)==b,errors::InvalidArgument("NnDistance expects xyz1 and xyz2 have same batch size")); auto xyz1_flat=xyz1_tensor.flat(); const float * xyz1=&xyz1_flat(0); auto xyz2_flat=xyz2_tensor.flat(); const float * xyz2=&xyz2_flat(0); Tensor * dist1_tensor=NULL; Tensor * idx1_tensor=NULL; Tensor * dist2_tensor=NULL; Tensor * idx2_tensor=NULL; OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n},&dist1_tensor)); OP_REQUIRES_OK(context,context->allocate_output(1,TensorShape{b,n},&idx1_tensor)); auto dist1_flat=dist1_tensor->flat(); auto idx1_flat=idx1_tensor->flat(); OP_REQUIRES_OK(context,context->allocate_output(2,TensorShape{b,m},&dist2_tensor)); OP_REQUIRES_OK(context,context->allocate_output(3,TensorShape{b,m},&idx2_tensor)); auto dist2_flat=dist2_tensor->flat(); auto idx2_flat=idx2_tensor->flat(); float * dist1=&(dist1_flat(0)); int * idx1=&(idx1_flat(0)); float * dist2=&(dist2_flat(0)); int * idx2=&(idx2_flat(0)); nnsearch(b,n,m,xyz1,xyz2,dist1,idx1); nnsearch(b,m,n,xyz2,xyz1,dist2,idx2); } }; REGISTER_KERNEL_BUILDER(Name("NnDistance").Device(DEVICE_CPU), NnDistanceOp); class NnDistanceGradOp : public OpKernel{ public: explicit NnDistanceGradOp(OpKernelConstruction* context):OpKernel(context){} void Compute(OpKernelContext * 
context)override{ const Tensor& xyz1_tensor=context->input(0); const Tensor& xyz2_tensor=context->input(1); const Tensor& grad_dist1_tensor=context->input(2); const Tensor& idx1_tensor=context->input(3); const Tensor& grad_dist2_tensor=context->input(4); const Tensor& idx2_tensor=context->input(5); OP_REQUIRES(context,xyz1_tensor.dims()==3,errors::InvalidArgument("NnDistanceGrad requires xyz1 be of shape (batch,#points,3)")); OP_REQUIRES(context,xyz1_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistanceGrad only accepts 3d point set xyz1")); int b=xyz1_tensor.shape().dim_size(0); int n=xyz1_tensor.shape().dim_size(1); OP_REQUIRES(context,xyz2_tensor.dims()==3,errors::InvalidArgument("NnDistanceGrad requires xyz2 be of shape (batch,#points,3)")); OP_REQUIRES(context,xyz2_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistanceGrad only accepts 3d point set xyz2")); int m=xyz2_tensor.shape().dim_size(1); OP_REQUIRES(context,xyz2_tensor.shape().dim_size(0)==b,errors::InvalidArgument("NnDistanceGrad expects xyz1 and xyz2 have same batch size")); OP_REQUIRES(context,grad_dist1_tensor.shape()==(TensorShape{b,n}),errors::InvalidArgument("NnDistanceGrad requires grad_dist1 be of shape(batch,#points)")); OP_REQUIRES(context,idx1_tensor.shape()==(TensorShape{b,n}),errors::InvalidArgument("NnDistanceGrad requires idx1 be of shape(batch,#points)")); OP_REQUIRES(context,grad_dist2_tensor.shape()==(TensorShape{b,m}),errors::InvalidArgument("NnDistanceGrad requires grad_dist2 be of shape(batch,#points)")); OP_REQUIRES(context,idx2_tensor.shape()==(TensorShape{b,m}),errors::InvalidArgument("NnDistanceGrad requires idx2 be of shape(batch,#points)")); auto xyz1_flat=xyz1_tensor.flat(); const float * xyz1=&xyz1_flat(0); auto xyz2_flat=xyz2_tensor.flat(); const float * xyz2=&xyz2_flat(0); auto idx1_flat=idx1_tensor.flat(); const int * idx1=&idx1_flat(0); auto idx2_flat=idx2_tensor.flat(); const int * idx2=&idx2_flat(0); auto 
grad_dist1_flat=grad_dist1_tensor.flat(); const float * grad_dist1=&grad_dist1_flat(0); auto grad_dist2_flat=grad_dist2_tensor.flat(); const float * grad_dist2=&grad_dist2_flat(0); Tensor * grad_xyz1_tensor=NULL; OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n,3},&grad_xyz1_tensor)); Tensor * grad_xyz2_tensor=NULL; OP_REQUIRES_OK(context,context->allocate_output(1,TensorShape{b,m,3},&grad_xyz2_tensor)); auto grad_xyz1_flat=grad_xyz1_tensor->flat(); float * grad_xyz1=&grad_xyz1_flat(0); auto grad_xyz2_flat=grad_xyz2_tensor->flat(); float * grad_xyz2=&grad_xyz2_flat(0); for (int i=0;iinput(0); const Tensor& xyz2_tensor=context->input(1); OP_REQUIRES(context,xyz1_tensor.dims()==3,errors::InvalidArgument("NnDistance requires xyz1 be of shape (batch,#points,3)")); OP_REQUIRES(context,xyz1_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistance only accepts 3d point set xyz1")); int b=xyz1_tensor.shape().dim_size(0); int n=xyz1_tensor.shape().dim_size(1); OP_REQUIRES(context,xyz2_tensor.dims()==3,errors::InvalidArgument("NnDistance requires xyz2 be of shape (batch,#points,3)")); OP_REQUIRES(context,xyz2_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistance only accepts 3d point set xyz2")); int m=xyz2_tensor.shape().dim_size(1); OP_REQUIRES(context,xyz2_tensor.shape().dim_size(0)==b,errors::InvalidArgument("NnDistance expects xyz1 and xyz2 have same batch size")); auto xyz1_flat=xyz1_tensor.flat(); const float * xyz1=&xyz1_flat(0); auto xyz2_flat=xyz2_tensor.flat(); const float * xyz2=&xyz2_flat(0); Tensor * dist1_tensor=NULL; Tensor * idx1_tensor=NULL; Tensor * dist2_tensor=NULL; Tensor * idx2_tensor=NULL; OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n},&dist1_tensor)); OP_REQUIRES_OK(context,context->allocate_output(1,TensorShape{b,n},&idx1_tensor)); auto dist1_flat=dist1_tensor->flat(); auto idx1_flat=idx1_tensor->flat(); OP_REQUIRES_OK(context,context->allocate_output(2,TensorShape{b,m},&dist2_tensor)); 
OP_REQUIRES_OK(context,context->allocate_output(3,TensorShape{b,m},&idx2_tensor)); auto dist2_flat=dist2_tensor->flat(); auto idx2_flat=idx2_tensor->flat(); float * dist1=&(dist1_flat(0)); int * idx1=&(idx1_flat(0)); float * dist2=&(dist2_flat(0)); int * idx2=&(idx2_flat(0)); NmDistanceKernelLauncher(b,n,xyz1,m,xyz2,dist1,idx1,dist2,idx2); } }; REGISTER_KERNEL_BUILDER(Name("NnDistance").Device(DEVICE_GPU), NnDistanceGpuOp); void NmDistanceGradKernelLauncher(int b,int n,const float * xyz1,int m,const float * xyz2,const float * grad_dist1,const int * idx1,const float * grad_dist2,const int * idx2,float * grad_xyz1,float * grad_xyz2); class NnDistanceGradGpuOp : public OpKernel{ public: explicit NnDistanceGradGpuOp(OpKernelConstruction* context):OpKernel(context){} void Compute(OpKernelContext * context)override{ const Tensor& xyz1_tensor=context->input(0); const Tensor& xyz2_tensor=context->input(1); const Tensor& grad_dist1_tensor=context->input(2); const Tensor& idx1_tensor=context->input(3); const Tensor& grad_dist2_tensor=context->input(4); const Tensor& idx2_tensor=context->input(5); OP_REQUIRES(context,xyz1_tensor.dims()==3,errors::InvalidArgument("NnDistanceGrad requires xyz1 be of shape (batch,#points,3)")); OP_REQUIRES(context,xyz1_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistanceGrad only accepts 3d point set xyz1")); int b=xyz1_tensor.shape().dim_size(0); int n=xyz1_tensor.shape().dim_size(1); OP_REQUIRES(context,xyz2_tensor.dims()==3,errors::InvalidArgument("NnDistanceGrad requires xyz2 be of shape (batch,#points,3)")); OP_REQUIRES(context,xyz2_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistanceGrad only accepts 3d point set xyz2")); int m=xyz2_tensor.shape().dim_size(1); OP_REQUIRES(context,xyz2_tensor.shape().dim_size(0)==b,errors::InvalidArgument("NnDistanceGrad expects xyz1 and xyz2 have same batch size")); OP_REQUIRES(context,grad_dist1_tensor.shape()==(TensorShape{b,n}),errors::InvalidArgument("NnDistanceGrad requires 
grad_dist1 be of shape(batch,#points)")); OP_REQUIRES(context,idx1_tensor.shape()==(TensorShape{b,n}),errors::InvalidArgument("NnDistanceGrad requires idx1 be of shape(batch,#points)")); OP_REQUIRES(context,grad_dist2_tensor.shape()==(TensorShape{b,m}),errors::InvalidArgument("NnDistanceGrad requires grad_dist2 be of shape(batch,#points)")); OP_REQUIRES(context,idx2_tensor.shape()==(TensorShape{b,m}),errors::InvalidArgument("NnDistanceGrad requires idx2 be of shape(batch,#points)")); auto xyz1_flat=xyz1_tensor.flat(); const float * xyz1=&xyz1_flat(0); auto xyz2_flat=xyz2_tensor.flat(); const float * xyz2=&xyz2_flat(0); auto idx1_flat=idx1_tensor.flat(); const int * idx1=&idx1_flat(0); auto idx2_flat=idx2_tensor.flat(); const int * idx2=&idx2_flat(0); auto grad_dist1_flat=grad_dist1_tensor.flat(); const float * grad_dist1=&grad_dist1_flat(0); auto grad_dist2_flat=grad_dist2_tensor.flat(); const float * grad_dist2=&grad_dist2_flat(0); Tensor * grad_xyz1_tensor=NULL; OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n,3},&grad_xyz1_tensor)); Tensor * grad_xyz2_tensor=NULL; OP_REQUIRES_OK(context,context->allocate_output(1,TensorShape{b,m,3},&grad_xyz2_tensor)); auto grad_xyz1_flat=grad_xyz1_tensor->flat(); float * grad_xyz1=&grad_xyz1_flat(0); auto grad_xyz2_flat=grad_xyz2_tensor->flat(); float * grad_xyz2=&grad_xyz2_flat(0); NmDistanceGradKernelLauncher(b,n,xyz1,m,xyz2,grad_dist1,idx1,grad_dist2,idx2,grad_xyz1,grad_xyz2); } }; REGISTER_KERNEL_BUILDER(Name("NnDistanceGrad").Device(DEVICE_GPU), NnDistanceGradGpuOp); ================================================ FILE: code/tf_ops/CD/tf_nndistance.py ================================================ import os import tensorflow as tf from tensorflow.python.framework import ops BASE_DIR = os.path.dirname(os.path.abspath(__file__)) nn_distance_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_nndistance_so.so')) def nn_distance(xyz1,xyz2): ''' Computes the distance of nearest neighbors for a pair of point 
clouds
    input: xyz1: (batch_size, #points_1, 3)  the first point cloud
    input: xyz2: (batch_size, #points_2, 3)  the second point cloud
    output: dist1: (batch_size, #points_1)  distance from first to second
    output: idx1: (batch_size, #points_1)  nearest neighbor from first to second
    output: dist2: (batch_size, #points_2)  distance from second to first
    output: idx2: (batch_size, #points_2)  nearest neighbor from second to first
    '''
    return nn_distance_module.nn_distance(xyz1, xyz2)

#@tf.RegisterShape('NnDistance')
#def _nn_distance_shape(op):
#    shape1 = op.inputs[0].get_shape().with_rank(3)
#    shape2 = op.inputs[1].get_shape().with_rank(3)
#    return [tf.TensorShape([shape1.dims[0], shape1.dims[1]]), tf.TensorShape([shape1.dims[0], shape1.dims[1]]),
#            tf.TensorShape([shape2.dims[0], shape2.dims[1]]), tf.TensorShape([shape2.dims[0], shape2.dims[1]])]

@ops.RegisterGradient('NnDistance')
def _nn_distance_grad(op, grad_dist1, grad_idx1, grad_dist2, grad_idx2):
    xyz1 = op.inputs[0]
    xyz2 = op.inputs[1]
    idx1 = op.outputs[1]
    idx2 = op.outputs[3]
    return nn_distance_module.nn_distance_grad(xyz1, xyz2, grad_dist1, idx1, grad_dist2, idx2)

if __name__ == '__main__':
    import numpy as np
    import random
    import time
    from tensorflow.python.ops.gradient_checker import compute_gradient
    random.seed(100)
    np.random.seed(100)
    with tf.Session('') as sess:
        xyz1 = np.random.randn(32, 16384, 3).astype('float32')
        xyz2 = np.random.randn(32, 1024, 3).astype('float32')
        #with tf.device('/gpu:0'):
        if True:
            inp1 = tf.Variable(xyz1)
            inp2 = tf.constant(xyz2)
            reta, retb, retc, retd = nn_distance(inp1, inp2)
            loss = tf.reduce_sum(reta) + tf.reduce_sum(retc)
            train = tf.train.GradientDescentOptimizer(learning_rate=0.05).minimize(loss)
            sess.run(tf.initialize_all_variables())
            t0 = time.time()
            t1 = t0
            best = 1e100
            for i in xrange(100):
                trainloss, _ = sess.run([loss, train])
                newt = time.time()
                best = min(best, newt - t1)
                print i, trainloss, (newt - t0) / (i + 1), best
                t1 = newt
            #print sess.run([inp1,retb,inp2,retd])
            #grads = compute_gradient([inp1,inp2],[(16,32,3),(16,32,3)],loss,(1,),[xyz1,xyz2])
            #for i, j in grads:
            #print
i.shape,j.shape,np.mean(np.abs(i-j)),np.mean(np.abs(i)),np.mean(np.abs(j)) #for i in xrange(10): #t0=time.time() #a,b,c,d=sess.run([reta,retb,retc,retd],feed_dict={inp1:xyz1,inp2:xyz2}) #print 'time',time.time()-t0 #print a.shape,b.shape,c.shape,d.shape #print a.dtype,b.dtype,c.dtype,d.dtype #samples=np.array(random.sample(range(xyz2.shape[1]),100),dtype='int32') #dist1=((xyz1[:,samples,None,:]-xyz2[:,None,:,:])**2).sum(axis=-1).min(axis=-1) #idx1=((xyz1[:,samples,None,:]-xyz2[:,None,:,:])**2).sum(axis=-1).argmin(axis=-1) #print np.abs(dist1-a[:,samples]).max() #print np.abs(idx1-b[:,samples]).max() #dist2=((xyz2[:,samples,None,:]-xyz1[:,None,:,:])**2).sum(axis=-1).min(axis=-1) #idx2=((xyz2[:,samples,None,:]-xyz1[:,None,:,:])**2).sum(axis=-1).argmin(axis=-1) #print np.abs(dist2-c[:,samples]).max() #print np.abs(idx2-d[:,samples]).max() ================================================ FILE: code/tf_ops/CD/tf_nndistance_g.cu ================================================ #if GOOGLE_CUDA #define EIGEN_USE_GPU #include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor" __global__ void NmDistanceKernel(int b,int n,const float * xyz,int m,const float * xyz2,float * result,int * result_i){ const int batch=512; __shared__ float buf[batch*3]; for (int i=blockIdx.x;ibest){ result[(i*n+j)]=best; result_i[(i*n+j)]=best_i; } } __syncthreads(); } } } void NmDistanceKernelLauncher(int b,int n,const float * xyz,int m,const float * xyz2,float * result,int * result_i,float * result2,int * result2_i){ NmDistanceKernel<<>>(b,n,xyz,m,xyz2,result,result_i); NmDistanceKernel<<>>(b,m,xyz2,n,xyz,result2,result2_i); } __global__ void NmDistanceGradKernel(int b,int n,const float * xyz1,int m,const float * xyz2,const float * grad_dist1,const int * idx1,float * grad_xyz1,float * grad_xyz2){ for (int i=blockIdx.x;i>>(b,n,xyz1,m,xyz2,grad_dist1,idx1,grad_xyz1,grad_xyz2); NmDistanceGradKernel<<>>(b,m,xyz2,n,xyz1,grad_dist2,idx2,grad_xyz2,grad_xyz1); } #endif 
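Building `tf_nndistance_so.so` requires the CUDA toolchain, so the op is often unavailable when inspecting this code. For small inputs, the four outputs of `nn_distance` can be reproduced with plain NumPy. The sketch below is not part of the repository; `nn_distance_np` is a hypothetical helper name, and it assumes the op returns *squared* nearest-neighbor distances in each direction, as the brute-force checks in the `__main__` block above also compute.

```python
import numpy as np

def nn_distance_np(xyz1, xyz2):
    """Brute-force reference for the NnDistance op.

    xyz1: (b, n, 3), xyz2: (b, m, 3).
    Returns dist1 (b, n), idx1 (b, n), dist2 (b, m), idx2 (b, m),
    where dist* hold squared distances to the nearest neighbor in
    the other point cloud and idx* the corresponding indices.
    """
    # (b, n, m) matrix of pairwise squared distances
    d = ((xyz1[:, :, None, :] - xyz2[:, None, :, :]) ** 2).sum(axis=-1)
    dist1 = d.min(axis=2)    # nearest xyz2 point for each xyz1 point
    idx1 = d.argmin(axis=2)
    dist2 = d.min(axis=1)    # nearest xyz1 point for each xyz2 point
    idx2 = d.argmin(axis=1)
    return dist1, idx1, dist2, idx2
```

A symmetric Chamfer-style loss, matching the `loss` used in the self-test above, would then be `dist1.sum() + dist2.sum()`. This materializes the full (b, n, m) matrix, so it is only suitable for sanity checks, not the 16384-point clouds in the benchmark.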
================================================ FILE: code/tf_ops/__init__.py ================================================ ================================================ FILE: code/tf_ops/emd/__init__.py ================================================ ================================================ FILE: code/tf_ops/emd/tf_auctionmatch.cpp ================================================ #include "tensorflow/core/framework/op.h" #include "tensorflow/core/framework/op_kernel.h" #include "tensorflow/core/framework/shape_inference.h" #include "tensorflow/core/framework/common_shape_fns.h" #include #include #include using namespace tensorflow; REGISTER_OP("AuctionMatch") .Input("xyz1: float32") .Input("xyz2: float32") .Output("matchl: int32") .Output("matchr: int32") .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c){ ::tensorflow::shape_inference::ShapeHandle dims1; c->WithRank(c->input(0), 3, &dims1); ::tensorflow::shape_inference::ShapeHandle dims2; c->WithRank(c->input(1), 3, &dims2); ::tensorflow::shape_inference::ShapeHandle output1 = c->MakeShape({c->Dim(dims1, 0), c->Dim(dims1, 1)}); c->set_output(0, output1); ::tensorflow::shape_inference::ShapeHandle output2 = c->MakeShape({c->Dim(dims2, 0), c->Dim(dims2, 1)}); c->set_output(1, output2); return Status::OK(); }); void AuctionMatchLauncher(int b,int n,const float * xyz1,const float * xyz2,int * matchl,int * matchr,float * cost); class AuctionMatchGpuOp: public OpKernel{ public: explicit AuctionMatchGpuOp(OpKernelConstruction* context):OpKernel(context){} void Compute(OpKernelContext * context)override{ const Tensor& xyz1_tensor=context->input(0); OP_REQUIRES(context,xyz1_tensor.dims()==3 && xyz1_tensor.shape().dim_size(2)==3,errors::InvalidArgument("ApproxMatch expects (batch_size,num_points,3) xyz1 shape")); auto xyz1_flat=xyz1_tensor.flat(); const float * xyz1=&(xyz1_flat(0)); int b=xyz1_tensor.shape().dim_size(0); int n=xyz1_tensor.shape().dim_size(1); 
OP_REQUIRES(context,n<=4096,errors::InvalidArgument("AuctionMatch handles at most 4096 dataset points")); const Tensor& xyz2_tensor=context->input(1); OP_REQUIRES(context,xyz2_tensor.dims()==3 && xyz2_tensor.shape().dim_size(2)==3 && xyz2_tensor.shape().dim_size(0)==b && xyz2_tensor.shape().dim_size(1)==n,errors::InvalidArgument("AuctionMatch expects (batch_size,num_points,3) xyz2 shape, and shape must match with xyz1")); auto xyz2_flat=xyz2_tensor.flat(); const float * xyz2=&(xyz2_flat(0)); Tensor * matchl_tensor=NULL; OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n},&matchl_tensor)); auto matchl_flat=matchl_tensor->flat(); int * matchl=&(matchl_flat(0)); Tensor * matchr_tensor=NULL; OP_REQUIRES_OK(context,context->allocate_output(1,TensorShape{b,n},&matchr_tensor)); auto matchr_flat=matchr_tensor->flat(); int * matchr=&(matchr_flat(0)); Tensor temp_tensor; OP_REQUIRES_OK(context,context->allocate_temp(DataTypeToEnum::value,TensorShape{b,n,n},&temp_tensor)); auto temp_flat=temp_tensor.flat(); float * temp=&(temp_flat(0)); AuctionMatchLauncher(b,n,xyz1,xyz2,matchl,matchr,temp); } }; REGISTER_KERNEL_BUILDER(Name("AuctionMatch").Device(DEVICE_GPU), AuctionMatchGpuOp); ================================================ FILE: code/tf_ops/emd/tf_auctionmatch.py ================================================ import tensorflow as tf from tensorflow.python.framework import ops import sys import os BASE_DIR = os.path.dirname(os.path.abspath(__file__)) sys.path.append(BASE_DIR) auctionmatch_module = tf.load_op_library(os.path.join(BASE_DIR, 'tf_auctionmatch_so.so')) def auction_match(xyz1,xyz2): ''' input: xyz1 : batch_size * #points * 3 xyz2 : batch_size * #points * 3 returns: matchl : batch_size * #npoints matchr : batch_size * #npoints ''' print ("Here") return auctionmatch_module.auction_match(xyz1,xyz2) ops.NoGradient('AuctionMatch') # TF1.0 API requires set shape in C++ # @tf.RegisterShape('AuctionMatch') # def _auction_match_shape(op): # 
shape1=op.inputs[0].get_shape().with_rank(3) # shape2=op.inputs[1].get_shape().with_rank(3) # return [ # tf.TensorShape([shape1.dims[0],shape1.dims[1]]), # tf.TensorShape([shape2.dims[0],shape2.dims[1]]) # ] if __name__=='__main__': from tf_ops.grouping import tf_grouping from tf_ops.sampling import tf_sampling npoint=4096 xyz1_in=tf.placeholder(tf.float32,shape=(32,npoint,3)) xyz2_in=tf.placeholder(tf.float32,shape=(32,npoint,3)) matchl_out,matchr_out=auction_match(xyz1_in,xyz2_in) matched_out=tf_sampling.gather_point(xyz2_in,matchl_out) import numpy as np np.random.seed(100) xyz1=np.random.randn(32,npoint,3).astype('float32') xyz2=xyz1.copy()+np.random.randn(32,npoint,3)*0.01 for i in xrange(len(xyz2)): xyz2[i]=np.roll(xyz2[i],i,axis=0) with tf.Session('') as sess: ret=sess.run(matched_out,feed_dict={xyz1_in:xyz1,xyz2_in:xyz2}) print ((xyz1-ret)**2).mean() ================================================ FILE: code/tf_ops/emd/tf_auctionmatch_compile.sh ================================================ echo 'nvcc' /usr/local/cuda-8.0/bin/nvcc tf_auctionmatch_g.cu -o tf_auctionmatch_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_30 echo 'g++' g++ -std=c++11 tf_auctionmatch.cpp tf_auctionmatch_g.cu.o -o tf_auctionmatch_so.so -shared -fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -I /home/ganeshiyer/tensorflow/lib/python2.7/site-packages/tensorflow/include -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 ================================================ FILE: code/tf_ops/emd/tf_auctionmatch_g.cu ================================================ #include __global__ void AuctionMatchKernel(int b,int n,const float * __restrict__ xyz1,const float * __restrict__ xyz2,int * matchl,int * matchr,float * cost){ //this kernel handles up to 4096 points const int NMax=4096; __shared__ short Queue[NMax]; __shared__ short matchrbuf[NMax]; __shared__ float pricer[NMax]; __shared__ float bests[32][3]; __shared__ int qhead,qlen; const int BufLen=2048; 
__shared__ float buf[BufLen]; for (int bno=blockIdx.x;bno1; } int vj,vj2,vj3,vj4; if (value1=blockDim.x*4){ for (int j=threadIdx.x;j=blockDim.x*2){ for (int j=threadIdx.x;j0;i>>=1){ float b1=__shfl_down(best,i,32); float b2=__shfl_down(best2,i,32); int bj=__shfl_down(bestj,i,32); if (best>5][0]=best; bests[threadIdx.x>>5][1]=best2; *(int*)&bests[threadIdx.x>>5][2]=bestj; } __syncthreads(); int nn=blockDim.x>>5; if (threadIdx.x>1;i>0;i>>=1){ float b1=__shfl_down(best,i,32); float b2=__shfl_down(best2,i,32); int bj=__shfl_down(bestj,i,32); if (best=n) qhead-=n; int old=matchrbuf[bestj]; pricer[bestj]+=delta; cnt++; if (old!=-1){ int ql=qlen; int tail=qhead+ql; qlen=ql+1; if (tail>=n) tail-=n; Queue[tail]=old; } if (cnt==(40*n)){ if (tolerance==1.0) qlen=0; tolerance=fminf(1.0,tolerance*100); cnt=0; } } __syncthreads(); if (threadIdx.x==0){ matchrbuf[bestj]=i; } } __syncthreads(); for (int j=threadIdx.x;j>>(b,n,xyz1,xyz2,matchl,matchr,cost); } ================================================ FILE: code/tf_ops/grouping/__init__.py ================================================ ================================================ FILE: code/tf_ops/grouping/compile.sh ================================================ g++ query_ball_point.cpp -o query_ball_point nvcc query_ball_point.cu -o query_ball_point_cuda nvcc query_ball_point_block.cu -o query_ball_point_block nvcc query_ball_point_grid.cu -o query_ball_point_grid nvcc query_ball_point_grid_count.cu -o query_ball_point_grid_count g++ -Wall selection_sort.cpp -o selection_sort nvcc selection_sort.cu -o selection_sort_cuda ================================================ FILE: code/tf_ops/grouping/query_ball_point.cpp ================================================ #include #include #include // memset #include // rand, RAND_MAX #include // sqrtf #include #include using namespace std; float randomf(){ return (rand()+0.5)/(RAND_MAX+1.0); } static double get_time(){ timespec tp; clock_gettime(CLOCK_MONOTONIC,&tp); return 
tp.tv_sec+tp.tv_nsec*1e-9; } // input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3) // output: idx (b,m,nsample) void query_ball_point_cpu(int b, int n, int m, const float* radius, int nsample, const float *xyz1, const float *xyz2, int *idx) { for (int i=0;i #include #include // memset #include // rand, RAND_MAX #include // sqrtf #include #include #include "cuPrintf.cuh" #include "cuPrintf.cu" using namespace std; using namespace std; float randomf(){ return (rand()+0.5)/(RAND_MAX+1.0); } static double get_time(){ timespec tp; clock_gettime(CLOCK_MONOTONIC,&tp); return tp.tv_sec+tp.tv_nsec*1e-9; } // input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3) // output: idx (b,m,nsample) __global__ void query_ball_point_gpu(int b, int n, int m, const float* radius, int nsample, const float *xyz1, const float *xyz2, int *idx) { for (int i=0;i>>(b,n,m,radius,nsample,xyz1,xyz2,idx); cudaDeviceSynchronize(); printf("query_ball_point gpu time %f\n",get_time()-t0); t0=get_time(); group_point_gpu<<<1,1>>>(b,n,c,m,nsample,points,idx,out); cudaDeviceSynchronize(); printf("grou_point gpu time %f\n",get_time()-t0); t0=get_time(); group_point_grad_gpu<<<1,1>>>(b,n,c,m,nsample,grad_out,idx,grad_points); cudaDeviceSynchronize(); printf("grou_point_grad gpu time %f\n",get_time()-t0); cudaFree(xyz1); cudaFree(xyz2); cudaFree(points); cudaFree(idx); cudaFree(out); cudaFree(grad_out); cudaFree(grad_points); return 0; } ================================================ FILE: code/tf_ops/grouping/query_ball_point_block.cu ================================================ #include #include #include // memset #include // rand, RAND_MAX #include // sqrtf #include #include using namespace std; float randomf(){ return (rand()+0.5)/(RAND_MAX+1.0); } static double get_time(){ timespec tp; clock_gettime(CLOCK_MONOTONIC,&tp); return tp.tv_sec+tp.tv_nsec*1e-9; } // input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3) // output: idx (b,m,nsample) __global__ void 
query_ball_point_gpu(int b, int n, int m, const float *radius, int nsample, const float *xyz1, const float *xyz2, int *idx) { int index = threadIdx.x; xyz1 += n*3*index; xyz2 += m*3*index; idx += m*nsample*index; for (int j=0;j>>(b,n,m,radius,nsample,xyz1,xyz2,idx); cudaDeviceSynchronize(); printf("query_ball_point gpu time %f\n",get_time()-t0); t0=get_time(); group_point_gpu<<<1,b>>>(b,n,c,m,nsample,points,idx,out); cudaDeviceSynchronize(); printf("grou_point gpu time %f\n",get_time()-t0); t0=get_time(); group_point_grad_gpu<<<1,b>>>(b,n,c,m,nsample,grad_out,idx,grad_points); cudaDeviceSynchronize(); printf("grou_point_grad gpu time %f\n",get_time()-t0); cudaFree(xyz1); cudaFree(xyz2); cudaFree(points); cudaFree(idx); cudaFree(out); cudaFree(grad_out); cudaFree(grad_points); return 0; } ================================================ FILE: code/tf_ops/grouping/query_ball_point_grid.cu ================================================ #include #include #include // memset #include // rand, RAND_MAX #include // sqrtf #include #include using namespace std; float randomf(){ return (rand()+0.5)/(RAND_MAX+1.0); } static double get_time(){ timespec tp; clock_gettime(CLOCK_MONOTONIC,&tp); return tp.tv_sec+tp.tv_nsec*1e-9; } // input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3) // output: idx (b,m,nsample) __global__ void query_ball_point_gpu(int b, int n, int m, const float *radius, int nsample, const float *xyz1, const float *xyz2, int *idx) { int batch_index = blockIdx.x; xyz1 += n*3*batch_index; xyz2 += m*3*batch_index; idx += m*nsample*batch_index; int index = threadIdx.x; int stride = blockDim.x; for (int j=index;j>>(b,n,m,radius,nsample,xyz1,xyz2,idx); cudaDeviceSynchronize(); printf("query_ball_point gpu time %f\n",get_time()-t0); t0=get_time(); group_point_gpu<<>>(b,n,c,m,nsample,points,idx,out); cudaDeviceSynchronize(); printf("grou_point gpu time %f\n",get_time()-t0); t0=get_time(); group_point_grad_gpu<<>>(b,n,c,m,nsample,grad_out,idx,grad_points); 
cudaDeviceSynchronize(); printf("grou_point_grad gpu time %f\n",get_time()-t0); cudaFree(xyz1); cudaFree(xyz2); cudaFree(points); cudaFree(idx); cudaFree(out); cudaFree(grad_out); cudaFree(grad_points); return 0; } ================================================ FILE: code/tf_ops/grouping/selection_sort.cpp ================================================ #include #include #include // memset #include // rand, RAND_MAX #include // sqrtf #include #include using namespace std; float randomf(){ return (rand()+0.5)/(RAND_MAX+1.0); } static double get_time(){ timespec tp; clock_gettime(CLOCK_MONOTONIC,&tp); return tp.tv_sec+tp.tv_nsec*1e-9; } // input: k (1), distance matrix dist (b,m,n) // output: idx (b,m,n), val (b,m,n) void selection_sort_cpu(int b, int n, int m, int k, const float *dist, int *idx, float *val) { float *p_dist; float tmp; int tmpi; for (int i=0;i #include #include // memset #include // rand, RAND_MAX #include // sqrtf #include #include using namespace std; float randomf(){ return (rand()+0.5)/(RAND_MAX+1.0); } static double get_time(){ timespec tp; clock_gettime(CLOCK_MONOTONIC,&tp); return tp.tv_sec+tp.tv_nsec*1e-9; } // input: k (1), distance matrix dist (b,m,n) // output: idx (b,m,k), val (b,m,k) __global__ void selection_sort_gpu(int b, int n, int m, int k, float *dist, int *idx, float *val) { int batch_index = blockIdx.x; dist+=m*n*batch_index; idx+=m*k*batch_index; val+=m*k*batch_index; int index = threadIdx.x; int stride = blockDim.x; float *p_dist; for (int j=index;j>>(b,n,m,k,dist,idx,val); cudaDeviceSynchronize(); printf("selection sort cpu time %f\n",get_time()-t0); return 0; } ================================================ FILE: code/tf_ops/grouping/selection_sort_const.cu ================================================ #include #include #include // memset #include // rand, RAND_MAX #include // sqrtf #include #include using namespace std; float randomf(){ return (rand()+0.5)/(RAND_MAX+1.0); } static double get_time(){ timespec tp; 
clock_gettime(CLOCK_MONOTONIC,&tp); return tp.tv_sec+tp.tv_nsec*1e-9; } // input: k (1), distance matrix dist (b,m,n) // output: idx (b,m,n), dist_out (b,m,n) __global__ void selection_sort_gpu(int b, int n, int m, int k, const float *dist, int *outi, float *out) { int batch_index = blockIdx.x; dist+=m*n*batch_index; outi+=m*n*batch_index; out+=m*n*batch_index; int index = threadIdx.x; int stride = blockDim.x; // copy from dist to dist_out for (int j=index;j>>(b,n,m,k,dist,idx,dist_out); cudaDeviceSynchronize(); printf("selection sort cpu time %f\n",get_time()-t0); //for (int i=0;i #include #include // memset #include // rand, RAND_MAX #include // sqrtf #include #include "tensorflow/core/framework/op.h" #include "tensorflow/core/framework/op_kernel.h" #include "tensorflow/core/framework/shape_inference.h" #include "tensorflow/core/framework/common_shape_fns.h" #include using namespace tensorflow; REGISTER_OP("QueryBallPoint") .Attr("nsample: int") .Input("xyz1: float32") .Input("xyz2: float32") .Input("radius: float32") .Output("idx: int32") .Output("pts_cnt: int32") .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) { ::tensorflow::shape_inference::ShapeHandle dims2; // batch_size * npoint * 3 c->WithRank(c->input(1), 3, &dims2); int nsample; TF_RETURN_IF_ERROR(c->GetAttr("nsample", &nsample)); ::tensorflow::shape_inference::ShapeHandle output1 = c->MakeShape({c->Dim(dims2, 0), c->Dim(dims2, 1), nsample}); c->set_output(0, output1); ::tensorflow::shape_inference::ShapeHandle output2 = c->MakeShape({c->Dim(dims2, 0), c->Dim(dims2, 1)}); c->set_output(1, output2); return Status::OK(); }); REGISTER_OP("SelectionSort") .Attr("k: int") .Input("dist: float32") .Output("outi: int32") .Output("out: float32") .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) { c->set_output(0, c->input(0)); c->set_output(1, c->input(0)); return Status::OK(); }); REGISTER_OP("GroupPoint") .Input("points: float32") .Input("idx: int32") .Output("out: float32") 
.SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) { ::tensorflow::shape_inference::ShapeHandle dims1; // batch_size * ndataset * channels c->WithRank(c->input(0), 3, &dims1); ::tensorflow::shape_inference::ShapeHandle dims2; // batch_size * npoints * nsample c->WithRank(c->input(1), 3, &dims2); // batch_size * npoints * nsample * channels ::tensorflow::shape_inference::ShapeHandle output = c->MakeShape({c->Dim(dims2, 0), c->Dim(dims2, 1), c->Dim(dims2, 2), c->Dim(dims1, 2)}); c->set_output(0, output); return Status::OK(); }); REGISTER_OP("GroupPointGrad") .Input("points: float32") .Input("idx: int32") .Input("grad_out: float32") .Output("grad_points: float32") .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) { c->set_output(0, c->input(0)); return Status::OK(); }); void queryBallPointLauncher(int b, int n, int m, const float* radius, int nsample, const float *xyz1, const float *xyz2, int *idx, int *pts_cnt); class QueryBallPointGpuOp : public OpKernel { public: explicit QueryBallPointGpuOp(OpKernelConstruction* context) : OpKernel(context) { //OP_REQUIRES_OK(context, context->GetAttr("radius", &radius_)); //OP_REQUIRES(context, radius_ > 0, errors::InvalidArgument("QueryBallPoint expects positive radius")); OP_REQUIRES_OK(context, context->GetAttr("nsample", &nsample_)); OP_REQUIRES(context, nsample_ > 0, errors::InvalidArgument("QueryBallPoint expects positive nsample")); } void Compute(OpKernelContext* context) override { const Tensor& xyz1_tensor = context->input(0); OP_REQUIRES(context, xyz1_tensor.dims()==3 && xyz1_tensor.shape().dim_size(2)==3, errors::InvalidArgument("QueryBallPoint expects (batch_size, ndataset, 3) xyz1 shape.")); int b = xyz1_tensor.shape().dim_size(0); int n = xyz1_tensor.shape().dim_size(1); const Tensor& xyz2_tensor = context->input(1); OP_REQUIRES(context, xyz2_tensor.dims()==3 && xyz2_tensor.shape().dim_size(2)==3, errors::InvalidArgument("QueryBallPoint expects (batch_size, npoint, 3) xyz2 
shape.")); int m = xyz2_tensor.shape().dim_size(1); Tensor *idx_tensor = nullptr; OP_REQUIRES_OK(context, context->allocate_output(0, TensorShape{b,m,nsample_}, &idx_tensor)); Tensor *pts_cnt_tensor = nullptr; OP_REQUIRES_OK(context, context->allocate_output(1, TensorShape{b,m}, &pts_cnt_tensor)); const Tensor& radius_tensor = context->input(2); auto radius_flat = radius_tensor.flat(); const float *radius = &(radius_flat(0)); auto xyz1_flat = xyz1_tensor.flat(); const float *xyz1 = &(xyz1_flat(0)); auto xyz2_flat = xyz2_tensor.flat(); const float *xyz2 = &(xyz2_flat(0)); auto idx_flat = idx_tensor->flat(); int *idx = &(idx_flat(0)); auto pts_cnt_flat = pts_cnt_tensor->flat(); int *pts_cnt = &(pts_cnt_flat(0)); queryBallPointLauncher(b,n,m,radius,nsample_,xyz1,xyz2,idx,pts_cnt); } private: int nsample_; }; REGISTER_KERNEL_BUILDER(Name("QueryBallPoint").Device(DEVICE_GPU), QueryBallPointGpuOp); void selectionSortLauncher(int b, int n, int m, int k, const float *dist, int *outi, float *out); class SelectionSortGpuOp : public OpKernel { public: explicit SelectionSortGpuOp(OpKernelConstruction* context) : OpKernel(context) { OP_REQUIRES_OK(context, context->GetAttr("k", &k_)); OP_REQUIRES(context, k_ > 0, errors::InvalidArgument("SelectionSort expects positive k")); } void Compute(OpKernelContext* context) override { const Tensor& dist_tensor = context->input(0); OP_REQUIRES(context, dist_tensor.dims()==3, errors::InvalidArgument("SelectionSort expects (b,m,n) dist shape.")); int b = dist_tensor.shape().dim_size(0); int m = dist_tensor.shape().dim_size(1); int n = dist_tensor.shape().dim_size(2); Tensor *outi_tensor = nullptr; OP_REQUIRES_OK(context, context->allocate_output(0, TensorShape{b,m,n}, &outi_tensor)); Tensor *out_tensor = nullptr; OP_REQUIRES_OK(context, context->allocate_output(1, TensorShape{b,m,n}, &out_tensor)); auto dist_flat = dist_tensor.flat(); const float *dist = &(dist_flat(0)); auto outi_flat = outi_tensor->flat(); int *outi = &(outi_flat(0)); 
auto out_flat = out_tensor->flat(); float *out = &(out_flat(0)); selectionSortLauncher(b,n,m,k_,dist,outi,out); } private: int k_; }; REGISTER_KERNEL_BUILDER(Name("SelectionSort").Device(DEVICE_GPU), SelectionSortGpuOp); void groupPointLauncher(int b, int n, int c, int m, int nsample, const float *points, const int *idx, float *out); class GroupPointGpuOp: public OpKernel{ public: explicit GroupPointGpuOp(OpKernelConstruction * context):OpKernel(context){} void Compute(OpKernelContext * context) override { const Tensor& points_tensor=context->input(0); OP_REQUIRES(context, points_tensor.dims()==3, errors::InvalidArgument("GroupPoint expects (batch_size, num_points, channel) points shape")); int b = points_tensor.shape().dim_size(0); int n = points_tensor.shape().dim_size(1); int c = points_tensor.shape().dim_size(2); const Tensor& idx_tensor=context->input(1); OP_REQUIRES(context,idx_tensor.dims()==3 && idx_tensor.shape().dim_size(0)==b, errors::InvalidArgument("GroupPoint expects (batch_size, npoints, nsample) idx shape")); int m = idx_tensor.shape().dim_size(1); int nsample = idx_tensor.shape().dim_size(2); Tensor * out_tensor = nullptr; OP_REQUIRES_OK(context, context->allocate_output(0,TensorShape{b,m,nsample,c}, &out_tensor)); auto points_flat = points_tensor.flat(); const float *points = &(points_flat(0)); auto idx_flat = idx_tensor.flat(); const int *idx = &(idx_flat(0)); auto out_flat = out_tensor->flat(); float *out = &(out_flat(0)); groupPointLauncher(b,n,c,m,nsample,points,idx,out); } }; REGISTER_KERNEL_BUILDER(Name("GroupPoint").Device(DEVICE_GPU),GroupPointGpuOp); void groupPointGradLauncher(int b, int n, int c, int m, int nsample, const float *grad_out, const int *idx, float *grad_points); class GroupPointGradGpuOp: public OpKernel{ public: explicit GroupPointGradGpuOp(OpKernelConstruction * context):OpKernel(context){} void Compute(OpKernelContext * context) override { const Tensor& points_tensor=context->input(0); OP_REQUIRES(context, 
points_tensor.dims()==3, errors::InvalidArgument("GroupPointGrad expects (batch_size, num_points, channel) points shape")); int b = points_tensor.shape().dim_size(0); int n = points_tensor.shape().dim_size(1); int c = points_tensor.shape().dim_size(2); const Tensor& idx_tensor=context->input(1); OP_REQUIRES(context,idx_tensor.dims()==3 && idx_tensor.shape().dim_size(0)==b, errors::InvalidArgument("GroupPointGrad expects (batch_size, npoints, nsample) idx shape")); int m = idx_tensor.shape().dim_size(1); int nsample = idx_tensor.shape().dim_size(2); const Tensor& grad_out_tensor=context->input(2); OP_REQUIRES(context,grad_out_tensor.dims()==4 && grad_out_tensor.shape().dim_size(0)==b && grad_out_tensor.shape().dim_size(1)==m && grad_out_tensor.shape().dim_size(2)==nsample && grad_out_tensor.shape().dim_size(3)==c, errors::InvalidArgument("GroupPointGrad expects (batch_size, npoints, nsample, channel) grad_out shape")); Tensor * grad_points_tensor = nullptr; OP_REQUIRES_OK(context, context->allocate_output(0,TensorShape{b,n,c}, &grad_points_tensor)); auto points_flat = points_tensor.flat(); const float *points = &(points_flat(0)); auto idx_flat = idx_tensor.flat(); const int *idx = &(idx_flat(0)); auto grad_out_flat = grad_out_tensor.flat(); const float *grad_out = &(grad_out_flat(0)); auto grad_points_flat = grad_points_tensor->flat(); float *grad_points = &(grad_points_flat(0)); cudaMemset(grad_points, 0, sizeof(float)*b*n*c); groupPointGradLauncher(b,n,c,m,nsample,grad_out,idx,grad_points); } }; REGISTER_KERNEL_BUILDER(Name("GroupPointGrad").Device(DEVICE_GPU),GroupPointGradGpuOp); ================================================ FILE: code/tf_ops/grouping/tf_grouping.py ================================================ import tensorflow as tf from tensorflow.python.framework import ops import sys import os BASE_DIR = os.path.dirname(os.path.abspath(__file__)) sys.path.append(BASE_DIR) grouping_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_grouping_so.so')) 
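The wrappers in this file only work after `tf_grouping_so.so` has been compiled. As a reference for the `QueryBallPoint` semantics, here is a brute-force NumPy version; it is a sketch, not part of the repository (`query_ball_point_np` is a hypothetical name), and it assumes the kernel's convention of padding unused index slots with the first neighbor found.

```python
import numpy as np

def query_ball_point_np(radius, nsample, xyz1, xyz2):
    """Reference for the QueryBallPoint op.

    xyz1: (b, n, 3) dataset points, xyz2: (b, m, 3) query points.
    Returns idx (b, m, nsample) and pts_cnt (b, m); pts_cnt is the
    number of distinct neighbors found within the ball, and slots
    beyond it repeat the first neighbor's index.
    """
    b, m, _ = xyz2.shape
    idx = np.zeros((b, m, nsample), dtype=np.int32)
    pts_cnt = np.zeros((b, m), dtype=np.int32)
    for i in range(b):
        # (m, n) squared distances from each query point to the dataset
        d2 = ((xyz2[i][:, None, :] - xyz1[i][None, :, :]) ** 2).sum(axis=-1)
        for j in range(m):
            ball = np.flatnonzero(d2[j] < radius ** 2)[:nsample]
            pts_cnt[i, j] = len(ball)
            if len(ball) > 0:
                idx[i, j, :] = ball[0]          # pad with first neighbor
                idx[i, j, :len(ball)] = ball
    return idx, pts_cnt
```

Note that this fork's compiled op takes `radius` as a tensor input (the wrapper below passes arguments as `(xyz1, xyz2, radius, nsample)`), whereas this sketch treats it as a Python scalar.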
def query_ball_point(radius, nsample, xyz1, xyz2):
    '''
    Input:
        radius: float32, ball search radius
        nsample: int32, number of points selected in each ball region
        xyz1: (batch_size, ndataset, 3) float32 array, input points
        xyz2: (batch_size, npoint, 3) float32 array, query points
    Output:
        idx: (batch_size, npoint, nsample) int32 array, indices to input points
        pts_cnt: (batch_size, npoint) int32 array, number of unique points in each local region
    '''
    #return grouping_module.query_ball_point(radius, nsample, xyz1, xyz2)
    return grouping_module.query_ball_point(xyz1, xyz2, radius, nsample)
ops.NoGradient('QueryBallPoint')

def select_top_k(k, dist):
    '''
    Input:
        k: int32, number of k SMALLEST elements selected
        dist: (b,m,n) float32 array, distance matrix, m query points, n dataset points
    Output:
        idx: (b,m,n) int32 array, first k in n are indices to the top k
        dist_out: (b,m,n) float32 array, first k in n are the top k
    '''
    return grouping_module.selection_sort(dist, k)
ops.NoGradient('SelectionSort')

def group_point(points, idx):
    '''
    Input:
        points: (batch_size, ndataset, channel) float32 array, points to sample from
        idx: (batch_size, npoint, nsample) int32 array, indices to points
    Output:
        out: (batch_size, npoint, nsample, channel) float32 array, values sampled from points
    '''
    return grouping_module.group_point(points, idx)

@tf.RegisterGradient('GroupPoint')
def _group_point_grad(op, grad_out):
    points = op.inputs[0]
    idx = op.inputs[1]
    return [grouping_module.group_point_grad(points, idx, grad_out), None]

def knn_point(k, xyz1, xyz2):
    '''
    Input:
        k: int32, number of k in k-nn search
        xyz1: (batch_size, ndataset, c) float32 array, input points
        xyz2: (batch_size, npoint, c) float32 array, query points
    Output:
        val: (batch_size, npoint, k) float32 array, L2 distances
        idx: (batch_size, npoint, k) int32 array, indices to input points
    '''
    # b = xyz1.get_shape()[0].value
    # n = xyz1.get_shape()[1].value
    # c = xyz1.get_shape()[2].value
    # m = xyz2.get_shape()[1].value
    # xyz1 =
tf.tile(tf.reshape(xyz1, (b,1,n,c)), [1,m,1,1])
    # xyz2 = tf.tile(tf.reshape(xyz2, (b,m,1,c)), [1,1,n,1])
    xyz1 = tf.expand_dims(xyz1, axis=1)
    xyz2 = tf.expand_dims(xyz2, axis=2)
    dist = tf.reduce_sum((xyz1-xyz2)**2, -1)
    # outi, out = select_top_k(k, dist)
    # idx = tf.slice(outi, [0,0,0], [-1,-1,k])
    # val = tf.slice(out, [0,0,0], [-1,-1,k])
    val, idx = tf.nn.top_k(-dist, k=k)  # ONLY SUPPORT CPU
    return val, idx

if __name__=='__main__':
    knn=True
    import numpy as np
    import time
    np.random.seed(100)
    pts = np.random.random((32,512,64)).astype('float32')
    tmp1 = np.random.random((32,512,3)).astype('float32')
    tmp2 = np.random.random((32,128,3)).astype('float32')
    with tf.device('/gpu:1'):
        points = tf.constant(pts)
        xyz1 = tf.constant(tmp1)
        xyz2 = tf.constant(tmp2)
        radius = 0.1
        nsample = 64
        if knn:
            _, idx = knn_point(nsample, xyz1, xyz2)
            grouped_points = group_point(points, idx)
        else:
            idx, _ = query_ball_point(radius, nsample, xyz1, xyz2)
            grouped_points = group_point(points, idx)
        #grouped_points_grad = tf.ones_like(grouped_points)
        #points_grad = tf.gradients(grouped_points, points, grouped_points_grad)
    with tf.Session('') as sess:
        now = time.time()
        for _ in range(100):
            ret = sess.run(grouped_points)
        print time.time() - now
        print ret.shape, ret.dtype
        print ret

================================================
FILE: code/tf_ops/grouping/tf_grouping_compile.sh
================================================
#!/bin/bash
/usr/local/cuda-8.0/bin/nvcc tf_grouping_g.cu -o tf_grouping_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
g++ -std=c++11 tf_grouping.cpp tf_grouping_g.cu.o -o tf_grouping_so.so -shared -fPIC -I /home/ganeshiyer/tensorflow/lib/python2.7/site-packages/tensorflow/include -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=0

================================================
FILE: code/tf_ops/grouping/tf_grouping_g.cu
================================================
// input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3)
// output: idx (b,m,nsample), pts_cnt (b,m)
__global__ void query_ball_point_gpu(int b, int n, int m, const float *radius, int nsample, const float *xyz1, const float *xyz2, int *idx, int *pts_cnt) {
    int batch_index = blockIdx.x;
    xyz1 += n*3*batch_index;
    xyz2 += m*3*batch_index;
    idx += m*nsample*batch_index;
    pts_cnt += m*batch_index; // counting how many unique points selected in local region

    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int j=index;j<m;j+=stride) {
        int cnt = 0;
        for (int k=0;k<n;++k) {
            if (cnt == nsample)
                break; // only pick the FIRST nsample points in the ball
            float x2=xyz2[j*3+0];
            float y2=xyz2[j*3+1];
            float z2=xyz2[j*3+2];
            float x1=xyz1[k*3+0];
            float y1=xyz1[k*3+1];
            float z1=xyz1[k*3+2];
            float d=max(sqrtf((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1)),1e-20f);
            if (d<radius[0]) { // radius arrives as a single-element device array in this fork
                if (cnt==0) { // set ALL indices to k, so rows with fewer points than nsample still hold valid (repeated) indices
                    for (int l=0;l<nsample;++l)
                        idx[j*nsample+l] = k;
                }
                idx[j*nsample+cnt] = k;
                cnt+=1;
            }
        }
        pts_cnt[j] = cnt;
    }
}

// input: k (1), distance matrix dist (b,m,n)
// output: idx (b,m,n), val (b,m,n)
__global__ void selection_sort_gpu(int b, int n, int m, int k, const float *dist, int *outi, float *out) {
    int batch_index = blockIdx.x;
    dist += m*n*batch_index;
    outi += m*n*batch_index;
    out += m*n*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;
    // copy from dist to out, initialize indices
    for (int j=index;j<m;j+=stride) {
        for (int s=0;s<n;++s) {
            out[j*n+s] = dist[j*n+s];
            outi[j*n+s] = s;
        }
    }
    // partial selection sort: only the first k entries of each row end up sorted
    for (int j=index;j<m;j+=stride) {
        float *p_dist = out + j*n;
        for (int s=0;s<k;++s) {
            int min = s;
            for (int t=s+1;t<n;++t) {
                if (p_dist[t] < p_dist[min])
                    min = t;
            }
            if (min != s) {
                float tmp = p_dist[min];
                p_dist[min] = p_dist[s];
                p_dist[s] = tmp;
                int tmpi = outi[j*n+min];
                outi[j*n+min] = outi[j*n+s];
                outi[j*n+s] = tmpi;
            }
        }
    }
}

// input: points (b,n,c), idx (b,m,nsample)
// output: out (b,m,nsample,c)
__global__ void group_point_gpu(int b, int n, int c, int m, int nsample, const float *points, const int *idx, float *out) {
    int batch_index = blockIdx.x;
    points += n*c*batch_index;
    idx += m*nsample*batch_index;
    out += m*nsample*c*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int j=index;j<m;j+=stride) {
        for (int k=0;k<nsample;++k) {
            int ii = idx[j*nsample+k];
            for (int l=0;l<c;++l) {
                out[j*nsample*c+k*c+l] = points[ii*c+l];
            }
        }
    }
}

// input: grad_out (b,m,nsample,c), idx (b,m,nsample)
// output: grad_points (b,n,c)
__global__ void group_point_grad_gpu(int b, int n, int c, int m, int nsample, const float *grad_out, const int *idx, float *grad_points) {
    int batch_index = blockIdx.x;
    idx += m*nsample*batch_index;
    grad_out += m*nsample*c*batch_index;
    grad_points += n*c*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;
    for (int j=index;j<m;j+=stride) {
        for (int k=0;k<nsample;++k) {
            int ii = idx[j*nsample+k];
            for (int l=0;l<c;++l) {
                atomicAdd(&grad_points[ii*c+l], grad_out[j*nsample*c+k*c+l]);
            }
        }
    }
}

void queryBallPointLauncher(int b, int n, int m, const float *radius, int nsample, const float *xyz1, const float *xyz2, int *idx, int *pts_cnt) {
    query_ball_point_gpu<<<b,256>>>(b,n,m,radius,nsample,xyz1,xyz2,idx,pts_cnt);
    //cudaDeviceSynchronize();
}
void selectionSortLauncher(int b, int n, int m, int k, const float *dist, int *outi, float *out) {
    selection_sort_gpu<<<b,256>>>(b,n,m,k,dist,outi,out);
    //cudaDeviceSynchronize();
}
void groupPointLauncher(int b, int n, int c, int m, int nsample, const float *points, const int *idx, float *out){
    group_point_gpu<<<b,256>>>(b,n,c,m,nsample,points,idx,out);
    //cudaDeviceSynchronize();
}
void groupPointGradLauncher(int b, int n, int c, int m, int nsample, const float *grad_out, const int *idx, float *grad_points){
    group_point_grad_gpu<<<b,256>>>(b,n,c,m,nsample,grad_out,idx,grad_points);
    //group_point_grad_gpu<<<1,1>>>(b,n,c,m,nsample,grad_out,idx,grad_points);
    //cudaDeviceSynchronize();
}

================================================
FILE: code/tf_ops/grouping/tf_grouping_op_test.py
================================================
import tensorflow as tf
import numpy as np
from tf_grouping import query_ball_point, group_point

class GroupPointTest(tf.test.TestCase):
    def test(self):
        pass

    def test_grad(self):
        with tf.device('/gpu:0'):
            points = tf.constant(np.random.random((1,128,16)).astype('float32'))
            print points
            xyz1 = tf.constant(np.random.random((1,128,3)).astype('float32'))
            xyz2 = tf.constant(np.random.random((1,8,3)).astype('float32'))
            radius = 0.3
            nsample = 32
            idx, pts_cnt = query_ball_point(radius, nsample, xyz1, xyz2)
            grouped_points = group_point(points, idx)
            print grouped_points
        with self.test_session():
            print "---- Going to compute gradient error"
            err = tf.test.compute_gradient_error(points, (1,128,16), grouped_points, (1,8,32,16))
            print err
            self.assertLess(err, 1e-4)

if __name__=='__main__':
    tf.test.main()

================================================
FILE: code/tf_ops/interpolation/__init__.py
================================================


================================================
FILE: code/tf_ops/interpolation/interpolate.cpp
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include <vector>
#include <algorithm>
using namespace std;

float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}
// Find three nearest neigbors with square distance
// input: xyz1 (b,n,3), xyz2 (b,m,3)
// output: dist (b,n,3), idx (b,n,3)
void threenn_cpu(int b, int n, int m, const float *xyz1, const float *xyz2, float *dist, int *idx) {
    for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
            float x1=xyz1[j*3+0];
            float y1=xyz1[j*3+1];
            float z1=xyz1[j*3+2];
            double best1=1e40; double best2=1e40; double best3=1e40;
            int besti1=0; int besti2=0; int besti3=0;
            for (int k=0;k<m;++k) {
                float x2=xyz2[k*3+0];
                float y2=xyz2[k*3+1];
                float z2=xyz2[k*3+2];
                double d=(x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1);
                if (d<best1) {
                    best3=best2;besti3=besti2;
                    best2=best1;besti2=besti1;
                    best1=d;besti1=k;
                } else if (d<best2) {
                    best3=best2;besti3=besti2;
                    best2=d;besti2=k;
                } else if (d<best3) {
                    best3=d;besti3=k;
                }
            }
            dist[j*3]=best1;idx[j*3]=besti1;
            dist[j*3+1]=best2;idx[j*3+1]=besti2;
            dist[j*3+2]=best3;idx[j*3+2]=besti3;
        }
        xyz1+=n*3;
        xyz2+=m*3;
        dist+=n*3;
        idx+=n*3;
    }
}

================================================
FILE: code/tf_ops/interpolation/tf_interpolate.cpp
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/framework/common_shape_fns.h"
using namespace tensorflow;

REGISTER_OP("ThreeNN")
    .Input("xyz1: float32")
    .Input("xyz2: float32")
    .Output("dist: float32")
    .Output("idx: int32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        c->set_output(0, c->input(0));
        c->set_output(1, c->input(0));
        return Status::OK();
    });
REGISTER_OP("ThreeInterpolate")
    .Input("points: float32")
    .Input("idx: int32")
    .Input("weight: float32")
    .Output("out: float32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        ::tensorflow::shape_inference::ShapeHandle dims1; // (b,m,c)
        c->WithRank(c->input(0), 3, &dims1);
        ::tensorflow::shape_inference::ShapeHandle dims2; // (b,n,3)
        c->WithRank(c->input(1), 3, &dims2);
        // (b,n,c)
        ::tensorflow::shape_inference::ShapeHandle output = c->MakeShape({c->Dim(dims1, 0), c->Dim(dims2, 1), c->Dim(dims1, 2)});
        c->set_output(0, output);
        return Status::OK();
    });
REGISTER_OP("ThreeInterpolateGrad")
    .Input("points: float32")
    .Input("idx: int32")
    .Input("weight: float32")
    .Input("grad_out: float32")
    .Output("grad_points: float32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        c->set_output(0, c->input(0));
        return Status::OK();
    });

float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}
// Find three nearest neigbors with square distance
// input: xyz1 (b,n,3), xyz2 (b,m,3)
// output: dist (b,n,3), idx (b,n,3)
void threenn_cpu(int b, int n, int m, const float *xyz1, const float *xyz2, float *dist, int *idx) {
    for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
            float x1=xyz1[j*3+0];
            float y1=xyz1[j*3+1];
            float z1=xyz1[j*3+2];
            double best1=1e40; double best2=1e40; double best3=1e40;
            int besti1=0; int besti2=0; int besti3=0;
            for (int k=0;k<m;++k) {
                float x2=xyz2[k*3+0];
                float y2=xyz2[k*3+1];
                float z2=xyz2[k*3+2];
                double d=(x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1);
                if (d<best1) {
                    best3=best2;besti3=besti2;
                    best2=best1;besti2=besti1;
                    best1=d;besti1=k;
                } else if (d<best2) {
                    best3=best2;besti3=besti2;
                    best2=d;besti2=k;
                } else if (d<best3) {
                    best3=d;besti3=k;
                }
            }
            dist[j*3]=best1;idx[j*3]=besti1;
            dist[j*3+1]=best2;idx[j*3+1]=besti2;
            dist[j*3+2]=best3;idx[j*3+2]=besti3;
        }
        xyz1+=n*3;
        xyz2+=m*3;
        dist+=n*3;
        idx+=n*3;
    }
}
// input: points (b,m,c), idx (b,n,3), weight (b,n,3)
// output: out (b,n,c)
void threeinterpolate_cpu(int b, int m, int c, int n, const float *points, const int *idx, const float *weight, float *out) {
    float w1,w2,w3;
    int i1,i2,i3;
    for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
            w1=weight[j*3];
            w2=weight[j*3+1];
            w3=weight[j*3+2];
            i1=idx[j*3];
            i2=idx[j*3+1];
            i3=idx[j*3+2];
            for (int l=0;l<c;++l) {
                out[j*c+l] = points[i1*c+l]*w1 + points[i2*c+l]*w2 + points[i3*c+l]*w3;
            }
        }
        points+=m*c;
        idx+=n*3;
        weight+=n*3;
        out+=n*c;
    }
}
// input: grad_out (b,n,c), idx (b,n,3), weight (b,n,3)
// output: grad_points (b,m,c)
void threeinterpolate_grad_cpu(int b, int n, int c, int m, const float *grad_out, const int *idx, const float *weight, float *grad_points) {
    float w1,w2,w3;
    int i1,i2,i3;
    for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
            w1=weight[j*3];
            w2=weight[j*3+1];
            w3=weight[j*3+2];
            i1=idx[j*3];
            i2=idx[j*3+1];
            i3=idx[j*3+2];
            for (int l=0;l<c;++l) {
                grad_points[i1*c+l] += grad_out[j*c+l]*w1;
                grad_points[i2*c+l] += grad_out[j*c+l]*w2;
                grad_points[i3*c+l] += grad_out[j*c+l]*w3;
            }
        }
        grad_out+=n*c;
        idx+=n*3;
        weight+=n*3;
        grad_points+=m*c;
    }
}

class ThreeNNOp : public OpKernel {
    public:
        explicit ThreeNNOp(OpKernelConstruction* context) : OpKernel(context) {}

        void Compute(OpKernelContext* context) override {
            const Tensor& xyz1_tensor = context->input(0);
            OP_REQUIRES(context, xyz1_tensor.dims()==3 && xyz1_tensor.shape().dim_size(2)==3, errors::InvalidArgument("ThreeNN expects (b,n,3) xyz1 shape."));
            int b = xyz1_tensor.shape().dim_size(0);
            int n = xyz1_tensor.shape().dim_size(1);

            const Tensor& xyz2_tensor = context->input(1);
            OP_REQUIRES(context, xyz2_tensor.dims()==3 && xyz2_tensor.shape().dim_size(2)==3, errors::InvalidArgument("ThreeNN expects (b,m,3) xyz2 shape."));
            int m = xyz2_tensor.shape().dim_size(1);

            Tensor *dist_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(0, TensorShape{b,n,3}, &dist_tensor));
            Tensor *idx_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(1, TensorShape{b,n,3}, &idx_tensor));

            auto xyz1_flat = xyz1_tensor.flat<float>();
            const float *xyz1 = &(xyz1_flat(0));
            auto xyz2_flat = xyz2_tensor.flat<float>();
            const float *xyz2 = &(xyz2_flat(0));
            auto dist_flat = dist_tensor->flat<float>();
            float *dist = &(dist_flat(0));
            auto idx_flat = idx_tensor->flat<int>();
            int *idx = &(idx_flat(0));
            threenn_cpu(b,n,m,xyz1,xyz2,dist,idx);
        }
};
REGISTER_KERNEL_BUILDER(Name("ThreeNN").Device(DEVICE_CPU), ThreeNNOp);

class ThreeInterpolateOp: public OpKernel{
    public:
        explicit ThreeInterpolateOp(OpKernelConstruction * context):OpKernel(context){}

        void Compute(OpKernelContext * context) override {
            const Tensor& points_tensor=context->input(0);
            OP_REQUIRES(context, points_tensor.dims()==3, errors::InvalidArgument("ThreeInterpolate expects (b,m,c) points shape"));
            int b = points_tensor.shape().dim_size(0);
            int m = points_tensor.shape().dim_size(1);
            int c = points_tensor.shape().dim_size(2);

            const Tensor& idx_tensor=context->input(1);
            OP_REQUIRES(context,idx_tensor.dims()==3 && idx_tensor.shape().dim_size(0)==b && idx_tensor.shape().dim_size(2)==3, errors::InvalidArgument("ThreeInterpolate expects (b,n,3) idx shape"));
            int n = idx_tensor.shape().dim_size(1);

            const Tensor& weight_tensor=context->input(2);
            OP_REQUIRES(context,weight_tensor.dims()==3 && weight_tensor.shape().dim_size(0)==b && weight_tensor.shape().dim_size(1)==n && weight_tensor.shape().dim_size(2)==3, errors::InvalidArgument("ThreeInterpolate expects (b,n,3) weight shape"));

            Tensor * out_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(0,TensorShape{b,n,c}, &out_tensor));

            auto points_flat = points_tensor.flat<float>();
            const float *points = &(points_flat(0));
            auto idx_flat = idx_tensor.flat<int>();
            const int *idx = &(idx_flat(0));
            auto weight_flat = weight_tensor.flat<float>();
            const float *weight = &(weight_flat(0));
            auto out_flat = out_tensor->flat<float>();
            float *out = &(out_flat(0));
            threeinterpolate_cpu(b,m,c,n,points,idx,weight,out);
        }
};
REGISTER_KERNEL_BUILDER(Name("ThreeInterpolate").Device(DEVICE_CPU),ThreeInterpolateOp);

class ThreeInterpolateGradOp: public OpKernel{
    public:
        explicit ThreeInterpolateGradOp(OpKernelConstruction * context):OpKernel(context){}

        void Compute(OpKernelContext * context) override {
            const Tensor& points_tensor=context->input(0);
            OP_REQUIRES(context, points_tensor.dims()==3, errors::InvalidArgument("ThreeInterpolateGrad expects (b,m,c) points shape"));
            int b = points_tensor.shape().dim_size(0);
            int m = points_tensor.shape().dim_size(1);
            int c = points_tensor.shape().dim_size(2);

            const Tensor& idx_tensor=context->input(1);
            OP_REQUIRES(context,idx_tensor.dims()==3 && idx_tensor.shape().dim_size(0)==b, errors::InvalidArgument("ThreeInterpolateGrad expects (b,n,3) idx shape"));
            int n = idx_tensor.shape().dim_size(1);

            const Tensor& weight_tensor=context->input(2);
            OP_REQUIRES(context,weight_tensor.dims()==3 && weight_tensor.shape().dim_size(0)==b && weight_tensor.shape().dim_size(1)==n && weight_tensor.shape().dim_size(2)==3, errors::InvalidArgument("ThreeInterpolateGrad expects (b,n,3) weight shape"));

            const Tensor& grad_out_tensor=context->input(3);
            OP_REQUIRES(context,grad_out_tensor.dims()==3 && grad_out_tensor.shape().dim_size(0)==b && grad_out_tensor.shape().dim_size(1)==n && grad_out_tensor.shape().dim_size(2)==c, errors::InvalidArgument("ThreeInterpolateGrad expects (b,n,c) grad_out shape"));

            Tensor * grad_points_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(0,TensorShape{b,m,c}, &grad_points_tensor));

            auto points_flat = points_tensor.flat<float>();
            const float *points = &(points_flat(0));
            auto idx_flat = idx_tensor.flat<int>();
            const int *idx = &(idx_flat(0));
            auto weight_flat = weight_tensor.flat<float>();
            const float *weight = &(weight_flat(0));
            auto grad_out_flat = grad_out_tensor.flat<float>();
            const float *grad_out = &(grad_out_flat(0));
            auto grad_points_flat = grad_points_tensor->flat<float>();
            float *grad_points = &(grad_points_flat(0));
            memset(grad_points, 0, sizeof(float)*b*m*c);
            threeinterpolate_grad_cpu(b,n,c,m,grad_out,idx,weight,grad_points);
        }
};
REGISTER_KERNEL_BUILDER(Name("ThreeInterpolateGrad").Device(DEVICE_CPU),ThreeInterpolateGradOp);

================================================
FILE: code/tf_ops/interpolation/tf_interpolate.py
================================================
import tensorflow as tf
from tensorflow.python.framework import ops
import sys
import os
BASE_DIR = os.path.dirname(__file__)
sys.path.append(BASE_DIR)
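# Usage sketch for the ops this module wraps (illustrative only: the tensor
# names `unknown_xyz`, `known_xyz` and `known_feats` are hypothetical, and
# tf_interpolate_so.so must first be built with tf_interpolate_compile.sh).
# Shapes follow the docstrings below:
#
#   dist, idx = three_nn(unknown_xyz, known_xyz)          # both (b,n,3)
#   weight = tf.ones_like(dist) / 3.0                     # equal weights over the 3 neighbors
#   feats = three_interpolate(known_feats, idx, weight)   # (b,n,c)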
interpolate_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_interpolate_so.so')) def three_nn(xyz1, xyz2): ''' Input: xyz1: (b,n,3) float32 array, unknown points xyz2: (b,m,3) float32 array, known points Output: dist: (b,n,3) float32 array, distances to known points idx: (b,n,3) int32 array, indices to known points ''' return interpolate_module.three_nn(xyz1, xyz2) ops.NoGradient('ThreeNN') def three_interpolate(points, idx, weight): ''' Input: points: (b,m,c) float32 array, known points idx: (b,n,3) int32 array, indices to known points weight: (b,n,3) float32 array, weights on known points Output: out: (b,n,c) float32 array, interpolated point values ''' return interpolate_module.three_interpolate(points, idx, weight) @tf.RegisterGradient('ThreeInterpolate') def _three_interpolate_grad(op, grad_out): points = op.inputs[0] idx = op.inputs[1] weight = op.inputs[2] return [interpolate_module.three_interpolate_grad(points, idx, weight, grad_out), None, None] if __name__=='__main__': import numpy as np import time np.random.seed(100) pts = np.random.random((32,128,64)).astype('float32') tmp1 = np.random.random((32,512,3)).astype('float32') tmp2 = np.random.random((32,128,3)).astype('float32') with tf.device('/cpu:0'): points = tf.constant(pts) xyz1 = tf.constant(tmp1) xyz2 = tf.constant(tmp2) dist, idx = three_nn(xyz1, xyz2) weight = tf.ones_like(dist)/3.0 interpolated_points = three_interpolate(points, idx, weight) with tf.Session('') as sess: now = time.time() for _ in range(100): ret = sess.run(interpolated_points) print time.time() - now print ret.shape, ret.dtype #print ret ================================================ FILE: code/tf_ops/interpolation/tf_interpolate_compile.sh ================================================ g++ -std=c++11 tf_interpolate.cpp -o tf_interpolate_so.so -shared -fPIC -I /home/ganeshiyer/tensorflow/lib/python2.7/site-packages/tensorflow/include -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 
-D_GLIBCXX_USE_CXX11_ABI=0 ================================================ FILE: code/tf_ops/interpolation/tf_interpolate_op_test.py ================================================ import tensorflow as tf import numpy as np from tf_interpolate import three_nn, three_interpolate class GroupPointTest(tf.test.TestCase): def test(self): pass def test_grad(self): with self.test_session(): points = tf.constant(np.random.random((1,8,16)).astype('float32')) print points xyz1 = tf.constant(np.random.random((1,128,3)).astype('float32')) xyz2 = tf.constant(np.random.random((1,8,3)).astype('float32')) dist, idx = three_nn(xyz1, xyz2) weight = tf.ones_like(dist)/3.0 interpolated_points = three_interpolate(points, idx, weight) print interpolated_points err = tf.test.compute_gradient_error(points, (1,8,16), interpolated_points, (1,128,16)) print err self.assertLess(err, 1e-4) if __name__=='__main__': tf.test.main() ================================================ FILE: code/tf_ops/interpolation/visu_interpolation.py ================================================ ''' Visualize part segmentation ''' import os import sys ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__))) sys.path.append('/home/rqi/Projects/toolkits/visualization') from show3d_balls import showpoints import numpy as np from tf_interpolate import three_nn, three_interpolate import tensorflow as tf pts2 = np.array([[0,0,1],[1,0,0],[0,1,0],[1,1,0]]).astype('float32') xyz1 = np.random.random((100,3)).astype('float32') xyz2 = np.array([[0,0,0],[1,0,0],[0,1,0],[1,1,1]]).astype('float32') def fun(xyz1,xyz2,pts2): with tf.device('/cpu:0'): points = tf.constant(np.expand_dims(pts2,0)) xyz1 = tf.constant(np.expand_dims(xyz1,0)) xyz2 = tf.constant(np.expand_dims(xyz2,0)) dist, idx = three_nn(xyz1, xyz2) #weight = tf.ones_like(dist)/3.0 dist = tf.maximum(dist, 1e-10) norm = tf.reduce_sum((1.0/dist),axis=2,keep_dims=True) norm = tf.tile(norm, [1,1,3]) print norm weight = (1.0/dist) / norm 
        interpolated_points = three_interpolate(points, idx, weight)
    with tf.Session('') as sess:
        tmp,pts1,d,w = sess.run([xyz1, interpolated_points, dist, weight])
        #print w
        pts1 = pts1.squeeze()
    return pts1

pts1 = fun(xyz1,xyz2,pts2)
all_pts = np.zeros((104,3))
all_pts[0:100,:] = pts1
all_pts[100:,:] = pts2
all_xyz = np.zeros((104,3))
all_xyz[0:100,:]=xyz1
all_xyz[100:,:]=xyz2
showpoints(xyz2, pts2, ballradius=8)
showpoints(xyz1, pts1, ballradius=8)
showpoints(all_xyz, all_pts, ballradius=8)

================================================
FILE: code/tf_ops/sampling/__init__.py
================================================


================================================
FILE: code/tf_ops/sampling/tf_sampling.cpp
================================================
/* Furthest point sampling
 * Original author: Haoqiang Fan
 * Modified by Charles R. Qi
 * All Rights Reserved. 2017.
 */
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/framework/common_shape_fns.h"
#include <cuda_runtime.h>
using namespace tensorflow;

REGISTER_OP("ProbSample")
    .Input("inp: float32")
    .Input("inpr: float32")
    .Output("out: int32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        ::tensorflow::shape_inference::ShapeHandle dims1; // batch_size * ncategory
        c->WithRank(c->input(0), 2, &dims1);
        ::tensorflow::shape_inference::ShapeHandle dims2; // batch_size * npoints
        c->WithRank(c->input(1), 2, &dims2);
        // batch_size * npoints
        ::tensorflow::shape_inference::ShapeHandle output = c->MakeShape({c->Dim(dims2, 0), c->Dim(dims2, 1)});
        c->set_output(0, output);
        return Status::OK();
    });
REGISTER_OP("FarthestPointSample")
    .Attr("npoint: int")
    .Input("inp: float32")
    .Output("out: int32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        ::tensorflow::shape_inference::ShapeHandle dims1; // batch_size * npoint * 3
        c->WithRank(c->input(0), 3, &dims1);
        int npoint;
        TF_RETURN_IF_ERROR(c->GetAttr("npoint", &npoint));
        ::tensorflow::shape_inference::ShapeHandle output = c->MakeShape({c->Dim(dims1, 0), npoint});
        c->set_output(0, output);
        return Status::OK();
    });
REGISTER_OP("GatherPoint")
    .Input("inp: float32")
    .Input("idx: int32")
    .Output("out: float32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        ::tensorflow::shape_inference::ShapeHandle dims1; // batch_size * ndataset * 3
        c->WithRank(c->input(0), 3, &dims1);
        ::tensorflow::shape_inference::ShapeHandle dims2; // batch_size * npoints
        c->WithRank(c->input(1), 2, &dims2);
        // batch_size * npoints * 3
        ::tensorflow::shape_inference::ShapeHandle output = c->MakeShape({c->Dim(dims1, 0), c->Dim(dims2, 1), c->Dim(dims1, 2)});
        c->set_output(0, output);
        return Status::OK();
    });
REGISTER_OP("GatherPointGrad")
    .Input("inp: float32")
    .Input("idx: int32")
    .Input("out_g: float32")
    .Output("inp_g: float32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        c->set_output(0, c->input(0));
        return Status::OK();
    });

void probsampleLauncher(int b,int n,int m,const float * inp_p,const float * inp_r,float * temp,int * out);
class ProbSampleGpuOp: public OpKernel{
    public:
        explicit ProbSampleGpuOp(OpKernelConstruction* context):OpKernel(context){}
        void Compute(OpKernelContext * context)override{
            const Tensor& inp_tensor=context->input(0);
            const Tensor& inpr_tensor=context->input(1);
            auto inp_flat=inp_tensor.flat<float>();
            auto inpr_flat=inpr_tensor.flat<float>();
            const float * inp=&(inp_flat(0));
            const float * inpr=&(inpr_flat(0));
            OP_REQUIRES(context,inp_tensor.dims()==2,errors::InvalidArgument("ProbSample expects (batch_size,num_choices) inp shape"));
            int b=inp_tensor.shape().dim_size(0);
            int n=inp_tensor.shape().dim_size(1);
            OP_REQUIRES(context,inpr_tensor.dims()==2 && inpr_tensor.shape().dim_size(0)==b,errors::InvalidArgument("ProbSample expects (batch_size,num_points) inpr shape"));
            int m=inpr_tensor.shape().dim_size(1);
            Tensor * out_tensor=NULL;
            OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,m},&out_tensor));
            auto out_flat=out_tensor->flat<int>();
            int * out=&(out_flat(0));
            Tensor temp_tensor;
            OP_REQUIRES_OK(context,context->allocate_temp(DataTypeToEnum<float>::value,TensorShape{b,n},&temp_tensor));
            auto temp_flat=temp_tensor.flat<float>();
            float * temp=&(temp_flat(0));
            probsampleLauncher(b,n,m,inp,inpr,temp,out);
        }
};
REGISTER_KERNEL_BUILDER(Name("ProbSample").Device(DEVICE_GPU), ProbSampleGpuOp);

void farthestpointsamplingLauncher(int b,int n,int m,const float * inp,float * temp,int * out);
class FarthestPointSampleGpuOp: public OpKernel{
    public:
        explicit FarthestPointSampleGpuOp(OpKernelConstruction* context):OpKernel(context) {
            OP_REQUIRES_OK(context, context->GetAttr("npoint", &npoint_));
            OP_REQUIRES(context, npoint_ > 0, errors::InvalidArgument("FarthestPointSample expects positive npoint"));
        }
        void Compute(OpKernelContext * context)override{
            int m = npoint_;
            const Tensor& inp_tensor=context->input(0);
            OP_REQUIRES(context,inp_tensor.dims()==3 && inp_tensor.shape().dim_size(2)==3,errors::InvalidArgument("FarthestPointSample expects (batch_size,num_points,3) inp shape"));
            int b=inp_tensor.shape().dim_size(0);
            int n=inp_tensor.shape().dim_size(1);
            auto inp_flat=inp_tensor.flat<float>();
            const float * inp=&(inp_flat(0));
            Tensor * out_tensor;
            OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,m},&out_tensor));
            auto out_flat=out_tensor->flat<int>();
            int * out=&(out_flat(0));
            Tensor temp_tensor;
            OP_REQUIRES_OK(context,context->allocate_temp(DataTypeToEnum<float>::value,TensorShape{32,n},&temp_tensor));
            auto temp_flat=temp_tensor.flat<float>();
            float * temp=&(temp_flat(0));
            farthestpointsamplingLauncher(b,n,m,inp,temp,out);
        }
    private:
        int npoint_;
};
REGISTER_KERNEL_BUILDER(Name("FarthestPointSample").Device(DEVICE_GPU),FarthestPointSampleGpuOp);

void gatherpointLauncher(int b,int n,int m,const float * inp,const int * idx,float * out);
class GatherPointGpuOp: public OpKernel{
    public:
        explicit GatherPointGpuOp(OpKernelConstruction * context):OpKernel(context){}
        void Compute(OpKernelContext * context)override{
            const Tensor& inp_tensor=context->input(0);
            OP_REQUIRES(context,inp_tensor.dims()==3 && inp_tensor.shape().dim_size(2)==3,errors::InvalidArgument("GatherPoint expects (batch_size,num_points,3) inp shape"));
            int b=inp_tensor.shape().dim_size(0);
            int n=inp_tensor.shape().dim_size(1);
            const Tensor& idx_tensor=context->input(1);
            OP_REQUIRES(context,idx_tensor.dims()==2 && idx_tensor.shape().dim_size(0)==b,errors::InvalidArgument("GatherPoint expects (batch_size,num_result) idx shape"));
            int m=idx_tensor.shape().dim_size(1);
            auto inp_flat=inp_tensor.flat<float>();
            const float * inp=&(inp_flat(0));
            auto idx_flat=idx_tensor.flat<int>();
            const int * idx=&(idx_flat(0));
            Tensor * out_tensor=NULL;
            OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,m,3},&out_tensor));
            auto out_flat=out_tensor->flat<float>();
            float * out=&(out_flat(0));
            gatherpointLauncher(b,n,m,inp,idx,out);
        }
};
REGISTER_KERNEL_BUILDER(Name("GatherPoint").Device(DEVICE_GPU),GatherPointGpuOp);

void scatteraddpointLauncher(int b,int n,int m,const float * out_g,const int * idx,float * inp_g);
class GatherPointGradGpuOp: public OpKernel{
    public:
        explicit GatherPointGradGpuOp(OpKernelConstruction * context):OpKernel(context){}
        void Compute(OpKernelContext * context)override{
            const Tensor& inp_tensor=context->input(0);
            OP_REQUIRES(context,inp_tensor.dims()==3 && inp_tensor.shape().dim_size(2)==3,errors::InvalidArgument("GatherPointGradGpuOp expects (batch_size,num_points,3) inp"));
            int b=inp_tensor.shape().dim_size(0);
            int n=inp_tensor.shape().dim_size(1);
            const Tensor& idx_tensor=context->input(1);
            OP_REQUIRES(context,idx_tensor.dims()==2 && idx_tensor.shape().dim_size(0)==b,errors::InvalidArgument("GatherPointGradGpuOp expects (batch_size,num_result) idx shape"));
            int m=idx_tensor.shape().dim_size(1);
            auto inp_flat=inp_tensor.flat<float>();
            const float * inp=&(inp_flat(0));
            auto idx_flat=idx_tensor.flat<int>();
            const int * idx=&(idx_flat(0));
            const Tensor& out_g_tensor=context->input(2);
            OP_REQUIRES(context,out_g_tensor.dims()==3 && out_g_tensor.shape().dim_size(0)==b && out_g_tensor.shape().dim_size(1)==m && out_g_tensor.shape().dim_size(2)==3,errors::InvalidArgument("GatherPointGradGpuOp expects (batch_size,num_result,3) out_g shape"));
            auto out_g_flat=out_g_tensor.flat<float>();
            const float * out_g=&(out_g_flat(0));
            Tensor * inp_g_tensor=NULL;
            OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n,3},&inp_g_tensor));
            auto inp_g_flat=inp_g_tensor->flat<float>();
            float * inp_g=&(inp_g_flat(0));
            cudaMemset(inp_g,0,b*n*3*4);
            scatteraddpointLauncher(b,n,m,out_g,idx,inp_g);
        }
};
REGISTER_KERNEL_BUILDER(Name("GatherPointGrad").Device(DEVICE_GPU),GatherPointGradGpuOp);

================================================
FILE: code/tf_ops/sampling/tf_sampling.py
================================================
''' Furthest point sampling
Original author: Haoqiang Fan
Modified by Charles R. Qi
All Rights Reserved. 2017.
''' import tensorflow as tf from tensorflow.python.framework import ops import sys import os BASE_DIR = os.path.dirname(os.path.abspath(__file__)) sys.path.append(BASE_DIR) sampling_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_sampling_so.so')) def prob_sample(inp,inpr): ''' input: batch_size * ncategory float32 batch_size * npoints float32 returns: batch_size * npoints int32 ''' return sampling_module.prob_sample(inp,inpr) ops.NoGradient('ProbSample') # TF1.0 API requires set shape in C++ #@tf.RegisterShape('ProbSample') #def _prob_sample_shape(op): # shape1=op.inputs[0].get_shape().with_rank(2) # shape2=op.inputs[1].get_shape().with_rank(2) # return [tf.TensorShape([shape2.dims[0],shape2.dims[1]])] def gather_point(inp,idx): ''' input: batch_size * ndataset * 3 float32 batch_size * npoints int32 returns: batch_size * npoints * 3 float32 ''' return sampling_module.gather_point(inp,idx) #@tf.RegisterShape('GatherPoint') #def _gather_point_shape(op): # shape1=op.inputs[0].get_shape().with_rank(3) # shape2=op.inputs[1].get_shape().with_rank(2) # return [tf.TensorShape([shape1.dims[0],shape2.dims[1],shape1.dims[2]])] @tf.RegisterGradient('GatherPoint') def _gather_point_grad(op,out_g): inp=op.inputs[0] idx=op.inputs[1] return [sampling_module.gather_point_grad(inp,idx,out_g),None] def farthest_point_sample(npoint,inp): ''' input: int32 batch_size * ndataset * 3 float32 returns: batch_size * npoint int32 ''' return sampling_module.farthest_point_sample(inp, npoint) ops.NoGradient('FarthestPointSample') if __name__=='__main__': import numpy as np np.random.seed(100) triangles=np.random.rand(1,5,3,3).astype('float32') with tf.device('/gpu:1'): inp=tf.constant(triangles) tria=inp[:,:,0,:] trib=inp[:,:,1,:] tric=inp[:,:,2,:] areas=tf.sqrt(tf.reduce_sum(tf.cross(trib-tria,tric-tria)**2,2)+1e-9) randomnumbers=tf.random_uniform((1,8192)) triids=prob_sample(areas,randomnumbers) tria_sample=gather_point(tria,triids) trib_sample=gather_point(trib,triids) 
        tric_sample=gather_point(tric,triids)
        us=tf.random_uniform((1,8192))
        vs=tf.random_uniform((1,8192))
        uplusv=1-tf.abs(us+vs-1)
        uminusv=us-vs
        us=(uplusv+uminusv)*0.5
        vs=(uplusv-uminusv)*0.5
        pt_sample=tria_sample+(trib_sample-tria_sample)*tf.expand_dims(us,-1)+(tric_sample-tria_sample)*tf.expand_dims(vs,-1)
        print 'pt_sample: ', pt_sample
        reduced_sample=gather_point(pt_sample,farthest_point_sample(1024,pt_sample))
        print reduced_sample
    with tf.Session('') as sess:
        ret=sess.run(reduced_sample)
        print ret.shape,ret.dtype
        import cPickle as pickle
        pickle.dump(ret,open('1.pkl','wb'),-1)

================================================
FILE: code/tf_ops/sampling/tf_sampling_compile.sh
================================================
#!/bin/bash
/usr/local/cuda-8.0/bin/nvcc tf_sampling_g.cu -o tf_sampling_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC -I /home/ganeshiyer/tensorflow/lib/python2.7/site-packages/tensorflow/include -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=0

================================================
FILE: code/tf_ops/sampling/tf_sampling_g.cu
================================================
/* Furthest point sampling GPU implementation
 * Original author: Haoqiang Fan
 * Modified by Charles R. Qi
 * All Rights Reserved. 2017.
*/ __global__ void cumsumKernel(int b,int n,const float * __restrict__ inp,float * __restrict__ out){ const int BlockSize=2048; const int paddingLevel=5; __shared__ float buffer4[BlockSize*4]; __shared__ float buffer[BlockSize+(BlockSize>>paddingLevel)]; for (int i=blockIdx.x;i>2; for (int k=threadIdx.x*4;k>2)+(k>>(2+paddingLevel))]=v4; }else{ float v=0; for (int k2=k;k2>2)+(k>>(2+paddingLevel))]=v; } } int u=0; for (;(2<>(u+1));k+=blockDim.x){ int i1=(((k<<1)+2)<>paddingLevel; i2+=i2>>paddingLevel; buffer[i1]+=buffer[i2]; } } u--; for (;u>=0;u--){ __syncthreads(); for (int k=threadIdx.x;k>(u+1));k+=blockDim.x){ int i1=(((k<<1)+3)<>paddingLevel; i2+=i2>>paddingLevel; buffer[i1]+=buffer[i2]; } } __syncthreads(); for (int k=threadIdx.x*4;k>2)-1)+(((k>>2)-1)>>paddingLevel); buffer4[k]+=buffer[k2]; buffer4[k+1]+=buffer[k2]; buffer4[k+2]+=buffer[k2]; buffer4[k+3]+=buffer[k2]; } } __syncthreads(); for (int k=threadIdx.x;k>paddingLevel)]+runningsum2; float r2=runningsum+t; runningsum2=t-(r2-runningsum); runningsum=r2; __syncthreads(); } } } __global__ void binarysearchKernel(int b,int n,int m,const float * __restrict__ dataset,const float * __restrict__ query, int * __restrict__ result){ int base=1; while (base=1;k>>=1) if (r>=k && dataset[i*n+r-k]>=q) r-=k; result[i*m+j]=r; } } } __global__ void farthestpointsamplingKernel(int b,int n,int m,const float * __restrict__ dataset,float * __restrict__ temp,int * __restrict__ idxs){ if (m<=0) return; const int BlockSize=512; __shared__ float dists[BlockSize]; __shared__ int dists_i[BlockSize]; const int BufferSize=3072; __shared__ float buf[BufferSize*3]; for (int i=blockIdx.x;ibest){ best=d2; besti=k; } } dists[threadIdx.x]=best; dists_i[threadIdx.x]=besti; for (int u=0;(1<>(u+1))){ int i1=(threadIdx.x*2)<>>(b,n,inp,out); } //require b*n working space void probsampleLauncher(int b,int n,int m,const float * inp_p,const float * inp_r,float * temp,int * out){ cumsumKernel<<<32,512>>>(b,n,inp_p,temp); 
    binarysearchKernel<<<dim3(32,8,1),512>>>(b,n,m,temp,inp_r,out);
}
//require 32*n working space
void farthestpointsamplingLauncher(int b,int n,int m,const float * inp,float * temp,int * out){
    farthestpointsamplingKernel<<<32,512>>>(b,n,m,inp,temp,out);
}
void gatherpointLauncher(int b,int n,int m,const float * inp,const int * idx,float * out){
    gatherpointKernel<<<dim3(2,8,1),512>>>(b,n,m,inp,idx,out);
}
void scatteraddpointLauncher(int b,int n,int m,const float * out_g,const int * idx,float * inp_g){
    scatteraddpointKernel<<<dim3(256,8,1),256>>>(b,n,m,out_g,idx,inp_g);
}

================================================
FILE: code/train_model_combined.py
================================================
import numpy as np
import tensorflow as tf
import scipy.misc as smc

import config_res as config
from common.cnn_utils_res import *
from common import resnet_rgb_model as model
from common import resnet_depth_model as model_depth
from common import all_transformer as at3
from common import global_agg_net
from common.Lie_functions import exponential_map_single

import nw_loader_color as ldr
import model_utils

_BETA_CONST = 1.0
_ALPHA_CONST = 1.0

IMG_HT = config.depth_img_params['IMG_HT']
IMG_WDT = config.depth_img_params['IMG_WDT']

batch_size = config.net_params['batch_size']
learning_rate = config.net_params['learning_rate']
n_epochs = config.net_params['epochs']
current_epoch = config.net_params['load_epoch']

tf.reset_default_graph()

X1 = tf.placeholder(tf.float32, shape = (batch_size, IMG_HT, IMG_WDT, 3), name = "X1")
X2 = tf.placeholder(tf.float32, shape = (batch_size, IMG_HT, IMG_WDT, 1), name = "X2")
depth_maps_target = tf.placeholder(tf.float32, shape = (batch_size, IMG_HT, IMG_WDT, 1), name = "depth_maps_target")
expected_transforms = tf.placeholder(tf.float32, shape = (batch_size, 4, 4), name = "expected_transforms")
phase = tf.placeholder(tf.bool, [], name = "phase")
phase_rgb = tf.placeholder(tf.bool, [], name = "phase_rgb")
keep_prob = tf.placeholder(tf.float32, name = "keep_prob")

fx = config.camera_params['fx']
fy = config.camera_params['fy'] cx = config.camera_params['cx'] cy = config.camera_params['cy'] fx_scaled = 2*(fx)/np.float32(IMG_WDT) # focal length x scaled for -1 to 1 range fy_scaled = 2*(fy)/np.float32(IMG_HT) # focal length y scaled for -1 to 1 range cx_scaled = -1 + 2*(cx - 1.0)/np.float32(IMG_WDT) # optical center x scaled for -1 to 1 range cy_scaled = -1 + 2*(cy - 1.0)/np.float32(IMG_HT) # optical center y scaled for -1 to 1 range K_mat_scaled = np.array([[fx_scaled, 0.0, cx_scaled], [0.0, fy_scaled, cy_scaled], [0.0, 0.0, 1.0]], dtype = np.float32) K_final = tf.constant(K_mat_scaled, dtype = tf.float32) small_transform = tf.constant(config.camera_params['cam_transform_02_inv'], dtype = tf.float32) X2_pooled = tf.nn.max_pool(X2, ksize=[1,5,5,1], strides=[1,1,1,1], padding="SAME") depth_maps_target_pooled = tf.nn.max_pool(depth_maps_target, ksize=[1,5,5,1], strides=[1,1,1,1], padding="SAME") output_vectors, weight_summaries = global_agg_net.End_Net_Out(X1, phase_rgb, X2_pooled, phase, keep_prob) # se(3) -> SE(3) for the whole batch predicted_transforms = tf.map_fn(lambda x:exponential_map_single(output_vectors[x]), elems=tf.range(0, batch_size, 1), dtype=tf.float32) # transforms depth maps by the predicted transformation depth_maps_predicted, cloud_pred = tf.map_fn(lambda x:at3._simple_transformer(X2_pooled[x,:,:,0]*40.0 + 40.0, predicted_transforms[x], K_final, small_transform), elems = tf.range(0, batch_size, 1), dtype = (tf.float32, tf.float32)) # transforms depth maps by the expected transformation depth_maps_expected, cloud_exp = tf.map_fn(lambda x:at3._simple_transformer(X2_pooled[x,:,:,0]*40.0 + 40.0, expected_transforms[x], K_final, small_transform), elems = tf.range(0, batch_size, 1), dtype = (tf.float32, tf.float32)) # photometric loss between predicted and expected transformation photometric_loss = tf.nn.l2_loss(tf.subtract((depth_maps_expected[:,10:-10,10:-10] - 40.0)/40.0, (depth_maps_predicted[:,10:-10,10:-10] - 40.0)/40.0)) # earth mover's 
distance between point clouds
cloud_loss = model_utils.get_emd_loss(cloud_pred, cloud_exp)

# final loss term
predicted_loss_train = _ALPHA_CONST*photometric_loss + _BETA_CONST*cloud_loss
tf.add_to_collection('losses1', predicted_loss_train)
loss1 = tf.add_n(tf.get_collection('losses1'))

train_step = tf.train.AdamOptimizer(learning_rate = config.net_params['learning_rate'],
                                    beta1 = config.net_params['beta1']).minimize(predicted_loss_train)

predicted_loss_validation = tf.nn.l2_loss(tf.subtract((depth_maps_expected[:,10:-10,10:-10] - 40.0)/40.0,
                                                      (depth_maps_predicted[:,10:-10,10:-10] - 40.0)/40.0))
cloud_loss_validation = model_utils.get_emd_loss(cloud_pred, cloud_exp)

training_summary_1 = tf.summary.scalar('cloud_loss', _BETA_CONST*cloud_loss)
training_summary_2 = tf.summary.scalar('photometric_loss', photometric_loss)
validation_summary_1 = tf.summary.scalar('Validation_loss', predicted_loss_validation)
validation_summary_2 = tf.summary.scalar('Validation_cloud_loss', cloud_loss_validation)

merge_train = tf.summary.merge([training_summary_1] + [training_summary_2] + weight_summaries)
merge_val = tf.summary.merge([validation_summary_1] + [validation_summary_2])

saver = tf.train.Saver()

# tensorflow gpu configuration.
# Not to be confused with the network configuration file
config_tf = tf.ConfigProto()
config_tf.gpu_options.allow_growth = True

with tf.Session(config = config_tf) as sess:

    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter("./logs_simple_transformer/")

    total_iterations_train = 0
    total_iterations_validate = 0

    if(current_epoch == 0):
        writer.add_graph(sess.graph)

    checkpoint_path = config.paths['checkpoint_path']

    if(current_epoch > 0):
        print("Restoring Checkpoint")
        saver.restore(sess, checkpoint_path + "/model-%d"%current_epoch)
        current_epoch += 1
        total_iterations_train = current_epoch*config.net_params['total_frames_train']/batch_size
        total_iterations_validate = current_epoch*config.net_params['total_frames_validation']/batch_size

    for epoch in range(current_epoch, n_epochs):

        total_partitions_train = config.net_params['total_frames_train']/config.net_params['partition_limit']
        total_partitions_validation = config.net_params['total_frames_validation']/config.net_params['partition_limit']

        ldr.shuffle()

        for part in range(total_partitions_train):

            source_container, target_container, source_img_container, target_img_container, transforms_container = ldr.load(part, mode = "train")

            for source_b, target_b, source_img_b, target_img_b, transforms_b in zip(source_container, target_container, source_img_container, target_img_container, transforms_container):

                outputs = sess.run([depth_maps_predicted, depth_maps_expected, predicted_loss_train, X2_pooled,
                                    train_step, merge_train, predicted_transforms, cloud_loss, photometric_loss, loss1],
                                   feed_dict={X1: source_img_b, X2: source_b, depth_maps_target: target_b,
                                              expected_transforms: transforms_b, phase: True, keep_prob: 0.5, phase_rgb: False})

                dmaps_pred = outputs[0]
                dmaps_exp = outputs[1]
                loss = outputs[2]
                source = outputs[3]

                if(total_iterations_train%10 == 0):
                    writer.add_summary(outputs[5], total_iterations_train/10)
                    print(outputs[8], _ALPHA_CONST*outputs[8], outputs[7], _BETA_CONST*outputs[7], outputs[9], total_iterations_train)
                    random_disp = np.random.randint(batch_size)
                    print(outputs[6][random_disp])
                    print(transforms_b[random_disp])

                if(total_iterations_train%125 == 0):
                    smc.imsave(config.paths['training_imgs_path'] + "/training_save_%d.png"%total_iterations_train,
                               np.vstack((source[random_disp,:,:,0]*40.0 + 40.0, dmaps_pred[random_disp], dmaps_exp[random_disp])))

                total_iterations_train += 1

        if (epoch%1 == 0):
            print("Saving after epoch %d"%epoch)
            saver.save(sess, checkpoint_path + "/model-%d"%epoch)

        for part in range(total_partitions_validation):

            source_container, target_container, source_img_container, target_img_container, transforms_container = ldr.load(part, mode = "validation")

            for source_b, target_b, source_img_b, target_img_b, transforms_b in zip(source_container, target_container, source_img_container, target_img_container, transforms_container):

                outputs = sess.run([depth_maps_predicted, depth_maps_expected, predicted_loss_validation, X2_pooled, merge_val, cloud_loss_validation],
                                   feed_dict={X1: source_img_b, X2: source_b, depth_maps_target: target_b,
                                              expected_transforms: transforms_b, phase: False, keep_prob: 1.0, phase_rgb: False})

                dmaps_pred = outputs[0]
                dmaps_exp = outputs[1]
                loss = outputs[2]
                source = outputs[3]

                writer.add_summary(outputs[4], total_iterations_validate)
                total_iterations_validate += 1
                print(loss, total_iterations_validate, outputs[5])

                if(total_iterations_validate%25 == 0):
                    random_disp = np.random.randint(batch_size)
                    smc.imsave(config.paths['validation_imgs_path'] + "/validation_save_%d.png"%total_iterations_validate,
                               np.vstack((source[random_disp,:,:,0]*40.0 + 40.0, dmaps_pred[random_disp], dmaps_exp[random_disp])))

================================================
FILE: code/utils/__init__.py
================================================


================================================
FILE: code/utils/data_prep_util.py
================================================
import os
import sys
BASE_DIR =
os.path.dirname(os.path.abspath(__file__)) sys.path.append(BASE_DIR) from plyfile import (PlyData, PlyElement, make2d, PlyParseError, PlyProperty) import numpy as np import h5py SAMPLING_BIN = os.path.join(BASE_DIR, 'third_party/mesh_sampling/build/pcsample') SAMPLING_POINT_NUM = 2048 SAMPLING_LEAF_SIZE = 0.005 MODELNET40_PATH = '../datasets/modelnet40' def export_ply(pc, filename): vertex = np.zeros(pc.shape[0], dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')]) for i in range(pc.shape[0]): vertex[i] = (pc[i][0], pc[i][1], pc[i][2]) ply_out = PlyData([PlyElement.describe(vertex, 'vertex', comments=['vertices'])]) ply_out.write(filename) # Sample points on the obj shape def get_sampling_command(obj_filename, ply_filename): cmd = SAMPLING_BIN + ' ' + obj_filename cmd += ' ' + ply_filename cmd += ' -n_samples %d ' % SAMPLING_POINT_NUM cmd += ' -leaf_size %f ' % SAMPLING_LEAF_SIZE return cmd # -------------------------------------------------------------- # Following are the helper functions to load MODELNET40 shapes # -------------------------------------------------------------- # Read in the list of categories in MODELNET40 def get_category_names(): shape_names_file = os.path.join(MODELNET40_PATH, 'shape_names.txt') shape_names = [line.rstrip() for line in open(shape_names_file)] return shape_names # Return all the filepaths for the shapes in MODELNET40 def get_obj_filenames(): obj_filelist_file = os.path.join(MODELNET40_PATH, 'filelist.txt') obj_filenames = [os.path.join(MODELNET40_PATH, line.rstrip()) for line in open(obj_filelist_file)] print('Got %d obj files in modelnet40.' 
% len(obj_filenames))
    return obj_filenames

# Helper function to create the parent folder and all subdir folders if they do not exist
def batch_mkdir(output_folder, subdir_list):
    if not os.path.exists(output_folder):
        os.mkdir(output_folder)
    for subdir in subdir_list:
        if not os.path.exists(os.path.join(output_folder, subdir)):
            os.mkdir(os.path.join(output_folder, subdir))

# ----------------------------------------------------------------
# Following are the helper functions to save/load HDF5 files
# ----------------------------------------------------------------
# Write numpy array data, label and normals to h5_filename
def save_h5_data_label_normal(h5_filename, data, label, normal,
                              data_dtype='float32', label_dtype='uint8', normal_dtype='float32'):
    h5_fout = h5py.File(h5_filename)
    h5_fout.create_dataset(
        'data', data=data,
        compression='gzip', compression_opts=4,
        dtype=data_dtype)
    h5_fout.create_dataset(
        'normal', data=normal,
        compression='gzip', compression_opts=4,
        dtype=normal_dtype)
    h5_fout.create_dataset(
        'label', data=label,
        compression='gzip', compression_opts=1,
        dtype=label_dtype)
    h5_fout.close()

# Write numpy array data and label to h5_filename
def save_h5(h5_filename, data, label, data_dtype='uint8', label_dtype='uint8'):
    h5_fout = h5py.File(h5_filename)
    h5_fout.create_dataset(
        'data', data=data,
        compression='gzip', compression_opts=4,
        dtype=data_dtype)
    h5_fout.create_dataset(
        'label', data=label,
        compression='gzip', compression_opts=1,
        dtype=label_dtype)
    h5_fout.close()

# Read numpy array data, label and normals from h5_filename
def load_h5_data_label_normal(h5_filename):
    f = h5py.File(h5_filename)
    data = f['data'][:]
    label = f['label'][:]
    normal = f['normal'][:]
    return (data, label, normal)

# Read numpy array data, label and part ids from h5_filename
def load_h5_data_label_seg(h5_filename):
    f = h5py.File(h5_filename)
    data = f['data'][:]
    label = f['label'][:]
    seg = f['pid'][:]
    return (data, label, seg)

# Read numpy array data and label from h5_filename
def load_h5(h5_filename):
    f = h5py.File(h5_filename)
    data = f['data'][:]
    label = f['label'][:]
    return (data, label)

# ----------------------------------------------------------------
# Following are the helper functions to save/load PLY files
# ----------------------------------------------------------------
# Load PLY file
def load_ply_data(filename, point_num):
    plydata = PlyData.read(filename)
    pc = plydata['vertex'].data[:point_num]
    pc_array = np.array([[x, y, z] for x, y, z in pc])
    return pc_array

# Load PLY file
def load_ply_normal(filename, point_num):
    plydata = PlyData.read(filename)
    pc = plydata['normal'].data[:point_num]
    pc_array = np.array([[x, y, z] for x, y, z in pc])
    return pc_array

# Make up rows for Nxk array
# Input pad is 'edge' or 'constant'
def pad_arr_rows(arr, row, pad='edge'):
    assert(len(arr.shape) == 2)
    assert(arr.shape[0] <= row)
    assert(pad == 'edge' or pad == 'constant')
    if arr.shape[0] == row:
        return arr
    if pad == 'edge':
        return np.lib.pad(arr, ((0, row-arr.shape[0]), (0, 0)), 'edge')
    if pad == 'constant':
        return np.lib.pad(arr, ((0, row-arr.shape[0]), (0, 0)), 'constant', constant_values=(0, 0))

================================================
FILE: code/utils/eulerangles.py
================================================
# emacs: -*- mode: python-mode; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
#
#   See COPYING file distributed along with the NiBabel package for the
#   copyright and license terms.
#
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
''' Module implementing Euler angle rotations and their conversions

See:

* http://en.wikipedia.org/wiki/Rotation_matrix
* http://en.wikipedia.org/wiki/Euler_angles
* http://mathworld.wolfram.com/EulerAngles.html

See also: *Representing Attitude with Euler Angles and Quaternions: A
Reference* (2006) by James Diebel.
A cached PDF link last found here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.5134 Euler's rotation theorem tells us that any rotation in 3D can be described by 3 angles. Let's call the 3 angles the *Euler angle vector* and call the angles in the vector :math:`alpha`, :math:`beta` and :math:`gamma`. The vector is [ :math:`alpha`, :math:`beta`. :math:`gamma` ] and, in this description, the order of the parameters specifies the order in which the rotations occur (so the rotation corresponding to :math:`alpha` is applied first). In order to specify the meaning of an *Euler angle vector* we need to specify the axes around which each of the rotations corresponding to :math:`alpha`, :math:`beta` and :math:`gamma` will occur. There are therefore three axes for the rotations :math:`alpha`, :math:`beta` and :math:`gamma`; let's call them :math:`i` :math:`j`, :math:`k`. Let us express the rotation :math:`alpha` around axis `i` as a 3 by 3 rotation matrix `A`. Similarly :math:`beta` around `j` becomes 3 x 3 matrix `B` and :math:`gamma` around `k` becomes matrix `G`. Then the whole rotation expressed by the Euler angle vector [ :math:`alpha`, :math:`beta`. :math:`gamma` ], `R` is given by:: R = np.dot(G, np.dot(B, A)) See http://mathworld.wolfram.com/EulerAngles.html The order :math:`G B A` expresses the fact that the rotations are performed in the order of the vector (:math:`alpha` around axis `i` = `A` first). To convert a given Euler angle vector to a meaningful rotation, and a rotation matrix, we need to define: * the axes `i`, `j`, `k` * whether a rotation matrix should be applied on the left of a vector to be transformed (vectors are column vectors) or on the right (vectors are row vectors). 
* whether the rotations move the axes as they are applied (intrinsic rotations) - compared the situation where the axes stay fixed and the vectors move within the axis frame (extrinsic) * the handedness of the coordinate system See: http://en.wikipedia.org/wiki/Rotation_matrix#Ambiguities We are using the following conventions: * axes `i`, `j`, `k` are the `z`, `y`, and `x` axes respectively. Thus an Euler angle vector [ :math:`alpha`, :math:`beta`. :math:`gamma` ] in our convention implies a :math:`alpha` radian rotation around the `z` axis, followed by a :math:`beta` rotation around the `y` axis, followed by a :math:`gamma` rotation around the `x` axis. * the rotation matrix applies on the left, to column vectors on the right, so if `R` is the rotation matrix, and `v` is a 3 x N matrix with N column vectors, the transformed vector set `vdash` is given by ``vdash = np.dot(R, v)``. * extrinsic rotations - the axes are fixed, and do not move with the rotations. * a right-handed coordinate system The convention of rotation around ``z``, followed by rotation around ``y``, followed by rotation around ``x``, is known (confusingly) as "xyz", pitch-roll-yaw, Cardan angles, or Tait-Bryan angles. 
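The z-y-x extrinsic convention spelled out above can be checked numerically. A minimal standalone sketch with plain NumPy (the helper names `rot_z`, `rot_y`, `rot_x` are ours, not part of this module):

```python
import numpy as np

def rot_z(a):
    # counter-clockwise rotation about the z axis
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

# [alpha, beta, gamma]: alpha about z applied first, then beta about y,
# then gamma about x, so R = G . B . A as in the docstring
alpha, beta, gamma = 0.3, -0.7, 1.1
R = rot_x(gamma) @ rot_y(beta) @ rot_z(alpha)

# applying R in one step equals applying the three rotations in sequence
v = np.array([1.0, 2.0, 3.0])
step_by_step = rot_x(gamma) @ (rot_y(beta) @ (rot_z(alpha) @ v))
```

Because the matrices act on column vectors from the left, the first rotation applied sits rightmost in the product.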
''' import math import sys if sys.version_info >= (3,0): from functools import reduce import numpy as np _FLOAT_EPS_4 = np.finfo(float).eps * 4.0 def euler2mat(z=0, y=0, x=0): ''' Return matrix for rotations around z, y and x axes Uses the z, then y, then x convention above Parameters ---------- z : scalar Rotation angle in radians around z-axis (performed first) y : scalar Rotation angle in radians around y-axis x : scalar Rotation angle in radians around x-axis (performed last) Returns ------- M : array shape (3,3) Rotation matrix giving same rotation as for given angles Examples -------- >>> zrot = 1.3 # radians >>> yrot = -0.1 >>> xrot = 0.2 >>> M = euler2mat(zrot, yrot, xrot) >>> M.shape == (3, 3) True The output rotation matrix is equal to the composition of the individual rotations >>> M1 = euler2mat(zrot) >>> M2 = euler2mat(0, yrot) >>> M3 = euler2mat(0, 0, xrot) >>> composed_M = np.dot(M3, np.dot(M2, M1)) >>> np.allclose(M, composed_M) True You can specify rotations by named arguments >>> np.all(M3 == euler2mat(x=xrot)) True When applying M to a vector, the vector should column vector to the right of M. If the right hand side is a 2D array rather than a vector, then each column of the 2D array represents a vector. >>> vec = np.array([1, 0, 0]).reshape((3,1)) >>> v2 = np.dot(M, vec) >>> vecs = np.array([[1, 0, 0],[0, 1, 0]]).T # giving 3x2 array >>> vecs2 = np.dot(M, vecs) Rotations are counter-clockwise. 
>>> zred = np.dot(euler2mat(z=np.pi/2), np.eye(3)) >>> np.allclose(zred, [[0, -1, 0],[1, 0, 0], [0, 0, 1]]) True >>> yred = np.dot(euler2mat(y=np.pi/2), np.eye(3)) >>> np.allclose(yred, [[0, 0, 1],[0, 1, 0], [-1, 0, 0]]) True >>> xred = np.dot(euler2mat(x=np.pi/2), np.eye(3)) >>> np.allclose(xred, [[1, 0, 0],[0, 0, -1], [0, 1, 0]]) True Notes ----- The direction of rotation is given by the right-hand rule (orient the thumb of the right hand along the axis around which the rotation occurs, with the end of the thumb at the positive end of the axis; curl your fingers; the direction your fingers curl is the direction of rotation). Therefore, the rotations are counterclockwise if looking along the axis of rotation from positive to negative. ''' Ms = [] if z: cosz = math.cos(z) sinz = math.sin(z) Ms.append(np.array( [[cosz, -sinz, 0], [sinz, cosz, 0], [0, 0, 1]])) if y: cosy = math.cos(y) siny = math.sin(y) Ms.append(np.array( [[cosy, 0, siny], [0, 1, 0], [-siny, 0, cosy]])) if x: cosx = math.cos(x) sinx = math.sin(x) Ms.append(np.array( [[1, 0, 0], [0, cosx, -sinx], [0, sinx, cosx]])) if Ms: return reduce(np.dot, Ms[::-1]) return np.eye(3) def mat2euler(M, cy_thresh=None): ''' Discover Euler angle vector from 3x3 matrix Uses the conventions above. Parameters ---------- M : array-like, shape (3,3) cy_thresh : None or scalar, optional threshold below which to give up on straightforward arctan for estimating x rotation. If None (default), estimate from precision of input. 
Returns ------- z : scalar y : scalar x : scalar Rotations in radians around z, y, x axes, respectively Notes ----- If there was no numerical error, the routine could be derived using Sympy expression for z then y then x rotation matrix, which is:: [ cos(y)*cos(z), -cos(y)*sin(z), sin(y)], [cos(x)*sin(z) + cos(z)*sin(x)*sin(y), cos(x)*cos(z) - sin(x)*sin(y)*sin(z), -cos(y)*sin(x)], [sin(x)*sin(z) - cos(x)*cos(z)*sin(y), cos(z)*sin(x) + cos(x)*sin(y)*sin(z), cos(x)*cos(y)] with the obvious derivations for z, y, and x z = atan2(-r12, r11) y = asin(r13) x = atan2(-r23, r33) Problems arise when cos(y) is close to zero, because both of:: z = atan2(cos(y)*sin(z), cos(y)*cos(z)) x = atan2(cos(y)*sin(x), cos(x)*cos(y)) will be close to atan2(0, 0), and highly unstable. The ``cy`` fix for numerical instability below is from: *Graphics Gems IV*, Paul Heckbert (editor), Academic Press, 1994, ISBN: 0123361559. Specifically it comes from EulerAngles.c by Ken Shoemake, and deals with the case where cos(y) is close to zero: See: http://www.graphicsgems.org/ The code appears to be licensed (from the website) as "can be used without restrictions". 
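The closed-form extraction described in these notes can be exercised without nibabel. A short sketch under the same z-y-x convention (the names `zyx_to_mat` and `mat_to_zyx` are illustrative, not this module's API):

```python
import numpy as np

def zyx_to_mat(z, y, x):
    # compose R = Rx . Ry . Rz -- the z rotation is applied first,
    # matching the convention used by euler2mat
    cz, sz = np.cos(z), np.sin(z)
    cy, sy = np.cos(y), np.sin(y)
    cx, sx = np.cos(x), np.sin(x)
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    return Rx @ Ry @ Rz

def mat_to_zyx(M):
    # closed-form extraction from the derivation above;
    # stable while cos(y) is not close to zero
    r11, r12, r13 = M[0]
    r23, r33 = M[1, 2], M[2, 2]
    cy = np.hypot(r33, r23)   # equals |cos(y)|
    z = np.arctan2(-r12, r11)
    y = np.arctan2(r13, cy)
    x = np.arctan2(-r23, r33)
    return z, y, x

# round trip: angles with |y| < pi/2 are recovered
z, y, x = mat_to_zyx(zyx_to_mat(0.4, -0.2, 1.1))
```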
    '''
    M = np.asarray(M)
    if cy_thresh is None:
        try:
            cy_thresh = np.finfo(M.dtype).eps * 4
        except ValueError:
            cy_thresh = _FLOAT_EPS_4
    r11, r12, r13, r21, r22, r23, r31, r32, r33 = M.flat
    # cy: sqrt((cos(y)*cos(z))**2 + (cos(x)*cos(y))**2)
    cy = math.sqrt(r33*r33 + r23*r23)
    if cy > cy_thresh:  # cos(y) not close to zero, standard form
        z = math.atan2(-r12, r11)  # atan2(cos(y)*sin(z), cos(y)*cos(z))
        y = math.atan2(r13, cy)    # atan2(sin(y), cy)
        x = math.atan2(-r23, r33)  # atan2(cos(y)*sin(x), cos(x)*cos(y))
    else:  # cos(y) (close to) zero, so x -> 0.0 (see above)
        # so r21 -> sin(z), r22 -> cos(z) and
        z = math.atan2(r21, r22)
        y = math.atan2(r13, cy)    # atan2(sin(y), cy)
        x = 0.0
    return z, y, x

def euler2quat(z=0, y=0, x=0):
    ''' Return quaternion corresponding to these Euler angles

    Uses the z, then y, then x convention above

    Parameters
    ----------
    z : scalar
       Rotation angle in radians around z-axis (performed first)
    y : scalar
       Rotation angle in radians around y-axis
    x : scalar
       Rotation angle in radians around x-axis (performed last)

    Returns
    -------
    quat : array shape (4,)
       Quaternion in w, x, y, z (real, then vector) format

    Notes
    -----
    We can derive this formula in Sympy using:

    1. Formula giving quaternion corresponding to rotation of theta radians
       about arbitrary axis:
       http://mathworld.wolfram.com/EulerParameters.html
    2. Generated formulae from 1.) for quaternions corresponding to
       theta radians rotations about ``x, y, z`` axes
    3. Apply quaternion multiplication formula -
       http://en.wikipedia.org/wiki/Quaternions#Hamilton_product -
       to formulae from 2.) to give formula for combined rotations.
''' z = z/2.0 y = y/2.0 x = x/2.0 cz = math.cos(z) sz = math.sin(z) cy = math.cos(y) sy = math.sin(y) cx = math.cos(x) sx = math.sin(x) return np.array([ cx*cy*cz - sx*sy*sz, cx*sy*sz + cy*cz*sx, cx*cz*sy - sx*cy*sz, cx*cy*sz + sx*cz*sy]) def quat2euler(q): ''' Return Euler angles corresponding to quaternion `q` Parameters ---------- q : 4 element sequence w, x, y, z of quaternion Returns ------- z : scalar Rotation angle in radians around z-axis (performed first) y : scalar Rotation angle in radians around y-axis x : scalar Rotation angle in radians around x-axis (performed last) Notes ----- It's possible to reduce the amount of calculation a little, by combining parts of the ``quat2mat`` and ``mat2euler`` functions, but the reduction in computation is small, and the code repetition is large. ''' # delayed import to avoid cyclic dependencies import nibabel.quaternions as nq return mat2euler(nq.quat2mat(q)) def euler2angle_axis(z=0, y=0, x=0): ''' Return angle, axis corresponding to these Euler angles Uses the z, then y, then x convention above Parameters ---------- z : scalar Rotation angle in radians around z-axis (performed first) y : scalar Rotation angle in radians around y-axis x : scalar Rotation angle in radians around x-axis (performed last) Returns ------- theta : scalar angle of rotation vector : array shape (3,) axis around which rotation occurs Examples -------- >>> theta, vec = euler2angle_axis(0, 1.5, 0) >>> print(theta) 1.5 >>> np.allclose(vec, [0, 1, 0]) True ''' # delayed import to avoid cyclic dependencies import nibabel.quaternions as nq return nq.quat2angle_axis(euler2quat(z, y, x)) def angle_axis2euler(theta, vector, is_normalized=False): ''' Convert angle, axis pair to Euler angles Parameters ---------- theta : scalar angle of rotation vector : 3 element sequence vector specifying axis for rotation. is_normalized : bool, optional True if vector is already normalized (has norm of 1). 
       Default False

    Returns
    -------
    z : scalar
    y : scalar
    x : scalar
       Rotations in radians around z, y, x axes, respectively

    Examples
    --------
    >>> z, y, x = angle_axis2euler(0, [1, 0, 0])
    >>> np.allclose((z, y, x), 0)
    True

    Notes
    -----
    It's possible to reduce the amount of calculation a little, by
    combining parts of the ``angle_axis2mat`` and ``mat2euler``
    functions, but the reduction in computation is small, and the code
    repetition is large.
    '''
    # delayed import to avoid cyclic dependencies
    import nibabel.quaternions as nq
    M = nq.angle_axis2mat(theta, vector, is_normalized)
    return mat2euler(M)

================================================
FILE: code/utils/modelnet_data_prep.py
================================================
import os
import sys
import glob
import h5py
import numpy as np
from multiprocessing.dummy import Pool as ThreadPool
import threading
from tqdm import tqdm

SAMPLING_BIN = "/home/lqyu/workspace/pcl-pcl-1.8.1/build/bin/pcl_mesh_sampling"
SAMPLING_POINT_NUM = 8192
SAMPLING_LEAF_SIZE = 0.02
SAVE_ROOT_PATH = '../../data/Patches'

def nonuniform_sampling(num=4096, sample_num=1024):
    # draw indices from a gaussian centred at a random location until
    # sample_num distinct indices in [0, num) have been collected
    sample = set()
    loc = np.random.rand()*0.6 + 0.2
    while len(sample) < sample_num:
        a = int(np.random.normal(loc=loc, scale=0.3)*num)
        if a < 0 or a >= num:
            continue
        sample.add(a)
    return list(sample)

def save_file(file_path, data):
    if not os.path.exists(os.path.split(file_path)[0]):
        os.makedirs(os.path.split(file_path)[0])
    np.savetxt(file_path, data, fmt='%.6f')

def tmp(file_path=None):
    file_list = glob.glob('/home/lqyu/workspace/PointSR/data/ModelNet10_test/ModelNet10_MC_2k/*.xyz')
    file_list.sort()
    for file_path in file_list:
        print file_path
        data_4096 = np.loadtxt(file_path)
        data = data_4096[:, 0:3]
        centroid = np.mean(data, axis=0, keepdims=True)
        data = data - centroid
        furthest_distance = np.amax(np.sqrt(np.sum(abs(data) ** 2, axis=-1)))
        data = data / furthest_distance
        data_4096[:, 0:3] = data
        save_path = file_path.replace('ModelNet10_MC_2k', 'ModelNet10_MC_2k_normalize')
        save_file(save_path, data_4096)
        off_path = file_path.replace('ModelNet10_MC_2k', 'mesh')
off_path = off_path.replace('.xyz', '.off') save_off_path = off_path.replace('mesh', 'mesh_normalize') if not os.path.exists(os.path.split(save_off_path)[0]): os.makedirs(os.path.split(save_off_path)[0]) offFile = open(off_path, 'r') lines = offFile.readlines() offFile.close() with open(save_off_path, 'w') as f: f.writelines(lines[0:2]) params = lines[1].split(' ') nVert = int(params[0]) for i in range(2, nVert + 2): coord = lines[i].split(' ') coords = [] for item in coord: if item!='': coords.append(item) x = (float(coords[0]) - centroid[0, 0]) / furthest_distance y = (float(coords[1]) - centroid[0, 1]) / furthest_distance z = (float(coords[2]) - centroid[0, 2]) / furthest_distance f.write('%.6f %.6f %.6f\n' % (x, y, z)) f.writelines(lines[nVert + 2:]) # continue # path0 = file_path.replace('poisson_20k','20000_normalize') # save_file(path0, data_4096) # idx = np.argsort(data_4096[:,np.random.randint(0,2)]) # data_4096 = data_4096[idx] # path1 = file_path.replace('poisson_20k','poisson_5k2') # idx1 = nonuniform_sampling(num=len(data_4096), sample_num=len(data_4096)/4) # data1 = data_4096[idx1, ...] 
# save_file(path1,data1) def nonuniformsample_from_pointcloud_fn(): file_list1 = glob.glob('/home/lqyu/server/proj49/PointSR_data/test_data/our_collected_data/Poisson_20k/*.xyz') file_list1.sort() # #handle with file_list2 and select the complete whole # tmp_list1 = [] # tmp_list2 = [] # for item in file_list2: # tmp_item = item.replace('big_girl','biggirl') # if len(tmp_item.split('/')[-1].split('_'))==3:#biggirl_01_1 # tmp_list1.append(item) # else:#biggirl_1 # tmp_list2.append(item) file_list = file_list1 for file_path in tqdm(file_list): print file_path phase = file_path.split('/')[-2] name = file_path.split('/')[-1].replace("xyz", "xyz") data_4096 = np.loadtxt(file_path) # data = data_4096[:, 0:3] # centroid = np.mean(data, axis=0, keepdims=True) # data = data - centroid # furthest_distance = np.amax(np.sqrt(np.sum(abs(data) ** 2, axis=-1))) # data = data / furthest_distance # data_4096[:, 0:3] = data path_4096 = file_path path_nonuniform_1024 = path_4096.replace('Poisson_20k','Poisson_5k_nonuniform') #idx_nonuniform_2048 = nonuniform_sampling(num=4096, sample_num=2048) #data_nonuniform_2048 = data_4096[idx_nonuniform_2048, ...] sort_idx = np.argsort(data_4096[:, np.random.randint(0, 2)]) perm_idx = np.random.permutation(np.arange(len(data_4096))) idx_nonuniform_1024 = nonuniform_sampling(num=len(data_4096), sample_num=5000) data_nonuniform_1024 = data_4096[sort_idx[idx_nonuniform_1024],...] # data_nonuniform_1024 = data_4096[perm_idx[:5000], ...] 
#save_file(path_4096,data_4096) #save_file(path_nonuniform_2048,data_nonuniform_2048) save_file(path_nonuniform_1024,data_nonuniform_1024) def possion_sample_fn(file_path): phase = file_path.split('/')[-2] name = file_path.split('/')[-1].replace("off", "xyz") xyz_name = os.path.join(SAVE_ROOT_PATH, '20000~', phase, name) if not os.path.exists(xyz_name): sample_cmd = '../../third_party/PdSampling_nofix %s %s %s' % (str(20000), file_path, xyz_name) print sample_cmd if os.system(sample_cmd): print "cannot sample file: %s" % (file_path) return 1 # xyz_name = os.path.join(SAVE_ROOT_PATH, '4096', phase, name) # if not os.path.exists(xyz_name): # sample_cmd = '../../third_party_pc901/PdSampling %s %s %s' % (str(4096), file_path, xyz_name) # print sample_cmd # if os.system(sample_cmd): # print "cannot sample file: %s" % (file_path) # return 1 # # xyz_name = os.path.join(SAVE_ROOT_PATH, '2048', phase, name) # if not os.path.exists(xyz_name): # sample_cmd = '../../third_party_pc901/PdSampling %s %s %s' % (str(2048), file_path, xyz_name) # print sample_cmd # if os.system(sample_cmd): # print "cannot sample file: %s" % (file_path) # return 1 # xyz_name = os.path.join(SAVE_ROOT_PATH, '8192~', phase, name) # if os.path.exists(xyz_name): # return # print file_path # sample_cmd = '../../third_party/PdSampling_nofix %s %s %s' % (str(8192), file_path, xyz_name) # if os.system(sample_cmd): # print "cannot sample file: %s" % (file_path) # return 1 def normal_off_file(folder='/home/lqyu/workspace/PointSR/data/ModelNet10'): file_list = glob.glob(folder+'/*/train/*.off') file_list.sort() for file_path in file_list: phase = file_path.split('/')[-2] name = file_path.split('/')[-1][:-4]+'.xyz' data_1024 = np.loadtxt(os.path.join('../../data/ModelNet10_poisson', '1024', phase, name)) data_2048 = np.loadtxt(os.path.join('../../data/ModelNet10_poisson', '2048', phase, name)) data_4096 = np.loadtxt(os.path.join('../../data/ModelNet10_poisson', '4096', phase, name)) data_ori = 
np.concatenate((data_1024, data_2048, data_4096), axis=0) data = data_ori[:, 0:3] centroid = np.mean(data, axis=0, keepdims=True) data = data - centroid furthest_distance = np.amax(np.sqrt(np.sum(abs(data) ** 2, axis=-1))) save_off_path = file_path.replace('ModelNet10','ModelNet10_normalize') if not os.path.exists(os.path.split(save_off_path)[0]): os.makedirs(os.path.split(save_off_path)[0]) offFile = open(file_path, 'r') lines = offFile.readlines() offFile.close() with open(save_off_path,'w') as f: f.writelines(lines[0:2]) params = lines[1].split(' ') nVert = int(params[0]) for i in range(2,nVert+2): coord = lines[i].split(' ') x = (float(coord[0])-centroid[0,0])/furthest_distance y = (float(coord[1])-centroid[0,1])/furthest_distance z = (float(coord[2])-centroid[0,2])/furthest_distance f.write('%.6f %.6f %.6f\n'%(x,y,z)) f.writelines(lines[nVert+2:]) def recalculateNormal_fn(file_path): SAVE_ROOT_PATH='../../data/ModelNet10_poisson_normal' phase = file_path.split('/')[-2] name = file_path.split('/')[-1] xyz_name = file_path.split('/')[-1][:-4] + "_p.xyz" xyz_normal_name = file_path.split('/')[-1].replace("off", "xyz") if os.path.exists(os.path.join(SAVE_ROOT_PATH, '2048_nonuniform', phase, name)): return data_1024 = np.loadtxt(os.path.join('../../ModelNet10', '1024', phase, name)) data_2048 = np.loadtxt(os.path.join('../../ModelNet10', '2048', phase, name)) data_4096 = np.loadtxt(os.path.join('../../ModelNet10', '4096', phase, name)) data_8196 = np.loadtxt(os.path.join('../../ModelNet10', '20000~', phase, name)) data = np.concatenate((data_1024, data_2048, data_4096,data_8196), axis=0) np.savetxt(xyz_name, data) normal_cmd = 'meshlabserver -i %s -o %s -s ../../third_party/calculate_normal.mlx -om vn' % ( xyz_name, xyz_normal_name) if os.system(normal_cmd): print "cannot calculate normal file: %s" % (file_path) return 1 data_ori = np.loadtxt(xyz_normal_name) data_ori = data_ori[0:1024*7,:] data = data_ori[:, 0:3] centroid = np.mean(data, axis=0, keepdims=True) 
data = data - centroid furthest_distance = np.amax(np.sqrt(np.sum(abs(data) ** 2, axis=-1))) data = data / furthest_distance data_ori[:, 0:3] = data data_1024 = data_ori[0:1024 * 1, :] data_2048 = data_ori[1024 * 1:1024 * 3, :] data_4096 = data_ori[1024 * 3:1024 * 7, :] #generate nonuniform data idx_nonuniform_2048 = nonuniform_sampling(num=len(data_4096), sample_num=2048) data_nonuniform_2048 = data_4096[idx_nonuniform_2048, ...] idx_nonuniform_1024 = nonuniform_sampling(num=len(data_4096), sample_num=1024) data_nonuniform_1024 = data_4096[idx_nonuniform_1024, ...] # save path_1024 = os.path.join(SAVE_ROOT_PATH, '1024', phase, name) path_2048 = os.path.join(SAVE_ROOT_PATH, '2048', phase, name) path_4096 = os.path.join(SAVE_ROOT_PATH, '4096', phase, name) path_nonuniform_1024 = os.path.join(SAVE_ROOT_PATH, '1024_nonuniform', phase, name) path_nonuniform_2048 = os.path.join(SAVE_ROOT_PATH, '2048_nonuniform', phase, name) if not os.path.exists(os.path.join(SAVE_ROOT_PATH, '1024', phase)): os.makedirs(os.path.join(SAVE_ROOT_PATH, '1024', phase)) if not os.path.exists(os.path.join(SAVE_ROOT_PATH, '2048', phase)): os.makedirs(os.path.join(SAVE_ROOT_PATH, '2048', phase)) if not os.path.exists(os.path.join(SAVE_ROOT_PATH, '4096', phase)): os.makedirs(os.path.join(SAVE_ROOT_PATH, '4096', phase)) if not os.path.exists(os.path.join(SAVE_ROOT_PATH, '1024_nonuniform', phase)): os.makedirs(os.path.join(SAVE_ROOT_PATH, '1024_nonuniform', phase)) if not os.path.exists(os.path.join(SAVE_ROOT_PATH, '2048_nonuniform', phase)): os.makedirs(os.path.join(SAVE_ROOT_PATH, '2048_nonuniform', phase)) np.savetxt(path_1024, data_1024, fmt='%.6f') np.savetxt(path_2048, data_2048, fmt='%.6f') np.savetxt(path_4096, data_4096, fmt='%.6f') np.savetxt(path_nonuniform_1024, data_nonuniform_1024, fmt='%.6f') np.savetxt(path_nonuniform_2048, data_nonuniform_2048, fmt='%.6f') os.remove(xyz_name) os.remove(xyz_normal_name) return def fix_off_file(filepath): with open(filepath,'r') as f: line = 
f.readline() if line=='OFF\n': return print filepath lines = f.readlines() nums = line.split(' ') n1 = nums[0][3:] n2 = nums[1] n3 = nums[2] with open(filepath,'w') as f: f.write('OFF\n%s %s %s'%(n1,n2,n3)) f.writelines(lines) def possion_sample_fn(phase='train'): file_list = glob.glob(os.path.join('../../data/ModelNet10', '*', phase, '*.off')) file_list.sort() new_file_list = [] for item in file_list: name = item.split('/')[-1][:-3]+"xyz" xyz_name = os.path.join(SAVE_ROOT_PATH, '20000~', phase,name) if not os.path.exists(xyz_name): new_file_list.append(item) print('Got %d files in modelnet10.' % (len(new_file_list))) pool = ThreadPool(8) pool.map(possion_sample_fn, new_file_list) def recalculateNormal(phase='train'): file_list = glob.glob(os.path.join('../../ModelNet10/1024',phase,'*.xyz')) file_list.sort() print('Got %d files in modelnet10.' % (len(file_list))) pool = ThreadPool(1) pool.map(recalculateNormal_fn, file_list) def nonuniformsample_from_pointcloud(phase='train'): file_list = glob.glob(os.path.join('../../data/surface_with_area/pcl_4096~','*.xyz')) file_list.sort() print('Got %d files in modelnet10.' 
% (len(file_list))) for item in file_list: tmp(item) def save_h52(save_names = ['poisson_4096','poisson_2048']): h5_filename = '/home/lqyu/workspace/PointSR/data/Patches_tt.h5' file_names1 = glob.glob(os.path.join('/home/lqyu/server/proj49/PointSR_data/train_data/SHREC',save_names[0],'*.xyz')) file_names1.sort() #select which data to save select_file_names = [] for name in file_names1: id = int(name.split('/')[-1].split('_')[0][1:]) if id<1000: select_file_names.append(name) ##read data names = [] catetogy = len(save_names) data = [[] for item in range(catetogy)] # for item in tqdm(select_file_names): # path1 = os.path.join('/home/lqyu/server/proj49/PointSR_data/train_data/poisson2_4096', item.split('/')[-1]) # path2 = os.path.join('/home/lqyu/server/proj49/PointSR_data/train_data/poisson_1024', item.split('/')[-1]) # path3 = os.path.join('/home/lqyu/server/proj49/PointSR_data/train_data/MC_4096', item.split('/')[-1]) # # if os.path.exists(item) and os.path.exists(path1) and os.path.exists(path2): # tmp_data = np.loadtxt(item) # data[0].append(tmp_data) # # tmp_data = np.loadtxt(path1) # data[1].append(tmp_data) # # tmp_data = np.loadtxt(path2) # data[2].append(tmp_data) # # tmp_data = np.loadtxt(path3) # data[3].append(tmp_data) # names.append(item) # else: # print item for item in tqdm(select_file_names): item_data = [] for i in range(catetogy): path = item.replace(save_names[0],save_names[i]) tmp_data = np.loadtxt(path) centroid = np.mean(tmp_data[:, 0:3], axis=0, keepdims=True) tmp_data[:, 0:3] = tmp_data[:, 0:3] - centroid furthest_distance = np.amax(np.sqrt(np.sum(abs(tmp_data[:, 0:3]) ** 2, axis=-1))) tmp_data[:, 0:3] = tmp_data[:, 0:3] / furthest_distance item_data.append(tmp_data) if len(item_data)==catetogy: names.append(item) for i in range(catetogy): data[i].append(item_data[i]) for i in range(catetogy-1): assert len(data[i])==len(data[i+1]) assert len(data[i])==len(names) print len(names) h5_fout = h5py.File(h5_filename) for i in range(catetogy): 
h5_fout.create_dataset( save_names[i], data=data[i], compression='gzip', compression_opts=4, dtype=np.float32) string_dt = h5py.special_dtype(vlen=str) h5_fout.create_dataset( 'name', data=names, compression='gzip', compression_opts=1, dtype=string_dt) h5_fout.close() def save_h5(h5_filename,save_names = ['patch_poisson_4096','patch_poisson_1024_nonuniform']): file_names = os.listdir(os.path.join('',save_names[0])) file_names.sort() #select which data to save test_name = ['nicolo','vaselion','bunny','gril'] select_file_names = [] for name in file_names: mark = False for tt in test_name: if tt in name: mark = True break if mark==False: select_file_names.append(name) ##read data names = [] catetogy = len(save_names) data = [[] for item in range(catetogy)] for item in tqdm(select_file_names): item_data = [] for i in range(catetogy): path = os.path.join(pointcloud_path,save_names[i],item) tmp_data = np.loadtxt(path) if tmp_data.shape[0]==int(save_names[i].split('_')[0]): item_data.append(tmp_data) if len(item_data)==catetogy: names.append(item) for i in range(catetogy): data[i].append(item_data[i]) for i in range(catetogy-1): assert len(data[i])==len(data[i+1]) assert len(data[i])==len(names) h5_fout = h5py.File(h5_filename) for i in range(catetogy): h5_fout.create_dataset( save_names[i], data=data[i], compression='gzip', compression_opts=4, dtype=np.float32) string_dt = h5py.special_dtype(vlen=str) h5_fout.create_dataset( 'name', data=names, compression='gzip', compression_opts=1, dtype=string_dt) h5_fout.close() def load_h5(h5_filename): f = h5py.File(h5_filename) data_1024 = f['data_1024'][:] data_4096 = f['data_4096'][:] name = f['name'][:] return (data_1024,data_4096) if __name__ == '__main__': #nonuniform_subsample_from_pointcloud('/home/lqyu/workspace/PointSR/data/ModelNet10_pc',sampling_num=2048) pointcloud_path = '/home/lqyu/workspace/PointSR/data/Patches' h5_filename = '/home/lqyu/workspace/PointSR/data/Patches_tt.h5' 
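Several helpers in this file repeat the same normalization step: subtract the centroid, then divide by the distance to the furthest point so the cloud fits inside the unit sphere. A standalone sketch of that step (the function name is ours, not part of this file):

```python
import numpy as np

def normalize_to_unit_sphere(pc):
    # pc: (N, 3) array of xyz coordinates
    centroid = np.mean(pc, axis=0, keepdims=True)
    pc = pc - centroid
    furthest_distance = np.amax(np.sqrt(np.sum(pc ** 2, axis=-1)))
    return pc / furthest_distance

pts = normalize_to_unit_sphere(np.random.rand(2048, 3) * 5.0)
```

After this transform the cloud is zero-mean and its furthest point lies exactly on the unit sphere, which is what the `centroid` / `furthest_distance` blocks above compute inline.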
    #nonuniformsample_from_pointcloud_fn('aa')
    #nonuniformsample_from_pointcloud_fn('ss')
    #nonuniformsample_from_pointcloud('train')
    #calculateNormal('train')
    #load_h5('/home/lqyu/workspace/PointSR/data/ModelNet10_tt.h5')
    save_h52()
    #normal_off_file()


================================================
FILE: code/utils/off2obj.py
================================================
#! /usr/bin/python
# Written by John Bowers
# http://johnsresearch.wordpress.com
# 2009
# You are welcome to use this however you want, this is public domain.

import sys

if len(sys.argv) == 3:
    off_path = sys.argv[1]
    obj_path = sys.argv[2]
else:
    print "USAGE: off2obj.py [path to mesh] [output path]"
    sys.exit(0)

# Class Mesh represents a mesh by a vertex list and a face list
# and has a method loadFromOffFile to load the Mesh data from an
# OFF file.
class Mesh:
    """Class Represents a Mesh by (V, F)"""
    def __init__(self):
        self.verts = []
        self.faces = []
        self.nVerts = 0
        self.nFaces = 0
        self.edges = None

    def writeToObjFile(self, pathToObjFile):
        objFile = open(pathToObjFile, 'w')
        objFile.write("# off2obj OBJ File\n")
        objFile.write("# http://johnsresearch.wordpress.com\n")
        for vert in self.verts:
            objFile.write("v ")
            objFile.write(str(vert[0]))
            objFile.write(" ")
            objFile.write(str(vert[1]))
            objFile.write(" ")
            objFile.write(str(vert[2]))
            objFile.write("\n")
        objFile.write("s off\n")
        for face in self.faces:
            objFile.write("f ")
            objFile.write(str(face[0]+1))
            objFile.write(" ")
            objFile.write(str(face[1]+1))
            objFile.write(" ")
            objFile.write(str(face[2]+1))
            objFile.write("\n")
        objFile.close()

    def loadFromOffFile(self, pathToOffFile):
        #Reset this mesh:
        self.verts = []
        self.faces = []
        self.nVerts = 0
        self.nFaces = 0

        #Open the file for reading:
        offFile = open(pathToOffFile, 'r')
        lines = offFile.readlines()

        #Read the number of verts and faces
        params = lines[1].split()
        self.nVerts = int(params[0])
        self.nFaces = int(params[1])

        #split the remaining lines into vert and face arrays
        vertLines = lines[2:2+self.nVerts]
        faceLines = lines[2+self.nVerts:2+self.nVerts+self.nFaces]

        #Create the verts array
        for vertLine in vertLines:
            XYZ = vertLine.split()
            self.verts.append([float(XYZ[0]), float(XYZ[1]), float(XYZ[2])])

        #Create the faces array
        for faceLine in faceLines:
            XYZ = faceLine.split()
            self.faces.append((int(XYZ[1]), int(XYZ[2]), int(XYZ[3])))
            if not(int(XYZ[0]) == 3):
                print "ERROR: This OFF loader can only handle meshes with 3 vertex faces."
                print "A face with", XYZ[0], "vertices is included in the file. Exiting."
                offFile.close()
                sys.exit(0)

        #Cleanup
        offFile.close()

    def edgeList(self):
        if not(self.edges == None):
            return self.edges
        self.edges = []
        for i in range(0, self.nVerts):
            self.edges.append([])
        for face in self.faces:
            i = face[0]
            j = face[1]
            k = face[2]
            if not(j in self.edges[i]):
                self.edges[i].append(j)
            if not(k in self.edges[i]):
                self.edges[i].append(k)
            if not(i in self.edges[j]):
                self.edges[j].append(i)
            if not(k in self.edges[j]):
                self.edges[j].append(k)
            if not(i in self.edges[k]):
                self.edges[k].append(i)
            if not(j in self.edges[k]):
                self.edges[k].append(j)
        return self.edges

"""
Main Program
"""
mesh = Mesh()
mesh.loadFromOffFile(off_path)
mesh.writeToObjFile(obj_path)


================================================
FILE: code/utils/pc_util.py
================================================
""" Utility functions for processing point clouds.

Author: Charles R. Qi, Hao Su
Date: November 2016
"""

import os
import sys
from matplotlib import pyplot as plt
from matplotlib import colors
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(BASE_DIR)

# Draw point cloud
from eulerangles import euler2mat

# Point cloud IO
import numpy as np
from plyfile import PlyData, PlyElement


# ----------------------------------------
# Point Cloud/Volume Conversions
# ----------------------------------------

def point_cloud_to_volume_batch(point_clouds, vsize=12, radius=1.0, flatten=True):
    """ Input is BxNx3 batch of point cloud
        Output is Bx(vsize^3)
    """
    vol_list = []
    for b in range(point_clouds.shape[0]):
        vol = point_cloud_to_volume(np.squeeze(point_clouds[b,:,:]), vsize, radius)
        if flatten:
            vol_list.append(vol.flatten())
        else:
            vol_list.append(np.expand_dims(np.expand_dims(vol, -1), 0))
    if flatten:
        return np.vstack(vol_list)
    else:
        return np.concatenate(vol_list, 0)

def point_cloud_to_volume(points, vsize, radius=1.0):
    """ input is Nx3 points.
        output is vsize*vsize*vsize
        assumes points are in range [-radius, radius]
    """
    vol = np.zeros((vsize,vsize,vsize))
    voxel = 2*radius/float(vsize)
    locations = (points + radius)/voxel
    locations = locations.astype(int)
    vol[locations[:,0],locations[:,1],locations[:,2]] = 1.0
    return vol

#a = np.zeros((16,1024,3))
#print point_cloud_to_volume_batch(a, 12, 1.0, False).shape

def volume_to_point_cloud(vol):
    """ vol is occupancy grid (value = 0 or 1) of size vsize*vsize*vsize
        return Nx3 numpy array.
""" vsize = vol.shape[0] assert(vol.shape[1] == vsize and vol.shape[1] == vsize) points = [] for a in range(vsize): for b in range(vsize): for c in range(vsize): if vol[a,b,c] == 1: points.append(np.array([a,b,c])) if len(points) == 0: return np.zeros((0,3)) points = np.vstack(points) return points # ---------------------------------------- # Point cloud IO # ---------------------------------------- def read_ply(filename): """ read XYZ point cloud from filename PLY file """ plydata = PlyData.read(filename) pc = plydata['vertex'].data pc_array = np.array([[x, y, z] for x,y,z in pc]) return pc_array def write_ply(points, filename, text=True): """ input: Nx3, write points to filename as PLY format. """ points = [(points[i,0], points[i,1], points[i,2]) for i in range(points.shape[0])] vertex = np.array(points, dtype=[('x', 'f4'), ('y', 'f4'),('z', 'f4')]) el = PlyElement.describe(vertex, 'vertex', comments=['vertices']) PlyData([el], text=text).write(filename) # ---------------------------------------- # Simple Point cloud and Volume Renderers # ---------------------------------------- def draw_point_cloud(input_points, canvasSize=500, space=240, diameter=10, xrot=0, yrot=0, zrot=0, switch_xyz=[0,1,2], normalize=True): """ Render point cloud to image with alpha channel. 
Input: points: Nx3 numpy array (+y is up direction) Output: gray image as numpy array of size canvasSizexcanvasSize """ canvasSizeX = canvasSize canvasSizeY = canvasSize image = np.zeros((canvasSizeX, canvasSizeY)) if input_points is None or input_points.shape[0] == 0: return image points = input_points[:, switch_xyz] M = euler2mat(zrot, yrot, xrot) points = (np.dot(M, points.transpose())).transpose() # Normalize the point cloud # We normalize scale to fit points in a unit sphere if normalize: centroid = np.mean(points, axis=0) points -= centroid furthest_distance = np.max(np.sqrt(np.sum(abs(points)**2,axis=-1))) points /= furthest_distance # Pre-compute the Gaussian disk radius = (diameter-1)/2.0 disk = np.zeros((diameter, diameter)) for i in range(diameter): for j in range(diameter): if (i - radius) * (i-radius) + (j-radius) * (j-radius) <= radius * radius: disk[i, j] = np.exp((-(i-radius)**2 - (j-radius)**2)/(radius**2)) mask = np.argwhere(disk > 0) dx = mask[:, 0] dy = mask[:, 1] dv = disk[disk > 0] # Order points by z-buffer zorder = np.argsort(points[:, 2]) points = points[zorder, :] points[:, 2] = (points[:, 2] - np.min(points[:, 2])) / (np.max(points[:, 2] - np.min(points[:, 2]))) max_depth = np.max(points[:, 2]) for i in range(points.shape[0]): j = points.shape[0] - i - 1 x = points[j, 0] y = points[j, 1] xc = canvasSizeX/2 + (x*space) yc = canvasSizeY/2 + (y*space) xc = int(np.round(xc)) yc = int(np.round(yc)) px = dx + xc py = dy + yc #image[px, py] = image[px, py] * 0.7 + dv * (max_depth - points[j, 2]) * 0.3 image[px, py] = image[px, py] * 0.7 + dv * 0.3 val = np.max(image) val = np.percentile(image,99.9) image = image / val mask = image==0 image[image>1.0]=1.0 image = 1.0-image #image = np.expand_dims(image, axis=-1) #image = np.concatenate((image*0.3+0.7,np.ones_like(image), np.ones_like(image)), axis=2) #image = colors.hsv_to_rgb(image) image[mask]=1.0 return image def point_cloud_three_views(points,diameter=5): """ input points Nx3 numpy array (+y 
is up direction). return an numpy array gray image of size 500x1500. """ # +y is up direction # xrot is azimuth # yrot is in-plane # zrot is elevation # img1 = draw_point_cloud(points, xrot=90/180.0*np.pi, yrot=0/180.0*np.pi, zrot=0/180.0*np.pi,diameter=diameter) # img2 = draw_point_cloud(points, xrot=180/180.0*np.pi, yrot=0/180.0*np.pi, zrot=0/180.0*np.pi,diameter=diameter) # img3 = draw_point_cloud(points, xrot=0/180.0*np.pi, yrot=-90/180.0*np.pi, zrot=0/180.0*np.pi,diameter=diameter) # image_large = np.concatenate([img1, img2, img3], 1) img1 = draw_point_cloud(points, zrot=110 / 180.0 * np.pi, xrot=135 / 180.0 * np.pi, yrot=0 / 180.0 * np.pi,diameter=diameter) img2 = draw_point_cloud(points, zrot=70 / 180.0 * np.pi, xrot=135 / 180.0 * np.pi, yrot=0 / 180.0 * np.pi,diameter=diameter) img3 = draw_point_cloud(points, zrot=180.0 / 180.0 * np.pi, xrot=90 / 180.0 * np.pi, yrot=0 / 180.0 * np.pi,diameter=diameter) image_large = np.concatenate([img1, img2, img3], 1) return image_large from PIL import Image def point_cloud_three_views_demo(): """ Demo for draw_point_cloud function """ points = read_ply('../third_party/mesh_sampling/piano.ply') im_array = point_cloud_three_views(points) img = Image.fromarray(np.uint8(im_array*255.0)) img.save('piano.jpg') if __name__=="__main__": point_cloud_three_views_demo() import matplotlib.pyplot as plt def pyplot_draw_point_cloud(points, output_filename): """ points is a Nx3 numpy array """ fig = plt.figure() ax = fig.add_subplot(111, projection='3d') ax.scatter(points[:,0], points[:,1], points[:,2]) ax.set_xlabel('x') ax.set_ylabel('y') ax.set_zlabel('z') #savefig(output_filename) def pyplot_draw_volume(vol, output_filename): """ vol is of size vsize*vsize*vsize output an image to output_filename """ points = volume_to_point_cloud(vol) pyplot_draw_point_cloud(points, output_filename) ================================================ FILE: code/utils/plyfile.py ================================================ # Copyright 2014 Darsh 
Ranjan # # This file is part of python-plyfile. # # python-plyfile is free software: you can redistribute it and/or # modify it under the terms of the GNU General Public License as # published by the Free Software Foundation, either version 3 of the # License, or (at your option) any later version. # # python-plyfile is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # General Public License for more details. # # You should have received a copy of the GNU General Public License # along with python-plyfile. If not, see # . from itertools import islice as _islice import numpy as _np from sys import byteorder as _byteorder try: _range = xrange except NameError: _range = range # Many-many relation _data_type_relation = [ ('int8', 'i1'), ('char', 'i1'), ('uint8', 'u1'), ('uchar', 'b1'), ('uchar', 'u1'), ('int16', 'i2'), ('short', 'i2'), ('uint16', 'u2'), ('ushort', 'u2'), ('int32', 'i4'), ('int', 'i4'), ('uint32', 'u4'), ('uint', 'u4'), ('float32', 'f4'), ('float', 'f4'), ('float64', 'f8'), ('double', 'f8') ] _data_types = dict(_data_type_relation) _data_type_reverse = dict((b, a) for (a, b) in _data_type_relation) _types_list = [] _types_set = set() for (_a, _b) in _data_type_relation: if _a not in _types_set: _types_list.append(_a) _types_set.add(_a) if _b not in _types_set: _types_list.append(_b) _types_set.add(_b) _byte_order_map = { 'ascii': '=', 'binary_little_endian': '<', 'binary_big_endian': '>' } _byte_order_reverse = { '<': 'binary_little_endian', '>': 'binary_big_endian' } _native_byte_order = {'little': '<', 'big': '>'}[_byteorder] def _lookup_type(type_str): if type_str not in _data_type_reverse: try: type_str = _data_types[type_str] except KeyError: raise ValueError("field type %r not in %r" % (type_str, _types_list)) return _data_type_reverse[type_str] def _split_line(line, n): fields = line.split(None, n) if len(fields) == 
n:
        fields.append('')

    assert len(fields) == n + 1
    return fields


def make2d(array, cols=None, dtype=None):
    '''
    Make a 2D array from an array of arrays.  The `cols' and `dtype'
    arguments can be omitted if the array is not empty.

    '''
    if (cols is None or dtype is None) and not len(array):
        raise RuntimeError("cols and dtype must be specified for empty "
                           "array")

    if cols is None:
        cols = len(array[0])

    if dtype is None:
        dtype = array[0].dtype

    return _np.fromiter(array, [('_', dtype, (cols,))],
                        count=len(array))['_']


class PlyParseError(Exception):
    '''
    Raised when a PLY file cannot be parsed.  The attributes `element',
    `row', `property', and `message' give additional information.

    '''
    def __init__(self, message, element=None, row=None, prop=None):
        self.message = message
        self.element = element
        self.row = row
        self.prop = prop

        s = ''
        if self.element:
            s += 'element %r: ' % self.element.name
        if self.row is not None:
            s += 'row %d: ' % self.row
        if self.prop:
            s += 'property %r: ' % self.prop.name
        s += self.message

        Exception.__init__(self, s)

    def __repr__(self):
        return ('PlyParseError(%r, element=%r, row=%r, prop=%r)' %
                (self.message, self.element, self.row, self.prop))


class PlyData(object):
    '''
    PLY file header and data.

    A PlyData instance is created in one of two ways: by the static
    method PlyData.read (to read a PLY file), or directly from __init__
    given a sequence of elements (which can then be written to a PLY
    file).

    '''
    def __init__(self, elements=[], text=False, byte_order='=',
                 comments=[], obj_info=[]):
        '''
        elements: sequence of PlyElement instances.

        text: whether the resulting PLY file will be text (True) or
            binary (False).

        byte_order: '<' for little-endian, '>' for big-endian, or '='
            for native.  This is only relevant if `text' is False.

        comments: sequence of strings that will be placed in the header
            between the 'ply' and 'format ...' lines.

        obj_info: like comments, but will be placed in the header with
            "obj_info ..." instead of "comment ...".
''' if byte_order == '=' and not text: byte_order = _native_byte_order self.byte_order = byte_order self.text = text self.comments = list(comments) self.obj_info = list(obj_info) self.elements = elements def _get_elements(self): return self._elements def _set_elements(self, elements): self._elements = tuple(elements) self._index() elements = property(_get_elements, _set_elements) def _get_byte_order(self): return self._byte_order def _set_byte_order(self, byte_order): if byte_order not in ['<', '>', '=']: raise ValueError("byte order must be '<', '>', or '='") self._byte_order = byte_order byte_order = property(_get_byte_order, _set_byte_order) def _index(self): self._element_lookup = dict((elt.name, elt) for elt in self._elements) if len(self._element_lookup) != len(self._elements): raise ValueError("two elements with same name") @staticmethod def _parse_header(stream): ''' Parse a PLY header from a readable file-like stream. ''' lines = [] comments = {'comment': [], 'obj_info': []} while True: line = stream.readline().decode('ascii').strip() fields = _split_line(line, 1) if fields[0] == 'end_header': break elif fields[0] in comments.keys(): lines.append(fields) else: lines.append(line.split()) a = 0 if lines[a] != ['ply']: raise PlyParseError("expected 'ply'") a += 1 while lines[a][0] in comments.keys(): comments[lines[a][0]].append(lines[a][1]) a += 1 if lines[a][0] != 'format': raise PlyParseError("expected 'format'") if lines[a][2] != '1.0': raise PlyParseError("expected version '1.0'") if len(lines[a]) != 3: raise PlyParseError("too many fields after 'format'") fmt = lines[a][1] if fmt not in _byte_order_map: raise PlyParseError("don't understand format %r" % fmt) byte_order = _byte_order_map[fmt] text = fmt == 'ascii' a += 1 while a < len(lines) and lines[a][0] in comments.keys(): comments[lines[a][0]].append(lines[a][1]) a += 1 return PlyData(PlyElement._parse_multi(lines[a:]), text, byte_order, comments['comment'], comments['obj_info']) @staticmethod def 
read(stream): ''' Read PLY data from a readable file-like object or filename. ''' (must_close, stream) = _open_stream(stream, 'read') try: data = PlyData._parse_header(stream) for elt in data: elt._read(stream, data.text, data.byte_order) finally: if must_close: stream.close() return data def write(self, stream): ''' Write PLY data to a writeable file-like object or filename. ''' (must_close, stream) = _open_stream(stream, 'write') try: stream.write(self.header.encode('ascii')) stream.write(b'\r\n') for elt in self: elt._write(stream, self.text, self.byte_order) finally: if must_close: stream.close() @property def header(self): ''' Provide PLY-formatted metadata for the instance. ''' lines = ['ply'] if self.text: lines.append('format ascii 1.0') else: lines.append('format ' + _byte_order_reverse[self.byte_order] + ' 1.0') # Some information is lost here, since all comments are placed # between the 'format' line and the first element. for c in self.comments: lines.append('comment ' + c) for c in self.obj_info: lines.append('obj_info ' + c) lines.extend(elt.header for elt in self.elements) lines.append('end_header') return '\r\n'.join(lines) def __iter__(self): return iter(self.elements) def __len__(self): return len(self.elements) def __contains__(self, name): return name in self._element_lookup def __getitem__(self, name): return self._element_lookup[name] def __str__(self): return self.header def __repr__(self): return ('PlyData(%r, text=%r, byte_order=%r, ' 'comments=%r, obj_info=%r)' % (self.elements, self.text, self.byte_order, self.comments, self.obj_info)) def _open_stream(stream, read_or_write): if hasattr(stream, read_or_write): return (False, stream) try: return (True, open(stream, read_or_write[0] + 'b')) except TypeError: raise RuntimeError("expected open file or filename") class PlyElement(object): ''' PLY file element. 
A client of this library doesn't normally need to instantiate this directly, so the following is only for the sake of documenting the internals. Creating a PlyElement instance is generally done in one of two ways: as a byproduct of PlyData.read (when reading a PLY file) and by PlyElement.describe (before writing a PLY file). ''' def __init__(self, name, properties, count, comments=[]): ''' This is not part of the public interface. The preferred methods of obtaining PlyElement instances are PlyData.read (to read from a file) and PlyElement.describe (to construct from a numpy array). ''' self._name = str(name) self._check_name() self._count = count self._properties = tuple(properties) self._index() self.comments = list(comments) self._have_list = any(isinstance(p, PlyListProperty) for p in self.properties) @property def count(self): return self._count def _get_data(self): return self._data def _set_data(self, data): self._data = data self._count = len(data) self._check_sanity() data = property(_get_data, _set_data) def _check_sanity(self): for prop in self.properties: if prop.name not in self._data.dtype.fields: raise ValueError("dangling property %r" % prop.name) def _get_properties(self): return self._properties def _set_properties(self, properties): self._properties = tuple(properties) self._check_sanity() self._index() properties = property(_get_properties, _set_properties) def _index(self): self._property_lookup = dict((prop.name, prop) for prop in self._properties) if len(self._property_lookup) != len(self._properties): raise ValueError("two properties with same name") def ply_property(self, name): return self._property_lookup[name] @property def name(self): return self._name def _check_name(self): if any(c.isspace() for c in self._name): msg = "element name %r contains spaces" % self._name raise ValueError(msg) def dtype(self, byte_order='='): ''' Return the numpy dtype of the in-memory representation of the data. 
(If there are no list properties, and the PLY format is binary, then this also accurately describes the on-disk representation of the element.) ''' return [(prop.name, prop.dtype(byte_order)) for prop in self.properties] @staticmethod def _parse_multi(header_lines): ''' Parse a list of PLY element definitions. ''' elements = [] while header_lines: (elt, header_lines) = PlyElement._parse_one(header_lines) elements.append(elt) return elements @staticmethod def _parse_one(lines): ''' Consume one element definition. The unconsumed input is returned along with a PlyElement instance. ''' a = 0 line = lines[a] if line[0] != 'element': raise PlyParseError("expected 'element'") if len(line) > 3: raise PlyParseError("too many fields after 'element'") if len(line) < 3: raise PlyParseError("too few fields after 'element'") (name, count) = (line[1], int(line[2])) comments = [] properties = [] while True: a += 1 if a >= len(lines): break if lines[a][0] == 'comment': comments.append(lines[a][1]) elif lines[a][0] == 'property': properties.append(PlyProperty._parse_one(lines[a])) else: break return (PlyElement(name, properties, count, comments), lines[a:]) @staticmethod def describe(data, name, len_types={}, val_types={}, comments=[]): ''' Construct a PlyElement from an array's metadata. len_types and val_types can be given as mappings from list property names to type strings (like 'u1', 'f4', etc., or 'int8', 'float32', etc.). These can be used to define the length and value types of list properties. List property lengths always default to type 'u1' (8-bit unsigned integer), and value types default to 'i4' (32-bit integer). 
''' if not isinstance(data, _np.ndarray): raise TypeError("only numpy arrays are supported") if len(data.shape) != 1: raise ValueError("only one-dimensional arrays are " "supported") count = len(data) properties = [] descr = data.dtype.descr for t in descr: if not isinstance(t[1], str): raise ValueError("nested records not supported") if not t[0]: raise ValueError("field with empty name") if len(t) != 2 or t[1][1] == 'O': # non-scalar field, which corresponds to a list # property in PLY. if t[1][1] == 'O': if len(t) != 2: raise ValueError("non-scalar object fields not " "supported") len_str = _data_type_reverse[len_types.get(t[0], 'u1')] if t[1][1] == 'O': val_type = val_types.get(t[0], 'i4') val_str = _lookup_type(val_type) else: val_str = _lookup_type(t[1][1:]) prop = PlyListProperty(t[0], len_str, val_str) else: val_str = _lookup_type(t[1][1:]) prop = PlyProperty(t[0], val_str) properties.append(prop) elt = PlyElement(name, properties, count, comments) elt.data = data return elt def _read(self, stream, text, byte_order): ''' Read the actual data from a PLY file. ''' if text: self._read_txt(stream) else: if self._have_list: # There are list properties, so a simple load is # impossible. self._read_bin(stream, byte_order) else: # There are no list properties, so loading the data is # much more straightforward. self._data = _np.fromfile(stream, self.dtype(byte_order), self.count) if len(self._data) < self.count: k = len(self._data) del self._data raise PlyParseError("early end-of-file", self, k) self._check_sanity() def _write(self, stream, text, byte_order): ''' Write the data to a PLY file. ''' if text: self._write_txt(stream) else: if self._have_list: # There are list properties, so serialization is # slightly complicated. self._write_bin(stream, byte_order) else: # no list properties, so serialization is # straightforward. 
self.data.astype(self.dtype(byte_order), copy=False).tofile(stream) def _read_txt(self, stream): ''' Load a PLY element from an ASCII-format PLY file. The element may contain list properties. ''' self._data = _np.empty(self.count, dtype=self.dtype()) k = 0 for line in _islice(iter(stream.readline, b''), self.count): fields = iter(line.strip().split()) for prop in self.properties: try: self._data[prop.name][k] = prop._from_fields(fields) except StopIteration: raise PlyParseError("early end-of-line", self, k, prop) except ValueError: raise PlyParseError("malformed input", self, k, prop) try: next(fields) except StopIteration: pass else: raise PlyParseError("expected end-of-line", self, k) k += 1 if k < self.count: del self._data raise PlyParseError("early end-of-file", self, k) def _write_txt(self, stream): ''' Save a PLY element to an ASCII-format PLY file. The element may contain list properties. ''' for rec in self.data: fields = [] for prop in self.properties: fields.extend(prop._to_fields(rec[prop.name])) _np.savetxt(stream, [fields], '%.18g', newline='\r\n') def _read_bin(self, stream, byte_order): ''' Load a PLY element from a binary PLY file. The element may contain list properties. ''' self._data = _np.empty(self.count, dtype=self.dtype(byte_order)) for k in _range(self.count): for prop in self.properties: try: self._data[prop.name][k] = \ prop._read_bin(stream, byte_order) except StopIteration: raise PlyParseError("early end-of-file", self, k, prop) def _write_bin(self, stream, byte_order): ''' Save a PLY element to a binary PLY file. The element may contain list properties. ''' for rec in self.data: for prop in self.properties: prop._write_bin(rec[prop.name], stream, byte_order) @property def header(self): ''' Format this element's metadata as it would appear in a PLY header. ''' lines = ['element %s %d' % (self.name, self.count)] # Some information is lost here, since all comments are placed # between the 'element' line and the first property definition. 
for c in self.comments: lines.append('comment ' + c) lines.extend(list(map(str, self.properties))) return '\r\n'.join(lines) def __getitem__(self, key): return self.data[key] def __setitem__(self, key, value): self.data[key] = value def __str__(self): return self.header def __repr__(self): return ('PlyElement(%r, %r, count=%d, comments=%r)' % (self.name, self.properties, self.count, self.comments)) class PlyProperty(object): ''' PLY property description. This class is pure metadata; the data itself is contained in PlyElement instances. ''' def __init__(self, name, val_dtype): self._name = str(name) self._check_name() self.val_dtype = val_dtype def _get_val_dtype(self): return self._val_dtype def _set_val_dtype(self, val_dtype): self._val_dtype = _data_types[_lookup_type(val_dtype)] val_dtype = property(_get_val_dtype, _set_val_dtype) @property def name(self): return self._name def _check_name(self): if any(c.isspace() for c in self._name): msg = "Error: property name %r contains spaces" % self._name raise RuntimeError(msg) @staticmethod def _parse_one(line): assert line[0] == 'property' if line[1] == 'list': if len(line) > 5: raise PlyParseError("too many fields after " "'property list'") if len(line) < 5: raise PlyParseError("too few fields after " "'property list'") return PlyListProperty(line[4], line[2], line[3]) else: if len(line) > 3: raise PlyParseError("too many fields after " "'property'") if len(line) < 3: raise PlyParseError("too few fields after " "'property'") return PlyProperty(line[2], line[1]) def dtype(self, byte_order='='): ''' Return the numpy dtype description for this property (as a tuple of strings). ''' return byte_order + self.val_dtype def _from_fields(self, fields): ''' Parse from generator. Raise StopIteration if the property could not be read. ''' return _np.dtype(self.dtype()).type(next(fields)) def _to_fields(self, data): ''' Return generator over one item. 
''' yield _np.dtype(self.dtype()).type(data) def _read_bin(self, stream, byte_order): ''' Read data from a binary stream. Raise StopIteration if the property could not be read. ''' try: return _np.fromfile(stream, self.dtype(byte_order), 1)[0] except IndexError: raise StopIteration def _write_bin(self, data, stream, byte_order): ''' Write data to a binary stream. ''' _np.dtype(self.dtype(byte_order)).type(data).tofile(stream) def __str__(self): val_str = _data_type_reverse[self.val_dtype] return 'property %s %s' % (val_str, self.name) def __repr__(self): return 'PlyProperty(%r, %r)' % (self.name, _lookup_type(self.val_dtype)) class PlyListProperty(PlyProperty): ''' PLY list property description. ''' def __init__(self, name, len_dtype, val_dtype): PlyProperty.__init__(self, name, val_dtype) self.len_dtype = len_dtype def _get_len_dtype(self): return self._len_dtype def _set_len_dtype(self, len_dtype): self._len_dtype = _data_types[_lookup_type(len_dtype)] len_dtype = property(_get_len_dtype, _set_len_dtype) def dtype(self, byte_order='='): ''' List properties always have a numpy dtype of "object". ''' return '|O' def list_dtype(self, byte_order='='): ''' Return the pair (len_dtype, val_dtype) (both numpy-friendly strings). ''' return (byte_order + self.len_dtype, byte_order + self.val_dtype) def _from_fields(self, fields): (len_t, val_t) = self.list_dtype() n = int(_np.dtype(len_t).type(next(fields))) data = _np.loadtxt(list(_islice(fields, n)), val_t, ndmin=1) if len(data) < n: raise StopIteration return data def _to_fields(self, data): ''' Return generator over the (numerical) PLY representation of the list data (length followed by actual data). 
''' (len_t, val_t) = self.list_dtype() data = _np.asarray(data, dtype=val_t).ravel() yield _np.dtype(len_t).type(data.size) for x in data: yield x def _read_bin(self, stream, byte_order): (len_t, val_t) = self.list_dtype(byte_order) try: n = _np.fromfile(stream, len_t, 1)[0] except IndexError: raise StopIteration data = _np.fromfile(stream, val_t, n) if len(data) < n: raise StopIteration return data def _write_bin(self, data, stream, byte_order): ''' Write data to a binary stream. ''' (len_t, val_t) = self.list_dtype(byte_order) data = _np.asarray(data, dtype=val_t).ravel() _np.array(data.size, dtype=len_t).tofile(stream) data.tofile(stream) def __str__(self): len_str = _data_type_reverse[self.len_dtype] val_str = _data_type_reverse[self.val_dtype] return 'property list %s %s %s' % (len_str, val_str, self.name) def __repr__(self): return ('PlyListProperty(%r, %r, %r)' % (self.name, _lookup_type(self.len_dtype), _lookup_type(self.val_dtype))) ================================================ FILE: code/utils/pointnet_util.py ================================================ """ PointNet++ Layers Author: Charles R. 
Qi Date: November 2017 """ import os import sys from tf_ops.sampling.tf_sampling import farthest_point_sample, gather_point from tf_ops.grouping.tf_grouping import query_ball_point, group_point, knn_point from tf_ops.interpolation.tf_interpolate import three_nn, three_interpolate import tensorflow as tf import numpy as np import tf_util2 def sample_and_group(npoint, radius, nsample, xyz, points, tnet_spec=None, knn=False, use_xyz=True): ''' Input: npoint: int32 radius: float32 nsample: int32 xyz: (batch_size, ndataset, 3) TF tensor points: (batch_size, ndataset, channel) TF tensor, if None will just use xyz as points tnet_spec: dict (keys: mlp, mlp2, is_training, bn_decay), if None do not apply tnet knn: bool, if True use kNN instead of radius search use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features Output: new_xyz: (batch_size, npoint, 3) TF tensor new_points: (batch_size, npoint, nsample, 3+channel) TF tensor idx: (batch_size, npoint, nsample) TF tensor, indices of local points as in ndataset points grouped_xyz: (batch_size, npoint, nsample, 3) TF tensor, normalized point XYZs (subtracted by seed point XYZ) in local regions ''' new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz)) # (batch_size, npoint, 3) if knn: _,idx = knn_point(nsample, xyz, new_xyz) else: if np.isscalar(radius): idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz) else: idx_list = [] for radius_one, xyz_one, new_xyz_one in zip(tf.unstack(radius,axis=0), tf.unstack(xyz, axis=0),tf.unstack(new_xyz, axis=0)): idx_one, _ = query_ball_point(radius_one, nsample, tf.expand_dims(xyz_one, axis=0), tf.expand_dims(new_xyz_one, axis=0)) idx_list.append(idx_one) idx = tf.stack(idx_list, axis=0) idx = tf.squeeze(idx, axis=1) grouped_xyz = group_point(xyz, idx) # (batch_size, npoint, nsample, 3) grouped_xyz -= tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]) # translation normalization if tnet_spec is not None: grouped_xyz = 
tnet(grouped_xyz, tnet_spec) if points is not None: grouped_points = group_point(points, idx) # (batch_size, npoint, nsample, channel) if use_xyz: # new_points = tf.concat([grouped_xyz, tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1]),grouped_points], axis=-1) # (batch_size, npoint, nample, 3+channel) new_points = tf.concat([grouped_xyz, grouped_points],axis=-1) # (batch_size, npoint, nample, 3+channel) else: new_points = grouped_points else: # new_points = tf.concat([grouped_xyz, tf.tile(tf.expand_dims(new_xyz, 2), [1,1,nsample,1])], axis=-1) new_points = grouped_xyz return new_xyz, new_points, idx, grouped_xyz def sample_and_group_all(xyz, points, use_xyz=True): ''' Inputs: xyz: (batch_size, ndataset, 3) TF tensor points: (batch_size, ndataset, channel) TF tensor, if None will just use xyz as points use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features Outputs: new_xyz: (batch_size, 1, 3) as (0,0,0) new_points: (batch_size, 1, ndataset, 3+channel) TF tensor Note: Equivalent to sample_and_group with npoint=1, radius=inf, use (0,0,0) as the centroid ''' batch_size = xyz.get_shape()[0].value nsample = xyz.get_shape()[1].value new_xyz = tf.constant(np.tile(np.array([0,0,0]).reshape((1,1,3)), (batch_size,1,1)),dtype=tf.float32) # (batch_size, 1, 3) idx = tf.constant(np.tile(np.array(range(nsample)).reshape((1,1,nsample)), (batch_size,1,1))) grouped_xyz = tf.reshape(xyz, (batch_size, 1, nsample, 3)) # (batch_size, npoint=1, nsample, 3) if points is not None: if use_xyz: new_points = tf.concat([xyz, points], axis=2) # (batch_size, 16, 259) else: new_points = points new_points = tf.expand_dims(new_points, 1) # (batch_size, 1, 16, 259) else: new_points = grouped_xyz return new_xyz, new_points, idx, grouped_xyz def pointnet_sa_module(xyz, points, npoint, radius, nsample, mlp, mlp2, group_all, is_training, bn_decay, scope, bn=True, ibn=False, pooling='max', tnet_spec=None, knn=False, use_xyz=True): ''' PointNet Set Abstraction 
(SA) Module Input: xyz: (batch_size, ndataset, 3) TF tensor points: (batch_size, ndataset, channel) TF tensor npoint: int32 -- #points sampled in farthest point sampling radius: float32 -- search radius in local region batch_radius: the size of each object nsample: int32 -- how many points in each local region mlp: list of int32 -- output size for MLP on each point mlp2: list of int32 -- output size for MLP on each region group_all: bool -- group all points into one PC if set true, OVERRIDE npoint, radius and nsample settings use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features Return: new_xyz: (batch_size, npoint, 3) TF tensor new_points: (batch_size, npoint, mlp[-1] or mlp2[-1]) TF tensor idx: (batch_size, npoint, nsample) int32 -- indices for local regions ''' with tf.variable_scope(scope) as sc: if group_all: nsample = xyz.get_shape()[1].value new_xyz, new_points, idx, grouped_xyz = sample_and_group_all(xyz, points, use_xyz) else: new_xyz, new_points, idx, grouped_xyz = sample_and_group(npoint, radius, nsample, xyz, points, tnet_spec, knn, use_xyz) if mlp2 is None: mlp2 = [] for i, num_out_channel in enumerate(mlp): new_points = tf_util2.conv2d(new_points, num_out_channel, [1,1], padding='VALID', stride=[1,1], bn=bn, ibn=ibn, is_training=is_training, scope='conv%d'%(i), bn_decay=bn_decay) if pooling=='avg': new_points = tf.layers.average_pooling2d(new_points, [1,nsample], [1,1], padding='VALID', name='avgpool1') elif pooling=='weighted_avg': with tf.variable_scope('weighted_avg1'): dists = tf.norm(grouped_xyz,axis=-1,ord=2,keep_dims=True) exp_dists = tf.exp(-dists * 5) weights = exp_dists/tf.reduce_sum(exp_dists,axis=2,keep_dims=True) # (batch_size, npoint, nsample, 1) new_points *= weights # (batch_size, npoint, nsample, mlp[-1]) new_points = tf.reduce_sum(new_points, axis=2, keep_dims=True) elif pooling=='max': new_points = tf.reduce_max(new_points, axis=[2], keep_dims=True) elif pooling=='min': new_points = 
tf.layers.max_pooling2d(-1 * new_points, [1, nsample], [1, 1], padding='VALID', name='minpool1')
        elif pooling=='max_and_avg':
            # the max-pooled branch goes through max_pooling2d and the average-pooled
            # branch through average_pooling2d (the variable names were swapped)
            max_points = tf.layers.max_pooling2d(new_points, [1, nsample], [1, 1], padding='VALID', name='maxpool1')
            avg_points = tf.layers.average_pooling2d(new_points, [1, nsample], [1, 1], padding='VALID', name='avgpool1')
            new_points = tf.concat([max_points, avg_points], axis=-1)
        for i, num_out_channel in enumerate(mlp2):
            new_points = tf_util2.conv2d(new_points, num_out_channel, [1, 1],
                                         padding='VALID', stride=[1, 1], bn=bn, ibn=ibn,
                                         is_training=is_training, scope='conv_post_%d' % (i),
                                         bn_decay=bn_decay)
        new_points = tf.squeeze(new_points, [2])  # (batch_size, npoints, mlp2[-1])
        return new_xyz, new_points, idx

def pointnet_sa_module_msg(xyz, points, npoint, radius_list, nsample_list, mlp_list, is_training, bn_decay, scope, bn=True, ibn=False, use_xyz=True):
    ''' PointNet Set Abstraction (SA) module with Multi-Scale Grouping (MSG)
        Input:
            xyz: (batch_size, ndataset, 3) TF tensor
            points: (batch_size, ndataset, channel) TF tensor
            npoint: int32 -- #points sampled in farthest point sampling
            radius_list: list of float32 -- search radius in each local region
            nsample_list: list of int32 -- how many points in each local region
            mlp_list: list of list of int32 -- output size for MLP on each point
            use_xyz: bool, if True concat XYZ with local point features, otherwise just use point features
        Return:
            new_xyz: (batch_size, npoint, 3) TF tensor
            new_points: (batch_size, npoint, \sum_k{mlp_list[k][-1]}) TF tensor
    '''
    with tf.variable_scope(scope) as sc:
        new_xyz = gather_point(xyz, farthest_point_sample(npoint, xyz))
        new_points_list = []
        for i in range(len(radius_list)):
            radius = radius_list[i]
            nsample = nsample_list[i]
            idx, pts_cnt = query_ball_point(radius, nsample, xyz, new_xyz)
            grouped_xyz = group_point(xyz, idx)
            grouped_xyz -= tf.expand_dims(new_xyz, 2)
            if points is not None:
                grouped_points = group_point(points, idx)
                if use_xyz:
                    grouped_points = tf.concat([grouped_points,
grouped_xyz], axis=-1) else: grouped_points = grouped_xyz for j,num_out_channel in enumerate(mlp_list[i]): grouped_points = tf_util2.conv2d(grouped_points, num_out_channel, [1,1], padding='VALID', stride=[1,1], bn=bn, ibn=ibn,is_training=is_training, scope='conv%d_%d'%(i,j), bn_decay=bn_decay) new_points = tf.reduce_max(grouped_points, axis=[2]) new_points_list.append(new_points) new_points_concat = tf.concat(new_points_list, axis=-1) return new_xyz, new_points_concat def pointnet_fp_module(xyz1, xyz2, points1, points2, mlp, is_training, bn_decay, scope, bn=True,ibn=False): ''' PointNet Feature Propogation (FP) Module Input: xyz1: (batch_size, ndataset1, 3) TF tensor xyz2: (batch_size, ndataset2, 3) TF tensor, sparser than xyz1 points1: (batch_size, ndataset1, nchannel1) TF tensor points2: (batch_size, ndataset2, nchannel2) TF tensor mlp: list of int32 -- output size for MLP on each point Return: new_points: (batch_size, ndataset1, mlp[-1]) TF tensor ''' with tf.variable_scope(scope) as sc: dist, idx = three_nn(xyz1, xyz2) dist = tf.maximum(dist, 1e-10) norm = tf.reduce_sum((1.0/dist),axis=2,keep_dims=True) norm = tf.tile(norm,[1,1,3]) weight = (1.0/dist) / norm interpolated_points = three_interpolate(points2, idx, weight) if points1 is not None: new_points1 = tf.concat(axis=2, values=[interpolated_points, points1]) # B,ndataset1,nchannel1+nchannel2 else: new_points1 = interpolated_points new_points1 = tf.expand_dims(new_points1, 2) for i, num_out_channel in enumerate(mlp): new_points1 = tf_util2.conv2d(new_points1, num_out_channel, [1,1], padding='VALID', stride=[1,1], bn=bn, ibn=ibn,is_training=is_training, scope='conv_%d'%(i), bn_decay=bn_decay) new_points1 = tf.squeeze(new_points1, [2]) # B,ndataset1,mlp[-1] return new_points1 ================================================ FILE: code/utils/provider.py ================================================ import os import sys import numpy as np import h5py BASE_DIR = os.path.dirname(os.path.abspath(__file__)) 
sys.path.append(BASE_DIR) # Download dataset for point cloud classification DATA_DIR = os.path.join(BASE_DIR, 'data') if not os.path.exists(DATA_DIR): os.mkdir(DATA_DIR) if not os.path.exists(os.path.join(DATA_DIR, 'modelnet40_ply_hdf5_2048')): www = 'https://shapenet.cs.stanford.edu/media/modelnet40_ply_hdf5_2048.zip' zipfile = os.path.basename(www) os.system('wget %s; unzip %s' % (www, zipfile)) os.system('mv %s %s' % (zipfile[:-4], DATA_DIR)) os.system('rm %s' % (zipfile)) def shuffle_data(data, labels): """ Shuffle data and labels. Input: data: B,N,... numpy array label: B,... numpy array Return: shuffled data, label and shuffle indices """ idx = np.arange(len(labels)) np.random.shuffle(idx) return data[idx, ...], labels[idx], idx def rotate_point_cloud(batch_data): """ Randomly rotate the point clouds to augument the dataset rotation is per shape based along up direction Input: BxNx3 array, original batch of point clouds Return: BxNx3 array, rotated batch of point clouds """ rotated_data = np.zeros(batch_data.shape, dtype=np.float32) for k in xrange(batch_data.shape[0]): rotation_angle = np.random.uniform() * 2 * np.pi cosval = np.cos(rotation_angle) sinval = np.sin(rotation_angle) rotation_matrix = np.array([[cosval, 0, sinval], [0, 1, 0], [-sinval, 0, cosval]]) shape_pc = batch_data[k, ...] rotated_data[k, ...] = np.dot(shape_pc.reshape((-1, 3)), rotation_matrix) return rotated_data def rotate_point_cloud_by_angle(batch_data, rotation_angle): """ Rotate the point cloud along up direction with certain angle. Input: BxNx3 array, original batch of point clouds Return: BxNx3 array, rotated batch of point clouds """ rotated_data = np.zeros(batch_data.shape, dtype=np.float32) for k in xrange(batch_data.shape[0]): #rotation_angle = np.random.uniform() * 2 * np.pi cosval = np.cos(rotation_angle) sinval = np.sin(rotation_angle) rotation_matrix = np.array([[cosval, 0, sinval], [0, 1, 0], [-sinval, 0, cosval]]) shape_pc = batch_data[k, ...] rotated_data[k, ...] 
= np.dot(shape_pc.reshape((-1, 3)), rotation_matrix) return rotated_data def rotate_perturbation_point_cloud(batch_data, angle_sigma=0.06, angle_clip=0.18): """ Randomly perturb the point clouds by small rotations Input: BxNx3 array, original batch of point clouds Return: BxNx3 array, rotated batch of point clouds """ rotated_data = np.zeros(batch_data.shape, dtype=np.float32) for k in xrange(batch_data.shape[0]): angles = np.clip(angle_sigma*np.random.randn(3), -angle_clip, angle_clip) Rx = np.array([[1,0,0], [0,np.cos(angles[0]),-np.sin(angles[0])], [0,np.sin(angles[0]),np.cos(angles[0])]]) Ry = np.array([[np.cos(angles[1]),0,np.sin(angles[1])], [0,1,0], [-np.sin(angles[1]),0,np.cos(angles[1])]]) Rz = np.array([[np.cos(angles[2]),-np.sin(angles[2]),0], [np.sin(angles[2]),np.cos(angles[2]),0], [0,0,1]]) R = np.dot(Rz, np.dot(Ry,Rx)) shape_pc = batch_data[k, ...] rotated_data[k, ...] = np.dot(shape_pc.reshape((-1, 3)), R) return rotated_data def jitter_point_cloud(batch_data, sigma=0.01, clip=0.05): """ Randomly jitter points. jittering is per point. Input: BxNx3 array, original batch of point clouds Return: BxNx3 array, jittered batch of point clouds """ B, N, C = batch_data.shape assert(clip > 0) jittered_data = np.clip(sigma * np.random.randn(B, N, C), -1*clip, clip) jittered_data += batch_data return jittered_data def shift_point_cloud(batch_data, shift_range=0.1): """ Randomly shift point cloud. Shift is per point cloud. Input: BxNx3 array, original batch of point clouds Return: BxNx3 array, shifted batch of point clouds """ B, N, C = batch_data.shape shifts = np.random.uniform(-shift_range, shift_range, (B,3)) for batch_index in range(B): batch_data[batch_index,:,:] += shifts[batch_index,:] return batch_data def random_scale_point_cloud(batch_data, scale_low=0.8, scale_high=1.25): """ Randomly scale the point cloud. Scale is per point cloud. 
Input: BxNx3 array, original batch of point clouds Return: BxNx3 array, scaled batch of point clouds """ B, N, C = batch_data.shape scales = np.random.uniform(scale_low, scale_high, B) for batch_index in range(B): batch_data[batch_index,:,:] *= scales[batch_index] return batch_data def getDataFiles(list_filename): return [line.rstrip() for line in open(list_filename)] def load_h5(h5_filename): f = h5py.File(h5_filename) data = f['data'][:] label = f['label'][:] return (data, label) def loadDataFile(filename): return load_h5(filename) ================================================ FILE: code/utils/show3d.py ================================================ ''' The default behavior is to visualize the points as white dots >>>show3d.showpoints(np.random.rand(10000,3)) Control: key q: quit key Q: sys.exit(0) key n: zoom in key m: zoom out key s: save screenshot to 'show3d.png' Mouse: rotate You can also play a video by specifying waittime >>>[show3d.showpoints(np.random.rand(10000,3),waittime=10) for i in xrange(10000)] Color can also be useful >>>green=np.linspace(0,1,10000) >>>red=np.linspace(1,0,10000) >>>blue=np.linspace(1,0,10000)**2 >>>show3d.showpoints(np.random.rand(10000,3),green,red,blue) Additional Parameters --------------------- normalizecolor: if True (default), scale the maximum color to 1 for each channel. magnifyBlue: if True, magnify the blue dots to make them more visible background: the background color. 
Defaults to black (0,0,0)
freezerot: disable rotation
'''
import numpy as np
# import cv2
import sys

showsz = 800
mousex, mousey = 0.5, 0.5
zoom = 1.0
changed = True

def onmouse(*args):
    global mousex, mousey, changed
    y = args[1]
    x = args[2]
    mousex = x / float(showsz)
    mousey = y / float(showsz)
    changed = True

def showpoints(xyz, c0=None, c1=None, c2=None, waittime=0, showrot=False, magnifyBlue=0,
               freezerot=False, background=(0, 0, 0), normalizecolor=True):
    # cv2.namedWindow('show3d')
    # cv2.moveWindow('show3d', 0, 0)
    # cv2.setMouseCallback('show3d', onmouse)
    global showsz, mousex, mousey, zoom, changed
    if len(xyz.shape) != 2 or xyz.shape[1] != 3:
        raise Exception('showpoints expects (n,3) shape for xyz')
    if c0 is not None and c0.shape != xyz.shape[:1]:
        raise Exception('showpoints expects (n,) shape for c0')
    if c1 is not None and c1.shape != xyz.shape[:1]:
        raise Exception('showpoints expects (n,) shape for c1')
    if c2 is not None and c2.shape != xyz.shape[:1]:
        raise Exception('showpoints expects (n,) shape for c2')
    xyz = xyz - xyz.mean(axis=0)
    radius = ((xyz ** 2).sum(axis=-1) ** 0.5).max()
    xyz /= (radius * 2.2) / showsz
    if c0 is None:
        c0 = np.zeros((len(xyz),), dtype='float32') + 255
    if c1 is None:
        c1 = c0
    if c2 is None:
        c2 = c0
    if normalizecolor:
        c0 = c0 / ((c0.max() + 1e-14) / 255.0)
        c1 = c1 / ((c1.max() + 1e-14) / 255.0)
        c2 = c2 / ((c2.max() + 1e-14) / 255.0)
    show = np.zeros((showsz, showsz, 3), dtype='uint8')

    def render():
        rotmat = np.eye(3)
        if not freezerot:
            xangle = (mousey - 0.5) * np.pi * 1.2
        else:
            xangle = 0
        rotmat = rotmat.dot(np.array([
            [1.0, 0.0, 0.0],
            [0.0, np.cos(xangle), -np.sin(xangle)],
            [0.0, np.sin(xangle), np.cos(xangle)],
        ]))
        if not freezerot:
            yangle = (mousex - 0.5) * np.pi * 1.2
        else:
            yangle = 0
        rotmat = rotmat.dot(np.array([
            [np.cos(yangle), 0.0, -np.sin(yangle)],
            [0.0, 1.0, 0.0],
            [np.sin(yangle), 0.0, np.cos(yangle)],
        ]))
        rotmat *= zoom
        nxyz = xyz.dot(rotmat)
        nz = nxyz[:, 2].argsort()
        nxyz = nxyz[nz]
        nxyz = (nxyz[:, :2] + [showsz / 2, showsz / 2]).astype('int32')
        p = nxyz[:, 0] * showsz + nxyz[:, 1]
        show[:] = background
        # in-bounds mask; the comparison operators were lost during extraction and
        # are reconstructed here (the code drawing the masked points is missing)
        m = (nxyz[:, 0] >= 0) * (nxyz[:, 0] < showsz) * (nxyz[:, 1] >= 0) * (nxyz[:, 1] < showsz)
        if magnifyBlue > 0:
            show[:, :, 0] = np.maximum(show[:, :, 0], np.roll(show[:, :, 0], 1, axis=0))
            if magnifyBlue >= 2:
                show[:, :, 0] = np.maximum(show[:, :, 0], np.roll(show[:, :, 0], -1, axis=0))
            show[:, :, 0] = np.maximum(show[:, :, 0], np.roll(show[:, :, 0], 1, axis=1))
            if magnifyBlue >= 2:
                show[:, :, 0] = np.maximum(show[:, :, 0], np.roll(show[:, :, 0], -1, axis=1))
        if showrot:
            print("Show")
            # cv2.putText(show,'xangle %d'%(int(xangle/np.pi*180)),(30,showsz-30),0,0.5,cv2.cv.CV_RGB(255,0,0))
            # cv2.putText(show,'yangle %d'%(int(yangle/np.pi*180)),(30,showsz-50),0,0.5,cv2.cv.CV_RGB(255,0,0))
            # cv2.putText(show,'zoom %d%%'%(int(zoom*100)),(30,showsz-70),0,0.5,cv2.cv.CV_RGB(255,0,0))

    changed = True
    cmd = -1  # key code; stays -1 while the cv2 event loop below is commented out
    while True:
        if changed:
            render()
            changed = False
        # cv2.imshow('show3d', show)
        if waittime == 0:
            # cmd = cv2.waitKey(10) % 256
            print("Some waittime small")
        else:
            print("Some waittime")
            # cmd = cv2.waitKey(waittime) % 256
        if cmd == ord('q'):
            break
        elif cmd == ord('Q'):
            sys.exit(0)
        if cmd == ord('n'):
            zoom *= 1.1
            changed = True
        elif cmd == ord('m'):
            zoom /= 1.1
            changed = True
        elif cmd == ord('r'):
            zoom = 1.0
            changed = True
        elif cmd == ord('s'):
            print("to show img")
            # cv2.imwrite('show3d.png', show)
        if waittime != 0:
            break
    return cmd

if __name__ == '__main__':
    showpoints(np.random.rand(10000, 3))
    green = np.linspace(0, 1, 10000)
    red = np.linspace(1, 0, 10000) ** 0.5
    blue = np.linspace(1, 0, 10000)
    showpoints(np.random.rand(10000, 3), green, red, blue, magnifyBlue=True)

================================================
FILE: code/utils/tf_util.py
================================================
""" Wrapper functions for TensorFlow layers.
Author: Charles R. Qi
Date: November 2017
"""

import numpy as np
import tensorflow as tf

def _variable_on_cpu(name, shape, initializer, use_fp16=False):
    """Helper to create a Variable stored on CPU memory.
Args: name: name of the variable shape: list of ints initializer: initializer for Variable Returns: Variable Tensor """ dtype = tf.float16 if use_fp16 else tf.float32 var = tf.get_variable(name, shape, initializer=initializer, dtype=dtype) return var def _variable_with_weight_decay(name, shape, stddev, wd, use_xavier=True): """Helper to create an initialized Variable with weight decay. Note that the Variable is initialized with a truncated normal distribution. A weight decay is added only if one is specified. Args: name: name of the variable shape: list of ints stddev: standard deviation of a truncated Gaussian wd: add L2Loss weight decay multiplied by this float. If None, weight decay is not added for this Variable. use_xavier: bool, whether to use xavier initializer Returns: Variable Tensor """ if use_xavier: initializer = tf.contrib.layers.xavier_initializer() else: initializer = tf.truncated_normal_initializer(stddev=stddev) var = _variable_on_cpu(name, shape, initializer) if wd is not None: weight_decay = tf.multiply(tf.nn.l2_loss(var), wd, name='weight_loss') tf.add_to_collection('losses', weight_decay) return var def conv1d(inputs, num_output_channels, kernel_size, scope, stride=1, padding='SAME', use_xavier=True, stddev=1e-3, weight_decay=0.0, activation_fn=tf.nn.relu, bn=False, bn_decay=None, is_training=None, use_bias = True): """ 1D convolution with non-linear operation. 
Args: inputs: 3-D tensor variable BxLxC num_output_channels: int kernel_size: int scope: string stride: int padding: 'SAME' or 'VALID' use_xavier: bool, use xavier_initializer if true stddev: float, stddev for truncated_normal init weight_decay: float activation_fn: function bn: bool, whether to use batch norm bn_decay: float or float tensor variable in [0,1] is_training: bool Tensor variable Returns: Variable tensor """ with tf.variable_scope(scope) as sc: num_in_channels = inputs.get_shape()[-1].value kernel_shape = [kernel_size, num_in_channels, num_output_channels] kernel = _variable_with_weight_decay('weights', shape=kernel_shape, use_xavier=use_xavier, stddev=stddev, wd=weight_decay) outputs = tf.nn.conv1d(inputs, kernel, stride=stride, padding=padding) if use_bias: biases = _variable_on_cpu('biases', [num_output_channels], tf.constant_initializer(0.0)) outputs = tf.nn.bias_add(outputs, biases) if bn: outputs = batch_norm_for_conv1d(outputs, is_training, bn_decay=bn_decay, scope='bn') if activation_fn is not None: outputs = activation_fn(outputs) return outputs def conv2d(inputs, num_output_channels, kernel_size, scope, stride=[1, 1], padding='SAME', use_xavier=True, stddev=1e-3, weight_decay=0.00001, activation_fn=tf.nn.relu, bn=False, bn_decay=None, is_training=None): """ 2D convolution with non-linear operation. 
Args: inputs: 4-D tensor variable BxHxWxC num_output_channels: int kernel_size: a list of 2 ints scope: string stride: a list of 2 ints padding: 'SAME' or 'VALID' use_xavier: bool, use xavier_initializer if true stddev: float, stddev for truncated_normal init weight_decay: float activation_fn: function bn: bool, whether to use batch norm bn_decay: float or float tensor variable in [0,1] is_training: bool Tensor variable Returns: Variable tensor """ with tf.variable_scope(scope) as sc: kernel_h, kernel_w = kernel_size num_in_channels = inputs.get_shape()[-1].value kernel_shape = [kernel_h, kernel_w, num_in_channels, num_output_channels] kernel = _variable_with_weight_decay('weights', shape=kernel_shape, use_xavier=use_xavier, stddev=stddev, wd=weight_decay) stride_h, stride_w = stride outputs = tf.nn.conv2d(inputs, kernel, [1, stride_h, stride_w, 1], padding=padding) biases = _variable_on_cpu('biases', [num_output_channels], tf.constant_initializer(0.0)) outputs = tf.nn.bias_add(outputs, biases) if bn: outputs = batch_norm_for_conv2d(outputs, is_training, bn_decay=bn_decay, scope='bn') if activation_fn is not None: outputs = activation_fn(outputs) return outputs def conv2d_transpose(inputs, num_output_channels, kernel_size, scope, stride=[1, 1], padding='SAME', use_xavier=True, stddev=1e-3, weight_decay=0.0, activation_fn=tf.nn.relu, bn=False, bn_decay=None, is_training=None): """ 2D convolution transpose with non-linear operation. 
Args: inputs: 4-D tensor variable BxHxWxC num_output_channels: int kernel_size: a list of 2 ints scope: string stride: a list of 2 ints padding: 'SAME' or 'VALID' use_xavier: bool, use xavier_initializer if true stddev: float, stddev for truncated_normal init weight_decay: float activation_fn: function bn: bool, whether to use batch norm bn_decay: float or float tensor variable in [0,1] is_training: bool Tensor variable Returns: Variable tensor Note: conv2d(conv2d_transpose(a, num_out, ksize, stride), a.shape[-1], ksize, stride) == a """ with tf.variable_scope(scope) as sc: kernel_h, kernel_w = kernel_size num_in_channels = inputs.get_shape()[-1].value kernel_shape = [kernel_h, kernel_w, num_output_channels, num_in_channels] # reversed to conv2d kernel = _variable_with_weight_decay('weights', shape=kernel_shape, use_xavier=use_xavier, stddev=stddev, wd=weight_decay) stride_h, stride_w = stride # from slim.convolution2d_transpose def get_deconv_dim(dim_size, stride_size, kernel_size, padding): dim_size *= stride_size if padding == 'VALID' and dim_size is not None: dim_size += max(kernel_size - stride_size, 0) return dim_size # caculate output shape batch_size = inputs.get_shape()[0].value height = inputs.get_shape()[1].value width = inputs.get_shape()[2].value out_height = get_deconv_dim(height, stride_h, kernel_h, padding) out_width = get_deconv_dim(width, stride_w, kernel_w, padding) output_shape = [batch_size, out_height, out_width, num_output_channels] outputs = tf.nn.conv2d_transpose(inputs, kernel, output_shape, [1, stride_h, stride_w, 1], padding=padding) biases = _variable_on_cpu('biases', [num_output_channels], tf.constant_initializer(0.0)) outputs = tf.nn.bias_add(outputs, biases) if bn: outputs = batch_norm_for_conv2d(outputs, is_training, bn_decay=bn_decay, scope='bn') if activation_fn is not None: outputs = activation_fn(outputs) return outputs def conv3d(inputs, num_output_channels, kernel_size, scope, stride=[1, 1, 1], padding='SAME', use_xavier=True, 
stddev=1e-3, weight_decay=0.0, activation_fn=tf.nn.relu, bn=False, bn_decay=None, is_training=None): """ 3D convolution with non-linear operation. Args: inputs: 5-D tensor variable BxDxHxWxC num_output_channels: int kernel_size: a list of 3 ints scope: string stride: a list of 3 ints padding: 'SAME' or 'VALID' use_xavier: bool, use xavier_initializer if true stddev: float, stddev for truncated_normal init weight_decay: float activation_fn: function bn: bool, whether to use batch norm bn_decay: float or float tensor variable in [0,1] is_training: bool Tensor variable Returns: Variable tensor """ with tf.variable_scope(scope) as sc: kernel_d, kernel_h, kernel_w = kernel_size num_in_channels = inputs.get_shape()[-1].value kernel_shape = [kernel_d, kernel_h, kernel_w, num_in_channels, num_output_channels] kernel = _variable_with_weight_decay('weights', shape=kernel_shape, use_xavier=use_xavier, stddev=stddev, wd=weight_decay) stride_d, stride_h, stride_w = stride outputs = tf.nn.conv3d(inputs, kernel, [1, stride_d, stride_h, stride_w, 1], padding=padding) biases = _variable_on_cpu('biases', [num_output_channels], tf.constant_initializer(0.0)) outputs = tf.nn.bias_add(outputs, biases) if bn: outputs = batch_norm_for_conv3d(outputs, is_training, bn_decay=bn_decay, scope='bn') if activation_fn is not None: outputs = activation_fn(outputs) return outputs def fully_connected(inputs, num_outputs, scope, use_xavier=True, stddev=1e-3, weight_decay=0.0, activation_fn=tf.nn.relu, bn=False, bn_decay=None, is_training=None): """ Fully connected layer with non-linear operation. Args: inputs: 2-D tensor BxN num_outputs: int Returns: Variable tensor of size B x num_outputs. 
""" with tf.variable_scope(scope) as sc: num_input_units = inputs.get_shape()[-1].value weights = _variable_with_weight_decay('weights', shape=[num_input_units, num_outputs], use_xavier=use_xavier, stddev=stddev, wd=weight_decay) outputs = tf.matmul(inputs, weights) biases = _variable_on_cpu('biases', [num_outputs], tf.constant_initializer(0.0)) outputs = tf.nn.bias_add(outputs, biases) if bn: outputs = batch_norm_for_fc(outputs, is_training, bn_decay, 'bn') if activation_fn is not None: outputs = activation_fn(outputs) return outputs def max_pool2d(inputs, kernel_size, scope, stride=[2, 2], padding='VALID'): """ 2D max pooling. Args: inputs: 4-D tensor BxHxWxC kernel_size: a list of 2 ints stride: a list of 2 ints Returns: Variable tensor """ with tf.variable_scope(scope) as sc: kernel_h, kernel_w = kernel_size stride_h, stride_w = stride outputs = tf.nn.max_pool(inputs, ksize=[1, kernel_h, kernel_w, 1], strides=[1, stride_h, stride_w, 1], padding=padding, name=sc.name) return outputs def avg_pool2d(inputs, kernel_size, scope, stride=[2, 2], padding='VALID'): """ 2D avg pooling. Args: inputs: 4-D tensor BxHxWxC kernel_size: a list of 2 ints stride: a list of 2 ints Returns: Variable tensor """ with tf.variable_scope(scope) as sc: kernel_h, kernel_w = kernel_size stride_h, stride_w = stride outputs = tf.nn.avg_pool(inputs, ksize=[1, kernel_h, kernel_w, 1], strides=[1, stride_h, stride_w, 1], padding=padding, name=sc.name) return outputs def max_pool3d(inputs, kernel_size, scope, stride=[2, 2, 2], padding='VALID'): """ 3D max pooling. 
Args: inputs: 5-D tensor BxDxHxWxC kernel_size: a list of 3 ints stride: a list of 3 ints Returns: Variable tensor """ with tf.variable_scope(scope) as sc: kernel_d, kernel_h, kernel_w = kernel_size stride_d, stride_h, stride_w = stride outputs = tf.nn.max_pool3d(inputs, ksize=[1, kernel_d, kernel_h, kernel_w, 1], strides=[1, stride_d, stride_h, stride_w, 1], padding=padding, name=sc.name) return outputs def avg_pool3d(inputs, kernel_size, scope, stride=[2, 2, 2], padding='VALID'): """ 3D avg pooling. Args: inputs: 5-D tensor BxDxHxWxC kernel_size: a list of 3 ints stride: a list of 3 ints Returns: Variable tensor """ with tf.variable_scope(scope) as sc: kernel_d, kernel_h, kernel_w = kernel_size stride_d, stride_h, stride_w = stride outputs = tf.nn.avg_pool3d(inputs, ksize=[1, kernel_d, kernel_h, kernel_w, 1], strides=[1, stride_d, stride_h, stride_w, 1], padding=padding, name=sc.name) return outputs def batch_norm_template(inputs, is_training, scope, moments_dims, bn_decay): """ Batch normalization on convolutional maps and beyond... Ref.: http://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow Args: inputs: Tensor, k-D input ... 
x C could be BC or BHWC or BDHWC is_training: boolean tf.Varialbe, true indicates training phase scope: string, variable scope moments_dims: a list of ints, indicating dimensions for moments calculation bn_decay: float or float tensor variable, controling moving average weight Return: normed: batch-normalized maps """ # For support of GAN #bn_decay = bn_decay if bn_decay is not None else 0.9 #return tf.contrib.layers.batch_norm(inputs, # center=True, scale=True, # is_training=is_training, decay=bn_decay,updates_collections=None, # scope=scope) with tf.variable_scope(scope) as sc: num_channels = inputs.get_shape()[-1].value beta = tf.Variable(tf.constant(0.0, shape=[num_channels]), name='beta', trainable=True) gamma = tf.Variable(tf.constant(1.0, shape=[num_channels]), name='gamma', trainable=True) batch_mean, batch_var = tf.nn.moments(inputs, moments_dims, name='moments') decay = bn_decay if bn_decay is not None else 0.9 ema = tf.train.ExponentialMovingAverage(decay=decay) # Operator that maintains moving averages of variables. # Need to set reuse=False, otherwise if reuse, will see moments_1/mean/ExponentialMovingAverage/ does not exist # https://github.com/shekkizh/WassersteinGAN.tensorflow/issues/3 with tf.variable_scope(tf.get_variable_scope(), reuse=False): ema_apply_op = tf.cond(is_training, lambda: ema.apply([batch_mean, batch_var]), lambda: tf.no_op()) # Update moving average and return current batch's avg and var. def mean_var_with_update(): with tf.control_dependencies([ema_apply_op]): return tf.identity(batch_mean), tf.identity(batch_var) # ema.average returns the Variable holding the average of var. mean, var = tf.cond(is_training, mean_var_with_update, lambda: (ema.average(batch_mean), ema.average(batch_var))) normed = tf.nn.batch_normalization(inputs, mean, var, beta, gamma, 1e-3) return normed def batch_norm_for_fc(inputs, is_training, bn_decay, scope): """ Batch normalization on FC data. 
Args: inputs: Tensor, 2D BxC input is_training: boolean tf.Varialbe, true indicates training phase bn_decay: float or float tensor variable, controling moving average weight scope: string, variable scope Return: normed: batch-normalized maps """ return batch_norm_template(inputs, is_training, scope, [0,], bn_decay) def batch_norm_for_conv1d(inputs, is_training, bn_decay, scope): """ Batch normalization on 1D convolutional maps. Args: inputs: Tensor, 3D BLC input maps is_training: boolean tf.Varialbe, true indicates training phase bn_decay: float or float tensor variable, controling moving average weight scope: string, variable scope Return: normed: batch-normalized maps """ return batch_norm_template(inputs, is_training, scope, [0,1], bn_decay) def batch_norm_for_conv2d(inputs, is_training, bn_decay, scope): """ Batch normalization on 2D convolutional maps. Args: inputs: Tensor, 4D BHWC input maps is_training: boolean tf.Varialbe, true indicates training phase bn_decay: float or float tensor variable, controling moving average weight scope: string, variable scope Return: normed: batch-normalized maps """ return batch_norm_template(inputs, is_training, scope, [0,1,2], bn_decay) def batch_norm_for_conv3d(inputs, is_training, bn_decay, scope): """ Batch normalization on 3D convolutional maps. Args: inputs: Tensor, 5D BDHWC input maps is_training: boolean tf.Varialbe, true indicates training phase bn_decay: float or float tensor variable, controling moving average weight scope: string, variable scope Return: normed: batch-normalized maps """ return batch_norm_template(inputs, is_training, scope, [0,1,2,3], bn_decay) def dropout(inputs, is_training, scope, keep_prob=0.5, noise_shape=None): """ Dropout layer. 
    Args:
      inputs: tensor
      is_training: boolean tf.Variable
      scope: string
      keep_prob: float in [0,1]
      noise_shape: list of ints

    Returns:
      tensor variable
    """
    with tf.variable_scope(scope) as sc:
        outputs = tf.cond(is_training,
                          lambda: tf.nn.dropout(inputs, keep_prob, noise_shape),
                          lambda: inputs)
        return outputs

================================================
FILE: code/utils/tf_util2.py
================================================
import tensorflow as tf

def lrelu(x, alpha=0.2):
    return tf.nn.relu(x) - alpha * tf.nn.relu(-x)

# def lrelu2(x, leak=0.2, name="lrelu"):
#     with tf.variable_scope(name):
#         f1 = 0.5 * (1 + leak)
#         f2 = 0.5 * (1 - leak)
#         return f1 * x + f2 * abs(x)

def instance_norm(net, train=True, weight_decay=0.00001):
    batch, rows, cols, channels = [i.value for i in net.get_shape()]
    var_shape = [channels]
    mu, sigma_sq = tf.nn.moments(net, [1, 2], keep_dims=True)
    shift = tf.get_variable('shift', shape=var_shape, initializer=tf.zeros_initializer,
                            regularizer=tf.contrib.layers.l2_regularizer(weight_decay))
    scale = tf.get_variable('scale', shape=var_shape, initializer=tf.ones_initializer,
                            regularizer=tf.contrib.layers.l2_regularizer(weight_decay))
    epsilon = 1e-3
    # normalize by the standard deviation, i.e. sqrt(var + eps), not its square
    normalized = (net - mu) / tf.sqrt(sigma_sq + epsilon)
    return scale * normalized + shift

def conv2d(inputs,
           num_output_channels,
           kernel_size,
           scope=None,
           stride=[1, 1],
           padding='SAME',
           use_xavier=True,
           stddev=1e-3,
           weight_decay=0.00001,
           activation_fn=tf.nn.relu,
           bn=False,
           ibn=False,
           bn_decay=None,
           use_bias=True,
           is_training=None,
           reuse=None):
    """ 2D convolution with non-linear operation.
    Args:
        inputs: 4-D tensor variable BxHxWxC
        num_output_channels: int
        kernel_size: a list of 2 ints
        scope: string
        stride: a list of 2 ints
        padding: 'SAME' or 'VALID'
        use_xavier: bool, use xavier_initializer if true
        stddev: float, stddev for truncated_normal init
        weight_decay: float
        activation_fn: function
        bn: bool, whether to use batch norm
        bn_decay: float or float tensor variable in [0,1]
        is_training: bool Tensor variable
    Returns:
        Variable tensor
    """
    with tf.variable_scope(scope, reuse=reuse) as sc:
        if use_xavier:
            initializer = tf.contrib.layers.xavier_initializer()
        else:
            initializer = tf.truncated_normal_initializer(stddev=stddev)
        outputs = tf.layers.conv2d(inputs, num_output_channels, kernel_size, stride, padding,
                                   kernel_initializer=initializer,
                                   kernel_regularizer=tf.contrib.layers.l2_regularizer(weight_decay),
                                   bias_regularizer=tf.contrib.layers.l2_regularizer(weight_decay),
                                   use_bias=use_bias, reuse=None)
        assert not (bn and ibn)
        if bn:
            outputs = tf.layers.batch_normalization(outputs, momentum=bn_decay,
                                                    training=is_training,
                                                    renorm=False, fused=True)
            # outputs = tf.contrib.layers.batch_norm(outputs, is_training=is_training)
        if ibn:
            outputs = instance_norm(outputs, is_training)
        if activation_fn is not None:
            outputs = activation_fn(outputs)
        return outputs


def fully_connected(inputs, num_outputs, scope, use_xavier=True, stddev=1e-3,
                    weight_decay=0.00001, activation_fn=tf.nn.relu, bn=False,
                    bn_decay=None, use_bias=True, is_training=None):
    """ Fully connected layer with non-linear operation.

    Args:
        inputs: 2-D tensor BxN
        num_outputs: int
    Returns:
        Variable tensor of size B x num_outputs.
    """
    with tf.variable_scope(scope) as sc:
        if use_xavier:
            initializer = tf.contrib.layers.xavier_initializer()
        else:
            initializer = tf.truncated_normal_initializer(stddev=stddev)
        outputs = tf.layers.dense(inputs, num_outputs,
                                  use_bias=use_bias, kernel_initializer=initializer,
                                  kernel_regularizer=tf.contrib.layers.l2_regularizer(weight_decay),
                                  bias_regularizer=tf.contrib.layers.l2_regularizer(weight_decay),
                                  reuse=None)
        if bn:
            outputs = tf.layers.batch_normalization(outputs, momentum=bn_decay,
                                                    training=is_training, renorm=False)
        if activation_fn is not None:
            outputs = activation_fn(outputs)
        return outputs

================================================
FILE: code/utils/write_result2html.py
================================================
import os

import numpy as np
from tqdm import tqdm

from utils import pc_util
from scipy.misc import imsave


def write_result():
    root_path = "/home/lqyu/server/proj49/PointSR_data/test_data/our_collected_data"
    model_names = ['1024_nonormal_generator2_2',
                   '1024_nonormal_generator2_2_uniformloss',
                   '1024_nonormal_generator2_2_recursive']

    index_path = os.path.join("index.html")
    index = open(index_path, "w")
    index.write("<html><body><table border='1'>")
    index.write("<tr>")
    index.write("<th>name</th>")
    for model in model_names:
        index.write("<th>%s</th>" % model)
    index.write("</tr>")

    # get sample list
    items = os.listdir(root_path + "/" + model_names[0])
    items.sort()

    # mkdir model image path
    for model in model_names:
        if not os.path.exists(root_path + "/" + model + "_three_view_img/"):
            os.makedirs(root_path + "/" + model + "_three_view_img/")

    # write img to file
    for item in tqdm(items):
        index.write("<tr>")
        index.write("<td>%s</td>" % item)
        # write prediction
        for model in model_names:
            path = root_path + "/" + model + "/" + item
            if not os.path.exists(path):
                continue
            img_path = root_path + "/" + model + "_three_view_img/" + item
            img_path = img_path.replace("xyz", "png")
            if not os.path.exists(img_path):
                data = np.loadtxt(path)
                data = data[:, 0:3]
                img = pc_util.point_cloud_three_views(data, diameter=8)
                imsave(img_path, img)
            index.write("<td><img src='%s'></td>" % img_path)
        index.write("</tr>")
    index.write("</table></body></html>")
    index.close()


def write_result2html_benchmark():
    root_path = "/home/lqyu/server/proj49/PointSR_data/test_data/our_collected_data"
    phase = 'surface_benchmark'
    input_path = "../data/" + phase + "/1024_nonuniform"
    gt_path = "../data/" + phase + "/4096"
    model_names = ['1024_nonormal_generator2_2',
                   '1024_nonormal_generator2_2_uniformloss',
                   '1024_nonormal_generator2_2_recursive']

    index_path = os.path.join(root_path, phase + "_index.html")
    index = open(index_path, "w")
    index.write("<html><body><table border='1'>")
    index.write("<tr>")
    index.write("<th>name</th><th>Input</th><th>Referred GT</th>")
    for model in model_names:
        index.write("<th>%s</th>" % model)
    index.write("</tr>")

    # get sample list
    items = os.listdir(root_path + "/" + model_names[0] + "/result/" + phase)
    items.sort()

    # mkdir model image path
    for model in model_names:
        if not os.path.exists(root_path + "/" + model + "/result/" + phase + "_three_view_img/"):
            os.makedirs(root_path + "/" + model + "/result/" + phase + "_three_view_img/")

    # write img to file
    for item in tqdm(items):
        index.write("<tr>")
        index.write("<td>%s</td>" % item)

        # write input image
        object = item.split("_")[0]
        id = item.split(".")[0]
        path = input_path + "/%s.xyz" % (id)
        img_path = input_path + "_three_view_img/%s.png" % (id)
        if not os.path.exists(input_path + "_three_view_img/"):
            os.makedirs(input_path + "_three_view_img/")
        if not os.path.exists(img_path):
            data = np.loadtxt(path)
            data = data[:, 0:3]
            img = pc_util.point_cloud_three_views(data, diameter=8)
            imsave(img_path, img)
        index.write("<td><img src='%s'></td>" % img_path)

        # write gt image
        path = gt_path + "/%s.xyz" % (id)
        img_path = gt_path + "_three_view_img/%s.png" % (id)
        if not os.path.exists(gt_path + "_three_view_img/"):
            os.makedirs(gt_path + "_three_view_img/")
        if not os.path.exists(img_path):
            data = np.loadtxt(path)
            data = data[:, 0:3]
            img = pc_util.point_cloud_three_views(data, diameter=8)
            imsave(img_path, img)
        index.write("<td><img src='%s'></td>" % img_path)

        # write prediction
        for model in model_names:
            path = root_path + "/" + model + "/result/" + phase + "/" + item
            if not os.path.exists(path):
                continue
            img_path = root_path + "/" + model + "/result/" + phase + "_three_view_img/" + item
            img_path = img_path.replace("xyz", "png")
            if not os.path.exists(img_path):
                data = np.loadtxt(path)
                data = data[:, 0:3]
                img = pc_util.point_cloud_three_views(data, diameter=8)
                imsave(img_path, img)
            index.write("<td><img src='%s'></td>" % img_path)
        index.write("</tr>")
    index.write("</table></body></html>")
    index.close()


def write_result2html_ModelNet():
    root_path = "../model"
    gt_path = "../data/ModelNet10_poisson_normal"
    # gt_path = "../data/Patches"
    model_names = ['1024_generator2_2',
                   'new_1024_generator2_2',
                   'new_1024_generator2_2_fixed_lr']
    phase = 'test'

    index_path = os.path.join(root_path, phase + "_index.html")
    index = open(index_path, "w")
    index.write("<html><body><table border='1'>")
    index.write("<tr>")
    index.write("<th>name</th><th>Input</th><th>Referred GT</th>")
    for model in model_names:
        index.write("<th>%s</th>" % model)
    index.write("</tr>")

    # get sample list
    items = os.listdir(root_path + "/" + model_names[0] + "/result/" + phase)
    items.sort()

    # mkdir model image path
    for model in model_names:
        if not os.path.exists(root_path + "/" + model + "/result/" + phase + "_three_view_img/"):
            os.makedirs(root_path + "/" + model + "/result/" + phase + "_three_view_img/")

    # write img to file
    for item in tqdm(items[::25]):
        index.write("<tr>")
        index.write("<td>%s</td>" % item)

        # write input image
        object = item.split("_")[0]
        id = item.split(".")[0]
        fixed = "%s/1024_nonuniform/%s" % (gt_path, 'train')
        path = fixed + "/%s.xyz" % (id)
        img_path = fixed + "_three_view_img/%s.png" % (id)
        if not os.path.exists(fixed + "_three_view_img/"):
            os.makedirs(fixed + "_three_view_img/")
        if not os.path.exists(img_path):
            data = np.loadtxt(path)
            data = data[:, 0:3]
            img = pc_util.point_cloud_three_views(data, diameter=8)
            imsave(img_path, img)
        index.write("<td><img src='%s'></td>" % img_path)

        # write gt image
        fixed = "%s/4096/%s" % (gt_path, 'train')
        path = fixed + "/%s.xyz" % (id)
        img_path = fixed + "_three_view_img/%s.png" % (id)
        if not os.path.exists(fixed + "_three_view_img/"):
            os.makedirs(fixed + "_three_view_img/")
        if not os.path.exists(img_path):
            data = np.loadtxt(path)
            data = data[:, 0:3]
            img = pc_util.point_cloud_three_views(data, diameter=8)
            imsave(img_path, img)
        index.write("<td><img src='%s'></td>" % img_path)

        # write prediction
        for model in model_names:
            path = root_path + "/" + model + "/result/" + phase + "/" + item
            if not os.path.exists(path):
                continue
            img_path = root_path + "/" + model + "/result/" + phase + "_three_view_img/" + item
            img_path = img_path.replace("xyz", "png")
            if not os.path.exists(img_path):
                data = np.loadtxt(path)
                data = data[:, 0:3]
                img = pc_util.point_cloud_three_views(data, diameter=8)
                imsave(img_path, img)
            index.write("<td><img src='%s'></td>" % img_path)
        index.write("</tr>")
    index.write("</table></body></html>")
    index.close()


if __name__ == '__main__':
    write_result2html_ModelNet()
    # write_result2html_benchmark()
    # calculate_emd_error('ModelNet40')

================================================
FILE: docs/.nojekyll
================================================

================================================
FILE: docs/_site/css/main.css
================================================
body { margin: 60px auto; width: 70%; } a { text-decoration: none; color: #999; } a:hover { text-decoration: underline; } p, ul { font-size: 1.5em; line-height: 1.4em; color: #333; } h1, h2, h3, h4 { font-family: 'Helvetica', 'Arial', 'Sans-Serif'; } h1 { font-size: 3em; } h2 { font-size: 2.7em; } h3 { font-size: 2.3em; } h4 { font-size: 1.9em; } nav ul, footer ul { font-size: 1em; font-family: 'Helvetica', 'Arial', 'Sans-Serif'; padding: 0px; list-style: none; font-weight: bold; } nav ul li, footer ul li { display: inline; margin-right: 20px; } footer { border-top: 1px solid #d5d5d5; font-size: .8em; } /* Blog */ ul.posts { margin: 20px auto 40px; font-size: 1.5em; } ul.posts li { list-style: none; } /* CV */ .cv { margin: 0px 0px 60px; } .cv h1 { font-size: 3em; } .cv h2 { font-size: 2em; } .cv address, .cv .download { font-family: 'Helvetica', 'Arial', 'Sans-Serif'; } .cv address, .cv p { font-size: 1.2em; } .cv .download { float: right; font-size: .8em; text-transform: uppercase; }
================================================
FILE: docs/_site/css/project.css
================================================
@import url('https://fonts.googleapis.com/css?family=Lato:400,400i,700,900&subset=latin-ext'); body { font-family: "Lato", sans-serif; color: #322f30; background: #fff; -webkit-font-smoothing: antialiased; margin: 10px auto; width: 70%; } a { text-decoration: none; color: #001f3f; } a:link, a:visited{ color: #39CCCC; /* text-decoration: underline; */ } a:hover { text-decoration: underline; color: #001f3f; } a.paper { text-decoration: none; color: #001f3f; } a.paper:link, a.paper:visited{ color: #322f30; /* text-decoration: underline;
*/ } a.paper:hover { text-decoration: underline; color: #39CCCC; } anchor{ line-height: 0; font-size: 0; color: transparent; } /*p, ul { font-size: 1.5em; line-height: 1.4em; color: #fff; }*/ p { font-size: 1.25em; line-height: 1.2em; color: #322f30; font-weight: 200; text-align: justify;} p2{ font-size: 1.0em; line-height: 0.9em; color: #322f30; font-weight: 200; text-align: justify; margin-right: 20px} ul { font-size: 1.5em; line-height: 1.4em; color: #322f30;} h2, h3, h4 { font-family: 'Lato', 'Sans-Serif'; font-weight:300; } /* h1 { font-size: 3em; } */ h2 { font-size: 2.7em; } h3 { font-size: 2.3em; margin-top: 0.8em; margin-bottom: 0.8em; text-align: center;} h4 { font-size: 1.9em; margin-top: 0.8em; margin-bottom: 0.8em; text-align: center;} papertitle { font-family: 'Lato', Verdana, Helvetica, sans-serif; font-size: 14px; font-weight: 700 } /* video { width: 100% !important; height: auto !important; margin: 0 auto; display: block; } */ hr{border: 0px; height: 1px; background-image: linear-gradient(to right, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.75), rgba(0, 0, 0, 0)); } .center { padding-top: 20px; padding-bottom: 20px; width: 90% !important; height: 90% !important; margin: 0 auto; display: block; } h5{ font-family: 'Lato', 'Sans-Serif';font-weight:300; font-size: 3em; text-align: center; margin-top: 0.8em; margin-bottom: 0.8em;} h1{ font-family: 'Lato', 'Sans-Serif';font-weight:400; font-size: 3em; text-align: center; margin-top: 0.8em; margin-bottom: 0.8em;} /*nav ul{ font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 20px; list-style: none; text-align: center;}*/ nav ul{ font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 20px;list-style: none; text-align: center; margin-top: 10px;} footer ul { font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 0px; list-style: none;} /*nav ul li{ display: inline; margin-right: 20px; padding: 20px;}*/ nav ul li{ display: inline; padding: 20px;} /*footer div ul li{ 
display: inline; margin-right: 20px; text-align: left;}*/ footer div ul li{ display: inline; margin-right: 20px; text-align: left;} footer { border-top: 1px solid #d5d5d5; font-size: .8em; } /* Blog */ ul.posts { margin: 20px auto 40px; font-size: 1.5em; } ul.posts li { list-style: none; } /* CV */ .cv { margin: 0px 0px 60px; } .cv h1 { font-size: 3em; } .cv h2 { font-size: 2em; } .cv address, .cv .download { font-family: 'Lato', 'Arial', 'Sans-Serif'; } .cv address, .cv p { font-size: 1.2em; } .cv .download { float: right; font-size: .8em; text-transform: uppercase; } img.center_thumbnail{ margin: 0 auto; display: block; } img { display: inline; margin: 0 auto; height: 100%; width: auto; } ================================================ FILE: docs/_site/css/project_file.css ================================================ @import url(https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.css); /*body { font-family: "Lato", sans-serif; color: #fff; background: #322f30; -webkit-font-smoothing: antialiased; margin: 20px auto; width: 70%; }*/ body { font-family: "Lato", sans-serif; color: #fff; background: #322f30; -webkit-font-smoothing: antialiased; margin: 10px auto; width: 70%; } a { text-decoration: none; color: #fff; } a:link, a:visited{ color: #7fdbff; /*text-decoration: underline;*/ } a:hover { text-decoration: underline; } /*p, ul { font-size: 1.5em; line-height: 1.4em; color: #fff; }*/ p { font-size: 1.25em; line-height: 1.2em; color: #fff; font-weight: 200; text-align: justify;} ul { font-size: 1.5em; line-height: 1.4em; color: #fff;} h1, h2, h3, h4 { font-family: 'Lato', 'Sans-Serif'; font-weight:300; } h1 { font-size: 3em; } h2 { font-size: 2.7em; } h3 { font-size: 2.3em; } h4 { font-size: 1.9em; } h5{ font-family: 'Lato', 'Sans-Serif';font-weight:300; font-size: 3em; text-align: center; margin-top: 0.8em; margin-bottom: 0.8em;} /*nav ul{ font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 20px; list-style: none; 
text-align: center;}*/ nav ul{ font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 20px;list-style: none; text-align: center; margin-top: 10px;} footer ul { font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 0px; list-style: none;} /*nav ul li{ display: inline; margin-right: 20px; padding: 20px;}*/ nav ul li{ display: inline; padding: 20px;} /*footer div ul li{ display: inline; margin-right: 20px; text-align: left;}*/ footer div ul li{ display: inline; margin-right: 20px; text-align: left;} footer { border-top: 1px solid #d5d5d5; font-size: .8em; } ================================================ FILE: docs/_site/index.html ================================================ Ganesh Iyer

CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks

Paper Teaser

3D LiDARs and 2D cameras are increasingly being used alongside each other in sensor rigs for perception tasks. Before these sensors can be used to gather meaningful data, however, their extrinsics (and intrinsics) need to be accurately calibrated, as the performance of the sensor rig is extremely sensitive to these calibration parameters. The vast majority of existing calibration techniques require significant amounts of data and/or calibration targets and human effort, severely impacting their applicability in large-scale production systems. We address this gap with CalibNet: a self-supervised deep network capable of automatically estimating the 6-DoF rigid body transformation between a 3D LiDAR and a 2D camera in real time. CalibNet alleviates the need for calibration targets, thereby resulting in significant savings in calibration effort. During training, the network takes as input only a LiDAR point cloud, the corresponding monocular image, and the camera calibration matrix K. At train time, we do not impose direct supervision (i.e., we do not directly regress to the calibration parameters). Instead, we train the network to predict calibration parameters that maximize the geometric and photometric consistency of the input images and point clouds. CalibNet learns to iteratively solve the underlying geometric problem and accurately predicts extrinsic calibration parameters for a wide range of mis-calibrations, without requiring retraining or domain adaptation.
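The two geometric operations at the heart of this objective are mapping a predicted 6-DoF vector onto a rigid-body transform and projecting the transformed LiDAR points through the intrinsics K. A minimal NumPy sketch of both, for illustration only: it is not the network's TensorFlow pipeline, `se3_exp` and `project_lidar_to_image` are hypothetical names, and the translation part applies a small-angle simplification of the full SE(3) exponential.

```python
import numpy as np

def se3_exp(xi):
    """Map a 6-vector xi = (v, w) to a 4x4 rigid-body transform.
    w is an axis-angle rotation (Rodrigues' formula); v is used
    directly as the translation (small-angle simplification)."""
    v, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])   # skew-symmetric matrix of w
    if theta < 1e-8:
        R = np.eye(3)
    else:
        R = (np.eye(3) + (np.sin(theta) / theta) * K
             + ((1 - np.cos(theta)) / theta ** 2) * (K @ K))
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = v
    return T

def project_lidar_to_image(points, T, K_cam, h, w):
    """Transform Nx3 LiDAR points by T, project them with pinhole
    intrinsics K_cam, and return a sparse HxW depth map."""
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = (T @ pts_h.T)[:3]          # points in the camera frame
    front = cam[2] > 0               # keep points in front of the camera
    uvw = K_cam @ cam[:, front]
    u = (uvw[0] / uvw[2]).astype(int)
    v = (uvw[1] / uvw[2]).astype(int)
    z = uvw[2]
    depth = np.zeros((h, w))
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[ok], u[ok]] = z[ok]
    return depth
```

During training, a depth map rendered this way from the (mis-)calibrated cloud can be compared against the target depth to form the geometric consistency signal.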


Video


Code

Coming Soon!


Paper

Paper Thumbnail

Coming Soon!

================================================ FILE: docs/css/main.css ================================================ body { margin: 60px auto; width: 70%; } a { text-decoration: none; color: #999; } a:hover { text-decoration: underline; } p, ul { font-size: 1.5em; line-height: 1.4em; color: #333; } h1, h2, h3, h4 { font-family: 'Helvetica', 'Arial', 'Sans-Serif'; } h1 { font-size: 3em; } h2 { font-size: 2.7em; } h3 { font-size: 2.3em; } h4 { font-size: 1.9em; } nav ul, footer ul { font-size: 1em; font-family: 'Helvetica', 'Arial', 'Sans-Serif'; padding: 0px; list-style: none; font-weight: bold; } nav ul li, footer ul li { display: inline; margin-right: 20px; } footer { border-top: 1px solid #d5d5d5; font-size: .8em; } /* Blog */ ul.posts { margin: 20px auto 40px; font-size: 1.5em; } ul.posts li { list-style: none; } /* CV */ .cv { margin: 0px 0px 60px; } .cv h1 { font-size: 3em; } .cv h2 { font-size: 2em; } .cv address, .cv .download { font-family: 'Helvetica', 'Arial', 'Sans-Serif'; } .cv address, .cv p { font-size: 1.2em; } .cv .download { float: right; font-size: .8em; text-transform: uppercase; } ================================================ FILE: docs/css/project.css ================================================ @import url('https://fonts.googleapis.com/css?family=Lato:400,400i,700,900&subset=latin-ext'); body { font-family: "Lato", sans-serif; color: #322f30; background: #fff; -webkit-font-smoothing: antialiased; margin: 10px auto; width: 70%; } a { text-decoration: none; color: #001f3f; } a:link, a:visited{ color: #39CCCC; /* text-decoration: underline; */ } a:hover { text-decoration: underline; color: #001f3f; } a.paper { text-decoration: none; color: #001f3f; } a.paper:link, a.paper:visited{ color: #322f30; /* text-decoration: underline; */ } a.paper:hover { text-decoration: underline; color: #39CCCC; } anchor{ line-height: 0; font-size: 0; color: transparent; } /*p, ul { font-size: 1.5em; line-height: 1.4em; color: #fff; }*/ p {
font-size: 1.25em; line-height: 1.2em; color: #322f30; font-weight: 200; text-align: justify;} p2{ font-size: 1.0em; line-height: 0.9em; color: #322f30; font-weight: 200; text-align: justify; margin-right: 20px} ul { font-size: 1.5em; line-height: 1.4em; color: #322f30;} h2, h3, h4 { font-family: 'Lato', 'Sans-Serif'; font-weight:300; } /* h1 { font-size: 3em; } */ h2 { font-size: 2.7em; } h3 { font-size: 2.3em; margin-top: 0.8em; margin-bottom: 0.8em; text-align: center;} h4 { font-size: 1.9em; margin-top: 0.8em; margin-bottom: 0.8em; text-align: center;} papertitle { font-family: 'Lato', Verdana, Helvetica, sans-serif; font-size: 14px; font-weight: 700 } /* video { width: 100% !important; height: auto !important; margin: 0 auto; display: block; } */ hr{border: 0px; height: 1px; background-image: linear-gradient(to right, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.75), rgba(0, 0, 0, 0)); } .center { padding-top: 20px; padding-bottom: 20px; width: 90% !important; height: 90% !important; margin: 0 auto; display: block; } h5{ font-family: 'Lato', 'Sans-Serif';font-weight:300; font-size: 3em; text-align: center; margin-top: 0.8em; margin-bottom: 0.8em;} h1{ font-family: 'Lato', 'Sans-Serif';font-weight:400; font-size: 3em; text-align: center; margin-top: 0.8em; margin-bottom: 0.8em;} /*nav ul{ font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 20px; list-style: none; text-align: center;}*/ nav ul{ font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 20px;list-style: none; text-align: center; margin-top: 10px;} footer ul { font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 0px; list-style: none;} /*nav ul li{ display: inline; margin-right: 20px; padding: 20px;}*/ nav ul li{ display: inline; padding: 20px;} /*footer div ul li{ display: inline; margin-right: 20px; text-align: left;}*/ footer div ul li{ display: inline; margin-right: 20px; text-align: left;} footer { border-top: 1px solid #d5d5d5; font-size: .8em; } /* Blog 
*/ ul.posts { margin: 20px auto 40px; font-size: 1.5em; } ul.posts li { list-style: none; } /* CV */ .cv { margin: 0px 0px 60px; } .cv h1 { font-size: 3em; } .cv h2 { font-size: 2em; } .cv address, .cv .download { font-family: 'Lato', 'Arial', 'Sans-Serif'; } .cv address, .cv p { font-size: 1.2em; } .cv .download { float: right; font-size: .8em; text-transform: uppercase; } img.center_thumbnail{ margin: 0 auto; display: block; } img { display: inline; margin: 0 auto; height: 100%; width: auto; } ================================================ FILE: docs/css/project_file.css ================================================ @import url(https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.css); /*body { font-family: "Lato", sans-serif; color: #fff; background: #322f30; -webkit-font-smoothing: antialiased; margin: 20px auto; width: 70%; }*/ body { font-family: "Lato", sans-serif; color: #fff; background: #322f30; -webkit-font-smoothing: antialiased; margin: 10px auto; width: 70%; } a { text-decoration: none; color: #fff; } a:link, a:visited{ color: #7fdbff; /*text-decoration: underline;*/ } a:hover { text-decoration: underline; } /*p, ul { font-size: 1.5em; line-height: 1.4em; color: #fff; }*/ p { font-size: 1.25em; line-height: 1.2em; color: #fff; font-weight: 200; text-align: justify;} ul { font-size: 1.5em; line-height: 1.4em; color: #fff;} h1, h2, h3, h4 { font-family: 'Lato', 'Sans-Serif'; font-weight:300; } h1 { font-size: 3em; } h2 { font-size: 2.7em; } h3 { font-size: 2.3em; } h4 { font-size: 1.9em; } h5{ font-family: 'Lato', 'Sans-Serif';font-weight:300; font-size: 3em; text-align: center; margin-top: 0.8em; margin-bottom: 0.8em;} /*nav ul{ font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 20px; list-style: none; text-align: center;}*/ nav ul{ font-size: 1.5em; font-family: 'Lato', 'Arial', 'Sans-Serif'; padding: 20px;list-style: none; text-align: center; margin-top: 10px;} footer ul { font-size: 1.5em; font-family: 
'Lato', 'Arial', 'Sans-Serif'; padding: 0px; list-style: none;} /*nav ul li{ display: inline; margin-right: 20px; padding: 20px;}*/ nav ul li{ display: inline; padding: 20px;} /*footer div ul li{ display: inline; margin-right: 20px; text-align: left;}*/ footer div ul li{ display: inline; margin-right: 20px; text-align: left;} footer { border-top: 1px solid #d5d5d5; font-size: .8em; } ================================================ FILE: docs/index.html ================================================ CalibNet

CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks

Paper Teaser

3D LiDARs and 2D cameras are increasingly being used alongside each other in sensor rigs for perception tasks. Before these sensors can be used to gather meaningful data, however, their extrinsics (and intrinsics) need to be accurately calibrated, as the performance of the sensor rig is extremely sensitive to these calibration parameters. The vast majority of existing calibration techniques require significant amounts of data and/or calibration targets and human effort, severely impacting their applicability in large-scale production systems. We address this gap with CalibNet: a self-supervised deep network capable of automatically estimating the 6-DoF rigid body transformation between a 3D LiDAR and a 2D camera in real time. CalibNet alleviates the need for calibration targets, thereby resulting in significant savings in calibration effort. During training, the network takes as input only a LiDAR point cloud, the corresponding monocular image, and the camera calibration matrix K. At train time, we do not impose direct supervision (i.e., we do not directly regress to the calibration parameters). Instead, we train the network to predict calibration parameters that maximize the geometric and photometric consistency of the input images and point clouds. CalibNet learns to iteratively solve the underlying geometric problem and accurately predicts extrinsic calibration parameters for a wide range of mis-calibrations, without requiring retraining or domain adaptation.
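The phrase "learns to iteratively solve the underlying geometric problem" can be pictured as repeated composition of small corrective transforms onto a running extrinsic estimate. A toy NumPy sketch of that loop, with a hand-written predictor standing in for the network (`refine_extrinsics` and `toy_step` are illustrative names, not part of the released code):

```python
import numpy as np

def refine_extrinsics(predict_correction, T_init, n_iter=10):
    """Iterative refinement: each step predicts a small corrective 4x4
    transform from the current alignment and composes it onto the
    running estimate."""
    T = T_init.copy()
    for _ in range(n_iter):
        T = predict_correction(T) @ T
    return T

# Toy predictor: nudges the translation halfway toward a known target
# each step, so the estimate converges geometrically.
target_t = np.array([1.0, -0.5, 0.25])

def toy_step(T):
    delta = np.eye(4)
    delta[:3, 3] = 0.5 * (target_t - T[:3, 3])
    return delta

T_est = refine_extrinsics(toy_step, np.eye(4))
```

In CalibNet the corrective step comes from the network itself, driven by the geometric and photometric consistency losses rather than a known target.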


Video


Paper

Paper Thumbnail

For the Tensorflow implementation

Check out the Code!
