Full Code of epiception/CalibNet for AI

main a6fed6de1805 cached
83 files
325.6 KB
98.3k tokens
303 symbols
1 requests
Download .txt
Showing preview only (348K chars total). Download the full file or copy to clipboard to get everything.
Repository: epiception/CalibNet
Branch: main
Commit: a6fed6de1805
Files: 83
Total size: 325.6 KB

Directory structure:
gitextract_1fl8qxli/

├── LICENSE
├── README.md
├── code/
│   ├── common/
│   │   ├── Lie_functions.py
│   │   ├── __init__.py
│   │   ├── all_transformer.py
│   │   ├── cnn_utils_res.py
│   │   ├── global_agg_net.py
│   │   ├── resnet_depth_model.py
│   │   └── resnet_rgb_model.py
│   ├── config_res.py
│   ├── dataset_files/
│   │   ├── dataset_build_color.py
│   │   ├── dataset_build_color_2.py
│   │   ├── dataset_builder_parallel.sh
│   │   └── parser.py
│   ├── model_utils.py
│   ├── nw_loader_color.py
│   ├── tf_ops/
│   │   ├── CD/
│   │   │   ├── __init__.py
│   │   │   ├── makefile
│   │   │   ├── render_balls_so.cpp
│   │   │   ├── tf_nndistance.cpp
│   │   │   ├── tf_nndistance.py
│   │   │   ├── tf_nndistance_g.cu
│   │   │   └── tf_nndistance_g.cu.o
│   │   ├── __init__.py
│   │   ├── emd/
│   │   │   ├── __init__.py
│   │   │   ├── tf_auctionmatch.cpp
│   │   │   ├── tf_auctionmatch.py
│   │   │   ├── tf_auctionmatch_compile.sh
│   │   │   ├── tf_auctionmatch_g.cu
│   │   │   └── tf_auctionmatch_g.cu.o
│   │   ├── grouping/
│   │   │   ├── __init__.py
│   │   │   ├── compile.sh
│   │   │   ├── query_ball_point.cpp
│   │   │   ├── query_ball_point.cu
│   │   │   ├── query_ball_point_block.cu
│   │   │   ├── query_ball_point_grid.cu
│   │   │   ├── selection_sort
│   │   │   ├── selection_sort.cpp
│   │   │   ├── selection_sort.cu
│   │   │   ├── selection_sort_const.cu
│   │   │   ├── test_knn.py
│   │   │   ├── tf_grouping.cpp
│   │   │   ├── tf_grouping.py
│   │   │   ├── tf_grouping_compile.sh
│   │   │   ├── tf_grouping_g.cu
│   │   │   ├── tf_grouping_g.cu.o
│   │   │   └── tf_grouping_op_test.py
│   │   ├── interpolation/
│   │   │   ├── __init__.py
│   │   │   ├── interpolate.cpp
│   │   │   ├── tf_interpolate.cpp
│   │   │   ├── tf_interpolate.py
│   │   │   ├── tf_interpolate_compile.sh
│   │   │   ├── tf_interpolate_op_test.py
│   │   │   └── visu_interpolation.py
│   │   └── sampling/
│   │       ├── __init__.py
│   │       ├── tf_sampling.cpp
│   │       ├── tf_sampling.py
│   │       ├── tf_sampling_compile.sh
│   │       ├── tf_sampling_g.cu
│   │       └── tf_sampling_g.cu.o
│   ├── train_model_combined.py
│   └── utils/
│       ├── __init__.py
│       ├── data_prep_util.py
│       ├── eulerangles.py
│       ├── modelnet_data_prep.py
│       ├── off2obj.py
│       ├── pc_util.py
│       ├── plyfile.py
│       ├── pointnet_util.py
│       ├── provider.py
│       ├── show3d.py
│       ├── tf_util.py
│       ├── tf_util2.py
│       └── write_result2html.py
└── docs/
    ├── .nojekyll
    ├── _site/
    │   ├── css/
    │   │   ├── main.css
    │   │   ├── project.css
    │   │   └── project_file.css
    │   └── index.html
    ├── css/
    │   ├── main.css
    │   ├── project.css
    │   └── project_file.css
    └── index.html

================================================
FILE CONTENTS
================================================

================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2018 Ganesh Iyer

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# CalibNet

### [DEPRECATED] This repository is no longer actively supported. 

While the authors work on an update, please check out this unofficial implementation: [CalibNet_pytorch](https://github.com/gitouni/CalibNet_pytorch) :fire: :slightly_smiling_face:
___

Code for our paper:
[CalibNet: Self-Supervised Extrinsic Calibration using 3D Spatial Transformer Networks](https://arxiv.org/pdf/1803.08181.pdf)

Check out our [project page](https://epiception.github.io/CalibNet/)!

![CalibNet_gif1](https://media.giphy.com/media/1zjOgLf7j4lHmeMubG/giphy.gif)

### Prerequisites
CalibNet is trained on Tensorflow 1.3, CUDA 8.0, CUDNN 7.0.1


##### Installation

The code for point cloud distance loss is modified from [PU-NET](https://github.com/yulequan/PU-Net), PointNet++, PointSetGeneration.

This repository, thus, is based on Tensorflow and the TF operators from PointNet++ and PU-NET.

For installing tensorflow, please follow the official instructions in here. The code is tested under TF1.3 and Python 2.7 on Ubuntu 16.04.

For compiling TF operators, please check tf_xxx_compile.sh under each op subfolder in code/tf_ops folder, and change the path correctly to ../path/to/tensorflow/include. Note that you need to update nvcc, python and tensoflow include library if necessary. You also need to remove -D_GLIBCXX_USE_CXX11_ABI=0 flag in g++ command in order to compile correctly if necessary.

We are working to update the code and installation steps for the latest tensorflow versions.

### Dataset Preparation

To prepare the dataset, run /dataset_files/dataset_builder_parallel.sh in the directory where you wish to store. We will also create a parser `parsed_set.txt` for the dataset, that contains the file names for training.

```
git clone https://github.com/epiception/CalibNet.git
or
svn checkout https://github.com/CalibNet/trunk/code (for the code)
cd ../path/to/dataset_directory
bash ../path/to/code_folder/dataset_files/dataset_builder_parallel.sh
cd ../path/to/CalibNet/code
python dataset_files/parser.py ../dataset_directory/2011_09_26/
```
##### Resnet-18
Pretrained Resnet-18 parameters can be found [here](https://drive.google.com/open?id=1XGqdBH3A88m1LgUIe5tS7VjKjtQc1A6V).


### Training

Before training, be sure to make requisite changes to the paths and training parameters in the config file `config_res.py`.
We trained using 2 GPUs. The base code is written to support ops for the same device configuration. 

To begin training:
```
CUDA_VISIBLE_DEVICES=<device_id1>,<device_id2> python -B train_model_combined.py
```

##### Trained Weights
Trained weights for the base variant (non-iterative) model is available [here](https://drive.google.com/drive/folders/138hq7OgTEBmG-wK52h7gchg5ob1WqARn?usp=sharing). This model was trained for 44 epochs. As mentioned in the paper, the iterative realignment model for better translation outputs will uploaded soon.

### Evaluation/Test
Code for Direct Evaluation/Testing pipeline for point cloud calibration will be uploaded soon.


================================================
FILE: code/common/Lie_functions.py
================================================
import numpy as np
import tensorflow as tf

def for_translation(Old_transform,t):

    R = Old_transform[:3,:3]
    t = tf.expand_dims(t,1)
    trans_T = tf.concat([R,t], 1)

    return trans_T

def for_rotation(Old_transform):

    R = Old_transform[:3,:3]
    t = tf.expand_dims(tf.constant(np.array([0.0, 0.0, 0.0], dtype = np.float32)), 1)
    rots_T = tf.concat([R,t], 1)

    return rots_T


def exponential_map_single(vec):

    "Exponential Map Operation. Decoupled for SO(3) and translation t"

    with tf.name_scope("Exponential_map"):

        u = vec[:3]
        omega = vec[3:]

        theta = tf.sqrt(omega[0]*omega[0] + omega[1]*omega[1] + omega[2]*omega[2])

        omega_cross = tf.stack([0.0, -omega[2], omega[1], omega[2], 0.0, -omega[0], -omega[1], omega[0], 0.0])
        omega_cross = tf.reshape(omega_cross, [3,3])

        #Taylor's approximation for A,B and C not being used currently, approximations preferable for low values of theta

        # A = 1.0 - (tf.pow(theta,2)/factorial(3.0)) + (tf.pow(theta, 4)/factorial(5.0))
        # B = 1.0/factorial(2.0) - (tf.pow(theta,2)/factorial(4.0)) + (tf.pow(theta, 4)/factorial(6.0))
        # C = 1.0/factorial(3.0) - (tf.pow(theta,2)/factorial(5.0)) + (tf.pow(theta, 4)/factorial(7.0))

        A = tf.sin(theta)/theta

        B = (1.0 - tf.cos(theta))/(tf.pow(theta,2))

        C = (1.0 - A)/(tf.pow(theta,2))

        omega_cross_square = tf.matmul(omega_cross, omega_cross)

        R = tf.eye(3,3) + A*omega_cross + B*omega_cross_square

        V = tf.eye(3,3) + B*omega_cross + C*omega_cross_square
        Vu = tf.matmul(V,tf.expand_dims(u,1))

        T = tf.concat([R, Vu], 1)

        return T


def transforms_mul(T1, T2):


    T1 = tf.concat ([T1, tf.constant(np.array([[0.0, 0.0, 0.0, 1.0]]), dtype = tf.float32)], 0)
    T2 = tf.concat ([T2, tf.constant(np.array([[0.0, 0.0, 0.0, 1.0]]), dtype = tf.float32)], 0)

    product = tf.matmul(T1, T2)

    return product


================================================
FILE: code/common/__init__.py
================================================


================================================
FILE: code/common/all_transformer.py
================================================
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import scipy.misc as smc

import config_res as config

IMG_HT = config.depth_img_params['IMG_HT']
IMG_WDT = config.depth_img_params['IMG_WDT']
batch_size = 1

shape = (IMG_HT, IMG_WDT)

def _simple_transformer(depth_map, t_mat, k_mat, small_transform):

    batch_grids, transformed_depth_map, sparse_cloud  = _3D_meshgrid_batchwise_diff(IMG_HT, IMG_WDT, depth_map, batch_size, t_mat, k_mat, small_transform)

    x_all = tf.reshape(batch_grids[:,0], (IMG_HT, IMG_WDT))
    y_all = tf.reshape(batch_grids[:,1], (IMG_HT, IMG_WDT))

    return _bilinear_sampling(transformed_depth_map, x_all, y_all), sparse_cloud


def sparsify_cloud(S):

    """
    Cluster centers of point clouds used to sparsify cloud for Earth Mover's Distance. Using 4096 centroids
    """

    with tf.device('/cpu:0'):

        point_limit = 4096
        no_points = tf.shape(S)[0]
        no_partitions = no_points/tf.constant(point_limit, dtype=tf.int32)
        saved_points = tf.gather_nd(S, [tf.expand_dims(tf.range(0, no_partitions*point_limit), 1)])
        saved_points = tf.reshape(saved_points, [point_limit, no_partitions, 3])
        saved_points_sparse = tf.reduce_mean(saved_points, 1)

        return saved_points_sparse

def _3D_meshgrid_batchwise_diff(height, width, depth_img, num_batch, transformation_matrix, tf_K_mat, small_transform):

    """
    Creates 3d sampling meshgrid
    """

    x_index = tf.linspace(-1.0, 1.0, width)
    y_index = tf.linspace(-1.0, 1.0, height)
    z_index = tf.range(0, width*height)

    x_t, y_t = tf.meshgrid(x_index, y_index)

    # flatten
    x_t_flat = tf.reshape(x_t, [1,-1])
    y_t_flat = tf.reshape(y_t, [1,-1])
    ZZ = tf.reshape(depth_img, [-1])

    zeros_target = tf.zeros_like(ZZ)
    mask = tf.not_equal(ZZ, zeros_target)
    ones = tf.ones_like(x_t_flat)

    sampling_grid_2d = tf.concat([x_t_flat, y_t_flat, ones], 0)
    sampling_grid_2d_sparse = tf.transpose(tf.boolean_mask(tf.transpose(sampling_grid_2d), mask))
    ZZ_saved = tf.boolean_mask(ZZ, mask)
    ones_saved = tf.expand_dims(tf.ones_like(ZZ_saved), 0)

    projection_grid_3d = tf.matmul(tf.matrix_inverse(tf_K_mat), sampling_grid_2d_sparse*ZZ_saved)

    homog_points_3d = tf.concat([projection_grid_3d, ones_saved], 0)

    final_transformation_matrix = tf.matmul(transformation_matrix, small_transform)[:3,:]
    warped_sampling_grid = tf.matmul(final_transformation_matrix, homog_points_3d)

    points_2d = tf.matmul(tf_K_mat, warped_sampling_grid[:3,:])

    Z = points_2d[2,:]

    x_dash_pred = points_2d[0,:]
    y_dash_pred = points_2d[1,:]
    point_cloud = tf.stack([x_dash_pred, y_dash_pred, Z], 1)

    sparse_point_cloud = sparsify_cloud(point_cloud)

    x = tf.transpose(points_2d[0,:]/Z)
    y = tf.transpose(points_2d[1,:]/Z)

    mask_int = tf.cast(mask, 'int32')

    updated_indices = tf.expand_dims(tf.boolean_mask(mask_int*z_index, mask), 1)

    updated_Z = tf.scatter_nd(updated_indices, Z, tf.constant([width*height]))
    updated_x = tf.scatter_nd(updated_indices, x, tf.constant([width*height]))
    neg_ones = tf.ones_like(updated_x)*-1.0
    updated_x_fin = tf.where(tf.equal(updated_Z, zeros_target), neg_ones, updated_x)

    updated_y = tf.scatter_nd(updated_indices, y, tf.constant([width*height]))
    updated_y_fin = tf.where(tf.equal(updated_Z, zeros_target), neg_ones, updated_y)

    reprojected_grid = tf.stack([updated_x_fin, updated_y_fin], 1)

    transformed_depth = tf.reshape(updated_Z, (IMG_HT, IMG_WDT))

    return reprojected_grid, transformed_depth, sparse_point_cloud

def reverse_all(z):

    """Reversing from cantor function indices to correct indices"""

    z = tf.cast(z, 'float32')
    w = tf.floor((tf.sqrt(8.*z + 1.) - 1.)/2.0)
    t = (w**2 + w)/2.0
    y = tf.clip_by_value(tf.expand_dims(z - t, 1), 0.0, IMG_HT - 1)
    x = tf.clip_by_value(tf.expand_dims(w - y[:,0], 1), 0.0, IMG_WDT - 1)

    return tf.concat([y,x], 1)

def get_pixel_value(img, x, y):

    """Cantor pairing for removing non-unique updates and indices. At the time of implementation, unfixed issue with scatter_nd causes problems with int32 update values. Till resolution, implemented on cpu """

    with tf.device('/cpu:0'):
        indices = tf.stack([y, x], 2)
        indices = tf.reshape(indices, (375*1242, 2))
        values = tf.reshape(img, [-1])

        Y = indices[:,0]
        X = indices[:,1]
        Z = (X + Y)*(X + Y + 1)/2 + Y

        filtered, idx = tf.unique(tf.squeeze(Z))
        updated_values  = tf.unsorted_segment_max(values, idx, tf.shape(filtered)[0])

        # updated_indices = tf.map_fn(fn=lambda i: reverse(i), elems=filtered, dtype=tf.float32)
        updated_indices = reverse_all(filtered)
        updated_indices = tf.cast(updated_indices, 'int32')
        resolved_map = tf.scatter_nd(updated_indices, updated_values, img.shape)

        return resolved_map

def _bilinear_sampling(img, x_func, y_func):

    """
    Sampling from input image and performing bilinear interpolation
    """

    max_y = tf.constant(IMG_HT - 1, dtype=tf.int32)
    max_x = tf.constant(IMG_WDT - 1, dtype=tf.int32)

    zero = tf.zeros([], dtype='int32')

    # rescale x and y to [0, W/H/D]
    x = 0.5 * ((x_func + 1.0) * tf.cast(IMG_WDT - 1, 'float32'))
    y = 0.5 * ((y_func + 1.0) * tf.cast(IMG_HT - 1, 'float32'))

    x = tf.clip_by_value(x, 0.0, tf.cast(max_x, 'float32'))
    y = tf.clip_by_value(y, 0.0, tf.cast(max_y, 'float32'))

    # grab 4 nearest corner points for each (x_i, y_i)
    # i.e. we need a rectangle around the point of interest
    x0 = tf.cast(tf.floor(x), 'int32')
    x1 = x0 + 1
    y0 = tf.cast(tf.floor(y), 'int32')
    y1 = y0 + 1

    # clip to range [0, H/W] to not violate img boundaries
    x0 = tf.clip_by_value(x0, 0, max_x)
    x1 = tf.clip_by_value(x1, 0, max_x)
    y0 = tf.clip_by_value(y0, 0, max_y)
    y1 = tf.clip_by_value(y1, 0, max_y)

    # find Ia, Ib, Ic, Id

    Ia = get_pixel_value(img, x0, y0)
    Ib = get_pixel_value(img, x0, y1)
    Ic = get_pixel_value(img, x1, y0)
    Id = get_pixel_value(img, x1, y1)

    x0 = tf.cast(x0, 'float32')
    x1 = tf.cast(x1, 'float32')
    y0 = tf.cast(y0, 'float32')
    y1 = tf.cast(y1, 'float32')

    # calculate deltas
    wa = (x1-x) * (y1-y)
    wb = (x1-x) * (y-y0)
    wc = (x-x0) * (y1-y)
    wd = (x-x0) * (y-y0)

    loc = wa*Ia + wb*Ib + wc*Ic + wd*Id

    return loc


================================================
FILE: code/common/cnn_utils_res.py
================================================
import numpy as np
import tensorflow as tf

def weight_variable(shape, str):
    '''
    Helper function to create a weight variable initialized with
    a normal distribution (truncated to two standard devs)
    Parameters
    ----------
    shape : list
        Size of weight variable
    '''

    W = tf.get_variable("weight" + str, shape=shape, initializer=tf.contrib.layers.xavier_initializer())
    return W

def weight_variable_fc(shape, str):
    '''
    Helper function to create a weight variable initialized with
    a normal distribution (truncated to two standard devs)
    Parameters
    ----------
    shape : list
        Size of weight variable
    '''

    W = 0.01*tf.get_variable("weight" + str, shape=shape, initializer=tf.contrib.layers.xavier_initializer())
    return W

def bias_variable(shape, str):

    B = tf.Variable(tf.constant(0.0, shape= shape, dtype=tf.float32), name="bias" + str)
    return B


def init_weights(W, str, to_train):
    init = tf.constant_initializer(W)
    weight = tf.get_variable('weight'+str, shape=W.shape, dtype=tf.float32, initializer= init, trainable = to_train)
    return weight

def init_bias(B, layerno, to_train):
    init = tf.constant_initializer(B)
    bias = tf.get_variable('bias_%d'%layerno, shape=B.shape, dtype=tf.float32, initializer= init, trainable = to_train)
    return bias

def conv2d_batchnorm(x, W, name, phase, beta_r, gamma_r, mean_r, variance_r, stride = [1,1,1,1], relu = True):

    beta = tf.constant_initializer(beta_r)
    gamma = tf.constant_initializer(gamma_r)
    moving_mean = tf.constant_initializer(mean_r)
    moving_variance = tf.constant_initializer(variance_r)

    with tf.name_scope(name):
        mid1 =  tf.nn.conv2d(x, W, strides = stride, padding = "SAME")
        with tf.name_scope('batch_norm'):
            mid2 = tf.contrib.layers.batch_norm(mid1, param_initializers={'beta': beta, 'gamma': gamma, 'moving_mean': moving_mean,'moving_variance': moving_variance,}, is_training = phase, updates_collections = None, scale = True, decay = 0.9)
            if(relu == True):
                return tf.nn.relu(mid2)
            else:
                return mid2

def conv2d_batchnorm_init(x, W, name, phase, stride = [1,1,1,1], relu = True):

    with tf.name_scope(name):
        mid1 =  tf.nn.conv2d(x, W, strides = stride, padding ="SAME")
        with tf.name_scope('batch_norm'):
            mid2 = tf.contrib.layers.batch_norm(mid1, is_training = phase, updates_collections = None)
            if(relu == True):
                return tf.nn.relu(mid2)
            else:
                return mid2

def conv2d_init(x, W, name, phase, stride, padding):
    with tf.name_scope(name):
        mid1 =  tf.nn.conv2d(x, W, strides = stride, padding = padding)
        mid2 = tf.nn.relu(mid1)

        return mid2


def conv2d_bias_init(x, W, b, name):
    with tf.name_scope(name):
        mid1 =  tf.nn.conv2d(x, W, strides = [1,1,1,1], padding = "SAME") + b
        mid2 = tf.nn.relu(mid1)

        return mid2

def max_pool(x, name):
    return tf.nn.max_pool(x, ksize = [1,2,2,1], strides = [1,2,2,1], padding = "SAME", name=name)

def variable_summaries(var):
  """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""

  with tf.name_scope('summaries'):
    mean = tf.reduce_mean(var)
    sum_mean = tf.summary.scalar('mean', mean)
    with tf.name_scope('stddev'):
      stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
    sum_stddev = tf.summary.scalar('stddev', stddev)
    #tf.summary.scalar('max', tf.reduce_max(var))
    #tf.summary.scalar('min', tf.reduce_min(var))
    sum_hist = tf.summary.histogram('histogram', var)
    return [sum_mean, sum_hist, sum_stddev]


================================================
FILE: code/common/global_agg_net.py
================================================
import numpy as np
import tensorflow as tf
import scipy.misc as smc
import matplotlib.pyplot as plt

import config_res as config
from cnn_utils_res import *

import resnet_rgb_model as model
import resnet_depth_model as model_depth

batch_size = config.net_params['batch_size']
current_epoch = config.net_params['load_epoch']

def End_Net_weights_init():

    """
    Initialize Aggregation Network Weights and Summaries
    """

    W_ext1 = weight_variable([3,3,768,384], "_8")
    W_ext2 = weight_variable([3,3,384,384], "_9")
    W_ext3 = weight_variable([1,2,384,384], "_10")

    W_ext4_rot = weight_variable([1,1,384,384], "_11")
    W_fc_rot = weight_variable_fc([3840,3], "_12")

    W_ext4_tr = weight_variable([1,1,384,384], "_13")
    W_fc_tr = weight_variable_fc([3840,3], "_14")

    end_weights = [W_ext1, W_ext2, W_ext3, W_ext4_rot, W_fc_rot, W_ext4_tr, W_fc_tr]

    weight_summaries = []

    for weight_index in range(len(end_weights)):
        with tf.name_scope('weight_%d'%weight_index):
            weight_summaries += variable_summaries(end_weights[weight_index])

    return end_weights, weight_summaries

def End_Net(input_x, phase_depth, keep_prob):

    """
    Define Aggregation Network
    """

    weights, summaries = End_Net_weights_init()

    layer8 = conv2d_batchnorm_init(input_x, weights[0], name="conv_9", phase= phase_depth, stride=[1,2,2,1])
    layer9 = conv2d_batchnorm_init(layer8, weights[1], name="conv_10", phase= phase_depth, stride=[1,2,2,1])
    layer10 = conv2d_batchnorm_init(layer9, weights[2], name="conv_11", phase= phase_depth, stride=[1,1,1,1])

    layer11_rot = conv2d_batchnorm_init(layer10, weights[3], name="conv_12", phase= phase_depth, stride=[1,1,1,1])
    layer11_m_rot = tf.reshape(layer11_rot, [batch_size, 3840])
    layer11_drop_rot = tf.nn.dropout(layer11_m_rot, keep_prob)
    layer11_vec_rot = (tf.matmul(layer11_drop_rot, weights[4]))

    layer11_tr = conv2d_batchnorm_init(layer10, weights[5], name="conv_13", phase= phase_depth, stride=[1,1,1,1])
    layer11_m_tr = tf.reshape(layer11_tr, [batch_size, 3840])
    layer11_drop_tr = tf.nn.dropout(layer11_m_tr, keep_prob)
    layer11_vec_tr = (tf.matmul(layer11_drop_tr, weights[6]))

    output_vectors = tf.concat([layer11_vec_tr, layer11_vec_rot], 1)
    return output_vectors, summaries


def End_Net_Out(X1, phase_rgb, pooled_input2, phase, keep_prob):

    """
    Computation Graph
    """

    RGB_Net_obj = model.Resnet(X1, phase_rgb)
    Depth_Net_obj = model_depth.Depthnet(pooled_input2, phase)

    with tf.variable_scope('ResNet'):
        with tf.device('/device:GPU:0'):
            output_rgb = RGB_Net_obj.Net()
    with tf.variable_scope('DepthNet'):
        with tf.device('/device:GPU:1'):
            output_depth = Depth_Net_obj.Net()

    layer_next = tf.concat([output_rgb, output_depth], 3)

    end_net_op = End_Net(layer_next, phase, keep_prob)

    return end_net_op


================================================
FILE: code/common/resnet_depth_model.py
================================================
import numpy as np
import tensorflow as tf
import json
from cnn_utils_res import *

import config_res

with open(config_res.paths['resnet_params_path']) as f_in:
    parameters = json.load(f_in)

class Depthnet:

    def __init__(self, input_y, phase, parameters = parameters):

        self.input_y = input_y
        self.phase = phase
        self.parameters = parameters
        self.layer_zero
        self.layer
        self.Net

    def Net(self):
        layer_zero_out = self.layer_zero(self.input_y)

        current_output = layer_zero_out

        for layer_idx in range(1,5):
            layer_out = self.layer(current_output, layer_idx)
            current_output = layer_out

        return current_output

    def layer_zero(self, layer_input):

        layer_dict =self.parameters['layer0']
        bl_str = "block_1"

        W = np.array(layer_dict[bl_str]['conv1']['weight'], dtype = np.float32)
        bn_mov_mean = np.array(layer_dict[bl_str]['bn1']['running_mean'], dtype = np.float32)
        bn_mov_var = np.array(layer_dict[bl_str]['bn1']['running_var'], dtype = np.float32)
        bn_gamma = np.array(layer_dict[bl_str]['bn1']['weight'], dtype = np.float32)
        bn_beta = np.array(layer_dict[bl_str]['bn1']['bias'], dtype = np.float32)

        shapex = W.shape
        W_conv = weight_variable([shapex[0], shapex[1], 1, shapex[3]/2], "_depth_0")
        # out = conv2d_batchnorm(layer_input, W_conv, "layer_depth_0", self.phase, bn_beta, bn_gamma, bn_mov_mean, bn_mov_var, [1,2,2,1], True)
        out = conv2d_batchnorm_init(layer_input, W_conv, "layer_depth_0", self.phase, [1,2,2,1], True)

        out = tf.nn.max_pool(out, [1,3,3,1], strides=[1,2,2,1], padding="SAME")

        print('layer0', out.shape)
        return out

    def layer(self, layer_input, layer_no):
        layer_dict = self.parameters['layer%d'%layer_no]

        cur = layer_input
        res = layer_input

        for b_no in range(1,3):
            bl_str = "block_%d"%b_no

            stride = [0,0]
            if(b_no == 1):
                stride = [2,1]
            else:
                stride = [1,1]

            # for in_bno in range(1,3):

            W1 = np.array(layer_dict[bl_str]['conv1']['weight'], dtype = np.float32)
            bn_mov_mean1 = np.array(layer_dict[bl_str]['bn1']['running_mean'], dtype = np.float32)
            bn_mov_var1 = np.array(layer_dict[bl_str]['bn1']['running_var'], dtype = np.float32)
            bn_gamma1 = np.array(layer_dict[bl_str]['bn1']['weight'], dtype = np.float32)
            bn_beta1 = np.array(layer_dict[bl_str]['bn1']['bias'], dtype = np.float32)

            W2 = np.array(layer_dict[bl_str]['conv2']['weight'], dtype = np.float32)
            bn_mov_mean2 = np.array(layer_dict[bl_str]['bn2']['running_mean'], dtype = np.float32)
            bn_mov_var2 = np.array(layer_dict[bl_str]['bn2']['running_var'], dtype = np.float32)
            bn_gamma2 = np.array(layer_dict[bl_str]['bn2']['weight'], dtype = np.float32)
            bn_beta2 = np.array(layer_dict[bl_str]['bn2']['bias'], dtype = np.float32)

            # W_conv1 = init_weights(W1, "_l_%d_bl_%d_no_%d"%(layer_no,b_no, 1), False)
            # W_conv2 = init_weights(W2, "_l_%d_bl_%d_no_%d"%(layer_no,b_no, 2), False)

            shapex1 = W1.shape
            shapex2 = W2.shape

            W_conv1 = weight_variable([shapex1[0], shapex1[1], shapex1[2]/2, shapex1[3]/2], "dep_l_%d_bl_%d_no_%d"%(layer_no,b_no, 1))
            W_conv2 = weight_variable([shapex2[0], shapex2[1], shapex2[2]/2, shapex2[3]/2], "dep_l_%d_bl_%d_no_%d"%(layer_no,b_no, 2))

            # out1 = conv2d_batchnorm(cur, W_conv1, "layer_%d_%d_1"%(layer_no,b_no), self.phase, bn_beta1, bn_gamma1, bn_mov_mean1, bn_mov_var1, [1,stride[0],stride[0],1], False)

            out1 = conv2d_batchnorm_init(cur, W_conv1, "dep_layer_%d_%d_1"%(layer_no,b_no), self.phase, [1,stride[0],stride[0],1], False)

            print("layer_%d_%d_1"%(layer_no,b_no), out1.shape)

            """ if layer1 no downsample, so stride 2,1 then 1,1 """
            """else stride 2,1 then downsample then 1,1 """

            if(layer_no > 1 and b_no == 1):
                downsample_dict = self.parameters['layer%d_downsample'%layer_no]
                W_dn = np.array(downsample_dict['block_1']['conv']['weight'], dtype = np.float32)
                bn_mov_mean_dn = np.array(downsample_dict['block_1']['bn']['running_mean'], dtype = np.float32)
                bn_mov_var_dn = np.array(downsample_dict['block_1']['bn']['running_var'], dtype = np.float32)
                bn_gamma_dn = np.array(downsample_dict['block_1']['bn']['weight'], dtype = np.float32)
                bn_beta_dn = np.array(downsample_dict['block_1']['bn']['bias'], dtype = np.float32)

                # W_conv_dn = init_weights(W_dn, "downsample_%d"%(layer_no), False)
                # res = conv2d_batchnorm(res, W_conv_dn, "layer_dn_%d"%(layer_no), self.phase, bn_beta_dn, bn_gamma_dn, bn_mov_mean_dn, bn_mov_var_dn, [1,2,2,1], False)

                shapex = W_dn.shape

                W_conv_dn = weight_variable([shapex[0], shapex[1], shapex[2]/2, shapex[3]/2], "dep_downsample_%d"%(layer_no))
                res = conv2d_batchnorm_init(res, W_conv_dn, "dep_layer_dn_%d"%(layer_no), self.phase, [1,2,2,1], False)

                print("downsample_layer_%d_%d_1"%(layer_no,b_no), res.shape)

                out1 = tf.nn.relu(out1 + res)

            else:
                out1 = tf.nn.relu(out1)

            out2 = conv2d_batchnorm_init(out1, W_conv2, "dep_layer_%d_%d_2"%(layer_no,b_no), self.phase, [1,stride[1],stride[1],1], True)
            print("layer_%d_%d_2"%(layer_no,b_no), out2.shape)
            cur = out2

        return cur


================================================
FILE: code/common/resnet_rgb_model.py
================================================
import numpy as np
import tensorflow as tf
import json
from cnn_utils_res import *
import config_res

with open(config_res.paths['resnet_params_path']) as f_in:
    parameters = json.load(f_in)


class Resnet:

    def __init__(self, input_x, phase, parameters = parameters):

        self.input_x = input_x
        self.phase = phase
        self.parameters = parameters
        self.layer_zero
        self.layer
        self.Net

    def Net(self):
        layer_zero_out = self.layer_zero(self.input_x)

        current_output = layer_zero_out

        for layer_idx in range(1,5):
            layer_out = self.layer(current_output, layer_idx)
            current_output = layer_out

        return current_output

    def layer_zero(self, layer_input):

        layer_dict =self.parameters['layer0']
        bl_str = "block_1"

        W = np.array(layer_dict[bl_str]['conv1']['weight'], dtype = np.float32)
        bn_mov_mean = np.array(layer_dict[bl_str]['bn1']['running_mean'], dtype = np.float32)
        bn_mov_var = np.array(layer_dict[bl_str]['bn1']['running_var'], dtype = np.float32)
        bn_gamma = np.array(layer_dict[bl_str]['bn1']['weight'], dtype = np.float32)
        bn_beta = np.array(layer_dict[bl_str]['bn1']['bias'], dtype = np.float32)

        W_conv = init_weights(W, "_0", False)

        # with tf.name_scope('layer_0'):
        out = conv2d_batchnorm(layer_input, W_conv, "layer_0", self.phase, bn_beta, bn_gamma, bn_mov_mean, bn_mov_var, [1,2,2,1], True)
        out = tf.nn.max_pool(out, [1,3,3,1], strides=[1,2,2,1], padding="SAME")

        print('layer0', out.shape)
        return out

    def layer(self, layer_input, layer_no):
        layer_dict = self.parameters['layer%d'%layer_no]

        cur = layer_input
        res = layer_input

        for b_no in range(1,3):
            bl_str = "block_%d"%b_no

            stride = [0,0]
            if(b_no == 1):
                stride = [2,1]
            else:
                stride = [1,1]

            # for in_bno in range(1,3):

            W1 = np.array(layer_dict[bl_str]['conv1']['weight'], dtype = np.float32)
            bn_mov_mean1 = np.array(layer_dict[bl_str]['bn1']['running_mean'], dtype = np.float32)
            bn_mov_var1 = np.array(layer_dict[bl_str]['bn1']['running_var'], dtype = np.float32)
            bn_gamma1 = np.array(layer_dict[bl_str]['bn1']['weight'], dtype = np.float32)
            bn_beta1 = np.array(layer_dict[bl_str]['bn1']['bias'], dtype = np.float32)

            W2 = np.array(layer_dict[bl_str]['conv2']['weight'], dtype = np.float32)
            bn_mov_mean2 = np.array(layer_dict[bl_str]['bn2']['running_mean'], dtype = np.float32)
            bn_mov_var2 = np.array(layer_dict[bl_str]['bn2']['running_var'], dtype = np.float32)
            bn_gamma2 = np.array(layer_dict[bl_str]['bn2']['weight'], dtype = np.float32)
            bn_beta2 = np.array(layer_dict[bl_str]['bn2']['bias'], dtype = np.float32)

            W_conv1 = init_weights(W1, "_l_%d_bl_%d_no_%d"%(layer_no,b_no, 1), False)
            W_conv2 = init_weights(W2, "_l_%d_bl_%d_no_%d"%(layer_no,b_no, 2), False)

            # with tf.name_scope("layer_%d_%d_1"%(layer_no,b_no)):
            out1 = conv2d_batchnorm(cur, W_conv1, "layer_%d_%d_1"%(layer_no,b_no), self.phase, bn_beta1, bn_gamma1, bn_mov_mean1, bn_mov_var1, [1,stride[0],stride[0],1], False)

            print("layer_%d_%d_1"%(layer_no,b_no), out1.shape)

            """ if layer1 no downsample, so stride 2,1 then 1,1 """
            """else stride 2,1 then downsample then 1,1 """

            if(layer_no > 1 and b_no == 1):
                downsample_dict = self.parameters['layer%d_downsample'%layer_no]
                W_dn = np.array(downsample_dict['block_1']['conv']['weight'], dtype = np.float32)
                bn_mov_mean_dn = np.array(downsample_dict['block_1']['bn']['running_mean'], dtype = np.float32)
                bn_mov_var_dn = np.array(downsample_dict['block_1']['bn']['running_var'], dtype = np.float32)
                bn_gamma_dn = np.array(downsample_dict['block_1']['bn']['weight'], dtype = np.float32)
                bn_beta_dn = np.array(downsample_dict['block_1']['bn']['bias'], dtype = np.float32)

                W_conv_dn = init_weights(W_dn, "downsample_%d"%(layer_no), False)
                # with tf.name_scope("downsample_layer_%d_%d"%(layer_no,b_no)):
                res = conv2d_batchnorm(res, W_conv_dn, "layer_dn_%d"%(layer_no), self.phase, bn_beta_dn, bn_gamma_dn, bn_mov_mean_dn, bn_mov_var_dn, [1,2,2,1], False)

                print("downsample_layer_%d_%d_1"%(layer_no,b_no), res.shape)

                out1 = tf.nn.relu(out1 + res)

            else:
                out1 = tf.nn.relu(out1)

            # with tf.name_scope("layer_%d_%d_2"%(layer_no,b_no)):
            out2 = conv2d_batchnorm(out1, W_conv2, "layer_%d_%d_2"%(layer_no,b_no), self.phase, bn_beta2, bn_gamma2, bn_mov_mean2, bn_mov_var2, [1,stride[1],stride[1],1], True)
            print("layer_%d_%d_2"%(layer_no,b_no), out2.shape)
            cur = out2

        return cur


================================================
FILE: code/config_res.py
================================================
import numpy as np

# Important paths

# resnet_params_path: path containing .pkl file of RESNET18_BN weights
# dataset_path_full: path to parser file
# checkpoint_path: dir where checkpoints are saved and loaded from
# training_imgs_path: dir to save training progress frames (spatial transformer outputs during training)
# validation_imgs_path: dir to save validation progress frames (spatial transformer outputs during validation)
paths = dict(
	resnet_params_path = "../Extrinsic_Calibration_2/parameters.json",
	dataset_path_full = "/home/ganeshiyer/Extrinsic_Calibration_1.5/model_with_emd/rotation_results/parsed_set.txt",
	checkpoint_path = "/tmp/ganesh_saved_Weights/Checkpoint_simple_transformer",
	training_imgs_path = "/tmp/ganesh_saved_Weights/training_imgs",
	validation_imgs_path = "/tmp/ganesh_saved_Weights/validation_imgs"
)

# Depth Map parameters

# IMG_HT: input image height
# IMG_WDT: input image width
depth_img_params = dict(
	IMG_HT = 375,
	IMG_WDT = 1242
)

# Camera parameters
# The current parameters are for the training set created for the all raw sequences: http://www.cvlibs.net/datasets/kitti/raw_data.php on  2011_09_26. Camera parameters change based on the date of sequence, hence used in config

# fx: focal length x
# fy: focal length y
# cx: camera center cx
# cy: camera center cy
# cam_transform_02: Transform to color camera 02 (applied after transform)
# cam_transform_02_inv: Inverse translation of above
camera_params = dict(
	fx = 7.215377e+02,
    fy = 7.215377e+02,
    cx = 6.095593e+02,
    cy = 1.728540e+02,

    cam_transform_02 =  np.array([1.0, 0.0, 0.0, (-4.485728e+01)/7.215377e+02,
	                              0.0, 1.0, 0.0, (-2.163791e-01)/7.215377e+02,
	                              0.0, 0.0, 1.0, -2.745884e-03,
	                              0.0, 0.0, 0.0, 1.0]).reshape(4,4),

	cam_transform_02_inv =  np.array([1.0, 0.0, 0.0, (4.485728e+01)/7.215377e+02,
	                                  0.0, 1.0, 0.0, (2.163791e-01)/7.215377e+02,
	                                  0.0, 0.0, 1.0, 2.745884e-03,
	                                  0.0, 0.0, 0.0, 1.0]).reshape(4,4)
)


# Network and Training Parameters

# batch_size: batch_size taken during training
# total_frames: total instances (check parsed_set.txt for total number of lines)
# total_frames_train: total training instances
# total_frames_validation: total validation instances
# partition_limit: partition size of the total dataset loaded into memory during training
# epochs: total number of epochs
# learning_rate
# beta1: momentum term for Adam Optimizer
# load_epoch: Load checkpoint no. 0 at the start of training (can be changed to resume training)
net_params = dict(
	batch_size = 20,
	total_frames = 30000,
	total_frames_train = 24000,
	total_frames_validation = 6000,
	partition_limit = 1200,
	epochs = 40,
	learning_rate = 5e-4,
	beta1 = 0.9,
	load_epoch = 0
	)


================================================
FILE: code/dataset_files/dataset_build_color.py
================================================
"""

For sequence: 2011_09_26

"""

import numpy as np
import matplotlib.pyplot as plt
import os
import glob
import argparse
from natsort import natsorted as ns
from skimage import io

import scipy.misc as smc
plt.ion()


IMG_HT = 375
IMG_WDT = 1242

fx = 7.215377e+02
fy = 7.215377e+02
cx = 6.095593e+02
cy = 1.728540e+02

K = np.array([7.215377e+02, 0.000000e+00, 6.095593e+02, 0.000000e+00, 7.215377e+02, 1.728540e+02, 0.000000e+00, 0.000000e+00, 1.000000e+00]).reshape(3,3)

velo_to_cam_R = np.array([7.533745e-03, -9.999714e-01, -6.166020e-04, 1.480249e-02, 7.280733e-04, -9.998902e-01, 9.998621e-01, 7.523790e-03, 1.480755e-02]).reshape(3,3)
velo_to_cam_T = np.array([-4.069766e-03, -7.631618e-02, -2.717806e-01]).reshape(3,1)

velo_to_cam = np.vstack((np.hstack((velo_to_cam_R, velo_to_cam_T)), np.array([[0,0,0,1]])))

R_rect_00 =  np.array([9.999239e-01, 9.837760e-03, -7.445048e-03, 0.0,
                      -9.869795e-03, 9.999421e-01, -4.278459e-03, 0.0,
                       7.402527e-03, 4.351614e-03, 9.999631e-01,  0.0,
                       0.0,          0.0,          0.0,           1.0]).reshape(4,4)

cam_02_transform = np.array([1.0, 0.0, 0.0, 4.485728e+01/fx,
                             0.0, 1.0, 0.0, 2.163791e-01/fy,
                             0.0, 0.0, 1.0, 2.745884e-03,
                             0.0, 0.0, 0.0, 1.0]).reshape(4,4)


parser = argparse.ArgumentParser(description="Create Lidar Dataset")
parser.add_argument("path", help = "path_to_folder, end with number", type = str)
args = parser.parse_args()

#
# main_path = "/tmp/lidar_calibration_dataset/2011_09_26/2011_09_26_drive_0104"

main_path = args.path

def timestamp_sync(path):
    txt1 = np.loadtxt(path + "_extract/velodyne_points/timestamps.txt", dtype = str)
    txt2 = np.loadtxt(path + "_sync/velodyne_points/timestamps.txt", dtype = str)
    file_list = ns(glob.glob(path + "_extract/velodyne_points/data/*.txt"))

    times1 = txt1[:,1]
    times2 = txt2[:,1]

    for idx in range(times1.shape[0]):
        times1[idx] = times1[idx].split(":")[2]

    for idx in range(times2.shape[0]):
        times2[idx] = times2[idx].split(":")[2]

    start_pt = times2[0]
    end_pt = times2[-1]

    index_start = np.where(times1 == start_pt)[0][0]
    index_end = np.where(times1 == end_pt)[0][0]

    return file_list[index_start:index_end+1]

if not os.path.exists(main_path + "_sync/depth_maps"):
    os.makedirs(main_path + "_sync/depth_maps")

if not os.path.exists(main_path + "_sync/target_imgs"):
    os.makedirs(main_path + "_sync/target_imgs")

if not os.path.exists(main_path + "_sync/depth_maps_transformed"):
    os.makedirs(main_path + "_sync/depth_maps_transformed")

depth_maps_folder = main_path + "_sync/depth_maps"
target_img_folder = main_path + "_sync/target_imgs"
depth_maps_transformed_folder = main_path + "_sync/depth_maps_transformed"

point_files = timestamp_sync(main_path)
imgs_files = ns(glob.glob(main_path + "_sync/image_02/data/*.png"))

angle_limit = 0.34722965035593395/1.25
tr_limit = 0.34722965035593395/1.25

angle_list = np.zeros((1,16), dtype = np.float32)

for img_name, cloud_name in zip(imgs_files, point_files):

    print(img_name, cloud_name)

    omega_x = angle_limit*np.random.random_sample() - (angle_limit/2.0)
    omega_y = angle_limit*np.random.random_sample() - (angle_limit/2.0)
    omega_z = angle_limit*np.random.random_sample() - (angle_limit/2.0)
    tr_x = tr_limit*np.random.random_sample() - (tr_limit/2.0)
    tr_y = tr_limit*np.random.random_sample() - (tr_limit/2.0)
    tr_z = tr_limit*np.random.random_sample() - (tr_limit/2.0)

    theta = np.sqrt(omega_x**2 + omega_y**2 + omega_z**2)
    omega_cross = np.array([0.0, -omega_z, omega_y, omega_z, 0.0, -omega_x, -omega_y, omega_x, 0.0]).reshape(3,3)

    A = np.sin(theta)/theta
    B = (1.0 - np.cos(theta))/(theta**2)

    R = np.eye(3,3) + A*omega_cross + B*np.matmul(omega_cross, omega_cross)

    T = np.array([tr_x, tr_y, tr_z]).reshape(3,1)

    random_transform = np.vstack((np.hstack((R, T)), np.array([[0.0, 0.0, 0.0, 1.0]])))

    to_write_tr = np.expand_dims(np.ndarray.flatten(random_transform), 0)
    angle_list = np.vstack((angle_list, to_write_tr))

    points = np.loadtxt(cloud_name)
    points = points[:,:3]
    ones_col = np.ones(shape=(points.shape[0],1))
    points = np.hstack((points,ones_col))
    current_img = smc.imread(img_name)
    # current_img_color = smc.imread(color_name)

    img = smc.imread(img_name)
    img_ht = img.shape[0]
    img_wdt = img.shape[1]

    points_in_cam_axis = np.matmul(R_rect_00, (np.matmul(velo_to_cam, points.T)))
    transformed_points = np.matmul(random_transform, points_in_cam_axis)
    # transformed_points = transformed_points[:-1,:]

    points_2d = np.matmul(K, np.matmul(cam_02_transform, transformed_points)[:-1,:])

    # points_2d = np.matmul(K, points_in_cam_axis[:-1,:])

    Z = points_2d[2,:]
    x = (points_2d[0,:]/Z).T
    y = (points_2d[1,:]/Z).T

    x = np.clip(x, 0.0, img_wdt - 1)
    y = np.clip(y, 0.0, img_ht - 1)

    reprojected_img = np.zeros_like(img)
    for x_idx, y_idx,z_idx in zip(x,y,Z):
        if(z_idx>0):
            reprojected_img[int(y_idx), int(x_idx)] = z_idx

    smc.imsave(depth_maps_transformed_folder + "/" + img_name[-14:], reprojected_img)

    points_2d = np.matmul(K, np.matmul(cam_02_transform, points_in_cam_axis)[:-1,:])

    Z = points_2d[2,:]
    x = (points_2d[0,:]/Z).T
    y = (points_2d[1,:]/Z).T

    x = np.clip(x, 0.0, img_wdt - 1)
    y = np.clip(y, 0.0, img_ht - 1)

    reprojected_img = np.zeros_like(img)
    for x_idx, y_idx,z_idx in zip(x,y,Z):
        if(z_idx>0):
            reprojected_img[int(y_idx), int(x_idx)] = z_idx
    pooled_img = reprojected_img

    print(img_name[-14:])

    reconstructed_img = current_img*(pooled_img>0.)
    smc.imsave(depth_maps_folder + "/" + img_name[-14:], pooled_img)
    smc.imsave(target_img_folder + "/" + img_name[-14:], reconstructed_img)

np.savetxt(depth_maps_transformed_folder + "/../angle_list.txt", angle_list[1:], fmt = "%.4f")


================================================
FILE: code/dataset_files/dataset_build_color_2.py
================================================
"""

For sequence: 2011_09_26

"""

import numpy as np
import matplotlib.pyplot as plt
import os
import glob
import argparse
from natsort import natsorted as ns
from skimage import io

import scipy.misc as smc
plt.ion()


IMG_HT = 375
IMG_WDT = 1242

fx = 7.215377e+02
fy = 7.215377e+02
cx = 6.095593e+02
cy = 1.728540e+02

K = np.array([7.215377e+02, 0.000000e+00, 6.095593e+02, 0.000000e+00, 7.215377e+02, 1.728540e+02, 0.000000e+00, 0.000000e+00, 1.000000e+00]).reshape(3,3)

velo_to_cam_R = np.array([7.533745e-03, -9.999714e-01, -6.166020e-04, 1.480249e-02, 7.280733e-04, -9.998902e-01, 9.998621e-01, 7.523790e-03, 1.480755e-02]).reshape(3,3)
velo_to_cam_T = np.array([-4.069766e-03, -7.631618e-02, -2.717806e-01]).reshape(3,1)

velo_to_cam = np.vstack((np.hstack((velo_to_cam_R, velo_to_cam_T)), np.array([[0,0,0,1]])))

R_rect_00 =  np.array([9.999239e-01, 9.837760e-03, -7.445048e-03, 0.0,
                      -9.869795e-03, 9.999421e-01, -4.278459e-03, 0.0,
                       7.402527e-03, 4.351614e-03, 9.999631e-01,  0.0,
                       0.0,          0.0,          0.0,           1.0]).reshape(4,4)

cam_02_transform = np.array([1.0, 0.0, 0.0, 4.485728e+01/fx,
                             0.0, 1.0, 0.0, 2.163791e-01/fy,
                             0.0, 0.0, 1.0, 2.745884e-03,
                             0.0, 0.0, 0.0, 1.0]).reshape(4,4)


parser = argparse.ArgumentParser(description="Create Lidar Dataset")
parser.add_argument("path", help = "path_to_folder, end with number", type = str)
args = parser.parse_args()

#
# main_path = "/tmp/lidar_calibration_dataset/2011_09_26/2011_09_26_drive_0104"

main_path = args.path

def timestamp_sync(path):
    txt1 = np.loadtxt(path + "_extract/velodyne_points/timestamps.txt", dtype = str)
    txt2 = np.loadtxt(path + "_sync/velodyne_points/timestamps.txt", dtype = str)
    file_list = ns(glob.glob(path + "_extract/velodyne_points/data/*.txt"))

    times1 = txt1[:,1]
    times2 = txt2[:,1]

    for idx in range(times1.shape[0]):
        times1[idx] = times1[idx].split(":")[2]

    for idx in range(times2.shape[0]):
        times2[idx] = times2[idx].split(":")[2]

    start_pt = times2[0]
    end_pt = times2[-1]

    index_start = np.where(times1 == start_pt)[0][0]
    index_end = np.where(times1 == end_pt)[0][0]

    return file_list[index_start:index_end+1]

if not os.path.exists(main_path + "_sync/depth_maps_2"):
    os.makedirs(main_path + "_sync/depth_maps_2")

if not os.path.exists(main_path + "_sync/target_imgs_2"):
    os.makedirs(main_path + "_sync/target_imgs_2")

if not os.path.exists(main_path + "_sync/depth_maps_transformed_2"):
    os.makedirs(main_path + "_sync/depth_maps_transformed_2")

depth_maps_folder = main_path + "_sync/depth_maps_2"
target_img_folder = main_path + "_sync/target_imgs_2"
depth_maps_transformed_folder = main_path + "_sync/depth_maps_transformed_2"

point_files = timestamp_sync(main_path)
imgs_files = ns(glob.glob(main_path + "_sync/image_02/data/*.png"))

angle_limit = 0.34722965035593395/2.50
tr_limit = 0.34722965035593395*1.5

angle_list = np.zeros((1,16), dtype = np.float32)

for img_name, cloud_name in zip(imgs_files, point_files):

    print(img_name, cloud_name)

    omega_x = angle_limit*np.random.random_sample() - (angle_limit/2.0)
    omega_y = angle_limit*np.random.random_sample() - (angle_limit/2.0)
    omega_z = angle_limit*np.random.random_sample() - (angle_limit/2.0)
    tr_x = tr_limit*np.random.random_sample() - (tr_limit/2.0)
    tr_y = tr_limit*np.random.random_sample() - (tr_limit/2.0)
    tr_z = tr_limit*np.random.random_sample() - (tr_limit/2.0)

    theta = np.sqrt(omega_x**2 + omega_y**2 + omega_z**2)
    omega_cross = np.array([0.0, -omega_z, omega_y, omega_z, 0.0, -omega_x, -omega_y, omega_x, 0.0]).reshape(3,3)

    A = np.sin(theta)/theta
    B = (1.0 - np.cos(theta))/(theta**2)

    R = np.eye(3,3) + A*omega_cross + B*np.matmul(omega_cross, omega_cross)

    T = np.array([tr_x, tr_y, tr_z]).reshape(3,1)

    random_transform = np.vstack((np.hstack((R, T)), np.array([[0.0, 0.0, 0.0, 1.0]])))

    to_write_tr = np.expand_dims(np.ndarray.flatten(random_transform), 0)
    angle_list = np.vstack((angle_list, to_write_tr))

    points = np.loadtxt(cloud_name)
    points = points[:,:3]
    ones_col = np.ones(shape=(points.shape[0],1))
    points = np.hstack((points,ones_col))
    current_img = smc.imread(img_name)
    # current_img_color = smc.imread(color_name)

    img = smc.imread(img_name)
    img_ht = img.shape[0]
    img_wdt = img.shape[1]

    points_in_cam_axis = np.matmul(R_rect_00, (np.matmul(velo_to_cam, points.T)))
    transformed_points = np.matmul(random_transform, points_in_cam_axis)
    # transformed_points = transformed_points[:-1,:]

    points_2d = np.matmul(K, np.matmul(cam_02_transform, transformed_points)[:-1,:])

    # points_2d = np.matmul(K, points_in_cam_axis[:-1,:])

    Z = points_2d[2,:]
    x = (points_2d[0,:]/Z).T
    y = (points_2d[1,:]/Z).T

    x = np.clip(x, 0.0, img_wdt - 1)
    y = np.clip(y, 0.0, img_ht - 1)

    reprojected_img = np.zeros_like(img)
    for x_idx, y_idx,z_idx in zip(x,y,Z):
        if(z_idx>0):
            reprojected_img[int(y_idx), int(x_idx)] = z_idx

    smc.imsave(depth_maps_transformed_folder + "/" + img_name[-14:], reprojected_img)

    points_2d = np.matmul(K, np.matmul(cam_02_transform, points_in_cam_axis)[:-1,:])

    Z = points_2d[2,:]
    x = (points_2d[0,:]/Z).T
    y = (points_2d[1,:]/Z).T

    x = np.clip(x, 0.0, img_wdt - 1)
    y = np.clip(y, 0.0, img_ht - 1)

    reprojected_img = np.zeros_like(img)
    for x_idx, y_idx,z_idx in zip(x,y,Z):
        if(z_idx>0):
            reprojected_img[int(y_idx), int(x_idx)] = z_idx
    pooled_img = reprojected_img

    print(img_name[-14:])
    print(np.max(pooled_img), np.min(pooled_img))

    reconstructed_img = current_img*(pooled_img>0.)
    smc.imsave(depth_maps_folder + "/" + img_name[-14:], pooled_img)
    smc.imsave(target_img_folder + "/" + img_name[-14:], reconstructed_img)

np.savetxt(depth_maps_transformed_folder + "/../angle_list_2.txt", angle_list[1:], fmt = "%.4f")


================================================
FILE: code/dataset_files/dataset_builder_parallel.sh
================================================
SCRIPTPATH="$( cd "$(dirname "$0")" ; pwd -P )"

array=( 0001 0002 0005 0009 0011 0013 0014 0015 0017 0018 0019 0020 0022 0023 0027 0028 0029 0032 0035 0036 0039 0046 0048 0051 0052 0056 0057 0059 0060 0061 0064 0070 0079 0084 0086 0087 0091 0093 0095 0096 0101 0104 0106 0113 0117 )

task()
{
    wget -k -c https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_09_26_drive_$1/2011_09_26_drive_$1_extract.zip
    wget -k -c https://s3.eu-central-1.amazonaws.com/avg-kitti/raw_data/2011_09_26_drive_$1/2011_09_26_drive_$1_sync.zip
    echo "Downloading $1"

    unzip 2011_09_26_drive_$1_extract.zip
    unzip 2011_09_26_drive_$1_sync.zip
    echo "Extracting $1"

    python -B $SCRIPTPATH/dataset_build_color.py $PWD/2011_09_26/2011_09_26_drive_$1
    python -B $SCRIPTPATH/dataset_build_color_2.py $PWD/2011_09_26/2011_09_26_drive_$1
    echo "Perturbing Depth Maps for ${array[$id]}"
}


N=4
(
for id in {0..44}
do
   ((i=i%N)); ((i++==0)) && wait
   task ${array[$id]} &
done
)

wait
echo "All processes done!"


================================================
FILE: code/dataset_files/parser.py
================================================
import numpy as np
import scipy.misc as smc
from natsort import natsorted as ns
import glob, os
import argparse

parser = argparse.ArgumentParser(description="Create Lidar Dataset Parser file")
parser.add_argument("path", help = "path_to_folder", type = str)
args = parser.parse_args()

dataset_path = args.path

#Picking up all sync folders
folder_names = ns(glob.glob(dataset_path +"*_sync" + os.path.sep))

dataset_array = np.zeros(dtype = str, shape = (1,20))
dataset_array_2 = np.zeros(dtype = str, shape = (1,20))

for fn in folder_names:
    print fn
    file_names_source = ns(glob.glob(fn + "depth_maps_transformed/*.png"))
    file_names_target = ns(glob.glob(fn + "depth_maps/*.png"))
    img_source = ns(glob.glob(fn + "image_02/data/*.png"))
    img_target = ns(glob.glob(fn + "image_03/data/*.png"))
    transforms_list = np.loadtxt(fn + "angle_list.txt", dtype = str)

    file_names_source = np.array(file_names_source, dtype=str).reshape(-1,1)
    file_names_target = np.array(file_names_target, dtype=str).reshape(-1,1)
    img_source = np.array(img_source, dtype=str).reshape(-1,1)
    img_target = np.array(img_target, dtype=str).reshape(-1,1)

    dataset = np.hstack((file_names_source, file_names_target, img_source, img_target, transforms_list))
    print(dataset.shape)

    dataset_array = np.vstack((dataset_array, dataset))

    #######################################################################################

    file_names_source_2 = ns(glob.glob(fn + "depth_maps_transformed_2/*.png"))
    file_names_target_2 = ns(glob.glob(fn + "depth_maps_2/*.png"))

    transforms_list_2 = np.loadtxt(fn + "angle_list_2.txt", dtype = str)

    file_names_source_2 = np.array(file_names_source_2, dtype=str).reshape(-1,1)
    file_names_target_2 = np.array(file_names_target_2, dtype=str).reshape(-1,1)

    dataset_2 = np.hstack((file_names_source_2, file_names_target_2, img_source, img_target, transforms_list_2))
    print(dataset_2.shape)

    dataset_array_2 = np.vstack((dataset_array_2, dataset_2))




dataset_array = dataset_array[1:]
dataset_array_2 = dataset_array_2[1:]

final_array = np.vstack((dataset_array, dataset_array_2))

np.random.shuffle(final_array)
np.savetxt("parsed_set.txt", final_array, fmt = "%s", delimiter=' ')


================================================
FILE: code/model_utils.py
================================================
import os
import tensorflow as tf
from tf_ops.emd import tf_auctionmatch
from tf_ops.CD import tf_nndistance
from tf_ops.sampling import tf_sampling
from tf_ops.grouping.tf_grouping import query_ball_point, group_point

def pre_load_checkpoint(checkpoint_dir):
    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
    if ckpt and ckpt.model_checkpoint_path:
        # print(" [*] Reading checkpoint from {}".format(ckpt.model_checkpoint_path))
        epoch_step = int(os.path.basename(ckpt.model_checkpoint_path).split('-')[1])
        return epoch_step,ckpt.model_checkpoint_path
    else:
        return 0,None


def get_repulsion_loss4(pred, nsample=20, radius=0.07):
    # pred: (batch_size, npoint,3)
    idx, pts_cnt = query_ball_point(radius, nsample, pred, pred)
    tf.summary.histogram('smooth/unque_index', pts_cnt)

    grouped_pred = group_point(pred, idx)  # (batch_size, npoint, nsample, 3)
    grouped_pred -= tf.expand_dims(pred, 2)

    ##get the uniform loss
    h = 0.03
    dist_square = tf.reduce_sum(grouped_pred ** 2, axis=-1)
    dist_square, idx = tf.nn.top_k(-dist_square, 5)
    dist_square = -dist_square[:, :, 1:]  # remove the first one
    dist_square = tf.maximum(1e-12,dist_square)
    dist = tf.sqrt(dist_square)
    weight = tf.exp(-dist_square/h**2)
    uniform_loss = tf.reduce_mean(radius-dist*weight)
    return uniform_loss


def get_emd_loss(pred, gt):
    """ pred: BxNxC,
        label: BxN, """
    batch_size = pred.get_shape()[0].value
    matchl_out, matchr_out = tf_auctionmatch.auction_match(pred, gt)
    matched_out = tf_sampling.gather_point(gt, matchl_out)
    dist = tf.reshape((pred - matched_out) ** 2, shape=(batch_size, -1))
    emd_loss = tf.reduce_sum(dist)
    return emd_loss

def get_cd_loss(pred, gt):
    """ pred: BxNxC,
        label: BxN, """
    dists_forward, _, dists_backward, _ = tf_nndistance.nn_distance(gt, pred)

    #dists_forward is for each element in gt, the cloest distance to this element
    CD_dist = tf.reduce_sum(dists_forward) + tf.reduce_sum(dists_backward)
    return CD_dist


if __name__ == '__main__':
    gt = tf.constant([[[1,0,0],[2,0,0],[3,0,0],[4,0,0]]],tf.float32)
    pred = tf.constant([[[-10,0,0], [1,0, 0], [2,0, 0], [3,0,0]]],tf.float32)

    dists_forward, idx1, dists_backward, idx2 = tf_nndistance.nn_distance(gt, pred)
    with tf.Session() as sess:
        print idx1.eval() # for each element in gt, the idx of pred
        print idx2.eval() # for each element in pred,
        print dists_forward.eval()
        print dists_backward.eval()


================================================
FILE: code/nw_loader_color.py
================================================
import numpy as np
import glob, os, argparse
import scipy.misc as smc
from tqdm import tqdm
import matplotlib.pyplot as plt

import config_res as config

total = config.net_params['total_frames']
total_train = config.net_params['total_frames_train']
total_validation = config.net_params['total_frames_validation']
partition_limit = config.net_params['partition_limit']

IMG_HT = config.depth_img_params['IMG_HT']
IMG_WDT = config.depth_img_params['IMG_WDT']
batch_size = config.net_params['batch_size']

dataset = np.loadtxt(config.paths['dataset_path_full'], dtype = str)

dataset_train = dataset[:total_train]
dataset_validation = dataset[total_train:total]

def shuffle():
    np.random.shuffle(dataset_train)
    np.random.shuffle(dataset_validation)

def load(p_no, mode):

    if(mode == "train"):
        dataset_part = dataset_train[p_no*partition_limit:(p_no + 1)*partition_limit]
    elif(mode == "validation"):
        dataset_part = dataset_validation[p_no*partition_limit:(p_no + 1)*partition_limit]
    source_file_names = dataset_part[:,0]
    target_file_names = dataset_part[:,1]
    source_image_names = dataset_part[:,2]
    target_image_names = dataset_part[:,3]
    transforms = np.float32(dataset_part[:,4:])

    target_container = np.zeros((partition_limit, IMG_HT, IMG_WDT, 1), dtype = np.float32)
    source_container = np.zeros((partition_limit, IMG_HT, IMG_WDT, 1), dtype = np.float32)
    source_img_container = np.zeros((partition_limit, IMG_HT, IMG_WDT, 3), dtype = np.float32)
    target_img_container = np.zeros((partition_limit, IMG_HT, IMG_WDT, 3), dtype = np.float32)
    transforms_container = np.zeros((partition_limit, 4, 4), dtype = np.float32)

    c_idx = 0
    for s_name, t_name, img_source_name, img_target_name, transform in tqdm(zip(source_file_names, target_file_names, source_image_names, target_image_names, transforms)):

        warped_ip = np.float32(smc.imread(s_name, True))
        warped_ip[0:5,:] = 0.0 ; warped_ip[:,0:5] = 0.0 ; warped_ip[IMG_HT - 5:,:] = 0.0 ; warped_ip[:,IMG_WDT-5:] = 0.0 ;
        warped_ip = (warped_ip - 40.0)/40.0
        source_container[c_idx, :, :, 0] = warped_ip

        target_ip = np.float32(smc.imread(t_name, True))
        target_ip[0:5,:] = 0.0 ; target_ip[:,0:5] = 0.0 ; target_ip[IMG_HT - 5:,:] = 0.0 ; target_ip[:,IMG_WDT-5:] = 0.0 ;
        target_ip = (target_ip - 40.0)/40.0
        target_container[c_idx, :, :, 0] = target_ip

        source_img = np.float32(smc.imread(img_source_name))
        source_img[0:5,:,:] = 0.0 ; source_img[:,0:5,:] = 0.0 ; source_img[IMG_HT - 5:,:,:] = 0.0 ; source_img[:,IMG_WDT-5:,:] = 0.0 ;
        source_img = (source_img - 127.5)/127.5
        source_img_container[c_idx, :, :, :] = source_img

        target_img = np.float32(smc.imread(img_target_name))
        target_img[0:5,:,:] = 0.0 ; target_img[:,0:5,:] = 0.0 ; target_img[IMG_HT - 5:,:,:] = 0.0 ; target_img[:,IMG_WDT-5:,:] = 0.0 ;
        target_img = (target_img - 127.5)/127.5
        target_img_container[c_idx, :, :, :] = target_img

        transforms_container[c_idx, :, :] = np.linalg.inv(transform.reshape(4,4))
        c_idx+=1

    source_container = source_container.reshape(partition_limit/batch_size, batch_size, IMG_HT, IMG_WDT , 1)
    target_container = target_container.reshape(partition_limit/batch_size, batch_size, IMG_HT, IMG_WDT , 1)
    source_img_container = source_img_container.reshape(partition_limit/batch_size, batch_size, IMG_HT, IMG_WDT, 3)
    target_img_container = target_img_container.reshape(partition_limit/batch_size, batch_size, IMG_HT, IMG_WDT, 3)
    transforms_container = transforms_container.reshape(partition_limit/batch_size, batch_size, 4, 4)

    return source_container, target_container, source_img_container, target_img_container, transforms_container


================================================
FILE: code/tf_ops/CD/__init__.py
================================================


================================================
FILE: code/tf_ops/CD/makefile
================================================
nvcc = /usr/local/cuda-8.0/bin/nvcc
cudalib = /usr/local/cuda-8.0/lib64/
tensorflow = /home/ganeshiyer/tensorflow/lib/python2.7/site-packages/tensorflow/include

all: tf_nndistance_so.so render_balls_so.so
.PHONY : all

tf_nndistance_so.so: tf_nndistance_g.cu.o tf_nndistance.cpp
	g++ -std=c++11 tf_nndistance.cpp tf_nndistance_g.cu.o -o tf_nndistance_so.so -shared -fPIC -I $(tensorflow) -lcudart -L $(cudalib) -O2 -D_GLIBCXX_USE_CXX11_ABI=0

tf_nndistance_g.cu.o: tf_nndistance_g.cu
	$(nvcc) -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11 -c -o tf_nndistance_g.cu.o tf_nndistance_g.cu -I $(tensorflow) -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -O2

render_balls_so.so: render_balls_so.cpp
	g++ -std=c++11 render_balls_so.cpp -o render_balls_so.so -shared -fPIC -O2 -D_GLIBCXX_USE_CXX11_ABI=0




================================================
FILE: code/tf_ops/CD/render_balls_so.cpp
================================================
#include <cstdio>
#include <vector>
#include <algorithm>
#include <math.h>
using namespace std;

struct PointInfo{
	int x,y,z;
	float r,g,b;
};

extern "C"{

void render_ball(int h,int w,unsigned char * show,int n,int * xyzs,float * c0,float * c1,float * c2,int r){
	r=max(r,1);
	vector<int> depth(h*w,-2100000000);
	vector<PointInfo> pattern;
	for (int dx=-r;dx<=r;dx++)
		for (int dy=-r;dy<=r;dy++)
			if (dx*dx+dy*dy<r*r){
				double dz=sqrt(double(r*r-dx*dx-dy*dy));
				PointInfo pinfo;
				pinfo.x=dx;
				pinfo.y=dy;
				pinfo.z=dz;
				pinfo.r=dz/r;
				pinfo.g=dz/r;
				pinfo.b=dz/r;
				pattern.push_back(pinfo);
			}
	double zmin=0,zmax=0;
	for (int i=0;i<n;i++){
		if (i==0){
			zmin=xyzs[i*3+2]-r;
			zmax=xyzs[i*3+2]+r;
		}else{
			zmin=min(zmin,double(xyzs[i*3+2]-r));
			zmax=max(zmax,double(xyzs[i*3+2]+r));
		}
	}
	for (int i=0;i<n;i++){
		int x=xyzs[i*3+0],y=xyzs[i*3+1],z=xyzs[i*3+2];
		for (int j=0;j<int(pattern.size());j++){
			int x2=x+pattern[j].x;
			int y2=y+pattern[j].y;
			int z2=z+pattern[j].z;
			if (!(x2<0 || x2>=h || y2<0 || y2>=w) && depth[x2*w+y2]<z2){
				depth[x2*w+y2]=z2;
				double intensity=min(1.0,(z2-zmin)/(zmax-zmin)*0.7+0.3);
				show[(x2*w+y2)*3+0]=pattern[j].b*c2[i]*intensity;
				show[(x2*w+y2)*3+1]=pattern[j].g*c0[i]*intensity;
				show[(x2*w+y2)*3+2]=pattern[j].r*c1[i]*intensity;
			}
		}
	}
}

}//extern "C"


================================================
FILE: code/tf_ops/CD/tf_nndistance.cpp
================================================
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
REGISTER_OP("NnDistance")
	.Input("xyz1: float32")
	.Input("xyz2: float32")
	.Output("dist1: float32")
	.Output("idx1: int32")
	.Output("dist2: float32")
	.Output("idx2: int32");
REGISTER_OP("NnDistanceGrad")
	.Input("xyz1: float32")
	.Input("xyz2: float32")
	.Input("grad_dist1: float32")
	.Input("idx1: int32")
	.Input("grad_dist2: float32")
	.Input("idx2: int32")
	.Output("grad_xyz1: float32")
	.Output("grad_xyz2: float32");
using namespace tensorflow;

static void nnsearch(int b,int n,int m,const float * xyz1,const float * xyz2,float * dist,int * idx){
	for (int i=0;i<b;i++){
		for (int j=0;j<n;j++){
			float x1=xyz1[(i*n+j)*3+0];
			float y1=xyz1[(i*n+j)*3+1];
			float z1=xyz1[(i*n+j)*3+2];
			double best=0;
			int besti=0;
			for (int k=0;k<m;k++){
				float x2=xyz2[(i*m+k)*3+0]-x1;
				float y2=xyz2[(i*m+k)*3+1]-y1;
				float z2=xyz2[(i*m+k)*3+2]-z1;
				double d=x2*x2+y2*y2+z2*z2;
				if (k==0 || d<best){
					best=d;
					besti=k;
				}
			}
			dist[i*n+j]=best;
			idx[i*n+j]=besti;
		}
	}
}

class NnDistanceOp : public OpKernel{
	public:
		explicit NnDistanceOp(OpKernelConstruction* context):OpKernel(context){}
		void Compute(OpKernelContext * context)override{
			const Tensor& xyz1_tensor=context->input(0);
			const Tensor& xyz2_tensor=context->input(1);
			OP_REQUIRES(context,xyz1_tensor.dims()==3,errors::InvalidArgument("NnDistance requires xyz1 be of shape (batch,#points,3)"));
			OP_REQUIRES(context,xyz1_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistance only accepts 3d point set xyz1"));
			int b=xyz1_tensor.shape().dim_size(0);
			int n=xyz1_tensor.shape().dim_size(1);
			OP_REQUIRES(context,xyz2_tensor.dims()==3,errors::InvalidArgument("NnDistance requires xyz2 be of shape (batch,#points,3)"));
			OP_REQUIRES(context,xyz2_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistance only accepts 3d point set xyz2"));
			int m=xyz2_tensor.shape().dim_size(1);
			OP_REQUIRES(context,xyz2_tensor.shape().dim_size(0)==b,errors::InvalidArgument("NnDistance expects xyz1 and xyz2 have same batch size"));
			auto xyz1_flat=xyz1_tensor.flat<float>();
			const float * xyz1=&xyz1_flat(0);
			auto xyz2_flat=xyz2_tensor.flat<float>();
			const float * xyz2=&xyz2_flat(0);
			Tensor * dist1_tensor=NULL;
			Tensor * idx1_tensor=NULL;
			Tensor * dist2_tensor=NULL;
			Tensor * idx2_tensor=NULL;
			OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n},&dist1_tensor));
			OP_REQUIRES_OK(context,context->allocate_output(1,TensorShape{b,n},&idx1_tensor));
			auto dist1_flat=dist1_tensor->flat<float>();
			auto idx1_flat=idx1_tensor->flat<int>();
			OP_REQUIRES_OK(context,context->allocate_output(2,TensorShape{b,m},&dist2_tensor));
			OP_REQUIRES_OK(context,context->allocate_output(3,TensorShape{b,m},&idx2_tensor));
			auto dist2_flat=dist2_tensor->flat<float>();
			auto idx2_flat=idx2_tensor->flat<int>();
			float * dist1=&(dist1_flat(0));
			int * idx1=&(idx1_flat(0));
			float * dist2=&(dist2_flat(0));
			int * idx2=&(idx2_flat(0));
			nnsearch(b,n,m,xyz1,xyz2,dist1,idx1);
			nnsearch(b,m,n,xyz2,xyz1,dist2,idx2);
		}
};
REGISTER_KERNEL_BUILDER(Name("NnDistance").Device(DEVICE_CPU), NnDistanceOp);
class NnDistanceGradOp : public OpKernel{
	public:
		explicit NnDistanceGradOp(OpKernelConstruction* context):OpKernel(context){}
		void Compute(OpKernelContext * context)override{
			const Tensor& xyz1_tensor=context->input(0);
			const Tensor& xyz2_tensor=context->input(1);
			const Tensor& grad_dist1_tensor=context->input(2);
			const Tensor& idx1_tensor=context->input(3);
			const Tensor& grad_dist2_tensor=context->input(4);
			const Tensor& idx2_tensor=context->input(5);
			OP_REQUIRES(context,xyz1_tensor.dims()==3,errors::InvalidArgument("NnDistanceGrad requires xyz1 be of shape (batch,#points,3)"));
			OP_REQUIRES(context,xyz1_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistanceGrad only accepts 3d point set xyz1"));
			int b=xyz1_tensor.shape().dim_size(0);
			int n=xyz1_tensor.shape().dim_size(1);
			OP_REQUIRES(context,xyz2_tensor.dims()==3,errors::InvalidArgument("NnDistanceGrad requires xyz2 be of shape (batch,#points,3)"));
			OP_REQUIRES(context,xyz2_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistanceGrad only accepts 3d point set xyz2"));
			int m=xyz2_tensor.shape().dim_size(1);
			OP_REQUIRES(context,xyz2_tensor.shape().dim_size(0)==b,errors::InvalidArgument("NnDistanceGrad expects xyz1 and xyz2 have same batch size"));
			OP_REQUIRES(context,grad_dist1_tensor.shape()==(TensorShape{b,n}),errors::InvalidArgument("NnDistanceGrad requires grad_dist1 be of shape(batch,#points)"));
			OP_REQUIRES(context,idx1_tensor.shape()==(TensorShape{b,n}),errors::InvalidArgument("NnDistanceGrad requires idx1 be of shape(batch,#points)"));
			OP_REQUIRES(context,grad_dist2_tensor.shape()==(TensorShape{b,m}),errors::InvalidArgument("NnDistanceGrad requires grad_dist2 be of shape(batch,#points)"));
			OP_REQUIRES(context,idx2_tensor.shape()==(TensorShape{b,m}),errors::InvalidArgument("NnDistanceGrad requires idx2 be of shape(batch,#points)"));
			auto xyz1_flat=xyz1_tensor.flat<float>();
			const float * xyz1=&xyz1_flat(0);
			auto xyz2_flat=xyz2_tensor.flat<float>();
			const float * xyz2=&xyz2_flat(0);
			auto idx1_flat=idx1_tensor.flat<int>();
			const int * idx1=&idx1_flat(0);
			auto idx2_flat=idx2_tensor.flat<int>();
			const int * idx2=&idx2_flat(0);
			auto grad_dist1_flat=grad_dist1_tensor.flat<float>();
			const float * grad_dist1=&grad_dist1_flat(0);
			auto grad_dist2_flat=grad_dist2_tensor.flat<float>();
			const float * grad_dist2=&grad_dist2_flat(0);
			Tensor * grad_xyz1_tensor=NULL;
			OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n,3},&grad_xyz1_tensor));
			Tensor * grad_xyz2_tensor=NULL;
			OP_REQUIRES_OK(context,context->allocate_output(1,TensorShape{b,m,3},&grad_xyz2_tensor));
			auto grad_xyz1_flat=grad_xyz1_tensor->flat<float>();
			float * grad_xyz1=&grad_xyz1_flat(0);
			auto grad_xyz2_flat=grad_xyz2_tensor->flat<float>();
			float * grad_xyz2=&grad_xyz2_flat(0);
			for (int i=0;i<b*n*3;i++)
				grad_xyz1[i]=0;
			for (int i=0;i<b*m*3;i++)
				grad_xyz2[i]=0;
			for (int i=0;i<b;i++){
				for (int j=0;j<n;j++){
					float x1=xyz1[(i*n+j)*3+0];
					float y1=xyz1[(i*n+j)*3+1];
					float z1=xyz1[(i*n+j)*3+2];
					int j2=idx1[i*n+j];
					float x2=xyz2[(i*m+j2)*3+0];
					float y2=xyz2[(i*m+j2)*3+1];
					float z2=xyz2[(i*m+j2)*3+2];
					float g=grad_dist1[i*n+j]*2;
					grad_xyz1[(i*n+j)*3+0]+=g*(x1-x2);
					grad_xyz1[(i*n+j)*3+1]+=g*(y1-y2);
					grad_xyz1[(i*n+j)*3+2]+=g*(z1-z2);
					grad_xyz2[(i*m+j2)*3+0]-=(g*(x1-x2));
					grad_xyz2[(i*m+j2)*3+1]-=(g*(y1-y2));
					grad_xyz2[(i*m+j2)*3+2]-=(g*(z1-z2));
				}
				for (int j=0;j<m;j++){
					float x1=xyz2[(i*m+j)*3+0];
					float y1=xyz2[(i*m+j)*3+1];
					float z1=xyz2[(i*m+j)*3+2];
					int j2=idx2[i*m+j];
					float x2=xyz1[(i*n+j2)*3+0];
					float y2=xyz1[(i*n+j2)*3+1];
					float z2=xyz1[(i*n+j2)*3+2];
					float g=grad_dist2[i*m+j]*2;
					grad_xyz2[(i*m+j)*3+0]+=g*(x1-x2);
					grad_xyz2[(i*m+j)*3+1]+=g*(y1-y2);
					grad_xyz2[(i*m+j)*3+2]+=g*(z1-z2);
					grad_xyz1[(i*n+j2)*3+0]-=(g*(x1-x2));
					grad_xyz1[(i*n+j2)*3+1]-=(g*(y1-y2));
					grad_xyz1[(i*n+j2)*3+2]-=(g*(z1-z2));
				}
			}
		}
};
REGISTER_KERNEL_BUILDER(Name("NnDistanceGrad").Device(DEVICE_CPU), NnDistanceGradOp);

void NmDistanceKernelLauncher(int b,int n,const float * xyz,int m,const float * xyz2,float * result,int * result_i,float * result2,int * result2_i);
class NnDistanceGpuOp : public OpKernel{
	public:
		explicit NnDistanceGpuOp(OpKernelConstruction* context):OpKernel(context){}
		void Compute(OpKernelContext * context)override{
			const Tensor& xyz1_tensor=context->input(0);
			const Tensor& xyz2_tensor=context->input(1);
			OP_REQUIRES(context,xyz1_tensor.dims()==3,errors::InvalidArgument("NnDistance requires xyz1 be of shape (batch,#points,3)"));
			OP_REQUIRES(context,xyz1_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistance only accepts 3d point set xyz1"));
			int b=xyz1_tensor.shape().dim_size(0);
			int n=xyz1_tensor.shape().dim_size(1);
			OP_REQUIRES(context,xyz2_tensor.dims()==3,errors::InvalidArgument("NnDistance requires xyz2 be of shape (batch,#points,3)"));
			OP_REQUIRES(context,xyz2_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistance only accepts 3d point set xyz2"));
			int m=xyz2_tensor.shape().dim_size(1);
			OP_REQUIRES(context,xyz2_tensor.shape().dim_size(0)==b,errors::InvalidArgument("NnDistance expects xyz1 and xyz2 have same batch size"));
			auto xyz1_flat=xyz1_tensor.flat<float>();
			const float * xyz1=&xyz1_flat(0);
			auto xyz2_flat=xyz2_tensor.flat<float>();
			const float * xyz2=&xyz2_flat(0);
			Tensor * dist1_tensor=NULL;
			Tensor * idx1_tensor=NULL;
			Tensor * dist2_tensor=NULL;
			Tensor * idx2_tensor=NULL;
			OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n},&dist1_tensor));
			OP_REQUIRES_OK(context,context->allocate_output(1,TensorShape{b,n},&idx1_tensor));
			auto dist1_flat=dist1_tensor->flat<float>();
			auto idx1_flat=idx1_tensor->flat<int>();
			OP_REQUIRES_OK(context,context->allocate_output(2,TensorShape{b,m},&dist2_tensor));
			OP_REQUIRES_OK(context,context->allocate_output(3,TensorShape{b,m},&idx2_tensor));
			auto dist2_flat=dist2_tensor->flat<float>();
			auto idx2_flat=idx2_tensor->flat<int>();
			float * dist1=&(dist1_flat(0));
			int * idx1=&(idx1_flat(0));
			float * dist2=&(dist2_flat(0));
			int * idx2=&(idx2_flat(0));
			NmDistanceKernelLauncher(b,n,xyz1,m,xyz2,dist1,idx1,dist2,idx2);
		}
};
REGISTER_KERNEL_BUILDER(Name("NnDistance").Device(DEVICE_GPU), NnDistanceGpuOp);

void NmDistanceGradKernelLauncher(int b,int n,const float * xyz1,int m,const float * xyz2,const float * grad_dist1,const int * idx1,const float * grad_dist2,const int * idx2,float * grad_xyz1,float * grad_xyz2);
class NnDistanceGradGpuOp : public OpKernel{
	public:
		explicit NnDistanceGradGpuOp(OpKernelConstruction* context):OpKernel(context){}
		void Compute(OpKernelContext * context)override{
			const Tensor& xyz1_tensor=context->input(0);
			const Tensor& xyz2_tensor=context->input(1);
			const Tensor& grad_dist1_tensor=context->input(2);
			const Tensor& idx1_tensor=context->input(3);
			const Tensor& grad_dist2_tensor=context->input(4);
			const Tensor& idx2_tensor=context->input(5);
			OP_REQUIRES(context,xyz1_tensor.dims()==3,errors::InvalidArgument("NnDistanceGrad requires xyz1 be of shape (batch,#points,3)"));
			OP_REQUIRES(context,xyz1_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistanceGrad only accepts 3d point set xyz1"));
			int b=xyz1_tensor.shape().dim_size(0);
			int n=xyz1_tensor.shape().dim_size(1);
			OP_REQUIRES(context,xyz2_tensor.dims()==3,errors::InvalidArgument("NnDistanceGrad requires xyz2 be of shape (batch,#points,3)"));
			OP_REQUIRES(context,xyz2_tensor.shape().dim_size(2)==3,errors::InvalidArgument("NnDistanceGrad only accepts 3d point set xyz2"));
			int m=xyz2_tensor.shape().dim_size(1);
			OP_REQUIRES(context,xyz2_tensor.shape().dim_size(0)==b,errors::InvalidArgument("NnDistanceGrad expects xyz1 and xyz2 have same batch size"));
			OP_REQUIRES(context,grad_dist1_tensor.shape()==(TensorShape{b,n}),errors::InvalidArgument("NnDistanceGrad requires grad_dist1 be of shape(batch,#points)"));
			OP_REQUIRES(context,idx1_tensor.shape()==(TensorShape{b,n}),errors::InvalidArgument("NnDistanceGrad requires idx1 be of shape(batch,#points)"));
			OP_REQUIRES(context,grad_dist2_tensor.shape()==(TensorShape{b,m}),errors::InvalidArgument("NnDistanceGrad requires grad_dist2 be of shape(batch,#points)"));
			OP_REQUIRES(context,idx2_tensor.shape()==(TensorShape{b,m}),errors::InvalidArgument("NnDistanceGrad requires idx2 be of shape(batch,#points)"));
			auto xyz1_flat=xyz1_tensor.flat<float>();
			const float * xyz1=&xyz1_flat(0);
			auto xyz2_flat=xyz2_tensor.flat<float>();
			const float * xyz2=&xyz2_flat(0);
			auto idx1_flat=idx1_tensor.flat<int>();
			const int * idx1=&idx1_flat(0);
			auto idx2_flat=idx2_tensor.flat<int>();
			const int * idx2=&idx2_flat(0);
			auto grad_dist1_flat=grad_dist1_tensor.flat<float>();
			const float * grad_dist1=&grad_dist1_flat(0);
			auto grad_dist2_flat=grad_dist2_tensor.flat<float>();
			const float * grad_dist2=&grad_dist2_flat(0);
			Tensor * grad_xyz1_tensor=NULL;
			OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n,3},&grad_xyz1_tensor));
			Tensor * grad_xyz2_tensor=NULL;
			OP_REQUIRES_OK(context,context->allocate_output(1,TensorShape{b,m,3},&grad_xyz2_tensor));
			auto grad_xyz1_flat=grad_xyz1_tensor->flat<float>();
			float * grad_xyz1=&grad_xyz1_flat(0);
			auto grad_xyz2_flat=grad_xyz2_tensor->flat<float>();
			float * grad_xyz2=&grad_xyz2_flat(0);
			NmDistanceGradKernelLauncher(b,n,xyz1,m,xyz2,grad_dist1,idx1,grad_dist2,idx2,grad_xyz1,grad_xyz2);
		}
};
REGISTER_KERNEL_BUILDER(Name("NnDistanceGrad").Device(DEVICE_GPU), NnDistanceGradGpuOp);


================================================
FILE: code/tf_ops/CD/tf_nndistance.py
================================================
import os
import tensorflow as tf
from tensorflow.python.framework import ops

BASE_DIR = os.path.dirname(os.path.abspath(__file__))

nn_distance_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_nndistance_so.so'))

def nn_distance(xyz1,xyz2):
	'''
Computes the distance of nearest neighbors for a pair of point clouds
input: xyz1: (batch_size,#points_1,3)  the first point cloud
input: xyz2: (batch_size,#points_2,3)  the second point cloud
output: dist1: (batch_size,#point_1)   distance from first to second
output: idx1:  (batch_size,#point_1)   nearest neighbor from first to second
output: dist2: (batch_size,#point_2)   distance from second to first
output: idx2:  (batch_size,#point_2)   nearest neighbor from second to first
	'''
	return nn_distance_module.nn_distance(xyz1,xyz2)
#@tf.RegisterShape('NnDistance')
#def _nn_distance_shape(op):
	#shape1=op.inputs[0].get_shape().with_rank(3)
	#shape2=op.inputs[1].get_shape().with_rank(3)
	#return [tf.TensorShape([shape1.dims[0],shape1.dims[1]]),tf.TensorShape([shape1.dims[0],shape1.dims[1]]),
		#tf.TensorShape([shape2.dims[0],shape2.dims[1]]),tf.TensorShape([shape2.dims[0],shape2.dims[1]])]
@ops.RegisterGradient('NnDistance')
def _nn_distance_grad(op,grad_dist1,grad_idx1,grad_dist2,grad_idx2):
	xyz1=op.inputs[0]
	xyz2=op.inputs[1]
	idx1=op.outputs[1]
	idx2=op.outputs[3]
	return nn_distance_module.nn_distance_grad(xyz1,xyz2,grad_dist1,idx1,grad_dist2,idx2)


if __name__=='__main__':
	import numpy as np
	import random
	import time
	from tensorflow.python.ops.gradient_checker import compute_gradient
	random.seed(100)
	np.random.seed(100)
	with tf.Session('') as sess:
		xyz1=np.random.randn(32,16384,3).astype('float32')
		xyz2=np.random.randn(32,1024,3).astype('float32')
		#with tf.device('/gpu:0'):
		if True:
			inp1=tf.Variable(xyz1)
			inp2=tf.constant(xyz2)
			reta,retb,retc,retd=nn_distance(inp1,inp2)
			loss=tf.reduce_sum(reta)+tf.reduce_sum(retc)
			train=tf.train.GradientDescentOptimizer(learning_rate=0.05).minimize(loss)
		sess.run(tf.initialize_all_variables())
		t0=time.time()
		t1=t0
		best=1e100
		for i in xrange(100):
			trainloss,_=sess.run([loss,train])
			newt=time.time()
			best=min(best,newt-t1)
			print i,trainloss,(newt-t0)/(i+1),best
			t1=newt
		#print sess.run([inp1,retb,inp2,retd])
		#grads=compute_gradient([inp1,inp2],[(16,32,3),(16,32,3)],loss,(1,),[xyz1,xyz2])
		#for i,j in grads:
			#print i.shape,j.shape,np.mean(np.abs(i-j)),np.mean(np.abs(i)),np.mean(np.abs(j))
		#for i in xrange(10):
			#t0=time.time()
			#a,b,c,d=sess.run([reta,retb,retc,retd],feed_dict={inp1:xyz1,inp2:xyz2})
			#print 'time',time.time()-t0
		#print a.shape,b.shape,c.shape,d.shape
		#print a.dtype,b.dtype,c.dtype,d.dtype
		#samples=np.array(random.sample(range(xyz2.shape[1]),100),dtype='int32')
		#dist1=((xyz1[:,samples,None,:]-xyz2[:,None,:,:])**2).sum(axis=-1).min(axis=-1)
		#idx1=((xyz1[:,samples,None,:]-xyz2[:,None,:,:])**2).sum(axis=-1).argmin(axis=-1)
		#print np.abs(dist1-a[:,samples]).max()
		#print np.abs(idx1-b[:,samples]).max()
		#dist2=((xyz2[:,samples,None,:]-xyz1[:,None,:,:])**2).sum(axis=-1).min(axis=-1)
		#idx2=((xyz2[:,samples,None,:]-xyz1[:,None,:,:])**2).sum(axis=-1).argmin(axis=-1)
		#print np.abs(dist2-c[:,samples]).max()
		#print np.abs(idx2-d[:,samples]).max()



================================================
FILE: code/tf_ops/CD/tf_nndistance_g.cu
================================================
#if GOOGLE_CUDA
#define EIGEN_USE_GPU
#include "third_party/eigen3/unsupported/Eigen/CXX11/Tensor"

__global__ void NmDistanceKernel(int b,int n,const float * xyz,int m,const float * xyz2,float * result,int * result_i){
	const int batch=512;
	__shared__ float buf[batch*3];
	for (int i=blockIdx.x;i<b;i+=gridDim.x){
		for (int k2=0;k2<m;k2+=batch){
			int end_k=min(m,k2+batch)-k2;
			for (int j=threadIdx.x;j<end_k*3;j+=blockDim.x){
				buf[j]=xyz2[(i*m+k2)*3+j];
			}
			__syncthreads();
			for (int j=threadIdx.x+blockIdx.y*blockDim.x;j<n;j+=blockDim.x*gridDim.y){
				float x1=xyz[(i*n+j)*3+0];
				float y1=xyz[(i*n+j)*3+1];
				float z1=xyz[(i*n+j)*3+2];
				int best_i=0;
				float best=0;
				int end_ka=end_k-(end_k&3);
				if (end_ka==batch){
					for (int k=0;k<batch;k+=4){
						{
							float x2=buf[k*3+0]-x1;
							float y2=buf[k*3+1]-y1;
							float z2=buf[k*3+2]-z1;
							float d=x2*x2+y2*y2+z2*z2;
							if (k==0 || d<best){
								best=d;
								best_i=k+k2;
							}
						}
						{
							float x2=buf[k*3+3]-x1;
							float y2=buf[k*3+4]-y1;
							float z2=buf[k*3+5]-z1;
							float d=x2*x2+y2*y2+z2*z2;
							if (d<best){
								best=d;
								best_i=k+k2+1;
							}
						}
						{
							float x2=buf[k*3+6]-x1;
							float y2=buf[k*3+7]-y1;
							float z2=buf[k*3+8]-z1;
							float d=x2*x2+y2*y2+z2*z2;
							if (d<best){
								best=d;
								best_i=k+k2+2;
							}
						}
						{
							float x2=buf[k*3+9]-x1;
							float y2=buf[k*3+10]-y1;
							float z2=buf[k*3+11]-z1;
							float d=x2*x2+y2*y2+z2*z2;
							if (d<best){
								best=d;
								best_i=k+k2+3;
							}
						}
					}
				}else{
					for (int k=0;k<end_ka;k+=4){
						{
							float x2=buf[k*3+0]-x1;
							float y2=buf[k*3+1]-y1;
							float z2=buf[k*3+2]-z1;
							float d=x2*x2+y2*y2+z2*z2;
							if (k==0 || d<best){
								best=d;
								best_i=k+k2;
							}
						}
						{
							float x2=buf[k*3+3]-x1;
							float y2=buf[k*3+4]-y1;
							float z2=buf[k*3+5]-z1;
							float d=x2*x2+y2*y2+z2*z2;
							if (d<best){
								best=d;
								best_i=k+k2+1;
							}
						}
						{
							float x2=buf[k*3+6]-x1;
							float y2=buf[k*3+7]-y1;
							float z2=buf[k*3+8]-z1;
							float d=x2*x2+y2*y2+z2*z2;
							if (d<best){
								best=d;
								best_i=k+k2+2;
							}
						}
						{
							float x2=buf[k*3+9]-x1;
							float y2=buf[k*3+10]-y1;
							float z2=buf[k*3+11]-z1;
							float d=x2*x2+y2*y2+z2*z2;
							if (d<best){
								best=d;
								best_i=k+k2+3;
							}
						}
					}
				}
				for (int k=end_ka;k<end_k;k++){
					float x2=buf[k*3+0]-x1;
					float y2=buf[k*3+1]-y1;
					float z2=buf[k*3+2]-z1;
					float d=x2*x2+y2*y2+z2*z2;
					if (k==0 || d<best){
						best=d;
						best_i=k+k2;
					}
				}
				if (k2==0 || result[(i*n+j)]>best){
					result[(i*n+j)]=best;
					result_i[(i*n+j)]=best_i;
				}
			}
			__syncthreads();
		}
	}
}
void NmDistanceKernelLauncher(int b,int n,const float * xyz,int m,const float * xyz2,float * result,int * result_i,float * result2,int * result2_i){
	NmDistanceKernel<<<dim3(32,16,1),512>>>(b,n,xyz,m,xyz2,result,result_i);
	NmDistanceKernel<<<dim3(32,16,1),512>>>(b,m,xyz2,n,xyz,result2,result2_i);
}
__global__ void NmDistanceGradKernel(int b,int n,const float * xyz1,int m,const float * xyz2,const float * grad_dist1,const int * idx1,float * grad_xyz1,float * grad_xyz2){
	for (int i=blockIdx.x;i<b;i+=gridDim.x){
		for (int j=threadIdx.x+blockIdx.y*blockDim.x;j<n;j+=blockDim.x*gridDim.y){
			float x1=xyz1[(i*n+j)*3+0];
			float y1=xyz1[(i*n+j)*3+1];
			float z1=xyz1[(i*n+j)*3+2];
			int j2=idx1[i*n+j];
			float x2=xyz2[(i*m+j2)*3+0];
			float y2=xyz2[(i*m+j2)*3+1];
			float z2=xyz2[(i*m+j2)*3+2];
			float g=grad_dist1[i*n+j]*2;
			atomicAdd(&(grad_xyz1[(i*n+j)*3+0]),g*(x1-x2));
			atomicAdd(&(grad_xyz1[(i*n+j)*3+1]),g*(y1-y2));
			atomicAdd(&(grad_xyz1[(i*n+j)*3+2]),g*(z1-z2));
			atomicAdd(&(grad_xyz2[(i*m+j2)*3+0]),-(g*(x1-x2)));
			atomicAdd(&(grad_xyz2[(i*m+j2)*3+1]),-(g*(y1-y2)));
			atomicAdd(&(grad_xyz2[(i*m+j2)*3+2]),-(g*(z1-z2)));
		}
	}
}
void NmDistanceGradKernelLauncher(int b,int n,const float * xyz1,int m,const float * xyz2,const float * grad_dist1,const int * idx1,const float * grad_dist2,const int * idx2,float * grad_xyz1,float * grad_xyz2){
	cudaMemset(grad_xyz1,0,b*n*3*4);
	cudaMemset(grad_xyz2,0,b*m*3*4);
	NmDistanceGradKernel<<<dim3(1,16,1),256>>>(b,n,xyz1,m,xyz2,grad_dist1,idx1,grad_xyz1,grad_xyz2);
	NmDistanceGradKernel<<<dim3(1,16,1),256>>>(b,m,xyz2,n,xyz1,grad_dist2,idx2,grad_xyz2,grad_xyz1);
}

#endif


================================================
FILE: code/tf_ops/__init__.py
================================================


================================================
FILE: code/tf_ops/emd/__init__.py
================================================


================================================
FILE: code/tf_ops/emd/tf_auctionmatch.cpp
================================================
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/framework/common_shape_fns.h"
#include <algorithm>
#include <vector>
#include <math.h>
using namespace tensorflow;
REGISTER_OP("AuctionMatch")
	.Input("xyz1: float32")
	.Input("xyz2: float32")
	.Output("matchl: int32")
	.Output("matchr: int32")
	.SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c){
        ::tensorflow::shape_inference::ShapeHandle dims1;
        c->WithRank(c->input(0), 3, &dims1);
        ::tensorflow::shape_inference::ShapeHandle dims2;
        c->WithRank(c->input(1), 3, &dims2);
        ::tensorflow::shape_inference::ShapeHandle output1 = c->MakeShape({c->Dim(dims1, 0), c->Dim(dims1, 1)});
        c->set_output(0, output1);
        ::tensorflow::shape_inference::ShapeHandle output2 = c->MakeShape({c->Dim(dims2, 0), c->Dim(dims2, 1)});
        c->set_output(1, output2);
        return Status::OK();
	});
void AuctionMatchLauncher(int b,int n,const float * xyz1,const float * xyz2,int * matchl,int * matchr,float * cost);

class AuctionMatchGpuOp: public OpKernel{
	public:
		explicit AuctionMatchGpuOp(OpKernelConstruction* context):OpKernel(context){}
		void Compute(OpKernelContext * context)override{
			const Tensor& xyz1_tensor=context->input(0);
			OP_REQUIRES(context,xyz1_tensor.dims()==3 && xyz1_tensor.shape().dim_size(2)==3,errors::InvalidArgument("ApproxMatch expects (batch_size,num_points,3) xyz1 shape"));
			auto xyz1_flat=xyz1_tensor.flat<float>();
			const float * xyz1=&(xyz1_flat(0));
			int b=xyz1_tensor.shape().dim_size(0);
			int n=xyz1_tensor.shape().dim_size(1);
			OP_REQUIRES(context,n<=4096,errors::InvalidArgument("AuctionMatch handles at most 4096 dataset points"));

			const Tensor& xyz2_tensor=context->input(1);
			OP_REQUIRES(context,xyz2_tensor.dims()==3 && xyz2_tensor.shape().dim_size(2)==3 && xyz2_tensor.shape().dim_size(0)==b && xyz2_tensor.shape().dim_size(1)==n,errors::InvalidArgument("AuctionMatch expects (batch_size,num_points,3) xyz2 shape, and shape must match with xyz1"));
			auto xyz2_flat=xyz2_tensor.flat<float>();
			const float * xyz2=&(xyz2_flat(0));

			Tensor * matchl_tensor=NULL;
			OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n},&matchl_tensor));
			auto matchl_flat=matchl_tensor->flat<int>();
			int * matchl=&(matchl_flat(0));
			Tensor * matchr_tensor=NULL;
			OP_REQUIRES_OK(context,context->allocate_output(1,TensorShape{b,n},&matchr_tensor));
			auto matchr_flat=matchr_tensor->flat<int>();
			int * matchr=&(matchr_flat(0));

			Tensor temp_tensor;
			OP_REQUIRES_OK(context,context->allocate_temp(DataTypeToEnum<float>::value,TensorShape{b,n,n},&temp_tensor));
			auto temp_flat=temp_tensor.flat<float>();
			float * temp=&(temp_flat(0));

			AuctionMatchLauncher(b,n,xyz1,xyz2,matchl,matchr,temp);
		}
};
REGISTER_KERNEL_BUILDER(Name("AuctionMatch").Device(DEVICE_GPU), AuctionMatchGpuOp);


================================================
FILE: code/tf_ops/emd/tf_auctionmatch.py
================================================
import tensorflow as tf
from tensorflow.python.framework import ops
import sys
import os

BASE_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(BASE_DIR)
auctionmatch_module = tf.load_op_library(os.path.join(BASE_DIR, 'tf_auctionmatch_so.so'))

def auction_match(xyz1,xyz2):
	'''
input:
	xyz1 : batch_size * #points * 3
	xyz2 : batch_size * #points * 3
returns:
	matchl : batch_size * #npoints
	matchr : batch_size * #npoints
	'''
	print ("Here")
	return auctionmatch_module.auction_match(xyz1,xyz2)
ops.NoGradient('AuctionMatch')

# TF1.0 API requires set shape in C++
# @tf.RegisterShape('AuctionMatch')
# def _auction_match_shape(op):
# 	shape1=op.inputs[0].get_shape().with_rank(3)
# 	shape2=op.inputs[1].get_shape().with_rank(3)
# 	return [
# 		tf.TensorShape([shape1.dims[0],shape1.dims[1]]),
# 		tf.TensorShape([shape2.dims[0],shape2.dims[1]])
# 	]

if __name__=='__main__':
    from tf_ops.grouping import tf_grouping
    from tf_ops.sampling import tf_sampling

    npoint=4096
    xyz1_in=tf.placeholder(tf.float32,shape=(32,npoint,3))
    xyz2_in=tf.placeholder(tf.float32,shape=(32,npoint,3))
    matchl_out,matchr_out=auction_match(xyz1_in,xyz2_in)
    matched_out=tf_sampling.gather_point(xyz2_in,matchl_out)
    import numpy as np
    np.random.seed(100)
    xyz1=np.random.randn(32,npoint,3).astype('float32')
    xyz2=xyz1.copy()+np.random.randn(32,npoint,3)*0.01
    for i in xrange(len(xyz2)):
        xyz2[i]=np.roll(xyz2[i],i,axis=0)
    with tf.Session('') as sess:
        ret=sess.run(matched_out,feed_dict={xyz1_in:xyz1,xyz2_in:xyz2})
    print ((xyz1-ret)**2).mean()


================================================
FILE: code/tf_ops/emd/tf_auctionmatch_compile.sh
================================================
echo 'nvcc'
/usr/local/cuda-8.0/bin/nvcc tf_auctionmatch_g.cu -o tf_auctionmatch_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC -arch=sm_30 
echo 'g++'

g++ -std=c++11 tf_auctionmatch.cpp tf_auctionmatch_g.cu.o -o tf_auctionmatch_so.so -shared -fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -I /home/ganeshiyer/tensorflow/lib/python2.7/site-packages/tensorflow/include  -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2


================================================
FILE: code/tf_ops/emd/tf_auctionmatch_g.cu
================================================
#include <cstdio>
__global__ void AuctionMatchKernel(int b,int n,const float * __restrict__ xyz1,const float * __restrict__ xyz2,int * matchl,int * matchr,float * cost){
	//this kernel handles up to 4096 points
	const int NMax=4096;
	__shared__ short Queue[NMax];
	__shared__ short matchrbuf[NMax];
	__shared__ float pricer[NMax];
	__shared__ float bests[32][3];
	__shared__ int qhead,qlen;
	const int BufLen=2048;
	__shared__ float buf[BufLen];
	for (int bno=blockIdx.x;bno<b;bno+=gridDim.x){
		int cnt=0;
		float tolerance=1e-4;
		for (int j=threadIdx.x;j<n;j+=blockDim.x)
			matchl[bno*n+j]=-1;
		for (int j=threadIdx.x;j<n;j+=blockDim.x)
			matchrbuf[j]=-1;
		for (int j=threadIdx.x;j<n;j+=blockDim.x)
			Queue[j]=j;
		for (int j=threadIdx.x;j<n;j+=blockDim.x)
			pricer[j]=0;
		const int Block=512;
		for (int k0=0;k0<n;k0+=Block){
			int k1=min(n,k0+Block);
			for (int k=threadIdx.x;k<(k1-k0)*3;k+=blockDim.x)
				buf[k]=xyz1[bno*n*3+k0*3+k];
			__syncthreads();
			for (int j=threadIdx.x;j<n;j+=blockDim.x){
				float x2=xyz2[bno*n*3+j*3+0];
				float y2=xyz2[bno*n*3+j*3+1];
				float z2=xyz2[bno*n*3+j*3+2];
				for (int k=k0;k<k1;k++){
					float x1=buf[(k-k0)*3+0];
					float y1=buf[(k-k0)*3+1];
					float z1=buf[(k-k0)*3+2];
					float d=sqrtf((x1-x2)*(x1-x2)+(y1-y2)*(y1-y2)+(z1-z2)*(z1-z2));
					cost[blockIdx.x*n*n+k*n+j]=d;
				}
			}
			__syncthreads();
		}
		//calculate the distacne
		if (threadIdx.x==0){
			qhead=0;
			qlen=n;
		}
		__syncthreads();
		int loaded=0;
		float value9,value10,value11,value12,value13,value14,value15,value16;
		while (qlen){
			int i=Queue[qhead];
			int i2;
			if (qhead+1<n)
				i2=Queue[qhead+1];
			else
				i2=Queue[0];
			float best=1e38f,best2=1e38f;
			int bestj=0;
			if (n==blockDim.x*8){
				int j=threadIdx.x;
				float value1,value2,value3,value4,value5,value6,value7,value8;
				if (loaded){
					value1=value9+pricer[j];
					value2=value10+pricer[j+blockDim.x];
					value3=value11+pricer[j+blockDim.x*2];
					value4=value12+pricer[j+blockDim.x*3];
					value5=value13+pricer[j+blockDim.x*4];
					value6=value14+pricer[j+blockDim.x*5];
					value7=value15+pricer[j+blockDim.x*6];
					value8=value16+pricer[j+blockDim.x*7];
					loaded=0;
				}else{
					value1=cost[blockIdx.x*n*n+i*n+j]+pricer[j];
					value2=cost[blockIdx.x*n*n+i*n+j+blockDim.x]+pricer[j+blockDim.x];
					value3=cost[blockIdx.x*n*n+i*n+j+blockDim.x*2]+pricer[j+blockDim.x*2];
					value4=cost[blockIdx.x*n*n+i*n+j+blockDim.x*3]+pricer[j+blockDim.x*3];
					value5=cost[blockIdx.x*n*n+i*n+j+blockDim.x*4]+pricer[j+blockDim.x*4];
					value6=cost[blockIdx.x*n*n+i*n+j+blockDim.x*5]+pricer[j+blockDim.x*5];
					value7=cost[blockIdx.x*n*n+i*n+j+blockDim.x*6]+pricer[j+blockDim.x*6];
					value8=cost[blockIdx.x*n*n+i*n+j+blockDim.x*7]+pricer[j+blockDim.x*7];
					value9=cost[blockIdx.x*n*n+i2*n+j];
					value10=cost[blockIdx.x*n*n+i2*n+j+blockDim.x];
					value11=cost[blockIdx.x*n*n+i2*n+j+blockDim.x*2];
					value12=cost[blockIdx.x*n*n+i2*n+j+blockDim.x*3];
					value13=cost[blockIdx.x*n*n+i2*n+j+blockDim.x*4];
					value14=cost[blockIdx.x*n*n+i2*n+j+blockDim.x*5];
					value15=cost[blockIdx.x*n*n+i2*n+j+blockDim.x*6];
					value16=cost[blockIdx.x*n*n+i2*n+j+blockDim.x*7];
					loaded=qlen>1;
				}
				int vj,vj2,vj3,vj4;
				if (value1<value2){
					vj=j;
				}else{
					vj=j+blockDim.x;
					float t=value1;
					value1=value2;
					value2=t;
				}
				if (value3<value4){
					vj2=j+blockDim.x*2;
				}else{
					vj2=j+blockDim.x*3;
					float t=value3;
					value3=value4;
					value4=t;
				}
				if (value5<value6){
					vj3=j+blockDim.x*4;
				}else{
					vj3=j+blockDim.x*5;
					float t=value5;
					value5=value6;
					value6=t;
				}
				if (value7<value8){
					vj4=j+blockDim.x*6;
				}else{
					vj4=j+blockDim.x*7;
					float t=value7;
					value7=value8;
					value8=t;
				}
				if (value1<value3){
					value2=fminf(value2,value3);
				}else{
					value2=fminf(value1,value4);
					value1=value3;
					vj=vj2;
				}
				if (value5<value7){
					value6=fminf(value6,value7);
				}else{
					value6=fminf(value5,value8);
					value5=value7;
					vj3=vj4;
				}
				if (value1<value5){
					best=value1;
					bestj=vj;
					best2=fminf(value2,value5);
				}else{
					best2=fminf(value1,value6);
					best=value5;
					bestj=vj3;
				}
			}else if (n>=blockDim.x*4){
				for (int j=threadIdx.x;j<n;j+=blockDim.x*4){
					float value1=cost[blockIdx.x*n*n+i*n+j]+pricer[j];
					float value2=cost[blockIdx.x*n*n+i*n+j+blockDim.x]+pricer[j+blockDim.x];
					float value3=cost[blockIdx.x*n*n+i*n+j+blockDim.x*2]+pricer[j+blockDim.x*2];
					float value4=cost[blockIdx.x*n*n+i*n+j+blockDim.x*3]+pricer[j+blockDim.x*3];
					int vj,vj2;
					if (value1<value2){
						vj=j;
					}else{
						vj=j+blockDim.x;
						float t=value1;
						value1=value2;
						value2=t;
					}
					if (value3<value4){
						vj2=j+blockDim.x*2;
					}else{
						vj2=j+blockDim.x*3;
						float t=value3;
						value3=value4;
						value4=t;
					}
					if (value1<value3){
						value2=fminf(value2,value3);
					}else{
						value2=fminf(value1,value4);
						value1=value3;
						vj=vj2;
					}
					if (best<value1){
						best2=fminf(best2,value1);
					}else{
						best2=fminf(best,value2);
						best=value1;
						bestj=vj;
					}
				}
			}else if (n>=blockDim.x*2){
				for (int j=threadIdx.x;j<n;j+=blockDim.x*2){
					float value1=cost[blockIdx.x*n*n+i*n+j]+pricer[j];
					float value2=cost[blockIdx.x*n*n+i*n+j+blockDim.x]+pricer[j+blockDim.x];
					int vj;
					if (value1<value2){
						vj=j;
					}else{
						vj=j+blockDim.x;
						float t=value1;
						value1=value2;
						value2=t;
					}
					if (best<value1){
						best2=fminf(best2,value1);
					}else{
						best2=fminf(best,value2);
						best=value1;
						bestj=vj;
					}
				}
			}else{
				for (int j=threadIdx.x;j<n;j+=blockDim.x){
					float value=cost[blockIdx.x*n*n+i*n+j]+pricer[j];
					if (best<value){
						best2=fminf(best2,value);
					}else{
						best2=best;
						bestj=j;
						best=value;
					}
				}
			}
			for (int i=16;i>0;i>>=1){
				float b1=__shfl_down(best,i,32);
				float b2=__shfl_down(best2,i,32);
				int bj=__shfl_down(bestj,i,32);
				if (best<b1){
					best2=fminf(b1,best2);
				}else{
					best=b1;
					best2=fminf(best,b2);
					bestj=bj;
				}
			}
			if ((threadIdx.x&31)==0){
				bests[threadIdx.x>>5][0]=best;
				bests[threadIdx.x>>5][1]=best2;
				*(int*)&bests[threadIdx.x>>5][2]=bestj;
			}
			__syncthreads();
			int nn=blockDim.x>>5;
			if (threadIdx.x<nn){
				best=bests[threadIdx.x][0];
				best2=bests[threadIdx.x][1];
				bestj=*(int*)&bests[threadIdx.x][2];
				for (int i=nn>>1;i>0;i>>=1){
					float b1=__shfl_down(best,i,32);
					float b2=__shfl_down(best2,i,32);
					int bj=__shfl_down(bestj,i,32);
					if (best<b1){
						best2=fminf(b1,best2);
					}else{
						best=b1;
						best2=fminf(best,b2);
						bestj=bj;
					}
				}
			}
			if (threadIdx.x==0){
				float delta=best2-best+tolerance;
				qhead++;
				qlen--;
				if (qhead>=n)
					qhead-=n;
				int old=matchrbuf[bestj];
				pricer[bestj]+=delta;
				cnt++;
				if (old!=-1){
					int ql=qlen;
					int tail=qhead+ql;
					qlen=ql+1;
					if (tail>=n)
						tail-=n;
					Queue[tail]=old;
				}
				if (cnt==(40*n)){
					if (tolerance==1.0)
						qlen=0;
					tolerance=fminf(1.0,tolerance*100);
					cnt=0;
				}
			}
			__syncthreads();
			if (threadIdx.x==0){
				matchrbuf[bestj]=i;
			}
		}
		__syncthreads();
		for (int j=threadIdx.x;j<n;j+=blockDim.x)
			matchr[bno*n+j]=matchrbuf[j];
		for (int j=threadIdx.x;j<n;j+=blockDim.x)
			matchl[bno*n+matchrbuf[j]]=j;
		__syncthreads();
	}
}
void AuctionMatchLauncher(int b,int n,const float * xyz1,const float * xyz2,int * matchl,int * matchr,float * cost){
	AuctionMatchKernel<<<32,512>>>(b,n,xyz1,xyz2,matchl,matchr,cost);
}



================================================
FILE: code/tf_ops/grouping/__init__.py
================================================


================================================
FILE: code/tf_ops/grouping/compile.sh
================================================
g++ query_ball_point.cpp -o query_ball_point
nvcc query_ball_point.cu -o query_ball_point_cuda
nvcc query_ball_point_block.cu -o query_ball_point_block
nvcc query_ball_point_grid.cu -o query_ball_point_grid
nvcc query_ball_point_grid_count.cu -o query_ball_point_grid_count
g++ -Wall selection_sort.cpp -o selection_sort
nvcc selection_sort.cu -o selection_sort_cuda


================================================
FILE: code/tf_ops/grouping/query_ball_point.cpp
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include <string>
#include <vector>
using namespace std;
float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}
// input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3)
// output: idx (b,m,nsample)
void query_ball_point_cpu(int b, int n, int m, const float* radius, int nsample, const float *xyz1, const float *xyz2, int *idx) {
    for (int i=0;i<b;++i) {
        for (int j=0;j<m;++j) {
            int cnt = 0;
            for (int k=0;k<n;++k) {
                if (cnt == nsample)
                    break; // only pick the FIRST nsample points in the ball
	        float x2=xyz2[j*3+0];
	        float y2=xyz2[j*3+1];
	        float z2=xyz2[j*3+2];
	        float x1=xyz1[k*3+0];
	        float y1=xyz1[k*3+1];
	        float z1=xyz1[k*3+2];
		float d=max(sqrtf((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1)),1e-20f);
                if (d<*radius) {
                    if (cnt==0) { // set ALL indices to k, s.t. if there are less points in ball than nsample, we still have valid (repeating) indices
                        for (int l=0;l<nsample;++l)
                            idx[j*nsample+l] = k;
                    }
                    idx[j*nsample+cnt] = k;
                    cnt+=1;
                }
            }
        }
        xyz1+=n*3;
        xyz2+=m*3;
        idx+=m*nsample;
    }
}


// input: points (b,n,c), idx (b,m,nsample)
// output: out (b,m,nsample,c)
void group_point_cpu(int b, int n, int c, int m, int nsample, const float *points, const int *idx, float *out) {
    for (int i=0;i<b;++i) {
        for (int j=0;j<m;++j) {
            for (int k=0;k<nsample;++k) {
                int ii = idx[j*nsample+k];
                for (int l=0;l<c;++l) {
                    out[j*nsample*c+k*c+l] = points[ii*c+l];
                }
            }
        }
        points+=n*c;
        idx+=m*nsample;
        out+=m*nsample*c;
    }
}

// input: grad_out (b,m,nsample,c), idx (b,m,nsample), 
// output: grad_points (b,n,c)
void group_point_grad_cpu(int b, int n, int c, int m, int nsample, const float *grad_out, const int *idx, float *grad_points) {
    for (int i=0;i<b;++i) {
        for (int j=0;j<m;++j) {
            for (int k=0;k<nsample;++k) {
                int ii = idx[j*nsample+k];
                for (int l=0;l<c;++l) {
                     grad_points[ii*c+l] += grad_out[j*nsample*c+k*c+l];
                }
            }
        }
        idx+=m*nsample;
        grad_out+=m*nsample*c;
        grad_points+=n*c;
    }
}

int main()
{
    int b=32,n=512,m=128,nsample=64,c=64;
    float radius=0.1;
    float *xyz1=new float[b*n*3];
    float *xyz2=new float[b*m*3];
    float *points=new float[b*n*c];
    int *idx=new int[b*m*nsample];
    memset(idx, 0, sizeof(int)*b*m*nsample);
    float *out=new float[b*m*nsample*c];
    float *grad_out=new float[b*m*nsample*c]; // grad to out
    memset(grad_out, 0.0, sizeof(float)*b*m*nsample*c);
    float *grad_points=new float[b*n*c]; // grad to points
    for (int i=0;i<b*n*3;i++)
        xyz1[i]=randomf();
    for (int i=0;i<b*m*3;i++)
        xyz2[i]=randomf();
    for (int i=0;i<b*n*c;i++)
        points[i]=randomf();

    double t0=get_time();
    query_ball_point_cpu(b,n,m,radius,nsample,xyz1,xyz2,idx);
    printf("query_ball_point cpu time %f\n",get_time()-t0);

    t0=get_time();
    group_point_cpu(b,n,c,m,nsample,points,idx,out);
    printf("grou_point cpu time %f\n",get_time()-t0);

    t0=get_time();
    group_point_grad_cpu(b,n,c,m,nsample,grad_out,idx,grad_points);
    printf("grou_point_grad cpu time %f\n",get_time()-t0);

    return 0;
}


================================================
FILE: code/tf_ops/grouping/query_ball_point.cu
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include <string>
#include <vector>
#include "cuPrintf.cuh"
#include "cuPrintf.cu"

using namespace std;
using namespace std;
float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}
// input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3)
// output: idx (b,m,nsample)
__global__ void query_ball_point_gpu(int b, int n, int m, const float* radius, int nsample, const float *xyz1, const float *xyz2, int *idx) {
    for (int i=0;i<b;++i) {
        for (int j=0;j<m;++j) {
            int cnt = 0;
            for (int k=0;k<n;++k) {
                if (cnt == nsample)
                    break; // only pick the FIRST nsample points in the ball
	            float x2=xyz2[j*3+0];
	            float y2=xyz2[j*3+1];
	            float z2=xyz2[j*3+2];
	            float x1=xyz1[k*3+0];
	            float y1=xyz1[k*3+1];
	            float z1=xyz1[k*3+2];
		        float d=max(sqrtf((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1)),1e-20f);
                if (d<radius[0]) {
                    if (cnt==0) { // set ALL indices to k, s.t. if there are less points in ball than nsample, we still have valid (repeating) indices
                        for (int l=0;l<nsample;++l)
                            idx[j*nsample+l] = k;
                    }
                    idx[j*nsample+cnt] = k;
                    cnt+=1;
                }
            }
        }
        xyz1+=n*3;
        xyz2+=m*3;
        idx+=m*nsample;
    }
}


// input: points (b,n,c), idx (b,m,nsample)
// output: out (b,m,nsample,c)
__global__ void group_point_gpu(int b, int n, int c, int m, int nsample, const float *points, const int *idx, float *out) {
    for (int i=0;i<b;++i) {
        for (int j=0;j<m;++j) {
            for (int k=0;k<nsample;++k) {
                int ii = idx[j*nsample+k];
                for (int l=0;l<c;++l) {
                    out[j*nsample*c+k*c+l] = points[ii*c+l];
                }
            }
        }
        points+=n*c;
        idx+=m*nsample;
        out+=m*nsample*c;
    }
}

// input: grad_out (b,m,nsample,c), idx (b,m,nsample), 
// output: grad_points (b,n,c)
__global__ void group_point_grad_gpu(int b, int n, int c, int m, int nsample, const float *grad_out, const int *idx, float *grad_points) {
    for (int i=0;i<b;++i) {
        for (int j=0;j<m;++j) {
            for (int k=0;k<nsample;++k) {
                int ii = idx[j*nsample+k];
                for (int l=0;l<c;++l) {
                     grad_points[ii*c+l] += grad_out[j*nsample*c+k*c+l];
                }
            }
        }
        idx+=m*nsample;
        grad_out+=m*nsample*c;
        grad_points+=n*c;
    }
}

int main()
{
    int b=32,n=512,m=128,nsample=64,c=64;
    float radius=0.1;
    float *xyz1, *xyz2, *points;
    cudaMallocManaged(&xyz1, b*n*3*sizeof(float));
    cudaMallocManaged(&xyz2, b*m*3*sizeof(float));
    cudaMallocManaged(&points, b*n*c*sizeof(float));
    int *idx;
    cudaMallocManaged(&idx, b*m*nsample*sizeof(int));
    memset(idx, 0, sizeof(int)*b*m*nsample);
    float *out, *grad_out;
    cudaMallocManaged(&out, b*m*nsample*c*sizeof(float));
    cudaMallocManaged(&grad_out, b*m*nsample*c*sizeof(float));
    memset(grad_out, 0.0, sizeof(float)*b*m*nsample*c);
    float *grad_points;
    cudaMallocManaged(&grad_points, b*n*c*sizeof(float));

    for (int i=0;i<b*n*3;i++)
        xyz1[i]=randomf();
    for (int i=0;i<b*m*3;i++)
        xyz2[i]=randomf();
    for (int i=0;i<b*n*c;i++)
        points[i]=randomf();

    double t0=get_time();
    query_ball_point_gpu<<<1,1>>>(b,n,m,radius,nsample,xyz1,xyz2,idx);
    cudaDeviceSynchronize();
    printf("query_ball_point gpu time %f\n",get_time()-t0);

    t0=get_time();
    group_point_gpu<<<1,1>>>(b,n,c,m,nsample,points,idx,out);
    cudaDeviceSynchronize();
    printf("grou_point gpu time %f\n",get_time()-t0);

    t0=get_time();
    group_point_grad_gpu<<<1,1>>>(b,n,c,m,nsample,grad_out,idx,grad_points);
    cudaDeviceSynchronize();
    printf("grou_point_grad gpu time %f\n",get_time()-t0);

    cudaFree(xyz1);
    cudaFree(xyz2);
    cudaFree(points);
    cudaFree(idx);
    cudaFree(out);
    cudaFree(grad_out);
    cudaFree(grad_points);
    return 0;
}


================================================
FILE: code/tf_ops/grouping/query_ball_point_block.cu
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include <string>
#include <vector>
using namespace std;
float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}
// input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3)
// output: idx (b,m,nsample)
__global__ void query_ball_point_gpu(int b, int n, int m, const float *radius, int nsample, const float *xyz1, const float *xyz2, int *idx) {
    int index = threadIdx.x;
    xyz1 += n*3*index;
    xyz2 += m*3*index;
    idx += m*nsample*index;

    for (int j=0;j<m;++j) {
        int cnt = 0;
        for (int k=0;k<n;++k) {
            if (cnt == nsample)
                break; // only pick the FIRST nsample points in the ball
            float x2=xyz2[j*3+0];
            float y2=xyz2[j*3+1];
            float z2=xyz2[j*3+2];
            float x1=xyz1[k*3+0];
            float y1=xyz1[k*3+1];
            float z1=xyz1[k*3+2];
    	float d=max(sqrtf((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1)),1e-20f);
            if (d<radius[0]) {
                if (cnt==0) { // set ALL indices to k, s.t. if there are less points in ball than nsample, we still have valid (repeating) indices
                    for (int l=0;l<nsample;++l)
                        idx[j*nsample+l] = k;
                }
                idx[j*nsample+cnt] = k;
                cnt+=1;
            }
        }
    }
}


// input: points (b,n,c), idx (b,m,nsample)
// output: out (b,m,nsample,c)
__global__ void group_point_gpu(int b, int n, int c, int m, int nsample, const float *points, const int *idx, float *out) {
    int index = threadIdx.x;
    points += n*c*index;
    idx += m*nsample*index;
    out += m*nsample*c*index;

    for (int j=0;j<m;++j) {
        for (int k=0;k<nsample;++k) {
            int ii = idx[j*nsample+k];
            for (int l=0;l<c;++l) {
                out[j*nsample*c+k*c+l] = points[ii*c+l];
            }
        }
    }
}

// input: grad_out (b,m,nsample,c), idx (b,m,nsample), 
// output: grad_points (b,n,c)
__global__ void group_point_grad_gpu(int b, int n, int c, int m, int nsample, const float *grad_out, const int *idx, float *grad_points) {
    int index = threadIdx.x;
    idx += m*nsample*index;
    grad_out += m*nsample*c*index;
    grad_points += n*c*index;

    for (int j=0;j<m;++j) {
        for (int k=0;k<nsample;++k) {
            int ii = idx[j*nsample+k];
            for (int l=0;l<c;++l) {
                 grad_points[ii*c+l] += grad_out[j*nsample*c+k*c+l];
            }
        }
    }
}

int main()
{
    int b=32,n=512,m=128,nsample=64,c=64;
    float radius=0.1;
    float *xyz1, *xyz2, *points;
    cudaMallocManaged(&xyz1, b*n*3*sizeof(float));
    cudaMallocManaged(&xyz2, b*m*3*sizeof(float));
    cudaMallocManaged(&points, b*n*c*sizeof(float));
    int *idx;
    cudaMallocManaged(&idx, b*m*nsample*sizeof(int));
    memset(idx, 0, sizeof(int)*b*m*nsample);
    float *out, *grad_out;
    cudaMallocManaged(&out, b*m*nsample*c*sizeof(float));
    cudaMallocManaged(&grad_out, b*m*nsample*c*sizeof(float));
    memset(grad_out, 0.0, sizeof(float)*b*m*nsample*c);
    float *grad_points;
    cudaMallocManaged(&grad_points, b*n*c*sizeof(float));

    for (int i=0;i<b*n*3;i++)
        xyz1[i]=randomf();
    for (int i=0;i<b*m*3;i++)
        xyz2[i]=randomf();
    for (int i=0;i<b*n*c;i++)
        points[i]=randomf();

    double t0=get_time();
    query_ball_point_gpu<<<1,b>>>(b,n,m,radius,nsample,xyz1,xyz2,idx);
    cudaDeviceSynchronize();
    printf("query_ball_point gpu time %f\n",get_time()-t0);

    t0=get_time();
    group_point_gpu<<<1,b>>>(b,n,c,m,nsample,points,idx,out);
    cudaDeviceSynchronize();
    printf("grou_point gpu time %f\n",get_time()-t0);

    t0=get_time();
    group_point_grad_gpu<<<1,b>>>(b,n,c,m,nsample,grad_out,idx,grad_points);
    cudaDeviceSynchronize();
    printf("grou_point_grad gpu time %f\n",get_time()-t0);

    cudaFree(xyz1);
    cudaFree(xyz2);
    cudaFree(points);
    cudaFree(idx);
    cudaFree(out);
    cudaFree(grad_out);
    cudaFree(grad_points);
    return 0;
}


================================================
FILE: code/tf_ops/grouping/query_ball_point_grid.cu
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include <string>
#include <vector>
using namespace std;
float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}
// input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3)
// output: idx (b,m,nsample)
__global__ void query_ball_point_gpu(int b, int n, int m, const float *radius, int nsample, const float *xyz1, const float *xyz2, int *idx) {
    int batch_index = blockIdx.x;
    xyz1 += n*3*batch_index;
    xyz2 += m*3*batch_index;
    idx += m*nsample*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;
    
    for (int j=index;j<m;j+=stride) {
        int cnt = 0;
        for (int k=0;k<n;++k) {
            if (cnt == nsample)
                break; // only pick the FIRST nsample points in the ball
            float x2=xyz2[j*3+0];
            float y2=xyz2[j*3+1];
            float z2=xyz2[j*3+2];
            float x1=xyz1[k*3+0];
            float y1=xyz1[k*3+1];
            float z1=xyz1[k*3+2];
    	float d=max(sqrtf((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1)),1e-20f);
            if (d<radius[0]) {
                if (cnt==0) { // set ALL indices to k, s.t. if there are less points in ball than nsample, we still have valid (repeating) indices
                    for (int l=0;l<nsample;++l)
                        idx[j*nsample+l] = k;
                }
                idx[j*nsample+cnt] = k;
                cnt+=1;
            }
        }
    }
}


// input: points (b,n,c), idx (b,m,nsample)
// output: out (b,m,nsample,c)
__global__ void group_point_gpu(int b, int n, int c, int m, int nsample, const float *points, const int *idx, float *out) {
    int batch_index = blockIdx.x;
    points += n*c*batch_index;
    idx += m*nsample*batch_index;
    out += m*nsample*c*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;
    
    for (int j=index;j<m;j+=stride) {
        for (int k=0;k<nsample;++k) {
            int ii = idx[j*nsample+k];
            for (int l=0;l<c;++l) {
                out[j*nsample*c+k*c+l] = points[ii*c+l];
            }
        }
    }
}

// input: grad_out (b,m,nsample,c), idx (b,m,nsample), 
// output: grad_points (b,n,c)
__global__ void group_point_grad_gpu(int b, int n, int c, int m, int nsample, const float *grad_out, const int *idx, float *grad_points) {
    int batch_index = blockIdx.x;
    idx += m*nsample*batch_index;
    grad_out += m*nsample*c*batch_index;
    grad_points += n*c*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;

    for (int j=index;j<m;j+=stride) {
        for (int k=0;k<nsample;++k) {
            int ii = idx[j*nsample+k];
            for (int l=0;l<c;++l) {
                 // Use atomic add to avoid race condition
                 atomicAdd(&grad_points[ii*c+l], grad_out[j*nsample*c+k*c+l]);
            }
        }
    }
}

int main()
{
    int b=32,n=512,m=128,nsample=64,c=64;
    float radius=0.1;
    float *xyz1, *xyz2, *points;
    cudaMallocManaged(&xyz1, b*n*3*sizeof(float));
    cudaMallocManaged(&xyz2, b*m*3*sizeof(float));
    cudaMallocManaged(&points, b*n*c*sizeof(float));
    int *idx;
    cudaMallocManaged(&idx, b*m*nsample*sizeof(int));
    memset(idx, 0, sizeof(int)*b*m*nsample);
    float *out, *grad_out;
    cudaMallocManaged(&out, b*m*nsample*c*sizeof(float));
    cudaMallocManaged(&grad_out, b*m*nsample*c*sizeof(float));
    memset(grad_out, 0.0, sizeof(float)*b*m*nsample*c);
    float *grad_points;
    cudaMallocManaged(&grad_points, b*n*c*sizeof(float));

    for (int i=0;i<b*n*3;i++)
        xyz1[i]=randomf();
    for (int i=0;i<b*m*3;i++)
        xyz2[i]=randomf();
    for (int i=0;i<b*n*c;i++)
        points[i]=randomf();

    double t0=get_time();
    query_ball_point_gpu<<<b,256>>>(b,n,m,radius,nsample,xyz1,xyz2,idx);
    cudaDeviceSynchronize();
    printf("query_ball_point gpu time %f\n",get_time()-t0);

    t0=get_time();
    group_point_gpu<<<b,256>>>(b,n,c,m,nsample,points,idx,out);
    cudaDeviceSynchronize();
    printf("grou_point gpu time %f\n",get_time()-t0);

    t0=get_time();
    group_point_grad_gpu<<<b,256>>>(b,n,c,m,nsample,grad_out,idx,grad_points);
    cudaDeviceSynchronize();
    printf("grou_point_grad gpu time %f\n",get_time()-t0);

    cudaFree(xyz1);
    cudaFree(xyz2);
    cudaFree(points);
    cudaFree(idx);
    cudaFree(out);
    cudaFree(grad_out);
    cudaFree(grad_points);
    return 0;
}


================================================
FILE: code/tf_ops/grouping/selection_sort.cpp
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include <string>
#include <vector>
using namespace std;
float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}

// input: k (1), distance matrix dist (b,m,n)
// output: idx (b,m,n), val (b,m,n)
void selection_sort_cpu(int b, int n, int m, int k, const float *dist, int *idx, float *val) {
    float *p_dist;
    float tmp;
    int tmpi;
    for (int i=0;i<b;++i) {
        for (int j=0;j<m;++j) {
            for (int s=0;s<n;++s) {
                val[i*m*n+j*n+s] = dist[i*m*n+j*n+s];
                idx[i*m*n+j*n+s] = s;
            }
        }
    }

    for (int i=0;i<b;++i) {
        for (int j=0;j<m;++j) {
            for (int s=0;s<n;++s)
                printf("%f ", dist[i*m*n+j*n+s]);
            printf("\n");
            p_dist = val+j*n;
            // selection sort for the first k elements
            for (int s=0;s<k;++s) {
                int min=s; 
                // find the min
                for (int t=s+1;t<n;++t) {
                    if (p_dist[t]<p_dist[min]) {
                        min = t;
                    }
                }
                printf("%d\n", min);
                // swap min-th and i-th element
                if (min!=s) {
                    tmp = p_dist[min];
                    p_dist[min] = p_dist[s];
                    p_dist[s] = tmp;
                    tmpi = idx[j*n+min];
                    idx[j*n+min] = idx[j*n+s];
                    idx[j*n+s] = tmpi;
                }       
            }
        }
        idx+=m*n;
        val+=m*n;
    }
}

int main()
{
    //int b=32,n=10000,m=1000,k=128;
    int b=2,n=4,m=2,k=3;
    float *dist=new float[b*m*n];
    int *idx=new int[b*m*n];
    float *val=new float[b*m*n];
    memset(idx, 0, sizeof(int)*b*m*n);
    //for (int i=0;i<b*n*m;i++)
    //    dist[i]=randomf();
    for (int i=0;i<b*n*m;i++) {
        dist[i] = float(10-i);
        printf("%f ", dist[i]);
    }
    printf("\n");



    double t0=get_time();
    selection_sort_cpu(b,n,m,k,dist,idx,val);
    printf("selection sort cpu time %f\n",get_time()-t0);

    for (int i=0;i<b*n*m;++i)
        printf("%d ", idx[i]);
    printf("\n");
    for (int i=0;i<b*n*m;++i)
        printf("%f ", val[i]);
    printf("\n");
    return 0;
}


================================================
FILE: code/tf_ops/grouping/selection_sort.cu
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include <string>
#include <vector>
using namespace std;
float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}

// input: k (1), distance matrix dist (b,m,n)
// output: idx (b,m,k), val (b,m,k)
__global__ void selection_sort_gpu(int b, int n, int m, int k, float *dist, int *idx, float *val) {
    int batch_index = blockIdx.x;
    dist+=m*n*batch_index;
    idx+=m*k*batch_index;
    val+=m*k*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;

    float *p_dist;
    for (int j=index;j<m;j+=stride) {
        p_dist = dist+j*n;
        // selection sort for the first k elements
        for (int s=0;s<k;++s) {
            int min=s; 
            // find the min
            for (int t=s+1;t<n;++t) {
                if (p_dist[t]<p_dist[min]) {
                    min = t;
                }
            }
            // update idx and val
            idx[j*n+s] = min;
            val[j*n+s] = p_dist[min];
            // swap min-th and i-th element
            float tmp = p_dist[min];
            p_dist[min] = p_dist[s];
            p_dist[s] = tmp;
        }
    }
}

int main()
{
    //int b=32,n=10000,m=1000,k=128;
    int b=32,n=2048,m=512,k=128;
    float *dist;
    int *idx;
    float *val;
    cudaMallocManaged(&dist, b*m*n*sizeof(float));
    cudaMallocManaged(&idx, b*m*k*sizeof(int));
    cudaMallocManaged(&val, b*m*k*sizeof(float));
    cudaMemset(idx, 0, sizeof(int)*b*m*k);
    for (int i=0;i<b*n*m;i++)
        dist[i]=randomf();

    double t0=get_time();
    selection_sort_gpu<<<b,256>>>(b,n,m,k,dist,idx,val);
    cudaDeviceSynchronize();
    printf("selection sort cpu time %f\n",get_time()-t0);

    return 0;
}


================================================
FILE: code/tf_ops/grouping/selection_sort_const.cu
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include <string>
#include <vector>
using namespace std;
float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}

// input: k (1), distance matrix dist (b,m,n)
// output: idx (b,m,n), dist_out (b,m,n)
__global__ void selection_sort_gpu(int b, int n, int m, int k, const float *dist, int *outi, float *out) {
    int batch_index = blockIdx.x;
    dist+=m*n*batch_index;
    outi+=m*n*batch_index;
    out+=m*n*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;

    // copy from dist to dist_out
    for (int j=index;j<m;j+=stride) {
        for (int s=0;s<n;++s) {
            out[j*n+s] = dist[j*n+s];
            outi[j*n+s] = s;
        }
    }

    float *p_dist;
    for (int j=index;j<m;j+=stride) {
        p_dist = out+j*n;
        // selection sort for the first k elements
        for (int s=0;s<k;++s) {
            int min=s; 
            // find the min
            for (int t=s+1;t<n;++t) {
                if (p_dist[t]<p_dist[min]) {
                    min = t;
                }
            }
            // swap min-th and i-th element
            if (min!=s) {
                float tmp = p_dist[min];
                p_dist[min] = p_dist[s];
                p_dist[s] = tmp;
                int tmpi = outi[j*n+min];
                outi[j*n+min] = outi[j*n+s];
                outi[j*n+s] = tmpi;
            }
        }
    }
}

int main()
{
    //int b=32,n=10000,m=1000,k=128;
    int b=32,n=2048,m=512,k=128;
    //int b=2,n=4,m=2,k=3;
    float *dist;
    int *idx;
    float *dist_out;
    cudaMallocManaged(&dist, b*m*n*sizeof(float));
    cudaMallocManaged(&idx, b*m*n*sizeof(int));
    cudaMallocManaged(&dist_out, b*m*n*sizeof(float));
    cudaMemset(idx, 0, sizeof(int)*b*m*n);
    for (int i=0;i<b*n*m;i++)
        dist[i]=randomf();
    //for (int i=0;i<b*n*m;i++) {
    //    dist[i] = float(10-i);
    //    printf("%f ", dist[i]);
    //}
    //printf("\n");

    double t0=get_time();
    selection_sort_gpu<<<b,256>>>(b,n,m,k,dist,idx,dist_out);
    cudaDeviceSynchronize();
    printf("selection sort cpu time %f\n",get_time()-t0);
    
    //for (int i=0;i<b*n*m;++i)
    //    printf("%d ", idx[i]);
    //printf("\n");

    return 0;
}


================================================
FILE: code/tf_ops/grouping/test_knn.py
================================================
import tensorflow as tf
import numpy as np

np.random.seed(0)


a_val = np.random.random((2,5,3))
b_val = np.random.random((2,2,3))
for b in range(2):
    print '--- ', b
    t1 = a_val[b,:,:]
    t2 = b_val[b,:,:]
    for i in range(2): #npoint in b
        print '-- point b: ', i
        for j in range(5): # npoint in a
            d = np.sum((t2[i,:]-t1[j,:])**2)
            print d
            


a = tf.constant(a_val)
b = tf.constant(b_val)
print a.get_shape()
k = 3

a = tf.tile(tf.reshape(a, (2,1,5,3)), [1,2,1,1])
b = tf.tile(tf.reshape(b, (2,2,1,3)), [1,1,5,1])

dist = -tf.reduce_sum((a-b)**2, -1)
print dist

val, idx = tf.nn.top_k(dist, k=k)
print val, idx
sess = tf.Session()
print sess.run(a)
print sess.run(b)
print sess.run(dist)
print sess.run(val)
print sess.run(idx)
print sess.run(idx).shape


================================================
FILE: code/tf_ops/grouping/tf_grouping.cpp
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include <iostream>
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/framework/common_shape_fns.h"
#include <cuda_runtime.h>
using namespace tensorflow;

REGISTER_OP("QueryBallPoint")
    .Attr("nsample: int")
    .Input("xyz1: float32")
    .Input("xyz2: float32")
    .Input("radius: float32")
    .Output("idx: int32")
    .Output("pts_cnt: int32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        ::tensorflow::shape_inference::ShapeHandle dims2; // batch_size * npoint * 3
        c->WithRank(c->input(1), 3, &dims2);
        int nsample;
        TF_RETURN_IF_ERROR(c->GetAttr("nsample", &nsample));
        ::tensorflow::shape_inference::ShapeHandle output1 = c->MakeShape({c->Dim(dims2, 0), c->Dim(dims2, 1), nsample});
        c->set_output(0, output1);
        ::tensorflow::shape_inference::ShapeHandle output2 = c->MakeShape({c->Dim(dims2, 0), c->Dim(dims2, 1)});
        c->set_output(1, output2);
        return Status::OK();
    });
REGISTER_OP("SelectionSort")
    .Attr("k: int")
    .Input("dist: float32")
    .Output("outi: int32")
    .Output("out: float32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        c->set_output(0, c->input(0));
        c->set_output(1, c->input(0));
        return Status::OK();
    });
REGISTER_OP("GroupPoint")
    .Input("points: float32")
    .Input("idx: int32")
    .Output("out: float32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        ::tensorflow::shape_inference::ShapeHandle dims1; // batch_size * ndataset * channels
        c->WithRank(c->input(0), 3, &dims1);
        ::tensorflow::shape_inference::ShapeHandle dims2; // batch_size * npoints * nsample
        c->WithRank(c->input(1), 3, &dims2);
        // batch_size * npoints * nsample * channels
        ::tensorflow::shape_inference::ShapeHandle output = c->MakeShape({c->Dim(dims2, 0), c->Dim(dims2, 1), c->Dim(dims2, 2), c->Dim(dims1, 2)});
        c->set_output(0, output);
        return Status::OK();
    });
REGISTER_OP("GroupPointGrad")
    .Input("points: float32")
    .Input("idx: int32")
    .Input("grad_out: float32")
    .Output("grad_points: float32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        c->set_output(0, c->input(0));
        return Status::OK();
    });


void queryBallPointLauncher(int b, int n, int m, const float* radius, int nsample, const float *xyz1, const float *xyz2, int *idx, int *pts_cnt);
class QueryBallPointGpuOp : public OpKernel {
    public:
        explicit QueryBallPointGpuOp(OpKernelConstruction* context) : OpKernel(context) {
            //OP_REQUIRES_OK(context, context->GetAttr("radius", &radius_));
            //OP_REQUIRES(context, radius_ > 0, errors::InvalidArgument("QueryBallPoint expects positive radius"));

            OP_REQUIRES_OK(context, context->GetAttr("nsample", &nsample_));
            OP_REQUIRES(context, nsample_ > 0, errors::InvalidArgument("QueryBallPoint expects positive nsample"));
        }

        void Compute(OpKernelContext* context) override {
            const Tensor& xyz1_tensor = context->input(0);
            OP_REQUIRES(context, xyz1_tensor.dims()==3 && xyz1_tensor.shape().dim_size(2)==3, errors::InvalidArgument("QueryBallPoint expects (batch_size, ndataset, 3) xyz1 shape."));
            int b = xyz1_tensor.shape().dim_size(0);
            int n = xyz1_tensor.shape().dim_size(1);

            const Tensor& xyz2_tensor = context->input(1);
            OP_REQUIRES(context, xyz2_tensor.dims()==3 && xyz2_tensor.shape().dim_size(2)==3, errors::InvalidArgument("QueryBallPoint expects (batch_size, npoint, 3) xyz2 shape."));
            int m = xyz2_tensor.shape().dim_size(1);

            Tensor *idx_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(0, TensorShape{b,m,nsample_}, &idx_tensor));
            Tensor *pts_cnt_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(1, TensorShape{b,m}, &pts_cnt_tensor));

            const Tensor& radius_tensor = context->input(2);
            auto radius_flat = radius_tensor.flat<float>();
            const float *radius = &(radius_flat(0));

            auto xyz1_flat = xyz1_tensor.flat<float>();
            const float *xyz1 = &(xyz1_flat(0));
            auto xyz2_flat = xyz2_tensor.flat<float>();
            const float *xyz2 = &(xyz2_flat(0));
            auto idx_flat = idx_tensor->flat<int>();
            int *idx = &(idx_flat(0));
            auto pts_cnt_flat = pts_cnt_tensor->flat<int>();
            int *pts_cnt = &(pts_cnt_flat(0));
            queryBallPointLauncher(b,n,m,radius,nsample_,xyz1,xyz2,idx,pts_cnt);
        }
    private:
        int nsample_;
};
REGISTER_KERNEL_BUILDER(Name("QueryBallPoint").Device(DEVICE_GPU), QueryBallPointGpuOp);

void selectionSortLauncher(int b, int n, int m, int k, const float *dist, int *outi, float *out);
class SelectionSortGpuOp : public OpKernel {
    public:
        explicit SelectionSortGpuOp(OpKernelConstruction* context) : OpKernel(context) {
            OP_REQUIRES_OK(context, context->GetAttr("k", &k_));
            OP_REQUIRES(context, k_ > 0, errors::InvalidArgument("SelectionSort expects positive k"));
        }

        void Compute(OpKernelContext* context) override {
            const Tensor& dist_tensor = context->input(0);
            OP_REQUIRES(context, dist_tensor.dims()==3, errors::InvalidArgument("SelectionSort expects (b,m,n) dist shape."));
            int b = dist_tensor.shape().dim_size(0);
            int m = dist_tensor.shape().dim_size(1);
            int n = dist_tensor.shape().dim_size(2);

            Tensor *outi_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(0, TensorShape{b,m,n}, &outi_tensor));
            Tensor *out_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(1, TensorShape{b,m,n}, &out_tensor));

            auto dist_flat = dist_tensor.flat<float>();
            const float *dist = &(dist_flat(0));
            auto outi_flat = outi_tensor->flat<int>();
            int *outi = &(outi_flat(0));
            auto out_flat = out_tensor->flat<float>();
            float *out = &(out_flat(0));
            selectionSortLauncher(b,n,m,k_,dist,outi,out);
        }
    private:
        int k_;
};
REGISTER_KERNEL_BUILDER(Name("SelectionSort").Device(DEVICE_GPU), SelectionSortGpuOp);


void groupPointLauncher(int b, int n, int c, int m, int nsample, const float *points, const int *idx, float *out);
class GroupPointGpuOp: public OpKernel{
    public:
        explicit GroupPointGpuOp(OpKernelConstruction * context):OpKernel(context){}

        void Compute(OpKernelContext * context) override {
            const Tensor& points_tensor=context->input(0);
            OP_REQUIRES(context, points_tensor.dims()==3, errors::InvalidArgument("GroupPoint expects (batch_size, num_points, channel) points shape"));
            int b = points_tensor.shape().dim_size(0);
            int n = points_tensor.shape().dim_size(1);
            int c = points_tensor.shape().dim_size(2);

            const Tensor& idx_tensor=context->input(1);
            OP_REQUIRES(context,idx_tensor.dims()==3 && idx_tensor.shape().dim_size(0)==b, errors::InvalidArgument("GroupPoint expects (batch_size, npoints, nsample) idx shape"));
            int m = idx_tensor.shape().dim_size(1);
            int nsample = idx_tensor.shape().dim_size(2);

            Tensor * out_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(0,TensorShape{b,m,nsample,c}, &out_tensor));

            auto points_flat = points_tensor.flat<float>();
            const float *points = &(points_flat(0));
            auto idx_flat = idx_tensor.flat<int>();
            const int *idx = &(idx_flat(0));
            auto out_flat = out_tensor->flat<float>();
            float *out = &(out_flat(0));
            groupPointLauncher(b,n,c,m,nsample,points,idx,out);
        }
};
REGISTER_KERNEL_BUILDER(Name("GroupPoint").Device(DEVICE_GPU),GroupPointGpuOp);

void groupPointGradLauncher(int b, int n, int c, int m, int nsample, const float *grad_out, const int *idx, float *grad_points);
class GroupPointGradGpuOp: public OpKernel{
    public:
        explicit GroupPointGradGpuOp(OpKernelConstruction * context):OpKernel(context){}

        void Compute(OpKernelContext * context) override {
            const Tensor& points_tensor=context->input(0);
            OP_REQUIRES(context, points_tensor.dims()==3, errors::InvalidArgument("GroupPointGrad expects (batch_size, num_points, channel) points shape"));
            int b = points_tensor.shape().dim_size(0);
            int n = points_tensor.shape().dim_size(1);
            int c = points_tensor.shape().dim_size(2);

            const Tensor& idx_tensor=context->input(1);
            OP_REQUIRES(context,idx_tensor.dims()==3 && idx_tensor.shape().dim_size(0)==b, errors::InvalidArgument("GroupPointGrad expects (batch_size, npoints, nsample) idx shape"));
            int m = idx_tensor.shape().dim_size(1);
            int nsample = idx_tensor.shape().dim_size(2);

            const Tensor& grad_out_tensor=context->input(2);
            OP_REQUIRES(context,grad_out_tensor.dims()==4 && grad_out_tensor.shape().dim_size(0)==b && grad_out_tensor.shape().dim_size(1)==m && grad_out_tensor.shape().dim_size(2)==nsample && grad_out_tensor.shape().dim_size(3)==c, errors::InvalidArgument("GroupPointGrad expects (batch_size, npoints, nsample, channel) grad_out shape"));

            Tensor * grad_points_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(0,TensorShape{b,n,c}, &grad_points_tensor));

            auto points_flat = points_tensor.flat<float>();
            const float *points = &(points_flat(0));
            auto idx_flat = idx_tensor.flat<int>();
            const int *idx = &(idx_flat(0));
            auto grad_out_flat = grad_out_tensor.flat<float>();
            const float *grad_out = &(grad_out_flat(0));
            auto grad_points_flat = grad_points_tensor->flat<float>();
            float *grad_points = &(grad_points_flat(0));
            cudaMemset(grad_points, 0, sizeof(float)*b*n*c);
            groupPointGradLauncher(b,n,c,m,nsample,grad_out,idx,grad_points);
        }
};
REGISTER_KERNEL_BUILDER(Name("GroupPointGrad").Device(DEVICE_GPU),GroupPointGradGpuOp);




================================================
FILE: code/tf_ops/grouping/tf_grouping.py
================================================
import tensorflow as tf
from tensorflow.python.framework import ops
import sys
import os
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(BASE_DIR)
grouping_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_grouping_so.so'))
def query_ball_point(radius, nsample, xyz1, xyz2):
    '''
    Input:
        radius: float32, ball search radius
        nsample: int32, number of points selected in each ball region
        xyz1: (batch_size, ndataset, 3) float32 array, input points
        xyz2: (batch_size, npoint, 3) float32 array, query points
    Output:
        idx: (batch_size, npoint, nsample) int32 array, indices to input points
        pts_cnt: (batch_size, npoint) int32 array, number of unique points in each local region
    '''
    #return grouping_module.query_ball_point(radius, nsample, xyz1, xyz2)
    return grouping_module.query_ball_point(xyz1, xyz2, radius, nsample)
ops.NoGradient('QueryBallPoint')
def select_top_k(k, dist):
    '''
    Input:
        k: int32, number of k SMALLEST elements selected
        dist: (b,m,n) float32 array, distance matrix, m query points, n dataset points
    Output:
        idx: (b,m,n) int32 array, first k in n are indices to the top k
        dist_out: (b,m,n) float32 array, first k in n are the top k
    '''
    return grouping_module.selection_sort(dist, k)
ops.NoGradient('SelectionSort')
def group_point(points, idx):
    '''
    Input:
        points: (batch_size, ndataset, channel) float32 array, points to sample from
        idx: (batch_size, npoint, nsample) int32 array, indices to points
    Output:
        out: (batch_size, npoint, nsample, channel) float32 array, values sampled from points
    '''
    return grouping_module.group_point(points, idx)
@tf.RegisterGradient('GroupPoint')
def _group_point_grad(op, grad_out):
    points = op.inputs[0]
    idx = op.inputs[1]
    return [grouping_module.group_point_grad(points, idx, grad_out), None]

def knn_point(k, xyz1, xyz2):
    '''
    Input:
        k: int32, number of k in k-nn search
        xyz1: (batch_size, ndataset, c) float32 array, input points
        xyz2: (batch_size, npoint, c) float32 array, query points
    Output:
        val: (batch_size, npoint, k) float32 array, L2 distances
        idx: (batch_size, npoint, k) int32 array, indices to input points
    '''
    # b = xyz1.get_shape()[0].value
    # n = xyz1.get_shape()[1].value
    # c = xyz1.get_shape()[2].value
    # m = xyz2.get_shape()[1].value
    # xyz1 = tf.tile(tf.reshape(xyz1, (b,1,n,c)), [1,m,1,1])
    # xyz2 = tf.tile(tf.reshape(xyz2, (b,m,1,c)), [1,1,n,1])
    xyz1 = tf.expand_dims(xyz1,axis=1)
    xyz2 = tf.expand_dims(xyz2,axis=2)
    dist = tf.reduce_sum((xyz1-xyz2)**2, -1)

    # outi, out = select_top_k(k, dist)
    # idx = tf.slice(outi, [0,0,0], [-1,-1,k])
    # val = tf.slice(out, [0,0,0], [-1,-1,k])

    val, idx = tf.nn.top_k(-dist, k=k) # ONLY SUPPORT CPU
    return val, idx

if __name__=='__main__':
    knn=True
    import numpy as np
    import time
    np.random.seed(100)
    pts = np.random.random((32,512,64)).astype('float32')
    tmp1 = np.random.random((32,512,3)).astype('float32')
    tmp2 = np.random.random((32,128,3)).astype('float32')
    with tf.device('/gpu:1'):
        points = tf.constant(pts)
        xyz1 = tf.constant(tmp1)
        xyz2 = tf.constant(tmp2)
        radius = 0.1 
        nsample = 64
        if knn:
            _, idx = knn_point(nsample, xyz1, xyz2)
            grouped_points = group_point(points, idx)
        else:
            idx, _ = query_ball_point(radius, nsample, xyz1, xyz2)
            grouped_points = group_point(points, idx)
            #grouped_points_grad = tf.ones_like(grouped_points)
            #points_grad = tf.gradients(grouped_points, points, grouped_points_grad)
    with tf.Session('') as sess:
        now = time.time() 
        for _ in range(100):
            ret = sess.run(grouped_points)
        print time.time() - now
        print ret.shape, ret.dtype
        print ret
    
    


================================================
FILE: code/tf_ops/grouping/tf_grouping_compile.sh
================================================
#/bin/bash
/usr/local/cuda-8.0/bin/nvcc tf_grouping_g.cu -o tf_grouping_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

g++ -std=c++11 tf_grouping.cpp tf_grouping_g.cu.o -o tf_grouping_so.so -shared -fPIC -I /home/ganeshiyer/tensorflow/lib/python2.7/site-packages/tensorflow/include  -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=0


================================================
FILE: code/tf_ops/grouping/tf_grouping_g.cu
================================================
// input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3)
// output: idx (b,m,nsample), pts_cnt (b,m)
__global__ void query_ball_point_gpu(int b, int n, int m, const float *radius, int nsample, const float *xyz1, const float *xyz2, int *idx, int *pts_cnt) {
    int batch_index = blockIdx.x;
    xyz1 += n*3*batch_index;
    xyz2 += m*3*batch_index;
    idx += m*nsample*batch_index;
    pts_cnt += m*batch_index; // counting how many unique points selected in local region

    int index = threadIdx.x;
    int stride = blockDim.x;
    
    for (int j=index;j<m;j+=stride) {
        int cnt = 0;
        for (int k=0;k<n;++k) {
            if (cnt == nsample)
                break; // only pick the FIRST nsample points in the ball
            float x2=xyz2[j*3+0];
            float y2=xyz2[j*3+1];
            float z2=xyz2[j*3+2];
            float x1=xyz1[k*3+0];
            float y1=xyz1[k*3+1];
            float z1=xyz1[k*3+2];
    	    float d=max(sqrtf((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1)),1e-20f);
            if (d<radius[0]) {
                if (cnt==0) { // set ALL indices to k, s.t. if there are less points in ball than nsample, we still have valid (repeating) indices
                    for (int l=0;l<nsample;++l)
                        idx[j*nsample+l] = k;
                }
                idx[j*nsample+cnt] = k;
                cnt+=1;
            }
        }
        pts_cnt[j] = cnt;
    }
}

// input: points (b,n,c), idx (b,m,nsample)
// output: out (b,m,nsample,c)
__global__ void group_point_gpu(int b, int n, int c, int m, int nsample, const float *points, const int *idx, float *out) {
    int batch_index = blockIdx.x;
    points += n*c*batch_index;
    idx += m*nsample*batch_index;
    out += m*nsample*c*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;
    
    for (int j=index;j<m;j+=stride) {
        for (int k=0;k<nsample;++k) {
            int ii = idx[j*nsample+k];
            for (int l=0;l<c;++l) {
                out[j*nsample*c+k*c+l] = points[ii*c+l];
            }
        }
    }
}

// input: grad_out (b,m,nsample,c), idx (b,m,nsample), 
// output: grad_points (b,n,c)
__global__ void group_point_grad_gpu(int b, int n, int c, int m, int nsample, const float *grad_out, const int *idx, float *grad_points) {
    int batch_index = blockIdx.x;
    idx += m*nsample*batch_index;
    grad_out += m*nsample*c*batch_index;
    grad_points += n*c*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;

    for (int j=index;j<m;j+=stride) {
        for (int k=0;k<nsample;++k) {
            int ii = idx[j*nsample+k];
            for (int l=0;l<c;++l) {
                 atomicAdd(&grad_points[ii*c+l], grad_out[j*nsample*c+k*c+l]);
            }
        }
    }
}

// input: k (1), distance matrix dist (b,m,n)
// output: idx (b,m,n), dist_out (b,m,n)
// only the top k results within n are useful
__global__ void selection_sort_gpu(int b, int n, int m, int k, const float *dist, int *outi, float *out) {
    int batch_index = blockIdx.x;
    dist+=m*n*batch_index;
    outi+=m*n*batch_index;
    out+=m*n*batch_index;

    int index = threadIdx.x;
    int stride = blockDim.x;

    // copy from dist to dist_out
    for (int j=index;j<m;j+=stride) {
        for (int s=0;s<n;++s) {
            out[j*n+s] = dist[j*n+s];
            outi[j*n+s] = s;
        }
    }

    float *p_dist;
    for (int j=index;j<m;j+=stride) {
        p_dist = out+j*n;
        // selection sort for the first k elements
        for (int s=0;s<k;++s) {
            int min=s; 
            // find the min
            for (int t=s+1;t<n;++t) {
                if (p_dist[t]<p_dist[min]) {
                    min = t;
                }
            }
            // swap min-th and i-th element
            if (min!=s) {
                float tmp = p_dist[min];
                p_dist[min] = p_dist[s];
                p_dist[s] = tmp;
                int tmpi = outi[j*n+min];
                outi[j*n+min] = outi[j*n+s];
                outi[j*n+s] = tmpi;
            }
        }
    }
}

void queryBallPointLauncher(int b, int n, int m, const float *radius, int nsample, const float *xyz1, const float *xyz2, int *idx, int *pts_cnt) {
    query_ball_point_gpu<<<b,256>>>(b,n,m,radius,nsample,xyz1,xyz2,idx,pts_cnt);
    //cudaDeviceSynchronize();
}
void selectionSortLauncher(int b, int n, int m, int k, const float *dist, int *outi, float *out) {
    selection_sort_gpu<<<b,256>>>(b,n,m,k,dist,outi,out); 
    //cudaDeviceSynchronize();
}
void groupPointLauncher(int b, int n, int c, int m, int nsample, const float *points, const int *idx, float *out){
    group_point_gpu<<<b,256>>>(b,n,c,m,nsample,points,idx,out);
    //cudaDeviceSynchronize();
}
void groupPointGradLauncher(int b, int n, int c, int m, int nsample, const float *grad_out, const int *idx, float *grad_points){
    group_point_grad_gpu<<<b,256>>>(b,n,c,m,nsample,grad_out,idx,grad_points);
    //group_point_grad_gpu<<<1,1>>>(b,n,c,m,nsample,grad_out,idx,grad_points);
    //cudaDeviceSynchronize();
}


================================================
FILE: code/tf_ops/grouping/tf_grouping_op_test.py
================================================
import tensorflow as tf
import numpy as np
from tf_grouping import query_ball_point, group_point

class GroupPointTest(tf.test.TestCase):
  def test(self):
    pass

  def test_grad(self):
    with tf.device('/gpu:0'):
      points = tf.constant(np.random.random((1,128,16)).astype('float32'))
      print points
      xyz1 = tf.constant(np.random.random((1,128,3)).astype('float32'))
      xyz2 = tf.constant(np.random.random((1,8,3)).astype('float32'))
      radius = 0.3 
      nsample = 32
      idx, pts_cnt = query_ball_point(radius, nsample, xyz1, xyz2)
      grouped_points = group_point(points, idx)
      print grouped_points

    with self.test_session():
      print "---- Going to compute gradient error"
      err = tf.test.compute_gradient_error(points, (1,128,16), grouped_points, (1,8,32,16))
      print err
      self.assertLess(err, 1e-4) 

if __name__=='__main__':
  tf.test.main() 


================================================
FILE: code/tf_ops/interpolation/__init__.py
================================================


================================================
FILE: code/tf_ops/interpolation/interpolate.cpp
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include <string>
#include <vector>
using namespace std;
float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}

// Find three nearest neigbors with square distance
// input: xyz1 (b,n,3), xyz2(b,m,3)
// output: dist (b,n,3), idx (b,n,3)
void threenn_cpu(int b, int n, int m, const float *xyz1, const float *xyz2, float *dist, int *idx) {
     for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
	    float x1=xyz1[j*3+0];
	    float y1=xyz1[j*3+1];
	    float z1=xyz1[j*3+2];
            double best1=1e40; double best2=1e40; double best3=1e40;
            int besti1=0; int besti2=0; int besti3=0;
            for (int k=0;k<m;++k) {
                float x2=xyz2[k*3+0];
	        float y2=xyz2[k*3+1];
	        float z2=xyz2[k*3+2];
		//float d=max(sqrtf((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1)),1e-20f);
		double d=x2*x2+y2*y2+z2*z2;
                if (d<best1) {
                    best3=best2;
                    besti3=besti2;
                    best2=best1;
                    besti2=besti1;
                    best1=d;
                    besti1=k;
                } else if (d<best2) {
                    best3=best2;
                    besti3=besti2;
                    best2=d;
                    besti2=k;
                } else if (d<best3) {
                    best3=d;
                    besti3=k;
                }
            } 
            dist[j*3]=best1;
            idx[j*3]=besti1;
            dist[j*3+1]=best2;
            idx[j*3+1]=besti2;
            dist[j*3+2]=best3;
            idx[j*3+2]=besti3;
        } 
        xyz1+=n*3;
        xyz2+=m*3;
        dist+=n*3;
        idx+=n*3;
    }
} 

// CONSTANT WEIGHT TODO
// input: dist (b,n,3)
// output: weight (b,n,3)
void get_weights_cpu(int b, int n, const float *dist, float *weight) {
    const float w = 1.0/3.0;
    for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
            weight[j*3]=w;
            weight[j*3+1]=w;
            weight[j*3+2]=w;
        } 
        dist+=n*3;
        weight+=n*3;
    }
}

// input: points (b,m,c), idx (b,n,3), weight (b,n,3)
// output: out (b,n,c)
void interpolate_cpu(int b, int m, int c, int n, const float *points, const int *idx, const float *weight, float *out) {
     float w1,w2,w3;
     int i1,i2,i3;
     for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
            w1=weight[j*3];
            w2=weight[j*3+1];
            w3=weight[j*3+2]; 
            i1=idx[j*3];
            i2=idx[j*3+1];
            i3=idx[j*3+2];
            for (int l=0;l<c;++l) {
                out[j*c+l] = points[i1*c+l]*w1 + points[i2*c+l]*w2 + points[i3*c+l]*w3;
            }
        } 
        points+=m*c;
        idx+=n*3;
        weight+=n*3;
        out+=n*c;
    }
}

// input: grad_out (b,n,c), idx (b,n,3), weight (b,n,3)
// output: grad_points (b,m,c)
void interpolate_grad_cpu(int b, int n, int c, int m, const float *grad_out, const int *idx, const float *weight, float *grad_points) {
     float w1,w2,w3;
     int i1,i2,i3;
     for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
            w1=weight[j*3];
            w2=weight[j*3+1];
            w3=weight[j*3+2]; 
            i1=idx[j*3];
            i2=idx[j*3+1];
            i3=idx[j*3+2];
            for (int l=0;l<c;++l) {
                grad_points[i1*c+l] += grad_out[j*c+l]*w1;
                grad_points[i2*c+l] += grad_out[j*c+l]*w2;
                grad_points[i3*c+l] += grad_out[j*c+l]*w3;
            }
        } 
        grad_out+=n*c;
        idx+=n*3;
        weight+=n*3;
        grad_points+=m*c;
    }
}

int main()
{
    int b=32,n=512,m=128,c=64;
    float *xyz1=new float[b*n*3];
    float *xyz2=new float[b*m*3];
    float *dist=new float[b*n*3];
    int *idx=new int[b*n*3];
    memset(idx, 0, sizeof(int)*b*n*3);
    float *weight=new float[b*n*3];
    float *points=new float[b*m*c];
    float *out=new float[b*n*c];
    float *grad_out=new float[b*n*c]; // grad to out
    memset(grad_out, 0.0, sizeof(float)*b*n*c);
    float *grad_points=new float[b*m*c]; // grad to points
    for (int i=0;i<b*n*3;i++)
        xyz1[i]=randomf();
    for (int i=0;i<b*m*3;i++)
        xyz2[i]=randomf();
    for (int i=0;i<b*m*c;i++)
        points[i]=randomf();

    double t0=get_time();
    threenn_cpu(b,n,m,xyz1,xyz2,dist,idx);
    printf("threenn cpu time %f\n",get_time()-t0);
    
    t0=get_time();
    get_weights_cpu(b,n,dist,weight);
    printf("get_weights_cpu cpu time %f\n",get_time()-t0);

    t0=get_time();
    interpolate_cpu(b,m,c,n,points,idx,weight,out);
    printf("interpolate_cpu cpu time %f\n",get_time()-t0);

    t0=get_time();
    interpolate_grad_cpu(b,n,c,m,grad_out,idx,weight,grad_points);
    printf("interpolate_grad_cpu cpu time %f\n",get_time()-t0);
    return 0;
}


================================================
FILE: code/tf_ops/interpolation/tf_interpolate.cpp
================================================
#include <cstdio>
#include <ctime>
#include <cstring> // memset
#include <cstdlib> // rand, RAND_MAX
#include <cmath> // sqrtf
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/framework/common_shape_fns.h"
using namespace tensorflow;

REGISTER_OP("ThreeNN")
    .Input("xyz1: float32")
    .Input("xyz2: float32")
    .Output("dist: float32")
    .Output("idx: int32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        c->set_output(0, c->input(0));
        c->set_output(1, c->input(0));
        return Status::OK();
    });
REGISTER_OP("ThreeInterpolate")
    .Input("points: float32")
    .Input("idx: int32")
    .Input("weight: float32")
    .Output("out: float32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        ::tensorflow::shape_inference::ShapeHandle dims1; // (b,m,c)
        c->WithRank(c->input(0), 3, &dims1);
        ::tensorflow::shape_inference::ShapeHandle dims2; // (b,n,3)
        c->WithRank(c->input(1), 3, &dims2);
        // (b,n,c)
        ::tensorflow::shape_inference::ShapeHandle output = c->MakeShape({c->Dim(dims1, 0), c->Dim(dims2, 1), c->Dim(dims1, 2)});
        c->set_output(0, output);
        return Status::OK();
    });
REGISTER_OP("ThreeInterpolateGrad")
    .Input("points: float32")
    .Input("idx: int32")
    .Input("weight: float32")
    .Input("grad_out: float32")
    .Output("grad_points: float32")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        c->set_output(0, c->input(0));
        return Status::OK();
    });

float randomf(){
    return (rand()+0.5)/(RAND_MAX+1.0);
}
static double get_time(){
    timespec tp;
    clock_gettime(CLOCK_MONOTONIC,&tp);
    return tp.tv_sec+tp.tv_nsec*1e-9;
}

// Find three nearest neigbors with square distance
// input: xyz1 (b,n,3), xyz2(b,m,3)
// output: dist (b,n,3), idx (b,n,3)
void threenn_cpu(int b, int n, int m, const float *xyz1, const float *xyz2, float *dist, int *idx) {
     for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
	    float x1=xyz1[j*3+0];
	    float y1=xyz1[j*3+1];
	    float z1=xyz1[j*3+2];
            double best1=1e40; double best2=1e40; double best3=1e40;
            int besti1=0; int besti2=0; int besti3=0;
            for (int k=0;k<m;++k) {
                float x2=xyz2[k*3+0];
	        float y2=xyz2[k*3+1];
	        float z2=xyz2[k*3+2];
		//float d=max(sqrtf((x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1)),1e-20f);
		double d=(x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1);
                if (d<best1) {
                    best3=best2;
                    besti3=besti2;
                    best2=best1;
                    besti2=besti1;
                    best1=d;
                    besti1=k;
                } else if (d<best2) {
                    best3=best2;
                    besti3=besti2;
                    best2=d;
                    besti2=k;
                } else if (d<best3) {
                    best3=d;
                    besti3=k;
                }
            } 
            dist[j*3]=best1;
            idx[j*3]=besti1;
            dist[j*3+1]=best2;
            idx[j*3+1]=besti2;
            dist[j*3+2]=best3;
            idx[j*3+2]=besti3;
        } 
        xyz1+=n*3;
        xyz2+=m*3;
        dist+=n*3;
        idx+=n*3;
    }
} 

// input: points (b,m,c), idx (b,n,3), weight (b,n,3)
// output: out (b,n,c)
void threeinterpolate_cpu(int b, int m, int c, int n, const float *points, const int *idx, const float *weight, float *out) {
     float w1,w2,w3;
     int i1,i2,i3;
     for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
            w1=weight[j*3];
            w2=weight[j*3+1];
            w3=weight[j*3+2]; 
            i1=idx[j*3];
            i2=idx[j*3+1];
            i3=idx[j*3+2];
            for (int l=0;l<c;++l) {
                out[j*c+l] = points[i1*c+l]*w1 + points[i2*c+l]*w2 + points[i3*c+l]*w3;
            }
        } 
        points+=m*c;
        idx+=n*3;
        weight+=n*3;
        out+=n*c;
    }
}

// input: grad_out (b,n,c), idx (b,n,3), weight (b,n,3)
// output: grad_points (b,m,c)
void threeinterpolate_grad_cpu(int b, int n, int c, int m, const float *grad_out, const int *idx, const float *weight, float *grad_points) {
     float w1,w2,w3;
     int i1,i2,i3;
     for (int i=0;i<b;++i) {
        for (int j=0;j<n;++j) {
            w1=weight[j*3];
            w2=weight[j*3+1];
            w3=weight[j*3+2]; 
            i1=idx[j*3];
            i2=idx[j*3+1];
            i3=idx[j*3+2];
            for (int l=0;l<c;++l) {
                grad_points[i1*c+l] += grad_out[j*c+l]*w1;
                grad_points[i2*c+l] += grad_out[j*c+l]*w2;
                grad_points[i3*c+l] += grad_out[j*c+l]*w3;
            }
        } 
        grad_out+=n*c;
        idx+=n*3;
        weight+=n*3;
        grad_points+=m*c;
    }
}



class ThreeNNOp : public OpKernel {
    public:
        explicit ThreeNNOp(OpKernelConstruction* context) : OpKernel(context) {}

        void Compute(OpKernelContext* context) override {
            const Tensor& xyz1_tensor = context->input(0);
            OP_REQUIRES(context, xyz1_tensor.dims()==3 && xyz1_tensor.shape().dim_size(2)==3, errors::InvalidArgument("ThreeNN expects (b,n,3) xyz1 shape."));
            int b = xyz1_tensor.shape().dim_size(0);
            int n = xyz1_tensor.shape().dim_size(1);

            const Tensor& xyz2_tensor = context->input(1);
            OP_REQUIRES(context, xyz2_tensor.dims()==3 && xyz2_tensor.shape().dim_size(2)==3, errors::InvalidArgument("ThreeNN expects (b,m,3) xyz2 shape."));
            int m = xyz2_tensor.shape().dim_size(1);

            Tensor *dist_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(0, TensorShape{b,n,3}, &dist_tensor));
            Tensor *idx_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(1, TensorShape{b,n,3}, &idx_tensor));

            auto xyz1_flat = xyz1_tensor.flat<float>();
            const float *xyz1 = &(xyz1_flat(0));
            auto xyz2_flat = xyz2_tensor.flat<float>();
            const float *xyz2 = &(xyz2_flat(0));
            auto dist_flat = dist_tensor->flat<float>();
            float *dist = &(dist_flat(0));
            auto idx_flat = idx_tensor->flat<int>();
            int *idx = &(idx_flat(0));
            threenn_cpu(b,n,m,xyz1,xyz2,dist,idx);
        }
};
REGISTER_KERNEL_BUILDER(Name("ThreeNN").Device(DEVICE_CPU), ThreeNNOp);



class ThreeInterpolateOp: public OpKernel{
    public:
        explicit ThreeInterpolateOp(OpKernelConstruction * context):OpKernel(context){}

        void Compute(OpKernelContext * context) override {
            const Tensor& points_tensor=context->input(0);
            OP_REQUIRES(context, points_tensor.dims()==3, errors::InvalidArgument("ThreeInterpolate expects (b,m,c) points shape"));
            int b = points_tensor.shape().dim_size(0);
            int m = points_tensor.shape().dim_size(1);
            int c = points_tensor.shape().dim_size(2);

            const Tensor& idx_tensor=context->input(1);
            OP_REQUIRES(context,idx_tensor.dims()==3 && idx_tensor.shape().dim_size(0)==b && idx_tensor.shape().dim_size(2)==3, errors::InvalidArgument("ThreeInterpolate expects (b,n,3) idx shape"));
            int n = idx_tensor.shape().dim_size(1);
            const Tensor& weight_tensor=context->input(2);
            OP_REQUIRES(context,weight_tensor.dims()==3 && weight_tensor.shape().dim_size(0)==b && weight_tensor.shape().dim_size(1)==n && weight_tensor.shape().dim_size(2)==3, errors::InvalidArgument("ThreeInterpolate expects (b,n,3) weight shape"));

            Tensor * out_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(0,TensorShape{b,n,c}, &out_tensor));

            auto points_flat = points_tensor.flat<float>();
            const float *points = &(points_flat(0));
            auto idx_flat = idx_tensor.flat<int>();
            const int *idx = &(idx_flat(0));
            auto weight_flat = weight_tensor.flat<float>();
            const float *weight = &(weight_flat(0));
            auto out_flat = out_tensor->flat<float>();
            float *out = &(out_flat(0));
            threeinterpolate_cpu(b,m,c,n,points,idx,weight,out);
        }
};
REGISTER_KERNEL_BUILDER(Name("ThreeInterpolate").Device(DEVICE_CPU),ThreeInterpolateOp);


class ThreeInterpolateGradOp: public OpKernel{
    public:
        explicit ThreeInterpolateGradOp(OpKernelConstruction * context):OpKernel(context){}

        void Compute(OpKernelContext * context) override {
            const Tensor& points_tensor=context->input(0);
            OP_REQUIRES(context, points_tensor.dims()==3, errors::InvalidArgument("ThreeInterpolateGrad expects (b,m,c) points shape"));
            int b = points_tensor.shape().dim_size(0);
            int m = points_tensor.shape().dim_size(1);
            int c = points_tensor.shape().dim_size(2);

            const Tensor& idx_tensor=context->input(1);
            OP_REQUIRES(context,idx_tensor.dims()==3 && idx_tensor.shape().dim_size(0)==b, errors::InvalidArgument("ThreeInterpolateGrad expects (b,n,3) idx shape"));
            int n = idx_tensor.shape().dim_size(1);
            const Tensor& weight_tensor=context->input(2);
            OP_REQUIRES(context,weight_tensor.dims()==3 && weight_tensor.shape().dim_size(0)==b && weight_tensor.shape().dim_size(1)==n && weight_tensor.shape().dim_size(2)==3, errors::InvalidArgument("ThreeInterpolateGrad expects (b,n,3) weight shape"));

            const Tensor& grad_out_tensor=context->input(3);
            OP_REQUIRES(context,grad_out_tensor.dims()==3 && grad_out_tensor.shape().dim_size(0)==b && grad_out_tensor.shape().dim_size(1)==n && grad_out_tensor.shape().dim_size(2)==c, errors::InvalidArgument("ThreeInterpolateGrad expects (b,n,c) grad_out shape"));

            Tensor * grad_points_tensor = nullptr;
            OP_REQUIRES_OK(context, context->allocate_output(0,TensorShape{b,m,c}, &grad_points_tensor));

            auto points_flat = points_tensor.flat<float>();
            const float *points = &(points_flat(0));
            auto idx_flat = idx_tensor.flat<int>();
            const int *idx = &(idx_flat(0));
            auto weight_flat = weight_tensor.flat<float>();
            const float *weight = &(weight_flat(0));
            auto grad_out_flat = grad_out_tensor.flat<float>();
            const float *grad_out = &(grad_out_flat(0));
            auto grad_points_flat = grad_points_tensor->flat<float>();
            float *grad_points = &(grad_points_flat(0));
            memset(grad_points, 0, sizeof(float)*b*m*c);
            threeinterpolate_grad_cpu(b,n,c,m,grad_out,idx,weight,grad_points);
        }
};
REGISTER_KERNEL_BUILDER(Name("ThreeInterpolateGrad").Device(DEVICE_CPU),ThreeInterpolateGradOp);




================================================
FILE: code/tf_ops/interpolation/tf_interpolate.py
================================================
import tensorflow as tf
from tensorflow.python.framework import ops
import sys
import os
BASE_DIR = os.path.dirname(__file__)
sys.path.append(BASE_DIR)
interpolate_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_interpolate_so.so'))
def three_nn(xyz1, xyz2):
    '''
    Input:
        xyz1: (b,n,3) float32 array, unknown points
        xyz2: (b,m,3) float32 array, known points
    Output:
        dist: (b,n,3) float32 array, distances to known points
        idx: (b,n,3) int32 array, indices to known points
    '''
    return interpolate_module.three_nn(xyz1, xyz2)
ops.NoGradient('ThreeNN')
def three_interpolate(points, idx, weight):
    '''
    Input:
        points: (b,m,c) float32 array, known points
        idx: (b,n,3) int32 array, indices to known points
        weight: (b,n,3) float32 array, weights on known points
    Output:
        out: (b,n,c) float32 array, interpolated point values
    '''
    return interpolate_module.three_interpolate(points, idx, weight)
@tf.RegisterGradient('ThreeInterpolate')
def _three_interpolate_grad(op, grad_out):
    points = op.inputs[0]
    idx = op.inputs[1]
    weight = op.inputs[2]
    return [interpolate_module.three_interpolate_grad(points, idx, weight, grad_out), None, None]

if __name__=='__main__':
    import numpy as np
    import time
    np.random.seed(100)
    pts = np.random.random((32,128,64)).astype('float32')
    tmp1 = np.random.random((32,512,3)).astype('float32')
    tmp2 = np.random.random((32,128,3)).astype('float32')
    with tf.device('/cpu:0'):
        points = tf.constant(pts)
        xyz1 = tf.constant(tmp1)
        xyz2 = tf.constant(tmp2)
        dist, idx = three_nn(xyz1, xyz2)
        weight = tf.ones_like(dist)/3.0
        interpolated_points = three_interpolate(points, idx, weight)
    with tf.Session('') as sess:
        now = time.time() 
        for _ in range(100):
            ret = sess.run(interpolated_points)
        print time.time() - now
        print ret.shape, ret.dtype
        #print ret
    
    
    


================================================
FILE: code/tf_ops/interpolation/tf_interpolate_compile.sh
================================================
g++ -std=c++11 tf_interpolate.cpp -o tf_interpolate_so.so -shared -fPIC -I /home/ganeshiyer/tensorflow/lib/python2.7/site-packages/tensorflow/include  -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=0


================================================
FILE: code/tf_ops/interpolation/tf_interpolate_op_test.py
================================================
import tensorflow as tf
import numpy as np
from tf_interpolate import three_nn, three_interpolate

class GroupPointTest(tf.test.TestCase):
  def test(self):
    pass

  def test_grad(self):
    with self.test_session():
      points = tf.constant(np.random.random((1,8,16)).astype('float32'))
      print points
      xyz1 = tf.constant(np.random.random((1,128,3)).astype('float32'))
      xyz2 = tf.constant(np.random.random((1,8,3)).astype('float32'))
      dist, idx = three_nn(xyz1, xyz2)
      weight = tf.ones_like(dist)/3.0
      interpolated_points = three_interpolate(points, idx, weight)
      print interpolated_points
      err = tf.test.compute_gradient_error(points, (1,8,16), interpolated_points, (1,128,16))
      print err
      self.assertLess(err, 1e-4) 

if __name__=='__main__':
  tf.test.main() 


================================================
FILE: code/tf_ops/interpolation/visu_interpolation.py
================================================
''' Visualize part segmentation '''
import os
import sys
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append('/home/rqi/Projects/toolkits/visualization')
from show3d_balls import showpoints
import numpy as np
from tf_interpolate import three_nn, three_interpolate
import tensorflow as tf


pts2 = np.array([[0,0,1],[1,0,0],[0,1,0],[1,1,0]]).astype('float32')
xyz1 = np.random.random((100,3)).astype('float32')
xyz2 = np.array([[0,0,0],[1,0,0],[0,1,0],[1,1,1]]).astype('float32')

def fun(xyz1,xyz2,pts2):
    with tf.device('/cpu:0'):
        points = tf.constant(np.expand_dims(pts2,0))
        xyz1 = tf.constant(np.expand_dims(xyz1,0))
        xyz2 = tf.constant(np.expand_dims(xyz2,0))
        dist, idx = three_nn(xyz1, xyz2)
        #weight = tf.ones_like(dist)/3.0
        dist = tf.maximum(dist, 1e-10)
        norm = tf.reduce_sum((1.0/dist),axis=2,keep_dims=True)
        norm = tf.tile(norm, [1,1,3])
        print norm
        weight = (1.0/dist) / norm
        interpolated_points = three_interpolate(points, idx, weight)
    with tf.Session('') as sess:
        tmp,pts1,d,w = sess.run([xyz1, interpolated_points, dist, weight])
        #print w
        pts1 = pts1.squeeze()
    return pts1

pts1 = fun(xyz1,xyz2,pts2) 
all_pts = np.zeros((104,3))
all_pts[0:100,:] = pts1
all_pts[100:,:] = pts2
all_xyz = np.zeros((104,3))
all_xyz[0:100,:]=xyz1
all_xyz[100:,:]=xyz2
showpoints(xyz2, pts2, ballradius=8)
showpoints(xyz1, pts1, ballradius=8)
showpoints(all_xyz, all_pts, ballradius=8)


================================================
FILE: code/tf_ops/sampling/__init__.py
================================================


================================================
FILE: code/tf_ops/sampling/tf_sampling.cpp
================================================
/* Furthest point sampling
 * Original author: Haoqiang Fan
 * Modified by Charles R. Qi
 * All Rights Reserved. 2017. 
 */
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/framework/common_shape_fns.h"
#include <cuda_runtime.h>

using namespace tensorflow;

REGISTER_OP("ProbSample")
  .Input("inp: float32")
  .Input("inpr: float32")
  .Output("out: int32")
  .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
    ::tensorflow::shape_inference::ShapeHandle dims1; // batch_size * ncategory
    c->WithRank(c->input(0), 2, &dims1);
    ::tensorflow::shape_inference::ShapeHandle dims2; // batch_size * npoints
    c->WithRank(c->input(1), 2, &dims2);
    // batch_size * npoints
    ::tensorflow::shape_inference::ShapeHandle output = c->MakeShape({c->Dim(dims2, 0), c->Dim(dims2, 1)});
    c->set_output(0, output);
    return Status::OK();
  });
REGISTER_OP("FarthestPointSample")
  .Attr("npoint: int")
  .Input("inp: float32")
  .Output("out: int32")
  .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
    ::tensorflow::shape_inference::ShapeHandle dims1; // batch_size * npoint * 3
    c->WithRank(c->input(0), 3, &dims1);
    int npoint;
    TF_RETURN_IF_ERROR(c->GetAttr("npoint", &npoint));
    ::tensorflow::shape_inference::ShapeHandle output = c->MakeShape({c->Dim(dims1, 0), npoint});
    c->set_output(0, output);
    return Status::OK();
  });
REGISTER_OP("GatherPoint")
  .Input("inp: float32")
  .Input("idx: int32")
  .Output("out: float32")
  .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
    ::tensorflow::shape_inference::ShapeHandle dims1; // batch_size * ndataset * 3
    c->WithRank(c->input(0), 3, &dims1);
    ::tensorflow::shape_inference::ShapeHandle dims2; // batch_size * npoints
    c->WithRank(c->input(1), 2, &dims2);
    // batch_size * npoints * 3
    ::tensorflow::shape_inference::ShapeHandle output = c->MakeShape({c->Dim(dims1, 0), c->Dim(dims2, 1), c->Dim(dims1, 2)});
    c->set_output(0, output);
    return Status::OK();
  });
REGISTER_OP("GatherPointGrad")
  .Input("inp: float32")
  .Input("idx: int32")
  .Input("out_g: float32")
  .Output("inp_g: float32")
  .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
    c->set_output(0, c->input(0));
    return Status::OK();
  });

void probsampleLauncher(int b,int n,int m,const float * inp_p,const float * inp_r,float * temp,int * out);
class ProbSampleGpuOp: public OpKernel{
  public:
    explicit ProbSampleGpuOp(OpKernelConstruction* context):OpKernel(context){}
    void Compute(OpKernelContext * context)override{
      const Tensor& inp_tensor=context->input(0);
      const Tensor& inpr_tensor=context->input(1);
      auto inp_flat=inp_tensor.flat<float>();
      auto inpr_flat=inpr_tensor.flat<float>();
      const float * inp=&(inp_flat(0));
      const float * inpr=&(inpr_flat(0));
      OP_REQUIRES(context,inp_tensor.dims()==2,errors::InvalidArgument("ProbSample expects (batch_size,num_choices) inp shape"));
      int b=inp_tensor.shape().dim_size(0);
      int n=inp_tensor.shape().dim_size(1);
      OP_REQUIRES(context,inpr_tensor.dims()==2 && inpr_tensor.shape().dim_size(0)==b,errors::InvalidArgument("ProbSample expects (batch_size,num_points) inpr shape"));
      int m=inpr_tensor.shape().dim_size(1);
      Tensor * out_tensor=NULL;
      OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,m},&out_tensor));
      auto out_flat=out_tensor->flat<int>();
      int * out=&(out_flat(0));
      Tensor temp_tensor;
      OP_REQUIRES_OK(context,context->allocate_temp(DataTypeToEnum<float>::value,TensorShape{b,n},&temp_tensor));
      auto temp_flat=temp_tensor.flat<float>();
      float * temp=&(temp_flat(0));
      probsampleLauncher(b,n,m,inp,inpr,temp,out);
    }
};
REGISTER_KERNEL_BUILDER(Name("ProbSample").Device(DEVICE_GPU), ProbSampleGpuOp);

void farthestpointsamplingLauncher(int b,int n,int m,const float * inp,float * temp,int * out);
class FarthestPointSampleGpuOp: public OpKernel{
  public:
    explicit FarthestPointSampleGpuOp(OpKernelConstruction* context):OpKernel(context) {
                    OP_REQUIRES_OK(context, context->GetAttr("npoint", &npoint_));
                    OP_REQUIRES(context, npoint_ > 0, errors::InvalidArgument("FarthestPointSample expects positive npoint"));
                }
    void Compute(OpKernelContext * context)override{
      int m = npoint_;

      const Tensor& inp_tensor=context->input(0);
      OP_REQUIRES(context,inp_tensor.dims()==3 && inp_tensor.shape().dim_size(2)==3,errors::InvalidArgument("FarthestPointSample expects (batch_size,num_points,3) inp shape"));
      int b=inp_tensor.shape().dim_size(0);
      int n=inp_tensor.shape().dim_size(1);
      auto inp_flat=inp_tensor.flat<float>();
      const float * inp=&(inp_flat(0));
      Tensor * out_tensor;
      OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,m},&out_tensor));
      auto out_flat=out_tensor->flat<int>();
      int * out=&(out_flat(0));
      Tensor temp_tensor;
      OP_REQUIRES_OK(context,context->allocate_temp(DataTypeToEnum<float>::value,TensorShape{32,n},&temp_tensor));
      auto temp_flat=temp_tensor.flat<float>();
      float * temp=&(temp_flat(0));
      farthestpointsamplingLauncher(b,n,m,inp,temp,out);
    }
    private:
        int npoint_;
};
REGISTER_KERNEL_BUILDER(Name("FarthestPointSample").Device(DEVICE_GPU),FarthestPointSampleGpuOp);

void gatherpointLauncher(int b,int n,int m,const float * inp,const int * idx,float * out);
class GatherPointGpuOp: public OpKernel{
  public:
    explicit GatherPointGpuOp(OpKernelConstruction * context):OpKernel(context){}
    void Compute(OpKernelContext * context)override{
      const Tensor& inp_tensor=context->input(0);
      OP_REQUIRES(context,inp_tensor.dims()==3 && inp_tensor.shape().dim_size(2)==3,errors::InvalidArgument("GatherPoint expects (batch_size,num_points,3) inp shape"));
      int b=inp_tensor.shape().dim_size(0);
      int n=inp_tensor.shape().dim_size(1);
      const Tensor& idx_tensor=context->input(1);
      OP_REQUIRES(context,idx_tensor.dims()==2 && idx_tensor.shape().dim_size(0)==b,errors::InvalidArgument("GatherPoint expects (batch_size,num_result) idx shape"));
      int m=idx_tensor.shape().dim_size(1);
      auto inp_flat=inp_tensor.flat<float>();
      const float * inp=&(inp_flat(0));
      auto idx_flat=idx_tensor.flat<int>();
      const int * idx=&(idx_flat(0));
      Tensor * out_tensor=NULL;
      OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,m,3},&out_tensor));
      auto out_flat=out_tensor->flat<float>();
      float * out=&(out_flat(0));
      gatherpointLauncher(b,n,m,inp,idx,out);
    }
};
REGISTER_KERNEL_BUILDER(Name("GatherPoint").Device(DEVICE_GPU),GatherPointGpuOp);

void scatteraddpointLauncher(int b,int n,int m,const float * out_g,const int * idx,float * inp_g);
class GatherPointGradGpuOp: public OpKernel{
  public:
    explicit GatherPointGradGpuOp(OpKernelConstruction * context):OpKernel(context){}
    void Compute(OpKernelContext * context)override{
      const Tensor& inp_tensor=context->input(0);
      OP_REQUIRES(context,inp_tensor.dims()==3 && inp_tensor.shape().dim_size(2)==3,errors::InvalidArgument("GatherPointGradGpuOp expects (batch_size,num_points,3) inp"));
      int b=inp_tensor.shape().dim_size(0);
      int n=inp_tensor.shape().dim_size(1);
      const Tensor& idx_tensor=context->input(1);
      OP_REQUIRES(context,idx_tensor.dims()==2 && idx_tensor.shape().dim_size(0)==b,errors::InvalidArgument("GatherPointGradGpuOp expects (batch_size,num_result) idx shape"));
      int m=idx_tensor.shape().dim_size(1);
      auto inp_flat=inp_tensor.flat<float>();
      const float * inp=&(inp_flat(0));
      auto idx_flat=idx_tensor.flat<int>();
      const int * idx=&(idx_flat(0));
      const Tensor& out_g_tensor=context->input(2);
      OP_REQUIRES(context,out_g_tensor.dims()==3 && out_g_tensor.shape().dim_size(0)==b && out_g_tensor.shape().dim_size(1)==m && out_g_tensor.shape().dim_size(2)==3,errors::InvalidArgument("GatherPointGradGpuOp expects (batch_size,num_result,3) out_g shape"));
      auto out_g_flat=out_g_tensor.flat<float>();
      const float * out_g=&(out_g_flat(0));
      Tensor * inp_g_tensor=NULL;
      OP_REQUIRES_OK(context,context->allocate_output(0,TensorShape{b,n,3},&inp_g_tensor));
      auto inp_g_flat=inp_g_tensor->flat<float>();
      float * inp_g=&(inp_g_flat(0));
      cudaMemset(inp_g,0,b*n*3*4);
      scatteraddpointLauncher(b,n,m,out_g,idx,inp_g);
    }
};
REGISTER_KERNEL_BUILDER(Name("GatherPointGrad").Device(DEVICE_GPU),GatherPointGradGpuOp);



================================================
FILE: code/tf_ops/sampling/tf_sampling.py
================================================
''' Furthest point sampling
Original author: Haoqiang Fan
Modified by Charles R. Qi
All Rights Reserved. 2017. 
'''
import tensorflow as tf
from tensorflow.python.framework import ops
import sys
import os
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(BASE_DIR)
sampling_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_sampling_so.so'))
def prob_sample(inp,inpr):
    '''
input:
    batch_size * ncategory float32
    batch_size * npoints   float32
returns:
    batch_size * npoints   int32
    '''
    return sampling_module.prob_sample(inp,inpr)
ops.NoGradient('ProbSample')
# TF1.0 API requires set shape in C++
#@tf.RegisterShape('ProbSample')
#def _prob_sample_shape(op):
#    shape1=op.inputs[0].get_shape().with_rank(2)
#    shape2=op.inputs[1].get_shape().with_rank(2)
#    return [tf.TensorShape([shape2.dims[0],shape2.dims[1]])]
def gather_point(inp,idx):
    '''
input:
    batch_size * ndataset * 3   float32
    batch_size * npoints        int32
returns:
    batch_size * npoints * 3    float32
    '''
    return sampling_module.gather_point(inp,idx)
#@tf.RegisterShape('GatherPoint')
#def _gather_point_shape(op):
#    shape1=op.inputs[0].get_shape().with_rank(3)
#    shape2=op.inputs[1].get_shape().with_rank(2)
#    return [tf.TensorShape([shape1.dims[0],shape2.dims[1],shape1.dims[2]])]
@tf.RegisterGradient('GatherPoint')
def _gather_point_grad(op,out_g):
    inp=op.inputs[0]
    idx=op.inputs[1]
    return [sampling_module.gather_point_grad(inp,idx,out_g),None]
def farthest_point_sample(npoint,inp):
    '''
input:
    int32
    batch_size * ndataset * 3   float32
returns:
    batch_size * npoint         int32
    '''
    return sampling_module.farthest_point_sample(inp, npoint)
ops.NoGradient('FarthestPointSample')
    

if __name__=='__main__':
    import numpy as np
    np.random.seed(100)
    triangles=np.random.rand(1,5,3,3).astype('float32')
    with tf.device('/gpu:1'):
        inp=tf.constant(triangles)
        tria=inp[:,:,0,:]
        trib=inp[:,:,1,:]
        tric=inp[:,:,2,:]
        areas=tf.sqrt(tf.reduce_sum(tf.cross(trib-tria,tric-tria)**2,2)+1e-9)
        randomnumbers=tf.random_uniform((1,8192))
        triids=prob_sample(areas,randomnumbers)
        tria_sample=gather_point(tria,triids)
        trib_sample=gather_point(trib,triids)
        tric_sample=gather_point(tric,triids)
        us=tf.random_uniform((1,8192))
        vs=tf.random_uniform((1,8192))
        uplusv=1-tf.abs(us+vs-1)
        uminusv=us-vs
        us=(uplusv+uminusv)*0.5
        vs=(uplusv-uminusv)*0.5
        pt_sample=tria_sample+(trib_sample-tria_sample)*tf.expand_dims(us,-1)+(tric_sample-tria_sample)*tf.expand_dims(vs,-1)
        print 'pt_sample: ', pt_sample
        reduced_sample=gather_point(pt_sample,farthest_point_sample(1024,pt_sample))
        print reduced_sample
    with tf.Session('') as sess:
        ret=sess.run(reduced_sample)
    print ret.shape,ret.dtype
    import cPickle as pickle
    pickle.dump(ret,open('1.pkl','wb'),-1)


================================================
FILE: code/tf_ops/sampling/tf_sampling_compile.sh
================================================
#/bin/bash
/usr/local/cuda-8.0/bin/nvcc tf_sampling_g.cu -o tf_sampling_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC -I /home/ganeshiyer/tensorflow/lib/python2.7/site-packages/tensorflow/include  -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=0


================================================
FILE: code/tf_ops/sampling/tf_sampling_g.cu
================================================
/* Furthest point sampling GPU implementation
 * Original author: Haoqiang Fan
 * Modified by Charles R. Qi
 * All Rights Reserved. 2017. 
 */

__global__ void cumsumKernel(int b,int n,const float * __restrict__ inp,float * __restrict__ out){
  const int BlockSize=2048;
  const int paddingLevel=5;
  __shared__ float buffer4[BlockSize*4];
  __shared__ float buffer[BlockSize+(BlockSize>>paddingLevel)];
  for (int i=blockIdx.x;i<b;i+=gridDim.x){
    float runningsum=0,runningsum2=0;
    for (int j=0;j<n;j+=BlockSize*4){
      int n24_i=min(n-j,BlockSize*4);
      int n24=(n24_i+3)&~3;
      int n2=n24>>2;
      for (int k=threadIdx.x*4;k<n24_i;k+=blockDim.x*4){
        if (k+3<n24_i){
          float v1=inp[i*n+j+k];
          float v2=inp[i*n+j+k+1];
          v2+=v1;
          float v3=inp[i*n+j+k+2];
          float v4=inp[i*n+j+k+3];
          v4+=v3;
          v3+=v2;
          v4+=v2;
          buffer4[k]=v1;
          buffer4[k+1]=v2;
          buffer4[k+2]=v3;
          buffer4[k+3]=v4;
          buffer[(k>>2)+(k>>(2+paddingLevel))]=v4;
        }else{
          float v=0;
          for (int k2=k;k2<n24_i;k2++){
            v+=inp[i*n+j+k2];
            buffer4[k2]=v;
          }
          for (int k2=n24_i;k2<n24;k2++){
            buffer4[k2]=v;
          }
          buffer[(k>>2)+(k>>(2+paddingLevel))]=v;
        }
      }
      int u=0;
      for (;(2<<u)<=n2;u++){
        __syncthreads();
        for (int k=threadIdx.x;k<int(n2>>(u+1));k+=blockDim.x){
          int i1=(((k<<1)+2)<<u)-1;
          int i2=(((k<<1)+1)<<u)-1;
          i1+=i1>>paddingLevel;
          i2+=i2>>paddingLevel;
          buffer[i1]+=buffer[i2];
        }
      }
      u--;
      for (;u>=0;u--){
        __syncthreads();
        for (int k=threadIdx.x;k<int((n2-(1<<u))>>(u+1));k+=blockDim.x){
          int i1=(((k<<1)+3)<<u)-1;
          int i2=(((k<<1)+2)<<u)-1;
          i1+=i1>>paddingLevel;
          i2+=i2>>paddingLevel;
          buffer[i1]+=buffer[i2];
        }
      }
      __syncthreads();
      for (int k=threadIdx.x*4;k<n24;k+=blockDim.x*4){
        if (k!=0){
          int k2=((k>>2)-1)+(((k>>2)-1)>>paddingLevel);
          buffer4[k]+=buffer[k2];
          buffer4[k+1]+=buffer[k2];
          buffer4[k+2]+=buffer[k2];
          buffer4[k+3]+=buffer[k2];
        }
      }
      __syncthreads();
      for (int k=threadIdx.x;k<n24_i;k+=blockDim.x){
        out[i*n+j+k]=buffer4[k]+runningsum;
      }
      float t=buffer[(n2-1)+((n2-1)>>paddingLevel)]+runningsum2;
      float r2=runningsum+t;
      runningsum2=t-(r2-runningsum);
      runningsum=r2;
      __syncthreads();
    }
  }
}

__global__ void binarysearchKernel(int b,int n,int m,const float * __restrict__ dataset,const float * __restrict__ query, int * __restrict__ result){
  int base=1;
  while (base<n)
    base<<=1;
  for (int i=blockIdx.x;i<b;i+=gridDim.x){
    for (int j=blockIdx.y*blockDim.x+threadIdx.x;j<m;j+=blockDim.x*gridDim.y){
      float q=query[i*m+j]*dataset[i*n+n-1];
      int r=n-1;
      for (int k=base;k>=1;k>>=1)
        if (r>=k && dataset[i*n+r-k]>=q)
          r-=k;
      result[i*m+j]=r;
    }
  }
}
__global__ void farthestpointsamplingKernel(int b,int n,int m,const float * __restrict__ dataset,float * __restrict__ temp,int * __restrict__ idxs){
  if (m<=0)
    return;
  const int BlockSize=512;
  __shared__ float dists[BlockSize];
  __shared__ int dists_i[BlockSize];
  const int BufferSize=3072;
  __shared__ float buf[BufferSize*3];
  for (int i=blockIdx.x;i<b;i+=gridDim.x){
    int old=0;
    if (threadIdx.x==0)
      idxs[i*m+0]=old;
    for (int j=threadIdx.x;j<n;j+=blockDim.x){
      temp[blockIdx.x*n+j]=1e38;
    }
    for (int j=threadIdx.x;j<min(BufferSize,n)*3;j+=blockDim.x){
      buf[j]=dataset[i*n*3+j];
    }
    __syncthreads();
    for (int j=1;j<m;j++){
      int besti=0;
      float best=-1;
      float x1=dataset[i*n*3+old*3+0];
      float y1=dataset[i*n*3+old*3+1];
      float z1=dataset[i*n*3+old*3+2];
      for (int k=threadIdx.x;k<n;k+=blockDim.x){
        float td=temp[blockIdx.x*n+k];
        float x2,y2,z2;
        if (k<BufferSize){
          x2=buf[k*3+0];
          y2=buf[k*3+1];
          z2=buf[k*3+2];
        }else{
          x2=dataset[i*n*3+k*3+0];
          y2=dataset[i*n*3+k*3+1];
          z2=dataset[i*n*3+k*3+2];
        }
        float d=(x2-x1)*(x2-x1)+(y2-y1)*(y2-y1)+(z2-z1)*(z2-z1);
        float d2=min(d,td);
        if (d2!=td)
          temp[blockIdx.x*n+k]=d2;
        if (d2>best){
          best=d2;
          besti=k;
        }
      }
      dists[threadIdx.x]=best;
      dists_i[threadIdx.x]=besti;
      for (int u=0;(1<<u)<blockDim.x;u++){
        __syncthreads();
        if (threadIdx.x<(blockDim.x>>(u+1))){
          int i1=(threadIdx.x*2)<<u;
          int i2=(threadIdx.x*2+1)<<u;
          if (dists[i1]<dists[i2]){
            dists[i1]=dists[i2];
            dists_i[i1]=dists_i[i2];
          }
        }
      }
      __syncthreads();
      old=dists_i[0];
      if (threadIdx.x==0)
        idxs[i*m+j]=old;
    }
  }
}

__global__ void gatherpointKernel(int b,int n,int m,const float * __restrict__ inp,const int * __restrict__ idx,float * __restrict__ out){
  for (int i=blockIdx.x;i<b;i+=gridDim.x){
    for (int j=blockIdx.y*blockDim.x+threadIdx.x;j<m;j+=blockDim.x*gridDim.y){
      int a=idx[i*m+j];
      out[(i*m+j)*3+0]=inp[(i*n+a)*3+0];
      out[(i*m+j)*3+1]=inp[(i*n+a)*3+1];
      out[(i*m+j)*3+2]=inp[(i*n+a)*3+2];
    }
  }
}

__global__ void scatteraddpointKernel(int b,int n,int m,const float * __restrict__ out_g,const int * __restrict__ idx,float * __restrict__ inp_g){
  for (int i=blockIdx.x;i<b;i+=gridDim.x){
    for (int j=blockIdx.y*blockDim.x+threadIdx.x;j<m;j+=blockDim.x*gridDim.y){
      int a=idx[i*m+j];
      atomicAdd(&inp_g[(i*n+a)*3+0],out_g[(i*m+j)*3+0]);
      atomicAdd(&inp_g[(i*n+a)*3+1],out_g[(i*m+j)*3+1]);
      atomicAdd(&inp_g[(i*n+a)*3+2],out_g[(i*m+j)*3+2]);
    }
  }
}

void cumsumLauncher(int b,int n,const float * inp,float * out){
  cumsumKernel<<<32,512>>>(b,n,inp,out);
}
//require b*n working space
void probsampleLauncher(int b,int n,int m,const float * inp_p,const float * inp_r,float * temp,int * out){
  cumsumKernel<<<32,512>>>(b,n,inp_p,temp);
  binarysearchKernel<<<dim3(32,8,1),512>>>(b,n,m,temp,inp_r,out);
}
//require 32*n working space
void farthestpointsamplingLauncher(int b,int n,int m,const float * inp,float * temp,int * out){
  farthestpointsamplingKernel<<<32,512>>>(b,n,m,inp,temp,out);
}
void gatherpointLauncher(int b,int n,int m,const float * inp,const int * idx,float * out){
  gatherpointKernel<<<dim3(2,8,1),512>>>(b,n,m,inp,idx,out);
}
void scatteraddpointLauncher(int b,int n,int m,const float * out_g,const int * idx,float * inp_g){
  scatteraddpointKernel<<<dim3(2,8,1),512>>>(b,n,m,out_g,idx,inp_g);
}



================================================
FILE: code/train_model_combined.py
================================================
import numpy as np
import tensorflow as tf
import scipy.misc as smc

import config_res as config

from common.cnn_utils_res import *
from common import resnet_rgb_model as model
from common import resnet_depth_model as model_depth
from common import all_transformer as at3
from common import global_agg_net
from common.Lie_functions import exponential_map_single

import nw_loader_color as ldr
import model_utils


_BETA_CONST = 1.0
_ALPHA_CONST = 1.0
IMG_HT = config.depth_img_params['IMG_HT']
IMG_WDT = config.depth_img_params['IMG_WDT']
batch_size = config.net_params['batch_size']
learning_rate = config.net_params['learning_rate']
n_epochs = config.net_params['epochs']
current_epoch = config.net_params['load_epoch']

tf.reset_default_graph()

X1 = tf.placeholder(tf.float32, shape = (batch_size, IMG_HT, IMG_WDT, 3), name = "X1")
X2 = tf.placeholder(tf.float32, shape = (batch_size, IMG_HT, IMG_WDT, 1), name = "X2")
depth_maps_target = tf.placeholder(tf.float32, shape = (batch_size, IMG_HT, IMG_WDT, 1), name = "depth_maps_target")
expected_transforms = tf.placeholder(tf.float32, shape = (batch_size, 4, 4), name = "expected_transforms")

phase = tf.placeholder(tf.bool, [], name = "phase")
phase_rgb = tf.placeholder(tf.bool, [], name = "phase_rgb")
keep_prob = tf.placeholder(tf.float32, name = "keep_prob")

fx = config.camera_params['fx']
fy = config.camera_params['fy']
cx = config.camera_params['cx']
cy = config.camera_params['cy']

fx_scaled = 2*(fx)/np.float32(IMG_WDT)              # focal length x scaled for -1 to 1 range
fy_scaled = 2*(fy)/np.float32(IMG_HT)               # focal length y scaled for -1 to 1 range
cx_scaled = -1 + 2*(cx - 1.0)/np.float32(IMG_WDT)   # optical center x scaled for -1 to 1 range
cy_scaled = -1 + 2*(cy - 1.0)/np.float32(IMG_HT)    # optical center y scaled for -1 to 1 range

K_mat_scaled = np.array([[fx_scaled,  0.0, cx_scaled],
                         [0.0, fy_scaled,  cy_scaled],
                         [0.0, 0.0, 1.0]], dtype = np.float32)

K_final = tf.constant(K_mat_scaled, dtype = tf.float32)
small_transform = tf.constant(config.camera_params['cam_transform_02_inv'], dtype = tf.float32)


X2_pooled = tf.nn.max_pool(X2, ksize=[1,5,5,1], strides=[1,1,1,1], padding="SAME")
depth_maps_target_pooled = tf.nn.max_pool(depth_maps_target, ksize=[1,5,5,1], strides=[1,1,1,1], padding="SAME")

output_vectors, weight_summaries = global_agg_net.End_Net_Out(X1, phase_rgb, X2_pooled, phase, keep_prob)

# se(3) -> SE(3) for the whole batch
predicted_transforms = tf.map_fn(lambda x:exponential_map_single(output_vectors[x]), elems=tf.range(0, batch_size, 1), dtype=tf.float32)

# transforms depth maps by the predicted transformation
depth_maps_predicted, cloud_pred = tf.map_fn(lambda x:at3._simple_transformer(X2_pooled[x,:,:,0]*40.0 + 40.0, predicted_transforms[x], K_final, small_transform), elems = tf.range(0, batch_size, 1), dtype = (tf.float32, tf.float32))

# transforms depth maps by the expected transformation
depth_maps_expected, cloud_exp = tf.map_fn(lambda x:at3._simple_transformer(X2_pooled[x,:,:,0]*40.0 + 40.0, expected_transforms[x], K_final, small_transform), elems = tf.range(0, batch_size, 1), dtype = (tf.float32, tf.float32))

# photometric loss between predicted and expected transformation
photometric_loss = tf.nn.l2_loss(tf.subtract((depth_maps_expected[:,10:-10,10:-10] - 40.0)/40.0, (depth_maps_predicted[:,10:-10,10:-10] - 40.0)/40.0))

# earth mover's distance between point clouds
cloud_loss = model_utils.get_emd_loss(cloud_pred, cloud_exp)

# final loss term
predicted_loss_train = _ALPHA_CONST*photometric_loss + _BETA_CONST*cloud_loss

tf.add_to_collection('losses1', predicted_loss_train)
loss1 = tf.add_n(tf.get_collection('losses1'))

train_step = tf.train.AdamOptimizer(learning_rate = config.net_params['learning_rate'],
                                    beta1 = config.net_params['beta1']).minimize(predicted_loss_train)

predicted_loss_validation = tf.nn.l2_loss(tf.subtract((depth_maps_expected[:,10:-10,10:-10] - 40.0)/40.0, (depth_maps_predicted[:,10:-10,10:-10] - 40.0)/40.0))

cloud_loss_validation = model_utils.get_emd_loss(cloud_pred, cloud_exp)

training_summary_1 = tf.summary.scalar('cloud_loss', _BETA_CONST*cloud_loss)
training_summary_2 = tf.summary.scalar('photometric_loss', photometric_loss)
validation_summary_1 = tf.summary.scalar('Validation_loss', predicted_loss_validation)
validation_summary_2 = tf.summary.scalar('Validation_cloud_loss', cloud_loss_validation)

merge_train = tf.summary.merge([training_summary_1] + [training_summary_2] + weight_summaries)
merge_val = tf.summary.merge([validation_summary_1] + [validation_summary_2])

saver = tf.train.Saver()

# tensorflow gpu configuration. Not to be confused with network configuration file

config_tf = tf.ConfigProto()
config_tf.gpu_options.allow_growth=True

with tf.Session(config = config_tf) as sess:
    sess.run(tf.global_variables_initializer())

    writer = tf.summary.FileWriter("./logs_simple_transformer/")

    total_iterations_train = 0
    total_iterations_validate = 0

    if(current_epoch == 0):
        writer.add_graph(sess.graph)

    checkpoint_path = config.paths['checkpoint_path']

    if(current_epoch > 0):
        print("Restoring Checkpoint")

        saver.restore(sess, checkpoint_path + "/model-%d"%current_epoch)
        current_epoch+=1
        total_iterations_train = current_epoch*config.net_params['total_frames_train']/batch_size
        total_iterations_validate = current_epoch*config.net_params['total_frames_validation']/batch_size

    for epoch in range(current_epoch, n_epochs):
        total_partitions_train = config.net_params['total_frames_train']/config.net_params['partition_limit']
        total_partitions_validation = config.net_params['total_frames_validation']/config.net_params['partition_limit']
        ldr.shuffle()

        for part in range(total_partitions_train):
            source_container, target_container, source_img_container, target_img_container, transforms_container = ldr.load(part, mode = "train")

            for source_b, target_b, source_img_b, target_img_b, transforms_b in zip(source_container, target_container, source_img_container, target_img_container, transforms_container):

                outputs= sess.run([depth_maps_predicted, depth_maps_expected, predicted_loss_train, X2_pooled, train_step, merge_train, predicted_transforms, cloud_loss, photometric_loss, loss1], feed_dict={X1: source_img_b, X2: source_b, depth_maps_target: target_b, expected_transforms: transforms_b ,phase:True, keep_prob:0.5, phase_rgb: False})

                dmaps_pred = outputs[0]
                dmaps_exp = outputs[1]
                loss = outputs[2]
                source = outputs[3]

                if(total_iterations_train%10 == 0):
                    writer.add_summary(outputs[5], total_iterations_train/10)

                print(outputs[8], _ALPHA_CONST*outputs[8], outputs[7], _BETA_CONST*outputs[7], outputs[9],total_iterations_train)

                random_disp = np.random.randint(batch_size)
                print(outputs[6][random_disp])
                print(transforms_b[random_disp])

                if(total_iterations_train%125 == 0):

                    smc.imsave(config.paths['training_imgs_path'] + "/training_save_%d.png"%total_iterations_train, np.vstack((source[random_disp,:,:,0]*40.0 + 40.0, dmaps_pred[random_disp], dmaps_exp[random_disp])))

                total_iterations_train+=1

        if (epoch%1 == 0):
            print("Saving after epoch %d"%epoch)
            saver.save(sess, checkpoint_path + "/model-%d"%epoch)

        for part in range(total_partitions_validation):
            source_container, target_container, source_img_container, target_img_container, transforms_container = ldr.load(part, mode = "validation")

            for source_b, target_b, source_img_b, target_img_b, transforms_b in zip(source_container, target_container, source_img_container, target_img_container, transforms_container):

                outputs= sess.run([depth_maps_predicted, depth_maps_expected, predicted_loss_validation, X2_pooled, merge_val, cloud_loss_validation], feed_dict={X1: source_img_b, X2: source_b, depth_maps_target: target_b, expected_transforms: transforms_b ,phase:False, keep_prob:1.0, phase_rgb: False})

                dmaps_pred = outputs[0]
                dmaps_exp = outputs[1]
                loss = outputs[2]
                source = outputs[3]

                writer.add_summary(outputs[4], total_iterations_validate)
                total_iterations_validate+=1

                print(loss, total_iterations_validate, outputs[5])

                if(total_iterations_validate%25 == 0):

                    random_disp = np.random.randint(batch_size)

                    smc.imsave(config.paths['validation_imgs_path'] + "/validation_save_%d.png"%total_iterations_validate, np.vstack((source[random_disp,:,:,0]*40.0 + 40.0, dmaps_pred[random_disp], dmaps_exp[random_disp])))


================================================
FILE: code/utils/__init__.py
================================================


================================================
FILE: code/utils/data_prep_util.py
================================================
import os
import sys
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(BASE_DIR)
from plyfile import (PlyData, PlyElement, make2d, PlyParseError, PlyProperty)
import numpy as np
import h5py

SAMPLING_BIN = os.path.join(BASE_DIR, 'third_party/mesh_sampling/build/pcsample')

SAMPLING_POINT_NUM = 2048
SAMPLING_LEAF_SIZE = 0.005

MODELNET40_PATH = '../datasets/modelnet40'
def export_ply(pc, filename):
	vertex = np.zeros(pc.shape[0], dtype=[('x', 'f4'), ('y', 'f4'), ('z', 'f4')])
	for i in range(pc.shape[0]):
		vertex[i] = (pc[i][0], pc[i][1], pc[i][2])
	ply_out = PlyData([PlyElement.describe(vertex, 'vertex', comments=['vertices'])])
	ply_out.write(filename)

# Sample points on the obj shape
def get_sampling_command(obj_filename, ply_filename):
    cmd = SAMPLING_BIN + ' ' + obj_filename
    cmd += ' ' + ply_filename
    cmd += ' -n_samples %d ' % SAMPLING_POINT_NUM
    cmd += ' -leaf_size %f ' % SAMPLING_LEAF_SIZE
    return cmd

# --------------------------------------------------------------
# Following are the helper functions to load MODELNET40 shapes
# --------------------------------------------------------------

# Read in the list of categories in MODELNET40
def get_category_names():
    shape_names_file = os.path.join(MODELNET40_PATH, 'shape_names.txt')
    shape_names = [line.rstrip() for line in open(shape_names_file)]
    return shape_names

# Return all the filepaths for the shapes in MODELNET40 
def get_obj_filenames():
    obj_filelist_file = os.path.join(MODELNET40_PATH, 'filelist.txt')
    obj_filenames = [os.path.join(MODELNET40_PATH, line.rstrip()) for line in open(obj_filelist_file)]
    print('Got %d obj files in modelnet40.' % len(obj_filenames))
    return obj_filenames

# Helper function to create the father folder and all subdir folders if not exist
def batch_mkdir(output_folder, subdir_list):
    if not os.path.exists(output_folder):
        os.mkdir(output_folder)
    for subdir in subdir_list:
        if not os.path.exists(os.path.join(output_folder, subdir)):
            os.mkdir(os.path.join(output_folder, subdir))

# ----------------------------------------------------------------
# Following are the helper functions to load save/load HDF5 files
# ----------------------------------------------------------------

# Write numpy array data and label to h5_filename
def save_h5_data_label_normal(h5_filename, data, label, normal, 
		data_dtype='float32', label_dtype='uint8', noral_dtype='float32'):
    h5_fout = h5py.File(h5_filename)
    h5_fout.create_dataset(
            'data', data=data,
            compression='gzip', compression_opts=4,
            dtype=data_dtype)
    h5_fout.create_dataset(
            'normal', data=normal,
            compression='gzip', compression_opts=4,
            dtype=normal_dtype)
    h5_fout.create_dataset(
            'label', data=label,
            compression='gzip', compression_opts=1,
            dtype=label_dtype)
    h5_fout.close()


# Write numpy array data and label to h5_filename
def save_h5(h5_filename, data, label, data_dtype='uint8', label_dtype='uint8'):
    h5_fout = h5py.File(h5_filename)
    h5_fout.create_dataset(
            'data', data=data,
            compression='gzip', compression_opts=4,
            dtype=data_dtype)
    h5_fout.create_dataset(
            'label', data=label,
            compression='gzip', compression_opts=1,
            dtype=label_dtype)
    h5_fout.close()

# Read numpy array data and label from h5_filename
def load_h5_data_label_normal(h5_filename):
    f = h5py.File(h5_filename)
    data = f['data'][:]
    label = f['label'][:]
    normal = f['normal'][:]
    return (data, label, normal)

# Read numpy array data and label from h5_filename
def load_h5_data_label_seg(h5_filename):
    f = h5py.File(h5_filename)
    data = f['data'][:]
    label = f['label'][:]
    seg = f['pid'][:]
    return (data, label, seg)

# Read numpy array data and label from h5_filename
def load_h5(h5_filename):
    f = h5py.File(h5_filename)
    data = f['data'][:]
    label = f['label'][:]
    return (data, label)

# ----------------------------------------------------------------
# Following are the helper functions to load save/load PLY files
# ----------------------------------------------------------------

# Load PLY file
def load_ply_data(filename, point_num):
    plydata = PlyData.read(filename)
    pc = plydata['vertex'].data[:point_num]
    pc_array = np.array([[x, y, z] for x,y,z in pc])
    return pc_array

# Load PLY file
def load_ply_normal(filename, point_num):
    plydata = PlyData.read(filename)
    pc = plydata['normal'].data[:point_num]
    pc_array = np.array([[x, y, z] for x,y,z in pc])
    return pc_array

# Make up rows for Nxk array
# Input Pad is 'edge' or 'constant'
def pad_arr_rows(arr, row, pad='edge'):
    assert(len(arr.shape) == 2)
    assert(arr.shape[0] <= row)
    assert(pad == 'edge' or pad == 'constant')
    if arr.shape[0] == row:
        return arr
    if pad == 'edge':
        return np.lib.pad(arr, ((0, row-arr.shape[0]), (0, 0)), 'edge')
    if pad == 'constant':
        return np.lib.pad(arr, ((0, row-arr.shape[0]), (0, 0)), 'constant', (0, 0))




================================================
FILE: code/utils/eulerangles.py
================================================
# emacs: -*- mode: python-mode; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
#
#   See COPYING file distributed along with the NiBabel package for the
#   copyright and license terms.
#
### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ### ##
''' Module implementing Euler angle rotations and their conversions

See:

* http://en.wikipedia.org/wiki/Rotation_matrix
* http://en.wikipedia.org/wiki/Euler_angles
* http://mathworld.wolfram.com/EulerAngles.html

See also: *Representing Attitude with Euler Angles and Quaternions: A
Reference* (2006) by James Diebel. A cached PDF link last found here:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.5134

Euler's rotation theorem tells us that any rotation in 3D can be
described by 3 angles.  Let's call the 3 angles the *Euler angle vector*
and call the angles in the vector :math:`alpha`, :math:`beta` and
:math:`gamma`.  The vector is [ :math:`alpha`,
:math:`beta`. :math:`gamma` ] and, in this description, the order of the
parameters specifies the order in which the rotations occur (so the
rotation corresponding to :math:`a
Download .txt
gitextract_1fl8qxli/

├── LICENSE
├── README.md
├── code/
│   ├── common/
│   │   ├── Lie_functions.py
│   │   ├── __init__.py
│   │   ├── all_transformer.py
│   │   ├── cnn_utils_res.py
│   │   ├── global_agg_net.py
│   │   ├── resnet_depth_model.py
│   │   └── resnet_rgb_model.py
│   ├── config_res.py
│   ├── dataset_files/
│   │   ├── dataset_build_color.py
│   │   ├── dataset_build_color_2.py
│   │   ├── dataset_builder_parallel.sh
│   │   └── parser.py
│   ├── model_utils.py
│   ├── nw_loader_color.py
│   ├── tf_ops/
│   │   ├── CD/
│   │   │   ├── __init__.py
│   │   │   ├── makefile
│   │   │   ├── render_balls_so.cpp
│   │   │   ├── tf_nndistance.cpp
│   │   │   ├── tf_nndistance.py
│   │   │   ├── tf_nndistance_g.cu
│   │   │   └── tf_nndistance_g.cu.o
│   │   ├── __init__.py
│   │   ├── emd/
│   │   │   ├── __init__.py
│   │   │   ├── tf_auctionmatch.cpp
│   │   │   ├── tf_auctionmatch.py
│   │   │   ├── tf_auctionmatch_compile.sh
│   │   │   ├── tf_auctionmatch_g.cu
│   │   │   └── tf_auctionmatch_g.cu.o
│   │   ├── grouping/
│   │   │   ├── __init__.py
│   │   │   ├── compile.sh
│   │   │   ├── query_ball_point.cpp
│   │   │   ├── query_ball_point.cu
│   │   │   ├── query_ball_point_block.cu
│   │   │   ├── query_ball_point_grid.cu
│   │   │   ├── selection_sort
│   │   │   ├── selection_sort.cpp
│   │   │   ├── selection_sort.cu
│   │   │   ├── selection_sort_const.cu
│   │   │   ├── test_knn.py
│   │   │   ├── tf_grouping.cpp
│   │   │   ├── tf_grouping.py
│   │   │   ├── tf_grouping_compile.sh
│   │   │   ├── tf_grouping_g.cu
│   │   │   ├── tf_grouping_g.cu.o
│   │   │   └── tf_grouping_op_test.py
│   │   ├── interpolation/
│   │   │   ├── __init__.py
│   │   │   ├── interpolate.cpp
│   │   │   ├── tf_interpolate.cpp
│   │   │   ├── tf_interpolate.py
│   │   │   ├── tf_interpolate_compile.sh
│   │   │   ├── tf_interpolate_op_test.py
│   │   │   └── visu_interpolation.py
│   │   └── sampling/
│   │       ├── __init__.py
│   │       ├── tf_sampling.cpp
│   │       ├── tf_sampling.py
│   │       ├── tf_sampling_compile.sh
│   │       ├── tf_sampling_g.cu
│   │       └── tf_sampling_g.cu.o
│   ├── train_model_combined.py
│   └── utils/
│       ├── __init__.py
│       ├── data_prep_util.py
│       ├── eulerangles.py
│       ├── modelnet_data_prep.py
│       ├── off2obj.py
│       ├── pc_util.py
│       ├── plyfile.py
│       ├── pointnet_util.py
│       ├── provider.py
│       ├── show3d.py
│       ├── tf_util.py
│       ├── tf_util2.py
│       └── write_result2html.py
└── docs/
    ├── .nojekyll
    ├── _site/
    │   ├── css/
    │   │   ├── main.css
    │   │   ├── project.css
    │   │   └── project_file.css
    │   └── index.html
    ├── css/
    │   ├── main.css
    │   ├── project.css
    │   └── project_file.css
    └── index.html
Download .txt
SYMBOL INDEX (303 symbols across 39 files)

FILE: code/common/Lie_functions.py
  function for_translation (line 4) | def for_translation(Old_transform,t):
  function for_rotation (line 12) | def for_rotation(Old_transform):
  function exponential_map_single (line 21) | def exponential_map_single(vec):
  function transforms_mul (line 59) | def transforms_mul(T1, T2):

FILE: code/common/all_transformer.py
  function _simple_transformer (line 14) | def _simple_transformer(depth_map, t_mat, k_mat, small_transform):
  function sparsify_cloud (line 24) | def sparsify_cloud(S):
  function _3D_meshgrid_batchwise_diff (line 41) | def _3D_meshgrid_batchwise_diff(height, width, depth_img, num_batch, tra...
  function reverse_all (line 105) | def reverse_all(z):
  function get_pixel_value (line 117) | def get_pixel_value(img, x, y):
  function _bilinear_sampling (line 140) | def _bilinear_sampling(img, x_func, y_func):

FILE: code/common/cnn_utils_res.py
  function weight_variable (line 4) | def weight_variable(shape, str):
  function weight_variable_fc (line 17) | def weight_variable_fc(shape, str):
  function bias_variable (line 30) | def bias_variable(shape, str):
  function init_weights (line 36) | def init_weights(W, str, to_train):
  function init_bias (line 41) | def init_bias(B, layerno, to_train):
  function conv2d_batchnorm (line 46) | def conv2d_batchnorm(x, W, name, phase, beta_r, gamma_r, mean_r, varianc...
  function conv2d_batchnorm_init (line 62) | def conv2d_batchnorm_init(x, W, name, phase, stride = [1,1,1,1], relu = ...
  function conv2d_init (line 73) | def conv2d_init(x, W, name, phase, stride, padding):
  function conv2d_bias_init (line 81) | def conv2d_bias_init(x, W, b, name):
  function max_pool (line 88) | def max_pool(x, name):
  function variable_summaries (line 91) | def variable_summaries(var):

FILE: code/common/global_agg_net.py
  function End_Net_weights_init (line 15) | def End_Net_weights_init():
  function End_Net (line 41) | def End_Net(input_x, phase_depth, keep_prob):
  function End_Net_Out (line 67) | def End_Net_Out(X1, phase_rgb, pooled_input2, phase, keep_prob):

FILE: code/common/resnet_depth_model.py
  class Depthnet (line 11) | class Depthnet:
    method __init__ (line 13) | def __init__(self, input_y, phase, parameters = parameters):
    method Net (line 22) | def Net(self):
    method layer_zero (line 33) | def layer_zero(self, layer_input):
    method layer (line 54) | def layer(self, layer_input, layer_no):

FILE: code/common/resnet_rgb_model.py
  class Resnet (line 11) | class Resnet:
    method __init__ (line 13) | def __init__(self, input_x, phase, parameters = parameters):
    method Net (line 22) | def Net(self):
    method layer_zero (line 33) | def layer_zero(self, layer_input):
    method layer (line 53) | def layer(self, layer_input, layer_no):

FILE: code/dataset_files/dataset_build_color.py
  function timestamp_sync (line 54) | def timestamp_sync(path):

FILE: code/dataset_files/dataset_build_color_2.py
  function timestamp_sync (line 54) | def timestamp_sync(path):

FILE: code/model_utils.py
  function pre_load_checkpoint (line 8) | def pre_load_checkpoint(checkpoint_dir):
  function get_repulsion_loss4 (line 18) | def get_repulsion_loss4(pred, nsample=20, radius=0.07):
  function get_emd_loss (line 38) | def get_emd_loss(pred, gt):
  function get_cd_loss (line 48) | def get_cd_loss(pred, gt):

FILE: code/nw_loader_color.py
  function shuffle (line 23) | def shuffle():
  function load (line 27) | def load(p_no, mode):

FILE: code/tf_ops/CD/render_balls_so.cpp
  type PointInfo (line 7) | struct PointInfo{
  function render_ball (line 14) | void render_ball(int h,int w,unsigned char * show,int n,int * xyzs,float...

FILE: code/tf_ops/CD/tf_nndistance.cpp
  function nnsearch (line 21) | static void nnsearch(int b,int n,int m,const float * xyz1,const float * ...
  class NnDistanceOp (line 45) | class NnDistanceOp : public OpKernel{
    method NnDistanceOp (line 47) | explicit NnDistanceOp(OpKernelConstruction* context):OpKernel(context){}
    method Compute (line 48) | void Compute(OpKernelContext * context)override{
  class NnDistanceGradOp (line 84) | class NnDistanceGradOp : public OpKernel{
    method NnDistanceGradOp (line 86) | explicit NnDistanceGradOp(OpKernelConstruction* context):OpKernel(cont...
    method Compute (line 87) | void Compute(OpKernelContext * context)override{
  class NnDistanceGpuOp (line 169) | class NnDistanceGpuOp : public OpKernel{
    method NnDistanceGpuOp (line 171) | explicit NnDistanceGpuOp(OpKernelConstruction* context):OpKernel(conte...
    method Compute (line 172) | void Compute(OpKernelContext * context)override{
  class NnDistanceGradGpuOp (line 209) | class NnDistanceGradGpuOp : public OpKernel{
    method NnDistanceGradGpuOp (line 211) | explicit NnDistanceGradGpuOp(OpKernelConstruction* context):OpKernel(c...
    method Compute (line 212) | void Compute(OpKernelContext * context)override{

FILE: code/tf_ops/CD/tf_nndistance.py
  function nn_distance (line 9) | def nn_distance(xyz1,xyz2):
  function _nn_distance_grad (line 27) | def _nn_distance_grad(op,grad_dist1,grad_idx1,grad_dist2,grad_idx2):

FILE: code/tf_ops/emd/tf_auctionmatch.cpp
  class AuctionMatchGpuOp (line 27) | class AuctionMatchGpuOp: public OpKernel{
    method AuctionMatchGpuOp (line 29) | explicit AuctionMatchGpuOp(OpKernelConstruction* context):OpKernel(con...
    method Compute (line 30) | void Compute(OpKernelContext * context)override{

FILE: code/tf_ops/emd/tf_auctionmatch.py
  function auction_match (line 10) | def auction_match(xyz1,xyz2):

FILE: code/tf_ops/grouping/query_ball_point.cpp
  function randomf (line 9) | float randomf(){
  function get_time (line 12) | static double get_time(){
  function query_ball_point_cpu (line 19) | void query_ball_point_cpu(int b, int n, int m, const float* radius, int ...
  function group_point_cpu (line 52) | void group_point_cpu(int b, int n, int c, int m, int nsample, const floa...
  function group_point_grad_cpu (line 70) | void group_point_grad_cpu(int b, int n, int c, int m, int nsample, const...
  function main (line 86) | int main()

FILE: code/tf_ops/grouping/selection_sort.cpp
  function randomf (line 9) | float randomf(){
  function get_time (line 12) | static double get_time(){
  function selection_sort_cpu (line 20) | void selection_sort_cpu(int b, int n, int m, int k, const float *dist, i...
  function main (line 65) | int main()

FILE: code/tf_ops/grouping/tf_grouping.cpp
  class QueryBallPointGpuOp (line 68) | class QueryBallPointGpuOp : public OpKernel {
    method QueryBallPointGpuOp (line 70) | explicit QueryBallPointGpuOp(OpKernelConstruction* context) : OpKernel...
    method Compute (line 78) | void Compute(OpKernelContext* context) override {
  class SelectionSortGpuOp (line 113) | class SelectionSortGpuOp : public OpKernel {
    method SelectionSortGpuOp (line 115) | explicit SelectionSortGpuOp(OpKernelConstruction* context) : OpKernel(...
    method Compute (line 120) | void Compute(OpKernelContext* context) override {
  class GroupPointGpuOp (line 147) | class GroupPointGpuOp: public OpKernel{
    method GroupPointGpuOp (line 149) | explicit GroupPointGpuOp(OpKernelConstruction * context):OpKernel(cont...
    method Compute (line 151) | void Compute(OpKernelContext * context) override {
  class GroupPointGradGpuOp (line 178) | class GroupPointGradGpuOp: public OpKernel{
    method GroupPointGradGpuOp (line 180) | explicit GroupPointGradGpuOp(OpKernelConstruction * context):OpKernel(...
    method Compute (line 182) | void Compute(OpKernelContext * context) override {

FILE: code/tf_ops/grouping/tf_grouping.py
  function query_ball_point (line 8) | def query_ball_point(radius, nsample, xyz1, xyz2):
  function select_top_k (line 22) | def select_top_k(k, dist):
  function group_point (line 33) | def group_point(points, idx):
  function _group_point_grad (line 43) | def _group_point_grad(op, grad_out):
  function knn_point (line 48) | def knn_point(k, xyz1, xyz2):

FILE: code/tf_ops/grouping/tf_grouping_op_test.py
  class GroupPointTest (line 5) | class GroupPointTest(tf.test.TestCase):
    method test (line 6) | def test(self):
    method test_grad (line 9) | def test_grad(self):

FILE: code/tf_ops/interpolation/interpolate.cpp
  function randomf (line 9) | float randomf(){
  function get_time (line 12) | static double get_time(){
  function threenn_cpu (line 21) | void threenn_cpu(int b, int n, int m, const float *xyz1, const float *xy...
  function get_weights_cpu (line 69) | void get_weights_cpu(int b, int n, const float *dist, float *weight) {
  function interpolate_cpu (line 84) | void interpolate_cpu(int b, int m, int c, int n, const float *points, co...
  function interpolate_grad_cpu (line 108) | void interpolate_grad_cpu(int b, int n, int c, int m, const float *grad_...
  function main (line 132) | int main()

FILE: code/tf_ops/interpolation/tf_interpolate.cpp
  function randomf (line 48) | float randomf(){
  function get_time (line 51) | static double get_time(){
  function threenn_cpu (line 60) | void threenn_cpu(int b, int n, int m, const float *xyz1, const float *xy...
  function threeinterpolate_cpu (line 107) | void threeinterpolate_cpu(int b, int m, int c, int n, const float *point...
  function threeinterpolate_grad_cpu (line 131) | void threeinterpolate_grad_cpu(int b, int n, int c, int m, const float *...
  class ThreeNNOp (line 157) | class ThreeNNOp : public OpKernel {
    method ThreeNNOp (line 159) | explicit ThreeNNOp(OpKernelConstruction* context) : OpKernel(context) {}
    method Compute (line 161) | void Compute(OpKernelContext* context) override {
  class ThreeInterpolateOp (line 191) | class ThreeInterpolateOp: public OpKernel{
    method ThreeInterpolateOp (line 193) | explicit ThreeInterpolateOp(OpKernelConstruction * context):OpKernel(c...
    method Compute (line 195) | void Compute(OpKernelContext * context) override {
  class ThreeInterpolateGradOp (line 225) | class ThreeInterpolateGradOp: public OpKernel{
    method ThreeInterpolateGradOp (line 227) | explicit ThreeInterpolateGradOp(OpKernelConstruction * context):OpKern...
    method Compute (line 229) | void Compute(OpKernelContext * context) override {

FILE: code/tf_ops/interpolation/tf_interpolate.py
  function three_nn (line 8) | def three_nn(xyz1, xyz2):
  function three_interpolate (line 19) | def three_interpolate(points, idx, weight):
  function _three_interpolate_grad (line 30) | def _three_interpolate_grad(op, grad_out):

FILE: code/tf_ops/interpolation/tf_interpolate_op_test.py
  class GroupPointTest (line 5) | class GroupPointTest(tf.test.TestCase):
    method test (line 6) | def test(self):
    method test_grad (line 9) | def test_grad(self):

FILE: code/tf_ops/interpolation/visu_interpolation.py
  function fun (line 16) | def fun(xyz1,xyz2,pts2):

FILE: code/tf_ops/sampling/tf_sampling.cpp
  class ProbSampleGpuOp (line 66) | class ProbSampleGpuOp: public OpKernel{
    method ProbSampleGpuOp (line 68) | explicit ProbSampleGpuOp(OpKernelConstruction* context):OpKernel(conte...
    method Compute (line 69) | void Compute(OpKernelContext * context)override{
  class FarthestPointSampleGpuOp (line 95) | class FarthestPointSampleGpuOp: public OpKernel{
    method FarthestPointSampleGpuOp (line 97) | explicit FarthestPointSampleGpuOp(OpKernelConstruction* context):OpKer...
    method Compute (line 101) | void Compute(OpKernelContext * context)override{
  class GatherPointGpuOp (line 126) | class GatherPointGpuOp: public OpKernel{
    method GatherPointGpuOp (line 128) | explicit GatherPointGpuOp(OpKernelConstruction * context):OpKernel(con...
    method Compute (line 129) | void Compute(OpKernelContext * context)override{
  class GatherPointGradGpuOp (line 151) | class GatherPointGradGpuOp: public OpKernel{
    method GatherPointGradGpuOp (line 153) | explicit GatherPointGradGpuOp(OpKernelConstruction * context):OpKernel...
    method Compute (line 154) | void Compute(OpKernelContext * context)override{

FILE: code/tf_ops/sampling/tf_sampling.py
  function prob_sample (line 13) | def prob_sample(inp,inpr):
  function gather_point (line 29) | def gather_point(inp,idx):
  function _gather_point_grad (line 44) | def _gather_point_grad(op,out_g):
  function farthest_point_sample (line 48) | def farthest_point_sample(npoint,inp):

FILE: code/utils/data_prep_util.py
  function export_ply (line 15) | def export_ply(pc, filename):
  function get_sampling_command (line 23) | def get_sampling_command(obj_filename, ply_filename):
  function get_category_names (line 35) | def get_category_names():
  function get_obj_filenames (line 41) | def get_obj_filenames():
  function batch_mkdir (line 48) | def batch_mkdir(output_folder, subdir_list):
  function save_h5_data_label_normal (line 60) | def save_h5_data_label_normal(h5_filename, data, label, normal,
  function save_h5 (line 79) | def save_h5(h5_filename, data, label, data_dtype='uint8', label_dtype='u...
  function load_h5_data_label_normal (line 92) | def load_h5_data_label_normal(h5_filename):
  function load_h5_data_label_seg (line 100) | def load_h5_data_label_seg(h5_filename):
  function load_h5 (line 108) | def load_h5(h5_filename):
  function load_ply_data (line 119) | def load_ply_data(filename, point_num):
  function load_ply_normal (line 126) | def load_ply_normal(filename, point_num):
  function pad_arr_rows (line 134) | def pad_arr_rows(arr, row, pad='edge'):

FILE: code/utils/eulerangles.py
  function euler2mat (line 98) | def euler2mat(z=0, y=0, x=0):
  function mat2euler (line 198) | def mat2euler(M, cy_thresh=None):
  function euler2quat (line 271) | def euler2quat(z=0, y=0, x=0):
  function quat2euler (line 319) | def quat2euler(q):
  function euler2angle_axis (line 348) | def euler2angle_axis(z=0, y=0, x=0):
  function angle_axis2euler (line 382) | def angle_axis2euler(theta, vector, is_normalized=False):

FILE: code/utils/modelnet_data_prep.py
  function nonuniform_sampling (line 16) | def nonuniform_sampling(num = 4096, sample_num = 1024):
  function save_file (line 27) | def save_file(file_path, data):
  function tmp (line 32) | def tmp(file_path=None):
  function nonuniformsample_from_pointcloud_fn (line 84) | def nonuniformsample_from_pointcloud_fn():
  function possion_sample_fn (line 129) | def possion_sample_fn(file_path):
  function normal_off_file (line 166) | def normal_off_file(folder='/home/lqyu/workspace/PointSR/data/ModelNet10'):
  function recalculateNormal_fn (line 202) | def recalculateNormal_fn(file_path):
  function fix_off_file (line 273) | def fix_off_file(filepath):
  function possion_sample_fn (line 290) | def possion_sample_fn(phase='train'):
  function recalculateNormal (line 304) | def recalculateNormal(phase='train'):
  function nonuniformsample_from_pointcloud (line 311) | def nonuniformsample_from_pointcloud(phase='train'):
  function save_h52 (line 319) | def save_h52(save_names = ['poisson_4096','poisson_2048']):
  function save_h5 (line 390) | def save_h5(h5_filename,save_names = ['patch_poisson_4096','patch_poisso...
  function load_h5 (line 440) | def load_h5(h5_filename):

FILE: code/utils/off2obj.py
  class Mesh (line 19) | class Mesh:
    method __init__ (line 21) | def __init__(self):
    method writeToObjFile (line 27) | def writeToObjFile(self, pathToObjFile):
    method loadFromOffFile (line 49) | def loadFromOffFile(self, pathToOffFile):
    method edgeList (line 86) | def edgeList(self):

FILE: code/utils/pc_util.py
  function point_cloud_to_volume_batch (line 27) | def point_cloud_to_volume_batch(point_clouds, vsize=12, radius=1.0, flat...
  function point_cloud_to_volume (line 44) | def point_cloud_to_volume(points, vsize, radius=1.0):
  function volume_to_point_cloud (line 59) | def volume_to_point_cloud(vol):
  function read_ply (line 80) | def read_ply(filename):
  function write_ply (line 88) | def write_ply(points, filename, text=True):
  function draw_point_cloud (line 100) | def draw_point_cloud(input_points, canvasSize=500, space=240, diameter=10,
  function point_cloud_three_views (line 174) | def point_cloud_three_views(points,diameter=5):
  function point_cloud_three_views_demo (line 195) | def point_cloud_three_views_demo():
  function pyplot_draw_point_cloud (line 207) | def pyplot_draw_point_cloud(points, output_filename):
  function pyplot_draw_volume (line 217) | def pyplot_draw_volume(vol, output_filename):

FILE: code/utils/plyfile.py
  function _lookup_type (line 80) | def _lookup_type(type_str):
  function _split_line (line 91) | def _split_line(line, n):
  function make2d (line 101) | def make2d(array, cols=None, dtype=None):
  class PlyParseError (line 121) | class PlyParseError(Exception):
    method __init__ (line 131) | def __init__(self, message, element=None, row=None, prop=None):
    method __repr__ (line 148) | def __repr__(self):
  class PlyData (line 153) | class PlyData(object):
    method __init__ (line 165) | def __init__(self, elements=[], text=False, byte_order='=',
    method _get_elements (line 193) | def _get_elements(self):
    method _set_elements (line 196) | def _set_elements(self, elements):
    method _get_byte_order (line 202) | def _get_byte_order(self):
    method _set_byte_order (line 205) | def _set_byte_order(self, byte_order):
    method _index (line 213) | def _index(self):
    method _parse_header (line 220) | def _parse_header(stream):
    method read (line 275) | def read(stream):
    method write (line 291) | def write(self, stream):
    method header (line 307) | def header(self):
    method __iter__ (line 333) | def __iter__(self):
    method __len__ (line 336) | def __len__(self):
    method __contains__ (line 339) | def __contains__(self, name):
    method __getitem__ (line 342) | def __getitem__(self, name):
    method __str__ (line 345) | def __str__(self):
    method __repr__ (line 348) | def __repr__(self):
  function _open_stream (line 355) | def _open_stream(stream, read_or_write):
  class PlyElement (line 364) | class PlyElement(object):
    method __init__ (line 379) | def __init__(self, name, properties, count, comments=[]):
    method count (line 400) | def count(self):
    method _get_data (line 403) | def _get_data(self):
    method _set_data (line 406) | def _set_data(self, data):
    method _check_sanity (line 413) | def _check_sanity(self):
    method _get_properties (line 418) | def _get_properties(self):
    method _set_properties (line 421) | def _set_properties(self, properties):
    method _index (line 428) | def _index(self):
    method ply_property (line 434) | def ply_property(self, name):
    method name (line 438) | def name(self):
    method _check_name (line 441) | def _check_name(self):
    method dtype (line 446) | def dtype(self, byte_order='='):
    method _parse_multi (line 458) | def _parse_multi(header_lines):
    method _parse_one (line 471) | def _parse_one(lines):
    method describe (line 507) | def describe(data, name, len_types={}, val_types={},
    method _read (line 567) | def _read(self, stream, text, byte_order):
    method _write (line 593) | def _write(self, stream, text, byte_order):
    method _read_txt (line 611) | def _read_txt(self, stream):
    method _write_txt (line 643) | def _write_txt(self, stream):
    method _read_bin (line 656) | def _read_bin(self, stream, byte_order):
    method _write_bin (line 673) | def _write_bin(self, stream, byte_order):
    method header (line 684) | def header(self):
    method __getitem__ (line 701) | def __getitem__(self, key):
    method __setitem__ (line 704) | def __setitem__(self, key, value):
    method __str__ (line 707) | def __str__(self):
    method __repr__ (line 710) | def __repr__(self):
  class PlyProperty (line 716) | class PlyProperty(object):
    method __init__ (line 724) | def __init__(self, name, val_dtype):
    method _get_val_dtype (line 729) | def _get_val_dtype(self):
    method _set_val_dtype (line 732) | def _set_val_dtype(self, val_dtype):
    method name (line 738) | def name(self):
    method _check_name (line 741) | def _check_name(self):
    method _parse_one (line 747) | def _parse_one(line):
    method dtype (line 770) | def dtype(self, byte_order='='):
    method _from_fields (line 778) | def _from_fields(self, fields):
    method _to_fields (line 786) | def _to_fields(self, data):
    method _read_bin (line 793) | def _read_bin(self, stream, byte_order):
    method _write_bin (line 804) | def _write_bin(self, data, stream, byte_order):
    method __str__ (line 811) | def __str__(self):
    method __repr__ (line 815) | def __repr__(self):
  class PlyListProperty (line 820) | class PlyListProperty(PlyProperty):
    method __init__ (line 827) | def __init__(self, name, len_dtype, val_dtype):
    method _get_len_dtype (line 832) | def _get_len_dtype(self):
    method _set_len_dtype (line 835) | def _set_len_dtype(self, len_dtype):
    method dtype (line 840) | def dtype(self, byte_order='='):
    method list_dtype (line 847) | def list_dtype(self, byte_order='='):
    method _from_fields (line 856) | def _from_fields(self, fields):
    method _to_fields (line 867) | def _to_fields(self, data):
    method _read_bin (line 881) | def _read_bin(self, stream, byte_order):
    method _write_bin (line 895) | def _write_bin(self, data, stream, byte_order):
    method __str__ (line 907) | def __str__(self):
    method __repr__ (line 912) | def __repr__(self):

FILE: code/utils/pointnet_util.py
  function sample_and_group (line 16) | def sample_and_group(npoint, radius, nsample, xyz, points, tnet_spec=Non...
  function sample_and_group_all (line 67) | def sample_and_group_all(xyz, points, use_xyz=True):
  function pointnet_sa_module (line 95) | def pointnet_sa_module(xyz, points, npoint, radius, nsample, mlp, mlp2, ...
  function pointnet_sa_module_msg (line 154) | def pointnet_sa_module_msg(xyz, points, npoint, radius_list, nsample_lis...
  function pointnet_fp_module (line 193) | def pointnet_fp_module(xyz1, xyz2, points1, points2, mlp, is_training, b...

FILE: code/utils/provider.py
  function shuffle_data (line 20) | def shuffle_data(data, labels):
  function rotate_point_cloud (line 33) | def rotate_point_cloud(batch_data):
  function rotate_point_cloud_by_angle (line 54) | def rotate_point_cloud_by_angle(batch_data, rotation_angle):
  function rotate_perturbation_point_cloud (line 74) | def rotate_perturbation_point_cloud(batch_data, angle_sigma=0.06, angle_...
  function jitter_point_cloud (line 99) | def jitter_point_cloud(batch_data, sigma=0.01, clip=0.05):
  function shift_point_cloud (line 112) | def shift_point_cloud(batch_data, shift_range=0.1):
  function random_scale_point_cloud (line 126) | def random_scale_point_cloud(batch_data, scale_low=0.8, scale_high=1.25):
  function getDataFiles (line 139) | def getDataFiles(list_filename):
  function load_h5 (line 142) | def load_h5(h5_filename):
  function loadDataFile (line 148) | def loadDataFile(filename):

FILE: code/utils/show3d.py
  function onmouse (line 44) | def onmouse(*args):
  function showpoints (line 52) | def showpoints(xyz,c0=None,c1=None,c2=None,waittime=0,showrot=False,magn...

FILE: code/utils/tf_util.py
  function _variable_on_cpu (line 10) | def _variable_on_cpu(name, shape, initializer, use_fp16=False):
  function _variable_with_weight_decay (line 23) | def _variable_with_weight_decay(name, shape, stddev, wd, use_xavier=True):
  function conv1d (line 51) | def conv1d(inputs,
  function conv2d (line 113) | def conv2d(inputs,
  function conv2d_transpose (line 173) | def conv2d_transpose(inputs,
  function conv3d (line 253) | def conv3d(inputs,
  function fully_connected (line 312) | def fully_connected(inputs,
  function max_pool2d (line 351) | def max_pool2d(inputs,
  function avg_pool2d (line 376) | def avg_pool2d(inputs,
  function max_pool3d (line 402) | def max_pool3d(inputs,
  function avg_pool3d (line 427) | def avg_pool3d(inputs,
  function batch_norm_template (line 456) | def batch_norm_template(inputs, is_training, scope, moments_dims, bn_dec...
  function batch_norm_for_fc (line 505) | def batch_norm_for_fc(inputs, is_training, bn_decay, scope):
  function batch_norm_for_conv1d (line 519) | def batch_norm_for_conv1d(inputs, is_training, bn_decay, scope):
  function batch_norm_for_conv2d (line 535) | def batch_norm_for_conv2d(inputs, is_training, bn_decay, scope):
  function batch_norm_for_conv3d (line 550) | def batch_norm_for_conv3d(inputs, is_training, bn_decay, scope):
  function dropout (line 564) | def dropout(inputs,

FILE: code/utils/tf_util2.py
  function lrelu (line 3) | def lrelu(x, alpha=0.2):
  function instance_norm (line 13) | def instance_norm(net, train=True,weight_decay=0.00001):
  function conv2d (line 28) | def conv2d(inputs,
  function fully_connected (line 89) | def fully_connected(inputs,

FILE: code/utils/write_result2html.py
  function write_result (line 7) | def write_result():
  function write_result2html_benchmark (line 53) | def write_result2html_benchmark():
  function write_result2html_ModelNet (line 130) | def write_result2html_ModelNet():
Condensed preview — 83 files, each showing path, character count, and a content snippet. Download the .json file or copy for the full structured content (353K chars).
[
  {
    "path": "LICENSE",
    "chars": 1068,
    "preview": "MIT License\n\nCopyright (c) 2018 Ganesh Iyer\n\nPermission is hereby granted, free of charge, to any person obtaining a cop"
  },
  {
    "path": "README.md",
    "chars": 3010,
    "preview": "# CalibNet\n\n### [DEPRECATED] This repository is no longer actively supported. \n\nWhile the authors work on an update, ple"
  },
  {
    "path": "code/common/Lie_functions.py",
    "chars": 1959,
    "preview": "import numpy as np\nimport tensorflow as tf\n\ndef for_translation(Old_transform,t):\n\n    R = Old_transform[:3,:3]\n    t = "
  },
  {
    "path": "code/common/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/common/all_transformer.py",
    "chars": 6420,
    "preview": "import tensorflow as tf\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport scipy.misc as smc\n\nimport config_res a"
  },
  {
    "path": "code/common/cnn_utils_res.py",
    "chars": 3710,
    "preview": "import numpy as np\nimport tensorflow as tf\n\ndef weight_variable(shape, str):\n    '''\n    Helper function to create a wei"
  },
  {
    "path": "code/common/global_agg_net.py",
    "chars": 2922,
    "preview": "import numpy as np\nimport tensorflow as tf\nimport scipy.misc as smc\nimport matplotlib.pyplot as plt\n\nimport config_res a"
  },
  {
    "path": "code/common/resnet_depth_model.py",
    "chars": 5720,
    "preview": "import numpy as np\nimport tensorflow as tf\nimport json\nfrom cnn_utils_res import *\n\nimport config_res\n\nwith open(config_"
  },
  {
    "path": "code/common/resnet_rgb_model.py",
    "chars": 5070,
    "preview": "import numpy as np\nimport tensorflow as tf\nimport json\nfrom cnn_utils_res import *\nimport config_res\n\nwith open(config_r"
  },
  {
    "path": "code/config_res.py",
    "chars": 2899,
    "preview": "import numpy as np\n\n# Important paths\n\n# resnet_params_path: path containing .pkl file of RESNET18_BN weights\n# dataset_"
  },
  {
    "path": "code/dataset_files/dataset_build_color.py",
    "chars": 6039,
    "preview": "\"\"\"\n\nFor sequence: 2011_09_26\n\n\"\"\"\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport os\nimport glob\nimport argp"
  },
  {
    "path": "code/dataset_files/dataset_build_color_2.py",
    "chars": 6108,
    "preview": "\"\"\"\n\nFor sequence: 2011_09_26\n\n\"\"\"\n\nimport numpy as np\nimport matplotlib.pyplot as plt\nimport os\nimport glob\nimport argp"
  },
  {
    "path": "code/dataset_files/dataset_builder_parallel.sh",
    "chars": 1024,
    "preview": "SCRIPTPATH=\"$( cd \"$(dirname \"$0\")\" ; pwd -P )\"\n\narray=( 0001 0002 0005 0009 0011 0013 0014 0015 0017 0018 0019 0020 002"
  },
  {
    "path": "code/dataset_files/parser.py",
    "chars": 2269,
    "preview": "import numpy as np\nimport scipy.misc as smc\nfrom natsort import natsorted as ns\nimport glob, os\nimport argparse\n\nparser "
  },
  {
    "path": "code/model_utils.py",
    "chars": 2557,
    "preview": "import os\nimport tensorflow as tf\nfrom tf_ops.emd import tf_auctionmatch\nfrom tf_ops.CD import tf_nndistance\nfrom tf_ops"
  },
  {
    "path": "code/nw_loader_color.py",
    "chars": 3800,
    "preview": "import numpy as np\nimport glob, os, argparse\nimport scipy.misc as smc\nfrom tqdm import tqdm\nimport matplotlib.pyplot as "
  },
  {
    "path": "code/tf_ops/CD/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/tf_ops/CD/makefile",
    "chars": 784,
    "preview": "nvcc = /usr/local/cuda-8.0/bin/nvcc\ncudalib = /usr/local/cuda-8.0/lib64/\ntensorflow = /home/ganeshiyer/tensorflow/lib/py"
  },
  {
    "path": "code/tf_ops/CD/render_balls_so.cpp",
    "chars": 1366,
    "preview": "#include <cstdio>\n#include <vector>\n#include <algorithm>\n#include <math.h>\nusing namespace std;\n\nstruct PointInfo{\n\tint "
  },
  {
    "path": "code/tf_ops/CD/tf_nndistance.cpp",
    "chars": 13072,
    "preview": "#include \"tensorflow/core/framework/op.h\"\n#include \"tensorflow/core/framework/op_kernel.h\"\nREGISTER_OP(\"NnDistance\")\n\t.I"
  },
  {
    "path": "code/tf_ops/CD/tf_nndistance.py",
    "chars": 3286,
    "preview": "import os\nimport tensorflow as tf\nfrom tensorflow.python.framework import ops\n\nBASE_DIR = os.path.dirname(os.path.abspat"
  },
  {
    "path": "code/tf_ops/CD/tf_nndistance_g.cu",
    "chars": 4570,
    "preview": "#if GOOGLE_CUDA\n#define EIGEN_USE_GPU\n#include \"third_party/eigen3/unsupported/Eigen/CXX11/Tensor\"\n\n__global__ void NmDi"
  },
  {
    "path": "code/tf_ops/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/tf_ops/emd/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/tf_ops/emd/tf_auctionmatch.cpp",
    "chars": 3001,
    "preview": "#include \"tensorflow/core/framework/op.h\"\n#include \"tensorflow/core/framework/op_kernel.h\"\n#include \"tensorflow/core/fra"
  },
  {
    "path": "code/tf_ops/emd/tf_auctionmatch.py",
    "chars": 1609,
    "preview": "import tensorflow as tf\nfrom tensorflow.python.framework import ops\nimport sys\nimport os\n\nBASE_DIR = os.path.dirname(os."
  },
  {
    "path": "code/tf_ops/emd/tf_auctionmatch_compile.sh",
    "chars": 436,
    "preview": "echo 'nvcc'\n/usr/local/cuda-8.0/bin/nvcc tf_auctionmatch_g.cu -o tf_auctionmatch_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xc"
  },
  {
    "path": "code/tf_ops/emd/tf_auctionmatch_g.cu",
    "chars": 7831,
    "preview": "#include <cstdio>\n__global__ void AuctionMatchKernel(int b,int n,const float * __restrict__ xyz1,const float * __restric"
  },
  {
    "path": "code/tf_ops/grouping/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/tf_ops/grouping/compile.sh",
    "chars": 367,
    "preview": "g++ query_ball_point.cpp -o query_ball_point\nnvcc query_ball_point.cu -o query_ball_point_cuda\nnvcc query_ball_point_blo"
  },
  {
    "path": "code/tf_ops/grouping/query_ball_point.cpp",
    "chars": 3832,
    "preview": "#include <cstdio>\n#include <ctime>\n#include <cstring> // memset\n#include <cstdlib> // rand, RAND_MAX\n#include <cmath> //"
  },
  {
    "path": "code/tf_ops/grouping/query_ball_point.cu",
    "chars": 4420,
    "preview": "#include <cstdio>\n#include <ctime>\n#include <cstring> // memset\n#include <cstdlib> // rand, RAND_MAX\n#include <cmath> //"
  },
  {
    "path": "code/tf_ops/grouping/query_ball_point_block.cu",
    "chars": 4240,
    "preview": "#include <cstdio>\n#include <ctime>\n#include <cstring> // memset\n#include <cstdlib> // rand, RAND_MAX\n#include <cmath> //"
  },
  {
    "path": "code/tf_ops/grouping/query_ball_point_grid.cu",
    "chars": 4599,
    "preview": "#include <cstdio>\n#include <ctime>\n#include <cstring> // memset\n#include <cstdlib> // rand, RAND_MAX\n#include <cmath> //"
  },
  {
    "path": "code/tf_ops/grouping/selection_sort.cpp",
    "chars": 2481,
    "preview": "#include <cstdio>\n#include <ctime>\n#include <cstring> // memset\n#include <cstdlib> // rand, RAND_MAX\n#include <cmath> //"
  },
  {
    "path": "code/tf_ops/grouping/selection_sort.cu",
    "chars": 1920,
    "preview": "#include <cstdio>\n#include <ctime>\n#include <cstring> // memset\n#include <cstdlib> // rand, RAND_MAX\n#include <cmath> //"
  },
  {
    "path": "code/tf_ops/grouping/selection_sort_const.cu",
    "chars": 2454,
    "preview": "#include <cstdio>\n#include <ctime>\n#include <cstring> // memset\n#include <cstdlib> // rand, RAND_MAX\n#include <cmath> //"
  },
  {
    "path": "code/tf_ops/grouping/test_knn.py",
    "chars": 816,
    "preview": "import tensorflow as tf\nimport numpy as np\n\nnp.random.seed(0)\n\n\na_val = np.random.random((2,5,3))\nb_val = np.random.rand"
  },
  {
    "path": "code/tf_ops/grouping/tf_grouping.cpp",
    "chars": 10656,
    "preview": "#include <cstdio>\n#include <ctime>\n#include <cstring> // memset\n#include <cstdlib> // rand, RAND_MAX\n#include <cmath> //"
  },
  {
    "path": "code/tf_ops/grouping/tf_grouping.py",
    "chars": 4018,
    "preview": "import tensorflow as tf\nfrom tensorflow.python.framework import ops\nimport sys\nimport os\nBASE_DIR = os.path.dirname(os.p"
  },
  {
    "path": "code/tf_ops/grouping/tf_grouping_compile.sh",
    "chars": 391,
    "preview": "#/bin/bash\n/usr/local/cuda-8.0/bin/nvcc tf_grouping_g.cu -o tf_grouping_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -"
  },
  {
    "path": "code/tf_ops/grouping/tf_grouping_g.cu",
    "chars": 5062,
    "preview": "// input: radius (1), nsample (1), xyz1 (b,n,3), xyz2 (b,m,3)\n// output: idx (b,m,nsample), pts_cnt (b,m)\n__global__ voi"
  },
  {
    "path": "code/tf_ops/grouping/tf_grouping_op_test.py",
    "chars": 904,
    "preview": "import tensorflow as tf\nimport numpy as np\nfrom tf_grouping import query_ball_point, group_point\n\nclass GroupPointTest(t"
  },
  {
    "path": "code/tf_ops/interpolation/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/tf_ops/interpolation/interpolate.cpp",
    "chars": 5003,
    "preview": "#include <cstdio>\n#include <ctime>\n#include <cstring> // memset\n#include <cstdlib> // rand, RAND_MAX\n#include <cmath> //"
  },
  {
    "path": "code/tf_ops/interpolation/tf_interpolate.cpp",
    "chars": 10969,
    "preview": "#include <cstdio>\n#include <ctime>\n#include <cstring> // memset\n#include <cstdlib> // rand, RAND_MAX\n#include <cmath> //"
  },
  {
    "path": "code/tf_ops/interpolation/tf_interpolate.py",
    "chars": 2028,
    "preview": "import tensorflow as tf\nfrom tensorflow.python.framework import ops\nimport sys\nimport os\nBASE_DIR = os.path.dirname(__fi"
  },
  {
    "path": "code/tf_ops/interpolation/tf_interpolate_compile.sh",
    "chars": 252,
    "preview": "g++ -std=c++11 tf_interpolate.cpp -o tf_interpolate_so.so -shared -fPIC -I /home/ganeshiyer/tensorflow/lib/python2.7/sit"
  },
  {
    "path": "code/tf_ops/interpolation/tf_interpolate_op_test.py",
    "chars": 818,
    "preview": "import tensorflow as tf\nimport numpy as np\nfrom tf_interpolate import three_nn, three_interpolate\n\nclass GroupPointTest("
  },
  {
    "path": "code/tf_ops/interpolation/visu_interpolation.py",
    "chars": 1533,
    "preview": "''' Visualize part segmentation '''\nimport os\nimport sys\nROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__fi"
  },
  {
    "path": "code/tf_ops/sampling/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/tf_ops/sampling/tf_sampling.cpp",
    "chars": 8762,
    "preview": "/* Furthest point sampling\n * Original author: Haoqiang Fan\n * Modified by Charles R. Qi\n * All Rights Reserved. 2017. \n"
  },
  {
    "path": "code/tf_ops/sampling/tf_sampling.py",
    "chars": 3011,
    "preview": "''' Furthest point sampling\nOriginal author: Haoqiang Fan\nModified by Charles R. Qi\nAll Rights Reserved. 2017. \n'''\nimpo"
  },
  {
    "path": "code/tf_ops/sampling/tf_sampling_compile.sh",
    "chars": 390,
    "preview": "#/bin/bash\n/usr/local/cuda-8.0/bin/nvcc tf_sampling_g.cu -o tf_sampling_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -"
  },
  {
    "path": "code/tf_ops/sampling/tf_sampling_g.cu",
    "chars": 6803,
    "preview": "/* Furthest point sampling GPU implementation\n * Original author: Haoqiang Fan\n * Modified by Charles R. Qi\n * All Right"
  },
  {
    "path": "code/train_model_combined.py",
    "chars": 9013,
    "preview": "import numpy as np\nimport tensorflow as tf\nimport scipy.misc as smc\n\nimport config_res as config\n\nfrom common.cnn_utils_"
  },
  {
    "path": "code/utils/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "code/utils/data_prep_util.py",
    "chars": 5189,
    "preview": "import os\nimport sys\nBASE_DIR = os.path.dirname(os.path.abspath(__file__))\nsys.path.append(BASE_DIR)\nfrom plyfile import"
  },
  {
    "path": "code/utils/eulerangles.py",
    "chars": 13431,
    "preview": "# emacs: -*- mode: python-mode; py-indent-offset: 4; indent-tabs-mode: nil -*-\n# vi: set ft=python sts=4 ts=4 sw=4 et:\n#"
  },
  {
    "path": "code/utils/modelnet_data_prep.py",
    "chars": 17968,
    "preview": "import os\nimport sys\nimport glob\nimport h5py\nimport numpy as np\nfrom multiprocessing.dummy import Pool as ThreadPool\nimp"
  },
  {
    "path": "code/utils/off2obj.py",
    "chars": 3052,
    "preview": "#! /usr/bin/python\n# Written by John Bowers\n# http://johnsresearch.wordpress.com\n# 2009\n# You are welcome to use this ho"
  },
  {
    "path": "code/utils/pc_util.py",
    "chars": 7566,
    "preview": "\"\"\" Utility functions for processing point clouds.\n\nAuthor: Charles R. Qi, Hao Su\nDate: November 2016\n\"\"\"\n\nimport os\nimp"
  },
  {
    "path": "code/utils/plyfile.py",
    "chars": 26330,
    "preview": "#   Copyright 2014 Darsh Ranjan\n#\n#   This file is part of python-plyfile.\n#\n#   python-plyfile is free software: you ca"
  },
  {
    "path": "code/utils/pointnet_util.py",
    "chars": 12250,
    "preview": "\"\"\" PointNet++ Layers\n\nAuthor: Charles R. Qi\nDate: November 2017\n\"\"\"\n\nimport os\nimport sys\nfrom tf_ops.sampling.tf_sampl"
  },
  {
    "path": "code/utils/provider.py",
    "chars": 5462,
    "preview": "import os\nimport sys\nimport numpy as np\nimport h5py\nBASE_DIR = os.path.dirname(os.path.abspath(__file__))\nsys.path.appen"
  },
  {
    "path": "code/utils/show3d.py",
    "chars": 5365,
    "preview": "'''\r\n\r\nThe default behavior is to visualize the points as white dots\r\n>>>show3d.showpoints(np.random.rand(10000,3))\r\n\r\nC"
  },
  {
    "path": "code/utils/tf_util.py",
    "chars": 19800,
    "preview": "\"\"\" Wrapper functions for TensorFlow layers.\n\nAuthor: Charles R. Qi\nDate: November 2017\n\"\"\"\n\nimport numpy as np\nimport t"
  },
  {
    "path": "code/utils/tf_util2.py",
    "chars": 4481,
    "preview": "import tensorflow as tf\n\ndef lrelu(x, alpha=0.2):\n  return tf.nn.relu(x) - alpha * tf.nn.relu(-x)\n\n\n# def lrelu2(x, leak"
  },
  {
    "path": "code/utils/write_result2html.py",
    "chars": 8176,
    "preview": "import os\nimport numpy as np\nfrom tqdm import tqdm\nfrom utils import pc_util\nfrom scipy.misc import imsave\n\ndef write_re"
  },
  {
    "path": "docs/.nojekyll",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "docs/_site/css/main.css",
    "chars": 1011,
    "preview": "body { margin: 60px auto; width: 70%; }\n\na { text-decoration: none; color: #999; }\na:hover { text-decoration: underline;"
  },
  {
    "path": "docs/_site/css/project.css",
    "chars": 3392,
    "preview": "\n@import url('https://fonts.googleapis.com/css?family=Lato:400,400i,700,900&subset=latin-ext');\n\nbody {\n  font-family: \""
  },
  {
    "path": "docs/_site/css/project_file.css",
    "chars": 1814,
    "preview": "@import url(https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.css);\n\n/*body {\n  font-family: \"L"
  },
  {
    "path": "docs/_site/index.html",
    "chars": 3224,
    "preview": "<!DOCTYPE html>\n\n<html>\n\t<!-- <meta name=viewport content=“width=800”> -->\n\t<head>\n\t\t<title>Ganesh Iyer</title>\n\t\t<!-- l"
  },
  {
    "path": "docs/css/main.css",
    "chars": 1011,
    "preview": "body { margin: 60px auto; width: 70%; }\n\na { text-decoration: none; color: #999; }\na:hover { text-decoration: underline;"
  },
  {
    "path": "docs/css/project.css",
    "chars": 3392,
    "preview": "\n@import url('https://fonts.googleapis.com/css?family=Lato:400,400i,700,900&subset=latin-ext');\n\nbody {\n  font-family: \""
  },
  {
    "path": "docs/css/project_file.css",
    "chars": 1814,
    "preview": "@import url(https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.6.3/css/font-awesome.css);\n\n/*body {\n  font-family: \"L"
  },
  {
    "path": "docs/index.html",
    "chars": 3364,
    "preview": "<!DOCTYPE html>\n\n<html>\n\t<!-- <meta name=viewport content=“width=800”> -->\n\t<head>\n\t\t<title>CalibNet</title>\n\t\t<!-- link"
  }
]

// ... and 5 more files (download for full content)

About this extraction

This page contains the full source code of the epiception/CalibNet GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 83 files (325.6 KB), approximately 98.3k tokens, and a symbol index with 303 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.

Copied to clipboard!