Repository: batman-nair/project-defude
Branch: master
Commit: 88c54091e148
Files: 20
Total size: 98.8 KB

Directory structure:
gitextract_tncuquzk/
├── .gitignore
├── README.md
├── defocus/
│   └── defocus.py
├── depth/
│   ├── README.md
│   ├── average_gradients.py
│   ├── bilinear_sampler.py
│   ├── depth_dataloader.py
│   ├── depth_model.py
│   ├── depth_simple.py
│   └── get_model.sh
├── gui.py
├── images/
│   ├── sample0_disp.npy
│   ├── sample1_disp.npy
│   └── sample2_disp.npy
├── main.py
├── preprocessing.py
├── requirements.txt
├── screenshot/
│   └── README.md
├── ui-stepper.glade
└── ui.glade

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.env/
model/
__pycache__/
.pyc

================================================
FILE: README.md
================================================
# Project Defude


*Synthetic Defocusing using Monocular Depth Estimation*

Graded blurring of an image based on how far each point is from the point of focus, using just a single image with no extra data.

To achieve this, we generate a depth map for the image using machine learning, create blurred versions of the image in levels based on how far a point is from the depth of focus, and stitch the blurred versions together so that the selected point is fully in focus and points further away from it become progressively more blurred.

## How to run

You'll need to set up TensorFlow v1 and OpenCV for the bare minimum run.

> **_NOTE:_** Setting up TensorFlow v1 can be tricky, so using a prebuilt Docker image is recommended.
> ```bash
> docker pull tensorflow/tensorflow:1.0.0-py3
> ```
> GPU-specific images are also available.

Install a trained model for depth estimation.

```sh
sh ./depth/get_model.sh model_kitti depth/
```

This downloads the model that gives the best results for the samples in the repo.

Call main.py with the model path and image path.

```sh
python main.py --model_path /path/to/model --image_path /path/to/image
```

The image will load up. Clicking anywhere sets that point as the point of focus and regenerates the image.

*I've pushed the depth data for the sample images, so you can run the program on the sample images without TensorFlow or a trained model :)*

### To use our nice GUI

Install the dependencies from requirements.txt.

Run the GUI version as

```sh
python gui.py --model_path /path/to/model
```

The GUI version requires some additional packages. Install them as necessary.

## Depth Estimation

We have used the method from the paper Unsupervised Monocular Depth Estimation with Left-Right Consistency. You can find more about their amazing paper [here](http://visual.cs.ucl.ac.uk/pubs/monoDepth/).

They train a machine learning model to generate a depth map using just a single image.
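
To tie the depth map back to the defocusing steps described at the top of this README, here is a minimal NumPy-only sketch of the idea. It is an illustration under assumptions, not the code in `defocus/defocus.py`: the names `box_blur` and `defocus_sketch` are hypothetical, a naive box blur stands in for the OpenCV blur functions, and the default kernel sizes simply mirror the `range(5, 22, 4)` used in the repo.

```python
import numpy as np


def box_blur(img, ker_size):
    """Naive box blur: average over a ker_size x ker_size window (edge-padded).

    Hypothetical stand-in for the OpenCV blurs; expects an odd ker_size
    and an image of shape (H, W, C).
    """
    pad = ker_size // 2
    padded = np.pad(img.astype(np.float64),
                    ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    # Sum every shifted copy of the image inside the window, then average.
    for dy in range(ker_size):
        for dx in range(ker_size):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (ker_size * ker_size)


def defocus_sketch(img, depth, focus_xy, ker_sizes=(5, 9, 13, 17, 21)):
    """Pick, per pixel, a blur level based on distance from the focus depth."""
    x, y = focus_xy
    # Distance of every pixel's depth from the depth at the chosen point.
    dist = np.abs(depth - depth[y, x])
    peak = dist.max()
    # Scale to [0, 1]; a completely flat depth map stays all zeros (all sharp).
    norm = dist / peak if peak > 0 else dist
    # Level 0 is the sharp original; higher levels use larger kernels.
    levels = [img.astype(np.float64)] + [box_blur(img, k) for k in ker_sizes]
    # Map [0, 1] onto level indices; clip so norm == 1 lands in the last level.
    idx = np.minimum((norm * len(levels)).astype(int), len(levels) - 1)
    stack = np.stack(levels)  # shape (levels, H, W, C)
    rows, cols = np.indices(depth.shape)
    return stack[idx, rows, cols].astype(np.uint8)
```

Calling `defocus_sketch(img, depth, focus_xy=(x, y))` with an `(H, W, 3)` image and an `(H, W)` depth map returns an image of the same shape. Selecting the level per pixel with an index array avoids building and summing one masked image per blur level, which is the design choice this sketch makes for brevity.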
### How the machine learning model is trained in simple terms:

The model is trained on a large set of stereo (left-right) images to generate a right image from a given left image. In learning to generate the right image, the model internally learns about the depth of various points in the image. Once the model is trained, we can give it a single image and it will be able to generate a depth map for that image.

You can learn more about their amazing project on [github](https://github.com/mrharicot/monodepth). Their trained models give far better results than ours, so we use those. The depth estimation code we have is a minimal stripped-down version, just enough to run the model.

## The Team

This was done as a final year project by [Haritha Paul](https://github.com/haritha1997), [Navin Mohan](https://github.com/navin-mohan), [Roshan V](https://github.com/ros-han) and me.

Leave a star if you liked the project. :)

================================================
FILE: defocus/defocus.py
================================================
import argparse
import copy
import cv2
import numpy as np
import os


# This class holds functions that handle defocusing of an image using a depth map
# It takes in the path to the image to be defocused and the blur method to be used for defocusing
# It requires the depth map to be saved in the same folder as the image, with the suffix and extension _disp.npy
# Parameters:
#   image_path: Path to the image which is to be defocused
#   blur_method: The blur function to be used for blurring.
#                Available values are: gaussian, avg_blur, median, bilateral
class DefocuserObject():
    def __init__(self, image_path = "../images/sample2.png", blur_method = "gaussian"):
        self.img_name = os.path.basename(image_path).split('.')[0]
        self.img_ext = os.path.basename(image_path).split('.')[-1]
        self.img_dir = os.path.dirname(image_path)
        self.blur_method = blur_method

        # Lambda functions to call the appropriate blur function according to set method
        # All the functions take 2 arguments: the image and the kernel size to be used in the function
        # As the kernel size increases the amount of blur increases
        self.blur_function = {
            'avg_blur': lambda img,ker_size: cv2.blur(img, (ker_size,ker_size)),
            'gaussian': lambda img,ker_size: cv2.GaussianBlur(img, (ker_size,ker_size), 0),
            'median': lambda img,ker_size: cv2.medianBlur(img, ker_size),
            'bilateral': lambda img,ker_size: cv2.bilateralFilter(img, ker_size, 75, 75),
        }

        self.depth_data = np.load(os.path.join(self.img_dir, self.img_name + "_disp.npy"))
        self.img = cv2.imread(os.path.join(self.img_dir, self.img_name + "." + self.img_ext))
        self.blur_imgs = []

        # The blurred versions of the images can be precalculated
        self.blur_images()

    # Normalizes the depth values based on the depth of focus
    # The depth of focus is saved in point_of_focus
    # norm_depth_data holds the final normalized depth of focus
    # 0 value corresponds to the point which is focused
    # 1 value corresponds to the point furthest from the point of focus
    def normalize_pof(self):
        self.norm_depth_data = self.depth_data - self.point_of_focus
        self.norm_depth_data = np.abs(self.norm_depth_data)
        self.norm_depth_data = self.norm_depth_data / self.norm_depth_data.max()

    # Mouse callback function
    # The point of the click is taken as the point of focus and defocusing is performed around it
    def depth_callback(self, event, x, y, flags, param):
        if event == cv2.EVENT_LBUTTONDOWN:
            self.point_of_focus = self.depth_data[y][x]
            self.defocus_with_pof()

    # The depth map is normalized around the point of focus after which the defocusing of the image is done by sectioning and masking
    def defocus_with_pof(self):
        print("Point of focus: ", self.point_of_focus)
        self.normalize_pof()
        print("Normalized depth data around point of focus")

        section_size = 1 / len(self.blur_imgs)
        final_image = np.zeros(self.img.shape)
        for index, blur_img in enumerate(self.blur_imgs):
            mask = index*section_size <= self.norm_depth_data
            # Every section except the last is open on the right
            # The last section also includes the value 1, otherwise the points
            # furthest from the point of focus would be left black
            if index < len(self.blur_imgs) - 1:
                mask = mask & (self.norm_depth_data < (index+1)*section_size)
            masked_img = copy.deepcopy(blur_img)
            # Applying mask on copy of blurred image
            masked_img[mask==0] = [0, 0, 0]
            final_image = final_image + masked_img
        final_image = np.uint8(final_image)
        # cv2.imshow("Final", final_image)
        cv2.imwrite(os.path.join(self.img_dir, self.img_name + "_defocus.png"), final_image)
        cv2.imshow("Project Defude", final_image)

    def set_pof_from_coord(self, norm_x, norm_y):
        h, w = self.depth_data.shape[:2]
        self.point_of_focus = self.depth_data[int(h * norm_y)][int(w * norm_x)]
        self.defocus_with_pof()

    # Creates a window to display the original image
    # The callback function is attached to this window
    def view_image_for_blur(self):
        cv2.namedWindow("Project Defude", flags = (cv2.WINDOW_GUI_NORMAL | cv2.WINDOW_AUTOSIZE))
        cv2.setMouseCallback("Project Defude", self.depth_callback)
        cv2.imshow("Project Defude", self.img)
        # cv2.namedWindow("Project Defude", flags=cv2.WINDOW_GUI_NORMAL)
        while(cv2.waitKey() != 27):
            pass
        cv2.destroyAllWindows()

    # Generates different blurred versions of the original image
    # The blurred versions are stored in the list blur_imgs
    def blur_images(self):
        print("Generating blurred versions for image")
        self.blur_imgs.append(self.img)
        for ker_size in range(5, 22, 4):
            self.blur_imgs.append(self.blur_function[self.blur_method](self.img, ker_size))
        # Show blurred images
        # for index, blur_img in enumerate(self.blur_imgs):
        #     cv2.imshow("blur " + str(index), blur_img)


if __name__ == "__main__":
    # Arguments are parsed only when this file is run as a script, so that
    # importing DefocuserObject (e.g. from gui.py) does not try to parse
    # the importing program's command line
    parser = argparse.ArgumentParser(description='Defocus using depth map.')

    parser.add_argument('--image_path', type=str, help='path to input image', default='../images/sample2.png')
    parser.add_argument('--blur_method', type=str, help='The type of blur to be applied', default='gaussian')

    args = parser.parse_args()

    PATH = args.image_path
    BLUR = args.blur_method
    defocuser = DefocuserObject(image_path = PATH, blur_method = BLUR)
    # defocuser.set_pof_from_coord(0.9, 0.9)
    defocuser.view_image_for_blur()

================================================
FILE: depth/README.md
================================================
# Monodepth

Tensorflow implementation of unsupervised single image depth prediction using a convolutional neural network.


**Unsupervised Monocular Depth Estimation with Left-Right Consistency**
[Clément Godard](http://www0.cs.ucl.ac.uk/staff/C.Godard/), [Oisin Mac Aodha](http://vision.caltech.edu/~macaodha/) and [Gabriel J. Brostow](http://www0.cs.ucl.ac.uk/staff/g.brostow/)
CVPR 2017

For more details:
[github](http://visual.cs.ucl.ac.uk/pubs/monoDepth/), [project page](http://visual.cs.ucl.ac.uk/pubs/monoDepth/), [arXiv](https://arxiv.org/abs/1609.03677)

This code is a minimal stripped-down version of the original monodepth code, using an OpenCV backend.
All ownership and rights to this code and its implementation belong to the original project owners and their organization.
Please check the [original repository](https://github.com/mrharicot/monodepth) and their [LICENSE](https://github.com/mrharicot/monodepth/blob/master/LICENSE) file for more information.

================================================
FILE: depth/average_gradients.py
================================================
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from __future__ import absolute_import, division, print_function
import tensorflow as tf

def average_gradients(tower_grads):

    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # Note that each grad_and_vars looks like the following:
        #   ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
        grads = []
        for g, _ in grad_and_vars:
            # Add 0 dimension to the gradients to represent the tower.
            expanded_g = tf.expand_dims(g, 0)

            # Append on a 'tower' dimension which we will average over below.
            grads.append(expanded_g)

        # Average over the 'tower' dimension.
        grad = tf.concat(axis=0, values=grads)
        grad = tf.reduce_mean(grad, 0)

        # Keep in mind that the Variables are redundant because they are shared
        # across towers. So .. we will just return the first tower's pointer to
        # the Variable.
        v = grad_and_vars[0][1]
        grad_and_var = (grad, v)
        average_grads.append(grad_and_var)
    return average_grads

================================================
FILE: depth/bilinear_sampler.py
================================================
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
# Copyright 2017 Modifications Clement Godard.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================== from __future__ import absolute_import, division, print_function import tensorflow as tf def bilinear_sampler_1d_h(input_images, x_offset, wrap_mode='border', name='bilinear_sampler', **kwargs): def _repeat(x, n_repeats): with tf.variable_scope('_repeat'): rep = tf.tile(tf.expand_dims(x, 1), [1, n_repeats]) return tf.reshape(rep, [-1]) def _interpolate(im, x, y): with tf.variable_scope('_interpolate'): # handle both texture border types _edge_size = 0 if _wrap_mode == 'border': _edge_size = 1 im = tf.pad(im, [[0, 0], [1, 1], [1, 1], [0, 0]], mode='CONSTANT') x = x + _edge_size y = y + _edge_size elif _wrap_mode == 'edge': _edge_size = 0 else: return None x = tf.clip_by_value(x, 0.0, _width_f - 1 + 2 * _edge_size) x0_f = tf.floor(x) y0_f = tf.floor(y) x1_f = x0_f + 1 x0 = tf.cast(x0_f, tf.int32) y0 = tf.cast(y0_f, tf.int32) x1 = tf.cast(tf.minimum(x1_f, _width_f - 1 + 2 * _edge_size), tf.int32) dim2 = (_width + 2 * _edge_size) dim1 = (_width + 2 * _edge_size) * (_height + 2 * _edge_size) base = _repeat(tf.range(_num_batch) * dim1, _height * _width) base_y0 = base + y0 * dim2 idx_l = base_y0 + x0 idx_r = base_y0 + x1 im_flat = tf.reshape(im, tf.stack([-1, _num_channels])) pix_l = tf.gather(im_flat, idx_l) pix_r = tf.gather(im_flat, idx_r) weight_l = tf.expand_dims(x1_f - x, 1) weight_r = tf.expand_dims(x - x0_f, 1) return weight_l * pix_l + weight_r * pix_r def _transform(input_images, x_offset): with tf.variable_scope('transform'): # grid of (x_t, y_t, 1), eq (1) in ref [1] x_t, y_t = tf.meshgrid(tf.linspace(0.0, _width_f - 1.0, _width), tf.linspace(0.0 , _height_f - 1.0 , _height)) x_t_flat = tf.reshape(x_t, (1, -1)) y_t_flat = tf.reshape(y_t, (1, -1)) x_t_flat = tf.tile(x_t_flat, tf.stack([_num_batch, 1])) y_t_flat = tf.tile(y_t_flat, tf.stack([_num_batch, 1])) x_t_flat = tf.reshape(x_t_flat, [-1]) y_t_flat = tf.reshape(y_t_flat, [-1]) x_t_flat = x_t_flat + tf.reshape(x_offset, [-1]) 
* _width_f input_transformed = _interpolate(input_images, x_t_flat, y_t_flat) output = tf.reshape( input_transformed, tf.stack([_num_batch, _height, _width, _num_channels])) return output with tf.variable_scope(name): _num_batch = tf.shape(input_images)[0] _height = tf.shape(input_images)[1] _width = tf.shape(input_images)[2] _num_channels = tf.shape(input_images)[3] _height_f = tf.cast(_height, tf.float32) _width_f = tf.cast(_width, tf.float32) _wrap_mode = wrap_mode output = _transform(input_images, x_offset) return output ================================================ FILE: depth/depth_dataloader.py ================================================ # Copyright UCL Business plc 2017. Patent Pending. All rights reserved. # # The MonoDepth Software is licensed under the terms of the UCLB ACP-A licence # which allows for non-commercial use only, the full terms of which are made # available in the LICENSE file. # # For any other use of the software not covered by the UCLB ACP-A Licence, # please contact info@uclb.com """Depth data loader. 
""" import tensorflow as tf def string_length_tf(t): return tf.py_func(len, [t], [tf.int64]) class DepthDataloader(object): """Depth dataloader""" def __init__(self, data_path, filenames_file, params, mode): self.data_path = data_path self.params = params self.mode = mode self.left_image_batch = None self.right_image_batch = None input_queue = tf.train.string_input_producer([filenames_file], shuffle=False) line_reader = tf.TextLineReader() _, line = line_reader.read(input_queue) split_line = tf.string_split([line]).values # we load only one image for test if mode == 'test': left_image_path = tf.string_join([self.data_path, split_line[0]]) left_image_o = self.read_image(left_image_path) if mode == 'train': # randomly flip images do_flip = tf.random_uniform([], 0, 1) left_image = tf.cond(do_flip > 0.5, lambda: tf.image.flip_left_right(right_image_o), lambda: left_image_o) right_image = tf.cond(do_flip > 0.5, lambda: tf.image.flip_left_right(left_image_o), lambda: right_image_o) # randomly augment images do_augment = tf.random_uniform([], 0, 1) left_image, right_image = tf.cond(do_augment > 0.5, lambda: self.augment_image_pair(left_image, right_image), lambda: (left_image, right_image)) left_image.set_shape( [None, None, 3]) right_image.set_shape([None, None, 3]) # capacity = min_after_dequeue + (num_threads + a small safety margin) * batch_size min_after_dequeue = 2048 capacity = min_after_dequeue + 4 * params.batch_size self.left_image_batch, self.right_image_batch = tf.train.shuffle_batch([left_image, right_image], params.batch_size, capacity, min_after_dequeue, params.num_threads) elif mode == 'test': self.left_image_batch = tf.stack([left_image_o, tf.image.flip_left_right(left_image_o)], 0) self.left_image_batch.set_shape( [2, None, None, 3]) def augment_image_pair(self, left_image, right_image): # randomly shift gamma random_gamma = tf.random_uniform([], 0.8, 1.2) left_image_aug = left_image ** random_gamma right_image_aug = right_image ** random_gamma # 
randomly shift brightness random_brightness = tf.random_uniform([], 0.5, 2.0) left_image_aug = left_image_aug * random_brightness right_image_aug = right_image_aug * random_brightness # randomly shift color random_colors = tf.random_uniform([3], 0.8, 1.2) white = tf.ones([tf.shape(left_image)[0], tf.shape(left_image)[1]]) color_image = tf.stack([white * random_colors[i] for i in range(3)], axis=2) left_image_aug *= color_image right_image_aug *= color_image # saturate left_image_aug = tf.clip_by_value(left_image_aug, 0, 1) right_image_aug = tf.clip_by_value(right_image_aug, 0, 1) return left_image_aug, right_image_aug def read_image(self, image_path): # tf.decode_image does not return the image size, this is an ugly workaround to handle both jpeg and png path_length = string_length_tf(image_path)[0] file_extension = tf.substr(image_path, path_length - 3, 3) file_cond = tf.equal(file_extension, 'jpg') image = tf.cond(file_cond, lambda: tf.image.decode_jpeg(tf.read_file(image_path)), lambda: tf.image.decode_png(tf.read_file(image_path))) image = tf.image.convert_image_dtype(image, tf.float32) image = tf.image.resize_images(image, [self.params.height, self.params.width], tf.image.ResizeMethod.AREA) return image ================================================ FILE: depth/depth_model.py ================================================ # Copyright UCL Business plc 2017. Patent Pending. All rights reserved. # # The MonoDepth Software is licensed under the terms of the UCLB ACP-A licence # which allows for non-commercial use only, the full terms of which are made # available in the LICENSE file. # # For any other use of the software not covered by the UCLB ACP-A Licence, # please contact info@uclb.com """Fully convolutional model for monocular depth estimation by Clement Godard, Oisin Mac Aodha and Gabriel J. 
Brostow http://visual.cs.ucl.ac.uk/pubs/monoDepth/ """ from __future__ import absolute_import, division, print_function from collections import namedtuple import numpy as np import tensorflow as tf import tensorflow.contrib.slim as slim from bilinear_sampler import * depth_parameters = namedtuple('parameters', 'encoder, ' 'height, width, ' 'batch_size, ' 'num_threads, ' 'num_epochs, ' 'wrap_mode, ' 'use_deconv, ' 'alpha_image_loss, ' 'disp_gradient_loss_weight, ' 'lr_loss_weight, ' 'full_summary') class DepthModel(object): """depth model""" def __init__(self, params, mode, left, right, reuse_variables=None, model_index=0): self.params = params self.mode = mode self.left = left self.right = right self.model_collection = ['model_' + str(model_index)] self.reuse_variables = reuse_variables self.build_model() self.build_outputs() if self.mode == 'test': return self.build_losses() self.build_summaries() def gradient_x(self, img): gx = img[:,:,:-1,:] - img[:,:,1:,:] return gx def gradient_y(self, img): gy = img[:,:-1,:,:] - img[:,1:,:,:] return gy def upsample_nn(self, x, ratio): s = tf.shape(x) h = s[1] w = s[2] return tf.image.resize_nearest_neighbor(x, [h * ratio, w * ratio]) def scale_pyramid(self, img, num_scales): scaled_imgs = [img] s = tf.shape(img) h = s[1] w = s[2] for i in range(num_scales - 1): ratio = 2 ** (i + 1) nh = h // ratio nw = w // ratio scaled_imgs.append(tf.image.resize_area(img, [nh, nw])) return scaled_imgs def generate_image_left(self, img, disp): return bilinear_sampler_1d_h(img, -disp) def generate_image_right(self, img, disp): return bilinear_sampler_1d_h(img, disp) def SSIM(self, x, y): C1 = 0.01 ** 2 C2 = 0.03 ** 2 mu_x = slim.avg_pool2d(x, 3, 1, 'VALID') mu_y = slim.avg_pool2d(y, 3, 1, 'VALID') sigma_x = slim.avg_pool2d(x ** 2, 3, 1, 'VALID') - mu_x ** 2 sigma_y = slim.avg_pool2d(y ** 2, 3, 1, 'VALID') - mu_y ** 2 sigma_xy = slim.avg_pool2d(x * y , 3, 1, 'VALID') - mu_x * mu_y SSIM_n = (2 * mu_x * mu_y + C1) * (2 * sigma_xy + C2) SSIM_d = 
(mu_x ** 2 + mu_y ** 2 + C1) * (sigma_x + sigma_y + C2) SSIM = SSIM_n / SSIM_d return tf.clip_by_value((1 - SSIM) / 2, 0, 1) def get_disparity_smoothness(self, disp, pyramid): disp_gradients_x = [self.gradient_x(d) for d in disp] disp_gradients_y = [self.gradient_y(d) for d in disp] image_gradients_x = [self.gradient_x(img) for img in pyramid] image_gradients_y = [self.gradient_y(img) for img in pyramid] weights_x = [tf.exp(-tf.reduce_mean(tf.abs(g), 3, keep_dims=True)) for g in image_gradients_x] weights_y = [tf.exp(-tf.reduce_mean(tf.abs(g), 3, keep_dims=True)) for g in image_gradients_y] smoothness_x = [disp_gradients_x[i] * weights_x[i] for i in range(4)] smoothness_y = [disp_gradients_y[i] * weights_y[i] for i in range(4)] return smoothness_x + smoothness_y def get_disp(self, x): disp = 0.3 * self.conv(x, 2, 3, 1, tf.nn.sigmoid) return disp def conv(self, x, num_out_layers, kernel_size, stride, activation_fn=tf.nn.elu): p = np.floor((kernel_size - 1) / 2).astype(np.int32) p_x = tf.pad(x, [[0, 0], [p, p], [p, p], [0, 0]]) return slim.conv2d(p_x, num_out_layers, kernel_size, stride, 'VALID', activation_fn=activation_fn) def conv_block(self, x, num_out_layers, kernel_size): conv1 = self.conv(x, num_out_layers, kernel_size, 1) conv2 = self.conv(conv1, num_out_layers, kernel_size, 2) return conv2 def maxpool(self, x, kernel_size): p = np.floor((kernel_size - 1) / 2).astype(np.int32) p_x = tf.pad(x, [[0, 0], [p, p], [p, p], [0, 0]]) return slim.max_pool2d(p_x, kernel_size) def resconv(self, x, num_layers, stride): do_proj = tf.shape(x)[3] != num_layers or stride == 2 shortcut = [] conv1 = self.conv(x, num_layers, 1, 1) conv2 = self.conv(conv1, num_layers, 3, stride) conv3 = self.conv(conv2, 4 * num_layers, 1, 1, None) if do_proj: shortcut = self.conv(x, 4 * num_layers, 1, stride, None) else: shortcut = x return tf.nn.elu(conv3 + shortcut) def resblock(self, x, num_layers, num_blocks): out = x for i in range(num_blocks - 1): out = self.resconv(out, num_layers, 1) out 
= self.resconv(out, num_layers, 2) return out def upconv(self, x, num_out_layers, kernel_size, scale): upsample = self.upsample_nn(x, scale) conv = self.conv(upsample, num_out_layers, kernel_size, 1) return conv def deconv(self, x, num_out_layers, kernel_size, scale): p_x = tf.pad(x, [[0, 0], [1, 1], [1, 1], [0, 0]]) conv = slim.conv2d_transpose(p_x, num_out_layers, kernel_size, scale, 'SAME') return conv[:,3:-1,3:-1,:] def build_resnet50(self): #set convenience functions conv = self.conv if self.params.use_deconv: upconv = self.deconv else: upconv = self.upconv with tf.variable_scope('encoder'): conv1 = conv(self.model_input, 64, 7, 2) # H/2 - 64D pool1 = self.maxpool(conv1, 3) # H/4 - 64D conv2 = self.resblock(pool1, 64, 3) # H/8 - 256D conv3 = self.resblock(conv2, 128, 4) # H/16 - 512D conv4 = self.resblock(conv3, 256, 6) # H/32 - 1024D conv5 = self.resblock(conv4, 512, 3) # H/64 - 2048D with tf.variable_scope('skips'): skip1 = conv1 skip2 = pool1 skip3 = conv2 skip4 = conv3 skip5 = conv4 # DECODING with tf.variable_scope('decoder'): upconv6 = upconv(conv5, 512, 3, 2) #H/32 concat6 = tf.concat([upconv6, skip5], 3) iconv6 = conv(concat6, 512, 3, 1) upconv5 = upconv(iconv6, 256, 3, 2) #H/16 concat5 = tf.concat([upconv5, skip4], 3) iconv5 = conv(concat5, 256, 3, 1) upconv4 = upconv(iconv5, 128, 3, 2) #H/8 concat4 = tf.concat([upconv4, skip3], 3) iconv4 = conv(concat4, 128, 3, 1) self.disp4 = self.get_disp(iconv4) udisp4 = self.upsample_nn(self.disp4, 2) upconv3 = upconv(iconv4, 64, 3, 2) #H/4 concat3 = tf.concat([upconv3, skip2, udisp4], 3) iconv3 = conv(concat3, 64, 3, 1) self.disp3 = self.get_disp(iconv3) udisp3 = self.upsample_nn(self.disp3, 2) upconv2 = upconv(iconv3, 32, 3, 2) #H/2 concat2 = tf.concat([upconv2, skip1, udisp3], 3) iconv2 = conv(concat2, 32, 3, 1) self.disp2 = self.get_disp(iconv2) udisp2 = self.upsample_nn(self.disp2, 2) upconv1 = upconv(iconv2, 16, 3, 2) #H concat1 = tf.concat([upconv1, udisp2], 3) iconv1 = conv(concat1, 16, 3, 1) self.disp1 = 
self.get_disp(iconv1) def build_model(self): with slim.arg_scope([slim.conv2d, slim.conv2d_transpose], activation_fn=tf.nn.elu): with tf.variable_scope('model', reuse=self.reuse_variables): self.left_pyramid = self.scale_pyramid(self.left, 4) if self.mode == 'train': self.right_pyramid = self.scale_pyramid(self.right, 4) self.model_input = self.left #build model if self.params.encoder == 'resnet50': self.build_resnet50() else: return None def build_outputs(self): # STORE DISPARITIES with tf.variable_scope('disparities'): self.disp_est = [self.disp1, self.disp2, self.disp3, self.disp4] self.disp_left_est = [tf.expand_dims(d[:,:,:,0], 3) for d in self.disp_est] self.disp_right_est = [tf.expand_dims(d[:,:,:,1], 3) for d in self.disp_est] if self.mode == 'test': return # GENERATE IMAGES with tf.variable_scope('images'): self.left_est = [self.generate_image_left(self.right_pyramid[i], self.disp_left_est[i]) for i in range(4)] self.right_est = [self.generate_image_right(self.left_pyramid[i], self.disp_right_est[i]) for i in range(4)] # LR CONSISTENCY with tf.variable_scope('left-right'): self.right_to_left_disp = [self.generate_image_left(self.disp_right_est[i], self.disp_left_est[i]) for i in range(4)] self.left_to_right_disp = [self.generate_image_right(self.disp_left_est[i], self.disp_right_est[i]) for i in range(4)] # DISPARITY SMOOTHNESS with tf.variable_scope('smoothness'): self.disp_left_smoothness = self.get_disparity_smoothness(self.disp_left_est, self.left_pyramid) self.disp_right_smoothness = self.get_disparity_smoothness(self.disp_right_est, self.right_pyramid) def build_losses(self): with tf.variable_scope('losses', reuse=self.reuse_variables): # IMAGE RECONSTRUCTION # L1 self.l1_left = [tf.abs( self.left_est[i] - self.left_pyramid[i]) for i in range(4)] self.l1_reconstruction_loss_left = [tf.reduce_mean(l) for l in self.l1_left] self.l1_right = [tf.abs(self.right_est[i] - self.right_pyramid[i]) for i in range(4)] self.l1_reconstruction_loss_right = 
[tf.reduce_mean(l) for l in self.l1_right] # SSIM self.ssim_left = [self.SSIM( self.left_est[i], self.left_pyramid[i]) for i in range(4)] self.ssim_loss_left = [tf.reduce_mean(s) for s in self.ssim_left] self.ssim_right = [self.SSIM(self.right_est[i], self.right_pyramid[i]) for i in range(4)] self.ssim_loss_right = [tf.reduce_mean(s) for s in self.ssim_right] # WEIGTHED SUM self.image_loss_right = [self.params.alpha_image_loss * self.ssim_loss_right[i] + (1 - self.params.alpha_image_loss) * self.l1_reconstruction_loss_right[i] for i in range(4)] self.image_loss_left = [self.params.alpha_image_loss * self.ssim_loss_left[i] + (1 - self.params.alpha_image_loss) * self.l1_reconstruction_loss_left[i] for i in range(4)] self.image_loss = tf.add_n(self.image_loss_left + self.image_loss_right) # DISPARITY SMOOTHNESS self.disp_left_loss = [tf.reduce_mean(tf.abs(self.disp_left_smoothness[i])) / 2 ** i for i in range(4)] self.disp_right_loss = [tf.reduce_mean(tf.abs(self.disp_right_smoothness[i])) / 2 ** i for i in range(4)] self.disp_gradient_loss = tf.add_n(self.disp_left_loss + self.disp_right_loss) # LR CONSISTENCY self.lr_left_loss = [tf.reduce_mean(tf.abs(self.right_to_left_disp[i] - self.disp_left_est[i])) for i in range(4)] self.lr_right_loss = [tf.reduce_mean(tf.abs(self.left_to_right_disp[i] - self.disp_right_est[i])) for i in range(4)] self.lr_loss = tf.add_n(self.lr_left_loss + self.lr_right_loss) # TOTAL LOSS self.total_loss = self.image_loss + self.params.disp_gradient_loss_weight * self.disp_gradient_loss + self.params.lr_loss_weight * self.lr_loss def build_summaries(self): # SUMMARIES with tf.device('/cpu:0'): for i in range(4): tf.summary.scalar('ssim_loss_' + str(i), self.ssim_loss_left[i] + self.ssim_loss_right[i], collections=self.model_collection) tf.summary.scalar('l1_loss_' + str(i), self.l1_reconstruction_loss_left[i] + self.l1_reconstruction_loss_right[i], collections=self.model_collection) tf.summary.scalar('image_loss_' + str(i), 
self.image_loss_left[i] + self.image_loss_right[i], collections=self.model_collection) tf.summary.scalar('disp_gradient_loss_' + str(i), self.disp_left_loss[i] + self.disp_right_loss[i], collections=self.model_collection) tf.summary.scalar('lr_loss_' + str(i), self.lr_left_loss[i] + self.lr_right_loss[i], collections=self.model_collection) tf.summary.image('disp_left_est_' + str(i), self.disp_left_est[i], max_outputs=4, collections=self.model_collection) tf.summary.image('disp_right_est_' + str(i), self.disp_right_est[i], max_outputs=4, collections=self.model_collection) ================================================ FILE: depth/depth_simple.py ================================================ # Copyright UCL Business plc 2017. Patent Pending. All rights reserved. # # The MonoDepth Software is licensed under the terms of the UCLB ACP-A licence # which allows for non-commercial use only, the full terms of which are made # available in the LICENSE file. # # For any other use of the software not covered by the UCLB ACP-A Licence, # please contact info@uclb.com # only keep warnings and errors import os os.environ['TF_CPP_MIN_LOG_LEVEL']='0' import numpy as np import argparse import re import time import tensorflow as tf import tensorflow.contrib.slim as slim import cv2 import matplotlib matplotlib.use('agg') import matplotlib.pyplot as plt from depth_model import * from depth_dataloader import * from average_gradients import * parser = argparse.ArgumentParser(description='Depth TensorFlow implementation.') parser.add_argument('--encoder', type=str, help='type of encoder, resnet50', default='resnet50') parser.add_argument('--image_path', type=str, help='path to the image', required=True) parser.add_argument('--checkpoint_path', type=str, help='path to a specific checkpoint to load', required=True) parser.add_argument('--input_height', type=int, help='input height', default=256) parser.add_argument('--input_width', type=int, help='input width', default=512) args = 
parser.parse_args() def post_process_disparity(disp): _, h, w = disp.shape l_disp = disp[0,:,:] r_disp = np.fliplr(disp[1,:,:]) m_disp = 0.5 * (l_disp + r_disp) l, _ = np.meshgrid(np.linspace(0, 1, w), np.linspace(0, 1, h)) l_mask = 1.0 - np.clip(20 * (l - 0.05), 0, 1) r_mask = np.fliplr(l_mask) return r_mask * l_disp + l_mask * r_disp + (1.0 - l_mask - r_mask) * m_disp def test_simple(params): """Test function.""" left = tf.placeholder(tf.float32, [2, args.input_height, args.input_width, 3]) model = DepthModel(params, "test", left, None) input_image = cv2.imread(args.image_path) input_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB) original_height, original_width, num_channels = input_image.shape input_image = cv2.resize(input_image, dsize = (args.input_width, args.input_height)) input_image = input_image.astype(np.float32) / 255 input_images = np.stack((input_image, np.fliplr(input_image)), 0) # SESSION config = tf.ConfigProto(allow_soft_placement=True) sess = tf.Session(config=config) # SAVER train_saver = tf.train.Saver() # INIT sess.run(tf.global_variables_initializer()) sess.run(tf.local_variables_initializer()) coordinator = tf.train.Coordinator() # RESTORE restore_path = args.checkpoint_path.split(".")[0] train_saver.restore(sess, restore_path) disp = sess.run(model.disp_left_est[0], feed_dict={left: input_images}) disp_pp = post_process_disparity(disp.squeeze()).astype(np.float32) output_directory = os.path.dirname(args.image_path) output_name = os.path.splitext(os.path.basename(args.image_path))[0] # np.save(os.path.join(output_directory, "{}_disp.npy".format(output_name)), disp_pp) disp_to_img = cv2.resize(disp_pp.squeeze(), dsize = (original_width, original_height)) np.save(os.path.join(output_directory, "{}_disp.npy".format(output_name)), disp_to_img) plt.imsave(os.path.join(output_directory, "{}_disp.png".format(output_name)), disp_to_img, cmap='plasma') print('done!') def main(_): params = depth_parameters( encoder=args.encoder, 
        height=args.input_height,
        width=args.input_width,
        batch_size=2,
        num_threads=1,
        num_epochs=1,
        wrap_mode="border",
        use_deconv=False,
        alpha_image_loss=0,
        disp_gradient_loss_weight=0,
        lr_loss_weight=0,
        full_summary=False)

    test_simple(params)

if __name__ == '__main__':
    tf.app.run()

================================================
FILE: depth/get_model.sh
================================================
model_name=$1
output_location=$2

filename=$model_name.zip

url=http://visual.cs.ucl.ac.uk/pubs/monoDepth/models/$filename

output_file=$output_location/$filename

echo "Downloading $model_name"
# Quote the variables so that paths containing spaces are handled correctly
wget -nc "$url" -O "$output_file"
unzip "$output_file" -d "$output_location"
rm "$output_file"

================================================
FILE: gui.py
================================================
import argparse  # needed by the __main__ block at the bottom of this file
import gi
gi.require_version('Gtk','3.0')
from gi.repository import Gtk, Gdk, GdkPixbuf
import threading
from functools import partial
import os
from defocus.defocus import DefocuserObject
import cv2

class DefudeGui(object):
    def __init__(self,checkpoint_path,glade_file='ui-stepper.glade'):
        self.builder = Gtk.Builder()
        # load the glade file describing the UI
        self.builder.add_from_file(glade_file)
        # connect the event handlers
        self.builder.connect_signals(self)

        # all the available windows
        self.main_window = self.builder.get_object('main-window')
        self.about_dialog = self.builder.get_object('about-dialog')
        self.help_dialog = self.builder.get_object('help-dialog')

        # some ui elements
        self.input_image_drop = self.builder.get_object('input-image-drop')
        self.select_input_image_picker = self.builder.get_object('select-image-file-picker')
        self.step_stack = self.builder.get_object('main-step-stack')
        self.image_drop_dest = self.builder.get_object('image-drag-drop-dest')
        self.input_image_spinner = self.builder.get_object('input-image-spinner')
        self.input_image_status_line = self.builder.get_object('input-image-status')
        self.input_image_delete_btn = self.builder.get_object('delete-input-img-btn')
        self.input_image_next_btn = self.builder.get_object('input-image-next-btn')
        self.depth_map_image = self.builder.get_object('depth-map-image')
        self.pof_image = self.builder.get_object('pick-pof-image')
        self.pof_status_line = self.builder.get_object('pof-status-line')
        self.pof_status_spinner = self.builder.get_object('pof-status-spinner')
        self.result_image = self.builder.get_object('result-image')
        self.header_bar = self.builder.get_object('main-header-bar')

        page_ids = (
            'start-page',
            'select-image-page',
            'depth-map-preview',
            'pick-pof',
            'result-page'
        )
        self.pages = tuple( self.builder.get_object(page_id) for page_id in page_ids )

        # some useful constants
        self.IMAGE_WIDTH = 600
        self.CHECKPOINT_PATH = checkpoint_path
        self.STATUS_MESSAGES = {
            'depth_est_running': 'Estimating depthmap...',
            'defocus_running': 'Performing defocusing...',
            'idle': '',
            'resizing': 'Loading the image...'
        }
        self.DEFAULT_INPUT_IMAGE = 'assets/drag-and-drop.png'
        self.WINDOW_TITLE = 'Synthetic Defocusing and Depth Estimation Tool'

        # state variables
        self.INPUT_IMAGE_PATH = None
        self.INPUT_IMAGE_SIZE = None
        self.DEPTH_MAP_PATH = None
        self.DEFOCUS_IMAGE_PATH = None
        self.CURRENT_STACK_PAGE = 0

        # set the image as a drag drop destination
        self.image_drop_dest.drag_dest_set(
            # do all the default stuff
            Gtk.DestDefaults.ALL,
            # enforce target
            [Gtk.TargetEntry.new('text/plain',Gtk.TargetFlags(4), 129)],
            Gdk.DragAction.COPY
        )

    def _next_page(self):
        if self.CURRENT_STACK_PAGE < len(self.pages) - 1:
            self.CURRENT_STACK_PAGE += 1
            self.step_stack.set_visible_child(self.pages[self.CURRENT_STACK_PAGE])

    def _prev_page(self):
        if self.CURRENT_STACK_PAGE > 0:
            self.CURRENT_STACK_PAGE -= 1
            self.step_stack.set_visible_child(self.pages[self.CURRENT_STACK_PAGE])

    def _set_input_image_impl(self, img_path):
        spinner = self.input_image_spinner
        status_line = self.input_image_status_line
        self.INPUT_IMAGE_PATH = img_path
        img,w,h = self._resize_image(img_path,return_size=True)
        self.INPUT_IMAGE_SIZE = (w,h)
        self.input_image_drop.set_from_pixbuf(img)
        self.select_input_image_picker.hide()
        spinner.stop()
        status_line.set_label(self.STATUS_MESSAGES['idle'])
        self.input_image_delete_btn.set_sensitive(True)
        self.input_image_next_btn.set_sensitive(True)

    def _set_input_image(self, img_path):
        spinner = self.input_image_spinner
        status_line = self.input_image_status_line
        spinner.start()
        status_line.set_label(self.STATUS_MESSAGES['resizing'])
        thread = threading.Thread(target=partial(self._set_input_image_impl,img_path))
        thread.daemon = True
        thread.start()

    def _unset_input_image(self):
        self.INPUT_IMAGE_PATH = None
        self.INPUT_IMAGE_SIZE = None
        self.input_image_drop.set_from_file(self.DEFAULT_INPUT_IMAGE)
        self.select_input_image_picker.show()
        self.input_image_next_btn.set_sensitive(False)

    def _resize_image(self,path,size=None,return_size=False):
        img = GdkPixbuf.Pixbuf.new_from_file(path)
        if size is None:
            h = img.get_height()
            w = img.get_width()
            ar = h / float(w)
            size = (self.IMAGE_WIDTH,int(self.IMAGE_WIDTH*ar))
        resized = img.scale_simple(size[0],size[1],GdkPixbuf.InterpType.BILINEAR)
        if return_size:
            return resized,size[0],size[1]
        return resized

    def _estimate_depthmap_impl(self):
        os.system("python ./depth/depth_simple.py --checkpoint_path " + self.CHECKPOINT_PATH + " --image_path " + self.INPUT_IMAGE_PATH)
        self.DEPTH_MAP_PATH = os.path.join(os.path.dirname(self.INPUT_IMAGE_PATH), os.path.basename(self.INPUT_IMAGE_PATH).split('.')[0]) + '_disp.png'
        depthmap_img = self._resize_image(self.DEPTH_MAP_PATH)
        self.depth_map_image.set_from_pixbuf(depthmap_img)
        self.input_image_spinner.stop()
        self.input_image_status_line.set_label(self.STATUS_MESSAGES['idle'])
        self._next_page()
        self.input_image_next_btn.set_sensitive(True)
        self.input_image_delete_btn.set_sensitive(True)

    def _estimate_depthmap(self):
        self.input_image_next_btn.set_sensitive(False)
        self.input_image_delete_btn.set_sensitive(False)
        self.input_image_spinner.start()
        self.input_image_status_line.set_label(self.STATUS_MESSAGES['depth_est_running'])
        thread = threading.Thread(target=self._estimate_depthmap_impl)
        thread.daemon = True
        thread.start()

    def _defocus_image_impl(self, x_norm, y_norm):
        defocusser = DefocuserObject(self.INPUT_IMAGE_PATH)
        defocusser.set_pof_from_coord(x_norm,y_norm)
        self.DEFOCUS_IMAGE_PATH = os.path.join(os.path.dirname(self.INPUT_IMAGE_PATH), os.path.basename(self.INPUT_IMAGE_PATH).split('.')[0]) + '_defocus.png'
        img = self._resize_image(self.DEFOCUS_IMAGE_PATH)
        self.result_image.set_from_pixbuf(img)
        self.pof_status_spinner.stop()
        self.pof_status_line.set_label(self.STATUS_MESSAGES['idle'])
        self._next_page()

    def _defocus_image(self, x_norm, y_norm):
        self.pof_status_spinner.start()
        self.pof_status_line.set_label(self.STATUS_MESSAGES['defocus_running'])
        thread = threading.Thread(target=partial(self._defocus_image_impl,x_norm,y_norm))
        thread.daemon = True
        thread.start()

    def _save(self, src_filename):
        dialog = Gtk.FileChooserDialog("Save as",self.main_window,Gtk.FileChooserAction.SAVE,(Gtk.STOCK_SAVE,Gtk.ResponseType.OK,Gtk.STOCK_CANCEL,Gtk.ResponseType.CANCEL))
        response = dialog.run()
        if response == Gtk.ResponseType.OK:
            img = cv2.imread(src_filename)
            cv2.imwrite(dialog.get_filename(),img)
        dialog.destroy()

    def _cleanup(self):
        file_list = (
            self.DEFOCUS_IMAGE_PATH,
            self.DEPTH_MAP_PATH,
        )
        for file in file_list:
            if file:
                if os.path.isfile(file):
                    os.remove(file)

    def show(self):
        self.main_window.show_all()
        Gtk.main()

    def onDestroy(self, *args):
        Gtk.main_quit()
        # self._cleanup()

    def onStartPageNext(self, *args):
        self._next_page()

    def onBack(self, *args):
        self._prev_page()

    def onAbout(self, *args):
        self.about_dialog.run()
        self.about_dialog.hide()

    def onHelp(self, *args):
        self.help_dialog.run()
        self.help_dialog.hide()

    def onImagePickerSet(self, *args):
        input_file_name = args[0].get_filename()
        self._set_input_image(input_file_name)

    def onImageDrop(self, *args):
        filename = args[4].get_text()[7:-1]
        self._set_input_image(filename)

    def onDeleteInputImage(self, *args):
        self._unset_input_image()
        self.input_image_delete_btn.set_sensitive(False)

    def onInputImageNextBtn(self, *args):
        self._estimate_depthmap()

    def onDepthMapNextBtn(self, *args):
        img = self.input_image_drop.get_pixbuf()
        self.pof_image.set_from_pixbuf(img)
        self._next_page()

    def onDepthMapSave(self, *args):
        self._save(self.DEPTH_MAP_PATH)

    def onPofPick(self, *args):
        event = args[1]
        x,y = event.x, event.y
        x_norm = x / self.INPUT_IMAGE_SIZE[0]
        y_norm = y / self.INPUT_IMAGE_SIZE[1]
        print("x_norm: {} y_norm: {}".format(x_norm,y_norm))
        self._defocus_image(x_norm,y_norm)

    def onResultSave(self, *args):
        self._save(self.DEFOCUS_IMAGE_PATH)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Synthetic Defocussing Using Depth Estimation')
    parser.add_argument('--model_path', type=str, help='path to saved model', required=True)
    args = parser.parse_args()

    model_path = os.path.abspath(args.model_path)
    gui = DefudeGui(model_path)
    gui.show()

================================================
FILE: main.py
================================================
import argparse
import os

parser = argparse.ArgumentParser(description='Synthetic Defocussing Using Depth Estimation')
parser.add_argument('--image_path', type=str, help='path to input image', default='images/sample2.png')
parser.add_argument('--model_path', type=str, help='path to saved model', default='blah')
parser.add_argument('--blur_method', type=str, help='the type of blur to be applied', default='gaussian')
args = parser.parse_args()

img_path = os.path.abspath(args.image_path)
model_path = os.path.abspath(args.model_path)
blur_method = args.blur_method

# depth_simple.py expects the model location as --checkpoint_path, not --model_path
os.system("python ./depth/depth_simple.py --checkpoint_path " + model_path + " --image_path " + img_path)
os.system("python ./defocus/defocus.py --image_path " + img_path + " --blur_method " + blur_method)

================================================
FILE: preprocessing.py
================================================
# Project: Synthetic Defocusing using Unsupervised Monocular Depth Estimation
# This file contains the function for preprocessing stage
# Input: Image file of any image format extension
# Output: Preprocessed image for input to the model

import cv2
import numpy as np
import glob # Used for file access

# Performs preprocessing functions on the given image
# Params:
#   img: Image in OpenCV image type (numpy.ndarray)
# Returns: Preprocessed image in OpenCV image type
def preprocess(img):
    # Denoising
    denoised_img = cv2.fastNlMeansDenoisingColored(img)

    # Conversion to Grayscale
    # gray_img = cv2.cvtColor(denoised_img, cv2.COLOR_BGR2GRAY)

    # Sharpening
    kernel = np.array([[0,-1,0], [-1,5,-1], [0,-1,0]])
    sharpen_img = cv2.filter2D(denoised_img, -1, kernel)

    # Histogram Equalization
    # Disabled: cv2.equalizeHist and CLAHE accept only single-channel images,
    # so they raise an error on the colour image used here, and their result
    # was never used in the output below.
    # equ_img = cv2.equalizeHist(sharpen_img)
    # clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8,8))
    # clahe = cv2.createCLAHE()
    # equ_img = clahe.apply(sharpen_img)

    # Resizing
    # interpolation must be passed by keyword; the third positional argument of cv2.resize is dst
    resized_img = cv2.resize(sharpen_img, (512,256), interpolation=cv2.INTER_AREA)

    prep_img = resized_img
    return prep_img

# Run the preprocessing functions on a single image with before and after
# Escape Key terminates the windows
# The output of the file can be written to a file by setting output param
# Params
#   file: Filename of the image as string
#   waitTime: How long the image should be shown in ms
#   output: Write the preprocessed file here
# Returns: False if terminated using Esc key else True
def preprocess_single(file, waitTime = 0, output = ""):
    img = cv2.imread(file)
    prep_img = preprocess(img)
    if output:
        cv2.imwrite(output, prep_img)
    cv2.imshow("Original Image", img)
    cv2.imshow("Preprocessed Image", prep_img)
    if cv2.waitKey(waitTime) == 27:
        cv2.destroyAllWindows()
        return False
    cv2.destroyAllWindows()
    return True

# Performs preprocessing function on multiple images in a folder
# Params:
#   folder_path: Folder name which holds the images with ending slash
def preprocess_multiple(folder_path):
    folder_path =
        folder_path + "*.png"  # wildcard so that glob matches every PNG in the folder
    for file in glob.glob(folder_path):
        preprocess_single(file, 2000)

if __name__ == "__main__":
    preprocess_single("./images/sample0.png", output="./sample0_pre.png")

================================================
FILE: requirements.txt
================================================
cycler==0.10.0
kiwisolver==1.1.0
matplotlib==3.0.3
numpy==1.16.3
opencv-python==4.1.0.25
protobuf==3.7.1
pycairo==1.18.1
PyGObject==3.32.1
pyparsing==2.4.0
python-dateutil==2.8.0
six==1.12.0
tensorflow==1.15.4

================================================
FILE: screenshot/README.md
================================================
## Our beautiful UI :D

![defude_home](defude_home.png)
![defude_select_image](defude_select_image.png)
![defude_estimating_depth](defude_estim_depth.png)
![defude_depthmap](defude_depthmap.png)
![defude_defocused](defude_defocused.png)

================================================
FILE: ui-stepper.glade
================================================

================================================ FILE: ui.glade ================================================