Repository: griffbr/ODMD Branch: main Commit: b348a8017479 Files: 38 Total size: 31.8 KB Directory structure: gitextract_0g3k5q3j/ ├── .gitignore ├── README.md ├── config/ │ ├── camera/ │ │ ├── ODMS_camera.yaml │ │ └── hsr_grasp_camera.yaml │ ├── data_gen/ │ │ ├── ODMS.yaml │ │ └── standard_data.yaml │ ├── model/ │ │ ├── DBox.yaml │ │ └── DBox_absolute.yaml │ └── train/ │ └── train_demo.yaml ├── data/ │ ├── odmd/ │ │ ├── test/ │ │ │ ├── test_normal.pk │ │ │ ├── test_perturb_detection.pk │ │ │ ├── test_perturb_motion.pk │ │ │ └── test_robot.pk │ │ └── val/ │ │ ├── val_normal.pk │ │ ├── val_perturb_detection.pk │ │ ├── val_perturb_motion.pk │ │ └── val_robot.pk │ └── odms_detection/ │ ├── test/ │ │ ├── test_odms_detect_driving.pk │ │ ├── test_odms_detect_normal.pk │ │ ├── test_odms_detect_perturb.pk │ │ └── test_odms_detect_robot.pk │ └── val/ │ ├── val_odms_detect_driving.pk │ ├── val_odms_detect_normal.pk │ ├── val_odms_detect_perturb.pk │ └── val_odms_detect_robot.pk ├── dbox/ │ ├── __init__.py │ ├── dbox.py │ └── model_utils.py ├── demo/ │ ├── demo_DBox_eval.py │ ├── demo_DBox_train.py │ ├── demo_datagen.py │ └── demo_dataset_eval.py └── odmd/ ├── __init__.py ├── closed_form/ │ ├── __init__.py │ └── closed_form.py └── data_gen/ ├── __init__.py ├── data_gen.py └── data_gen_util.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ .DS_Store *.pyc *.swp /data/example_generated_data/ /results/* ================================================ FILE: README.md ================================================ # ODMD Dataset ODMD is the first dataset for learning **O**bject **D**epth via **M**otion and **D**etection. ODMD training data are configurable and extensible, with each training example consisting of a series of object detection bounding boxes, camera movement distances, and ground truth object depth. As a benchmark evaluation, we provide four ODMD validation and test sets with 21,600 examples in multiple domains, and we also convert 15,650 examples from the [ODMS benchmark](https://github.com/griffbr/odms "ODMS dataset website") for detection. In our paper, we use a single ODMD-trained network with object detection *or* segmentation to achieve state-of-the-art results on existing driving and robotics benchmarks and estimate object depth from a camera phone, demonstrating how ODMD is a viable tool for monocular depth estimation in a variety of mobile applications. Contact: Brent Griffin (griffb at umich dot edu) __Depth results using a camera phone.__ ![alt text](./figure/example_ODMD_phone_results.jpg?raw=true "Depth results using a camera phone") ## Using ODMD __Run__ ``./demo/demo_datagen.py`` to generate random ODMD data to train or test your model.
Example data generation and camera configurations are provided in the ``./config/`` folder. ``demo_datagen.py`` has the option to save data into a static dataset for repeated use.
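For reference, the core generation loop is only a few calls. Below is a condensed sketch of ``demo_datagen.py`` (paths assume you run from the repository root):

```python
# Condensed from demo/demo_datagen.py (see the full demo for saving options).
import odmd

# Configure random object motion and the camera intrinsics.
odmd_data = odmd.data_gen.DataGenerator("config/data_gen/standard_data.yaml")
odmd_data.initialize_data_gen("config/camera/hsr_grasp_camera.yaml")

# Each call returns a fresh random batch; repeat per training iteration.
bb_3D, bb = odmd_data.generate_object_examples(20)
bb_3D, bb = odmd.data_gen.add_perturbations(bb_3D, bb, odmd_data)
bboxes, camera_movements, depths = odmd.data_gen.bb_to_inputs(
    bb_3D, bb, odmd_data.num_pos)
```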
__Run__ ``./demo/demo_dataset_eval.py`` to evaluate your model on the ODMD validation and test sets.
``demo_dataset_eval.py`` has an example evaluation for the BoxLS baseline and instructions for using our detection-based version of [ODMS](https://github.com/griffbr/ODMS "ODMS dataset website"). Results are saved in the ``./results/`` folder.
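The evaluation loop itself is short; here is a condensed sketch from ``demo_dataset_eval.py`` (run from the repository root) that scores the BoxLS baseline on the validation sets:

```python
# Condensed from demo/demo_dataset_eval.py; swap Box_LS for your own model.
import os, pickle
import numpy as np
import odmd

set_dir = "data/odmd/val"
for pk in sorted(f for f in os.listdir(set_dir) if f.endswith(".pk")):
    bb_data = pickle.load(open(os.path.join(set_dir, pk), "rb"))
    bb_3D, bb = bb_data["bb_3D"], bb_data["bb"]
    # Ten observations of bounding boxes and relative camera movement.
    bboxes, moves, depths = odmd.data_gen.bb_to_inputs(bb_3D, bb, 10)
    predictions = odmd.closed_form.Box_LS(bboxes, moves)
    print(pk, "mean percent error:", np.mean(abs(predictions - depths) / depths))
```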
## Benchmark

Mean percent depth error on the ODMD test sets (lower is better):

| Method | Normal | Perturb Camera | Perturb Detect | Robot | All |
| ------ | ------ | -------------- | -------------- | ----- | --- |
| [DBox](https://arxiv.org/abs/2103.01468 "CVPR 2021 Paper") | 1.73 | 2.45 | 2.54 | **11.17** | **4.47** |
| [DBoxAbs](https://arxiv.org/abs/2103.01468 "CVPR 2021 Paper") | 1.11 | **2.05** | **1.75** | 13.29 | 4.55 |
| [BoxLS](https://arxiv.org/abs/2103.01468 "CVPR 2021 Paper") | **0.00** | 4.47 | 21.60 | 21.23 | 11.83 |

Is your technique missing even though it's published and the code is public? Let us know and we'll add it.

## Using DBox Method

__Run__ ``./demo/demo_DBox_train.py`` to train your own DBox model using ODMD.
__Run__ ``./demo/demo_DBox_eval.py`` after training to evaluate your DBox model.
Example training and DBox model configurations are provided in the ``./config/`` folder. Models are saved in the ``./results/model/`` folder. ``demo_DBox_eval.py`` also has instructions for using our pretrained DBox model. Unlike the other demos, which run in native Python, the DBox demos require PyTorch.
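For orientation, inference with a trained checkpoint boils down to the following (condensed from ``demo_DBox_eval.py``; ``<checkpoint>.pt`` is a placeholder for one of your saved model files):

```python
# Condensed from demo/demo_DBox_eval.py; "<checkpoint>.pt" is a placeholder.
import torch
import odmd, dbox

odmd_data = odmd.data_gen.DataGenerator("config/data_gen/standard_data.yaml")
odmd_data.initialize_data_gen("config/camera/hsr_grasp_camera.yaml")
net, device, m_params = dbox.load_model("config/model/DBox.yaml",
                                        odmd_data.num_pos)
net = dbox.load_weights(net, "results/model/DBox_demo/<checkpoint>.pt")
bb2net = dbox.BoundingBoxToNetwork(m_params)

with torch.no_grad():
    bb_3D, bb = odmd_data.generate_object_examples(bb2net.n_bat)
    bb2net.bb_to_labels(bb_3D, bb)
    predictions = net(bb2net.inputs.to(device)).cpu().numpy()
    if bb2net.prediction == "normalized":
        predictions[:, 0] *= bb2net.norm  # undo per-example normalization
```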
## Publication
Please cite our [paper](https://openaccess.thecvf.com/content/CVPR2021/html/Griffin_Depth_From_Camera_Motion_and_Object_Detection_CVPR_2021_paper.html "Depth from Camera Motion and Object Detection pdf") if you find it useful for your research.
```
@inproceedings{GrCoCVPR21,
  author = {Griffin, Brent A. and Corso, Jason J.},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  title = {Depth from Camera Motion and Object Detection},
  year = {2021}
}
```
__CVPR 2021 presentation video:__ https://youtu.be/bgEGjt9NU9w
[![CVPR 2021 presentation video](https://img.youtube.com/vi/bgEGjt9NU9w/0.jpg)](https://youtu.be/bgEGjt9NU9w)

## Use
This code is available for non-commercial research purposes only.

================================================
FILE: config/camera/ODMS_camera.yaml
================================================
camera_name: Pinhole ODMS Camera
image_dim: [480,640]
fx: 240.5
fy: 240.5
cx: 320.5
cy: 240.5

================================================
FILE: config/camera/hsr_grasp_camera.yaml
================================================
camera_name: HSR Grasp Camera
image_dim: [480,640]
fx: 205.5
fy: 205.5
cx: 320.5
cy: 240.5

================================================
FILE: config/data_gen/ODMS.yaml
================================================
data_description: Z motion for ODMS
z_lim: [0.55,1]
move_max: [0,0,0.4625]
size_lim: [0.01,0.175]
move_min: [0,0,0.05]
num_pos: 10
perturb: True
std_dev: 0.001
cam_dev: 0.01
shuffle: 0.1
end_swap: True

================================================
FILE: config/data_gen/standard_data.yaml
================================================
data_description: XYZ motion based on robot-collected data
z_lim: [0.55,1]
move_max: [0.25,0.175,0.325]
size_lim: [0.01,0.175]
move_min: [0,0,0.05]
num_pos: 10
perturb: True
std_dev: 0.001
cam_dev: 0.01
shuffle: 0.1
end_swap: True

================================================
FILE: config/model/DBox.yaml
================================================
model_name: DBox
params:
  hidden_sz: 128
  input_obs: 1
  fcs: [256,256,256,256,256,256]
  fc_drop: 0
  peep: True
  all_loss: False
prediction: normalized
sequence_in: True
sequence_dist: True

================================================
FILE: config/model/DBox_absolute.yaml
================================================
model_name: DBox
params:
  hidden_sz: 128
  input_obs: 1
  fcs: [256,256,256,256,256,256]
  fc_drop: 0
  peep: True
  all_loss: False
prediction: absolute
sequence_in: True
sequence_dist: False

================================================
FILE: config/train/train_demo.yaml
================================================
batch_size: 512
train_iter: 100000
n_display: 1000
n_train_model_saves: 10

================================================
FILE: dbox/__init__.py
================================================
from .dbox import *
from .model_utils import *

================================================
FILE: dbox/dbox.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import IPython, math, os

# LSTM design inspired by https://towardsdatascience.com/building-a-lstm-by-hand-on-pytorch-59c02a4ec091

def pass_arg_for_model(model_name, **kwargs):
    if model_name == "DBox":
        model = DBox(**kwargs)
    else:
        # Fail loudly on an unsupported name instead of an UnboundLocalError.
        raise ValueError("Unknown model name: %s" % model_name)
    return model

class DBox(nn.Module):
    def __init__(self, input_obs, hidden_sz, n_obs, fcs=[1], fc_drop=0,
                 peep=False, all_loss=False):
        super(DBox, self).__init__()
        # LSTM.
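        # Each of the n_obs observations is a 7-value feature: a bounding box
        # (xc, yc, w, h, normalized by image size) concatenated with the 3D
        # camera movement (X, Y, Z) for that observation; see
        # BoundingBoxToNetwork in model_utils.py for how inputs are packed.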
input_sz = input_obs*7 self.idx = [[i+j for j in range(input_obs)] for i in range(n_obs-input_obs+1)] self.n_in_obs = input_obs self.input_sz = input_sz self.hidden_sz = hidden_sz self.W = nn.Parameter(torch.Tensor(input_sz, hidden_sz * 4)) self.U = nn.Parameter(torch.Tensor(hidden_sz, hidden_sz * 4)) self.bias = nn.Parameter(torch.Tensor(hidden_sz * 4)) self.peephole = peep self.all_loss = all_loss if self.all_loss: self.n_predict = len(self.idx) else: self.n_predict = 1 if peep: self.V = nn.Parameter(torch.Tensor(hidden_sz, hidden_sz * 4)) self.cat_sz = n_obs * 7 self.init_weights() # Linear layers. self.n_fc = len(fcs) self.relu = nn.ReLU(inplace=True) self.linears = nn.ModuleList([nn.Linear(hidden_sz+self.cat_sz,fcs[0])]) for i in range(self.n_fc-1): self.linears.append(nn.Linear(fcs[i] + self.cat_sz, fcs[i+1])) self.linears.append(nn.Linear(fcs[-1],1)) # Dropout. if fc_drop > 0: self.drop = True self.dropout = nn.Dropout(p=fc_drop) else: self.drop = False self.return_info = False def init_weights(self): stdv = 1.0 / math.sqrt(self.hidden_sz) for weight in self.parameters(): weight.data.uniform_(-stdv, stdv) def forward(self, x, init_state=None): """Assumes x is of shape (batch, sequence, feature)""" bs, seq_sz, _ = x.size() hidden_seq = [] if init_state is None: h_t, c_t = (torch.zeros(bs, self.hidden_sz).to(x.device), torch.zeros(bs, self.hidden_sz).to(x.device)) else: h_t, c_t = init_state HS = self.hidden_sz #for t in range(seq_sz): # x_t = x[:, t, :] for idx in self.idx: x_t = x[:, idx, :].reshape(bs, 1, self.input_sz)[:,0,:] # Batch computations into single matrix multiplication. if self.peephole: gates = x_t @ self.W + h_t @ self.U + self.bias + c_t @ self.V else: gates = x_t @ self.W + h_t @ self.U + self.bias o_t = torch.sigmoid(gates[:, HS*3:]) # output i_t, f_t, g_t = ( torch.sigmoid(gates[:, :HS]), # input torch.sigmoid(gates[:, HS:HS*2]), # forget torch.tanh(gates[:, HS*2:HS*3]), ) c_t = f_t * c_t + i_t * g_t if self.peephole: gates = x_t @ self.W + h_t @ self.U + self.bias + c_t @ self.V o_t = torch.sigmoid(gates[:, HS*3:]) # output h_t = o_t * torch.tanh(c_t) if self.return_info or self.all_loss: hidden_seq.append(h_t.unsqueeze(0)) # Linear layers. x_cat = x.reshape(1, bs, self.cat_sz) d_set = torch.zeros(bs, self.n_predict) if self.all_loss: h_t_set = hidden_seq else: h_t_set = [h_t.unsqueeze(0)] for j, d in enumerate(h_t_set): for i in range(self.n_fc): d = torch.cat((d, x_cat), 2) d = self.linears[i](d) d = self.relu(d) if self.drop: d = self.dropout(d) d = self.linears[-1](d) d_set[:,j] = d.reshape(-1) if self.return_info: hidden_seq = torch.cat(hidden_seq, dim=0) # Reshape from (sequence, batch, feature) to (batch, seq, feature) hidden_seq = hidden_seq.transpose(0, 1).contiguous() return d_set, hidden_seq, (h_t, c_t) else: return d_set ================================================ FILE: dbox/model_utils.py ================================================ import torch, os, IPython, numpy as np, yaml from copy import deepcopy from .dbox import * def count_parameters(model): return sum(p.numel() for p in model.parameters() if p.requires_grad) def load_model(model_config, n_pos=10): m_param = yaml.full_load(open(model_config)) m_param["params"]["n_obs"] = n_pos # Take model configuration parameters and set up network architecture. net = pass_arg_for_model(m_param["model_name"], **m_param["params"]) # Randomly select between both GPUs (distribute training). 
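    # Falls back to the other GPU, and finally the CPU, if a device is unavailable.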
gpus = ["cuda:0", "cuda:1"] idx = np.random.randint(0,2) try: device = torch.device(gpus[idx]) net.to(device) except: try: device = torch.device(gpus[idx-1]) net.to(device) except: device = torch.device("cpu") net.to(device) m_param["n_predict"] = net.n_predict return net, device, m_param def load_training_params(train_config): train = yaml.full_load(open(train_config)) train["display_iter"] = int(train["train_iter"] / train["n_display"]) train["save_iter"] = np.linspace(0, train["train_iter"], train["n_train_model_saves"] + 1).astype(int) return train def load_weights(net, weight_file): # Load weights onto already initialized network model. try: net.load_state_dict(torch.load(weight_file)) except: net.load_state_dict(torch.load(weight_file, map_location=lambda storage, loc: storage)) return net class BoundingBoxToNetwork: """ BoudingBoxToNetwork converts bounding boxes to network input. """ def __init__(self, m_params, n_bat=1): """ Args: n_obs (int): Number of bounding box and movement inputs to network. n_bat (int): Number of input sets batched together. """ self.n_obs = m_params["params"]["n_obs"] self.bb_in = self.n_obs * 4 #self.cam_in = (self.n_obs - 1) * 3 self.cam_in = self.n_obs * 3 self.in_sz = self.bb_in + self.cam_in self.prediction = m_params["prediction"] self.sequence = m_params["sequence_in"] if self.sequence: self.seq_dist = m_params["sequence_dist"] self.n_predict = m_params["n_predict"] self.set_batch(n_bat) def set_batch(self, n_bat): # Change network input batch size. self.n_bat = n_bat self.tmp_in = np.zeros(shape=(n_bat, self.in_sz), dtype="float32") self.labels = torch.zeros(n_bat, self.n_predict, dtype=torch.float) if self.sequence: self.inputs =torch.zeros(self.n_bat,self.n_obs,7,dtype=torch.float) self.tmp2=np.zeros(shape=(self.n_bat,self.n_obs,7),dtype="float32") else: self.inputs = torch.zeros(self.n_bat,self.in_sz,dtype=torch.float) def bb_to_labels(self, bb_3D, bb): # Convert bounding box data to network input and labels. idx = np.linspace(0, bb_3D["n_positions"] - 1, self.n_obs, dtype=int) cam_idx = np.linspace(0,bb_3D["n_positions"]-2,self.n_obs-1,dtype=int) bb = missing_detection_network_filter(bb, idx) self.tmp_in[:,:self.bb_in] = bb["bboxes"][idx,:-1].reshape( self.bb_in, self.n_bat).T self.tmp_in[:,-self.cam_in:-3] = bb_3D["camera_movement"][ cam_idx].reshape(self.cam_in-3, self.n_bat).T """ # Temp change for recovering results from old networks. dim = bb_3D["camera_movement"].shape temp = np.zeros(shape=(dim[0]+1,dim[1],dim[2])) temp[:-1] = bb_3D["camera_movement"] temp -= temp[0] bb_3D["camera_movement"] = temp[1:] self.tmp_in[:,-self.cam_in:] = np.array(bb_3D["camera_movement"])[ idx[1:]-1].reshape(self.cam_in, self.n_bat).T """ # Convert input and labels for prediction type. self.labels[:] = torch.from_numpy(bb["bboxes"][-self.n_predict:,-1].T) #self.labels[:] = torch.from_numpy(bb["bboxes"][-1][-1]) if self.prediction == "normalized": self.norm = np.linalg.norm(self.tmp_in[:, -self.cam_in:-self.cam_in+3], axis=1) self.tmp_in[:,-self.cam_in:] = np.multiply( self.tmp_in[:,-self.cam_in:],(1/self.norm)[:, np.newaxis]) self.labels[:] /= np.tile(self.norm, (self.n_predict,1)).T #self.labels[:] /= self.norm # Sequence separates each observation for LSTM input. if self.sequence: self.tmp2[:,:,:4] = self.tmp_in[:,:self.bb_in].reshape( self.n_bat,self.n_obs,4) self.tmp2[:,:,-3:] = self.tmp_in[:,-self.cam_in:].reshape( self.n_bat, self.n_obs, 3) # Sequential has each movement relative to previous observation. 
            if self.seq_dist:
                self.tmp2[:,1:,-3:] = np.diff(self.tmp2[:,:,-3:], axis=1)
                self.tmp2[:,0,-3:] = 0
            self.inputs[:] = torch.from_numpy(self.tmp2)
        else:
            self.inputs[:] = torch.from_numpy(self.tmp_in)

def missing_detection_network_filter(bb, idx):
    miss_idx = np.argwhere(bb["bboxes"][idx,2]==0)
    n_miss = len(miss_idx)
    if n_miss > 0:
        print("\nMissing %i observations! Using nearest valid.\n" % n_miss)
        bb_init = deepcopy(bb["bboxes"])
        for idx in miss_idx:
            # Replace missing detection with closest valid observation.
            obs_idx = np.argwhere((bb_init[:,2,idx[1]]==0)==False)
            try:
                replace_i = obs_idx[np.argmin(abs(obs_idx-idx[0]))][0]
            except:
                print("No replacement!")
                replace_i = 0
            bb["bboxes"][idx[0],:-1,idx[1]] = bb["bboxes"][replace_i,:-1,idx[1]]
    return bb

================================================
FILE: demo/demo_DBox_eval.py
================================================
"""
Demonstration of how to evaluate the DBox network on ODMD.
"""
import sys, os, IPython, torch, _pickle as pickle, numpy as np
file_dir = os.path.dirname(os.path.abspath(__file__))
os.chdir(file_dir)
sys.path.insert(0,"../")
import odmd, dbox

net_name = "DBox_demo"
#net_name = "DBox_pretrained" # Uncomment to run DBox model from paper.
model_idx = -1 # Can cycle through indices to find best validation performance.

# Select dataset to evaluate.
dataset = "odmd" # or "odms_detection" for ODMS dataset converted to detection.
eval_set = "val" # or "test" once model training and development are complete.

# Select configuration (more example settings from paper in config directory).
datagen_config = "../config/data_gen/standard_data.yaml"
camera_config = "../config/camera/hsr_grasp_camera.yaml"
train_config = "../config/train/train_demo.yaml"
dbox_config = "../config/model/DBox.yaml"

# Initiate data generator, model, data loader, and load weights.
odmd_data = odmd.data_gen.DataGenerator(datagen_config)
odmd_data.initialize_data_gen(camera_config)
net, device, m_params = dbox.load_model(dbox_config, odmd_data.num_pos)
bb2net = dbox.BoundingBoxToNetwork(m_params)
model_dir = os.path.join("../results", "model", net_name)
model_list = sorted([pt for pt in os.listdir(model_dir) if pt.endswith(".pt")])
net = dbox.load_weights(net, os.path.join(model_dir, model_list[model_idx]))

# Initiate dataset information.
set_dir = os.path.join("../data", dataset, eval_set)
set_list = sorted([pk for pk in os.listdir(set_dir) if pk.endswith(".pk")])
percent_error=[]; abs_error=[]; predictions_all=[]

with torch.no_grad():
    for test in set_list:
        # Load data for specific set.
        bb_data = pickle.load(open(os.path.join(set_dir, test), "rb"))
        bb_3D, bb = bb_data["bb_3D"], bb_data["bb"]
        # Run DBox with correct post-processing for configuration.
        bb2net.set_batch(bb_3D["n_ex"])
        bb2net.bb_to_labels(bb_3D, bb)
        inputs = bb2net.inputs.to(device)
        predictions = net(inputs).cpu().numpy()
        if bb2net.prediction == "normalized":
            predictions[:,0] *= bb2net.norm
        depths = bb["bboxes"][-1][-1]
        percent_error.append(np.mean(
            abs(predictions[:,0] - depths) / depths))
        abs_error.append(np.mean(abs(predictions[:,0] - depths)))
        predictions_all.append(predictions)

# Print out final results.
print("\nResults summary for ODMD %s sets." % eval_set)
for i, test_set in enumerate(set_list):
    print("\n%s-%s:" % (test_set, eval_set))
    print("Mean Percent Error: %.4f" % percent_error[i])
    print("Mean Absolute Error: %.4f (m)" % abs_error[i])

# Generate final results file.
name = model_list[model_idx].split(".pt")[0]
data_name = "%s_%s" % (dataset, eval_set)
print("\nSaving %s results file for %s.\n" % (data_name, name))
result_data = {"Result Name": name, "Set List": set_list,
    "Percent Error": percent_error, "Absolute Error": abs_error,
    "Depth Estimates": predictions_all, "Dataset": data_name}
os.makedirs("../results/", exist_ok=True)
pickle.dump(result_data, open("../results/%s_%s.pk" % (name, data_name), "wb"))

================================================
FILE: demo/demo_DBox_train.py
================================================
"""
Demonstration of how to train the DBox network on ODMD.
"""
import sys, os, IPython, torch
file_dir = os.path.dirname(os.path.abspath(__file__))
os.chdir(file_dir)
sys.path.insert(0,"../")
import odmd, dbox

net_name = "DBox_demo"

# Select configuration (more example settings from paper in config directory).
datagen_config = "../config/data_gen/standard_data.yaml"
camera_config = "../config/camera/hsr_grasp_camera.yaml"
train_config = "../config/train/train_demo.yaml"
dbox_config = "../config/model/DBox.yaml"

# Initiate data generator, model, training parameters, and data loader.
odmd_data = odmd.data_gen.DataGenerator(datagen_config)
odmd_data.initialize_data_gen(camera_config)
net, device, m_params = dbox.load_model(dbox_config, odmd_data.num_pos)
train = dbox.load_training_params(train_config)
bb2net = dbox.BoundingBoxToNetwork(m_params, train["batch_size"])

# Initiate training!
model_dir = os.path.join("../results", "model", net_name)
os.makedirs(model_dir, exist_ok=True)
criterion = torch.nn.L1Loss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001, weight_decay=0)
running_loss=0.0; ct=0
print("Starting training for %s." % net_name)
while ct < train["train_iter"]:
    # Generate examples for ODMD training (repeat for each training iteration).
    bb_3D, bb = odmd_data.generate_object_examples(bb2net.n_bat)
    bb_3D, bb = odmd.data_gen.add_perturbations(bb_3D, bb, odmd_data)
    # Network inputs and labels, forward pass, loss, and gradient.
    bb2net.bb_to_labels(bb_3D, bb)
    inputs, labels = bb2net.inputs.to(device), bb2net.labels.to(device)
    outputs = net(inputs).to(device)
    loss = criterion(outputs, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    running_loss += loss.item()
    # Print progress details and save model at set interval.
    ct += 1
    if ct % train["display_iter"] == 0:
        cur_loss = running_loss / train["display_iter"]
        print("[%9d] loss: %.6f" % (ct, cur_loss))
        running_loss = 0.0
    if ct in train["save_iter"]:
        torch.save(net.state_dict(), "%s/%s_%09d.pt" % (model_dir,net_name,ct))
        print("[%9d] interval model saved." % ct)

================================================
FILE: demo/demo_datagen.py
================================================
"""
Demonstration of how to generate new training data on ODMD.
"""
import sys, os, IPython, _pickle as pickle
file_dir = os.path.dirname(os.path.abspath(__file__))
os.chdir(file_dir)
sys.path.insert(0,"../")
import odmd

# Select configuration (more example settings from paper in config directory).
datagen_config = "../config/data_gen/standard_data.yaml"
camera_config = "../config/camera/hsr_grasp_camera.yaml"

# Other data generation settings.
n_examples = 20 # Configure for batch size if training.
save_examples = False
set_name = "example_data_gen"

# Initiate data generator.
odmd_data = odmd.data_gen.DataGenerator(datagen_config)
odmd_data.initialize_data_gen(camera_config)

# Generate examples for ODMD training (repeat for each training iteration).
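# bb_3D holds 3D object positions and camera movement; bb holds the projected
# 2D detections in format [xc_norm; yc_norm; w_norm; h_norm; Z].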
bb_3D, bb = odmd_data.generate_object_examples(n_examples)
bb_3D, bb = odmd.data_gen.add_perturbations(bb_3D, bb, odmd_data)
bboxes, camera_movements, depths = odmd.data_gen.bb_to_inputs(bb_3D, bb,
    odmd_data.num_pos)
"""
Use generated data to train your own network to predict depths given bboxes
and camera_movements. See paper for ideas on possible initial configurations.
"""

# Save generated examples as a static dataset (optional).
if save_examples:
    result_dir = "../data/example_generated_data"
    os.makedirs(result_dir, exist_ok=True)
    p_data = {"test_name": set_name, "bb_3D": bb_3D, "bb": bb}
    pickle.dump(p_data, open(os.path.join(result_dir, "%s.pk" % set_name), 'wb'))

================================================
FILE: demo/demo_dataset_eval.py
================================================
"""
Demonstration of how to evaluate a depth estimation model on ODMD.
"""
import sys, os, IPython, numpy as np, _pickle as pickle
file_dir = os.path.dirname(os.path.abspath(__file__))
os.chdir(file_dir)
sys.path.insert(0,"../")
import odmd

# Select dataset to evaluate.
dataset = "odmd" # or "odms_detection" for ODMS dataset converted to detection.
eval_set = "val" # or "test" once model training and development are complete.

# Misc. initialization.
n_observations = 10
set_dir = os.path.join("../data", dataset, eval_set)
set_list = sorted([pk for pk in os.listdir(set_dir) if pk.endswith(".pk")])
percent_error=[]; abs_error=[]; predictions_all=[]

for test in set_list:
    # Load data for specific set.
    bb_data = pickle.load(open(os.path.join(set_dir, test), "rb"))
    bb_3D, bb = bb_data["bb_3D"], bb_data["bb"]
    bboxes, camera_movements, depths = odmd.data_gen.bb_to_inputs(bb_3D, bb,
        n_observations)
    """
    Use your own depth estimation model here (to replace Box_LS):
    """
    predictions = odmd.closed_form.Box_LS(bboxes, camera_movements)
    percent_error.append(np.mean(
        abs(predictions - depths) / depths))
    abs_error.append(np.mean(abs(predictions - depths)))
    predictions_all.append(predictions)

# Print out final results.
print("\nResults summary for ODMD %s sets." % eval_set)
for i, test_set in enumerate(set_list):
    print("\n%s-%s:" % (test_set, eval_set))
    print("Mean Percent Error: %.4f" % percent_error[i])
    print("Mean Absolute Error: %.4f (m)" % abs_error[i])

# Generate final results file.
name = "Box_LS"
data_name = "%s_%s" % (dataset, eval_set)
result_data = {"Result Name": name, "Set List": set_list,
    "Percent Error": percent_error, "Absolute Error": abs_error,
    "Depth Estimates": predictions_all, "Dataset": data_name}
os.makedirs("../results/", exist_ok=True)
pickle.dump(result_data, open("../results/%s_%s.pk" % (name, data_name), "wb"))

================================================
FILE: odmd/__init__.py
================================================
import odmd.data_gen
import odmd.closed_form

================================================
FILE: odmd/closed_form/__init__.py
================================================
from .closed_form import *

================================================
FILE: odmd/closed_form/closed_form.py
================================================
# File: closed_form.py
import os, IPython, numpy as np, yaml
from copy import deepcopy

def Box_LS(input_bb, camera_move, n_obs=10):
    input_bb = missing_detection_filter(input_bb)
    # Find Ax = b least-squares solution (see equation sheet for details).
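    # With z[j] the relative camera Z motion between observation j and the
    # final observation (z[-1] = 0), pinhole projection gives, per observation,
    # w[j]*Z_final - c_w = w[j]*z[j] (and likewise for h), where c_w and c_h
    # absorb object size and focal length. Stacking all 2*n_obs rows gives
    # A @ [Z_final, -c_w, -c_h] = b, so the first least-squares component
    # x[0] is the predicted depth at the final observation.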
    n_examples = len(input_bb[0,0])
    predictions = np.zeros(n_examples)
    A = np.zeros(shape=(2*n_obs, 3))
    A[:n_obs,1]=1; A[n_obs:,2] = 1
    b = np.zeros(2*n_obs)
    z = np.zeros(n_obs)
    for i in range(n_examples):
        w = input_bb[:,2,i]
        h = input_bb[:,3,i]
        z[:-1] = camera_move[:,2,i]
        b = np.concatenate((w*z, h*z))
        A[:,0] = np.concatenate((w, h))
        try:
            x = np.matmul(np.matmul(np.linalg.inv(np.matmul(A.T,A)),A.T),b)
        except:
            print("Warning! Matrix A.T A is not invertible! x is not solved.")
            x = np.zeros(3)
        predictions[i] = x[0]
    return predictions

def missing_detection_filter(input_bb):
    miss_idx = np.argwhere(input_bb[:,2,:] == 0)
    n_miss = len(miss_idx)
    if n_miss > 0:
        print("\nMissing %i observations! Using nearest valid.\n" % n_miss)
        bb_init = deepcopy(input_bb)
        for idx in miss_idx:
            # Replace missing detection with closest valid observation.
            obs_idx = np.argwhere((bb_init[:,2,idx[1]]==0)==False)
            try:
                replace_idx = obs_idx[np.argmin(abs(obs_idx-idx[0]))][0]
            except:
                print("No observation at all!")
                replace_idx = 0
            input_bb[idx[0],:,idx[1]] = input_bb[replace_idx,:,idx[1]]
    return input_bb

def bb_m_parallax(input_bb, camera_move):
    # NOTE: Be sure that input number of observations is two!
    in_bb = missing_detection_filter(input_bb)
    # Using deltaZ neq 0 method from derivation.
    # Load camera parameters. Normalize cx, cy, fx, fy by image size.
    cam_file = "../config/camera/hsr_grasp_camera.yaml"
    params = yaml.full_load(open(cam_file))
    dim = params["image_dim"]
    cx, cy = params["cx"]/dim[1], params["cy"]/dim[0]
    fx, fy = params["fx"]/dim[1], params["fy"]/dim[0]
    # Find x0, xf, y0, yf, w0/wf, h0/hf, and Delta X, Y from two observations.
    dX, dY = camera_move[0, 0, :], camera_move[0, 1, :]
    x0, xf, y0, yf = in_bb[0,0,:], in_bb[1,0,:], in_bb[0,1,:], in_bb[1,1,:]
    w0, wf, h0, hf = in_bb[0,2,:], in_bb[1,2,:], in_bb[0,3,:], in_bb[1,3,:]
    # Solve for depth using x and y motion parallax then average result.
    depth_mpx = dX*fx / ((xf-cx) - (x0-cx)*(wf/w0))
    depth_mpy = dY*fy / ((yf-cy) - (y0-cy)*(hf/h0))
    # Replace Inf. values with one.
    replace_idx = np.argwhere(np.isfinite(depth_mpx)==False)
    for idx in replace_idx:
        depth_mpx[idx] = 1
        depth_mpy[idx] = 1
    """
    # No deltaZ version:
    depth_mpx = dX*fx / (xf - x0)
    depth_mpy = dY*fy / (yf - y0)
    """
    predictions = (depth_mpx + depth_mpy) / 2
    return predictions

def bb_opt_expansion(input_bb, camera_move):
    # NOTE: Be sure that input number of observations is two!
    in_bb = missing_detection_filter(input_bb)
    # Load camera parameters. Normalize cx, cy, fx, fy by image size.
    cam_file = "../config/camera/hsr_grasp_camera.yaml"
    params = yaml.full_load(open(cam_file))
    dim = params["image_dim"]
    cx, cy = params["cx"]/dim[1], params["cy"]/dim[0]
    fx, fy = params["fx"]/dim[1], params["fy"]/dim[0]
    # Find x0, xf, y0, yf, w0/wf, h0/hf, and Delta X, Y from two observations.
    dZ = camera_move[0, 2, :]
    x0, xf, y0, yf = in_bb[0,0,:], in_bb[1,0,:], in_bb[0,1,:], in_bb[1,1,:]
    w0, wf, h0, hf = in_bb[0,2,:], in_bb[1,2,:], in_bb[0,3,:], in_bb[1,3,:]
    # Solve for depth using w and h optical expansion then average result.
    depth_oew = dZ / (1 - (wf/w0))
    depth_oeh = dZ / (1 - (hf/h0))
    # Replace Inf. values with one.
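    # (The divisions above are zero-divisions when the box size does not
    # change, i.e., wf == w0 or hf == h0, which yields inf/NaN entries.)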
    replace_idx = np.argwhere(np.isfinite(depth_oew)==False)
    for idx in replace_idx:
        depth_oew[idx] = 1
        depth_oeh[idx] = 1
    predictions = (depth_oew + depth_oeh) / 2
    return predictions

================================================
FILE: odmd/data_gen/__init__.py
================================================
from .data_gen import *
from .data_gen_util import *

================================================
FILE: odmd/data_gen/data_gen.py
================================================
# File: data_gen.py
import IPython, os, numpy as np, yaml

class DataGenerator:
    """
    DataGenerator generates random data for depth boxes.
    """
    def __init__(self, data_params):
        """
        Args:
            data_params (str): path to a YAML file with the following keys:
                z_lim ([float,float]): minimum and maximum object start depth.
                move_max (list[float]): maximum position change for X, Y, Z.
                move_min (list[float]): minimum position change for X, Y, Z.
                size_lim ([float,float]): minimum and maximum width and height
                    of object in world coordinates.
                num_pos (int): number of positions that object is viewed from.
        """
        self.set_yaml_params(data_params)
        self.move_range = np.array(self.move_max) - self.move_min
        self.z_range = self.z_lim[1] - self.z_lim[0]
        self.size_range = self.size_lim[1] - self.size_lim[0]

    def set_yaml_params(self, yaml_file):
        self.set_params(yaml.full_load(open(yaml_file)))

    def set_params(self, params):
        for _, key in enumerate(params.keys()):
            setattr(self, key, params[key])

    def initialize_data_gen(self, camera_config):
        self.set_yaml_params(camera_config)
        # See derivation 200520 page 33.
        cfx = self.cx / self.fx; self.xmina = -cfx
        self.xminb = self.move_max[0] +self.size_lim[1]/2 +self.move_max[2]*cfx
        wfx = (self.image_dim[1] - self.cx) / self.fx; self.xmaxa = wfx
        self.xmaxb = -self.move_max[0] -self.size_lim[1]/2-self.move_max[2]*wfx
        cfy = self.cy / self.fy; self.ymina = -cfy
        self.yminb = self.move_max[1] +self.size_lim[1]/2 +self.move_max[2]*cfy
        wfy = (self.image_dim[0] - self.cy) / self.fy; self.ymaxa = wfy
        self.ymaxb = -self.move_max[1] -self.size_lim[1]/2-self.move_max[2]*wfy
        self.xa = self.xmaxa - self.xmina; self.xb = self.xmaxb - self.xminb
        self.ya = self.ymaxa - self.ymina; self.yb = self.ymaxb - self.yminb
        self.fx_norm = self.fx / self.image_dim[1]
        self.cx_norm = self.cx / self.image_dim[1]
        self.fy_norm = self.fy / self.image_dim[0]
        self.cy_norm = self.cy / self.image_dim[0]
        # Check that bounds are valid.
        if (self.z_lim[0] * self.xmina + self.xminb > 0) or (
                self.z_lim[0] * self.ymina + self.yminb > 0):
            print("\n\nData gen. bounds are not valid! Try increasing z.\n\n")
            IPython.embed()

    def generate_object_examples(self, n_ex):
        # Find random initial and final object positions, then intermediate.
        rdm = np.random.rand(n_ex * 8)
        sign = np.random.randint(0,2, size=n_ex*3)*2 - 1
        p = np.zeros(shape=(2,3,n_ex))
        # Find initial Z position, then X(Z) and Y(Z) within the field of view.
        p[0][2] = self.z_lim[0] + self.z_range * rdm[:n_ex]
        p[0][0] = p[0][2]*self.xmina + self.xminb + \
            rdm[n_ex:n_ex*2]*( p[0][2]*self.xa + self.xb )
        p[0][1] = p[0][2]*self.ymina + self.yminb + \
            rdm[n_ex*2:n_ex*3]*( p[0][2]*self.ya + self.yb )
        for i in range(3):
            p[1][i] = p[0][i] + sign[n_ex*i:n_ex*(i+1)] * (self.move_min[i] + \
                self.move_range[i] *rdm[n_ex*(i+3):n_ex*(i+4)])
        if self.end_swap:
            # For greater sample diversity, switch start / end points randomly.
            swap = np.argwhere(np.random.randint(0,2, size=n_ex))
            p[[1,0],:,swap] = p[[0,1],:,swap]
        if self.num_pos>2:
            p = self.add_intermediate_positions(p)
        # Determine camera movement for each position.
        # Note: camera movement is opposite (-1) of object movement p.
        movement = (p[-1] - p)[:-1]
        # Find random height and width of objects.
        s = [self.size_lim[0] + self.size_range * rdm[n_ex*(i+6):n_ex*(i+7)]
            for i in range(2)]
        obj_examples = {"positions": p, "camera_movement": np.array(movement),
            "sizes": s, "n_ex": n_ex, "n_positions": self.num_pos}
        bb = self.find_image_bb_from_objects(obj_examples)
        return obj_examples, bb

    def add_intermediate_positions(self, p_init):
        # Add intermediate object positions between initial and final pose.
        n_ex = len(p_init[0][0])
        rdm = np.random.rand(n_ex * 3 * (self.num_pos - 2))
        dp = p_init[1] - p_init[0]
        p = np.zeros(shape=(self.num_pos-2,3,n_ex))
        for i in range(3):
            for j in range(0,self.num_pos-2):
                p[j][i] = p_init[0][i] + dp[i] \
                    * rdm[n_ex*(i+j*3):n_ex*(i+1+j*3)]
        # Sort intermediate points to be monotonically increasing or decreasing.
        p.sort(axis=0)
        descend = dp<0
        p[:,descend] = p[::-1][:,descend]
        p_out = np.zeros(shape=(self.num_pos,3,n_ex))
        p_out[[0,-1]] = [p_init[0], p_init[-1]]
        p_out[1:-1] = p
        return p_out

    def find_image_bb_from_objects(self, objects):
        # Find image bounding boxes for each object and position.
        s = objects['sizes']
        bboxes = [[] for i in range(objects['n_positions'])]
        for i, p in enumerate(objects['positions']):
            xc = p[0]*self.fx_norm/p[2] + self.cx_norm
            yc = p[1]*self.fy_norm/p[2] + self.cy_norm
            w = s[0]*self.fx_norm/p[2]
            h = s[1]*self.fy_norm/p[2]
            z = p[2]
            bboxes[i] = [xc,yc,w,h,z]
        image_bb = {"bboxes": np.array(bboxes),
            "n_positions": objects["n_positions"], "n_ex": objects["n_ex"],
            "image_dim": self.image_dim, "fx_norm": self.fx_norm,
            "fy_norm": self.fy_norm,
            "bbox_format": "[xc_norm; yc_norm; w_norm; h_norm; Z]"}
        return image_bb

================================================
FILE: odmd/data_gen/data_gen_util.py
================================================
# File: data_gen_util.py
import os, IPython, numpy as np

def add_perturbations(bb_3D, bb, odmd_data):
    if odmd_data.perturb:
        if odmd_data.std_dev > 0:
            bb = bounding_box_perturbation(bb, odmd_data.std_dev)
        if odmd_data.shuffle > 0:
            bb = bounding_box_shuffle(bb, odmd_data.shuffle)
        if odmd_data.cam_dev > 0:
            bb_3D = camera_move_perturbation(bb_3D, odmd_data.cam_dev)
    return bb_3D, bb

def bounding_box_perturbation(bb, std_dev):
    # Add random noise to bounding box.
    dev = np.random.normal(scale=std_dev,size=(bb["n_positions"],4,bb["n_ex"]))
    bb["bboxes"][:,:4,:] += dev
    return bb

def bounding_box_shuffle(bb, shuffle):
    # Randomly shuffle a percentage of bounding boxes to learn data selection.
    dim = bb["bboxes"].shape
    n_shuffle = int(dim[2] * shuffle)
    change_idx = np.random.choice(range(dim[2]), n_shuffle, replace=False)
    replace_idx = np.random.choice(range(dim[2]), n_shuffle)
    position_idx = np.random.choice(range(dim[0]), n_shuffle)
    bb["bboxes"][position_idx,:4,change_idx] = \
        bb["bboxes"][position_idx,:4,replace_idx]
    return bb

def camera_move_perturbation(bb_3D, cam_dev):
    dev = np.random.normal(scale=cam_dev,size=(bb_3D["n_positions"] - 1, 3,
        bb_3D["n_ex"]))
    bb_3D["camera_movement"] += dev
    return bb_3D

def bb_to_inputs(bb_3D, bb, n_obs=10):
    # Prepare input data based on the number of observations used.
    idx = np.round(np.linspace(0, bb_3D["n_positions"]-1, n_obs)).astype("int")
    input_bb = np.array(bb["bboxes"])[idx,:-1]
    camera_move = np.array(bb_3D["camera_movement"])[idx[:-1]]
    labels = np.array(bb["bboxes"])[-1][-1]
    # Can also use more labels if training for intermediate depth:
    # labels = np.array(bb["bboxes"])[:,-1]
    return input_bb, camera_move, labels
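# For reference, the shapes produced above for n_ex examples:
#   input_bb:    (n_obs, 4, n_ex)    bounding boxes [xc, yc, w, h] per observation
#   camera_move: (n_obs-1, 3, n_ex)  movement relative to the final observation
#   labels:      (n_ex,)             ground-truth object depth (m) at the final observation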