Repository: griffbr/ODMD
Branch: main
Commit: b348a8017479
Files: 38
Total size: 31.8 KB
Directory structure:
gitextract_0g3k5q3j/
├── .gitignore
├── README.md
├── config/
│ ├── camera/
│ │ ├── ODMS_camera.yaml
│ │ └── hsr_grasp_camera.yaml
│ ├── data_gen/
│ │ ├── ODMS.yaml
│ │ └── standard_data.yaml
│ ├── model/
│ │ ├── DBox.yaml
│ │ └── DBox_absolute.yaml
│ └── train/
│ └── train_demo.yaml
├── data/
│ ├── odmd/
│ │ ├── test/
│ │ │ ├── test_normal.pk
│ │ │ ├── test_perturb_detection.pk
│ │ │ ├── test_perturb_motion.pk
│ │ │ └── test_robot.pk
│ │ └── val/
│ │ ├── val_normal.pk
│ │ ├── val_perturb_detection.pk
│ │ ├── val_perturb_motion.pk
│ │ └── val_robot.pk
│ └── odms_detection/
│ ├── test/
│ │ ├── test_odms_detect_driving.pk
│ │ ├── test_odms_detect_normal.pk
│ │ ├── test_odms_detect_perturb.pk
│ │ └── test_odms_detect_robot.pk
│ └── val/
│ ├── val_odms_detect_driving.pk
│ ├── val_odms_detect_normal.pk
│ ├── val_odms_detect_perturb.pk
│ └── val_odms_detect_robot.pk
├── dbox/
│ ├── __init__.py
│ ├── dbox.py
│ └── model_utils.py
├── demo/
│ ├── demo_DBox_eval.py
│ ├── demo_DBox_train.py
│ ├── demo_datagen.py
│ └── demo_dataset_eval.py
└── odmd/
├── __init__.py
├── closed_form/
│ ├── __init__.py
│ └── closed_form.py
└── data_gen/
├── __init__.py
├── data_gen.py
└── data_gen_util.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
.DS_Store
*.pyc
*.swp
/data/example_generated_data/
/results/*
================================================
FILE: README.md
================================================
# ODMD Dataset
ODMD is the first dataset for learning **O**bject **D**epth via **M**otion and **D**etection. ODMD training data are configurable and extensible, with each training example consisting of a series of object detection bounding boxes, camera movement distances, and ground truth object depth. As a benchmark evaluation, we provide four ODMD validation and test sets with 21,600 examples in multiple domains, and we also convert 15,650 examples from the [ODMS benchmark](https://github.com/griffbr/odms "ODMS dataset website") for detection. In our paper, we use a single ODMD-trained network with object detection *or* segmentation to achieve state-of-the-art results on existing driving and robotics benchmarks and estimate object depth from a camera phone, demonstrating how ODMD is a viable tool for monocular depth estimation in a variety of mobile applications.
Contact: Brent Griffin (griffb at umich dot edu)
__Depth results using a camera phone.__

## Using ODMD
__Run__ ``./demo/demo_datagen.py`` to generate random ODMD data to train or test your model.
Example data generation and camera configurations are provided in the ``./config/`` folder.
``demo_datagen.py`` has the option to save data into a static dataset for repeated use.
[native Python]
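For scripted use, the demo condenses to a few calls (a minimal sketch of the same API ``demo_datagen.py`` uses, run from the repository root):
```python
import odmd

# Configure the generator with a data-generation and a camera config.
odmd_data = odmd.data_gen.DataGenerator("./config/data_gen/standard_data.yaml")
odmd_data.initialize_data_gen("./config/camera/hsr_grasp_camera.yaml")

# Generate a batch of examples and apply the configured perturbations.
bb_3D, bb = odmd_data.generate_object_examples(20)
bb_3D, bb = odmd.data_gen.add_perturbations(bb_3D, bb, odmd_data)
bboxes, camera_movements, depths = odmd.data_gen.bb_to_inputs(bb_3D, bb, odmd_data.num_pos)
```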
__Run__ ``./demo/demo_dataset_eval.py`` to evaluate your model on the ODMD validation and test sets.
``demo_dataset_eval.py`` has an example evaluation for the BoxLS baseline and instructions for using our detection-based version of [ODMS](https://github.com/griffbr/ODMS "ODMS dataset website").
Results are saved in the ``./results/`` folder.
[native Python]
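The evaluation loop converts each ``.pk`` set into plain NumPy inputs, so swapping in your own model is a one-line change (a sketch; ``my_depth_model`` is a hypothetical placeholder for your estimator):
```python
# Inside the per-set loop of demo_dataset_eval.py:
bboxes, camera_movements, depths = odmd.data_gen.bb_to_inputs(bb_3D, bb, n_observations)
predictions = odmd.closed_form.Box_LS(bboxes, camera_movements)
# predictions = my_depth_model(bboxes, camera_movements)  # hypothetical replacement
```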
## Benchmark
| Method | Normal | Perturb Camera | Perturb Detect | Robot | All |
| --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |
| [DBox](https://arxiv.org/abs/2103.01468 "CVPR 2021 Paper") | 1.73 | 2.45 | 2.54 | **11.17** | **4.47** |
| [DBoxAbs](https://arxiv.org/abs/2103.01468 "CVPR 2021 Paper") | 1.11 | **2.05** | **1.75** | 13.29 | 4.55 |
| [BoxLS](https://arxiv.org/abs/2103.01468 "CVPR 2021 Paper") | **0.00** | 4.47 | 21.60 | 21.23 | 11.83 |
Is your technique missing although it's published and the code is public? Let us know and we'll add it.
## Using DBox Method
__Run__ ``./demo/demo_DBox_train.py`` to train your own DBox model using ODMD.
__Run__ ``./demo/demo_DBox_eval.py`` after training to evaluate your DBox model.
Example training and DBox model configurations are provided in the ``./config/`` folder.
Models are saved in the ``./results/model/`` folder.
``demo_DBox_eval.py`` also has instructions for using our pretrained DBox model.
[native Python, has Torch dependency]
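After training, inference outside the demo scripts follows the same pattern (a sketch assuming a saved checkpoint at ``./results/model/DBox_demo/checkpoint.pt``; substitute your own file):
```python
import torch, odmd, dbox

# Rebuild the generator and network exactly as in the demos.
odmd_data = odmd.data_gen.DataGenerator("./config/data_gen/standard_data.yaml")
odmd_data.initialize_data_gen("./config/camera/hsr_grasp_camera.yaml")
net, device, m_params = dbox.load_model("./config/model/DBox.yaml", odmd_data.num_pos)
net = dbox.load_weights(net, "./results/model/DBox_demo/checkpoint.pt")  # placeholder path

# Predict depth for a small generated batch (a loaded .pk set works the same way).
bb_3D, bb = odmd_data.generate_object_examples(4)
bb2net = dbox.BoundingBoxToNetwork(m_params, 4)
with torch.no_grad():
    bb2net.bb_to_labels(bb_3D, bb)
    predictions = net(bb2net.inputs.to(device)).cpu().numpy()
    if bb2net.prediction == "normalized":
        predictions[:, 0] *= bb2net.norm  # undo per-example normalization
```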
## Publication
Please cite our [paper](https://openaccess.thecvf.com/content/CVPR2021/html/Griffin_Depth_From_Camera_Motion_and_Object_Detection_CVPR_2021_paper.html "Depth from Camera Motion and Object Detection pdf") if you find it useful for your research.
```
@inproceedings{GrCoCVPR21,
author = {Griffin, Brent A. and Corso, Jason J.},
booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
title = {Depth from Camera Motion and Object Detection},
year = {2021}
}
```
__CVPR 2021 presentation video:__ https://youtu.be/bgEGjt9NU9w
## Use
This code is available for non-commercial research purposes only.
================================================
FILE: config/camera/ODMS_camera.yaml
================================================
camera_name: Pinhole ODMS Camera
image_dim: [480,640]
fx: 240.5
fy: 240.5
cx: 320.5
cy: 240.5
================================================
FILE: config/camera/hsr_grasp_camera.yaml
================================================
camera_name: HSR Grasp Camera
image_dim: [480,640]
fx: 205.5
fy: 205.5
cx: 320.5
cy: 240.5
================================================
FILE: config/data_gen/ODMS.yaml
================================================
data_description: Z motion for ODMS
z_lim: [0.55,1]
move_max: [0,0,0.4625]
size_lim: [0.01,0.175]
move_min: [0,0,0.05]
num_pos: 10
perturb: True
std_dev: 0.001
cam_dev: 0.01
shuffle: 0.1
end_swap: True
================================================
FILE: config/data_gen/standard_data.yaml
================================================
data_description: XYZ motion based on robot-collected data
z_lim: [0.55,1]
move_max: [0.25,0.175,0.325]
size_lim: [0.01,0.175]
move_min: [0,0,0.05]
num_pos: 10
perturb: True
std_dev: 0.001
cam_dev: 0.01
shuffle: 0.1
end_swap: True
================================================
FILE: config/model/DBox.yaml
================================================
model_name: DBox
params:
hidden_sz: 128
input_obs: 1
fcs: [256,256,256,256,256,256]
fc_drop: 0
peep: True
all_loss: False
prediction: normalized
sequence_in: True
sequence_dist: True
================================================
FILE: config/model/DBox_absolute.yaml
================================================
model_name: DBox
params:
hidden_sz: 128
input_obs: 1
fcs: [256,256,256,256,256,256]
fc_drop: 0
peep: True
all_loss: False
prediction: absolute
sequence_in: True
sequence_dist: False
================================================
FILE: config/train/train_demo.yaml
================================================
batch_size: 512
train_iter: 100000
n_display: 1000
n_train_model_saves: 10
================================================
FILE: dbox/__init__.py
================================================
from .dbox import *
from .model_utils import *
================================================
FILE: dbox/dbox.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import IPython, math, os
# LSTM design inspired by https://towardsdatascience.com/building-a-lstm-by-hand-on-pytorch-59c02a4ec091
def pass_arg_for_model(model_name, **kwargs):
if model_name == "DBox":
model = DBox(**kwargs)
return model
class DBox(nn.Module):
def __init__(self,input_obs,hidden_sz,n_obs,fcs=[1],fc_drop=0,peep=False,
all_loss=False):
super(DBox, self).__init__()
# LSTM.
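        # Each observation packs 7 features: a bounding box [xc, yc, w, h]
        # plus a camera movement [X, Y, Z] (built in BoundingBoxToNetwork).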
input_sz = input_obs*7
self.idx = [[i+j for j in range(input_obs)]
for i in range(n_obs-input_obs+1)]
self.n_in_obs = input_obs
self.input_sz = input_sz
self.hidden_sz = hidden_sz
self.W = nn.Parameter(torch.Tensor(input_sz, hidden_sz * 4))
self.U = nn.Parameter(torch.Tensor(hidden_sz, hidden_sz * 4))
self.bias = nn.Parameter(torch.Tensor(hidden_sz * 4))
self.peephole = peep
self.all_loss = all_loss
if self.all_loss: self.n_predict = len(self.idx)
else: self.n_predict = 1
if peep:
self.V = nn.Parameter(torch.Tensor(hidden_sz, hidden_sz * 4))
self.cat_sz = n_obs * 7
self.init_weights()
# Linear layers.
self.n_fc = len(fcs)
self.relu = nn.ReLU(inplace=True)
self.linears = nn.ModuleList([nn.Linear(hidden_sz+self.cat_sz,fcs[0])])
for i in range(self.n_fc-1):
self.linears.append(nn.Linear(fcs[i] + self.cat_sz, fcs[i+1]))
self.linears.append(nn.Linear(fcs[-1],1))
# Dropout.
if fc_drop > 0:
self.drop = True
self.dropout = nn.Dropout(p=fc_drop)
else: self.drop = False
self.return_info = False
def init_weights(self):
stdv = 1.0 / math.sqrt(self.hidden_sz)
for weight in self.parameters():
weight.data.uniform_(-stdv, stdv)
def forward(self, x, init_state=None):
"""Assumes x is of shape (batch, sequence, feature)"""
bs, seq_sz, _ = x.size()
hidden_seq = []
if init_state is None:
h_t, c_t = (torch.zeros(bs, self.hidden_sz).to(x.device),
torch.zeros(bs, self.hidden_sz).to(x.device))
else:
h_t, c_t = init_state
HS = self.hidden_sz
#for t in range(seq_sz):
# x_t = x[:, t, :]
for idx in self.idx:
x_t = x[:, idx, :].reshape(bs, 1, self.input_sz)[:,0,:]
# Batch computations into single matrix multiplication.
if self.peephole:
gates = x_t @ self.W + h_t @ self.U + self.bias + c_t @ self.V
else:
gates = x_t @ self.W + h_t @ self.U + self.bias
o_t = torch.sigmoid(gates[:, HS*3:]) # output
i_t, f_t, g_t = (
torch.sigmoid(gates[:, :HS]), # input
torch.sigmoid(gates[:, HS:HS*2]), # forget
torch.tanh(gates[:, HS*2:HS*3]),
)
c_t = f_t * c_t + i_t * g_t
if self.peephole:
gates = x_t @ self.W + h_t @ self.U + self.bias + c_t @ self.V
o_t = torch.sigmoid(gates[:, HS*3:]) # output
h_t = o_t * torch.tanh(c_t)
if self.return_info or self.all_loss:
hidden_seq.append(h_t.unsqueeze(0))
# Linear layers.
x_cat = x.reshape(1, bs, self.cat_sz)
        d_set = torch.zeros(bs, self.n_predict, device=x.device)  # match input device for GPU runs
if self.all_loss: h_t_set = hidden_seq
else: h_t_set = [h_t.unsqueeze(0)]
for j, d in enumerate(h_t_set):
for i in range(self.n_fc):
d = torch.cat((d, x_cat), 2)
d = self.linears[i](d)
d = self.relu(d)
if self.drop:
d = self.dropout(d)
d = self.linears[-1](d)
d_set[:,j] = d.reshape(-1)
if self.return_info:
hidden_seq = torch.cat(hidden_seq, dim=0)
# Reshape from (sequence, batch, feature) to (batch, seq, feature)
hidden_seq = hidden_seq.transpose(0, 1).contiguous()
return d_set, hidden_seq, (h_t, c_t)
else:
return d_set
================================================
FILE: dbox/model_utils.py
================================================
import torch, os, IPython, numpy as np, yaml
from copy import deepcopy
from .dbox import *
def count_parameters(model):
return sum(p.numel() for p in model.parameters() if p.requires_grad)
def load_model(model_config, n_pos=10):
m_param = yaml.full_load(open(model_config))
m_param["params"]["n_obs"] = n_pos
# Take model configuration parameters and set up network architecture.
net = pass_arg_for_model(m_param["model_name"], **m_param["params"])
# Randomly select between both GPUs (distribute training).
gpus = ["cuda:0", "cuda:1"]
idx = np.random.randint(0,2)
try:
device = torch.device(gpus[idx])
net.to(device)
except:
try:
device = torch.device(gpus[idx-1])
net.to(device)
except:
device = torch.device("cpu")
net.to(device)
m_param["n_predict"] = net.n_predict
return net, device, m_param
def load_training_params(train_config):
train = yaml.full_load(open(train_config))
train["display_iter"] = int(train["train_iter"] / train["n_display"])
train["save_iter"] = np.linspace(0, train["train_iter"],
train["n_train_model_saves"] + 1).astype(int)
return train
def load_weights(net, weight_file):
# Load weights onto already initialized network model.
try:
net.load_state_dict(torch.load(weight_file))
except:
net.load_state_dict(torch.load(weight_file, map_location=lambda
storage, loc: storage))
return net
class BoundingBoxToNetwork:
"""
    BoundingBoxToNetwork converts bounding boxes to network input.
"""
def __init__(self, m_params, n_bat=1):
"""
Args:
            m_params (dict): Model parameters; m_params["params"]["n_obs"] sets
                the number of bounding box and movement inputs to the network.
n_bat (int): Number of input sets batched together.
"""
self.n_obs = m_params["params"]["n_obs"]
self.bb_in = self.n_obs * 4
#self.cam_in = (self.n_obs - 1) * 3
self.cam_in = self.n_obs * 3
self.in_sz = self.bb_in + self.cam_in
self.prediction = m_params["prediction"]
self.sequence = m_params["sequence_in"]
if self.sequence:
self.seq_dist = m_params["sequence_dist"]
self.n_predict = m_params["n_predict"]
self.set_batch(n_bat)
def set_batch(self, n_bat):
# Change network input batch size.
self.n_bat = n_bat
self.tmp_in = np.zeros(shape=(n_bat, self.in_sz), dtype="float32")
self.labels = torch.zeros(n_bat, self.n_predict, dtype=torch.float)
if self.sequence:
self.inputs =torch.zeros(self.n_bat,self.n_obs,7,dtype=torch.float)
self.tmp2=np.zeros(shape=(self.n_bat,self.n_obs,7),dtype="float32")
else:
self.inputs = torch.zeros(self.n_bat,self.in_sz,dtype=torch.float)
def bb_to_labels(self, bb_3D, bb):
# Convert bounding box data to network input and labels.
idx = np.linspace(0, bb_3D["n_positions"] - 1, self.n_obs, dtype=int)
cam_idx = np.linspace(0,bb_3D["n_positions"]-2,self.n_obs-1,dtype=int)
bb = missing_detection_network_filter(bb, idx)
self.tmp_in[:,:self.bb_in] = bb["bboxes"][idx,:-1].reshape(
self.bb_in, self.n_bat).T
self.tmp_in[:,-self.cam_in:-3] = bb_3D["camera_movement"][
cam_idx].reshape(self.cam_in-3, self.n_bat).T
"""
# Temp change for recovering results from old networks.
dim = bb_3D["camera_movement"].shape
temp = np.zeros(shape=(dim[0]+1,dim[1],dim[2]))
temp[:-1] = bb_3D["camera_movement"]
temp -= temp[0]
bb_3D["camera_movement"] = temp[1:]
self.tmp_in[:,-self.cam_in:] = np.array(bb_3D["camera_movement"])[
idx[1:]-1].reshape(self.cam_in, self.n_bat).T
"""
# Convert input and labels for prediction type.
self.labels[:] = torch.from_numpy(bb["bboxes"][-self.n_predict:,-1].T)
#self.labels[:] = torch.from_numpy(bb["bboxes"][-1][-1])
if self.prediction == "normalized":
self.norm = np.linalg.norm(self.tmp_in[:,
-self.cam_in:-self.cam_in+3], axis=1)
self.tmp_in[:,-self.cam_in:] = np.multiply(
self.tmp_in[:,-self.cam_in:],(1/self.norm)[:, np.newaxis])
self.labels[:] /= np.tile(self.norm, (self.n_predict,1)).T
#self.labels[:] /= self.norm
# Sequence separates each observation for LSTM input.
if self.sequence:
self.tmp2[:,:,:4] = self.tmp_in[:,:self.bb_in].reshape(
self.n_bat,self.n_obs,4)
self.tmp2[:,:,-3:] = self.tmp_in[:,-self.cam_in:].reshape(
self.n_bat, self.n_obs, 3)
# Sequential has each movement relative to previous observation.
if self.seq_dist:
self.tmp2[:,1:,-3:] = np.diff(self.tmp2[:,:,-3:], axis=1)
self.tmp2[:,0,-3:] = 0
self.inputs[:] = torch.from_numpy(self.tmp2)
else:
self.inputs[:] = torch.from_numpy(self.tmp_in)
def missing_detection_network_filter(bb, idx):
miss_idx = np.argwhere(bb["bboxes"][idx,2]==0)
n_miss = len(miss_idx)
if n_miss > 0:
print("\nMissing %i observations! Using nearest valid.\n" % n_miss)
bb_init = deepcopy(bb["bboxes"])
for idx in miss_idx:
            # Replace missing detection with closest valid observation.
obs_idx = np.argwhere((bb_init[:,2,idx[1]]==0)==False)
try:
replace_i = obs_idx[np.argmin(abs(obs_idx-idx[0]))][0]
except:
print("No replacement!")
replace_i = 0
bb["bboxes"][idx[0],:-1,idx[1]] =bb["bboxes"][replace_i,:-1,idx[1]]
return bb
================================================
FILE: demo/demo_DBox_eval.py
================================================
"""
Demonstration of how to evaluate DBox network on ODMD.
"""
import sys, os, IPython, torch, _pickle as pickle, numpy as np
file_dir = os.path.dirname(os.path.abspath(__file__))
os.chdir(file_dir)
sys.path.insert(0,"../")
import odmd, dbox
net_name = "DBox_demo"
#net_name = "DBox_pretrained" # Uncomment to run DBox model from paper.
model_idx = -1 # Can cycle through indices to find best validation performance.
# Select dataset to evaluate.
dataset = "odmd" # or "odms_detection" for ODMS dataset converted to detection.
eval_set = "val" # or "test" once model training and development are complete.
# Select configuration (more example settings from paper in config directory).
datagen_config = "../config/data_gen/standard_data.yaml"
camera_config = "../config/camera/hsr_grasp_camera.yaml"
train_config = "../config/train/train_demo.yaml"
dbox_config = "../config/model/DBox.yaml"
# Initiate data generator, model, data loader, and load weights.
odmd_data = odmd.data_gen.DataGenerator(datagen_config)
odmd_data.initialize_data_gen(camera_config)
net, device, m_params = dbox.load_model(dbox_config, odmd_data.num_pos)
bb2net = dbox.BoundingBoxToNetwork(m_params)
model_dir = os.path.join("../results", "model", net_name)
model_list = sorted([pt for pt in os.listdir(model_dir) if pt.endswith(".pt")])
net = dbox.load_weights(net, os.path.join(model_dir, model_list[model_idx]))
# Initiate dataset information.
set_dir = os.path.join("../data", dataset, eval_set)
set_list = sorted([pk for pk in os.listdir(set_dir) if pk.endswith(".pk")])
percent_error=[]; abs_error=[]; predictions_all=[]
with torch.no_grad():
for test in set_list:
# Load data for specific set.
bb_data = pickle.load(open(os.path.join(set_dir, test), "rb"))
bb_3D, bb = bb_data["bb_3D"], bb_data["bb"]
# Run DBox with correct post-processing for configuration.
bb2net.set_batch(bb_3D["n_ex"])
bb2net.bb_to_labels(bb_3D, bb)
inputs = bb2net.inputs.to(device)
predictions = net(inputs).cpu().numpy()
if bb2net.prediction == "normalized":
predictions[:,0] *= bb2net.norm
depths = bb["bboxes"][-1][-1]
percent_error.append(np.mean( abs(predictions[:,0] - depths) / depths))
abs_error.append(np.mean(abs(predictions[:,0] - depths)))
predictions_all.append(predictions)
# Print out final results.
print("\nResults summary for ODMD %s sets." % eval_set)
for i, test_set in enumerate(set_list):
print("\n%s-%s:" % (test_set, eval_set))
print("Mean Percent Error: %.4f" % percent_error[i])
print("Mean Absolute Error: %.4f (m)" % abs_error[i])
# Generate final results file.
name = model_list[model_idx].split(".pt")[0]
data_name = "%s_%s" % (dataset, eval_set)
print("\nSaving %s results file for %s.\n" % (data_name, name))
result_data = {"Result Name": name, "Set List": set_list,
"Percent Error": percent_error, "Absolute Error": abs_error,
"Depth Estimates": predictions_all, "Dataset": data_name}
os.makedirs("../results/", exist_ok=True)
pickle.dump(result_data, open("../results/%s_%s.pk" % (name, data_name), "wb"))
================================================
FILE: demo/demo_DBox_train.py
================================================
"""
Demonstration of how to generate data and train a DBox network on ODMD.
"""
import sys, os, IPython, torch
file_dir = os.path.dirname(os.path.abspath(__file__))
os.chdir(file_dir)
sys.path.insert(0,"../")
import odmd, dbox
net_name = "DBox_demo"
# Select configuration (more example settings from paper in config directory).
datagen_config = "../config/data_gen/standard_data.yaml"
camera_config = "../config/camera/hsr_grasp_camera.yaml"
train_config = "../config/train/train_demo.yaml"
dbox_config = "../config/model/DBox.yaml"
# Initiate data generator, model, training parameters, and data loader.
odmd_data = odmd.data_gen.DataGenerator(datagen_config)
odmd_data.initialize_data_gen(camera_config)
net, device, m_params = dbox.load_model(dbox_config, odmd_data.num_pos)
train = dbox.load_training_params(train_config)
bb2net = dbox.BoundingBoxToNetwork(m_params, train["batch_size"])
# Initiate training!
model_dir = os.path.join("../results", "model", net_name)
os.makedirs(model_dir, exist_ok=True)
criterion = torch.nn.L1Loss()
optimizer = torch.optim.Adam(net.parameters(), lr=0.001, weight_decay=0)
running_loss=0.0; ct=0
print("Starting training for %s." % net_name)
while ct < train["train_iter"]:
# Generate examples for ODMD training (repeat for each training iteration).
bb_3D, bb = odmd_data.generate_object_examples(bb2net.n_bat)
bb_3D, bb = odmd.data_gen.add_perturbations(bb_3D, bb, odmd_data)
# Network inputs and labels, forward pass, loss, and gradient.
bb2net.bb_to_labels(bb_3D, bb)
inputs, labels = bb2net.inputs.to(device), bb2net.labels.to(device)
outputs = net(inputs).to(device)
loss = criterion(outputs, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
running_loss += loss.item()
# Print progress details and save model at set interval.
ct += 1
if ct % train["display_iter"] == 0:
cur_loss = running_loss / train["display_iter"]
print("[%9d] loss: %.6f" % (ct, cur_loss))
running_loss = 0.0
if ct in train["save_iter"]:
torch.save(net.state_dict(), "%s/%s_%09d.pt" % (model_dir,net_name,ct))
print("[%9d] interval model saved." % ct)
================================================
FILE: demo/demo_datagen.py
================================================
"""
Demonstration of how to generate new training data on ODMD.
"""
import sys, os, IPython, _pickle as pickle
file_dir = os.path.dirname(os.path.abspath(__file__))
os.chdir(file_dir)
sys.path.insert(0,"../")
import odmd
# Select configuration (more example settings from paper in config directory).
datagen_config = "../config/data_gen/standard_data.yaml"
camera_config = "../config/camera/hsr_grasp_camera.yaml"
# Other data generation settings.
n_examples = 20 # Configure for batch size if training.
save_examples = False
set_name = "example_data_gen"
# Initiate data generator.
odmd_data = odmd.data_gen.DataGenerator(datagen_config)
odmd_data.initialize_data_gen(camera_config)
# Generate examples for ODMD training (repeat for each training iteration).
bb_3D, bb = odmd_data.generate_object_examples(n_examples)
bb_3D, bb = odmd.data_gen.add_perturbations(bb_3D, bb, odmd_data)
bboxes, camera_movements, depths = odmd.data_gen.bb_to_inputs(bb_3D, bb,
odmd_data.num_pos)
"""
Use generated data to train your own network to predict depths given bboxes
and camera_movements. See paper for ideas on possible initial configurations.
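For reference, the arrays above have shapes:
  bboxes (num_pos, 4, n_examples): normalized [xc, yc, w, h] per observation;
  camera_movements (num_pos - 1, 3, n_examples): XYZ camera movement;
  depths (n_examples,): ground-truth object depth at the final position.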
"""
# Save generated examples as a static dataset (optional).
if save_examples:
result_dir = "../data/example_generated_data"
os.makedirs(result_dir, exist_ok=True)
p_data = {"test_name": set_name, "bb_3D": bb_3D, "bb": bb}
pickle.dump(p_data, open(os.path.join(result_dir, "%s.pk" % set_name), 'wb'))
================================================
FILE: demo/demo_dataset_eval.py
================================================
"""
Demonstration of how to evaluate a depth estimation model on ODMD.
"""
import sys, os, IPython, numpy as np, _pickle as pickle
file_dir = os.path.dirname(os.path.abspath(__file__))
os.chdir(file_dir)
sys.path.insert(0,"../")
import odmd
# Select dataset to evaluate.
dataset = "odmd" # or "odms_detection" for ODMS dataset converted to detection.
eval_set = "val" # or "test" once model training and development are complete.
# Misc. initialization.
n_observations = 10
set_dir = os.path.join("../data", dataset, eval_set)
set_list = sorted([pk for pk in os.listdir(set_dir) if pk.endswith(".pk")])
percent_error=[]; abs_error=[]; predictions_all=[]
for test in set_list:
# Load data for specific set.
bb_data = pickle.load(open(os.path.join(set_dir, test), "rb"))
bb_3D, bb = bb_data["bb_3D"], bb_data["bb"]
bboxes, camera_movements, depths = odmd.data_gen.bb_to_inputs(bb_3D, bb,
n_observations)
"""
Use your own depth estimation model here (to replace Box_LS):
"""
predictions = odmd.closed_form.Box_LS(bboxes, camera_movements)
percent_error.append(np.mean( abs(predictions - depths) / depths))
abs_error.append(np.mean(abs(predictions - depths)))
    predictions_all.append(predictions)
# Print out final results.
print("\nResults summary for ODMD %s sets." % eval_set)
for i, test_set in enumerate(set_list):
print("\n%s-%s:" % (test_set, eval_set))
print("Mean Percent Error: %.4f" % percent_error[i])
print("Mean Absolute Error: %.4f (m)" % abs_error[i])
# Generate final results file.
name = "Box_LS"
data_name = "%s_%s" % (dataset, eval_set)
result_data = {"Result Name": name, "Set List": set_list,
"Percent Error": percent_error, "Absolute Error": abs_error,
"Depth Estimates": predictions_all, "Dataset": data_name}
os.makedirs("../results/", exist_ok=True)
pickle.dump(result_data, open("../results/%s_%s.pk" % (name, data_name), "wb"))
================================================
FILE: odmd/__init__.py
================================================
import odmd.data_gen
import odmd.closed_form
================================================
FILE: odmd/closed_form/__init__.py
================================================
from .closed_form import *
================================================
FILE: odmd/closed_form/closed_form.py
================================================
# File: closed_form.py
import os, IPython, numpy as np, yaml
from copy import deepcopy
def Box_LS(input_bb, camera_move, n_obs=10):
input_bb = missing_detection_filter(input_bb)
# Find Ax = b least-squares solution (see equation sheet for details).
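    # Sketch of the setup (one consistent reading of the data_gen conventions):
    # the pinhole model gives w_i * Z_i = fx_norm * W at each observation, with
    # Z_i = Z_f - z_i, where z_i is the stored camera z-movement and Z_f is the
    # final depth. Rearranging, w_i * Z_f - fx_norm * W = w_i * z_i, so stacking
    # the width and height rows gives A x = b with x = [Z_f, -fx_norm*W, -fy_norm*H].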
n_examples = len(input_bb[0,0])
predictions = np.zeros(n_examples)
A = np.zeros(shape=(2*n_obs, 3))
A[:n_obs,1]=1; A[n_obs:,2] = 1
b = np.zeros(2*n_obs)
z = np.zeros(n_obs)
for i in range(n_examples):
w = input_bb[:,2,i]
h = input_bb[:,3,i]
z[:-1] = camera_move[:,2,i]
b = np.concatenate((w*z, h*z))
A[:,0] = np.concatenate((w, h))
try:
x = np.matmul(np.matmul(np.linalg.inv(np.matmul(A.T,A)),A.T),b)
except:
            print("Warning! Matrix A.T A is not invertible! x is not solved.")
x = np.zeros(3)
predictions[i] = x[0]
return predictions
def missing_detection_filter(input_bb):
miss_idx = np.argwhere(input_bb[:,2,:] == 0)
n_miss = len(miss_idx)
if n_miss > 0:
print("\nMissing %i observations! Using nearest valid.\n" % n_miss)
bb_init = deepcopy(input_bb)
for idx in miss_idx:
        # Replace missing detection with closest valid observation.
obs_idx = np.argwhere((bb_init[:,2,idx[1]]==0)==False)
try:
replace_idx = obs_idx[np.argmin(abs(obs_idx-idx[0]))][0]
except:
print("No observation at all!")
replace_idx = 0
input_bb[idx[0],:,idx[1]] = input_bb[replace_idx,:,idx[1]]
return input_bb
def bb_m_parallax(input_bb, camera_move):
# NOTE: Be sure that input number of observations is two!
in_bb = missing_detection_filter(input_bb)
# Using deltaZ neq 0 method from derivation.
# Load camera parameters. Normalize cx, cy, fx, fy by image size.
cam_file = "../config/camera/hsr_grasp_camera.yaml"
params = yaml.full_load(open(cam_file))
dim = params["image_dim"]
cx, cy = params["cx"]/dim[1], params["cy"]/dim[0]
fx, fy = params["fx"]/dim[1], params["fy"]/dim[0]
# Find x0, xf, y0, yf, w0/wf, h0/hf, and Delta X, Y from two observations.
dX, dY = camera_move[0, 0, :], camera_move[0, 1, :]
x0, xf, y0, yf = in_bb[0,0,:], in_bb[1,0,:], in_bb[0,1,:], in_bb[1,1,:]
w0, wf, h0, hf = in_bb[0,2,:], in_bb[1,2,:], in_bb[0,3,:], in_bb[1,3,:]
# Solve for depth using x and y motion parallax then average result.
depth_mpx = dX*fx / ((xf-cx) - (x0-cx)*(wf/w0))
depth_mpy = dY*fy / ((yf-cy) - (y0-cy)*(hf/h0))
# Replace Inf. values with one.
replace_idx = np.argwhere(np.isfinite(depth_mpx)==False)
for idx in replace_idx:
depth_mpx[idx] = 1
depth_mpy[idx] = 1
"""
# No deltaZ version:
depth_mpx = dX*fx / (xf - x0)
depth_mpy = dY*fy / (yf - y0)
"""
predictions = (depth_mpx + depth_mpy) / 2
return predictions
def bb_opt_expansion(input_bb, camera_move):
# NOTE: Be sure that input number of observations is two!
in_bb = missing_detection_filter(input_bb)
# Load camera parameters. Normalize cx, cy, fx, fy by image size.
cam_file = "../config/camera/hsr_grasp_camera.yaml"
params = yaml.full_load(open(cam_file))
dim = params["image_dim"]
cx, cy = params["cx"]/dim[1], params["cy"]/dim[0]
fx, fy = params["fx"]/dim[1], params["fy"]/dim[0]
# Find x0, xf, y0, yf, w0/wf, h0/hf, and Delta X, Y from two observations.
dZ = camera_move[0, 2, :]
x0, xf, y0, yf = in_bb[0,0,:], in_bb[1,0,:], in_bb[0,1,:], in_bb[1,1,:]
w0, wf, h0, hf = in_bb[0,2,:], in_bb[1,2,:], in_bb[0,3,:], in_bb[1,3,:]
# Solve for depth using w and h optical expansion then average result.
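    # Since w is proportional to 1/Z, wf/w0 = Z0/Zf and, with dZ = Zf - Z0,
    # dZ / (1 - wf/w0) = dZ * Zf / (Zf - Z0) = Zf, the final object depth.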
depth_oew = dZ / (1 - (wf/w0))
depth_oeh = dZ / (1 - (hf/h0))
# Replace Inf. values with one.
replace_idx = np.argwhere(np.isfinite(depth_oew)==False)
for idx in replace_idx:
depth_oew[idx] = 1
depth_oeh[idx] = 1
predictions = (depth_oew + depth_oeh) / 2
return predictions
================================================
FILE: odmd/data_gen/__init__.py
================================================
from .data_gen import *
from .data_gen_util import *
================================================
FILE: odmd/data_gen/data_gen.py
================================================
# File: data_gen.py
import IPython, os, numpy as np, yaml
class DataGenerator:
"""
DataGenerator generates random data for depth boxes.
"""
def __init__(self, data_params):
"""
Args:
data_params (dict): dictionary with the following keys and values:
z_lim ([float,float]): minimum and maximum object start depth.
                move_max (list[float]): maximum position change for X, Y, Z.
                move_min (list[float]): minimum position change for X, Y, Z.
                size_lim ([float,float]): minimum and maximum width and
                    height of object in world coordinates.
num_pos (int): number of positions that object is viewed from.
"""
self.set_yaml_params(data_params)
self.move_range = np.array(self.move_max) - self.move_min
self.z_range = self.z_lim[1] - self.z_lim[0]
self.size_range = self.size_lim[1] - self.size_lim[0]
def set_yaml_params(self, yaml_file):
self.set_params(yaml.full_load(open(yaml_file)))
def set_params(self, params):
for _, key in enumerate(params.keys()):
setattr(self, key, params[key])
def initialize_data_gen(self, camera_config):
self.set_yaml_params(camera_config)
# See derivation 200520 page 33.
cfx = self.cx / self.fx; self.xmina = -cfx
self.xminb = self.move_max[0] +self.size_lim[1]/2 +self.move_max[2]*cfx
wfx = (self.image_dim[1] - self.cx) / self.fx; self.xmaxa = wfx
self.xmaxb = -self.move_max[0] -self.size_lim[1]/2-self.move_max[2]*wfx
cfy = self.cy / self.fy; self.ymina = -cfy
self.yminb = self.move_max[1] +self.size_lim[1]/2 +self.move_max[2]*cfy
wfy = (self.image_dim[0] - self.cy) / self.fy; self.ymaxa = wfy
self.ymaxb = -self.move_max[1] -self.size_lim[1]/2-self.move_max[2]*wfy
self.xa = self.xmaxa - self.xmina; self.xb = self.xmaxb - self.xminb
self.ya = self.ymaxa - self.ymina; self.yb = self.ymaxb - self.yminb
self.fx_norm = self.fx / self.image_dim[1]
self.cx_norm = self.cx / self.image_dim[1]
self.fy_norm = self.fy / self.image_dim[0]
self.cy_norm = self.cy / self.image_dim[0]
# Check that bounds are valid.
if (self.z_lim[0] * self.xmina + self.xminb > 0) or (self.z_lim[0] *
self.ymina + self.yminb > 0):
print("\n\nData gen. bounds are not valid! Try increasing z.\n\n")
IPython.embed()
def generate_object_examples(self, n_ex):
# Find random initial and final object positions, then intermediate.
rdm = np.random.rand(n_ex * 8)
sign = np.random.randint(0,2, size=n_ex*3)*2 - 1
p = np.zeros(shape=(2,3,n_ex))
# Find initial Z position, then X(Z) and Y(Z) within the field of view.
p[0][2] = self.z_lim[0] + self.z_range * rdm[:n_ex]
p[0][0] = p[0][2]*self.xmina + self.xminb + \
rdm[n_ex:n_ex*2]*( p[0][2]*self.xa + self.xb )
p[0][1] = p[0][2]*self.ymina + self.yminb + \
rdm[n_ex*2:n_ex*3]*( p[0][2]*self.ya + self.yb )
for i in range(3):
p[1][i] = p[0][i] + sign[n_ex*i:n_ex*(i+1)] * (self.move_min[i] + \
self.move_range[i] *rdm[n_ex*(i+3):n_ex*(i+4)])
if self.end_swap:
# For greater sample diversity, switch start / end points randomly.
swap = np.argwhere(np.random.randint(0,2, size=n_ex))
p[[1,0],:,swap] = p[[0,1],:,swap]
if self.num_pos>2: p = self.add_intermediate_positions(p)
# Determine camera movement for each position.
# Note: camera movement is opposite (-1) of object movement p.
movement = (p[-1] - p)[:-1]
# Find random height and width of objects.
s = [self.size_lim[0] + self.size_range * rdm[n_ex*(i+6):n_ex*(i+7)]
for i in range(2)]
obj_examples = {"positions": p, "camera_movement": np.array(movement),
"sizes": s, "n_ex": n_ex, "n_positions": self.num_pos}
bb = self.find_image_bb_from_objects(obj_examples)
return obj_examples, bb
def add_intermediate_positions(self, p_init):
# Add intermediate object positions between initial and final pose.
n_ex = len(p_init[0][0])
rdm = np.random.rand(n_ex * 3 * (self.num_pos - 2))
dp = p_init[1] - p_init[0]
p = np.zeros(shape=(self.num_pos-2,3,n_ex))
for i in range(3):
for j in range(0,self.num_pos-2):
p[j][i] = p_init[0][i] + dp[i] \
* rdm[n_ex*(i+j*3):n_ex*(i+1+j*3)]
        # Sort intermediate points to be monotonically increasing or decreasing.
p.sort(axis=0)
descend = dp<0
p[:,descend] = p[::-1][:,descend]
p_out = np.zeros(shape=(self.num_pos,3,n_ex))
        p_out[[0,-1]] = [p_init[0], p_init[-1]]
p_out[1:-1] = p
return p_out
def find_image_bb_from_objects(self, objects):
# Find image bounding boxes for each object and position.
s = objects['sizes']
bboxes = [[] for i in range(objects['n_positions'])]
for i, p in enumerate(objects['positions']):
xc = p[0]*self.fx_norm/p[2] + self.cx_norm
yc = p[1]*self.fy_norm/p[2] + self.cy_norm
w = s[0]*self.fx_norm/p[2]
h = s[1]*self.fy_norm/p[2]
z = p[2]
bboxes[i] = [xc,yc,w,h,z]
image_bb = {"bboxes": np.array(bboxes),
"n_positions": objects["n_positions"],
"n_ex": objects["n_ex"], "image_dim": self.image_dim,
"fx_norm": self.fx_norm, "fy_norm": self.fy_norm,
"bbox_format": "[xc_norm; yc_norm; w_norm; h_norm; Z]"}
return image_bb
================================================
FILE: odmd/data_gen/data_gen_util.py
================================================
# File: data_gen_util.py
import os, IPython, numpy as np
def add_perturbations(bb_3D, bb, odmd_data):
if odmd_data.perturb:
if odmd_data.std_dev > 0:
bb = bounding_box_perturbation(bb, odmd_data.std_dev)
if odmd_data.shuffle > 0:
bb = bounding_box_shuffle(bb, odmd_data.shuffle)
if odmd_data.cam_dev > 0:
bb_3D = camera_move_perturbation(bb_3D, odmd_data.cam_dev)
return bb_3D, bb
def bounding_box_perturbation(bb, std_dev):
# Add random noise to bounding box.
dev = np.random.normal(scale=std_dev,size=(bb["n_positions"],4,bb["n_ex"]))
bb["bboxes"][:,:4,:] += dev
return bb
def bounding_box_shuffle(bb, shuffle):
# Randomly shuffle a percentage of bounding boxes to learn data selection.
dim = bb["bboxes"].shape
n_shuffle = int(dim[2] * shuffle)
change_idx = np.random.choice(range(dim[2]), n_shuffle, replace=False)
replace_idx = np.random.choice(range(dim[2]), n_shuffle)
position_idx = np.random.choice(range(dim[0]), n_shuffle)
bb["bboxes"][position_idx,:4,change_idx] = \
bb["bboxes"][position_idx,:4,replace_idx]
return bb
def camera_move_perturbation(bb_3D, cam_dev):
dev = np.random.normal(scale=cam_dev,size=(bb_3D["n_positions"] - 1, 3,
bb_3D["n_ex"]))
bb_3D["camera_movement"] += dev
return bb_3D
def bb_to_inputs(bb_3D, bb, n_obs=10):
# Prepare input data based on the number of observations used.
idx = np.round(np.linspace(0, bb_3D["n_positions"]-1, n_obs)).astype("int")
input_bb = np.array(bb["bboxes"])[idx,:-1]
camera_move = np.array(bb_3D["camera_movement"])[idx[:-1]]
labels = np.array(bb["bboxes"])[-1][-1]
# Can also use more labels if training for intermediate depth:
# labels = np.array(bb["bboxes"])[:,-1]
return input_bb, camera_move, labels