[
  {
    "path": ".gitignore",
    "content": ".DS_Store\n*.pyc\n*.swp\n/data/example_generated_data/\n/results/*"
  },
  {
    "path": "README.md",
    "content": "# ODMD Dataset\nODMD is the first dataset for learning **O**bject **D**epth via **M**otion and **D**etection. ODMD training data are configurable and extensible, with each training example consisting of a series of object detection bounding boxes, camera movement distances, and ground truth object depth. As a benchmark evaluation, we provide four ODMD validation and test sets with 21,600 examples in multiple domains, and we also convert 15,650 examples from the [ODMS benchmark](https://github.com/griffbr/odms \"ODMS dataset website\") for detection. In our paper, we use a single ODMD-trained network with object detection *or* segmentation to achieve state-of-the-art results on existing driving and robotics benchmarks and estimate object depth from a camera phone, demonstrating how ODMD is a viable tool for monocular depth estimation in a variety of mobile applications.\n\nContact: Brent Griffin (griffb at umich dot edu)\n\n__Depth results using a camera phone.__\n![alt text](./figure/example_ODMD_phone_results.jpg?raw=true \"Depth results using a camera phone\")\n\n## Using ODMD\n\n__Run__ ``./demo/demo_datagen.py`` to generate random ODMD data to train or test your model. <br />\nExample data generation and camera configurations are provided in the ``./config/`` folder. \n``demo_datagen.py`` has the option to save data into a static dataset for repeated use. <br />\n[native Python]\n\n__Run__ ``./demo/demo_dataset_eval.py`` to evaluate your model on the ODMD validation and test sets. <br />\n``demo_dataset_eval.py`` has an example evaluation for the Box<sub>LS</sub> baseline and instructions for using our detection-based version of [ODMS](https://github.com/griffbr/ODMS \"ODMS dataset website\"). \nResults are saved in the ``./results/`` folder. <br />\n[native Python]\n\n## Benchmark\n\n| Method | Normal | Perturb Camera | Perturb Detect | Robot | All |\n| --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |\n| [DBox](https://arxiv.org/abs/2103.01468 \"CVPR 2021 Paper\") | 1.73 | 2.45 | 2.54 | **11.17** | **4.47** |\n| [DBox<sub>Abs</sub>](https://arxiv.org/abs/2103.01468 \"CVPR 2021 Paper\") | 1.11 | **2.05** | **1.75** | 13.29 | 4.55 |\n| [Box<sub>LS</sub>](https://arxiv.org/abs/2103.01468 \"CVPR 2021 Paper\") | **0.00** | 4.47 | 21.60 | 21.23 | 11.83 |\n\nIs your technique missing although it's published and the code is public? Let us know and we'll add it.\n\n## Using DBox Method\n\n__Run__ ``./demo/demo_dataset_DBox_train.py`` to train your own DBox model using ODMD. <br />\n__Run__ ``./demo/demo_dataset_DBox_eval.py`` after training to evaluate your DBox model. <br />\nExample training and DBox model configurations are provided in the ``./config/`` folder.\nModels are saved in the ``./results/model/`` folder. \n``demo_dataset_DBox_eval.py`` also has instructions for using our pretrained DBox model. <br />\n[native Python, has Torch dependency]\n\n## Publication\nPlease cite our [paper](https://openaccess.thecvf.com/content/CVPR2021/html/Griffin_Depth_From_Camera_Motion_and_Object_Detection_CVPR_2021_paper.html \"Depth from Camera Motion and Object Detection pdf\") if you find it useful for your research.\n```\n@inproceedings{GrCoCVPR21,\n  author = {Griffin, Brent A. 
__Run__ ``./demo/demo_dataset_eval.py`` to evaluate your model on the ODMD validation and test sets. <br />\n``demo_dataset_eval.py`` has an example evaluation for the Box<sub>LS</sub> baseline and instructions for using our detection-based version of [ODMS](https://github.com/griffbr/ODMS \"ODMS dataset website\"). \nResults are saved in the ``./results/`` folder. <br />\n[native Python]\n\n## Benchmark\n\nMean percent error on the ODMD test sets (lower is better):\n\n| Method | Normal | Perturb Camera | Perturb Detect | Robot | All |\n| --------------- | --------------- | --------------- | --------------- | --------------- | --------------- |\n| [DBox](https://arxiv.org/abs/2103.01468 \"CVPR 2021 Paper\") | 1.73 | 2.45 | 2.54 | **11.17** | **4.47** |\n| [DBox<sub>Abs</sub>](https://arxiv.org/abs/2103.01468 \"CVPR 2021 Paper\") | 1.11 | **2.05** | **1.75** | 13.29 | 4.55 |\n| [Box<sub>LS</sub>](https://arxiv.org/abs/2103.01468 \"CVPR 2021 Paper\") | **0.00** | 4.47 | 21.60 | 21.23 | 11.83 |\n\nIs your published technique with public code missing from this table? Let us know and we'll add it.\n\n## Using DBox Method\n\n__Run__ ``./demo/demo_DBox_train.py`` to train your own DBox model using ODMD. <br />\n__Run__ ``./demo/demo_DBox_eval.py`` after training to evaluate your DBox model. <br />\nExample training and DBox model configurations are provided in the ``./config/`` folder.\nModels are saved in the ``./results/model/`` folder. \n``demo_DBox_eval.py`` also has instructions for using our pretrained DBox model. <br />\n[native Python, has Torch dependency]\n\n## Publication\nPlease cite our [paper](https://openaccess.thecvf.com/content/CVPR2021/html/Griffin_Depth_From_Camera_Motion_and_Object_Detection_CVPR_2021_paper.html \"Depth from Camera Motion and Object Detection pdf\") if you find it useful for your research.\n```\n@inproceedings{GrCoCVPR21,\n  author = {Griffin, Brent A. and Corso, Jason J.},\n  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},\n  title = {Depth from Camera Motion and Object Detection},\n  year = {2021}\n}\n```\n\n__CVPR 2021 presentation video:__ https://youtu.be/bgEGjt9NU9w\n\n[![CVPR 2021 presentation video](https://img.youtube.com/vi/bgEGjt9NU9w/0.jpg)](https://youtu.be/bgEGjt9NU9w)\n\n## Use\n\nThis code is available for non-commercial research purposes only.\n"
  },
  {
    "path": "config/camera/ODMS_camera.yaml",
    "content": "camera_name: Pinhole ODMS Camera\nimage_dim: [480,640]\nfx: 240.5\nfy: 240.5\ncx: 320.5\ncy: 240.5"
  },
  {
    "path": "config/camera/hsr_grasp_camera.yaml",
    "content": "camera_name: HSR Grasp Camera\nimage_dim: [480,640]\nfx: 205.5\nfy: 205.5\ncx: 320.5\ncy: 240.5"
  },
  {
    "path": "config/data_gen/ODMS.yaml",
    "content": "data_description: Z motion for ODMS\nz_lim: [0.55,1]\nmove_max: [0,0,0.4625]\nsize_lim: [0.01,0.175]\nmove_min: [0,0,0.05]\nnum_pos: 10\nperturb: True\nstd_dev: 0.001\ncam_dev: 0.01\nshuffle: 0.1\nend_swap: True\n"
  },
  {
    "path": "config/data_gen/standard_data.yaml",
    "content": "data_description: XYZ motion based on robot-collected data\nz_lim: [0.55,1]\nmove_max: [0.25,0.175,0.325]\nsize_lim: [0.01,0.175]\nmove_min: [0,0,0.05]\nnum_pos: 10\nperturb: True\nstd_dev: 0.001\ncam_dev: 0.01\nshuffle: 0.1\nend_swap: True\n"
  },
  {
    "path": "config/model/DBox.yaml",
    "content": "model_name: DBox\nparams:\n  hidden_sz: 128\n  input_obs: 1\n  fcs: [256,256,256,256,256,256]\n  fc_drop: 0\n  peep: True\n  all_loss: False\nprediction: normalized\nsequence_in: True\nsequence_dist: True"
  },
  {
    "path": "config/model/DBox_absolute.yaml",
    "content": "model_name: DBox\nparams:\n  hidden_sz: 128\n  input_obs: 1\n  fcs: [256,256,256,256,256,256]\n  fc_drop: 0\n  peep: True\n  all_loss: False\nprediction: absolute\nsequence_in: True\nsequence_dist: False"
  },
  {
    "path": "config/train/train_demo.yaml",
    "content": "batch_size: 512\ntrain_iter: 100000\nn_display: 1000\nn_train_model_saves: 10"
  },
  {
    "path": "dbox/__init__.py",
    "content": "from .dbox import *\nfrom .model_utils import *"
  },
  {
    "path": "dbox/dbox.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport IPython, math, os\n\n# LSTM design inspired by https://towardsdatascience.com/building-a-lstm-by-hand-on-pytorch-59c02a4ec091\n\ndef pass_arg_for_model(model_name, **kwargs):\n\tif model_name == \"DBox\":\n\t\tmodel = DBox(**kwargs)\n\treturn model\n\nclass DBox(nn.Module):\n\tdef __init__(self,input_obs,hidden_sz,n_obs,fcs=[1],fc_drop=0,peep=False,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tall_loss=False):\n\t\tsuper(DBox, self).__init__()\n\t\t# LSTM.\n\t\tinput_sz = input_obs*7\n\t\tself.idx = [[i+j for j in range(input_obs)] \n\t\t\t\t\t\t\t\t\t\t\tfor i in range(n_obs-input_obs+1)]\n\t\tself.n_in_obs = input_obs\n\t\tself.input_sz = input_sz\n\t\tself.hidden_sz = hidden_sz\n\t\tself.W = nn.Parameter(torch.Tensor(input_sz, hidden_sz * 4))\t\n\t\tself.U = nn.Parameter(torch.Tensor(hidden_sz, hidden_sz * 4))\n\t\tself.bias = nn.Parameter(torch.Tensor(hidden_sz * 4))\n\t\tself.peephole = peep\n\t\tself.all_loss = all_loss\n\t\tif self.all_loss: self.n_predict = len(self.idx)\n\t\telse: self.n_predict = 1\n\t\tif peep:\n\t\t\tself.V = nn.Parameter(torch.Tensor(hidden_sz, hidden_sz * 4))\t\n\t\tself.cat_sz = n_obs * 7\n\t\tself.init_weights()\n\t\t# Linear layers.\n\t\tself.n_fc = len(fcs)\n\t\tself.relu = nn.ReLU(inplace=True)\n\t\tself.linears = nn.ModuleList([nn.Linear(hidden_sz+self.cat_sz,fcs[0])])\n\t\tfor i in range(self.n_fc-1):\n\t\t\tself.linears.append(nn.Linear(fcs[i] + self.cat_sz, fcs[i+1]))\n\t\tself.linears.append(nn.Linear(fcs[-1],1))\n\t\t# Dropout.\n\t\tif fc_drop > 0:\n\t\t\tself.drop = True\n\t\t\tself.dropout = nn.Dropout(p=fc_drop)\n\t\telse: self.drop = False\n\t\tself.return_info = False\n\n\tdef init_weights(self):\n\t\tstdv = 1.0 / math.sqrt(self.hidden_sz)\n\t\tfor weight in self.parameters():\n\t\t\tweight.data.uniform_(-stdv, stdv)\n\n\tdef forward(self, x, init_state=None):\n\t\t\"\"\"Assumes x is of shape (batch, sequence, feature)\"\"\"\n\t\tbs, seq_sz, _ = x.size()\n\t\thidden_seq = []\n\t\tif init_state is None:\n\t\t\th_t, c_t = (torch.zeros(bs, self.hidden_sz).to(x.device),\n\t\t\t\t\t\ttorch.zeros(bs, self.hidden_sz).to(x.device))\n\t\telse:\n\t\t\th_t, c_t = init_state\n\n\t\tHS = self.hidden_sz\n\t\t#for t in range(seq_sz):\n\t\t#\tx_t = x[:, t, :]\n\t\tfor idx in self.idx:\n\t\t\tx_t = x[:, idx, :].reshape(bs, 1, self.input_sz)[:,0,:]\n\t\t\t# Batch computations into single matrix multiplication.\n\t\t\tif self.peephole:\n\t\t\t\tgates = x_t @ self.W + h_t @ self.U + self.bias + c_t @ self.V\n\t\t\telse:\n\t\t\t\tgates = x_t @ self.W + h_t @ self.U + self.bias\n\t\t\t\to_t = torch.sigmoid(gates[:, HS*3:]) # output\n\t\t\ti_t, f_t, g_t = (\n\t\t\t\ttorch.sigmoid(gates[:, :HS]), # input\n\t\t\t\ttorch.sigmoid(gates[:, HS:HS*2]), # forget\n\t\t\t\ttorch.tanh(gates[:, HS*2:HS*3]),\n\t\t\t)\n\t\t\tc_t = f_t * c_t + i_t * g_t\n\t\t\tif self.peephole:\n\t\t\t\tgates = x_t @ self.W + h_t @ self.U + self.bias + c_t @ self.V\n\t\t\t\to_t = torch.sigmoid(gates[:, HS*3:]) # output\n\t\t\th_t = o_t * torch.tanh(c_t)\n\t\t\tif self.return_info or self.all_loss:\n\t\t\t\thidden_seq.append(h_t.unsqueeze(0))\n\n\t\t# Linear layers.\n\t\tx_cat = x.reshape(1, bs, self.cat_sz)\n\t\td_set = torch.zeros(bs, self.n_predict)\n\t\tif self.all_loss: h_t_set = hidden_seq\n\t\telse: h_t_set = [h_t.unsqueeze(0)]\n\t\tfor j, d in enumerate(h_t_set):\n\t\t\tfor i in range(self.n_fc):\n\t\t\t\td = torch.cat((d, x_cat), 2)\n\t\t\t\td = self.linears[i](d)\n\t\t\t\td = self.relu(d)\n\t\t\t\tif 
self.drop:\n\t\t\t\t\td = self.dropout(d)\n\t\t\td = self.linears[-1](d)\n\t\t\td_set[:,j] = d.reshape(-1)\n\n\t\tif self.return_info:\n\t\t\thidden_seq = torch.cat(hidden_seq, dim=0)\n\t\t\t# Reshape from (sequence, batch, feature) to (batch, seq, feature)\n\t\t\thidden_seq = hidden_seq.transpose(0, 1).contiguous()\n\t\t\treturn d_set, hidden_seq, (h_t, c_t)\n\t\telse:\n\t\t\treturn d_set\n"
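\n# Minimal usage sketch (illustrative; not used by the demos): build a small\n# DBox and run a forward pass on random data shaped like the ODMD network\n# input, (batch, n_obs, 7) = 4 bounding box values + 3 camera-movement values.\nif __name__ == \"__main__\":\n\tnet = DBox(input_obs=1, hidden_sz=128, n_obs=10, fcs=[256, 256])\n\tx = torch.rand(4, 10, 7)\n\tprint(net(x).shape)  # torch.Size([4, 1])\n"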
  },
  {
    "path": "dbox/model_utils.py",
    "content": "import torch, os, IPython, numpy as np, yaml\nfrom copy import deepcopy\n\nfrom .dbox import *\n\ndef count_parameters(model):\n\treturn sum(p.numel() for p in model.parameters() if p.requires_grad)\n\ndef load_model(model_config, n_pos=10):\n\tm_param = yaml.full_load(open(model_config))\n\tm_param[\"params\"][\"n_obs\"] = n_pos\n\t# Take model configuration parameters and set up network architecture.\n\tnet = pass_arg_for_model(m_param[\"model_name\"], **m_param[\"params\"])\n\t# Randomly select between both GPUs (distribute training).\n\tgpus = [\"cuda:0\", \"cuda:1\"]\n\tidx = np.random.randint(0,2)\n\ttry:\n\t\tdevice = torch.device(gpus[idx])\n\t\tnet.to(device)\n\texcept:\n\t\ttry:\n\t\t\tdevice = torch.device(gpus[idx-1])\n\t\t\tnet.to(device)\n\t\texcept:\n\t\t\tdevice = torch.device(\"cpu\")\n\t\t\tnet.to(device)\n\tm_param[\"n_predict\"] = net.n_predict\n\treturn net, device, m_param\n\ndef load_training_params(train_config):\n\ttrain = yaml.full_load(open(train_config))\n\ttrain[\"display_iter\"] = int(train[\"train_iter\"] / train[\"n_display\"])\n\ttrain[\"save_iter\"] = np.linspace(0, train[\"train_iter\"], \n\t\t\t\t\t\t\t\ttrain[\"n_train_model_saves\"] + 1).astype(int)\n\treturn train\n\ndef load_weights(net, weight_file):\n\t# Load weights onto already initialized network model.\n\ttry:\n\t\tnet.load_state_dict(torch.load(weight_file))\n\texcept:\n\t\tnet.load_state_dict(torch.load(weight_file, map_location=lambda \n\t\t\t\t\t\t\t\t\t\t\t\t\t\tstorage, loc: storage))\n\treturn net\n\nclass BoundingBoxToNetwork:\n\t\"\"\"\n\tBoudingBoxToNetwork converts bounding boxes to network input.\n\t\"\"\"\n\n\tdef __init__(self, m_params, n_bat=1):\n\t\t\"\"\"\n\t\tArgs:\n\t\t\tn_obs (int): Number of bounding box and movement inputs to network.\n\t\t\tn_bat (int): Number of input sets batched together.\n\t\t\"\"\"\n\t\tself.n_obs = m_params[\"params\"][\"n_obs\"]\n\t\tself.bb_in = self.n_obs * 4\n\t\t#self.cam_in = (self.n_obs - 1) * 3\n\t\tself.cam_in = self.n_obs * 3\n\t\tself.in_sz = self.bb_in + self.cam_in\n\t\tself.prediction = m_params[\"prediction\"]\n\t\tself.sequence = m_params[\"sequence_in\"]\n\t\tif self.sequence:\n\t\t\tself.seq_dist = m_params[\"sequence_dist\"]\n\t\tself.n_predict = m_params[\"n_predict\"]\n\t\tself.set_batch(n_bat)\n\n\tdef set_batch(self, n_bat):\n\t\t# Change network input batch size.\n\t\tself.n_bat = n_bat\n\t\tself.tmp_in = np.zeros(shape=(n_bat, self.in_sz), dtype=\"float32\")\n\t\tself.labels = torch.zeros(n_bat, self.n_predict, dtype=torch.float)\n\t\tif self.sequence:\n\t\t\tself.inputs =torch.zeros(self.n_bat,self.n_obs,7,dtype=torch.float)\n\t\t\tself.tmp2=np.zeros(shape=(self.n_bat,self.n_obs,7),dtype=\"float32\")\n\t\telse:\n\t\t\tself.inputs = torch.zeros(self.n_bat,self.in_sz,dtype=torch.float)\n\n\tdef bb_to_labels(self, bb_3D, bb):\n\t\t# Convert bounding box data to network input and labels.\n\t\tidx = np.linspace(0, bb_3D[\"n_positions\"] - 1, self.n_obs, dtype=int)\n\t\tcam_idx = np.linspace(0,bb_3D[\"n_positions\"]-2,self.n_obs-1,dtype=int)\n\t\tbb = missing_detection_network_filter(bb, idx)\n\t\tself.tmp_in[:,:self.bb_in] = bb[\"bboxes\"][idx,:-1].reshape(\n\t\t\t\t\t\t\t\t\t\t\t\t\tself.bb_in, self.n_bat).T\n\t\tself.tmp_in[:,-self.cam_in:-3] = bb_3D[\"camera_movement\"][\n\t\t\t\t\t\t\t\tcam_idx].reshape(self.cam_in-3, self.n_bat).T\n\n\t\t\"\"\"\n\t\t# Temp change for recovering results from old networks.\n\t\tdim = bb_3D[\"camera_movement\"].shape\n\t\ttemp = 
np.zeros(shape=(dim[0]+1,dim[1],dim[2]))\n\t\ttemp[:-1] = bb_3D[\"camera_movement\"]\n\t\ttemp -= temp[0]\n\t\tbb_3D[\"camera_movement\"] = temp[1:]\n\t\tself.tmp_in[:,-self.cam_in:] = np.array(bb_3D[\"camera_movement\"])[\n\t\t\t\t\t\t\t\tidx[1:]-1].reshape(self.cam_in, self.n_bat).T\n\t\t\"\"\"\n\n\t\t# Convert input and labels for prediction type.\n\t\tself.labels[:] = torch.from_numpy(bb[\"bboxes\"][-self.n_predict:,-1].T)\n\t\t#self.labels[:] = torch.from_numpy(bb[\"bboxes\"][-1][-1])\n\t\tif self.prediction == \"normalized\":\n\t\t\tself.norm = np.linalg.norm(self.tmp_in[:,\n\t\t\t\t\t\t\t\t\t\t-self.cam_in:-self.cam_in+3], axis=1)\n\t\t\tself.tmp_in[:,-self.cam_in:] = np.multiply(\n\t\t\t\t\tself.tmp_in[:,-self.cam_in:],(1/self.norm)[:, np.newaxis])\n\t\t\tself.labels[:] /= np.tile(self.norm, (self.n_predict,1)).T\n\t\t\t#self.labels[:] /=  self.norm\n\t\t\n\t\t# Sequence separates each observation for LSTM input.\n\t\tif self.sequence:\n\t\t\tself.tmp2[:,:,:4] = self.tmp_in[:,:self.bb_in].reshape(\n\t\t\t\t\t\t\t\t\t\t\t\t\tself.n_bat,self.n_obs,4)\n\t\t\tself.tmp2[:,:,-3:] = self.tmp_in[:,-self.cam_in:].reshape(\n\t\t\t\t\t\t\t\t\t\t\t\tself.n_bat, self.n_obs, 3)\n\t\t\t# Sequential has each movement relative to previous observation.\n\t\t\tif self.seq_dist:\t\n\t\t\t\tself.tmp2[:,1:,-3:] = np.diff(self.tmp2[:,:,-3:], axis=1)\n\t\t\t\tself.tmp2[:,0,-3:] = 0\n\t\t\tself.inputs[:] = torch.from_numpy(self.tmp2)\n\t\telse:\n\t\t\tself.inputs[:] = torch.from_numpy(self.tmp_in)\n\ndef missing_detection_network_filter(bb, idx):\n\tmiss_idx = np.argwhere(bb[\"bboxes\"][idx,2]==0)\n\tn_miss = len(miss_idx)\n\tif n_miss > 0:\n\t\tprint(\"\\nMissing %i observations! Using nearest valid.\\n\" % n_miss)\n\t\tbb_init = deepcopy(bb[\"bboxes\"])\n\t\tfor idx in miss_idx:\n\t\t\t# Replace misssing detection with closest valid observation.\n\t\t\tobs_idx = np.argwhere((bb_init[:,2,idx[1]]==0)==False)\n\t\t\ttry:\n\t\t\t\treplace_i = obs_idx[np.argmin(abs(obs_idx-idx[0]))][0]\n\t\t\texcept:\n\t\t\t\tprint(\"No replacement!\")\n\t\t\t\treplace_i = 0\n\t\t\tbb[\"bboxes\"][idx[0],:-1,idx[1]] =bb[\"bboxes\"][replace_i,:-1,idx[1]]\n\treturn bb\n"
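\n# Minimal usage sketch (illustrative; the dict below is a hypothetical\n# stand-in for a loaded model config) showing the network input layout.\nif __name__ == \"__main__\":\n\tm_params = {\"params\": {\"n_obs\": 10}, \"prediction\": \"normalized\",\n\t\t\t\t\"sequence_in\": True, \"sequence_dist\": True, \"n_predict\": 1}\n\tbb2net = BoundingBoxToNetwork(m_params, n_bat=2)\n\tprint(bb2net.inputs.shape)  # torch.Size([2, 10, 7]): 4 bbox + 3 movement\n"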
  },
  {
    "path": "demo/demo_DBox_eval.py",
    "content": "\"\"\"\nDemonstration of how to evaluate DBox network on ODMD.\n\"\"\"\n\nimport sys, os, IPython, torch, _pickle as pickle, numpy as np\nfile_dir = os.path.dirname(os.path.abspath(__file__))\nos.chdir(file_dir)\nsys.path.insert(0,\"../\")\nimport odmd, dbox\n\nnet_name = \"DBox_demo\"\n#net_name = \"DBox_pretrained\" # Uncomment to run DBox model from paper.\nmodel_idx = -1 # Can cycle through indices to find best validation performance.\n\n# Select dataset to evaluate.\ndataset = \"odmd\" # or \"odms_detection\" for ODMS dataset converted to detection.\neval_set = \"val\" # or \"test\" once model training and development are complete.\n\n# Select configuration (more example settings from paper in config directory).\ndatagen_config = \"../config/data_gen/standard_data.yaml\" \ncamera_config = \"../config/camera/hsr_grasp_camera.yaml\" \ntrain_config = \"../config/train/train_demo.yaml\"\ndbox_config = \"../config/model/DBox.yaml\"\n\n# Initiate data generator, model, data loader, and load weights.\nodmd_data = odmd.data_gen.DataGenerator(datagen_config)\nodmd_data.initialize_data_gen(camera_config)\nnet, device, m_params = dbox.load_model(dbox_config, odmd_data.num_pos)\nbb2net = dbox.BoundingBoxToNetwork(m_params)\nmodel_dir = os.path.join(\"../results\", \"model\", net_name)\nmodel_list = sorted([pt for pt in os.listdir(model_dir) if pt.endswith(\".pt\")])\nnet = dbox.load_weights(net, os.path.join(model_dir, model_list[model_idx]))\n\n# Initiate dataset information.\nset_dir = os.path.join(\"../data\", dataset, eval_set)\nset_list = sorted([pk for pk in os.listdir(set_dir) if pk.endswith(\".pk\")])\npercent_error=[]; abs_error=[]; predictions_all=[]\n\nwith torch.no_grad():\n\tfor test in set_list:\n\t\t# Load data for specific set.\n\t\tbb_data = pickle.load(open(os.path.join(set_dir, test), \"rb\"))\n\t\tbb_3D, bb = bb_data[\"bb_3D\"], bb_data[\"bb\"]\n\n\t\t# Run DBox with correct post-processing for configuration.\n\t\tbb2net.set_batch(bb_3D[\"n_ex\"])\n\t\tbb2net.bb_to_labels(bb_3D, bb)\n\t\tinputs = bb2net.inputs.to(device)\n\t\tpredictions = net(inputs).cpu().numpy()\n\t\tif bb2net.prediction == \"normalized\":\n\t\t\tpredictions[:,0] *= bb2net.norm\n\t\tdepths = bb[\"bboxes\"][-1][-1]\n\n\t\tpercent_error.append(np.mean( abs(predictions[:,0] - depths) / depths))\n\t\tabs_error.append(np.mean(abs(predictions[:,0] - depths)))\n\t\tpredictions_all.append(predictions)\n\n\t# Print out final results.\n\tprint(\"\\nResults summary for ODMD %s sets.\" % eval_set)\n\tfor i, test_set in enumerate(set_list):\n\t\tprint(\"\\n%s-%s:\" % (test_set, eval_set))\n\t\tprint(\"Mean Percent  Error: %.4f\" % percent_error[i]) \n\t\tprint(\"Mean Absolute Error: %.4f (m)\" % abs_error[i]) \n\n# Generate final results file.\nname = model_list[model_idx].split(\".pt\")[0]\ndata_name = \"%s_%s\" % (dataset, eval_set)\nprint(\"\\nSaving %s results file for %s.\\n\" % (data_name, name))\nresult_data = {\"Result Name\": name, \"Set List\": set_list, \n\t\t\t\t\"Percent Error\": percent_error, \"Absolute Error\": abs_error, \n\t\t\t\t\"Depth Estimates\": predictions_all, \"Dataset\": data_name}\nos.makedirs(\"../results/\", exist_ok=True)\npickle.dump(result_data, open(\"../results/%s_%s.pk\" % (name, data_name), \"wb\"))\n"
  },
  {
    "path": "demo/demo_DBox_train.py",
    "content": "\"\"\"\nDemonstration of how to generate train DBox network on ODMD.\n\"\"\"\n\nimport sys, os, IPython, torch\nfile_dir = os.path.dirname(os.path.abspath(__file__))\nos.chdir(file_dir)\nsys.path.insert(0,\"../\")\nimport odmd, dbox\n\nnet_name = \"DBox_demo\"\n\n# Select configuration (more example settings from paper in config directory).\ndatagen_config = \"../config/data_gen/standard_data.yaml\" \ncamera_config = \"../config/camera/hsr_grasp_camera.yaml\" \ntrain_config = \"../config/train/train_demo.yaml\"\ndbox_config = \"../config/model/DBox.yaml\"\n\n# Initiate data generator, model, training parameters, and data loader.\nodmd_data = odmd.data_gen.DataGenerator(datagen_config)\nodmd_data.initialize_data_gen(camera_config)\nnet, device, m_params = dbox.load_model(dbox_config, odmd_data.num_pos)\ntrain = dbox.load_training_params(train_config)\nbb2net = dbox.BoundingBoxToNetwork(m_params, train[\"batch_size\"])\n\n# Initiate training!\nmodel_dir = os.path.join(\"../results\", \"model\", net_name)\nos.makedirs(model_dir, exist_ok=True)\ncriterion = torch.nn.L1Loss()\noptimizer = torch.optim.Adam(net.parameters(), lr=0.001, weight_decay=0)\n\nrunning_loss=0.0; ct=0\nprint(\"Starting training for %s.\" % net_name)\nwhile ct < train[\"train_iter\"]:\n\n\t# Generate examples for ODMD training (repeat for each training iteration).\n\tbb_3D, bb = odmd_data.generate_object_examples(bb2net.n_bat)\n\tbb_3D, bb = odmd.data_gen.add_perturbations(bb_3D, bb, odmd_data)\n\n\t# Network inputs and labels, forward pass, loss, and gradient.\n\tbb2net.bb_to_labels(bb_3D, bb)\n\tinputs, labels = bb2net.inputs.to(device), bb2net.labels.to(device)\n\toutputs = net(inputs).to(device)\n\tloss = criterion(outputs, labels)\n\toptimizer.zero_grad()\n\tloss.backward()\n\toptimizer.step()\n\trunning_loss += loss.item()\n\n\t# Print progress details and save model at set interval.\n\tct += 1\n\tif ct % train[\"display_iter\"] == 0:\n\t\tcur_loss = running_loss / train[\"display_iter\"]\n\t\tprint(\"[%9d] loss: %.6f\" % (ct, cur_loss))\n\t\trunning_loss = 0.0\n\tif ct in train[\"save_iter\"]:\n\t\ttorch.save(net.state_dict(), \"%s/%s_%09d.pt\" % (model_dir,net_name,ct))\n\t\tprint(\"[%9d] interval model saved.\" % ct)\n"
  },
  {
    "path": "demo/demo_datagen.py",
    "content": "\"\"\"\nDemonstration of how to generate new training data on ODMD.\n\"\"\"\n\nimport sys, os, IPython, _pickle as pickle\nfile_dir = os.path.dirname(os.path.abspath(__file__))\nos.chdir(file_dir)\nsys.path.insert(0,\"../\")\nimport odmd\n\n# Select configuration (more example settings from paper in config directory).\ndatagen_config = \"../config/data_gen/standard_data.yaml\" \ncamera_config = \"../config/camera/hsr_grasp_camera.yaml\" \n\n# Other data generation settings.\nn_examples = 20 # Configure for batch size if training.\nsave_examples = False\nset_name = \"example_data_gen\"\n\n# Initiate data generator.\nodmd_data = odmd.data_gen.DataGenerator(datagen_config)\nodmd_data.initialize_data_gen(camera_config)\n\n# Generate examples for ODMD training (repeat for each training iteration).\nbb_3D, bb = odmd_data.generate_object_examples(n_examples)\nbb_3D, bb = odmd.data_gen.add_perturbations(bb_3D, bb, odmd_data)\nbboxes, camera_movements, depths = odmd.data_gen.bb_to_inputs(bb_3D, bb, \n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\todmd_data.num_pos)\n\n\"\"\"\nUse generated data to train your own network to predict depths given bboxes \nand camera_movements. See paper for ideas on possible initial configurations.\n\"\"\"\n\n# Save generated examples as a static dataset (optional).\nif save_examples:\n\tresult_dir = \"../data/example_generated_data\" \n\tos.makedirs(result_dir, exist_ok=True)\n\tp_data = {\"test_name\": set_name, \"bb_3D\": bb_3D, \"bb\": bb}\n\tpickle.dump(p_data, open(os.path.join(result_dir, \"%s.pk\" % set_name), 'wb'))\n"
  },
  {
    "path": "demo/demo_dataset_eval.py",
    "content": "\"\"\"\nDemonstration of how to evaluate a depth estimation model on ODMD.\n\"\"\"\n\nimport sys, os, IPython, numpy as np, _pickle as pickle\nfile_dir = os.path.dirname(os.path.abspath(__file__))\nos.chdir(file_dir)\nsys.path.insert(0,\"../\")\nimport odmd\n\n# Select dataset to evaluate.\ndataset = \"odmd\" # or \"odms_detection\" for ODMS dataset converted to detection.\neval_set = \"val\" # or \"test\" once model training and development are complete.\n\n# Misc. initialization.\nn_observations = 10\nset_dir = os.path.join(\"../data\", dataset, eval_set)\nset_list = sorted([pk for pk in os.listdir(set_dir) if pk.endswith(\".pk\")])\npercent_error=[]; abs_error=[]; predictions_all=[]\n\nfor test in set_list:\n\t# Load data for specific set.\n\tbb_data = pickle.load(open(os.path.join(set_dir, test), \"rb\"))\n\tbb_3D, bb = bb_data[\"bb_3D\"], bb_data[\"bb\"]\n\tbboxes, camera_movements, depths = odmd.data_gen.bb_to_inputs(bb_3D, bb,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tn_observations)\n\n\t\"\"\"\n\tUse your own depth estimation model here (to replace Box_LS):\n\t\"\"\"\n\tpredictions = odmd.closed_form.Box_LS(bboxes, camera_movements)\n\n\tpercent_error.append(np.mean( abs(predictions - depths) / depths))\n\tabs_error.append(np.mean(abs(predictions - depths)))\n\tpredictions_all.append(depths)\n\n# Print out final results.\nprint(\"\\nResults summary for ODMD %s sets.\" % eval_set)\nfor i, test_set in enumerate(set_list):\n\tprint(\"\\n%s-%s:\" % (test_set, eval_set))\n\tprint(\"Mean Percent  Error: %.4f\" % percent_error[i]) \n\tprint(\"Mean Absolute Error: %.4f (m)\" % abs_error[i]) \n\n# Generate final results file.\nname = \"Box_LS\"\ndata_name = \"%s_%s\" % (dataset, eval_set)\nresult_data = {\"Result Name\": name, \"Set List\": set_list, \n\t\t\t\t\"Percent Error\": percent_error, \"Absolute Error\": abs_error, \n\t\t\t\t\"Depth Estimates\": predictions_all, \"Dataset\": data_name}\nos.makedirs(\"../results/\", exist_ok=True)\npickle.dump(result_data, open(\"../results/%s_%s.pk\" % (name, data_name), \"wb\"))\n"
  },
  {
    "path": "odmd/__init__.py",
    "content": "import odmd.data_gen\nimport odmd.closed_form"
  },
  {
    "path": "odmd/closed_form/__init__.py",
    "content": "from .closed_form import *"
  },
  {
    "path": "odmd/closed_form/closed_form.py",
    "content": "# File: analytic_model.py\n\nimport os, IPython, numpy as np, yaml\nfrom copy import deepcopy\n\ndef Box_LS(input_bb, camera_move, n_obs=10):\n\tinput_bb = missing_detection_filter(input_bb)\n\t# Find Ax = b least-squares solution (see equation sheet for details).\n\tn_examples = len(input_bb[0,0])\n\tpredictions = np.zeros(n_examples)\n\tA = np.zeros(shape=(2*n_obs, 3))\n\tA[:n_obs,1]=1; A[n_obs:,2] = 1\n\tb = np.zeros(2*n_obs)\n\tz = np.zeros(n_obs)\n\tfor i in range(n_examples):\n\t\tw = input_bb[:,2,i]\n\t\th = input_bb[:,3,i]\n\t\tz[:-1] = camera_move[:,2,i]\n\t\tb = np.concatenate((w*z, h*z))\n\t\tA[:,0] = np.concatenate((w, h))\n\t\ttry:\n\t\t\tx = np.matmul(np.matmul(np.linalg.inv(np.matmul(A.T,A)),A.T),b)\n\t\texcept:\n\t\t\tprint(\"Warning! Matrix A.T A is not invertable! x is not solved.\")\n\t\t\tx = np.zeros(3)\n\t\tpredictions[i] = x[0]\n\treturn predictions\n\ndef missing_detection_filter(input_bb):\n\tmiss_idx = np.argwhere(input_bb[:,2,:] == 0) \n\tn_miss = len(miss_idx)\n\tif n_miss > 0:\n\t\tprint(\"\\nMissing %i observations! Using nearest valid.\\n\" % n_miss)\n\t\tbb_init = deepcopy(input_bb)\n\t\tfor idx in miss_idx:\n\t\t\t# Replace missing detection with closest valid obsesrvation.\n\t\t\tobs_idx = np.argwhere((bb_init[:,2,idx[1]]==0)==False)\n\t\t\ttry:\n\t\t\t\treplace_idx = obs_idx[np.argmin(abs(obs_idx-idx[0]))][0]\n\t\t\texcept:\n\t\t\t\tprint(\"No observation at all!\")\n\t\t\t\treplace_idx = 0\n\t\t\tinput_bb[idx[0],:,idx[1]] = input_bb[replace_idx,:,idx[1]]\n\treturn input_bb\n\t\ndef bb_m_parallax(input_bb, camera_move):\n\t# NOTE: Be sure that input number of observations is two!\n\tin_bb = missing_detection_filter(input_bb)\n\t# Using deltaZ neq 0 method from derivation.\n\t# Load camera parameters. Normalize cx, cy, fx, fy by image size.\n\tcam_file = \"../config/camera/hsr_grasp_camera.yaml\"\n\tparams = yaml.full_load(open(cam_file))\n\tdim = params[\"image_dim\"]\n\tcx, cy = params[\"cx\"]/dim[1], params[\"cy\"]/dim[0]\n\tfx, fy = params[\"fx\"]/dim[1], params[\"fy\"]/dim[0]\n\t# Find x0, xf, y0, yf, w0/wf, h0/hf, and Delta X, Y from two observations. \n\tdX, dY = camera_move[0, 0, :], camera_move[0, 1, :]\n\tx0, xf, y0, yf = in_bb[0,0,:], in_bb[1,0,:], in_bb[0,1,:], in_bb[1,1,:]\n\tw0, wf, h0, hf = in_bb[0,2,:], in_bb[1,2,:], in_bb[0,3,:], in_bb[1,3,:]\n\t# Solve for depth using x and y motion parallax then average result.\n\tdepth_mpx = dX*fx / ((xf-cx) - (x0-cx)*(wf/w0))\n\tdepth_mpy = dY*fy / ((yf-cy) - (y0-cy)*(hf/h0))\n\t# Replace Inf. values with one.\n\treplace_idx = np.argwhere(np.isfinite(depth_mpx)==False)\n\tfor idx in replace_idx:\n\t\tdepth_mpx[idx] = 1\n\t\tdepth_mpy[idx] = 1\n\t\"\"\"\n\t# No deltaZ version:\n\tdepth_mpx = dX*fx / (xf - x0)\n\tdepth_mpy = dY*fy / (yf - y0)\n\t\"\"\"\n\tpredictions = (depth_mpx + depth_mpy) / 2\n\treturn predictions\n\ndef bb_opt_expansion(input_bb, camera_move):\n\t# NOTE: Be sure that input number of observations is two!\n\tin_bb = missing_detection_filter(input_bb)\n\t# Load camera parameters. Normalize cx, cy, fx, fy by image size.\n\tcam_file = \"../config/camera/hsr_grasp_camera.yaml\"\n\tparams = yaml.full_load(open(cam_file))\n\tdim = params[\"image_dim\"]\n\tcx, cy = params[\"cx\"]/dim[1], params[\"cy\"]/dim[0]\n\tfx, fy = params[\"fx\"]/dim[1], params[\"fy\"]/dim[0]\n\t# Find x0, xf, y0, yf, w0/wf, h0/hf, and Delta X, Y from two observations. 
\n\tdZ = camera_move[0, 2, :]\n\tx0, xf, y0, yf = in_bb[0,0,:], in_bb[1,0,:], in_bb[0,1,:], in_bb[1,1,:]\n\tw0, wf, h0, hf = in_bb[0,2,:], in_bb[1,2,:], in_bb[0,3,:], in_bb[1,3,:]\n\t# Solve for depth using w and h optical expansion then average result.\n\tdepth_oew = dZ / (1 - (wf/w0))\n\tdepth_oeh = dZ / (1 - (hf/h0))\n\t# Replace Inf. values with one.\n\treplace_idx = np.argwhere(np.isfinite(depth_oew)==False)\n\tfor idx in replace_idx:\n\t\tdepth_oew[idx] = 1\n\t\tdepth_oeh[idx] = 1\n\tpredictions = (depth_oew + depth_oeh) / 2\n\treturn predictions\n\n"
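\n# Minimal self-check sketch (illustrative, with synthetic values): a\n# 0.1 m x 0.1 m object whose depth shrinks from 1.0 m to 0.55 m as the camera\n# advances in Z; Box_LS should recover the final object depth of 0.55 m.\nif __name__ == \"__main__\":\n\tn_obs, fx_norm, fy_norm = 10, 240.5/640., 240.5/480.\n\tZ = np.linspace(1.0, 0.55, n_obs)         # object depth at each position\n\tinput_bb = np.zeros(shape=(n_obs, 4, 1))  # [xc, yc, w, h] per position\n\tinput_bb[:,2,0] = fx_norm*0.1 / Z         # normalized box width\n\tinput_bb[:,3,0] = fy_norm*0.1 / Z         # normalized box height\n\tcamera_move = np.zeros(shape=(n_obs-1, 3, 1))\n\tcamera_move[:,2,0] = (Z[-1] - Z)[:-1]     # Z move from each pose to the last\n\tprint(Box_LS(input_bb, camera_move))      # ~[0.55]\n"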
  },
  {
    "path": "odmd/data_gen/__init__.py",
    "content": "from .data_gen import *\nfrom .data_gen_util import *"
  },
  {
    "path": "odmd/data_gen/data_gen.py",
    "content": "# File: data_gen.py\n\nimport IPython, os, numpy as np, yaml\n\nclass DataGenerator:\n\t\"\"\"\n\tDataGenerator generates random data for depth boxes.\n\t\"\"\"\n\n\tdef __init__(self, data_params):\n\t\t\"\"\"\n\t\tArgs:\n\t\t\tdata_params (dict): dictionary with the following keys and values:\n\t\t\t\tz_lim ([float,float]): minimum and maximum object start depth.\n\t\t\t\tmove_max (list[float]): maximum position change for X, Y, Z\n\t\t\t\tsize_range (list[[float,float]]): minimum and maximum width and\n\t\t\t\t\theight of object in world coordinates.\n\t\t\t\tnum_pos (int): number of positions that object is viewed from.\n\t\t\"\"\"\n\t\tself.set_yaml_params(data_params)\n\t\tself.move_range = np.array(self.move_max) - self.move_min\n\t\tself.z_range = self.z_lim[1] - self.z_lim[0]\n\t\tself.size_range = self.size_lim[1] - self.size_lim[0]\n\n\tdef set_yaml_params(self, yaml_file):\n\t\tself.set_params(yaml.full_load(open(yaml_file)))\n\n\tdef set_params(self, params):\n\t\tfor _, key in enumerate(params.keys()):\n\t\t\tsetattr(self, key, params[key])\n\n\tdef initialize_data_gen(self, camera_config):\n\t\tself.set_yaml_params(camera_config)\n\t\t# See derivation 200520 page 33.\n\t\tcfx = self.cx / self.fx; self.xmina = -cfx\n\t\tself.xminb = self.move_max[0] +self.size_lim[1]/2 +self.move_max[2]*cfx \n\t\twfx = (self.image_dim[1] - self.cx) / self.fx; self.xmaxa = wfx\n\t\tself.xmaxb = -self.move_max[0] -self.size_lim[1]/2-self.move_max[2]*wfx\n\t\tcfy = self.cy / self.fy; self.ymina = -cfy\n\t\tself.yminb = self.move_max[1] +self.size_lim[1]/2 +self.move_max[2]*cfy \n\t\twfy = (self.image_dim[0] - self.cy) / self.fy; self.ymaxa = wfy\n\t\tself.ymaxb = -self.move_max[1] -self.size_lim[1]/2-self.move_max[2]*wfy\n\t\tself.xa = self.xmaxa - self.xmina; self.xb = self.xmaxb - self.xminb\n\t\tself.ya = self.ymaxa - self.ymina; self.yb = self.ymaxb - self.yminb\n\t\tself.fx_norm = self.fx / self.image_dim[1] \n\t\tself.cx_norm = self.cx / self.image_dim[1]\n\t\tself.fy_norm = self.fy / self.image_dim[0]\n\t\tself.cy_norm = self.cy / self.image_dim[0]\n\t\t# Check that bounds are valid.\n\t\tif (self.z_lim[0] * self.xmina + self.xminb > 0) or (self.z_lim[0] *\n\t\t\t\t\t\t\t\t\t\t\t\tself.ymina + self.yminb > 0):\n\t\t\tprint(\"\\n\\nData gen. bounds are not valid! 
Try increasing z.\\n\\n\")\n\t\t\tIPython.embed()\t\n\n\tdef generate_object_examples(self, n_ex):\n\t\t# Find random initial and final object positions, then intermediate.\n\t\trdm = np.random.rand(n_ex * 8)\n\t\tsign = np.random.randint(0,2, size=n_ex*3)*2 - 1\n\t\tp = np.zeros(shape=(2,3,n_ex))\n\t\t# Find initial Z position, then X(Z) and Y(Z) within the field of view.\n\t\tp[0][2] = self.z_lim[0] + self.z_range * rdm[:n_ex]\n\t\tp[0][0] = p[0][2]*self.xmina + self.xminb + \\\n\t\t\t\t\trdm[n_ex:n_ex*2]*( p[0][2]*self.xa + self.xb )\n\t\tp[0][1] = p[0][2]*self.ymina + self.yminb + \\\n\t\t\t\t\trdm[n_ex*2:n_ex*3]*( p[0][2]*self.ya + self.yb )\n\t\tfor i in range(3):\n\t\t\tp[1][i] = p[0][i] + sign[n_ex*i:n_ex*(i+1)] * (self.move_min[i] + \\\n\t\t\t\t\t\t\t\tself.move_range[i] *rdm[n_ex*(i+3):n_ex*(i+4)])\n\t\tif self.end_swap:\n\t\t\t# For greater sample diversity, switch start / end points randomly.\n\t\t\tswap = np.argwhere(np.random.randint(0,2, size=n_ex))\n\t\t\tp[[1,0],:,swap] = p[[0,1],:,swap]\n\t\tif self.num_pos>2: p = self.add_intermediate_positions(p)\n\t\t# Determine camera movement for each position.\n\t\t# Note: camera movement is opposite (-1) of object movement p.\n\t\tmovement = (p[-1] - p)[:-1]\n\t\t# Find random height and width of objects.\n\t\ts = [self.size_lim[0] + self.size_range * rdm[n_ex*(i+6):n_ex*(i+7)] \n\t\t\t\tfor i in range(2)]\n\t\tobj_examples = {\"positions\": p, \"camera_movement\": np.array(movement), \n\t\t\t\t\t\t\"sizes\": s, \"n_ex\": n_ex, \"n_positions\": self.num_pos}\n\t\tbb = self.find_image_bb_from_objects(obj_examples)\n\t\treturn obj_examples, bb\n\n\tdef add_intermediate_positions(self, p_init):\n\t\t# Add intermediate object positions between initial and final pose.\n\t\tn_ex = len(p_init[0][0])\n\t\trdm = np.random.rand(n_ex * 3 * (self.num_pos - 2))\n\t\tdp = p_init[1] - p_init[0]\n\t\tp = np.zeros(shape=(self.num_pos-2,3,n_ex))\n\t\tfor i in range(3):\n\t\t\tfor j in range(0,self.num_pos-2):\n\t\t\t\tp[j][i] = p_init[0][i] + dp[i] \\\n\t\t\t\t\t\t\t\t\t\t\t* rdm[n_ex*(i+j*3):n_ex*(i+1+j*3)]\t\n\t\t# Sort intermediate points to be monotonicly increasing or decreasing.\n\t\tp.sort(axis=0)\n\t\tdescend = dp<0\n\t\tp[:,descend] = p[::-1][:,descend]\n\t\tp_out = np.zeros(shape=(self.num_pos,3,n_ex))\n\t\tp_out [[0,-1]] = [p_init[0], p_init[-1]]\n\t\tp_out[1:-1] = p\n\t\treturn p_out\n\n\tdef find_image_bb_from_objects(self, objects):\n\t\t# Find image bounding boxes for each object and position.\n\t\ts = objects['sizes']\n\t\tbboxes = [[] for i in range(objects['n_positions'])]\n\t\tfor i, p in enumerate(objects['positions']):\n\t\t\txc = p[0]*self.fx_norm/p[2] + self.cx_norm\n\t\t\tyc = p[1]*self.fy_norm/p[2] + self.cy_norm\n\t\t\tw = s[0]*self.fx_norm/p[2]\n\t\t\th = s[1]*self.fy_norm/p[2]\n\t\t\tz = p[2]\n\t\t\tbboxes[i] = [xc,yc,w,h,z]\n\t\timage_bb = {\"bboxes\": np.array(bboxes), \n\t\t\t\t\t\"n_positions\": objects[\"n_positions\"],\n\t\t\t\t\t\"n_ex\": objects[\"n_ex\"], \"image_dim\": self.image_dim,\n\t\t\t\t\t\"fx_norm\": self.fx_norm, \"fy_norm\": self.fy_norm,\n\t\t\t\t\t\"bbox_format\": \"[xc_norm; yc_norm; w_norm; h_norm; Z]\"}\n\t\treturn image_bb\n"
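\n# Minimal usage sketch (illustrative); the config paths assume this is run\n# from the repository's demo directory, as in demo_datagen.py.\nif __name__ == \"__main__\":\n\tgen = DataGenerator(\"../config/data_gen/standard_data.yaml\")\n\tgen.initialize_data_gen(\"../config/camera/hsr_grasp_camera.yaml\")\n\tbb_3D, bb = gen.generate_object_examples(4)\n\tprint(bb[\"bboxes\"].shape)  # (num_pos, 5, n_ex): [xc, yc, w, h, Z] rows\n"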
  },
  {
    "path": "odmd/data_gen/data_gen_util.py",
    "content": "# File: data_gen_util.py\n\nimport os, IPython, numpy as np\n\ndef add_perturbations(bb_3D, bb, odmd_data):\n\tif odmd_data.perturb:\n\t\tif odmd_data.std_dev > 0:\n\t\t\tbb = bounding_box_perturbation(bb, odmd_data.std_dev)\n\t\tif odmd_data.shuffle > 0:\n\t\t\tbb = bounding_box_shuffle(bb, odmd_data.shuffle)\n\t\tif odmd_data.cam_dev > 0:\n\t\t\tbb_3D = camera_move_perturbation(bb_3D, odmd_data.cam_dev)\n\treturn bb_3D, bb\n\ndef bounding_box_perturbation(bb, std_dev):\n\t# Add random noise to bounding box.\n\tdev = np.random.normal(scale=std_dev,size=(bb[\"n_positions\"],4,bb[\"n_ex\"]))\n\tbb[\"bboxes\"][:,:4,:] += dev\n\treturn bb\n\ndef bounding_box_shuffle(bb, shuffle):\n\t# Randomly shuffle a percentage of bounding boxes to learn data selection.\n\tdim = bb[\"bboxes\"].shape\n\tn_shuffle = int(dim[2] * shuffle)\n\tchange_idx = np.random.choice(range(dim[2]), n_shuffle, replace=False)\n\treplace_idx = np.random.choice(range(dim[2]), n_shuffle)\n\tposition_idx = np.random.choice(range(dim[0]), n_shuffle)\n\tbb[\"bboxes\"][position_idx,:4,change_idx] = \\\n\t\t\t\t\t\t\t\t\tbb[\"bboxes\"][position_idx,:4,replace_idx]\n\treturn bb\n\ndef camera_move_perturbation(bb_3D, cam_dev):\n\tdev = np.random.normal(scale=cam_dev,size=(bb_3D[\"n_positions\"] - 1, 3,\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tbb_3D[\"n_ex\"]))\n\tbb_3D[\"camera_movement\"] += dev\n\treturn bb_3D\n\ndef bb_to_inputs(bb_3D, bb, n_obs=10):\n\t# Prepare input data based on the number of observations used.\n\tidx = np.round(np.linspace(0, bb_3D[\"n_positions\"]-1, n_obs)).astype(\"int\") \n\tinput_bb = np.array(bb[\"bboxes\"])[idx,:-1]\t\n\tcamera_move = np.array(bb_3D[\"camera_movement\"])[idx[:-1]]\n\tlabels = np.array(bb[\"bboxes\"])[-1][-1]\t\n\t# Can also use more labels if training for intermediate depth:\n\t#\tlabels = np.array(bb[\"bboxes\"])[:,-1]\t\n\treturn input_bb, camera_move, labels\n"
  }
]