Repository: ivalab/grasp_multiObject_multiGrasp
Branch: master
Commit: 806ad3d71c2f
Files: 86
Total size: 947.1 KB
Directory structure:
gitextract_6dytcin7/
├── README.md
├── data/
│ └── scripts/
│ ├── dataPreprocessingTest_fasterrcnn_split.m
│ ├── dataPreprocessing_fasterrcnn.m
│ └── fetch_faster_rcnn_models.sh
├── experiments/
│ ├── cfgs/
│ │ ├── res101-lg.yml
│ │ ├── res101.yml
│ │ ├── res50.yml
│ │ └── vgg16.yml
│ ├── logs/
│ │ └── .gitignore
│ └── scripts/
│ ├── convert_vgg16.sh
│ ├── test_faster_rcnn.sh
│ ├── train_faster_rcnn.sh
│ └── train_faster_rcnn.sh~
├── lib/
│ ├── Makefile
│ ├── datasets/
│ │ ├── VOCdevkit-matlab-wrapper/
│ │ │ ├── get_voc_opts.m
│ │ │ ├── voc_eval.m
│ │ │ └── xVOCap.m
│ │ ├── __init__.py
│ │ ├── coco.py
│ │ ├── ds_utils.py
│ │ ├── factory.py
│ │ ├── factory.py~
│ │ ├── graspRGB.py
│ │ ├── graspRGB.py~
│ │ ├── imdb.py
│ │ ├── pascal_voc.py
│ │ ├── tools/
│ │ │ └── mcg_munge.py
│ │ └── voc_eval.py
│ ├── layer_utils/
│ │ ├── __init__.py
│ │ ├── anchor_target_layer.py
│ │ ├── generate_anchors.py
│ │ ├── proposal_layer.py
│ │ ├── proposal_target_layer.py
│ │ ├── proposal_top_layer.py
│ │ └── snippets.py
│ ├── model/
│ │ ├── __init__.py
│ │ ├── bbox_transform.py
│ │ ├── config.py
│ │ ├── config.py~
│ │ ├── nms_wrapper.py
│ │ ├── test.py
│ │ ├── test.py~
│ │ ├── train_val.py
│ │ └── train_val.py~
│ ├── nets/
│ │ ├── __init__.py
│ │ ├── network.py
│ │ ├── resnet_v1.py
│ │ ├── resnet_v1.py~
│ │ └── vgg16.py
│ ├── nms/
│ │ ├── .gitignore
│ │ ├── __init__.py
│ │ ├── cpu_nms.c
│ │ ├── cpu_nms.pyx
│ │ ├── gpu_nms.cpp
│ │ ├── gpu_nms.hpp
│ │ ├── gpu_nms.pyx
│ │ ├── nms_kernel.cu
│ │ └── py_cpu_nms.py
│ ├── roi_data_layer/
│ │ ├── __init__.py
│ │ ├── layer.py
│ │ ├── minibatch.py
│ │ ├── minibatch.py~
│ │ └── roidb.py
│ ├── setup.py
│ ├── setup.py~
│ └── utils/
│ ├── .gitignore
│ ├── __init__.py
│ ├── bbox.pyx
│ ├── blob.py
│ ├── boxes_grid.py
│ ├── nms.py
│ ├── nms.pyx
│ └── timer.py
└── tools/
├── _init_paths.py
├── demo.py~
├── demo_graspRGD.py
├── demo_graspRGD.py~
├── demo_graspRGD_socket.py
├── demo_graspRGD_socket.py~
├── demo_graspRGD_socket_drawer.py~
├── demo_graspRGD_socket_save_to_rgbd.py~
├── demo_graspRGD_vis_mask.py
├── demo_graspRGD_vis_select.py
├── eval_graspRGD.py~
├── mask_gen.py
└── trainval_net.py
================================================
FILE CONTENTS
================================================
================================================
FILE: README.md
================================================
# grasp_multiObject_multiGrasp
This is the implementation of our RA-L work 'Real-world Multi-object, Multi-grasp Detection'. The detector takes an RGB-D image as input and predicts multiple grasp candidates for a single object or multiple objects, in a single shot. The original arXiv paper can be found [here](https://arxiv.org/pdf/1802.00520.pdf). The final version will be updated after the publication process.
<p align="center">
<img src="https://github.com/ivalab/grasp_multiObject_multiGrasp/blob/master/fig/multi_grasp_multi_object.png" alt="drawing" width="300"/>
</p>
If you find it helpful for your research, please consider citing:
```
@article{chu2018deep,
  title   = {Real-World Multiobject, Multigrasp Detection},
  author  = {F. Chu and R. Xu and P. A. Vela},
  journal = {IEEE Robotics and Automation Letters},
  year    = {2018},
  volume  = {3},
  number  = {4},
  pages   = {3355-3362},
  doi     = {10.1109/LRA.2018.2852777},
  issn    = {2377-3766},
  month   = {Oct}
}
```
If you have any questions, please contact me at fujenchu[at]gatech[dot]edu
### Demo
1. Clone this repository
```
git clone https://github.com/ivalab/grasp_multiObject_multiGrasp.git
cd grasp_multiObject_multiGrasp
```
2. Build Cython modules
```
cd lib
make clean
make
cd ..
```
3. Install [Python COCO API](https://github.com/cocodataset/cocoapi)
```
cd data
git clone https://github.com/pdollar/coco.git
cd coco/PythonAPI
make
cd ../../..
```
4. Download pretrained models
- Download the trained grasp detection model from [Dropbox](https://www.dropbox.com/s/ldapcpanzqdu7tc/models.zip?dl=0)
- Put it under `output/res50/train/default/`
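After unzipping, the checkpoint files should sit directly in that folder. A sketch of the expected layout, assuming the snapshot naming used by `experiments/scripts/train_faster_rcnn.sh` (the iteration count of the released model may differ):
```
output/res50/train/default/
├── res50_faster_rcnn_iter_240000.ckpt.index
├── res50_faster_rcnn_iter_240000.ckpt.meta
└── res50_faster_rcnn_iter_240000.ckpt.data-00000-of-00001
```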
5. Run demo
```
./tools/demo_graspRGD.py --net res50 --dataset grasp
```
You should see images with detected grasps pop up.
### Train
1. Generate data
1-1. Download the [Cornell Dataset](http://pr.cs.cornell.edu/grasping/rect_data/data.php)
1-2. Run `dataPreprocessingTest_fasterrcnn_split.m` (modify the paths to match your directory structure)
1-3. Follow the 'Format Your Dataset' section [here](https://github.com/zeyuanxy/fast-rcnn/tree/master/help/train) to check that your data follows the VOC format
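For reference, the preprocessing script writes a devkit-style tree; a minimal sketch, with paths taken from `dataPreprocessingTest_fasterrcnn_split.m` and `lib/datasets/factory.py` (the devkit root is whatever you configure):
```
<devkit_root>/data/
├── Images/        # <imgname>_preprocessed_<i>.png crops
├── Annotations/   # one .txt per image, lines of: cls x_min y_min x_max y_max
└── ImageSets/
    ├── train.txt
    └── test.txt
```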
2. Train
```
./experiments/scripts/train_faster_rcnn.sh 0 graspRGB res50
```
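The three arguments map to `GPU_ID`, `DATASET`, and `NET` in `experiments/scripts/train_faster_rcnn.sh`: `0` selects the GPU, `graspRGB` picks the dataset case (240k iterations), and `res50` the backbone. The script expects ImageNet weights at `data/imagenet_weights/res50.ckpt` and runs testing when training finishes.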
### ROS version?
Yes! Please find it [HERE](https://github.com/ivaROS/ros_deep_grasp)
### Acknowledgment
This repo borrows tons of code from
- [tf-faster-rcnn](https://github.com/endernewton/tf-faster-rcnn) by endernewton
### Resources
- [multi-object grasp dataset](https://github.com/ivalab/grasp_multiObject)
- [grasp annotation tool](https://github.com/ivalab/grasp_annotation_tool)
================================================
FILE: data/scripts/dataPreprocessingTest_fasterrcnn_split.m
================================================
%% script to test dataPreprocessing
%% created by Fu-Jen Chu on 09/15/2016
close all
clear
%parpool(4)
addpath('/media/fujenchu/home3/data/grasps/')
% generate train/test split lists (random 80/20 split over the Cornell image numbers)
list = [100:949 1000:1034];
list_idx = randperm(length(list));
train_list_idx = list_idx(length(list)/5+1:end);
test_list_idx = list_idx(1:length(list)/5);
train_list = list(train_list_idx);
test_list = list(test_list_idx);
for folder = 1:10
display(['processing folder ' int2str(folder)])
imgDataDir = ['/home/fujenchu/projects/deepLearning/tensorflow-finetune-flickr-style-master/data/grasps/' sprintf('%02d',folder) '_rgd'];
txtDataDir = ['/home/fujenchu/projects/deepLearning/tensorflow-finetune-flickr-style-master/data/grasps/' sprintf('%02d',folder)];
%imgDataOutDir = ['/home/fujenchu/projects/deepLearning/tensorflow-finetune-flickr-style-master/data/grasps/' sprintf('%02d',folder) '_Cropped320_rgd'];
imgDataOutDir = '/media/fujenchu/home3/fasterrcnn_grasp/rgd_multibbs_5_5_5_tf/data/Images';
annotationDataOutDir = '/media/fujenchu/home3/fasterrcnn_grasp/rgd_multibbs_5_5_5_tf/data/Annotations';
imgSetTrain = '/media/fujenchu/home3/fasterrcnn_grasp/rgd_multibbs_5_5_5_tf/data/ImageSets/train.txt';
imgSetTest = '/media/fujenchu/home3/fasterrcnn_grasp/rgd_multibbs_5_5_5_tf/data/ImageSets/test.txt';
imgFiles = dir([imgDataDir '/*.png']);
txtFiles = dir([txtDataDir '/*pos.txt']);
logfileID = fopen('log.txt','a');
mainfileID = fopen(['/home/fujenchu/projects/deepLearning/deepGraspExtensiveOffline/data/grasps/scripts/trainttt' sprintf('%02d',folder) '.txt'],'a');
for idx = 1:length(imgFiles)
%% display progress
tic
display(['processing folder: ' sprintf('%02d',folder) ', imgFiles: ' int2str(idx)])
%% reading data
imgName = imgFiles(idx).name;
[pathstr,imgname] = fileparts(imgName);
filenum = str2num(imgname(4:7));
if(any(test_list == filenum))
file_writeID = fopen(imgSetTest,'a');
fprintf(file_writeID, '%s\n', [imgDataDir(1:end-3) 'Cropped320_rgd/' imgname '_preprocessed_1.png' ] );
fclose(file_writeID);
continue;
end
txtName = txtFiles(idx).name;
[pathstr,txtname] = fileparts(txtName);
img = imread([imgDataDir '/' imgname '.png']);
fileID = fopen([txtDataDir '/' txtname '.txt'],'r');
sizeA = [2 100];
bbsIn_all = fscanf(fileID, '%f %f', sizeA);
fclose(fileID);
%% data pre-processing
[imagesOut bbsOut] = dataPreprocessing_fasterrcnn(img, bbsIn_all, 227, 5, 5);
% for each augmented image
for i = 1:1:size(imagesOut,2)
% for each bbs
file_writeID = fopen([annotationDataOutDir '/' imgname '_preprocessed_' int2str(i) '.txt'],'w');
printCount = 0;
for ibbs = 1:1:size(bbsOut{i},2)
A = bbsOut{i}{ibbs};
xy_ctr = sum(A,2)/4; x_ctr = xy_ctr(1); y_ctr = xy_ctr(2);
width = sqrt(sum((A(:,1) - A(:,2)).^2)); height = sqrt(sum((A(:,2) - A(:,3)).^2));
if(A(1,1) > A(1,2))
theta = atan((A(2,2)-A(2,1))/(A(1,1)-A(1,2)));
else
theta = atan((A(2,1)-A(2,2))/(A(1,2)-A(1,1))); % note y is facing down
end
% process to fasterrcnn
x_min = x_ctr - width/2; x_max = x_ctr + width/2;
y_min = y_ctr - height/2; y_max = y_ctr + height/2;
%if(x_min < 0 || y_min < 0 || x_max > 227 || y_max > 227) display('yoooooooo'); end
if((x_min < 0 && x_max < 0) || (y_min > 227 && y_max > 227) || (x_min > 227 && x_max > 227) || (y_min < 0 && y_max < 0)) display('xxxxxxxxx'); break; end
cls = round((theta/pi*180+90)/10) + 1; % discretize theta in [-90, 90] deg into 19 classes (10-deg bins); e.g. theta = 0 -> cls = 10
% write as left-top right-bottom corners, Xmin Ymin Xmax Ymax, e.g. 261 109 511 705 (x horizontal, y vertical)
fprintf(file_writeID, '%d %f %f %f %f\n', cls, x_min, y_min, x_max, y_max );
printCount = printCount+1;
end
if(printCount == 0) fprintf(logfileID, '%s\n', [imgname '_preprocessed_' int2str(i) ]);end
fclose(file_writeID);
imwrite(imagesOut{i}, [imgDataOutDir '/' imgname '_preprocessed_' int2str(i) '.png']);
% write filename to imageSet
file_writeID = fopen(imgSetTrain,'a');
fprintf(file_writeID, '%s\n', [imgname '_preprocessed_' int2str(i) ] );
fclose(file_writeID);
end
toc
end
fclose(mainfileID);
end
================================================
FILE: data/scripts/dataPreprocessing_fasterrcnn.m
================================================
function [imagesOut, bbsOut] = dataPreprocessing_fasterrcnn( imageIn, bbsIn_all, cropSize, translationShiftNumber, rotateAngleNumber)
% dataPreprocessing_fasterrcnn performs
% 1) cropping
% 2) padding
% 3) rotation
% 4) shifting
%
% For an input image with bounding boxes given as 4 corner points each,
% it outputs a set of augmented images with correspondingly transformed bbs.
%
% Inputs:
% imageIn: input image (480 by 640 by 3)
% bbsIn_all: bounding boxes (2 by 4N, 4 corner points per box)
% cropSize: output image size
% translationShiftNumber: number of random translation shifts
% rotateAngleNumber: number of random rotation angles
%
% Outputs:
% imagesOut: output images (rotateAngleNumber*translationShiftNumber^2 images)
% bbsOut: output bbs according to shift and rotation
%
%% created by Fu-Jen Chu on 09/15/2016
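%
% Example usage (a sketch mirroring the call in dataPreprocessingTest_fasterrcnn_split.m;
% the file names below are hypothetical):
%   img = imread('pcd0100r.png');
%   fileID = fopen('pcd0100pos.txt','r');
%   bbs = fscanf(fileID, '%f %f', [2 100]); fclose(fileID);
%   [imagesOut, bbsOut] = dataPreprocessing_fasterrcnn(img, bbs, 227, 5, 5);
%   % yields 5*5*5 = 125 augmented 227x227 images with transformed boxes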
debug_dev = 0;
debug = 0;
%% show image and bbs
if(debug_dev)
figure(1); imshow(imageIn); hold on;
x = bbsIn_all(1, [1:3]);
y = bbsIn_all(2, [1:3]);
plot(x,y); hold off;
end
%% crop image and padding image
% crop a square region from the image center (imcrop with w = h = 351 returns a 352 by 352 patch)
imgCrop = imcrop(imageIn, [145 65 351 351]);
% pad 75 px on each side (replicate border) to 502 by 502 so rotated crops stay inside
imgPadding = padarray(imgCrop, [75 75], 'replicate', 'both');
count = 1;
for i_rotate = 1:rotateAngleNumber*translationShiftNumber*translationShiftNumber
% random rotation angle
theta = randi(360)-1;
%theta = 0;
% random translationShift
dx = randi(101)-51;
%dx = 0;
%% rotation and shifting
% random translationShift
dy = randi(101)-51;
%dy = 0;
imgRotate = imrotate(imgPadding, theta);
if(debug_dev)figure(2); imshow(imgRotate);end
imgCropRotate = imcrop(imgRotate, [size(imgRotate,1)/2-160-dx size(imgRotate,1)/2-160-dy 320 320]);
if(debug_dev)figure(3); imshow(imgCropRotate);end
imgResize = imresize(imgCropRotate, [cropSize cropSize]);
if(debug)figure(4); imshow(imgResize); hold on;end
%% modify bbs
[m, n] = size(bbsIn_all);
bbsNum = n/4;
countbbs = 1;
for idx = 1:bbsNum
bbsIn = bbsIn_all(:,idx*4-3:idx*4);
if(sum(sum(isnan(bbsIn)))) continue; end
bbsInShift = bbsIn - repmat([320; 240], 1, 4); % corners relative to the original image center
R = [cos(theta/180*pi) -sin(theta/180*pi); sin(theta/180*pi) cos(theta/180*pi)];
bbsRotated = (bbsInShift'*R)'; % rotate corners to follow the rotated image
bbsInShiftBack = (bbsRotated + repmat([160; 160], 1, 4) + repmat([dx; dy], 1, 4))*cropSize/320; % map back into the shifted 320x320 crop, scaled to cropSize
if(debug)
figure(4)
x = bbsInShiftBack(1, [1:4 1]);
y = bbsInShiftBack(2, [1:4 1]);
plot(x,y); hold on; pause(0.01);
end
bbsOut{count}{countbbs} = bbsInShiftBack;
countbbs = countbbs + 1;
end
imagesOut{count} = imgResize;
count = count +1;
end
end
================================================
FILE: data/scripts/fetch_faster_rcnn_models.sh
================================================
#!/bin/bash
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )/../" && pwd )"
cd $DIR
NET=res101
FILE=voc_0712_80k-110k.tgz
# replace it with gs11655.sp.cs.cmu.edu if ladoga.graphics.cs.cmu.edu does not work
URL=http://ladoga.graphics.cs.cmu.edu/xinleic/tf-faster-rcnn/$NET/$FILE
CHECKSUM=cb32e9df553153d311cc5095b2f8c340
if [ -f $FILE ]; then
echo "File already exists. Checking md5..."
os=`uname -s`
if [ "$os" = "Linux" ]; then
checksum=`md5sum $FILE | awk '{ print $1 }'`
elif [ "$os" = "Darwin" ]; then
checksum=`cat $FILE | md5`
fi
if [ "$checksum" = "$CHECKSUM" ]; then
echo "Checksum is correct. No need to download."
exit 0
else
echo "Checksum is incorrect. Need to download again."
fi
fi
echo "Downloading Resnet 101 Faster R-CNN models Pret-trained on VOC 07+12 (340M)..."
wget $URL -O $FILE
echo "Unzipping..."
tar zxvf $FILE
echo "Done. Please run this command again to verify that checksum = $CHECKSUM."
================================================
FILE: experiments/cfgs/res101-lg.yml
================================================
EXP_DIR: res101-lg
TRAIN:
HAS_RPN: True
IMS_PER_BATCH: 1
BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
RPN_POSITIVE_OVERLAP: 0.7
RPN_BATCHSIZE: 256
PROPOSAL_METHOD: gt
BG_THRESH_LO: 0.0
DISPLAY: 20
BATCH_SIZE: 256
WEIGHT_DECAY: 0.0001
DOUBLE_BIAS: False
SNAPSHOT_PREFIX: res101_faster_rcnn
SCALES: [800]
MAX_SIZE: 1333
TEST:
HAS_RPN: True
SCALES: [800]
MAX_SIZE: 1333
RPN_POST_NMS_TOP_N: 1000
POOLING_MODE: crop
ANCHOR_SCALES: [2,4,8,16,32]
================================================
FILE: experiments/cfgs/res101.yml
================================================
EXP_DIR: res101
TRAIN:
HAS_RPN: True
IMS_PER_BATCH: 1
BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
RPN_POSITIVE_OVERLAP: 0.7
RPN_BATCHSIZE: 256
PROPOSAL_METHOD: gt
BG_THRESH_LO: 0.0
DISPLAY: 20
BATCH_SIZE: 256
WEIGHT_DECAY: 0.0001
DOUBLE_BIAS: False
SNAPSHOT_PREFIX: res101_faster_rcnn
TEST:
HAS_RPN: True
POOLING_MODE: crop
================================================
FILE: experiments/cfgs/res50.yml
================================================
EXP_DIR: res50
TRAIN:
HAS_RPN: True
IMS_PER_BATCH: 1
BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
RPN_POSITIVE_OVERLAP: 0.7
RPN_BATCHSIZE: 256
PROPOSAL_METHOD: gt
BG_THRESH_LO: 0.0
DISPLAY: 20
BATCH_SIZE: 256
WEIGHT_DECAY: 0.0001
DOUBLE_BIAS: False
SNAPSHOT_PREFIX: res50_faster_rcnn
TEST:
HAS_RPN: True
POOLING_MODE: crop
================================================
FILE: experiments/cfgs/vgg16.yml
================================================
EXP_DIR: vgg16
TRAIN:
HAS_RPN: True
IMS_PER_BATCH: 1
BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True
RPN_POSITIVE_OVERLAP: 0.7
RPN_BATCHSIZE: 256
PROPOSAL_METHOD: gt
BG_THRESH_LO: 0.0
DISPLAY: 20
BATCH_SIZE: 256
SNAPSHOT_PREFIX: vgg16_faster_rcnn
TEST:
HAS_RPN: True
POOLING_MODE: crop
================================================
FILE: experiments/logs/.gitignore
================================================
*.txt.*
================================================
FILE: experiments/scripts/convert_vgg16.sh
================================================
#!/bin/bash
set -x
set -e
export PYTHONUNBUFFERED="True"
GPU_ID=$1
DATASET=$2
NET=vgg16
array=( $@ )
len=${#array[@]}
EXTRA_ARGS=${array[@]:2:$len}
EXTRA_ARGS_SLUG=${EXTRA_ARGS// /_}
case ${DATASET} in
pascal_voc)
TRAIN_IMDB="voc_2007_trainval"
TEST_IMDB="voc_2007_test"
STEPSIZE=50000
ITERS=70000
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
pascal_voc_0712)
TRAIN_IMDB="voc_2007_trainval+voc_2012_trainval"
TEST_IMDB="voc_2007_test"
STEPSIZE=80000
ITERS=110000
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
coco)
TRAIN_IMDB="coco_2014_train+coco_2014_valminusminival"
TEST_IMDB="coco_2014_minival"
STEPSIZE=350000
ITERS=490000
ANCHORS="[4,8,16,32]"
RATIOS="[0.5,1,2]"
;;
*)
echo "No dataset given"
exit
;;
esac
set +x
NET_FINAL=${NET}_faster_rcnn_iter_${ITERS}
set -x
if [ ! -f ${NET_FINAL}.index ]; then
if [[ ! -z ${EXTRA_ARGS_SLUG} ]]; then
CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/convert_from_depre.py \
--snapshot ${NET_FINAL} \
--imdb ${TRAIN_IMDB} \
--iters ${ITERS} \
--cfg experiments/cfgs/${NET}.yml \
--tag ${EXTRA_ARGS_SLUG} \
--set ANCHOR_SCALES ${ANCHORS} ANCHOR_RATIOS ${RATIOS} TRAIN.STEPSIZE ${STEPSIZE} ${EXTRA_ARGS}
else
CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/convert_from_depre.py \
--snapshot ${NET_FINAL} \
--imdb ${TRAIN_IMDB} \
--iters ${ITERS} \
--cfg experiments/cfgs/${NET}.yml \
--set ANCHOR_SCALES ${ANCHORS} ANCHOR_RATIOS ${RATIOS} TRAIN.STEPSIZE ${STEPSIZE} ${EXTRA_ARGS}
fi
fi
================================================
FILE: experiments/scripts/test_faster_rcnn.sh
================================================
#!/bin/bash
set -x
set -e
export PYTHONUNBUFFERED="True"
GPU_ID=$1
DATASET=$2
NET=$3
array=( $@ )
len=${#array[@]}
EXTRA_ARGS=${array[@]:3:$len}
EXTRA_ARGS_SLUG=${EXTRA_ARGS// /_}
case ${DATASET} in
pascal_voc)
TRAIN_IMDB="voc_2007_trainval"
TEST_IMDB="voc_2007_test"
ITERS=70000
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
pascal_voc_0712)
TRAIN_IMDB="voc_2007_trainval+voc_2012_trainval"
TEST_IMDB="voc_2007_test"
ITERS=110000
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
coco)
TRAIN_IMDB="coco_2014_train+coco_2014_valminusminival"
TEST_IMDB="coco_2014_minival"
ITERS=490000
ANCHORS="[4,8,16,32]"
RATIOS="[0.5,1,2]"
;;
*)
echo "No dataset given"
exit
;;
esac
LOG="experiments/logs/test_${NET}_${TRAIN_IMDB}_${EXTRA_ARGS_SLUG}.txt.`date +'%Y-%m-%d_%H-%M-%S'`"
exec &> >(tee -a "$LOG")
echo Logging output to "$LOG"
set +x
if [[ ! -z ${EXTRA_ARGS_SLUG} ]]; then
NET_FINAL=output/${NET}/${TRAIN_IMDB}/${EXTRA_ARGS_SLUG}/${NET}_faster_rcnn_iter_${ITERS}.ckpt
else
NET_FINAL=output/${NET}/${TRAIN_IMDB}/default/${NET}_faster_rcnn_iter_${ITERS}.ckpt
fi
set -x
if [[ ! -z ${EXTRA_ARGS_SLUG} ]]; then
CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/test_net.py \
--imdb ${TEST_IMDB} \
--model ${NET_FINAL} \
--cfg experiments/cfgs/${NET}.yml \
--tag ${EXTRA_ARGS_SLUG} \
--net ${NET} \
--set ANCHOR_SCALES ${ANCHORS} ANCHOR_RATIOS ${RATIOS} ${EXTRA_ARGS}
else
CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/test_net.py \
--imdb ${TEST_IMDB} \
--model ${NET_FINAL} \
--cfg experiments/cfgs/${NET}.yml \
--net ${NET} \
--set ANCHOR_SCALES ${ANCHORS} ANCHOR_RATIOS ${RATIOS} ${EXTRA_ARGS}
fi
================================================
FILE: experiments/scripts/train_faster_rcnn.sh
================================================
#!/bin/bash
set -x
set -e
export PYTHONUNBUFFERED="True"
GPU_ID=$1
DATASET=$2
NET=$3
array=( $@ )
len=${#array[@]}
EXTRA_ARGS=${array[@]:3:$len}
EXTRA_ARGS_SLUG=${EXTRA_ARGS// /_}
case ${DATASET} in
graspRGB)
TRAIN_IMDB="graspRGB_train"
TEST_IMDB="graspRGB_test"
STEPSIZE=50000
ITERS=240000
#ANCHORS="[2,4,8,16,32]"
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
pascal_voc)
TRAIN_IMDB="voc_2007_trainval"
TEST_IMDB="voc_2007_test"
STEPSIZE=50000
ITERS=70000
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
pascal_voc_0712)
TRAIN_IMDB="voc_2007_trainval+voc_2012_trainval"
TEST_IMDB="voc_2007_test"
STEPSIZE=80000
ITERS=110000
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
coco)
TRAIN_IMDB="coco_2014_train+coco_2014_valminusminival"
TEST_IMDB="coco_2014_minival"
STEPSIZE=350000
ITERS=490000
ANCHORS="[4,8,16,32]"
RATIOS="[0.5,1,2]"
;;
*)
echo "No dataset given"
exit
;;
esac
LOG="experiments/logs/${NET}_${TRAIN_IMDB}_${EXTRA_ARGS_SLUG}_${NET}.txt.`date +'%Y-%m-%d_%H-%M-%S'`"
exec &> >(tee -a "$LOG")
echo Logging output to "$LOG"
set +x
if [[ ! -z ${EXTRA_ARGS_SLUG} ]]; then
NET_FINAL=output/${NET}/${TRAIN_IMDB}/${EXTRA_ARGS_SLUG}/${NET}_faster_rcnn_iter_${ITERS}.ckpt
else
NET_FINAL=output/${NET}/${TRAIN_IMDB}/default/${NET}_faster_rcnn_iter_${ITERS}.ckpt
fi
set -x
if [ ! -f ${NET_FINAL}.index ]; then
if [[ ! -z ${EXTRA_ARGS_SLUG} ]]; then
CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/trainval_net.py \
--weight data/imagenet_weights/${NET}.ckpt \
--imdb ${TRAIN_IMDB} \
--imdbval ${TEST_IMDB} \
--iters ${ITERS} \
--cfg experiments/cfgs/${NET}.yml \
--tag ${EXTRA_ARGS_SLUG} \
--net ${NET} \
--set ANCHOR_SCALES ${ANCHORS} ANCHOR_RATIOS ${RATIOS} TRAIN.STEPSIZE ${STEPSIZE} ${EXTRA_ARGS}
else
CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/trainval_net.py \
--weight data/imagenet_weights/${NET}.ckpt \
--imdb ${TRAIN_IMDB} \
--imdbval ${TEST_IMDB} \
--iters ${ITERS} \
--cfg experiments/cfgs/${NET}.yml \
--net ${NET} \
--set ANCHOR_SCALES ${ANCHORS} ANCHOR_RATIOS ${RATIOS} TRAIN.STEPSIZE ${STEPSIZE} ${EXTRA_ARGS}
fi
fi
./experiments/scripts/test_faster_rcnn.sh $@
================================================
FILE: experiments/scripts/train_faster_rcnn.sh~
================================================
#!/bin/bash
set -x
set -e
export PYTHONUNBUFFERED="True"
GPU_ID=$1
DATASET=$2
NET=$3
array=( $@ )
len=${#array[@]}
EXTRA_ARGS=${array[@]:3:$len}
EXTRA_ARGS_SLUG=${EXTRA_ARGS// /_}
case ${DATASET} in
graspRGB)
TRAIN_IMDB="graspRGB_train"
TEST_IMDB="graspRGB_test"
STEPSIZE=50000
ITERS=160000
#ANCHORS="[2,4,8,16,32]"
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
pascal_voc)
TRAIN_IMDB="voc_2007_trainval"
TEST_IMDB="voc_2007_test"
STEPSIZE=50000
ITERS=70000
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
pascal_voc_0712)
TRAIN_IMDB="voc_2007_trainval+voc_2012_trainval"
TEST_IMDB="voc_2007_test"
STEPSIZE=80000
ITERS=110000
ANCHORS="[8,16,32]"
RATIOS="[0.5,1,2]"
;;
coco)
TRAIN_IMDB="coco_2014_train+coco_2014_valminusminival"
TEST_IMDB="coco_2014_minival"
STEPSIZE=350000
ITERS=490000
ANCHORS="[4,8,16,32]"
RATIOS="[0.5,1,2]"
;;
*)
echo "No dataset given"
exit
;;
esac
LOG="experiments/logs/${NET}_${TRAIN_IMDB}_${EXTRA_ARGS_SLUG}_${NET}.txt.`date +'%Y-%m-%d_%H-%M-%S'`"
exec &> >(tee -a "$LOG")
echo Logging output to "$LOG"
set +x
if [[ ! -z ${EXTRA_ARGS_SLUG} ]]; then
NET_FINAL=output/${NET}/${TRAIN_IMDB}/${EXTRA_ARGS_SLUG}/${NET}_faster_rcnn_iter_${ITERS}.ckpt
else
NET_FINAL=output/${NET}/${TRAIN_IMDB}/default/${NET}_faster_rcnn_iter_${ITERS}.ckpt
fi
set -x
if [ ! -f ${NET_FINAL}.index ]; then
if [[ ! -z ${EXTRA_ARGS_SLUG} ]]; then
CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/trainval_net.py \
--weight data/imagenet_weights/${NET}.ckpt \
--imdb ${TRAIN_IMDB} \
--imdbval ${TEST_IMDB} \
--iters ${ITERS} \
--cfg experiments/cfgs/${NET}.yml \
--tag ${EXTRA_ARGS_SLUG} \
--net ${NET} \
--set ANCHOR_SCALES ${ANCHORS} ANCHOR_RATIOS ${RATIOS} TRAIN.STEPSIZE ${STEPSIZE} ${EXTRA_ARGS}
else
CUDA_VISIBLE_DEVICES=${GPU_ID} time python ./tools/trainval_net.py \
--weight data/imagenet_weights/${NET}.ckpt \
--imdb ${TRAIN_IMDB} \
--imdbval ${TEST_IMDB} \
--iters ${ITERS} \
--cfg experiments/cfgs/${NET}.yml \
--net ${NET} \
--set ANCHOR_SCALES ${ANCHORS} ANCHOR_RATIOS ${RATIOS} TRAIN.STEPSIZE ${STEPSIZE} ${EXTRA_ARGS}
fi
fi
./experiments/scripts/test_faster_rcnn.sh $@
================================================
FILE: lib/Makefile
================================================
all:
python setup.py build_ext --inplace
rm -rf build
clean:
rm -rf */*.pyc
rm -rf */*.so
================================================
FILE: lib/datasets/VOCdevkit-matlab-wrapper/get_voc_opts.m
================================================
function VOCopts = get_voc_opts(path)
tmp = pwd;
cd(path);
try
addpath('VOCcode');
VOCinit;
catch
rmpath('VOCcode');
cd(tmp);
error(sprintf('VOCcode directory not found under %s', path));
end
rmpath('VOCcode');
cd(tmp);
================================================
FILE: lib/datasets/VOCdevkit-matlab-wrapper/voc_eval.m
================================================
function res = voc_eval(path, comp_id, test_set, output_dir)
VOCopts = get_voc_opts(path);
VOCopts.testset = test_set;
for i = 1:length(VOCopts.classes)
cls = VOCopts.classes{i};
res(i) = voc_eval_cls(cls, VOCopts, comp_id, output_dir);
end
fprintf('\n~~~~~~~~~~~~~~~~~~~~\n');
fprintf('Results:\n');
aps = [res(:).ap]';
fprintf('%.1f\n', aps * 100);
fprintf('%.1f\n', mean(aps) * 100);
fprintf('~~~~~~~~~~~~~~~~~~~~\n');
function res = voc_eval_cls(cls, VOCopts, comp_id, output_dir)
test_set = VOCopts.testset;
year = VOCopts.dataset(4:end);
addpath(fullfile(VOCopts.datadir, 'VOCcode'));
res_fn = sprintf(VOCopts.detrespath, comp_id, cls);
recall = [];
prec = [];
ap = 0;
ap_auc = 0;
do_eval = (str2num(year) <= 2007) | ~strcmp(test_set, 'test');
if do_eval
% Bug in VOCevaldet requires that tic has been called first
tic;
[recall, prec, ap] = VOCevaldet(VOCopts, comp_id, cls, true);
ap_auc = xVOCap(recall, prec);
% force plot limits
ylim([0 1]);
xlim([0 1]);
print(gcf, '-djpeg', '-r0', ...
[output_dir '/' cls '_pr.jpg']);
end
fprintf('!!! %s : %.4f %.4f\n', cls, ap, ap_auc);
res.recall = recall;
res.prec = prec;
res.ap = ap;
res.ap_auc = ap_auc;
save([output_dir '/' cls '_pr.mat'], ...
'res', 'recall', 'prec', 'ap', 'ap_auc');
rmpath(fullfile(VOCopts.datadir, 'VOCcode'));
================================================
FILE: lib/datasets/VOCdevkit-matlab-wrapper/xVOCap.m
================================================
function ap = xVOCap(rec,prec)
% From the PASCAL VOC 2011 devkit
mrec=[0 ; rec ; 1];
mpre=[0 ; prec ; 0];
for i=numel(mpre)-1:-1:1
mpre(i)=max(mpre(i),mpre(i+1));
end
i=find(mrec(2:end)~=mrec(1:end-1))+1;
ap=sum((mrec(i)-mrec(i-1)).*mpre(i));
================================================
FILE: lib/datasets/__init__.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
================================================
FILE: lib/datasets/coco.py
================================================
# --------------------------------------------------------
# Fast/er R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from datasets.imdb import imdb
import datasets.ds_utils as ds_utils
from model.config import cfg
import os.path as osp
import sys
import os
import numpy as np
import scipy.sparse
import scipy.io as sio
import pickle
import json
import uuid
# COCO API
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
from pycocotools import mask as COCOmask
class coco(imdb):
def __init__(self, image_set, year):
imdb.__init__(self, 'coco_' + year + '_' + image_set)
# COCO specific config options
self.config = {'use_salt': True,
'cleanup': True}
# name, paths
self._year = year
self._image_set = image_set
self._data_path = osp.join(cfg.DATA_DIR, 'coco')
# load COCO API, classes, class <-> id mappings
self._COCO = COCO(self._get_ann_file())
cats = self._COCO.loadCats(self._COCO.getCatIds())
self._classes = tuple(['__background__'] + [c['name'] for c in cats])
self._class_to_ind = dict(list(zip(self.classes, list(range(self.num_classes)))))
self._class_to_coco_cat_id = dict(list(zip([c['name'] for c in cats],
self._COCO.getCatIds())))
self._image_index = self._load_image_set_index()
# Default to roidb handler
self.set_proposal_method('gt')
self.competition_mode(False)
# Some image sets are "views" (i.e. subsets) into others.
# For example, minival2014 is a random 5000 image subset of val2014.
# This mapping tells us where the view's images and proposals come from.
self._view_map = {
'minival2014': 'val2014', # 5k val2014 subset
'valminusminival2014': 'val2014', # val2014 \setminus minival2014
'test-dev2015': 'test2015',
}
coco_name = image_set + year # e.g., "val2014"
self._data_name = (self._view_map[coco_name]
if coco_name in self._view_map
else coco_name)
# Dataset splits that have ground-truth annotations (test splits
# do not have gt annotations)
self._gt_splits = ('train', 'val', 'minival')
def _get_ann_file(self):
prefix = 'instances' if self._image_set.find('test') == -1 \
else 'image_info'
return osp.join(self._data_path, 'annotations',
prefix + '_' + self._image_set + self._year + '.json')
def _load_image_set_index(self):
"""
Load image ids.
"""
image_ids = self._COCO.getImgIds()
return image_ids
def _get_widths(self):
anns = self._COCO.loadImgs(self._image_index)
widths = [ann['width'] for ann in anns]
return widths
def image_path_at(self, i):
"""
Return the absolute path to image i in the image sequence.
"""
return self.image_path_from_index(self._image_index[i])
def image_path_from_index(self, index):
"""
Construct an image path from the image's "index" identifier.
"""
# Example image path for index=119993:
# images/train2014/COCO_train2014_000000119993.jpg
file_name = ('COCO_' + self._data_name + '_' +
str(index).zfill(12) + '.jpg')
image_path = osp.join(self._data_path, 'images',
self._data_name, file_name)
assert osp.exists(image_path), \
'Path does not exist: {}'.format(image_path)
return image_path
def gt_roidb(self):
"""
Return the database of ground-truth regions of interest.
This function loads/saves from/to a cache file to speed up future calls.
"""
cache_file = osp.join(self.cache_path, self.name + '_gt_roidb.pkl')
if osp.exists(cache_file):
with open(cache_file, 'rb') as fid:
roidb = pickle.load(fid)
print('{} gt roidb loaded from {}'.format(self.name, cache_file))
return roidb
gt_roidb = [self._load_coco_annotation(index)
for index in self._image_index]
with open(cache_file, 'wb') as fid:
pickle.dump(gt_roidb, fid, pickle.HIGHEST_PROTOCOL)
print('wrote gt roidb to {}'.format(cache_file))
return gt_roidb
def _load_coco_annotation(self, index):
"""
Loads COCO bounding-box instance annotations. Crowd instances are
handled by marking their overlaps (with all categories) to -1. This
overlap value means that crowd "instances" are excluded from training.
"""
im_ann = self._COCO.loadImgs(index)[0]
width = im_ann['width']
height = im_ann['height']
annIds = self._COCO.getAnnIds(imgIds=index, iscrowd=None)
objs = self._COCO.loadAnns(annIds)
# Sanitize bboxes -- some are invalid
valid_objs = []
for obj in objs:
x1 = np.max((0, obj['bbox'][0]))
y1 = np.max((0, obj['bbox'][1]))
x2 = np.min((width - 1, x1 + np.max((0, obj['bbox'][2] - 1))))
y2 = np.min((height - 1, y1 + np.max((0, obj['bbox'][3] - 1))))
if obj['area'] > 0 and x2 >= x1 and y2 >= y1:
obj['clean_bbox'] = [x1, y1, x2, y2]
valid_objs.append(obj)
objs = valid_objs
num_objs = len(objs)
boxes = np.zeros((num_objs, 4), dtype=np.uint16)
gt_classes = np.zeros((num_objs), dtype=np.int32)
overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
seg_areas = np.zeros((num_objs), dtype=np.float32)
# Lookup table to map from COCO category ids to our internal class
# indices
coco_cat_id_to_class_ind = dict([(self._class_to_coco_cat_id[cls],
self._class_to_ind[cls])
for cls in self._classes[1:]])
for ix, obj in enumerate(objs):
cls = coco_cat_id_to_class_ind[obj['category_id']]
boxes[ix, :] = obj['clean_bbox']
gt_classes[ix] = cls
seg_areas[ix] = obj['area']
if obj['iscrowd']:
# Set overlap to -1 for all classes for crowd objects
# so they will be excluded during training
overlaps[ix, :] = -1.0
else:
overlaps[ix, cls] = 1.0
ds_utils.validate_boxes(boxes, width=width, height=height)
overlaps = scipy.sparse.csr_matrix(overlaps)
return {'width': width,
'height': height,
'boxes': boxes,
'gt_classes': gt_classes,
'gt_overlaps': overlaps,
'flipped': False,
'seg_areas': seg_areas}
def _get_widths(self):
return [r['width'] for r in self.roidb]
def append_flipped_images(self):
num_images = self.num_images
widths = self._get_widths()
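# Mirror each box horizontally: x' = width - 1 - x, swapping x1/x2 so x1 <= x2 holds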
for i in range(num_images):
boxes = self.roidb[i]['boxes'].copy()
oldx1 = boxes[:, 0].copy()
oldx2 = boxes[:, 2].copy()
boxes[:, 0] = widths[i] - oldx2 - 1
boxes[:, 2] = widths[i] - oldx1 - 1
assert (boxes[:, 2] >= boxes[:, 0]).all()
entry = {'width': widths[i],
'height': self.roidb[i]['height'],
'boxes': boxes,
'gt_classes': self.roidb[i]['gt_classes'],
'gt_overlaps': self.roidb[i]['gt_overlaps'],
'flipped': True,
'seg_areas': self.roidb[i]['seg_areas']}
self.roidb.append(entry)
self._image_index = self._image_index * 2
def _get_box_file(self, index):
# first 14 chars / first 22 chars / all chars + .mat
# COCO_val2014_0/COCO_val2014_000000447/COCO_val2014_000000447991.mat
file_name = ('COCO_' + self._data_name +
'_' + str(index).zfill(12) + '.mat')
return osp.join(file_name[:14], file_name[:22], file_name)
def _print_detection_eval_metrics(self, coco_eval):
IoU_lo_thresh = 0.5
IoU_hi_thresh = 0.95
def _get_thr_ind(coco_eval, thr):
ind = np.where((coco_eval.params.iouThrs > thr - 1e-5) &
(coco_eval.params.iouThrs < thr + 1e-5))[0][0]
iou_thr = coco_eval.params.iouThrs[ind]
assert np.isclose(iou_thr, thr)
return ind
ind_lo = _get_thr_ind(coco_eval, IoU_lo_thresh)
ind_hi = _get_thr_ind(coco_eval, IoU_hi_thresh)
# precision has dims (iou, recall, cls, area range, max dets)
# area range index 0: all area ranges
# max dets index 2: 100 per image
precision = \
coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, :, 0, 2]
ap_default = np.mean(precision[precision > -1])
print(('~~~~ Mean and per-category AP @ IoU=[{:.2f},{:.2f}] '
'~~~~').format(IoU_lo_thresh, IoU_hi_thresh))
print('{:.1f}'.format(100 * ap_default))
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
# minus 1 because of __background__
precision = coco_eval.eval['precision'][ind_lo:(ind_hi + 1), :, cls_ind - 1, 0, 2]
ap = np.mean(precision[precision > -1])
print('{:.1f}'.format(100 * ap))
print('~~~~ Summary metrics ~~~~')
coco_eval.summarize()
def _do_detection_eval(self, res_file, output_dir):
ann_type = 'bbox'
coco_dt = self._COCO.loadRes(res_file)
coco_eval = COCOeval(self._COCO, coco_dt)
coco_eval.params.useSegm = (ann_type == 'segm')
coco_eval.evaluate()
coco_eval.accumulate()
self._print_detection_eval_metrics(coco_eval)
eval_file = osp.join(output_dir, 'detection_results.pkl')
with open(eval_file, 'wb') as fid:
pickle.dump(coco_eval, fid, pickle.HIGHEST_PROTOCOL)
print('Wrote COCO eval results to: {}'.format(eval_file))
def _coco_results_one_category(self, boxes, cat_id):
results = []
for im_ind, index in enumerate(self.image_index):
dets = boxes[im_ind].astype(np.float)
if len(dets) == 0:
continue
scores = dets[:, -1]
xs = dets[:, 0]
ys = dets[:, 1]
ws = dets[:, 2] - xs + 1
hs = dets[:, 3] - ys + 1
results.extend(
[{'image_id': index,
'category_id': cat_id,
'bbox': [xs[k], ys[k], ws[k], hs[k]],
'score': scores[k]} for k in range(dets.shape[0])])
return results
def _write_coco_results_file(self, all_boxes, res_file):
# [{"image_id": 42,
# "category_id": 18,
# "bbox": [258.15,41.29,348.26,243.78],
# "score": 0.236}, ...]
results = []
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print('Collecting {} results ({:d}/{:d})'.format(cls, cls_ind,
self.num_classes - 1))
coco_cat_id = self._class_to_coco_cat_id[cls]
results.extend(self._coco_results_one_category(all_boxes[cls_ind],
coco_cat_id))
print('Writing results json to {}'.format(res_file))
with open(res_file, 'w') as fid:
json.dump(results, fid)
def evaluate_detections(self, all_boxes, output_dir):
res_file = osp.join(output_dir, ('detections_' +
self._image_set +
self._year +
'_results'))
if self.config['use_salt']:
res_file += '_{}'.format(str(uuid.uuid4()))
res_file += '.json'
self._write_coco_results_file(all_boxes, res_file)
# Only do evaluation on non-test sets
if self._image_set.find('test') == -1:
self._do_detection_eval(res_file, output_dir)
# Optionally cleanup results json file
if self.config['cleanup']:
os.remove(res_file)
def competition_mode(self, on):
if on:
self.config['use_salt'] = False
self.config['cleanup'] = False
else:
self.config['use_salt'] = True
self.config['cleanup'] = True
================================================
FILE: lib/datasets/ds_utils.py
================================================
# --------------------------------------------------------
# Fast/er R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def unique_boxes(boxes, scale=1.0):
"""Return indices of unique boxes."""
v = np.array([1, 1e3, 1e6, 1e9])
hashes = np.round(boxes * scale).dot(v)
_, index = np.unique(hashes, return_index=True)
return np.sort(index)
def xywh_to_xyxy(boxes):
"""Convert [x y w h] box format to [x1 y1 x2 y2] format."""
return np.hstack((boxes[:, 0:2], boxes[:, 0:2] + boxes[:, 2:4] - 1))
def xyxy_to_xywh(boxes):
"""Convert [x1 y1 x2 y2] box format to [x y w h] format."""
return np.hstack((boxes[:, 0:2], boxes[:, 2:4] - boxes[:, 0:2] + 1))
def validate_boxes(boxes, width=0, height=0):
"""Check that a set of boxes are valid."""
x1 = boxes[:, 0]
y1 = boxes[:, 1]
x2 = boxes[:, 2]
y2 = boxes[:, 3]
assert (x1 >= 0).all()
assert (y1 >= 0).all()
assert (x2 >= x1).all()
assert (y2 >= y1).all()
assert (x2 < width).all()
assert (y2 < height).all()
def filter_small_boxes(boxes, min_size):
w = boxes[:, 2] - boxes[:, 0]
h = boxes[:, 3] - boxes[:, 1]
keep = np.where((w >= min_size) & (h >= min_size))[0]
return keep
================================================
FILE: lib/datasets/factory.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Factory method for easily getting imdbs by name."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from datasets.graspRGB import graspRGB
__sets = {}
from datasets.pascal_voc import pascal_voc
from datasets.coco import coco
import numpy as np
# Set up voc_<year>_<split>
for year in ['2007', '2012']:
for split in ['train', 'val', 'trainval', 'test']:
name = 'voc_{}_{}'.format(year, split)
__sets[name] = (lambda split=split, year=year: pascal_voc(split, year))
# Set up coco_2014_<split>
for year in ['2014']:
for split in ['train', 'val', 'minival', 'valminusminival', 'trainval']:
name = 'coco_{}_{}'.format(year, split)
__sets[name] = (lambda split=split, year=year: coco(split, year))
# Set up coco_2015_<split>
for year in ['2015']:
for split in ['test', 'test-dev']:
name = 'coco_{}_{}'.format(year, split)
__sets[name] = (lambda split=split, year=year: coco(split, year))
# Set up graspRGB_<split> using selective search "fast" mode # added by FC
graspRGB_devkit_path = '/media/fujenchu/home3/fasterrcnn_grasp/rgb_multibbs_5_5_5_object_tf'
for split in ['train', 'test']:
name = '{}_{}'.format('graspRGB', split)
__sets[name] = (lambda split=split: graspRGB(split, graspRGB_devkit_path))
def get_imdb(name):
"""Get an imdb (image database) by name."""
if name not in __sets:
raise KeyError('Unknown dataset: {}'.format(name))
return __sets[name]()
def list_imdbs():
"""List all registered imdbs."""
return list(__sets.keys())
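# Example usage (a sketch; the dataset keys mirror the --imdb arguments used by
# experiments/scripts/train_faster_rcnn.sh):
#   from datasets.factory import get_imdb
#   imdb = get_imdb('graspRGB_train')  # constructs graspRGB('train', graspRGB_devkit_path)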
================================================
FILE: lib/datasets/factory.py~
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""Factory method for easily getting imdbs by name."""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from datasets.graspRGB import graspRGB
__sets = {}
from datasets.pascal_voc import pascal_voc
from datasets.coco import coco
import numpy as np
# Set up voc_<year>_<split>
for year in ['2007', '2012']:
for split in ['train', 'val', 'trainval', 'test']:
name = 'voc_{}_{}'.format(year, split)
__sets[name] = (lambda split=split, year=year: pascal_voc(split, year))
# Set up coco_2014_<split>
for year in ['2014']:
for split in ['train', 'val', 'minival', 'valminusminival', 'trainval']:
name = 'coco_{}_{}'.format(year, split)
__sets[name] = (lambda split=split, year=year: coco(split, year))
# Set up coco_2015_<split>
for year in ['2015']:
for split in ['test', 'test-dev']:
name = 'coco_{}_{}'.format(year, split)
__sets[name] = (lambda split=split, year=year: coco(split, year))
# Set up graspRGB_<split> using selective search "fast" mode # added by FC
graspRGB_devkit_path = '/media/fujenchu/home3/fasterrcnn_grasp/rgd_multibbs_5_5_5_object_tf'
for split in ['train', 'test']:
name = '{}_{}'.format('graspRGB', split)
__sets[name] = (lambda split=split: graspRGB(split, graspRGB_devkit_path))
def get_imdb(name):
"""Get an imdb (image database) by name."""
if name not in __sets:
raise KeyError('Unknown dataset: {}'.format(name))
return __sets[name]()
def list_imdbs():
"""List all registered imdbs."""
return list(__sets.keys())
================================================
FILE: lib/datasets/graspRGB.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import datasets
import datasets.graspRGB
import os
from datasets.imdb import imdb
import datasets.ds_utils as ds_utils
import xml.etree.ElementTree as ET
import numpy as np
import scipy.sparse
import scipy.io as sio
import utils.cython_bbox
import cPickle
import subprocess
import uuid
from voc_eval import voc_eval
class graspRGB(imdb):
def __init__(self, image_set, devkit_path):
imdb.__init__(self, image_set)
self._image_set = image_set
self._devkit_path = devkit_path
self._data_path = os.path.join(self._devkit_path, 'data')
self._classes = ('__background__', # always index 0
'angle_01', 'angle_02', 'angle_03', 'angle_04', 'angle_05',
'angle_06', 'angle_07', 'angle_08', 'angle_09', 'angle_10',
'angle_11', 'angle_12', 'angle_13', 'angle_14', 'angle_15',
'angle_16', 'angle_17', 'angle_18', 'angle_19')
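# 19 orientation bins of ~10 degrees each over [-90, 90]; matches
# cls = round((theta/pi*180+90)/10) + 1 in the MATLAB preprocessing script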
self._class_to_ind = dict(zip(self.classes, xrange(self.num_classes)))
self._image_ext = ['.jpg', '.png']
self._image_index = self._load_image_set_index()
# Default to roidb handler
self._roidb_handler = self.selective_search_roidb
self._salt = str(uuid.uuid4())
self._comp_id = 'comp4'
# Specific config options
self.config = {'cleanup' : True,
'use_salt' : True,
'use_diff' : False,
'matlab_eval' : False,
'rpn_file' : None,
'min_size' : 2}
assert os.path.exists(self._devkit_path), \
'Devkit path does not exist: {}'.format(self._devkit_path)
assert os.path.exists(self._data_path), \
'Path does not exist: {}'.format(self._data_path)
def image_path_at(self, i):
"""
Return the absolute path to image i in the image sequence.
"""
return self.image_path_from_index(self._image_index[i])
def image_path_from_index(self, index):
"""
Construct an image path from the image's "index" identifier.
"""
for ext in self._image_ext:
image_path = os.path.join(self._data_path, 'Images',
index + ext)
if os.path.exists(image_path):
break
assert os.path.exists(image_path), \
'Path does not exist: {}'.format(image_path)
return image_path
def _load_image_set_index(self):
"""
Load the indexes listed in this dataset's image set file.
"""
# Example path to image set file:
# self._data_path + /ImageSets/val.txt
image_set_file = os.path.join(self._data_path, 'ImageSets',
self._image_set + '.txt')
assert os.path.exists(image_set_file), \
'Path does not exist: {}'.format(image_set_file)
with open(image_set_file) as f:
image_index = [x.strip() for x in f.readlines()]
# image_index holds full file names like 'I00001', not bare numbers
return image_index
def gt_roidb(self):
"""
Return the database of ground-truth regions of interest.
This function loads/saves from/to a cache file to speed up future calls.
"""
cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl')
if os.path.exists(cache_file):
with open(cache_file, 'rb') as fid:
roidb = cPickle.load(fid)
print '{} gt roidb loaded from {}'.format(self.name, cache_file)
return roidb
gt_roidb = [self._load_graspRGB_annotation(index)
for index in self.image_index]
with open(cache_file, 'wb') as fid:
cPickle.dump(gt_roidb, fid, cPickle.HIGHEST_PROTOCOL)
print 'wrote gt roidb to {}'.format(cache_file)
return gt_roidb
def selective_search_roidb(self):
"""
Return the database of selective search regions of interest.
Ground-truth ROIs are also included.
This function loads/saves from/to a cache file to speed up future calls.
"""
cache_file = os.path.join(self.cache_path,
self.name + '_selective_search_roidb.pkl')
if os.path.exists(cache_file):
with open(cache_file, 'rb') as fid:
roidb = cPickle.load(fid)
print '{} ss roidb loaded from {}'.format(self.name, cache_file)
return roidb
if self._image_set != 'test':
gt_roidb = self.gt_roidb()
ss_roidb = self._load_selective_search_roidb(gt_roidb)
roidb = imdb.merge_roidbs(gt_roidb, ss_roidb)
else:
roidb = self._load_selective_search_roidb(None)
print len(roidb)
with open(cache_file, 'wb') as fid:
cPickle.dump(roidb, fid, cPickle.HIGHEST_PROTOCOL)
print 'wrote ss roidb to {}'.format(cache_file)
return roidb
def _load_selective_search_roidb(self, gt_roidb):
filename = os.path.abspath(os.path.join(self._devkit_path,
self.name + '.mat'))
assert os.path.exists(filename), \
'Selective search data not found at: {}'.format(filename)
print filename
raw_data = sio.loadmat(filename)['all_boxes'].ravel()
box_list = []
for i in xrange(raw_data.shape[0]):
box_list.append(raw_data[i][:, (1, 0, 3, 2)] - 1)
return self.create_roidb_from_box_list(box_list, gt_roidb)
def selective_search_IJCV_roidb(self):
"""
Return the database of selective search regions of interest.
Ground-truth ROIs are also included.
This function loads/saves from/to a cache file to speed up future calls.
"""
cache_file = os.path.join(self.cache_path,
'{:s}_selective_search_IJCV_top_{:d}_roidb.pkl'.
format(self.name, self.config['top_k']))
if os.path.exists(cache_file):
with open(cache_file, 'rb') as fid:
roidb = cPickle.load(fid)
print '{} ss roidb loaded from {}'.format(self.name, cache_file)
return roidb
gt_roidb = self.gt_roidb()
ss_roidb = self._load_selective_search_IJCV_roidb(gt_roidb)
roidb = datasets.imdb.merge_roidbs(gt_roidb, ss_roidb)
with open(cache_file, 'wb') as fid:
cPickle.dump(roidb, fid, cPickle.HIGHEST_PROTOCOL)
print 'wrote ss roidb to {}'.format(cache_file)
return roidb
def _load_selective_search_IJCV_roidb(self, gt_roidb):
IJCV_path = os.path.abspath(os.path.join(self.cache_path, '..',
'selective_search_IJCV_data',
self.name))
assert os.path.exists(IJCV_path), \
'Selective search IJCV data not found at: {}'.format(IJCV_path)
top_k = self.config['top_k']
box_list = []
for i in xrange(self.num_images):
filename = os.path.join(IJCV_path, self.image_index[i] + '.mat')
raw_data = sio.loadmat(filename)
box_list.append((raw_data['boxes'][:top_k, :]-1).astype(np.uint16))
return self.create_roidb_from_box_list(box_list, gt_roidb)
def _load_graspRGB_annotation(self, index):
"""
Load image and bounding boxes info from txt files of graspRGB.
"""
filename = os.path.join(self._data_path, 'Annotations', index + '.txt')
print 'Loading: {}'.format(filename)
with open(filename) as f:
data = f.readlines()
num_objs = len(data)
print len(data)
if len(data) == 0:
print 'yooooooooo'
import sys
sys.exit()
boxes = np.zeros((num_objs, 4), dtype=np.uint16)
gt_classes = np.zeros((num_objs), dtype=np.int32)
overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
# "Seg" area for pascal is just the box area
seg_areas = np.zeros((num_objs), dtype=np.float32)
# Load object bounding boxes into a data frame.
for ix, aline in enumerate(data):
# Make pixel indexes 0-based
tokens = aline.strip().split()
if len(tokens) != 5:
continue
cls = int(tokens[0]) # this file uses 0 as the background
x1 = float(tokens[1])
y1 = float(tokens[2])
x2 = float(tokens[3])
y2 = float(tokens[4])
# without this clamp, boxes at the image boundary can have negative coordinates, which wrap to ~655xx when later stored as uint16
if (x1<0 and x2<0) or (y1<0 and y2<0):
print 'yooooooooo'
import sys
sys.exit()
if x1 < 0:
x1 = 0
if x2 < 0:
x2 = 0
if y1 < 0:
y1 = 0
if y2 < 0:
y2 = 0
gt_classes[ix] = cls
boxes[ix, :] = [x1, y1, x2, y2]
overlaps[ix, cls] = 1.0
seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)
overlaps = scipy.sparse.csr_matrix(overlaps)
return {'boxes' : boxes,
'gt_classes': gt_classes,
'gt_overlaps' : overlaps,
'flipped' : False,
'seg_areas' : seg_areas}
def _write_graspRGB_results_file(self, all_boxes):
use_salt = self.config['use_salt']
comp_id = 'comp4'
if use_salt:
comp_id += '-{}'.format(os.getpid())
# VOCdevkit/results/comp4-44503_det_test_aeroplane.txt
path = os.path.join(self._devkit_path, 'results', self.name, comp_id + '_')
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print 'Writing {} results file'.format(cls)
filename = path + 'det_' + self._image_set + '_' + cls + '.txt'
with open(filename, 'wt') as f:
for im_ind, index in enumerate(self.image_index):
dets = all_boxes[cls_ind][im_ind]
if len(dets) == 0:
continue
# the VOCdevkit expects 1-based indices
for k in xrange(dets.shape[0]):
f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.
format(index, dets[k, -1],
dets[k, 0] + 1, dets[k, 1] + 1,
dets[k, 2] + 1, dets[k, 3] + 1))
return comp_id
def _do_matlab_eval(self, comp_id, output_dir='output'):
rm_results = self.config['cleanup']
path = os.path.join(os.path.dirname(__file__),
'VOCdevkit-matlab-wrapper')
cmd = 'cd {} && '.format(path)
cmd += '{:s} -nodisplay -nodesktop '.format(datasets.MATLAB)
cmd += '-r "dbstop if error; '
cmd += 'setenv(\'LC_ALL\',\'C\'); voc_eval(\'{:s}\',\'{:s}\',\'{:s}\',\'{:s}\',{:d}); quit;"' \
.format(self._devkit_path, comp_id,
self._image_set, output_dir, int(rm_results))
print('Running:\n{}'.format(cmd))
status = subprocess.call(cmd, shell=True)
def evaluate_detections(self, all_boxes, output_dir):
comp_id = self._write_graspRGB_results_file(all_boxes)
self._do_matlab_eval(comp_id, output_dir)
def competition_mode(self, on):
if on:
self.config['use_salt'] = False
self.config['cleanup'] = False
else:
self.config['use_salt'] = True
self.config['cleanup'] = True
if __name__ == '__main__':
d = datasets.graspRGB('train', '')
res = d.roidb
from IPython import embed; embed()
================================================
FILE: lib/datasets/graspRGB.py~
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
import datasets
import datasets.graspRGB
import os
from datasets.imdb import imdb
import datasets.ds_utils as ds_utils
import xml.etree.ElementTree as ET
import numpy as np
import scipy.sparse
import scipy.io as sio
import utils.cython_bbox
import cPickle
import subprocess
import uuid
from voc_eval import voc_eval
class graspRGB(imdb):
def __init__(self, image_set, devkit_path):
imdb.__init__(self, image_set)
self._image_set = image_set
self._devkit_path = devkit_path
self._data_path = os.path.join(self._devkit_path, 'data')
self._classes = ('__background__', # always index 0
'angle_01', 'angle_02', 'angle_03', 'angle_04', 'angle_05',
'angle_06', 'angle_07', 'angle_08', 'angle_09', 'angle_10',
'angle_11', 'angle_12', 'angle_13', 'angle_14', 'angle_15',
'angle_16', 'angle_17', 'angle_18', 'angle_19')
self._class_to_ind = dict(zip(self.classes, xrange(self.num_classes)))
self._image_ext = ['.jpg', '.png']
self._image_index = self._load_image_set_index()
# Default to roidb handler
self._roidb_handler = self.selective_search_roidb
self._salt = str(uuid.uuid4())
self._comp_id = 'comp4'
# Specific config options
self.config = {'cleanup' : True,
'use_salt' : True,
'use_diff' : False,
'matlab_eval' : False,
'rpn_file' : None,
'min_size' : 2}
assert os.path.exists(self._devkit_path), \
'Devkit path does not exist: {}'.format(self._devkit_path)
assert os.path.exists(self._data_path), \
'Path does not exist: {}'.format(self._data_path)
def image_path_at(self, i):
"""
Return the absolute path to image i in the image sequence.
"""
return self.image_path_from_index(self._image_index[i])
def image_path_from_index(self, index):
"""
Construct an image path from the image's "index" identifier.
"""
for ext in self._image_ext:
image_path = os.path.join(self._data_path, 'Images',
index + ext)
if os.path.exists(image_path):
break
assert os.path.exists(image_path), \
'Path does not exist: {}'.format(image_path)
return image_path
def _load_image_set_index(self):
"""
Load the indexes listed in this dataset's image set file.
"""
# Example path to image set file:
# self._data_path + /ImageSets/val.txt
image_set_file = os.path.join(self._data_path, 'ImageSets',
self._image_set + '.txt')
assert os.path.exists(image_set_file), \
'Path does not exist: {}'.format(image_set_file)
with open(image_set_file) as f:
image_index = [x.strip() for x in f.readlines()]
# if you output image_index, it's ['I00001', 'I00002', ..] all the file names, not just numbers
return image_index
def gt_roidb(self):
"""
Return the database of ground-truth regions of interest.
This function loads/saves from/to a cache file to speed up future calls.
"""
cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl')
if os.path.exists(cache_file):
with open(cache_file, 'rb') as fid:
roidb = cPickle.load(fid)
print '{} gt roidb loaded from {}'.format(self.name, cache_file)
return roidb
gt_roidb = [self._load_graspRGB_annotation(index)
for index in self.image_index]
with open(cache_file, 'wb') as fid:
cPickle.dump(gt_roidb, fid, cPickle.HIGHEST_PROTOCOL)
print 'wrote gt roidb to {}'.format(cache_file)
return gt_roidb
def selective_search_roidb(self):
"""
Return the database of selective search regions of interest.
Ground-truth ROIs are also included.
This function loads/saves from/to a cache file to speed up future calls.
"""
cache_file = os.path.join(self.cache_path,
self.name + '_selective_search_roidb.pkl')
if os.path.exists(cache_file):
with open(cache_file, 'rb') as fid:
roidb = cPickle.load(fid)
print '{} ss roidb loaded from {}'.format(self.name, cache_file)
return roidb
if self._image_set != 'test':
gt_roidb = self.gt_roidb()
ss_roidb = self._load_selective_search_roidb(gt_roidb)
roidb = imdb.merge_roidbs(gt_roidb, ss_roidb)
else:
roidb = self._load_selective_search_roidb(None)
print len(roidb)
with open(cache_file, 'wb') as fid:
cPickle.dump(roidb, fid, cPickle.HIGHEST_PROTOCOL)
print 'wrote ss roidb to {}'.format(cache_file)
return roidb
def _load_selective_search_roidb(self, gt_roidb):
filename = os.path.abspath(os.path.join(self._devkit_path,
self.name + '.mat'))
assert os.path.exists(filename), \
'Selective search data not found at: {}'.format(filename)
print filename
raw_data = sio.loadmat(filename)['all_boxes'].ravel()
box_list = []
for i in xrange(raw_data.shape[0]):
box_list.append(raw_data[i][:, (1, 0, 3, 2)] - 1)
return self.create_roidb_from_box_list(box_list, gt_roidb)
def selective_search_IJCV_roidb(self):
"""
eturn the database of selective search regions of interest.
Ground-truth ROIs are also included.
This function loads/saves from/to a cache file to speed up future calls.
"""
cache_file = os.path.join(self.cache_path,
'{:s}_selective_search_IJCV_top_{:d}_roidb.pkl'.
format(self.name, self.config['top_k']))
if os.path.exists(cache_file):
with open(cache_file, 'rb') as fid:
roidb = cPickle.load(fid)
print '{} ss roidb loaded from {}'.format(self.name, cache_file)
return roidb
gt_roidb = self.gt_roidb()
ss_roidb = self._load_selective_search_IJCV_roidb(gt_roidb)
roidb = datasets.imdb.merge_roidbs(gt_roidb, ss_roidb)
with open(cache_file, 'wb') as fid:
cPickle.dump(roidb, fid, cPickle.HIGHEST_PROTOCOL)
print 'wrote ss roidb to {}'.format(cache_file)
return roidb
def _load_selective_search_IJCV_roidb(self, gt_roidb):
IJCV_path = os.path.abspath(os.path.join(self.cache_path, '..',
'selective_search_IJCV_data',
self.name))
assert os.path.exists(IJCV_path), \
'Selective search IJCV data not found at: {}'.format(IJCV_path)
top_k = self.config['top_k']
box_list = []
for i in xrange(self.num_images):
filename = os.path.join(IJCV_path, self.image_index[i] + '.mat')
raw_data = sio.loadmat(filename)
box_list.append((raw_data['boxes'][:top_k, :]-1).astype(np.uint16))
return self.create_roidb_from_box_list(box_list, gt_roidb)
def _load_graspRGB_annotation(self, index):
"""
Load image and bounding boxes info from txt files of graspRGB.
"""
filename = os.path.join(self._data_path, 'Annotations', index + '.txt')
print 'Loading: {}'.format(filename)
with open(filename) as f:
data = f.readlines()
num_objs = len(data)
print len(data)
if len(data) == 0:
print 'yooooooooo'
import sys
sys.exit()
boxes = np.zeros((num_objs, 4), dtype=np.uint16)
gt_classes = np.zeros((num_objs), dtype=np.int32)
overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
# "Seg" area for pascal is just the box area
seg_areas = np.zeros((num_objs), dtype=np.float32)
# Load object bounding boxes into a data frame.
for ix, aline in enumerate(data):
# Parse one box per line: "cls x1 y1 x2 y2"
tokens = aline.strip().split()
if len(tokens) != 5:
continue
cls = int(float(tokens[0])) # this file uses 0 as the background; cast to int for array indexing
x1 = float(tokens[1])
y1 = float(tokens[2])
x2 = float(tokens[3])
y2 = float(tokens[4])
# Clamp negative coordinates: boxes near the image boundary can go negative, and a negative value stored into the uint16 boxes array wraps around to ~655xx when read back
if (x1 < 0 and x2 < 0) or (y1 < 0 and y2 < 0):
print 'Box entirely outside the image in {}'.format(filename)
import sys
sys.exit()
if x1 < 0:
x1 = 0
if x2 < 0:
x2 = 0
if y1 < 0:
y1 = 0
if y2 < 0:
y2 = 0
gt_classes[ix] = cls
boxes[ix, :] = [x1, y1, x2, y2]
overlaps[ix, cls] = 1.0
seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)
overlaps = scipy.sparse.csr_matrix(overlaps)
return {'boxes' : boxes,
'gt_classes': gt_classes,
'gt_overlaps' : overlaps,
'flipped' : False,
'seg_areas' : seg_areas}
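# A hypothetical annotation line for illustration (format assumed from the
# parser above: class index followed by corner coordinates):
#   3 112.0 87.5 160.0 131.5
# would yield the boxes row [112, 87, 160, 131] (coordinates truncated when
# stored as uint16), gt_classes entry 3, and a one-hot row in the sparse
# gt_overlaps matrix.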
def _write_graspRGB_results_file(self, all_boxes):
use_salt = self.config['use_salt']
comp_id = 'comp4'
if use_salt:
comp_id += '-{}'.format(os.getpid())
# VOCdevkit/results/comp4-44503_det_test_aeroplane.txt
path = os.path.join(self._devkit_path, 'results', self.name, comp_id + '_')
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print 'Writing {} results file'.format(cls)
filename = path + 'det_' + self._image_set + '_' + cls + '.txt'
with open(filename, 'wt') as f:
for im_ind, index in enumerate(self.image_index):
dets = all_boxes[cls_ind][im_ind]
if len(dets) == 0:
continue
# the VOCdevkit expects 1-based indices
for k in xrange(dets.shape[0]):
f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.
format(index, dets[k, -1],
dets[k, 0] + 1, dets[k, 1] + 1,
dets[k, 2] + 1, dets[k, 3] + 1))
return comp_id
def _do_matlab_eval(self, comp_id, output_dir='output'):
rm_results = self.config['cleanup']
path = os.path.join(os.path.dirname(__file__),
'VOCdevkit-matlab-wrapper')
cmd = 'cd {} && '.format(path)
cmd += '{:s} -nodisplay -nodesktop '.format(datasets.MATLAB)
cmd += '-r "dbstop if error; '
cmd += 'setenv(\'LC_ALL\',\'C\'); voc_eval(\'{:s}\',\'{:s}\',\'{:s}\',\'{:s}\',{:d}); quit;"' \
.format(self._devkit_path, comp_id,
self._image_set, output_dir, int(rm_results))
print('Running:\n{}'.format(cmd))
status = subprocess.call(cmd, shell=True)
def evaluate_detections(self, all_boxes, output_dir):
comp_id = self._write_graspRGB_results_file(all_boxes)
self._do_matlab_eval(comp_id, output_dir)
def competition_mode(self, on):
if on:
self.config['use_salt'] = False
self.config['cleanup'] = False
else:
self.config['use_salt'] = True
self.config['cleanup'] = True
if __name__ == '__main__':
d = datasets.graspRGB('train', '')
res = d.roidb
from IPython import embed; embed()
================================================
FILE: lib/datasets/imdb.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import PIL
from utils.cython_bbox import bbox_overlaps
import numpy as np
import scipy.sparse
from model.config import cfg
class imdb(object):
"""Image database."""
def __init__(self, name, classes=None):
self._name = name
self._num_classes = 0
if not classes:
self._classes = []
else:
self._classes = classes
self._image_index = []
self._obj_proposer = 'gt'
self._roidb = None
self._roidb_handler = self.default_roidb
# Use this dict for storing dataset specific config options
self.config = {}
@property
def name(self):
return self._name
@property
def num_classes(self):
return len(self._classes)
@property
def classes(self):
return self._classes
@property
def image_index(self):
return self._image_index
@property
def roidb_handler(self):
return self._roidb_handler
@roidb_handler.setter
def roidb_handler(self, val):
self._roidb_handler = val
def set_proposal_method(self, method):
method = eval('self.' + method + '_roidb')
self.roidb_handler = method
@property
def roidb(self):
# A roidb is a list of dictionaries, each with the following keys:
# boxes
# gt_overlaps
# gt_classes
# flipped
if self._roidb is not None:
return self._roidb
self._roidb = self.roidb_handler()
return self._roidb
@property
def cache_path(self):
cache_path = osp.abspath(osp.join(cfg.DATA_DIR, 'cache'))
if not os.path.exists(cache_path):
os.makedirs(cache_path)
return cache_path
@property
def num_images(self):
return len(self.image_index)
def image_path_at(self, i):
raise NotImplementedError
def default_roidb(self):
raise NotImplementedError
def evaluate_detections(self, all_boxes, output_dir=None):
"""
all_boxes is a list of length number-of-classes.
Each list element is a list of length number-of-images.
Each of those list elements is either an empty list []
or a numpy array of detections.
all_boxes[class][image] = [] or np.array of shape #dets x 5
"""
raise NotImplementedError
def _get_widths(self):
return [PIL.Image.open(self.image_path_at(i)).size[0]
for i in range(self.num_images)]
def append_flipped_images(self):
num_images = self.num_images
widths = self._get_widths()
for i in range(num_images):
boxes = self.roidb[i]['boxes'].copy()
oldx1 = boxes[:, 0].copy()
oldx2 = boxes[:, 2].copy()
boxes[:, 0] = widths[i] - oldx2 - 1
boxes[:, 2] = widths[i] - oldx1 - 1
assert (boxes[:, 2] >= boxes[:, 0]).all()
entry = {'boxes': boxes,
'gt_overlaps': self.roidb[i]['gt_overlaps'],
'gt_classes': self.roidb[i]['gt_classes'],
'flipped': True}
self.roidb.append(entry)
self._image_index = self._image_index * 2
def evaluate_recall(self, candidate_boxes=None, thresholds=None,
area='all', limit=None):
"""Evaluate detection proposal recall metrics.
Returns:
results: dictionary of results with keys
'ar': average recall
'recalls': vector recalls at each IoU overlap threshold
'thresholds': vector of IoU overlap thresholds
'gt_overlaps': vector of all ground-truth overlaps
"""
# Record max overlap value for each gt box
# Return vector of overlap values
areas = {'all': 0, 'small': 1, 'medium': 2, 'large': 3,
'96-128': 4, '128-256': 5, '256-512': 6, '512-inf': 7}
area_ranges = [[0 ** 2, 1e5 ** 2], # all
[0 ** 2, 32 ** 2], # small
[32 ** 2, 96 ** 2], # medium
[96 ** 2, 1e5 ** 2], # large
[96 ** 2, 128 ** 2], # 96-128
[128 ** 2, 256 ** 2], # 128-256
[256 ** 2, 512 ** 2], # 256-512
[512 ** 2, 1e5 ** 2], # 512-inf
]
assert area in areas, 'unknown area range: {}'.format(area)
area_range = area_ranges[areas[area]]
gt_overlaps = np.zeros(0)
num_pos = 0
for i in range(self.num_images):
# Checking for max_overlaps == 1 avoids including crowd annotations
# (...pretty hacky :/)
max_gt_overlaps = self.roidb[i]['gt_overlaps'].toarray().max(axis=1)
gt_inds = np.where((self.roidb[i]['gt_classes'] > 0) &
(max_gt_overlaps == 1))[0]
gt_boxes = self.roidb[i]['boxes'][gt_inds, :]
gt_areas = self.roidb[i]['seg_areas'][gt_inds]
valid_gt_inds = np.where((gt_areas >= area_range[0]) &
(gt_areas <= area_range[1]))[0]
gt_boxes = gt_boxes[valid_gt_inds, :]
num_pos += len(valid_gt_inds)
if candidate_boxes is None:
# If candidate_boxes is not supplied, the default is to use the
# non-ground-truth boxes from this roidb
non_gt_inds = np.where(self.roidb[i]['gt_classes'] == 0)[0]
boxes = self.roidb[i]['boxes'][non_gt_inds, :]
else:
boxes = candidate_boxes[i]
if boxes.shape[0] == 0:
continue
if limit is not None and boxes.shape[0] > limit:
boxes = boxes[:limit, :]
overlaps = bbox_overlaps(boxes.astype(np.float),
gt_boxes.astype(np.float))
_gt_overlaps = np.zeros((gt_boxes.shape[0]))
for j in range(gt_boxes.shape[0]):
# find which proposal box maximally covers each gt box
argmax_overlaps = overlaps.argmax(axis=0)
# and get the iou amount of coverage for each gt box
max_overlaps = overlaps.max(axis=0)
# find which gt box is 'best' covered (i.e. 'best' = most iou)
gt_ind = max_overlaps.argmax()
gt_ovr = max_overlaps.max()
assert (gt_ovr >= 0)
# find the proposal box that covers the best covered gt box
box_ind = argmax_overlaps[gt_ind]
# record the iou coverage of this gt box
_gt_overlaps[j] = overlaps[box_ind, gt_ind]
assert (_gt_overlaps[j] == gt_ovr)
# mark the proposal box and the gt box as used
overlaps[box_ind, :] = -1
overlaps[:, gt_ind] = -1
# append recorded iou coverage level
gt_overlaps = np.hstack((gt_overlaps, _gt_overlaps))
gt_overlaps = np.sort(gt_overlaps)
if thresholds is None:
step = 0.05
thresholds = np.arange(0.5, 0.95 + 1e-5, step)
recalls = np.zeros_like(thresholds)
# compute recall for each iou threshold
for i, t in enumerate(thresholds):
recalls[i] = (gt_overlaps >= t).sum() / float(num_pos)
# ar = 2 * np.trapz(recalls, thresholds)
ar = recalls.mean()
return {'ar': ar, 'recalls': recalls, 'thresholds': thresholds,
'gt_overlaps': gt_overlaps}
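# Note on evaluate_recall above: with the default 0.05 step the IoU
# thresholds are 0.50, 0.55, ..., 0.95 (ten values), and 'ar' is simply the
# mean recall over them.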
def create_roidb_from_box_list(self, box_list, gt_roidb):
assert len(box_list) == self.num_images, \
'Number of boxes must match number of ground-truth images'
roidb = []
for i in range(self.num_images):
boxes = box_list[i]
num_boxes = boxes.shape[0]
overlaps = np.zeros((num_boxes, self.num_classes), dtype=np.float32)
if gt_roidb is not None and gt_roidb[i]['boxes'].size > 0:
gt_boxes = gt_roidb[i]['boxes']
gt_classes = gt_roidb[i]['gt_classes']
gt_overlaps = bbox_overlaps(boxes.astype(np.float),
gt_boxes.astype(np.float))
argmaxes = gt_overlaps.argmax(axis=1)
maxes = gt_overlaps.max(axis=1)
I = np.where(maxes > 0)[0]
overlaps[I, gt_classes[argmaxes[I]]] = maxes[I]
overlaps = scipy.sparse.csr_matrix(overlaps)
roidb.append({
'boxes': boxes,
'gt_classes': np.zeros((num_boxes,), dtype=np.int32),
'gt_overlaps': overlaps,
'flipped': False,
'seg_areas': np.zeros((num_boxes,), dtype=np.float32),
})
return roidb
@staticmethod
def merge_roidbs(a, b):
assert len(a) == len(b)
for i in range(len(a)):
a[i]['boxes'] = np.vstack((a[i]['boxes'], b[i]['boxes']))
a[i]['gt_classes'] = np.hstack((a[i]['gt_classes'],
b[i]['gt_classes']))
a[i]['gt_overlaps'] = scipy.sparse.vstack([a[i]['gt_overlaps'],
b[i]['gt_overlaps']])
a[i]['seg_areas'] = np.hstack((a[i]['seg_areas'],
b[i]['seg_areas']))
return a
def competition_mode(self, on):
"""Turn competition mode on or off."""
pass
================================================
FILE: lib/datasets/pascal_voc.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from datasets.imdb import imdb
import datasets.ds_utils as ds_utils
import xml.etree.ElementTree as ET
import numpy as np
import scipy.sparse
import scipy.io as sio
import utils.cython_bbox
import pickle
import subprocess
import uuid
from .voc_eval import voc_eval
from model.config import cfg
class pascal_voc(imdb):
def __init__(self, image_set, year, devkit_path=None):
imdb.__init__(self, 'voc_' + year + '_' + image_set)
self._year = year
self._image_set = image_set
self._devkit_path = self._get_default_path() if devkit_path is None \
else devkit_path
self._data_path = os.path.join(self._devkit_path, 'VOC' + self._year)
self._classes = ('__background__', # always index 0
'aeroplane', 'bicycle', 'bird', 'boat',
'bottle', 'bus', 'car', 'cat', 'chair',
'cow', 'diningtable', 'dog', 'horse',
'motorbike', 'person', 'pottedplant',
'sheep', 'sofa', 'train', 'tvmonitor')
self._class_to_ind = dict(list(zip(self.classes, list(range(self.num_classes)))))
self._image_ext = '.jpg'
self._image_index = self._load_image_set_index()
# Default to roidb handler
self._roidb_handler = self.gt_roidb
self._salt = str(uuid.uuid4())
self._comp_id = 'comp4'
# PASCAL specific config options
self.config = {'cleanup': True,
'use_salt': True,
'use_diff': False,
'matlab_eval': False,
'rpn_file': None}
assert os.path.exists(self._devkit_path), \
'VOCdevkit path does not exist: {}'.format(self._devkit_path)
assert os.path.exists(self._data_path), \
'Path does not exist: {}'.format(self._data_path)
def image_path_at(self, i):
"""
Return the absolute path to image i in the image sequence.
"""
return self.image_path_from_index(self._image_index[i])
def image_path_from_index(self, index):
"""
Construct an image path from the image's "index" identifier.
"""
image_path = os.path.join(self._data_path, 'JPEGImages',
index + self._image_ext)
assert os.path.exists(image_path), \
'Path does not exist: {}'.format(image_path)
return image_path
def _load_image_set_index(self):
"""
Load the indexes listed in this dataset's image set file.
"""
# Example path to image set file:
# self._devkit_path + /VOCdevkit2007/VOC2007/ImageSets/Main/val.txt
image_set_file = os.path.join(self._data_path, 'ImageSets', 'Main',
self._image_set + '.txt')
assert os.path.exists(image_set_file), \
'Path does not exist: {}'.format(image_set_file)
with open(image_set_file) as f:
image_index = [x.strip() for x in f.readlines()]
return image_index
def _get_default_path(self):
"""
Return the default path where PASCAL VOC is expected to be installed.
"""
return os.path.join(cfg.DATA_DIR, 'VOCdevkit' + self._year)
def gt_roidb(self):
"""
Return the database of ground-truth regions of interest.
This function loads/saves from/to a cache file to speed up future calls.
"""
cache_file = os.path.join(self.cache_path, self.name + '_gt_roidb.pkl')
if os.path.exists(cache_file):
with open(cache_file, 'rb') as fid:
try:
roidb = pickle.load(fid)
except:
roidb = pickle.load(fid, encoding='bytes')
print('{} gt roidb loaded from {}'.format(self.name, cache_file))
return roidb
gt_roidb = [self._load_pascal_annotation(index)
for index in self.image_index]
with open(cache_file, 'wb') as fid:
pickle.dump(gt_roidb, fid, pickle.HIGHEST_PROTOCOL)
print('wrote gt roidb to {}'.format(cache_file))
return gt_roidb
def rpn_roidb(self):
if int(self._year) == 2007 or self._image_set != 'test':
gt_roidb = self.gt_roidb()
rpn_roidb = self._load_rpn_roidb(gt_roidb)
roidb = imdb.merge_roidbs(gt_roidb, rpn_roidb)
else:
roidb = self._load_rpn_roidb(None)
return roidb
def _load_rpn_roidb(self, gt_roidb):
filename = self.config['rpn_file']
print('loading {}'.format(filename))
assert os.path.exists(filename), \
'rpn data not found at: {}'.format(filename)
with open(filename, 'rb') as f:
box_list = pickle.load(f)
return self.create_roidb_from_box_list(box_list, gt_roidb)
def _load_pascal_annotation(self, index):
"""
Load image and bounding boxes info from XML file in the PASCAL VOC
format.
"""
filename = os.path.join(self._data_path, 'Annotations', index + '.xml')
tree = ET.parse(filename)
objs = tree.findall('object')
if not self.config['use_diff']:
# Exclude the samples labeled as difficult
non_diff_objs = [
obj for obj in objs if int(obj.find('difficult').text) == 0]
# if len(non_diff_objs) != len(objs):
# print 'Removed {} difficult objects'.format(
# len(objs) - len(non_diff_objs))
objs = non_diff_objs
num_objs = len(objs)
boxes = np.zeros((num_objs, 4), dtype=np.uint16)
gt_classes = np.zeros((num_objs), dtype=np.int32)
overlaps = np.zeros((num_objs, self.num_classes), dtype=np.float32)
# "Seg" area for pascal is just the box area
seg_areas = np.zeros((num_objs), dtype=np.float32)
# Load object bounding boxes into a data frame.
for ix, obj in enumerate(objs):
bbox = obj.find('bndbox')
# Make pixel indexes 0-based
x1 = float(bbox.find('xmin').text) - 1
y1 = float(bbox.find('ymin').text) - 1
x2 = float(bbox.find('xmax').text) - 1
y2 = float(bbox.find('ymax').text) - 1
cls = self._class_to_ind[obj.find('name').text.lower().strip()]
boxes[ix, :] = [x1, y1, x2, y2]
gt_classes[ix] = cls
overlaps[ix, cls] = 1.0
seg_areas[ix] = (x2 - x1 + 1) * (y2 - y1 + 1)
overlaps = scipy.sparse.csr_matrix(overlaps)
return {'boxes': boxes,
'gt_classes': gt_classes,
'gt_overlaps': overlaps,
'flipped': False,
'seg_areas': seg_areas}
def _get_comp_id(self):
comp_id = (self._comp_id + '_' + self._salt if self.config['use_salt']
else self._comp_id)
return comp_id
def _get_voc_results_file_template(self):
# VOCdevkit/results/VOC2007/Main/<comp_id>_det_test_aeroplane.txt
filename = self._get_comp_id() + '_det_' + self._image_set + '_{:s}.txt'
path = os.path.join(
self._devkit_path,
'results',
'VOC' + self._year,
'Main',
filename)
return path
def _write_voc_results_file(self, all_boxes):
for cls_ind, cls in enumerate(self.classes):
if cls == '__background__':
continue
print('Writing {} VOC results file'.format(cls))
filename = self._get_voc_results_file_template().format(cls)
with open(filename, 'wt') as f:
for im_ind, index in enumerate(self.image_index):
dets = all_boxes[cls_ind][im_ind]
if len(dets) == 0:
continue
# the VOCdevkit expects 1-based indices
for k in range(dets.shape[0]):
f.write('{:s} {:.3f} {:.1f} {:.1f} {:.1f} {:.1f}\n'.
format(index, dets[k, -1],
dets[k, 0] + 1, dets[k, 1] + 1,
dets[k, 2] + 1, dets[k, 3] + 1))
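# Each line written above is "<image_id> <score> <x1> <y1> <x2> <y2>" with
# 1-based coordinates; a hypothetical line:
#   000012 0.912 48.0 240.0 195.0 371.0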
def _do_python_eval(self, output_dir='output'):
annopath = os.path.join(
self._devkit_path,
'VOC' + self._year,
'Annotations',
'{:s}.xml')
imagesetfile = os.path.join(
self._devkit_path,
'VOC' + self._year,
'ImageSets',
'Main',
self._image_set + '.txt')
cachedir = os.path.join(self._devkit_path, 'annotations_cache')
aps = []
# The PASCAL VOC metric changed in 2010
use_07_metric = True if int(self._year) < 2010 else False
print('VOC07 metric? ' + ('Yes' if use_07_metric else 'No'))
if not os.path.isdir(output_dir):
os.mkdir(output_dir)
for i, cls in enumerate(self._classes):
if cls == '__background__':
continue
filename = self._get_voc_results_file_template().format(cls)
rec, prec, ap = voc_eval(
filename, annopath, imagesetfile, cls, cachedir, ovthresh=0.5,
use_07_metric=use_07_metric)
aps += [ap]
print(('AP for {} = {:.4f}'.format(cls, ap)))
with open(os.path.join(output_dir, cls + '_pr.pkl'), 'wb') as f:
pickle.dump({'rec': rec, 'prec': prec, 'ap': ap}, f)
print(('Mean AP = {:.4f}'.format(np.mean(aps))))
print('~~~~~~~~')
print('Results:')
for ap in aps:
print(('{:.3f}'.format(ap)))
print(('{:.3f}'.format(np.mean(aps))))
print('~~~~~~~~')
print('')
print('--------------------------------------------------------------')
print('Results computed with the **unofficial** Python eval code.')
print('Results should be very close to the official MATLAB eval code.')
print('Recompute with `./tools/reval.py --matlab ...` for your paper.')
print('-- Thanks, The Management')
print('--------------------------------------------------------------')
def _do_matlab_eval(self, output_dir='output'):
print('-----------------------------------------------------')
print('Computing results with the official MATLAB eval code.')
print('-----------------------------------------------------')
path = os.path.join(cfg.ROOT_DIR, 'lib', 'datasets',
'VOCdevkit-matlab-wrapper')
cmd = 'cd {} && '.format(path)
cmd += '{:s} -nodisplay -nodesktop '.format(cfg.MATLAB)
cmd += '-r "dbstop if error; '
cmd += 'voc_eval(\'{:s}\',\'{:s}\',\'{:s}\',\'{:s}\'); quit;"' \
.format(self._devkit_path, self._get_comp_id(),
self._image_set, output_dir)
print(('Running:\n{}'.format(cmd)))
status = subprocess.call(cmd, shell=True)
def evaluate_detections(self, all_boxes, output_dir):
self._write_voc_results_file(all_boxes)
self._do_python_eval(output_dir)
if self.config['matlab_eval']:
self._do_matlab_eval(output_dir)
if self.config['cleanup']:
for cls in self._classes:
if cls == '__background__':
continue
filename = self._get_voc_results_file_template().format(cls)
os.remove(filename)
def competition_mode(self, on):
if on:
self.config['use_salt'] = False
self.config['cleanup'] = False
else:
self.config['use_salt'] = True
self.config['cleanup'] = True
if __name__ == '__main__':
from datasets.pascal_voc import pascal_voc
d = pascal_voc('trainval', '2007')
res = d.roidb
from IPython import embed;
embed()
================================================
FILE: lib/datasets/tools/mcg_munge.py
================================================
import os
import sys
"""Hacky tool to convert file system layout of MCG boxes downloaded from
http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/mcg/
so that it's consistent with those computed by Jan Hosang (see:
http://www.mpi-inf.mpg.de/departments/computer-vision-and-multimodal-
computing/research/object-recognition-and-scene-understanding/how-
good-are-detection-proposals-really/)
NB: Boxes from the MCG website are in (y1, x1, y2, x2) order.
Boxes from Hosang et al. are in (x1, y1, x2, y2) order.
"""
def munge(src_dir):
# stored as: ./MCG-COCO-val2014-boxes/COCO_val2014_000000193401.mat
# want: ./MCG/mat/COCO_val2014_0/COCO_val2014_000000141/COCO_val2014_000000141334.mat
files = os.listdir(src_dir)
for fn in files:
base, ext = os.path.splitext(fn)
# first 14 chars / first 22 chars / all chars + .mat
# COCO_val2014_0/COCO_val2014_000000447/COCO_val2014_000000447991.mat
first = base[:14]
second = base[:22]
dst_dir = os.path.join('MCG', 'mat', first, second)
if not os.path.exists(dst_dir):
os.makedirs(dst_dir)
src = os.path.join(src_dir, fn)
dst = os.path.join(dst_dir, fn)
print 'MV: {} -> {}'.format(src, dst)
os.rename(src, dst)
if __name__ == '__main__':
# src_dir should look something like:
# src_dir = 'MCG-COCO-val2014-boxes'
src_dir = sys.argv[1]
munge(src_dir)
================================================
FILE: lib/datasets/voc_eval.py
================================================
# --------------------------------------------------------
# Fast/er R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Bharath Hariharan
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import xml.etree.ElementTree as ET
import os
import pickle
import numpy as np
def parse_rec(filename):
""" Parse a PASCAL VOC xml file """
tree = ET.parse(filename)
objects = []
for obj in tree.findall('object'):
obj_struct = {}
obj_struct['name'] = obj.find('name').text
obj_struct['pose'] = obj.find('pose').text
obj_struct['truncated'] = int(obj.find('truncated').text)
obj_struct['difficult'] = int(obj.find('difficult').text)
bbox = obj.find('bndbox')
obj_struct['bbox'] = [int(bbox.find('xmin').text),
int(bbox.find('ymin').text),
int(bbox.find('xmax').text),
int(bbox.find('ymax').text)]
objects.append(obj_struct)
return objects
def voc_ap(rec, prec, use_07_metric=False):
""" ap = voc_ap(rec, prec, [use_07_metric])
Compute VOC AP given precision and recall.
If use_07_metric is true, uses the
VOC 07 11 point method (default:False).
"""
if use_07_metric:
# 11 point metric
ap = 0.
for t in np.arange(0., 1.1, 0.1):
if np.sum(rec >= t) == 0:
p = 0
else:
p = np.max(prec[rec >= t])
ap = ap + p / 11.
else:
# correct AP calculation
# first append sentinel values at the end
mrec = np.concatenate(([0.], rec, [1.]))
mpre = np.concatenate(([0.], prec, [0.]))
# compute the precision envelope
for i in range(mpre.size - 1, 0, -1):
mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])
# to calculate area under PR curve, look for points
# where X axis (recall) changes value
i = np.where(mrec[1:] != mrec[:-1])[0]
# and sum (\Delta recall) * prec
ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
return ap
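# A quick sanity check (not from the original file): with two detections
# giving recall/precision points rec = [0.5, 1.0], prec = [1.0, 0.5], the
# interpolated area under the PR curve is 0.5 * 1.0 + 0.5 * 0.5 = 0.75:
#   voc_ap(np.array([0.5, 1.0]), np.array([1.0, 0.5]))  # -> 0.75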
def voc_eval(detpath,
annopath,
imagesetfile,
classname,
cachedir,
ovthresh=0.5,
use_07_metric=False):
"""rec, prec, ap = voc_eval(detpath,
annopath,
imagesetfile,
classname,
[ovthresh],
[use_07_metric])
Top level function that does the PASCAL VOC evaluation.
detpath: Path to detections
detpath.format(classname) should produce the detection results file.
annopath: Path to annotations
annopath.format(imagename) should be the xml annotations file.
imagesetfile: Text file containing the list of images, one image per line.
classname: Category name (duh)
cachedir: Directory for caching the annotations
[ovthresh]: Overlap threshold (default = 0.5)
[use_07_metric]: Whether to use VOC07's 11 point AP computation
(default False)
"""
# assumes detections are in detpath.format(classname)
# assumes annotations are in annopath.format(imagename)
# assumes imagesetfile is a text file with each line an image name
# cachedir caches the annotations in a pickle file
# first load gt
if not os.path.isdir(cachedir):
os.mkdir(cachedir)
cachefile = os.path.join(cachedir, 'annots.pkl')
# read list of images
with open(imagesetfile, 'r') as f:
lines = f.readlines()
imagenames = [x.strip() for x in lines]
if not os.path.isfile(cachefile):
# load annots
recs = {}
for i, imagename in enumerate(imagenames):
recs[imagename] = parse_rec(annopath.format(imagename))
if i % 100 == 0:
print('Reading annotation for {:d}/{:d}'.format(
i + 1, len(imagenames)))
# save
print('Saving cached annotations to {:s}'.format(cachefile))
with open(cachefile, 'wb') as f:
pickle.dump(recs, f)
else:
# load
with open(cachefile, 'rb') as f:
try:
recs = pickle.load(f)
except:
recs = pickle.load(f, encoding='bytes')
# extract gt objects for this class
class_recs = {}
npos = 0
for imagename in imagenames:
R = [obj for obj in recs[imagename] if obj['name'] == classname]
bbox = np.array([x['bbox'] for x in R])
difficult = np.array([x['difficult'] for x in R]).astype(np.bool)
det = [False] * len(R)
npos = npos + sum(~difficult)
class_recs[imagename] = {'bbox': bbox,
'difficult': difficult,
'det': det}
# read dets
detfile = detpath.format(classname)
with open(detfile, 'r') as f:
lines = f.readlines()
splitlines = [x.strip().split(' ') for x in lines]
image_ids = [x[0] for x in splitlines]
confidence = np.array([float(x[1]) for x in splitlines])
BB = np.array([[float(z) for z in x[2:]] for x in splitlines])
nd = len(image_ids)
tp = np.zeros(nd)
fp = np.zeros(nd)
if BB.shape[0] > 0:
# sort by confidence
sorted_ind = np.argsort(-confidence)
sorted_scores = np.sort(-confidence)
BB = BB[sorted_ind, :]
image_ids = [image_ids[x] for x in sorted_ind]
# go down dets and mark TPs and FPs
for d in range(nd):
R = class_recs[image_ids[d]]
bb = BB[d, :].astype(float)
ovmax = -np.inf
BBGT = R['bbox'].astype(float)
if BBGT.size > 0:
# compute overlaps
# intersection
ixmin = np.maximum(BBGT[:, 0], bb[0])
iymin = np.maximum(BBGT[:, 1], bb[1])
ixmax = np.minimum(BBGT[:, 2], bb[2])
iymax = np.minimum(BBGT[:, 3], bb[3])
iw = np.maximum(ixmax - ixmin + 1., 0.)
ih = np.maximum(iymax - iymin + 1., 0.)
inters = iw * ih
# union
uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
(BBGT[:, 2] - BBGT[:, 0] + 1.) *
(BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)
overlaps = inters / uni
ovmax = np.max(overlaps)
jmax = np.argmax(overlaps)
if ovmax > ovthresh:
if not R['difficult'][jmax]:
if not R['det'][jmax]:
tp[d] = 1.
R['det'][jmax] = 1
else:
fp[d] = 1.
else:
fp[d] = 1.
# compute precision recall
fp = np.cumsum(fp)
tp = np.cumsum(tp)
rec = tp / float(npos)
# avoid divide by zero in case the first detection matches a difficult
# ground truth
prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)
ap = voc_ap(rec, prec, use_07_metric)
return rec, prec, ap
================================================
FILE: lib/layer_utils/__init__.py
================================================
================================================
FILE: lib/layer_utils/anchor_target_layer.py
================================================
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
from model.config import cfg
import numpy as np
import numpy.random as npr
from utils.cython_bbox import bbox_overlaps
from model.bbox_transform import bbox_transform
def anchor_target_layer(rpn_cls_score, gt_boxes, im_info, _feat_stride, all_anchors, num_anchors):
"""Same as the anchor target layer in original Fast/er RCNN """
A = num_anchors
total_anchors = all_anchors.shape[0]
K = total_anchors // num_anchors
im_info = im_info[0]
# allow boxes to sit over the edge by a small amount
_allowed_border = 0
# map of shape (..., H, W)
height, width = rpn_cls_score.shape[1:3]
# only keep anchors inside the image
inds_inside = np.where(
(all_anchors[:, 0] >= -_allowed_border) &
(all_anchors[:, 1] >= -_allowed_border) &
(all_anchors[:, 2] < im_info[1] + _allowed_border) & # width
(all_anchors[:, 3] < im_info[0] + _allowed_border) # height
)[0]
# keep only inside anchors
anchors = all_anchors[inds_inside, :]
# label: 1 is positive, 0 is negative, -1 is don't care
labels = np.empty((len(inds_inside),), dtype=np.float32)
labels.fill(-1)
# overlaps between the anchors and the gt boxes
# overlaps (ex, gt)
overlaps = bbox_overlaps(
np.ascontiguousarray(anchors, dtype=np.float),
np.ascontiguousarray(gt_boxes, dtype=np.float))
argmax_overlaps = overlaps.argmax(axis=1)
max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]
gt_argmax_overlaps = overlaps.argmax(axis=0)
gt_max_overlaps = overlaps[gt_argmax_overlaps,
np.arange(overlaps.shape[1])]
gt_argmax_overlaps = np.where(overlaps == gt_max_overlaps)[0]
if not cfg.TRAIN.RPN_CLOBBER_POSITIVES:
# assign bg labels first so that positive labels can clobber them
# first set the negatives
labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
# fg label: for each gt, anchor with highest overlap
labels[gt_argmax_overlaps] = 1
# fg label: above threshold IOU
labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
if cfg.TRAIN.RPN_CLOBBER_POSITIVES:
# assign bg labels last so that negative labels can clobber positives
labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
# subsample positive labels if we have too many
num_fg = int(cfg.TRAIN.RPN_FG_FRACTION * cfg.TRAIN.RPN_BATCHSIZE)
fg_inds = np.where(labels == 1)[0]
if len(fg_inds) > num_fg:
disable_inds = npr.choice(
fg_inds, size=(len(fg_inds) - num_fg), replace=False)
labels[disable_inds] = -1
# subsample negative labels if we have too many
num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
disable_inds = npr.choice(
bg_inds, size=(len(bg_inds) - num_bg), replace=False)
labels[disable_inds] = -1
bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)
bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
# only the positive ones have regression targets
bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)
bbox_outside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
if cfg.TRAIN.RPN_POSITIVE_WEIGHT < 0:
# uniform weighting of examples (given non-uniform sampling)
num_examples = np.sum(labels >= 0)
positive_weights = np.ones((1, 4)) * 1.0 / num_examples
negative_weights = np.ones((1, 4)) * 1.0 / num_examples
else:
assert ((cfg.TRAIN.RPN_POSITIVE_WEIGHT > 0) &
(cfg.TRAIN.RPN_POSITIVE_WEIGHT < 1))
positive_weights = (cfg.TRAIN.RPN_POSITIVE_WEIGHT /
np.sum(labels == 1))
negative_weights = ((1.0 - cfg.TRAIN.RPN_POSITIVE_WEIGHT) /
np.sum(labels == 0))
bbox_outside_weights[labels == 1, :] = positive_weights
bbox_outside_weights[labels == 0, :] = negative_weights
# map up to original set of anchors
labels = _unmap(labels, total_anchors, inds_inside, fill=-1)
bbox_targets = _unmap(bbox_targets, total_anchors, inds_inside, fill=0)
bbox_inside_weights = _unmap(bbox_inside_weights, total_anchors, inds_inside, fill=0)
bbox_outside_weights = _unmap(bbox_outside_weights, total_anchors, inds_inside, fill=0)
# labels
labels = labels.reshape((1, height, width, A)).transpose(0, 3, 1, 2)
labels = labels.reshape((1, 1, A * height, width))
rpn_labels = labels
# bbox_targets
bbox_targets = bbox_targets \
.reshape((1, height, width, A * 4))
rpn_bbox_targets = bbox_targets
# bbox_inside_weights
bbox_inside_weights = bbox_inside_weights \
.reshape((1, height, width, A * 4))
rpn_bbox_inside_weights = bbox_inside_weights
# bbox_outside_weights
bbox_outside_weights = bbox_outside_weights \
.reshape((1, height, width, A * 4))
rpn_bbox_outside_weights = bbox_outside_weights
return rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights
def _unmap(data, count, inds, fill=0):
""" Unmap a subset of item (data) back to the original set of items (of
size count) """
if len(data.shape) == 1:
ret = np.empty((count,), dtype=np.float32)
ret.fill(fill)
ret[inds] = data
else:
ret = np.empty((count,) + data.shape[1:], dtype=np.float32)
ret.fill(fill)
ret[inds, :] = data
return ret
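# e.g. _unmap(np.array([7., 9.]), 4, np.array([1, 3]), fill=-1) returns
# array([-1., 7., -1., 9.]): the two values are scattered back to their
# original indices and everything else takes the fill value.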
def _compute_targets(ex_rois, gt_rois):
"""Compute bounding-box regression targets for an image."""
assert ex_rois.shape[0] == gt_rois.shape[0]
assert ex_rois.shape[1] == 4
assert gt_rois.shape[1] == 5
return bbox_transform(ex_rois, gt_rois[:, :4]).astype(np.float32, copy=False)
================================================
FILE: lib/layer_utils/generate_anchors.py
================================================
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Sean Bell
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
# Verify that we compute the same anchors as Shaoqing's matlab implementation:
#
# >> load output/rpn_cachedir/faster_rcnn_VOC2007_ZF_stage1_rpn/anchors.mat
# >> anchors
#
# anchors =
#
# -83 -39 100 56
# -175 -87 192 104
# -359 -183 376 200
# -55 -55 72 72
# -119 -119 136 136
# -247 -247 264 264
# -35 -79 52 96
# -79 -167 96 184
# -167 -343 184 360
# array([[ -83., -39., 100., 56.],
# [-175., -87., 192., 104.],
# [-359., -183., 376., 200.],
# [ -55., -55., 72., 72.],
# [-119., -119., 136., 136.],
# [-247., -247., 264., 264.],
# [ -35., -79., 52., 96.],
# [ -79., -167., 96., 184.],
# [-167., -343., 184., 360.]])
def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
scales=2 ** np.arange(3, 6)):
"""
Generate anchor (reference) windows by enumerating aspect ratios X
scales wrt a reference (0, 0, 15, 15) window.
"""
base_anchor = np.array([1, 1, base_size, base_size]) - 1
ratio_anchors = _ratio_enum(base_anchor, ratios)
anchors = np.vstack([_scale_enum(ratio_anchors[i, :], scales)
for i in range(ratio_anchors.shape[0])])
return anchors
def _whctrs(anchor):
"""
Return width, height, x center, and y center for an anchor (window).
"""
w = anchor[2] - anchor[0] + 1
h = anchor[3] - anchor[1] + 1
x_ctr = anchor[0] + 0.5 * (w - 1)
y_ctr = anchor[1] + 0.5 * (h - 1)
return w, h, x_ctr, y_ctr
def _mkanchors(ws, hs, x_ctr, y_ctr):
"""
Given a vector of widths (ws) and heights (hs) around a center
(x_ctr, y_ctr), output a set of anchors (windows).
"""
ws = ws[:, np.newaxis]
hs = hs[:, np.newaxis]
anchors = np.hstack((x_ctr - 0.5 * (ws - 1),
y_ctr - 0.5 * (hs - 1),
x_ctr + 0.5 * (ws - 1),
y_ctr + 0.5 * (hs - 1)))
return anchors
def _ratio_enum(anchor, ratios):
"""
Enumerate a set of anchors for each aspect ratio wrt an anchor.
"""
w, h, x_ctr, y_ctr = _whctrs(anchor)
size = w * h
size_ratios = size / ratios
ws = np.round(np.sqrt(size_ratios))
hs = np.round(ws * ratios)
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
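# Worked example for the 16x16 base anchor and ratio 0.5: size = 256,
# size / 0.5 = 512, ws = round(sqrt(512)) = 23, hs = round(23 * 0.5) = 12,
# which _mkanchors turns into (-3.5, 2, 18.5, 13) around center (7.5, 7.5).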
def _scale_enum(anchor, scales):
"""
Enumerate a set of anchors for each scale wrt an anchor.
"""
w, h, x_ctr, y_ctr = _whctrs(anchor)
ws = w * scales
hs = h * scales
anchors = _mkanchors(ws, hs, x_ctr, y_ctr)
return anchors
if __name__ == '__main__':
import time
t = time.time()
a = generate_anchors()
print(time.time() - t)
print(a)
from IPython import embed;
embed()
================================================
FILE: lib/layer_utils/proposal_layer.py
================================================
# --------------------------------------------------------
# Faster R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick and Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from model.config import cfg
from model.bbox_transform import bbox_transform_inv, clip_boxes
from model.nms_wrapper import nms
def proposal_layer(rpn_cls_prob, rpn_bbox_pred, im_info, cfg_key, _feat_stride, anchors, num_anchors):
"""A simplified version compared to fast/er RCNN
For details please see the technical report
"""
if type(cfg_key) == bytes:
cfg_key = cfg_key.decode('utf-8')
pre_nms_topN = cfg[cfg_key].RPN_PRE_NMS_TOP_N
post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
nms_thresh = cfg[cfg_key].RPN_NMS_THRESH
im_info = im_info[0]
# Get the scores and bounding boxes
scores = rpn_cls_prob[:, :, :, num_anchors:]
rpn_bbox_pred = rpn_bbox_pred.reshape((-1, 4))
scores = scores.reshape((-1, 1))
proposals = bbox_transform_inv(anchors, rpn_bbox_pred)
proposals = clip_boxes(proposals, im_info[:2])
# Pick the top region proposals
order = scores.ravel().argsort()[::-1]
if pre_nms_topN > 0:
order = order[:pre_nms_topN]
proposals = proposals[order, :]
scores = scores[order]
# Non-maximal suppression
keep = nms(np.hstack((proposals, scores)), nms_thresh)
# Pick the top region proposals after NMS
if post_nms_topN > 0:
keep = keep[:post_nms_topN]
proposals = proposals[keep, :]
scores = scores[keep]
# Only support single image as input
batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
return blob, scores
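# The returned blob has shape (N, 5): a zero batch index followed by
# (x1, y1, x2, y2) for each surviving proposal; scores has shape (N, 1).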
================================================
FILE: lib/layer_utils/proposal_target_layer.py
================================================
# --------------------------------------------------------
# Faster R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick, Sean Bell and Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import numpy.random as npr
from model.config import cfg
from model.bbox_transform import bbox_transform
from utils.cython_bbox import bbox_overlaps
def proposal_target_layer(rpn_rois, rpn_scores, gt_boxes, _num_classes):
"""
Assign object detection proposals to ground-truth targets. Produces proposal
classification labels and bounding-box regression targets.
"""
# Proposal ROIs (0, x1, y1, x2, y2) coming from RPN
# (i.e., rpn.proposal_layer.ProposalLayer), or any other source
all_rois = rpn_rois
all_scores = rpn_scores
# Include ground-truth boxes in the set of candidate rois
if cfg.TRAIN.USE_GT:
zeros = np.zeros((gt_boxes.shape[0], 1), dtype=gt_boxes.dtype)
all_rois = np.vstack(
(all_rois, np.hstack((zeros, gt_boxes[:, :-1])))
)
# not sure if this appending is wise, but it is unused anyway
all_scores = np.vstack((all_scores, zeros))
num_images = 1
rois_per_image = cfg.TRAIN.BATCH_SIZE / num_images
fg_rois_per_image = np.round(cfg.TRAIN.FG_FRACTION * rois_per_image)
# Sample rois with classification labels and bounding box regression
# targets
labels, rois, roi_scores, bbox_targets, bbox_inside_weights = _sample_rois(
all_rois, all_scores, gt_boxes, fg_rois_per_image,
rois_per_image, _num_classes)
rois = rois.reshape(-1, 5)
roi_scores = roi_scores.reshape(-1)
labels = labels.reshape(-1, 1)
bbox_targets = bbox_targets.reshape(-1, _num_classes * 4)
bbox_inside_weights = bbox_inside_weights.reshape(-1, _num_classes * 4)
bbox_outside_weights = np.array(bbox_inside_weights > 0).astype(np.float32)
return rois, roi_scores, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights
def _get_bbox_regression_labels(bbox_target_data, num_classes):
"""Bounding-box regression targets (bbox_target_data) are stored in a
compact form N x (class, tx, ty, tw, th)
This function expands those targets into the 4-of-4*K representation used
by the network (i.e. only one class has non-zero targets).
Returns:
bbox_target (ndarray): N x 4K blob of regression targets
bbox_inside_weights (ndarray): N x 4K blob of loss weights
"""
clss = bbox_target_data[:, 0]
bbox_targets = np.zeros((clss.size, 4 * num_classes), dtype=np.float32)
bbox_inside_weights = np.zeros(bbox_targets.shape, dtype=np.float32)
inds = np.where(clss > 0)[0]
for ind in inds:
cls = clss[ind]
start = int(4 * cls)
end = start + 4
bbox_targets[ind, start:end] = bbox_target_data[ind, 1:]
bbox_inside_weights[ind, start:end] = cfg.TRAIN.BBOX_INSIDE_WEIGHTS
return bbox_targets, bbox_inside_weights
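# Illustration (hypothetical values): with num_classes = 3, a row of
# bbox_target_data like [2, tx, ty, tw, th] expands to the 12-wide row
# [0, 0, 0, 0, 0, 0, 0, 0, tx, ty, tw, th], with the matching inside
# weights (cfg.TRAIN.BBOX_INSIDE_WEIGHTS) placed in the same four slots.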
def _compute_targets(ex_rois, gt_rois, labels):
"""Compute bounding-box regression targets for an image."""
assert ex_rois.shape[0] == gt_rois.shape[0]
assert ex_rois.shape[1] == 4
assert gt_rois.shape[1] == 4
targets = bbox_transform(ex_rois, gt_rois)
if cfg.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED:
# Optionally normalize targets by a precomputed mean and stdev
targets = ((targets - np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS))
/ np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS))
return np.hstack(
(labels[:, np.newaxis], targets)).astype(np.float32, copy=False)
def _sample_rois(all_rois, all_scores, gt_boxes, fg_rois_per_image, rois_per_image, num_classes):
"""Generate a random sample of RoIs comprising foreground and background
examples.
"""
# overlaps: (rois x gt_boxes)
overlaps = bbox_overlaps(
np.ascontiguousarray(all_rois[:, 1:5], dtype=np.float),
np.ascontiguousarray(gt_boxes[:, :4], dtype=np.float))
gt_assignment = overlaps.argmax(axis=1)
max_overlaps = overlaps.max(axis=1)
labels = gt_boxes[gt_assignment, 4]
# Select foreground RoIs as those with >= FG_THRESH overlap
fg_inds = np.where(max_overlaps >= cfg.TRAIN.FG_THRESH)[0]
# Guard against the case when an image has fewer than fg_rois_per_image
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((max_overlaps < cfg.TRAIN.BG_THRESH_HI) &
(max_overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
# Small modification to the original version where we ensure a fixed number of regions are sampled
if fg_inds.size > 0 and bg_inds.size > 0:
fg_rois_per_image = min(fg_rois_per_image, fg_inds.size)
fg_inds = npr.choice(fg_inds, size=int(fg_rois_per_image), replace=False)
bg_rois_per_image = rois_per_image - fg_rois_per_image
to_replace = bg_inds.size < bg_rois_per_image
bg_inds = npr.choice(bg_inds, size=int(bg_rois_per_image), replace=to_replace)
elif fg_inds.size > 0:
to_replace = fg_inds.size < rois_per_image
fg_inds = npr.choice(fg_inds, size=int(rois_per_image), replace=to_replace)
fg_rois_per_image = rois_per_image
elif bg_inds.size > 0:
to_replace = bg_inds.size < rois_per_image
bg_inds = npr.choice(bg_inds, size=int(rois_per_image), replace=to_replace)
fg_rois_per_image = 0
else:
import pdb
pdb.set_trace()
# The indices that we're selecting (both fg and bg)
keep_inds = np.append(fg_inds, bg_inds)
# Select sampled values from various arrays:
labels = labels[keep_inds]
# Clamp labels for the background RoIs to 0
labels[int(fg_rois_per_image):] = 0
rois = all_rois[keep_inds]
roi_scores = all_scores[keep_inds]
bbox_target_data = _compute_targets(
rois[:, 1:5], gt_boxes[gt_assignment[keep_inds], :4], labels)
bbox_targets, bbox_inside_weights = \
_get_bbox_regression_labels(bbox_target_data, num_classes)
return labels, rois, roi_scores, bbox_targets, bbox_inside_weights
================================================
FILE: lib/layer_utils/proposal_top_layer.py
================================================
# --------------------------------------------------------
# Faster R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
from model.config import cfg
from model.bbox_transform import bbox_transform_inv, clip_boxes
import numpy.random as npr
def proposal_top_layer(rpn_cls_prob, rpn_bbox_pred, im_info, _feat_stride, anchors, num_anchors):
"""A layer that just selects the top region proposals
without using non-maximal suppression.
For details please see the technical report.
"""
rpn_top_n = cfg.TEST.RPN_TOP_N
im_info = im_info[0]
scores = rpn_cls_prob[:, :, :, num_anchors:]
rpn_bbox_pred = rpn_bbox_pred.reshape((-1, 4))
scores = scores.reshape((-1, 1))
length = scores.shape[0]
if length < rpn_top_n:
# Random selection, maybe unnecessary and loses good proposals
# But such a case rarely happens
top_inds = npr.choice(length, size=rpn_top_n, replace=True)
else:
top_inds = scores.argsort(0)[::-1]
top_inds = top_inds[:rpn_top_n]
top_inds = top_inds.reshape(rpn_top_n, )
# Do the selection here
anchors = anchors[top_inds, :]
rpn_bbox_pred = rpn_bbox_pred[top_inds, :]
scores = scores[top_inds]
# Convert anchors into proposals via bbox transformations
proposals = bbox_transform_inv(anchors, rpn_bbox_pred)
# Clip predicted boxes to image
proposals = clip_boxes(proposals, im_info[:2])
# Output rois blob
# Our RPN implementation only supports a single input image, so all
# batch inds are 0
batch_inds = np.zeros((proposals.shape[0], 1), dtype=np.float32)
blob = np.hstack((batch_inds, proposals.astype(np.float32, copy=False)))
return blob, scores
================================================
FILE: lib/layer_utils/snippets.py
================================================
# --------------------------------------------------------
# Tensorflow Faster R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
import numpy.random as npr
from model.config import cfg
from layer_utils.generate_anchors import generate_anchors
from model.bbox_transform import bbox_transform_inv, clip_boxes
from utils.cython_bbox import bbox_overlaps
def generate_anchors_pre(height, width, feat_stride, anchor_scales=(8,16,32), anchor_ratios=(0.5,1,2)):
""" A wrapper function to generate anchors given different scales
Also return the number of anchors in variable 'length'
"""
anchors = generate_anchors(ratios=np.array(anchor_ratios), scales=np.array(anchor_scales))
A = anchors.shape[0]
shift_x = np.arange(0, width) * feat_stride
shift_y = np.arange(0, height) * feat_stride
shift_x, shift_y = np.meshgrid(shift_x, shift_y)
shifts = np.vstack((shift_x.ravel(), shift_y.ravel(), shift_x.ravel(), shift_y.ravel())).transpose()
K = shifts.shape[0]
# width changes faster, so here it is H, W, C
anchors = anchors.reshape((1, A, 4)) + shifts.reshape((1, K, 4)).transpose((1, 0, 2))
anchors = anchors.reshape((K * A, 4)).astype(np.float32, copy=False)
length = np.int32(anchors.shape[0])
return anchors, length
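if __name__ == '__main__':
    # A minimal sketch (not part of the original file): for a 38x50 feature
    # map at stride 16 with the default 3 scales x 3 ratios (A = 9), we
    # expect 38 * 50 * 9 = 17100 anchors of 4 coordinates each.
    anchors, length = generate_anchors_pre(38, 50, 16)
    print(anchors.shape)  # (17100, 4)
    print(length)         # 17100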
================================================
FILE: lib/model/__init__.py
================================================
from . import config
================================================
FILE: lib/model/bbox_transform.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import numpy as np
def bbox_transform(ex_rois, gt_rois):
ex_widths = ex_rois[:, 2] - ex_rois[:, 0] + 1.0
ex_heights = ex_rois[:, 3] - ex_rois[:, 1] + 1.0
ex_ctr_x = ex_rois[:, 0] + 0.5 * ex_widths
ex_ctr_y = ex_rois[:, 1] + 0.5 * ex_heights
gt_widths = gt_rois[:, 2] - gt_rois[:, 0] + 1.0
gt_heights = gt_rois[:, 3] - gt_rois[:, 1] + 1.0
gt_ctr_x = gt_rois[:, 0] + 0.5 * gt_widths
gt_ctr_y = gt_rois[:, 1] + 0.5 * gt_heights
targets_dx = (gt_ctr_x - ex_ctr_x) / ex_widths
targets_dy = (gt_ctr_y - ex_ctr_y) / ex_heights
targets_dw = np.log(gt_widths / ex_widths)
targets_dh = np.log(gt_heights / ex_heights)
targets = np.vstack(
(targets_dx, targets_dy, targets_dw, targets_dh)).transpose()
return targets
def bbox_transform_inv(boxes, deltas):
if boxes.shape[0] == 0:
return np.zeros((0, deltas.shape[1]), dtype=deltas.dtype)
boxes = boxes.astype(deltas.dtype, copy=False)
widths = boxes[:, 2] - boxes[:, 0] + 1.0
heights = boxes[:, 3] - boxes[:, 1] + 1.0
ctr_x = boxes[:, 0] + 0.5 * widths
ctr_y = boxes[:, 1] + 0.5 * heights
dx = deltas[:, 0::4]
dy = deltas[:, 1::4]
dw = deltas[:, 2::4]
dh = deltas[:, 3::4]
pred_ctr_x = dx * widths[:, np.newaxis] + ctr_x[:, np.newaxis]
pred_ctr_y = dy * heights[:, np.newaxis] + ctr_y[:, np.newaxis]
pred_w = np.exp(dw) * widths[:, np.newaxis]
pred_h = np.exp(dh) * heights[:, np.newaxis]
pred_boxes = np.zeros(deltas.shape, dtype=deltas.dtype)
# x1
pred_boxes[:, 0::4] = pred_ctr_x - 0.5 * pred_w
# y1
pred_boxes[:, 1::4] = pred_ctr_y - 0.5 * pred_h
# x2
pred_boxes[:, 2::4] = pred_ctr_x + 0.5 * pred_w
# y2
pred_boxes[:, 3::4] = pred_ctr_y + 0.5 * pred_h
return pred_boxes
def clip_boxes(boxes, im_shape):
"""
Clip boxes to image boundaries.
"""
# x1 >= 0
boxes[:, 0::4] = np.maximum(np.minimum(boxes[:, 0::4], im_shape[1] - 1), 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(np.minimum(boxes[:, 1::4], im_shape[0] - 1), 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.maximum(np.minimum(boxes[:, 2::4], im_shape[1] - 1), 0)
# y2 < im_shape[0]
boxes[:, 3::4] = np.maximum(np.minimum(boxes[:, 3::4], im_shape[0] - 1), 0)
return boxes
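if __name__ == '__main__':
    # A small sketch (not part of the original file). Note the asymmetry in
    # this parameterization: widths use the +1 "inclusive pixel" convention
    # when encoding, but decoding adds the half-width back without removing
    # the 1, so x1/y1 round-trip exactly while x2/y2 come back +1.
    ex = np.array([[0., 0., 15., 15.]])
    gt = np.array([[2., 4., 20., 18.]])
    deltas = bbox_transform(ex, gt)
    print(bbox_transform_inv(ex, deltas))  # [[ 2.  4. 21. 19.]]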
================================================
FILE: lib/model/config.py
================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import numpy as np
# `pip install easydict` if you don't have it
from easydict import EasyDict as edict
__C = edict()
# Consumers can get config by:
# from fast_rcnn_config import cfg
cfg = __C
#
# Training options
#
__C.TRAIN = edict()
# Initial learning rate
__C.TRAIN.LEARNING_RATE = 0.0001
# Momentum
__C.TRAIN.MOMENTUM = 0.9
# Weight decay, for regularization
__C.TRAIN.WEIGHT_DECAY = 0.0005
# Factor for reducing the learning rate
__C.TRAIN.GAMMA = 0.1
# Step size for reducing the learning rate; currently only one step is supported
__C.TRAIN.STEPSIZE = 30000
# Iteration intervals for showing the loss during training, on command line interface
__C.TRAIN.DISPLAY = 10
# Whether to double the learning rate for bias
__C.TRAIN.DOUBLE_BIAS = True
# Whether to initialize the weights with truncated normal distribution
__C.TRAIN.TRUNCATED = False
# Whether to have weight decay on bias as well
__C.TRAIN.BIAS_DECAY = False
# Whether to add ground truth boxes to the pool when sampling regions
__C.TRAIN.USE_GT = False
# Whether to use aspect-ratio grouping of training images, introduced merely for saving
# GPU memory
__C.TRAIN.ASPECT_GROUPING = False
# The number of snapshots kept, older ones are deleted to save space
__C.TRAIN.SNAPSHOT_KEPT = 3
# The time interval for saving tensorflow summaries
__C.TRAIN.SUMMARY_INTERVAL = 180
# Scale to use during training (can list multiple scales)
# The scale is the pixel size of an image's shortest side
__C.TRAIN.SCALES = (600,)
# Max pixel size of the longest side of a scaled input image
__C.TRAIN.MAX_SIZE = 1000
# Images to use per minibatch
__C.TRAIN.IMS_PER_BATCH = 1
# Minibatch size (number of regions of interest [ROIs])
__C.TRAIN.BATCH_SIZE = 128
# Fraction of minibatch that is labeled foreground (i.e. class > 0)
__C.TRAIN.FG_FRACTION = 0.25
# Overlap threshold for a ROI to be considered foreground (if >= FG_THRESH)
__C.TRAIN.FG_THRESH = 0.5
# Overlap threshold for a ROI to be considered background (class = 0 if
# overlap in [LO, HI))
__C.TRAIN.BG_THRESH_HI = 0.5
__C.TRAIN.BG_THRESH_LO = 0.1
# Use horizontally-flipped images during training?
#__C.TRAIN.USE_FLIPPED = True
__C.TRAIN.USE_FLIPPED = False
# Train bounding-box regressors
__C.TRAIN.BBOX_REG = True
# Overlap required between a ROI and ground-truth box in order for that ROI to
# be used as a bounding-box regression training example
__C.TRAIN.BBOX_THRESH = 0.5
# Iterations between snapshots
__C.TRAIN.SNAPSHOT_ITERS = 3000
# solver.prototxt specifies the snapshot path prefix, this adds an optional
# infix to yield the path: <prefix>[_<infix>]_iters_XYZ.caffemodel
__C.TRAIN.SNAPSHOT_PREFIX = 'res101_faster_rcnn'
# __C.TRAIN.SNAPSHOT_INFIX = ''
# Use a prefetch thread in roi_data_layer.layer
# So far I haven't found this useful; likely more engineering work is required
# __C.TRAIN.USE_PREFETCH = False
# Normalize the targets (subtract empirical mean, divide by empirical stddev)
__C.TRAIN.BBOX_NORMALIZE_TARGETS = True
# Deprecated (inside weights)
__C.TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
# Normalize the targets using "precomputed" (or made up) means and stdevs
# (BBOX_NORMALIZE_TARGETS must also be True)
__C.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED = True
__C.TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0)
__C.TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)
# Train using these proposals
__C.TRAIN.PROPOSAL_METHOD = 'gt'
# Make minibatches from images that have similar aspect ratios (i.e. both
# tall and thin or both short and wide) in order to avoid wasting computation
# on zero-padding.
# Use RPN to detect objects
__C.TRAIN.HAS_RPN = True
# IOU >= thresh: positive example
__C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7
# IOU < thresh: negative example
__C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3
# If an anchor satisfies both the positive and negative conditions, set it to negative
__C.TRAIN.RPN_CLOBBER_POSITIVES = False
# Max number of foreground examples
__C.TRAIN.RPN_FG_FRACTION = 0.5
# Total number of examples
__C.TRAIN.RPN_BATCHSIZE = 256
# NMS threshold used on RPN proposals
__C.TRAIN.RPN_NMS_THRESH = 0.7
# Number of top scoring boxes to keep before applying NMS to RPN proposals
__C.TRAIN.RPN_PRE_NMS_TOP_N = 12000
# Number of top scoring boxes to keep after applying NMS to RPN proposals
__C.TRAIN.RPN_POST_NMS_TOP_N = 2000
# Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale)
# __C.TRAIN.RPN_MIN_SIZE = 16
# Deprecated (outside weights)
__C.TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
# Give the positive RPN examples weight of p * 1 / {num positives}
# and give negatives a weight of (1 - p)
# Set to -1.0 to use uniform example weighting
__C.TRAIN.RPN_POSITIVE_WEIGHT = -1.0
# Whether to use all ground truth bounding boxes for training,
# For COCO, setting USE_ALL_GT to False will exclude boxes that are flagged as ''iscrowd''
__C.TRAIN.USE_ALL_GT = True
#
# Testing options
#
__C.TEST = edict()
# Scale to use during testing (can NOT list multiple scales)
# The scale is the pixel size of an image's shortest side
__C.TEST.SCALES = (600,)
# Max pixel size of the longest side of a scaled input image
__C.TEST.MAX_SIZE = 1000
# Overlap threshold used for non-maximum suppression (suppress boxes with
# IoU >= this threshold)
__C.TEST.NMS = 0.3
# Experimental: treat the (K+1) units in the cls_score layer as linear
# predictors (trained, eg, with one-vs-rest SVMs).
__C.TEST.SVM = False
# Test using bounding-box regressors
__C.TEST.BBOX_REG = True
# Propose boxes
__C.TEST.HAS_RPN = False
# Test using these proposals
__C.TEST.PROPOSAL_METHOD = 'gt'
## NMS threshold used on RPN proposals
__C.TEST.RPN_NMS_THRESH = 0.7
## Number of top scoring boxes to keep before applying NMS to RPN proposals
__C.TEST.RPN_PRE_NMS_TOP_N = 6000
## Number of top scoring boxes to keep after applying NMS to RPN proposals
__C.TEST.RPN_POST_NMS_TOP_N = 300
# Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale)
# __C.TEST.RPN_MIN_SIZE = 16
# Testing mode; defaults to 'nms'. 'top' is slower but better
# See report for details
__C.TEST.MODE = 'nms'
# Only useful when TEST.MODE is 'top', specifies the number of top proposals to select
__C.TEST.RPN_TOP_N = 5000
#
# ResNet options
#
__C.RESNET = edict()
# Option to set if max-pooling is appended after crop_and_resize.
# If true, the region will be resized to a square of 2xPOOLING_SIZE,
# then 2x2 max-pooling is applied; otherwise the region will be directly
# resized to a square of POOLING_SIZE
__C.RESNET.MAX_POOL = False
# Number of fixed blocks during finetuning, by default the first of all 4 blocks is fixed
# Range: 0 (none) to 3 (all)
__C.RESNET.FIXED_BLOCKS = 1
# Whether to tune the batch normalization parameters during training
__C.RESNET.BN_TRAIN = False
#
# MISC
#
# The mapping from image coordinates to feature map coordinates might cause
# some boxes that are distinct in image space to become identical in feature
# coordinates. If DEDUP_BOXES > 0, then DEDUP_BOXES is used as the scale factor
# for identifying duplicate boxes.
# 1/16 is correct for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16
__C.DEDUP_BOXES = 1. / 16.
# Pixel mean values (BGR order) as a (1, 1, 3) array
# We use the same pixel mean for all networks even though it's not exactly what
# they were trained with
__C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
# For reproducibility
__C.RNG_SEED = 3
# A small number that's used many times
__C.EPS = 1e-14
# Root directory of project
__C.ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..'))
# Data directory
__C.DATA_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'data'))
# Name (or path to) the matlab executable
__C.MATLAB = 'matlab'
# Place outputs under an experiments directory
__C.EXP_DIR = 'default'
# Use GPU implementation of non-maximum suppression
__C.USE_GPU_NMS = True
# Default GPU device id
__C.GPU_ID = 0
# Default pooling mode, only 'crop' is available
__C.POOLING_MODE = 'crop'
# Size of the pooled region after RoI pooling
__C.POOLING_SIZE = 7
# Anchor scales for RPN
__C.ANCHOR_SCALES = [8,16,32]
# Anchor ratios for RPN
__C.ANCHOR_RATIOS = [0.5,1,2]
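# With the defaults above, the RPN places 3 scales x 3 ratios = 9 anchors at
# each feature-map location; at the feature stride of 16 used by these
# backbones, scales 8/16/32 correspond to base boxes of 128/256/512 pixels.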
def get_output_dir(imdb, weights_filename):
"""Return the directory where experimental artifacts are placed.
If the directory does not exist, it is created.
A canonical path is built using the name from an imdb and a network
(if not None).
"""
outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'output', __C.EXP_DIR, imdb.name))
if weights_filename is None:
weights_filename = 'default'
outdir = osp.join(outdir, weights_filename)
if not os.path.exists(outdir):
os.makedirs(outdir)
return outdir
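# For example, with the defaults (EXP_DIR = 'default', weights_filename None)
# this resolves to <ROOT_DIR>/output/default/<imdb.name>/default.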
def get_output_tb_dir(imdb, weights_filename):
"""Return the directory where tensorflow summaries are placed.
If the directory does not exist, it is created.
A canonical path is built using the name from an imdb and a network
(if not None).
"""
outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'tensorboard', __C.EXP_DIR, imdb.name))
if weights_filename is None:
weights_filename = 'default'
outdir = osp.join(outdir, weights_filename)
if not os.path.exists(outdir):
os.makedirs(outdir)
return outdir
def _merge_a_into_b(a, b):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a.
"""
if type(a) is not edict:
return
for k, v in a.items():
# a must specify keys that are in b
if k not in b:
raise KeyError('{} is not a valid config key'.format(k))
# the types must match, too
old_type = type(b[k])
if old_type is not type(v):
if isinstance(b[k], np.ndarray):
v = np.array(v, dtype=b[k].dtype)
else:
raise ValueError(('Type mismatch ({} vs. {}) '
'for config key: {}').format(type(b[k]),
type(v), k))
# recursively merge dicts
if type(v) is edict:
try:
_merge_a_into_b(a[k], b[k])
except:
print(('Error under config key: {}'.format(k)))
raise
else:
b[k] = v
def cfg_from_file(filename):
"""Load a config file and merge it into the default options."""
import yaml
with open(filename, 'r') as f:
yaml_cfg = edict(yaml.safe_load(f))
_merge_a_into_b(yaml_cfg, __C)
def cfg_from_list(cfg_list):
"""Set config keys via list (e.g., from command line)."""
from ast import literal_eval
assert len(cfg_list) % 2 == 0
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
key_list = k.split('.')
d = __C
for subkey in key_list[:-1]:
assert subkey in d
d = d[subkey]
subkey = key_list[-1]
assert subkey in d
try:
value = literal_eval(v)
except:
# handle the case when v is a string literal
value = v
assert type(value) == type(d[subkey]), \
'type {} does not match original type {}'.format(
type(value), type(d[subkey]))
d[subkey] = value
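# Illustrative usage of the two override helpers above (a sketch, not part of
# this module). The YAML file is one of the configs shipped under
# experiments/cfgs/; every key must already exist in __C with a matching type:
#
#   from model.config import cfg, cfg_from_file, cfg_from_list
#   cfg_from_file('experiments/cfgs/res50.yml')
#   cfg_from_list(['TRAIN.STEPSIZE', '20000', 'TRAIN.SNAPSHOT_ITERS', '5000'])
#   assert cfg.TRAIN.STEPSIZE == 20000  # values are parsed by ast.literal_eval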
================================================
FILE: lib/model/config.py~
================================================
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path as osp
import numpy as np
# `pip install easydict` if you don't have it
from easydict import EasyDict as edict
__C = edict()
# Consumers can get config by:
# from model.config import cfg
cfg = __C
#
# Training options
#
__C.TRAIN = edict()
# Initial learning rate
__C.TRAIN.LEARNING_RATE = 0.001
# Momentum
__C.TRAIN.MOMENTUM = 0.9
# Weight decay, for regularization
__C.TRAIN.WEIGHT_DECAY = 0.0005
# Factor for reducing the learning rate
__C.TRAIN.GAMMA = 0.1
# Step size for reducing the learning rate; currently only one step is supported
__C.TRAIN.STEPSIZE = 30000
# Iteration interval for displaying the loss during training on the command line
__C.TRAIN.DISPLAY = 10
# Whether to double the learning rate for bias
__C.TRAIN.DOUBLE_BIAS = True
# Whether to initialize the weights with truncated normal distribution
__C.TRAIN.TRUNCATED = False
# Whether to have weight decay on bias as well
__C.TRAIN.BIAS_DECAY = False
# Whether to add ground truth boxes to the pool when sampling regions
__C.TRAIN.USE_GT = False
# Whether to use aspect-ratio grouping of training images, introduced merely for saving
# GPU memory
__C.TRAIN.ASPECT_GROUPING = False
# The number of snapshots kept, older ones are deleted to save space
__C.TRAIN.SNAPSHOT_KEPT = 3
# The time interval, in seconds, between saving tensorflow summaries
__C.TRAIN.SUMMARY_INTERVAL = 180
# Scale to use during training (can list multiple scales)
# The scale is the pixel size of an image's shortest side
__C.TRAIN.SCALES = (600,)
# Max pixel size of the longest side of a scaled input image
__C.TRAIN.MAX_SIZE = 1000
# Images to use per minibatch
__C.TRAIN.IMS_PER_BATCH = 1
# Minibatch size (number of regions of interest [ROIs])
__C.TRAIN.BATCH_SIZE = 128
# Fraction of minibatch that is labeled foreground (i.e. class > 0)
__C.TRAIN.FG_FRACTION = 0.25
# Overlap threshold for a ROI to be considered foreground (if >= FG_THRESH)
__C.TRAIN.FG_THRESH = 0.5
# Overlap threshold for a ROI to be considered background (class = 0 if
# overlap in [LO, HI))
__C.TRAIN.BG_THRESH_HI = 0.5
__C.TRAIN.BG_THRESH_LO = 0.1
# Use horizontally-flipped images during training?
#__C.TRAIN.USE_FLIPPED = True
__C.TRAIN.USE_FLIPPED = False
# Train bounding-box regressors
__C.TRAIN.BBOX_REG = True
# Overlap required between a ROI and ground-truth box in order for that ROI to
# be used as a bounding-box regression training example
__C.TRAIN.BBOX_THRESH = 0.5
# Iterations between snapshots
__C.TRAIN.SNAPSHOT_ITERS = 3000
# solver.prototxt specifies the snapshot path prefix; this adds an optional
# infix to yield the path: <prefix>[_<infix>]_iters_XYZ.caffemodel
__C.TRAIN.SNAPSHOT_PREFIX = 'res101_faster_rcnn'
# __C.TRAIN.SNAPSHOT_INFIX = ''
# Use a prefetch thread in roi_data_layer.layer
# So far I haven't found this useful; likely more engineering work is required
# __C.TRAIN.USE_PREFETCH = False
# Normalize the targets (subtract empirical mean, divide by empirical stddev)
__C.TRAIN.BBOX_NORMALIZE_TARGETS = True
# Deprecated (inside weights)
__C.TRAIN.BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
# Normalize the targets using "precomputed" (or made up) means and stdevs
# (BBOX_NORMALIZE_TARGETS must also be True)
__C.TRAIN.BBOX_NORMALIZE_TARGETS_PRECOMPUTED = True
__C.TRAIN.BBOX_NORMALIZE_MEANS = (0.0, 0.0, 0.0, 0.0)
__C.TRAIN.BBOX_NORMALIZE_STDS = (0.1, 0.1, 0.2, 0.2)
# Train using these proposals
__C.TRAIN.PROPOSAL_METHOD = 'gt'
# Use RPN to detect objects
__C.TRAIN.HAS_RPN = True
# IOU >= thresh: positive example
__C.TRAIN.RPN_POSITIVE_OVERLAP = 0.7
# IOU < thresh: negative example
__C.TRAIN.RPN_NEGATIVE_OVERLAP = 0.3
# If an anchor satisfies both the positive and negative conditions, set it to negative
__C.TRAIN.RPN_CLOBBER_POSITIVES = False
# Max fraction of foreground examples in the RPN minibatch
__C.TRAIN.RPN_FG_FRACTION = 0.5
# Total number of examples
__C.TRAIN.RPN_BATCHSIZE = 256
# NMS threshold used on RPN proposals
__C.TRAIN.RPN_NMS_THRESH = 0.7
# Number of top scoring boxes to keep before applying NMS to RPN proposals
__C.TRAIN.RPN_PRE_NMS_TOP_N = 12000
# Number of top scoring boxes to keep after applying NMS to RPN proposals
__C.TRAIN.RPN_POST_NMS_TOP_N = 2000
# Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale)
# __C.TRAIN.RPN_MIN_SIZE = 16
# Deprecated (inside weights)
__C.TRAIN.RPN_BBOX_INSIDE_WEIGHTS = (1.0, 1.0, 1.0, 1.0)
# Give the positive RPN examples weight of p * 1 / {num positives}
# and give negatives a weight of (1 - p)
# Set to -1.0 to use uniform example weighting
__C.TRAIN.RPN_POSITIVE_WEIGHT = -1.0
# Whether to use all ground-truth bounding boxes for training.
# For COCO, setting USE_ALL_GT to False will exclude boxes that are flagged as 'iscrowd'
__C.TRAIN.USE_ALL_GT = True
#
# Testing options
#
__C.TEST = edict()
# Scale to use during testing (can NOT list multiple scales)
# The scale is the pixel size of an image's shortest side
__C.TEST.SCALES = (600,)
# Max pixel size of the longest side of a scaled input image
__C.TEST.MAX_SIZE = 1000
# Overlap threshold used for non-maximum suppression (suppress boxes with
# IoU >= this threshold)
__C.TEST.NMS = 0.3
# Experimental: treat the (K+1) units in the cls_score layer as linear
# predictors (trained, e.g., with one-vs-rest SVMs).
__C.TEST.SVM = False
# Test using bounding-box regressors
__C.TEST.BBOX_REG = True
# Propose boxes
__C.TEST.HAS_RPN = False
# Test using these proposals
__C.TEST.PROPOSAL_METHOD = 'gt'
## NMS threshold used on RPN proposals
__C.TEST.RPN_NMS_THRESH = 0.7
## Number of top scoring boxes to keep before applying NMS to RPN proposals
__C.TEST.RPN_PRE_NMS_TOP_N = 6000
## Number of top scoring boxes to keep after applying NMS to RPN proposals
__C.TEST.RPN_POST_NMS_TOP_N = 300
# Proposal height and width both need to be greater than RPN_MIN_SIZE (at orig image scale)
# __C.TEST.RPN_MIN_SIZE = 16
# Testing mode, defaults to 'nms'; 'top' is slower but better
# See report for details
__C.TEST.MODE = 'nms'
# Only useful when TEST.MODE is 'top', specifies the number of top proposals to select
__C.TEST.RPN_TOP_N = 5000
#
# ResNet options
#
__C.RESNET = edict()
# Option to set if max-pooling is appended after crop_and_resize.
# If true, the region will be resized to a square of 2xPOOLING_SIZE,
# then 2x2 max-pooling is applied; otherwise the region will be directly
# resized to a square of POOLING_SIZE
__C.RESNET.MAX_POOL = False
# Number of fixed blocks during finetuning; by default the first of the 4 blocks is fixed
# Range: 0 (none) to 3 (all)
__C.RESNET.FIXED_BLOCKS = 1
# Whether to tune the batch normalization parameters during training
__C.RESNET.BN_TRAIN = False
#
# MISC
#
# The mapping from image coordinates to feature map coordinates might cause
# some boxes that are distinct in image space to become identical in feature
# coordinates. If DEDUP_BOXES > 0, then DEDUP_BOXES is used as the scale factor
# for identifying duplicate boxes.
# 1/16 is correct for {Alex,Caffe}Net, VGG_CNN_M_1024, and VGG16
__C.DEDUP_BOXES = 1. / 16.
# Pixel mean values (BGR order) as a (1, 1, 3) array
# We use the same pixel mean for all networks even though it's not exactly what
# they were trained with
__C.PIXEL_MEANS = np.array([[[102.9801, 115.9465, 122.7717]]])
# For reproducibility
__C.RNG_SEED = 3
# A small number that's used many times
__C.EPS = 1e-14
# Root directory of project
__C.ROOT_DIR = osp.abspath(osp.join(osp.dirname(__file__), '..', '..'))
# Data directory
__C.DATA_DIR = osp.abspath(osp.join(__C.ROOT_DIR, 'data'))
# Name (or path to) the matlab executable
__C.MATLAB = 'matlab'
# Place outputs under an experiments directory
__C.EXP_DIR = 'default'
# Use GPU implementation of non-maximum suppression
__C.USE_GPU_NMS = True
# Default GPU device id
__C.GPU_ID = 0
# Default pooling mode, only 'crop' is available
__C.POOLING_MODE = 'crop'
# Size of the pooled region after RoI pooling
__C.POOLING_SIZE = 7
# Anchor scales for RPN
__C.ANCHOR_SCALES = [8,16,32]
# Anchor ratios for RPN
__C.ANCHOR_RATIOS = [0.5,1,2]
def get_output_dir(imdb, weights_filename):
"""Return the directory where experimental artifacts are placed.
If the directory does not exist, it is created.
A canonical path is built using the name from an imdb and a network
(if not None).
"""
outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'output', __C.EXP_DIR, imdb.name))
if weights_filename is None:
weights_filename = 'default'
outdir = osp.join(outdir, weights_filename)
if not os.path.exists(outdir):
os.makedirs(outdir)
return outdir
def get_output_tb_dir(imdb, weights_filename):
"""Return the directory where tensorflow summaries are placed.
If the directory does not exist, it is created.
A canonical path is built using the name from an imdb and a network
(if not None).
"""
outdir = osp.abspath(osp.join(__C.ROOT_DIR, 'tensorboard', __C.EXP_DIR, imdb.name))
if weights_filename is None:
weights_filename = 'default'
outdir = osp.join(outdir, weights_filename)
if not os.path.exists(outdir):
os.makedirs(outdir)
return outdir
def _merge_a_into_b(a, b):
"""Merge config dictionary a into config dictionary b, clobbering the
options in b whenever they are also specified in a.
"""
if type(a) is not edict:
return
for k, v in a.items():
# a must specify keys that are in b
if k not in b:
raise KeyError('{} is not a valid config key'.format(k))
# the types must match, too
old_type = type(b[k])
if old_type is not type(v):
if isinstance(b[k], np.ndarray):
v = np.array(v, dtype=b[k].dtype)
else:
raise ValueError(('Type mismatch ({} vs. {}) '
'for config key: {}').format(type(b[k]),
type(v), k))
# recursively merge dicts
if type(v) is edict:
try:
_merge_a_into_b(a[k], b[k])
except:
print(('Error under config key: {}'.format(k)))
raise
else:
b[k] = v
def cfg_from_file(filename):
"""Load a config file and merge it into the default options."""
import yaml
with open(filename, 'r') as f:
yaml_cfg = edict(yaml.safe_load(f))
_merge_a_into_b(yaml_cfg, __C)
def cfg_from_list(cfg_list):
"""Set config keys via list (e.g., from command line)."""
from ast import literal_eval
assert len(cfg_list) % 2 == 0
for k, v in zip(cfg_list[0::2], cfg_list[1::2]):
key_list = k.split('.')
d = __C
for subkey in key_list[:-1]:
assert subkey in d
d = d[subkey]
subkey = key_list[-1]
assert subkey in d
try:
value = literal_eval(v)
except:
# handle the case when v is a string literal
value = v
assert type(value) == type(d[subkey]), \
'type {} does not match original type {}'.format(
type(value), type(d[subkey]))
d[subkey] = value
================================================
FILE: lib/model/nms_wrapper.py
================================================
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from model.config import cfg
from nms.gpu_nms import gpu_nms
from nms.cpu_nms import cpu_nms
def nms(dets, thresh, force_cpu=False):
"""Dispatch to either CPU or GPU NMS implementations."""
if dets.shape[0] == 0:
return []
if cfg.USE_GPU_NMS and not force_cpu:
return gpu_nms(dets, thresh, device_id=cfg.GPU_ID)
else:
return cpu_nms(dets, thresh)
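# For reference, a pure-NumPy sketch of the greedy NMS that the compiled
# kernels above implement (the repository also ships a pure-Python fallback in
# lib/nms/py_cpu_nms.py). `dets` is an (N, 5) array of [x1, y1, x2, y2, score].
import numpy as np

def nms_numpy(dets, thresh):
  x1, y1, x2, y2, scores = dets[:, 0], dets[:, 1], dets[:, 2], dets[:, 3], dets[:, 4]
  areas = (x2 - x1 + 1) * (y2 - y1 + 1)
  order = scores.argsort()[::-1]  # indices sorted by descending score
  keep = []
  while order.size > 0:
    i = order[0]
    keep.append(i)
    # intersection of the top-scoring box with every remaining box
    w = np.maximum(0.0, np.minimum(x2[i], x2[order[1:]]) - np.maximum(x1[i], x1[order[1:]]) + 1)
    h = np.maximum(0.0, np.minimum(y2[i], y2[order[1:]]) - np.maximum(y1[i], y1[order[1:]]) + 1)
    inter = w * h
    ovr = inter / (areas[i] + areas[order[1:]] - inter)
    # keep only boxes whose IoU with the selected box is at most thresh
    order = order[np.where(ovr <= thresh)[0] + 1]
  return keep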
================================================
FILE: lib/model/test.py
================================================
# --------------------------------------------------------
# Tensorflow Faster R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import numpy as np
try:
import cPickle as pickle
except ImportError:
import pickle
import os
import math
from utils.timer import Timer
from utils.cython_nms import nms, nms_new
from utils.boxes_grid import get_boxes_grid
from utils.blob import im_list_to_blob
from model.config import cfg, get_output_dir
from model.bbox_transform import clip_boxes, bbox_transform_inv
def _get_image_blob(im):
"""Converts an image into a network input.
Arguments:
im (ndarray): a color image in BGR order
Returns:
blob (ndarray): a data blob holding an image pyramid
im_scale_factors (list): list of image scales (relative to im) used
in the image pyramid
"""
im_orig = im.astype(np.float32, copy=True)
im_orig -= cfg.PIXEL_MEANS
im_shape = im_orig.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
processed_ims = []
im_scale_factors = []
for target_size in cfg.TEST.SCALES:
im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than MAX_SIZE
if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
interpolation=cv2.INTER_LINEAR)
im_scale_factors.append(im_scale)
processed_ims.append(im)
# Create a blob to hold the input images
blob = im_list_to_blob(processed_ims)
return blob, np.array(im_scale_factors)
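# Worked example of the resizing above: a 480x640 input with
# TEST.SCALES = (600,) and TEST.MAX_SIZE = 1000 gives
# im_scale = 600/480 = 1.25; since 640 * 1.25 = 800 <= 1000, the blob holds a
# single 600x800 image and im_scale_factors == [1.25].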
def _get_blobs(im):
"""Convert an image and RoIs within that image into network inputs."""
blobs = {}
blobs['data'], im_scale_factors = _get_image_blob(im)
return blobs, im_scale_factors
def _clip_boxes(boxes, im_shape):
"""Clip boxes to image boundaries."""
# x1 >= 0
boxes[:, 0::4] = np.maximum(boxes[:, 0::4], 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(boxes[:, 1::4], 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.minimum(boxes[:, 2::4], im_shape[1] - 1)
# y2 < im_shape[0]
boxes[:, 3::4] = np.minimum(boxes[:, 3::4], im_shape[0] - 1)
return boxes
def _rescale_boxes(boxes, inds, scales):
"""Rescale boxes according to image rescaling."""
for i in range(boxes.shape[0]):
boxes[i,:] = boxes[i,:] / scales[int(inds[i])]
return boxes
def im_detect(sess, net, im):
blobs, im_scales = _get_blobs(im)
assert len(im_scales) == 1, "Only single-image batch implemented"
im_blob = blobs['data']
# im_info is [blob height, blob width, image scale]; the scale is 1.0 when
# the image was not resized
blobs['im_info'] = np.array([[im_blob.shape[1], im_blob.shape[2], im_scales[0]]], dtype=np.float32)
_, scores, bbox_pred, rois = net.test_image(sess, blobs['data'], blobs['im_info'])
boxes = rois[:, 1:5] / im_scales[0]
# print(scores.shape, bbox_pred.shape, rois.shape, boxes.shape)
scores = np.reshape(scores, [scores.shape[0], -1])
bbox_pred = np.reshape(bbox_pred, [bbox_pred.shape[0], -1])
if cfg.TEST.BBOX_REG:
# Apply bounding-box regression deltas
box_deltas = bbox_pred
pred_boxes = bbox_transform_inv(boxes, box_deltas)
pred_boxes = _clip_boxes(pred_boxes, im.shape)
else:
# Simply repeat the boxes, once for each class
pred_boxes = np.tile(boxes, (1, scores.shape[1]))
return scores, pred_boxes
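# Shape note: with num_classes = K+1 (background included), `scores` is
# (R, K+1) and `pred_boxes` is (R, 4*(K+1)); columns [4j:4(j+1)] hold the box
# refined for class j, which is how test_net below slices per-class detections.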
def apply_nms(all_boxes, thresh):
"""Apply non-maximum suppression to all predicted boxes output by the
test_net method.
"""
num_classes = len(all_boxes)
num_images = len(all_boxes[0])
nms_boxes = [[[] for _ in range(num_images)] for _ in range(num_classes)]
for cls_ind in range(num_classes):
for im_ind in range(num_images):
dets = all_boxes[cls_ind][im_ind]
if len(dets) == 0:
continue
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
inds = np.where((x2 > x1) & (y2 > y1) & (scores > cfg.TEST.DET_THRESHOLD))[0]
dets = dets[inds,:]
if len(dets) == 0:
continue
keep = nms(dets, thresh)
if len(keep) == 0:
continue
nms_boxes[cls_ind][im_ind] = dets[keep, :].copy()
return nms_boxes
def test_net(sess, net, imdb, weights_filename, max_per_image=100, thresh=0.05):
"""Test a Fast R-CNN network on an image database."""
np.random.seed(cfg.RNG_SEED)
num_images = len(imdb.image_index)
# all detections are collected into:
# all_boxes[cls][image] = N x 5 array of detections in
# (x1, y1, x2, y2, score)
all_boxes = [[[] for _ in range(num_images)]
for _ in range(imdb.num_classes)]
output_dir = get_output_dir(imdb, weights_filename)
# timers
_t = {'im_detect' : Timer(), 'misc' : Timer()}
for i in range(num_images):
im = cv2.imread(imdb.image_path_at(i))
_t['im_detect'].tic()
scores, boxes = im_detect(sess, net, im)
_t['im_detect'].toc()
_t['misc'].tic()
# skip j = 0, because it's the background class
for j in range(1, imdb.num_classes):
inds = np.where(scores[:, j] > thresh)[0]
cls_scores = scores[inds, j]
cls_boxes = boxes[inds, j*4:(j+1)*4]
cls_dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
keep = nms(cls_dets, cfg.TEST.NMS)
cls_dets = cls_dets[keep, :]
all_boxes[j][i] = cls_dets
# Limit to max_per_image detections *over all classes*
if max_per_image > 0:
image_scores = np.hstack([all_boxes[j][i][:, -1]
for j in range(1, imdb.num_classes)])
if len(image_scores) > max_per_image:
image_thresh = np.sort(image_scores)[-max_per_image]
for j in range(1, imdb.num_classes):
keep = np.where(all_boxes[j][i][:, -1] >= image_thresh)[0]
all_boxes[j][i] = all_boxes[j][i][keep, :]
_t['misc'].toc()
print('im_detect: {:d}/{:d} {:.3f}s {:.3f}s' \
.format(i + 1, num_images, _t['im_detect'].average_time,
_t['misc'].average_time))
det_file = os.path.join(output_dir, 'detections.pkl')
with open(det_file, 'wb') as f:
pickle.dump(all_boxes, f, pickle.HIGHEST_PROTOCOL)
print('Evaluating detections')
imdb.evaluate_detections(all_boxes, output_dir)
================================================
FILE: lib/model/test.py~
================================================
# --------------------------------------------------------
# Tensorflow Faster R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import cv2
import numpy as np
try:
import cPickle as pickle
except ImportError:
import pickle
import os
import math
from utils.timer import Timer
from utils.cython_nms import nms, nms_new
from utils.boxes_grid import get_boxes_grid
from utils.blob import im_list_to_blob
from model.config import cfg, get_output_dir
from model.bbox_transform import clip_boxes, bbox_transform_inv
def _get_image_blob(im):
"""Converts an image into a network input.
Arguments:
im (ndarray): a color image in BGR order
Returns:
blob (ndarray): a data blob holding an image pyramid
im_scale_factors (list): list of image scales (relative to im) used
in the image pyramid
"""
im_orig = im.astype(np.float32, copy=True)
im_orig -= cfg.PIXEL_MEANS
im_shape = im_orig.shape
im_size_min = np.min(im_shape[0:2])
im_size_max = np.max(im_shape[0:2])
processed_ims = []
im_scale_factors = []
for target_size in cfg.TEST.SCALES:
im_scale = float(target_size) / float(im_size_min)
# Prevent the biggest axis from being more than MAX_SIZE
if np.round(im_scale * im_size_max) > cfg.TEST.MAX_SIZE:
im_scale = float(cfg.TEST.MAX_SIZE) / float(im_size_max)
im = cv2.resize(im_orig, None, None, fx=im_scale, fy=im_scale,
interpolation=cv2.INTER_LINEAR)
im_scale_factors.append(im_scale)
processed_ims.append(im)
# Create a blob to hold the input images
blob = im_list_to_blob(processed_ims)
return blob, np.array(im_scale_factors)
def _get_blobs(im):
"""Convert an image and RoIs within that image into network inputs."""
blobs = {}
blobs['data'], im_scale_factors = _get_image_blob(im)
return blobs, im_scale_factors
def _clip_boxes(boxes, im_shape):
"""Clip boxes to image boundaries."""
# x1 >= 0
boxes[:, 0::4] = np.maximum(boxes[:, 0::4], 0)
# y1 >= 0
boxes[:, 1::4] = np.maximum(boxes[:, 1::4], 0)
# x2 < im_shape[1]
boxes[:, 2::4] = np.minimum(boxes[:, 2::4], im_shape[1] - 1)
# y2 < im_shape[0]
boxes[:, 3::4] = np.minimum(boxes[:, 3::4], im_shape[0] - 1)
return boxes
def _rescale_boxes(boxes, inds, scales):
"""Rescale boxes according to image rescaling."""
for i in range(boxes.shape[0]):
boxes[i,:] = boxes[i,:] / scales[int(inds[i])]
return boxes
def im_detect(sess, net, im):
blobs, im_scales = _get_blobs(im)
assert len(im_scales) == 1, "Only single-image batch implemented"
print(im_scales)
im_blob = blobs['data']
# im_info is [blob height, blob width, image scale]; the scale is 1.0 when
# the image was not resized
blobs['im_info'] = np.array([[im_blob.shape[1], im_blob.shape[2], im_scales[0]]], dtype=np.float32)
_, scores, bbox_pred, rois = net.test_image(sess, blobs['data'], blobs['im_info'])
boxes = rois[:, 1:5] / im_scales[0]
# print(scores.shape, bbox_pred.shape, rois.shape, boxes.shape)
scores = np.reshape(scores, [scores.shape[0], -1])
bbox_pred = np.reshape(bbox_pred, [bbox_pred.shape[0], -1])
if cfg.TEST.BBOX_REG:
# Apply bounding-box regression deltas
box_deltas = bbox_pred
pred_boxes = bbox_transform_inv(boxes, box_deltas)
pred_boxes = _clip_boxes(pred_boxes, im.shape)
else:
# Simply repeat the boxes, once for each class
pred_boxes = np.tile(boxes, (1, scores.shape[1]))
return scores, pred_boxes
def apply_nms(all_boxes, thresh):
"""Apply non-maximum suppression to all predicted boxes output by the
test_net method.
"""
num_classes = len(all_boxes)
num_images = len(all_boxes[0])
nms_boxes = [[[] for _ in range(num_images)] for _ in range(num_classes)]
for cls_ind in range(num_classes):
for im_ind in range(num_images):
dets = all_boxes[cls_ind][im_ind]
if len(dets) == 0:
continue
x1 = dets[:, 0]
y1 = dets[:, 1]
x2 = dets[:, 2]
y2 = dets[:, 3]
scores = dets[:, 4]
inds = np.where((x2 > x1) & (y2 > y1) & (scores > cfg.TEST.DET_THRESHOLD))[0]
dets = dets[inds,:]
if len(dets) == 0:
continue
keep = nms(dets, thresh)
if len(keep) == 0:
continue
nms_boxes[cls_ind][im_ind] = dets[keep, :].copy()
return nms_boxes
def test_net(sess, net, imdb, weights_filename, max_per_image=100, thresh=0.05):
"""Test a Fast R-CNN network on an image database."""
np.random.seed(cfg.RNG_SEED)
num_images = len(imdb.image_index)
# all detections are collected into:
# all_boxes[cls][image] = N x 5 array of detections in
# (x1, y1, x2, y2, score)
all_boxes = [[[] for _ in range(num_images)]
for _ in range(imdb.num_classes)]
output_dir = get_output_dir(imdb, weights_filename)
# timers
_t = {'im_detect' : Timer(), 'misc' : Timer()}
for i in range(num_images):
im = cv2.imread(imdb.image_path_at(i))
_t['im_detect'].tic()
scores, boxes = im_detect(sess, net, im)
_t['im_detect'].toc()
_t['misc'].tic()
# skip j = 0, because it's the background class
for j in range(1, imdb.num_classes):
inds = np.where(scores[:, j] > thresh)[0]
cls_scores = scores[inds, j]
cls_boxes = boxes[inds, j*4:(j+1)*4]
cls_dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])) \
.astype(np.float32, copy=False)
keep = nms(cls_dets, cfg.TEST.NMS)
cls_dets = cls_dets[keep, :]
all_boxes[j][i] = cls_dets
# Limit to max_per_image detections *over all classes*
if max_per_image > 0:
image_scores = np.hstack([all_boxes[j][i][:, -1]
for j in range(1, imdb.num_classes)])
if len(image_scores) > max_per_image:
image_thresh = np.sort(image_scores)[-max_per_image]
for j in range(1, imdb.num_classes):
keep = np.where(all_boxes[j][i][:, -1] >= image_thresh)[0]
all_boxes[j][i] = all_boxes[j][i][keep, :]
_t['misc'].toc()
print('im_detect: {:d}/{:d} {:.3f}s {:.3f}s' \
.format(i + 1, num_images, _t['im_detect'].average_time,
_t['misc'].average_time))
det_file = os.path.join(output_dir, 'detections.pkl')
with open(det_file, 'wb') as f:
pickle.dump(all_boxes, f, pickle.HIGHEST_PROTOCOL)
print('Evaluating detections')
imdb.evaluate_detections(all_boxes, output_dir)
================================================
FILE: lib/model/train_val.py
================================================
# --------------------------------------------------------
# Tensorflow Faster R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Xinlei Chen and Zheqi He
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from model.config import cfg
import roi_data_layer.roidb as rdl_roidb
from roi_data_layer.layer import RoIDataLayer
from utils.timer import Timer
try:
import cPickle as pickle
except ImportError:
import pickle
import numpy as np
import os
import sys
import glob
import time
import tensorflow as tf
from tensorflow.python import pywrap_tensorflow
class SolverWrapper(object):
"""
A wrapper class for the training process
"""
def __init__(self, sess, network, imdb, roidb, valroidb, output_dir, tbdir, pretrained_model=None):
self.net = network
self.imdb = imdb
self.roidb = roidb
self.valroidb = valroidb
self.output_dir = output_dir
self.tbdir = tbdir
# Simply put '_val' at the end to save the summaries from the validation set
self.tbvaldir = tbdir + '_val'
if not os.path.exists(self.tbvaldir):
os.makedirs(self.tbvaldir)
self.pretrained_model = pretrained_model
def snapshot(self, sess, iter):
net = self.net
if not os.path.exists(self.output_dir):
os.makedirs(self.output_dir)
# Store the model snapshot
filename = cfg.TRAIN.SNAPSHOT_PREFIX + '_iter_{:d}'.format(iter) + '.ckpt'
filename = os.path.join(self.output_dir, filename)
self.saver.save(sess, filename)
print('Wrote snapshot to: {:s}'.format(filename))
# Also store some meta information, random state, etc.
nfilename = cfg.TRAIN.SNAPSHOT_PREFIX + '_iter_{:d}'.format(iter) + '.pkl'
nfilename = os.path.join(self.output_dir, nfilename)
# current state of numpy random
st0 = np.random.get_state()
# current position in the database
cur = self.data_layer._cur
# current shuffled indices of the database
perm = self.data_layer._perm
# current position in the validation database
cur_val = self.data_layer_val._cur
# current shuffled indices of the validation database
perm_val = self.data_layer_val._perm
# Dump the meta info
with open(nfilename, 'wb') as fid:
pickle.dump(st0, fid, pickle.HIGHEST_PROTOCOL)
pickle.dump(cur, fid, pickle.HIGHEST_PROTOCOL)
pickle.dump(perm, fid, pickle.HIGHEST_PROTOCOL)
pickle.dump(cur_val, fid, pickle.HIGHEST_PROTOCOL)
pickle.dump(perm_val, fid, pickle.HIGHEST_PROTOCOL)
pickle.dump(iter, fid, pickle.HIGHEST_PROTOCOL)
return filename, nfilename
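# Reading the meta file back, as train_model below does when resuming; the
# fields must be unpickled in exactly the order they were dumped:
#
#   with open(nfilename, 'rb') as fid:
#     st0 = pickle.load(fid)                 # numpy RNG state
#     cur = pickle.load(fid)                 # training-db cursor
#     perm = pickle.load(fid)                # training-db shuffled indices
#     cur_val = pickle.load(fid)             # validation-db cursor
#     perm_val = pickle.load(fid)            # validation-db shuffled indices
#     last_snapshot_iter = pickle.load(fid)  # iteration of this snapshot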
def get_variables_in_checkpoint_file(self, file_name):
try:
reader = pywrap_tensorflow.NewCheckpointReader(file_name)
var_to_shape_map = reader.get_variable_to_shape_map()
return var_to_shape_map
except Exception as e: # pylint: disable=broad-except
print(str(e))
if "corrupted compressed block contents" in str(e):
print("It's likely that your checkpoint file has been compressed "
"with SNAPPY.")
def train_model(self, sess, max_iters):
# Build data layers for both training and validation set
self.data_layer = RoIDataLayer(self.roidb, self.imdb.num_classes)
self.data_layer_val = RoIDataLayer(self.valroidb, self.imdb.num_classes, random=True)
# Determine different scales for anchors, see paper
with sess.graph.as_default():
# Set the random seed for tensorflow
tf.set_random_seed(cfg.RNG_SEED)
# Build the main computation graph
layers = self.net.create_architecture(sess, 'TRAIN', self.imdb.num_classes, tag='default',
anchor_scales=cfg.ANCHOR_SCALES,
anchor_ratios=cfg.ANCHOR_RATIOS)
# Define the loss
loss = layers['total_loss']
# Set learning rate and momentum
lr = tf.Variable(cfg.TRAIN.LEARNING_RATE, trainable=False)
momentum = cfg.TRAIN.MOMENTUM
self.optimizer = tf.train.MomentumOptimizer(lr, momentum)
# Compute the gradients wrt the loss
gvs = self.optimizer.compute_gradients(loss)
# Double the gradient of the bias if set
if cfg.TRAIN.DOUBLE_BIAS:
final_gvs = []
with tf.variable_scope('Gradient_Mult') as scope:
for grad, var in gvs:
scale = 1.
if cfg.TRAIN.DOUBLE_BIAS and '/biases:' in var.name:
scale *= 2.
if not np.allclose(scale, 1.0):
grad = tf.multiply(grad, scale)
final_gvs.append((grad, var))
train_op = self.optimizer.apply_gradients(final_gvs)
else:
train_op = self.optimizer.apply_gradients(gvs)
# We will handle the snapshots ourselves
self.saver = tf.train.Saver(max_to_keep=100000)
# Write the train and validation information to tensorboard
self.writer = tf.summary.FileWriter(self.tbdir, sess.graph)
self.valwriter = tf.summary.FileWriter(self.tbvaldir)
# Find previous snapshots if there is any to restore from
sfiles = os.path.join(self.output_dir, cfg.TRAIN.SNAPSHOT_PREFIX + '_iter_*.ckpt.meta')
sfiles = glob.glob(sfiles)
sfiles.sort(key=os.path.getmtime)
# Exclude the extra snapshot taken right before the learning rate is reduced
redstr = '_iter_{:d}.'.format(cfg.TRAIN.STEPSIZE+1)
sfiles = [ss.replace('.meta', '') for ss in sfiles]
sfiles = [ss for ss in sfiles if redstr not in ss]
nfiles = os.path.join(self.output_dir, cfg.TRAIN.SNAPSHOT_PREFIX + '_iter_*.pkl')
nfiles = glob.glob(nfiles)
nfiles.sort(key=os.path.getmtime)
nfiles = [nn for nn in nfiles if redstr not in nn]
lsf = len(sfiles)
assert len(nfiles) == lsf
np_paths = nfiles
ss_paths = sfiles
if lsf == 0:
# Fresh train directly from ImageNet weights
print('Loading initial model weights from {:s}'.format(self.pretrained_model))
variables = tf.global_variables()
# Initialize all variables first
sess.run(tf.variables_initializer(variables, name='init'))
var_keep_dic = self.get_variables_in_checkpoint_file(self.pretrained_model)
# Get the variables to restore, ignoring the variables to fix
variables_to_restore = self.net.get_variables_to_restore(variables, var_keep_dic)
restorer = tf.train.Saver(variables_to_restore)
restorer.restore(sess, self.pretrained_model)
print('Loaded.')
# Fix the excluded variables by loading them with conversion: the RGB
# weights are changed to BGR, and for VGG16 the convolutional fc6 and fc7
# weights are converted to fully connected weights
self.net.fix_variables(sess, self.pretrained_model)
print('Fixed.')
sess.run(tf.assign(lr, cfg.TRAIN.LEARNING_RATE))
last_snapshot_iter = 0
else:
# Get the most recent snapshot and restore
ss_paths = [ss_paths[-1]]
np_paths = [np_paths[-1]]
print('Restoring model snapshots from {:s}'.format(sfiles[-1]))
self.saver.restore(sess, str(sfiles[-1]))
print('Restored.')
# Need to restore the other hyperparameters/states for training. (TODO xinlei)
# I have tried my best to save and recover the random states exactly;
# however, the TensorFlow internal state is currently not available
with open(str(nfiles[-1]), 'rb') as fid:
st0 = pickle.load(fid)
cur = pickle.load(fid)
perm = pickle.load(fid)
cur_val = pickle.load(fid)
perm_val = pickle.load(fid)
last_snapshot_iter = pickle.load(fid)
np.random.set_state(st0)
self.data_layer._cur = cur
self.data_layer._perm = perm
self.data_layer_val._cur = cur_val
self.data_layer_val._perm = perm_val
# Set the learning rate, only reduce once
if last_snapshot_iter > cfg.TRAIN.STEPSIZE:
sess.run(tf.assign(lr, cfg.TRAIN.LEARNING_RATE * cfg.TRAIN.GAMMA))
else:
sess.run(tf.assign(lr, cfg.TRAIN.LEARNING_RATE))
# Sanity check: total number of trainable parameters
a = np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()])
print(a)
timer = Timer()
iter = last_snapshot_iter + 1
last_summary_time = time.time()
while iter < max_iters + 1:
# Learning rate
if iter == cfg.TRAIN.STEPSIZE + 1:
# Add snapshot here before reducing the learning rate
self.snapshot(sess, iter)
sess.run(tf.assign(lr, cfg.TRAIN.LEARNING_RATE * cfg.TRAIN.GAMMA))
timer.tic()
# Get training data, one batch at a time
blobs = self.data_layer.forward()
now = time.time()
if now - last_summary_time > cfg.TRAIN.SUMMARY_INTERVAL:
# Compute the graph with summary
rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, total_loss, summary = \
self.net.train_step_with_summary(sess, blobs, train_op)
self.writer.add_summary(summary, float(iter))
# Also check the summary on the validation set
blobs_val = self.data_layer_val.forward()
summary_val = self.net.get_summary(sess, blobs_val)
self.valwriter.add_summary(summary_val, float(iter))
last_summary_time = now
else:
# Compute the graph without summary
rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, total_loss = \
self.net.train_step(sess, blobs, train_op)
timer.toc()
# Display training information
if iter % (cfg.TRAIN.DISPLAY) == 0:
print('iter: %d / %d, total loss: %.6f\n >>> rpn_loss_cls: %.6f\n '
'>>> rpn_loss_box: %.6f\n >>> loss_cls: %.6f\n >>> loss_box: %.6f\n >>> lr: %f' % \
(iter, max_iters, total_loss, rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, lr.eval()))
print('speed: {:.3f}s / iter'.format(timer.average_time))
if iter % cfg.TRAIN.SNAPSHOT_ITERS == 0:
last_snapshot_iter = iter
snapshot_path, np_path = self.snapshot(sess, iter)
np_paths.append(np_path)
ss_paths.append(snapshot_path)
# Remove the old snapshots if there are too many
if len(np_paths) > cfg.TRAIN.SNAPSHOT_KEPT:
to_remove = len(np_paths) - cfg.TRAIN.SNAPSHOT_KEPT
for c in range(to_remove):
nfile = np_paths[0]
os.remove(str(nfile))
np_paths.remove(nfile)
if len(ss_paths) > cfg.TRAIN.SNAPSHOT_KEPT:
to_remove = len(ss_paths) - cfg.TRAIN.SNAPSHOT_KEPT
for c in range(to_remove):
sfile = ss_paths[0]
# To make the code compatible with earlier versions of TensorFlow,
# where the naming convention for checkpoints is different
if os.path.exists(str(sfile)):
os.remove(str(sfile))
else:
os.remove(str(sfile + '.data-00000-of-00001'))
os.remove(str(sfile + '.index'))
sfile_meta = sfile + '.meta'
os.remove(str(sfile_meta))
ss_paths.remove(sfile)
iter += 1
if last_snapshot_iter != iter - 1:
self.snapshot(sess, iter - 1)
self.writer.close()
self.valwriter.close()
def get_training_roidb(imdb):
"""Returns a roidb (Region of Interest database) for use in training."""
if cfg.TRAIN.USE_FLIPPED:
print('Appending horizontally-flipped training examples...')
imdb.append_flipped_images()
print('done')
print('Preparing training data...')
rdl_roidb.prepare_roidb(imdb)
print('done')
return imdb.roidb
def filter_roidb(roidb):
"""Remove roidb entries that have no usable RoIs."""
def is_valid(entry):
# Valid images have:
# (1) At least one foreground RoI OR
# (2) At least one background RoI
overlaps = entry['max_overlaps']
# find boxes with sufficient overlap
fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
(overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
# image is only valid if such boxes exist
valid = len(fg_inds) > 0 or len(bg_inds) > 0
return valid
num = len(roidb)
filtered_roidb = [entry for entry in roidb if is_valid(entry)]
num_after = len(filtered_roidb)
print('Filtered {} roidb entries: {} -> {}'.format(num - num_after,
num, num_after))
return filtered_roidb
def train_net(network, imdb, roidb, valroidb, output_dir, tb_dir,
pretrained_model=None,
max_iters=40000):
"""Train a Fast R-CNN network."""
roidb = filter_roidb(roidb)
valroidb = filter_roidb(valroidb)
tfconfig = tf.ConfigProto(allow_soft_placement=True)
tfconfig.gpu_options.allow_growth = True
with tf.Session(config=tfconfig) as sess:
sw = SolverWrapper(sess, network, imdb, roidb, valroidb, output_dir, tb_dir,
pretrained_model=pretrained_model)
print('Solving...')
sw.train_model(sess, max_iters)
print('done solving')
================================================
FILE: lib/model/train_val.py~
================================================
# --------------------------------------------------------
# Tensorflow Faster R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Xinlei Chen and Zheqi He
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from model.config import cfg
import roi_data_layer.roidb as rdl_roidb
from roi_data_layer.layer import RoIDataLayer
from utils.timer import Timer
try:
import cPickle as pickle
except ImportError:
import pickle
import numpy as np
import os
import sys
import glob
import time
import tensorflow as tf
from tensorflow.python import pywrap_tensorflow
class SolverWrapper(object):
"""
A wrapper class for the training process
"""
def __init__(self, sess, network, imdb, roidb, valroidb, output_dir, tbdir, pretrained_model=None):
self.net = network
self.imdb = imdb
self.roidb = roidb
self.valroidb = valroidb
self.output_dir = output_dir
self.tbdir = tbdir
# Simply put '_val' at the end to save the summaries from the validation set
self.tbvaldir = tbdir + '_val'
if not os.path.exists(self.tbvaldir):
os.makedirs(self.tbvaldir)
self.pretrained_model = pretrained_model
def snapshot(self, sess, iter):
net = self.net
if not os.path.exists(self.output_dir):
os.makedirs(self.output_dir)
# Store the model snapshot
filename = cfg.TRAIN.SNAPSHOT_PREFIX + '_iter_{:d}'.format(iter) + '.ckpt'
filename = os.path.join(self.output_dir, filename)
self.saver.save(sess, filename)
print('Wrote snapshot to: {:s}'.format(filename))
# Also store some meta information, random state, etc.
nfilename = cfg.TRAIN.SNAPSHOT_PREFIX + '_iter_{:d}'.format(iter) + '.pkl'
nfilename = os.path.join(self.output_dir, nfilename)
# current state of numpy random
st0 = np.random.get_state()
# current position in the database
cur = self.data_layer._cur
# current shuffled indices of the database
perm = self.data_layer._perm
# current position in the validation database
cur_val = self.data_layer_val._cur
# current shuffled indices of the validation database
perm_val = self.data_layer_val._perm
# Dump the meta info
with open(nfilename, 'wb') as fid:
pickle.dump(st0, fid, pickle.HIGHEST_PROTOCOL)
pickle.dump(cur, fid, pickle.HIGHEST_PROTOCOL)
pickle.dump(perm, fid, pickle.HIGHEST_PROTOCOL)
pickle.dump(cur_val, fid, pickle.HIGHEST_PROTOCOL)
pickle.dump(perm_val, fid, pickle.HIGHEST_PROTOCOL)
pickle.dump(iter, fid, pickle.HIGHEST_PROTOCOL)
return filename, nfilename
def get_variables_in_checkpoint_file(self, file_name):
try:
reader = pywrap_tensorflow.NewCheckpointReader(file_name)
var_to_shape_map = reader.get_variable_to_shape_map()
return var_to_shape_map
except Exception as e: # pylint: disable=broad-except
print(str(e))
if "corrupted compressed block contents" in str(e):
print("It's likely that your checkpoint file has been compressed "
"with SNAPPY.")
def train_model(self, sess, max_iters):
# Build data layers for both training and validation set
self.data_layer = RoIDataLayer(self.roidb, self.imdb.num_classes)
self.data_layer_val = RoIDataLayer(self.valroidb, self.imdb.num_classes, random=True)
# Determine different scales for anchors, see paper
with sess.graph.as_default():
# Set the random seed for tensorflow
tf.set_random_seed(cfg.RNG_SEED)
# Build the main computation graph
layers = self.net.create_architecture(sess, 'TRAIN', self.imdb.num_classes, tag='default',
anchor_scales=cfg.ANCHOR_SCALES,
anchor_ratios=cfg.ANCHOR_RATIOS)
# Define the loss
loss = layers['total_loss']
# Set learning rate and momentum
lr = tf.Variable(cfg.TRAIN.LEARNING_RATE, trainable=False)
momentum = cfg.TRAIN.MOMENTUM
self.optimizer = tf.train.MomentumOptimizer(lr, momentum)
# Compute the gradients wrt the loss
gvs = self.optimizer.compute_gradients(loss)
# Double the gradient of the bias if set
if cfg.TRAIN.DOUBLE_BIAS:
final_gvs = []
with tf.variable_scope('Gradient_Mult') as scope:
for grad, var in gvs:
scale = 1.
if cfg.TRAIN.DOUBLE_BIAS and '/biases:' in var.name:
scale *= 2.
if not np.allclose(scale, 1.0):
grad = tf.multiply(grad, scale)
final_gvs.append((grad, var))
train_op = self.optimizer.apply_gradients(final_gvs)
else:
train_op = self.optimizer.apply_gradients(gvs)
# We will handle the snapshots ourselves
self.saver = tf.train.Saver(max_to_keep=100000)
# Write the train and validation information to tensorboard
self.writer = tf.summary.FileWriter(self.tbdir, sess.graph)
self.valwriter = tf.summary.FileWriter(self.tbvaldir)
# Find previous snapshots if there is any to restore from
sfiles = os.path.join(self.output_dir, cfg.TRAIN.SNAPSHOT_PREFIX + '_iter_*.ckpt.meta')
sfiles = glob.glob(sfiles)
sfiles.sort(key=os.path.getmtime)
# Exclude the extra snapshot taken right before the learning rate is reduced
redstr = '_iter_{:d}.'.format(cfg.TRAIN.STEPSIZE+1)
sfiles = [ss.replace('.meta', '') for ss in sfiles]
sfiles = [ss for ss in sfiles if redstr not in ss]
nfiles = os.path.join(self.output_dir, cfg.TRAIN.SNAPSHOT_PREFIX + '_iter_*.pkl')
nfiles = glob.glob(nfiles)
nfiles.sort(key=os.path.getmtime)
nfiles = [nn for nn in nfiles if redstr not in nn]
lsf = len(sfiles)
assert len(nfiles) == lsf
np_paths = nfiles
ss_paths = sfiles
if lsf == 0:
# Fresh train directly from ImageNet weights
print('Loading initial model weights from {:s}'.format(self.pretrained_model))
variables = tf.global_variables()
# Initialize all variables first
sess.run(tf.variables_initializer(variables, name='init'))
var_keep_dic = self.get_variables_in_checkpoint_file(self.pretrained_model)
# Get the variables to restore, ignoring the variables to fix
variables_to_restore = self.net.get_variables_to_restore(variables, var_keep_dic)
restorer = tf.train.Saver(variables_to_restore)
restorer.restore(sess, self.pretrained_model)
print('Loaded.')
# Fix the excluded variables by loading them with conversion: the RGB
# weights are changed to BGR, and for VGG16 the convolutional fc6 and fc7
# weights are converted to fully connected weights
self.net.fix_variables(sess, self.pretrained_model)
print('Fixed.')
sess.run(tf.assign(lr, cfg.TRAIN.LEARNING_RATE))
last_snapshot_iter = 0
else:
# Get the most recent snapshot and restore
ss_paths = [ss_paths[-1]]
np_paths = [np_paths[-1]]
print('Restoring model snapshots from {:s}'.format(sfiles[-1]))
self.saver.restore(sess, str(sfiles[-1]))
print('Restored.')
# Need to restore the other hyperparameters/states for training. (TODO xinlei)
# I have tried my best to save and recover the random states exactly;
# however, the TensorFlow internal state is currently not available
with open(str(nfiles[-1]), 'rb') as fid:
st0 = pickle.load(fid)
cur = pickle.load(fid)
perm = pickle.load(fid)
cur_val = pickle.load(fid)
perm_val = pickle.load(fid)
last_snapshot_iter = pickle.load(fid)
np.random.set_state(st0)
self.data_layer._cur = cur
self.data_layer._perm = perm
self.data_layer_val._cur = cur_val
self.data_layer_val._perm = perm_val
# Set the learning rate, only reduce once
if last_snapshot_iter > cfg.TRAIN.STEPSIZE:
sess.run(tf.assign(lr, cfg.TRAIN.LEARNING_RATE * cfg.TRAIN.GAMMA))
else:
sess.run(tf.assign(lr, cfg.TRAIN.LEARNING_RATE))
# Total number of trainable parameters (computed but unused here)
np.sum([np.prod(v.get_shape().as_list()) for v in tf.trainable_variables()])
timer = Timer()
iter = last_snapshot_iter + 1
last_summary_time = time.time()
while iter < max_iters + 1:
# Learning rate
if iter == cfg.TRAIN.STEPSIZE + 1:
# Add snapshot here before reducing the learning rate
self.snapshot(sess, iter)
sess.run(tf.assign(lr, cfg.TRAIN.LEARNING_RATE * cfg.TRAIN.GAMMA))
timer.tic()
# Get training data, one batch at a time
blobs = self.data_layer.forward()
now = time.time()
if now - last_summary_time > cfg.TRAIN.SUMMARY_INTERVAL:
# Compute the graph with summary
rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, total_loss, summary = \
self.net.train_step_with_summary(sess, blobs, train_op)
self.writer.add_summary(summary, float(iter))
# Also check the summary on the validation set
blobs_val = self.data_layer_val.forward()
summary_val = self.net.get_summary(sess, blobs_val)
self.valwriter.add_summary(summary_val, float(iter))
last_summary_time = now
else:
# Compute the graph without summary
rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, total_loss = \
self.net.train_step(sess, blobs, train_op)
timer.toc()
# Display training information
if iter % (cfg.TRAIN.DISPLAY) == 0:
print('iter: %d / %d, total loss: %.6f\n >>> rpn_loss_cls: %.6f\n '
'>>> rpn_loss_box: %.6f\n >>> loss_cls: %.6f\n >>> loss_box: %.6f\n >>> lr: %f' % \
(iter, max_iters, total_loss, rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, lr.eval()))
print('speed: {:.3f}s / iter'.format(timer.average_time))
if iter % cfg.TRAIN.SNAPSHOT_ITERS == 0:
last_snapshot_iter = iter
snapshot_path, np_path = self.snapshot(sess, iter)
np_paths.append(np_path)
ss_paths.append(snapshot_path)
# Remove the old snapshots if there are too many
if len(np_paths) > cfg.TRAIN.SNAPSHOT_KEPT:
to_remove = len(np_paths) - cfg.TRAIN.SNAPSHOT_KEPT
for c in range(to_remove):
nfile = np_paths[0]
os.remove(str(nfile))
np_paths.remove(nfile)
if len(ss_paths) > cfg.TRAIN.SNAPSHOT_KEPT:
to_remove = len(ss_paths) - cfg.TRAIN.SNAPSHOT_KEPT
for c in range(to_remove):
sfile = ss_paths[0]
# To make the code compatible with earlier versions of TensorFlow,
# where the naming convention for checkpoints is different
if os.path.exists(str(sfile)):
os.remove(str(sfile))
else:
os.remove(str(sfile + '.data-00000-of-00001'))
os.remove(str(sfile + '.index'))
sfile_meta = sfile + '.meta'
os.remove(str(sfile_meta))
ss_paths.remove(sfile)
iter += 1
if last_snapshot_iter != iter - 1:
self.snapshot(sess, iter - 1)
self.writer.close()
self.valwriter.close()
def get_training_roidb(imdb):
"""Returns a roidb (Region of Interest database) for use in training."""
if cfg.TRAIN.USE_FLIPPED:
print('Appending horizontally-flipped training examples...')
imdb.append_flipped_images()
print('done')
print('Preparing training data...')
rdl_roidb.prepare_roidb(imdb)
print('done')
return imdb.roidb
def filter_roidb(roidb):
"""Remove roidb entries that have no usable RoIs."""
def is_valid(entry):
# Valid images have:
# (1) At least one foreground RoI OR
# (2) At least one background RoI
overlaps = entry['max_overlaps']
# find boxes with sufficient overlap
fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
# Select background RoIs as those within [BG_THRESH_LO, BG_THRESH_HI)
bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
(overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
# image is only valid if such boxes exist
valid = len(fg_inds) > 0 or len(bg_inds) > 0
return valid
num = len(roidb)
filtered_roidb = [entry for entry in roidb if is_valid(entry)]
num_after = len(filtered_roidb)
print('Filtered {} roidb entries: {} -> {}'.format(num - num_after,
num, num_after))
return filtered_roidb
def train_net(network, imdb, roidb, valroidb, output_dir, tb_dir,
pretrained_model=None,
max_iters=40000):
"""Train a Fast R-CNN network."""
roidb = filter_roidb(roidb)
valroidb = filter_roidb(valroidb)
tfconfig = tf.ConfigProto(allow_soft_placement=True)
tfconfig.gpu_options.allow_growth = True
with tf.Session(config=tfconfig) as sess:
sw = SolverWrapper(sess, network, imdb, roidb, valroidb, output_dir, tb_dir,
pretrained_model=pretrained_model)
print('Solving...')
sw.train_model(sess, max_iters)
print('done solving')
================================================
FILE: lib/nets/__init__.py
================================================
================================================
FILE: lib/nets/network.py
================================================
# --------------------------------------------------------
# Tensorflow Faster R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Xinlei Chen
# --------------------------------------------------------
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import tensorflow as tf
import tensorflow.contrib.slim as slim
from tensorflow.contrib.slim import losses
from tensorflow.contrib.slim import arg_scope
import numpy as np
from layer_utils.snippets import generate_anchors_pre
from layer_utils.proposal_layer import proposal_layer
from layer_utils.proposal_top_layer import proposal_top_layer
from layer_utils.anchor_target_layer import anchor_target_layer
from layer_utils.proposal_target_layer import proposal_target_layer
from model.config import cfg
class Network(object):
def __init__(self, batch_size=1):
self._feat_stride = [16, ]
self._feat_compress = [1. / 16., ]
self._batch_size = batch_size
self._predictions = {}
self._losses = {}
self._anchor_targets = {}
self._proposal_targets = {}
self._layers = {}
self._act_summaries = []
self._score_summaries = {}
self._train_summaries = []
self._event_summaries = {}
self._variables_to_fix = {}
def _add_image_summary(self, image, boxes):
# add back mean
image += cfg.PIXEL_MEANS
# bgr to rgb (opencv uses bgr)
channels = tf.unstack(image, axis=-1)
image = tf.stack([channels[2], channels[1], channels[0]], axis=-1)
# dims for normalization
width = tf.to_float(tf.shape(image)[2])
height = tf.to_float(tf.shape(image)[1])
# from [x1, y1, x2, y2, cls] to normalized [y1, x1, y2, x2]
cols = tf.unstack(boxes, axis=1)
boxes = tf.stack([cols[1] / height,
cols[0] / width,
cols[3] / height,
cols[2] / width], axis=1)
# add batch dimension (assume batch_size==1)
assert image.get_shape()[0] == 1
boxes = tf.expand_dims(boxes, dim=0)
image = tf.image.draw_bounding_boxes(image, boxes)
return tf.summary.image('ground_truth', image)
def _add_act_summary(self, tensor):
tf.summary.histogram('ACT/' + tensor.op.name + '/activations', tensor)
tf.summary.scalar('ACT/' + tensor.op.name + '/zero_fraction',
tf.nn.zero_fraction(tensor))
def _add_score_summary(self, key, tensor):
tf.summary.histogram('SCORE/' + tensor.op.name + '/' + key + '/scores', tensor)
def _add_train_summary(self, var):
tf.summary.histogram('TRAIN/' + var.op.name, var)
def _reshape_layer(self, bottom, num_dim, name):
input_shape = tf.shape(bottom)
with tf.variable_scope(name) as scope:
# change the channel to the caffe format
to_caffe = tf.transpose(bottom, [0, 3, 1, 2])
# then force it to have channel 2
reshaped = tf.reshape(to_caffe,
tf.concat(axis=0, values=[[self._batch_size], [num_dim, -1], [input_shape[2]]]))
# then swap the channel back
to_tf = tf.transpose(reshaped, [0, 2, 3, 1])
return to_tf
def _softmax_layer(self, bottom, name):
if name == 'rpn_cls_prob_reshape':
input_shape = tf.shape(bottom)
bottom_reshaped = tf.reshape(bottom, [-1, input_shape[-1]])
reshaped_score = tf.nn.softmax(bottom_reshaped, name=name)
return tf.reshape(reshaped_score, input_shape)
return tf.nn.softmax(bottom, name=name)
def _proposal_top_layer(self, rpn_cls_prob, rpn_bbox_pred, name):
with tf.variable_scope(name) as scope:
rois, rpn_scores = tf.py_func(proposal_top_layer,
[rpn_cls_prob, rpn_bbox_pred, self._im_info,
self._feat_stride, self._anchors, self._num_anchors],
[tf.float32, tf.float32])
rois.set_shape([cfg.TEST.RPN_TOP_N, 5])
rpn_scores.set_shape([cfg.TEST.RPN_TOP_N, 1])
return rois, rpn_scores
def _proposal_layer(self, rpn_cls_prob, rpn_bbox_pred, name):
with tf.variable_scope(name) as scope:
rois, rpn_scores = tf.py_func(proposal_layer,
[rpn_cls_prob, rpn_bbox_pred, self._im_info, self._mode,
self._feat_stride, self._anchors, self._num_anchors],
[tf.float32, tf.float32])
rois.set_shape([None, 5])
rpn_scores.set_shape([None, 1])
return rois, rpn_scores
# Only use it if you have roi_pooling op written in tf.image
def _roi_pool_layer(self, bottom, rois, name):
with tf.variable_scope(name) as scope:
return tf.image.roi_pooling(bottom, rois,
pooled_height=cfg.POOLING_SIZE,
pooled_width=cfg.POOLING_SIZE,
spatial_scale=1. / 16.)[0]
def _crop_pool_layer(self, bottom, rois, name):
with tf.variable_scope(name) as scope:
batch_ids = tf.squeeze(tf.slice(rois, [0, 0], [-1, 1], name="batch_id"), [1])
# Get the normalized coordinates of bboxes
bottom_shape = tf.shape(bottom)
height = (tf.to_float(bottom_shape[1]) - 1.) * np.float32(self._feat_stride[0])
width = (tf.to_float(bottom_shape[2]) - 1.) * np.float32(self._feat_stride[0])
x1 = tf.slice(rois, [0, 1], [-1, 1], name="x1") / width
y1 = tf.slice(rois, [0, 2], [-1, 1], name="y1") / height
x2 = tf.slice(rois, [0, 3], [-1, 1], name="x2") / width
y2 = tf.slice(rois, [0, 4], [-1, 1], name="y2") / height
# Gradients won't be backpropagated to the rois anyway; stop_gradient just saves time
bboxes = tf.stop_gradient(tf.concat([y1, x1, y2, x2], axis=1))
pre_pool_size = cfg.POOLING_SIZE * 2
crops = tf.image.crop_and_resize(bottom, bboxes, tf.to_int32(batch_ids), [pre_pool_size, pre_pool_size], name="crops")
return slim.max_pool2d(crops, [2, 2], padding='SAME')
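# With the default POOLING_SIZE = 7, each RoI is therefore cropped and
# bilinearly resized to a 14x14 patch, then 2x2 max-pooled down to 7x7.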
def _dropout_layer(self, bottom, name, ratio=0.5):
return tf.nn.dropout(bottom, ratio, name=name)
def _anchor_target_layer(self, rpn_cls_score, name):
with tf.variable_scope(name) as scope:
rpn_labels, rpn_bbox_targets, rpn_bbox_inside_weights, rpn_bbox_outside_weights = tf.py_func(
anchor_target_layer,
[rpn_cls_score, self._gt_boxes, self._im_info, self._feat_stride, self._anchors, self._num_anchors],
[tf.float32, tf.float32, tf.float32, tf.float32])
rpn_labels.set_shape([1, 1, None, None])
rpn_bbox_targets.set_shape([1, None, None, self._num_anchors * 4])
rpn_bbox_inside_weights.set_shape([1, None, None, self._num_anchors * 4])
rpn_bbox_outside_weights.set_shape([1, None, None, self._num_anchors * 4])
rpn_labels = tf.to_int32(rpn_labels, name="to_int32")
self._anchor_targets['rpn_labels'] = rpn_labels
self._anchor_targets['rpn_bbox_targets'] = rpn_bbox_targets
self._anchor_targets['rpn_bbox_inside_weights'] = rpn_bbox_inside_weights
self._anchor_targets['rpn_bbox_outside_weights'] = rpn_bbox_outside_weights
self._score_summaries.update(self._anchor_targets)
return rpn_labels

  def _proposal_target_layer(self, rois, roi_scores, name):
    with tf.variable_scope(name) as scope:
      rois, roi_scores, labels, bbox_targets, bbox_inside_weights, bbox_outside_weights = tf.py_func(
        proposal_target_layer,
        [rois, roi_scores, self._gt_boxes, self._num_classes],
        [tf.float32, tf.float32, tf.float32, tf.float32, tf.float32, tf.float32])

      rois.set_shape([cfg.TRAIN.BATCH_SIZE, 5])
      roi_scores.set_shape([cfg.TRAIN.BATCH_SIZE])
      labels.set_shape([cfg.TRAIN.BATCH_SIZE, 1])
      bbox_targets.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
      bbox_inside_weights.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])
      bbox_outside_weights.set_shape([cfg.TRAIN.BATCH_SIZE, self._num_classes * 4])

      self._proposal_targets['rois'] = rois
      self._proposal_targets['labels'] = tf.to_int32(labels, name="to_int32")
      self._proposal_targets['bbox_targets'] = bbox_targets
      self._proposal_targets['bbox_inside_weights'] = bbox_inside_weights
      self._proposal_targets['bbox_outside_weights'] = bbox_outside_weights

      self._score_summaries.update(self._proposal_targets)

      return rois, roi_scores
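
  # proposal_target_layer (lib/layer_utils/proposal_target_layer.py) samples a
  # fixed minibatch of cfg.TRAIN.BATCH_SIZE rois from the RPN proposals and
  # assigns each a class label plus per-class bbox regression targets, which is
  # why every output can be given a static first dimension here.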

  def _anchor_component(self):
    with tf.variable_scope('ANCHOR_' + self._tag) as scope:
      # just to get the shape right
      height = tf.to_int32(tf.ceil(self._im_info[0, 0] / np.float32(self._feat_stride[0])))
      width = tf.to_int32(tf.ceil(self._im_info[0, 1] / np.float32(self._feat_stride[0])))
      anchors, anchor_length = tf.py_func(generate_anchors_pre,
                                          [height, width,
                                           self._feat_stride, self._anchor_scales, self._anchor_ratios],
                                          [tf.float32, tf.int32], name="generate_anchors")
      anchors.set_shape([None, 4])
      anchor_length.set_shape([])
      self._anchors = anchors
      self._anchor_length = anchor_length
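
  # Worked example (assuming the default feat_stride of 16 and 9 anchors per
  # location): a 600x800 input gives height = ceil(600/16) = 38 and
  # width = ceil(800/16) = 50, so generate_anchors_pre returns
  # 38 * 50 * 9 = 17100 anchors in a (17100, 4) array.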

  def build_network(self, sess, is_training=True):
    raise NotImplementedError

  def _smooth_l1_loss(self, bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights, sigma=1.0, dim=[1]):
    sigma_2 = sigma ** 2
    box_diff = bbox_pred - bbox_targets
    in_box_diff = bbox_inside_weights * box_diff
    abs_in_box_diff = tf.abs(in_box_diff)
    smoothL1_sign = tf.stop_gradient(tf.to_float(tf.less(abs_in_box_diff, 1. / sigma_2)))
    in_loss_box = tf.pow(in_box_diff, 2) * (sigma_2 / 2.) * smoothL1_sign \
                  + (abs_in_box_diff - (0.5 / sigma_2)) * (1. - smoothL1_sign)
    out_loss_box = bbox_outside_weights * in_loss_box
    loss_box = tf.reduce_mean(tf.reduce_sum(
      out_loss_box,
      axis=dim
    ))
    return loss_box
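
  # The code above is the Fast R-CNN smooth L1 loss applied elementwise to the
  # weighted difference x = bbox_inside_weights * (bbox_pred - bbox_targets):
  #   f(x) = 0.5 * (sigma * x)^2      if |x| < 1 / sigma^2
  #   f(x) = |x| - 0.5 / sigma^2      otherwise
  # smoothL1_sign selects the branch, and tf.stop_gradient keeps the branch
  # selection itself out of the gradient computation.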

  def _add_losses(self, sigma_rpn=3.0):
    with tf.variable_scope('loss_' + self._tag) as scope:
      # RPN, class loss
      rpn_cls_score = tf.reshape(self._predictions['rpn_cls_score_reshape'], [-1, 2])
      rpn_label = tf.reshape(self._anchor_targets['rpn_labels'], [-1])
      rpn_select = tf.where(tf.not_equal(rpn_label, -1))
      rpn_cls_score = tf.reshape(tf.gather(rpn_cls_score, rpn_select), [-1, 2])
      rpn_label = tf.reshape(tf.gather(rpn_label, rpn_select), [-1])
      rpn_cross_entropy = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(logits=rpn_cls_score, labels=rpn_label))

      # RPN, bbox loss
      rpn_bbox_pred = self._predictions['rpn_bbox_pred']
      rpn_bbox_targets = self._anchor_targets['rpn_bbox_targets']
      rpn_bbox_inside_weights = self._anchor_targets['rpn_bbox_inside_weights']
      rpn_bbox_outside_weights = self._anchor_targets['rpn_bbox_outside_weights']
      rpn_loss_box = self._smooth_l1_loss(rpn_bbox_pred, rpn_bbox_targets, rpn_bbox_inside_weights,
                                          rpn_bbox_outside_weights, sigma=sigma_rpn, dim=[1, 2, 3])

      # RCNN, class loss
      cls_score = self._predictions["cls_score"]
      label = tf.reshape(self._proposal_targets["labels"], [-1])
      cross_entropy = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
          logits=tf.reshape(cls_score, [-1, self._num_classes]), labels=label))

      # RCNN, bbox loss
      bbox_pred = self._predictions['bbox_pred']
      bbox_targets = self._proposal_targets['bbox_targets']
      bbox_inside_weights = self._proposal_targets['bbox_inside_weights']
      bbox_outside_weights = self._proposal_targets['bbox_outside_weights']
      loss_box = self._smooth_l1_loss(bbox_pred, bbox_targets, bbox_inside_weights, bbox_outside_weights)

      self._losses['cross_entropy'] = cross_entropy
      self._losses['loss_box'] = loss_box
      self._losses['rpn_cross_entropy'] = rpn_cross_entropy
      self._losses['rpn_loss_box'] = rpn_loss_box

      loss = cross_entropy + loss_box + rpn_cross_entropy + rpn_loss_box
      self._losses['total_loss'] = loss

      self._event_summaries.update(self._losses)

    return loss
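
  # The total loss is the unweighted sum of the four terms (RPN classification,
  # RPN box regression, RCNN classification, RCNN box regression); each term is
  # also stored individually in self._losses so train_step can report it.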

  def create_architecture(self, sess, mode, num_classes, tag=None,
                          anchor_scales=(8, 16, 32), anchor_ratios=(0.5, 1, 2)):
    self._image = tf.placeholder(tf.float32, shape=[self._batch_size, None, None, 3])
    self._im_info = tf.placeholder(tf.float32, shape=[self._batch_size, 3])
    self._gt_boxes = tf.placeholder(tf.float32, shape=[None, 5])
    self._tag = tag

    self._num_classes = num_classes
    self._mode = mode
    self._anchor_scales = anchor_scales
    self._num_scales = len(anchor_scales)
    self._anchor_ratios = anchor_ratios
    self._num_ratios = len(anchor_ratios)
    self._num_anchors = self._num_scales * self._num_ratios

    training = mode == 'TRAIN'
    testing = mode == 'TEST'

    assert tag is not None

    # handle most of the regularizers here
    weights_regularizer = tf.contrib.layers.l2_regularizer(cfg.TRAIN.WEIGHT_DECAY)
    if cfg.TRAIN.BIAS_DECAY:
      biases_regularizer = weights_regularizer
    else:
      biases_regularizer = tf.no_regularizer

    # list as many types of layers as possible, even if they are not used now
    with arg_scope([slim.conv2d, slim.conv2d_in_plane,
                    slim.conv2d_transpose, slim.separable_conv2d, slim.fully_connected],
                   weights_regularizer=weights_regularizer,
                   biases_regularizer=biases_regularizer,
                   biases_initializer=tf.constant_initializer(0.0)):
      rois, cls_prob, bbox_pred = self.build_network(sess, training)

    layers_to_output = {'rois': rois}
    layers_to_output.update(self._predictions)

    for var in tf.trainable_variables():
      self._train_summaries.append(var)

    if mode == 'TEST':
      stds = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_STDS), (self._num_classes))
      means = np.tile(np.array(cfg.TRAIN.BBOX_NORMALIZE_MEANS), (self._num_classes))
      self._predictions["bbox_pred"] *= stds
      self._predictions["bbox_pred"] += means
    else:
      self._add_losses()
      layers_to_output.update(self._losses)

    val_summaries = []
    with tf.device("/cpu:0"):
      val_summaries.append(self._add_image_summary(self._image, self._gt_boxes))
      for key, var in self._event_summaries.items():
        val_summaries.append(tf.summary.scalar(key, var))
      for key, var in self._score_summaries.items():
        self._add_score_summary(key, var)
      for var in self._act_summaries:
        self._add_act_summary(var)
      for var in self._train_summaries:
        self._add_train_summary(var)

    self._summary_op = tf.summary.merge_all()
    if not testing:
      self._summary_op_val = tf.summary.merge(val_summaries)

    return layers_to_output
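
  # Minimal usage sketch (hypothetical driver code in the style of
  # tools/demo_graspRGD.py; `vgg16` is the subclass from lib/nets/vgg16.py and
  # `ckpt_path` is assumed to point at a trained checkpoint):
  #
  #   net = vgg16(batch_size=1)
  #   net.create_architecture(sess, 'TEST', num_classes, tag='default',
  #                           anchor_scales=(8, 16, 32))
  #   saver = tf.train.Saver()
  #   saver.restore(sess, ckpt_path)
  #   scores, boxes = im_detect(sess, net, im)  # lib/model/test.py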

  def get_variables_to_restore(self, variables, var_keep_dic):
    raise NotImplementedError

  def fix_variables(self, sess, pretrained_model):
    raise NotImplementedError

  # Extract the head feature maps (e.g. conv5_3 for vgg16);
  # only useful during testing mode
  def extract_head(self, sess, image):
    feed_dict = {self._image: image}
    feat = sess.run(self._layers["head"], feed_dict=feed_dict)
    return feat

  # only useful during testing mode
  def test_image(self, sess, image, im_info):
    feed_dict = {self._image: image,
                 self._im_info: im_info}
    cls_score, cls_prob, bbox_pred, rois = sess.run([self._predictions["cls_score"],
                                                     self._predictions['cls_prob'],
                                                     self._predictions['bbox_pred'],
                                                     self._predictions['rois']],
                                                    feed_dict=feed_dict)
    return cls_score, cls_prob, bbox_pred, rois
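
  # Note that in TEST mode the bbox_pred returned here has already been
  # de-normalized (scaled by BBOX_NORMALIZE_STDS and shifted by
  # BBOX_NORMALIZE_MEANS) inside create_architecture.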

  def get_summary(self, sess, blobs):
    feed_dict = {self._image: blobs['data'], self._im_info: blobs['im_info'],
                 self._gt_boxes: blobs['gt_boxes']}
    summary = sess.run(self._summary_op_val, feed_dict=feed_dict)
    return summary

  def train_step(self, sess, blobs, train_op):
    feed_dict = {self._image: blobs['data'], self._im_info: blobs['im_info'],
                 self._gt_boxes: blobs['gt_boxes']}
    rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, loss, _ = sess.run([self._losses["rpn_cross_entropy"],
                                                                        self._losses['rpn_loss_box'],
                                                                        self._losses['cross_entropy'],
                                                                        self._losses['loss_box'],
                                                                        self._losses['total_loss'],
                                                                        train_op],
                                                                       feed_dict=feed_dict)
    return rpn_loss_cls, rpn_loss_box, loss_cls, loss_box, loss

================================================
SYMBOL INDEX (484 symbols across 39 files)
================================================
FILE: lib/datasets/coco.py
class coco (line 27) | class coco(imdb):
method __init__ (line 28) | def __init__(self, image_set, year):
method _get_ann_file (line 65) | def _get_ann_file(self):
method _load_image_set_index (line 71) | def _load_image_set_index(self):
method _get_widths (line 78) | def _get_widths(self):
method image_path_at (line 83) | def image_path_at(self, i):
method image_path_from_index (line 89) | def image_path_from_index(self, index):
method gt_roidb (line 103) | def gt_roidb(self):
method _load_coco_annotation (line 123) | def _load_coco_annotation(self, index):
method _get_widths (line 181) | def _get_widths(self):
method append_flipped_images (line 184) | def append_flipped_images(self):
method _get_box_file (line 205) | def _get_box_file(self, index):
method _print_detection_eval_metrics (line 212) | def _print_detection_eval_metrics(self, coco_eval):
method _do_detection_eval (line 245) | def _do_detection_eval(self, res_file, output_dir):
method _coco_results_one_category (line 258) | def _coco_results_one_category(self, boxes, cat_id):
method _write_coco_results_file (line 276) | def _write_coco_results_file(self, all_boxes, res_file):
method evaluate_detections (line 294) | def evaluate_detections(self, all_boxes, output_dir):
method competition_mode (line 310) | def competition_mode(self, on):
FILE: lib/datasets/ds_utils.py
function unique_boxes (line 13) | def unique_boxes(boxes, scale=1.0):
function xywh_to_xyxy (line 21) | def xywh_to_xyxy(boxes):
function xyxy_to_xywh (line 26) | def xyxy_to_xywh(boxes):
function validate_boxes (line 31) | def validate_boxes(boxes, width=0, height=0):
function filter_small_boxes (line 45) | def filter_small_boxes(boxes, min_size):
FILE: lib/datasets/factory.py
function get_imdb (line 44) | def get_imdb(name):
function list_imdbs (line 51) | def list_imdbs():
FILE: lib/datasets/graspRGB.py
class graspRGB (line 23) | class graspRGB(imdb):
method __init__ (line 24) | def __init__(self, image_set, devkit_path):
method image_path_at (line 55) | def image_path_at(self, i):
method image_path_from_index (line 61) | def image_path_from_index(self, index):
method _load_image_set_index (line 74) | def _load_image_set_index(self):
method gt_roidb (line 89) | def gt_roidb(self):
method selective_search_roidb (line 110) | def selective_search_roidb(self):
method _load_selective_search_roidb (line 139) | def _load_selective_search_roidb(self, gt_roidb):
method selective_search_IJCV_roidb (line 153) | def selective_search_IJCV_roidb(self):
method _load_selective_search_IJCV_roidb (line 179) | def _load_selective_search_IJCV_roidb(self, gt_roidb):
method _load_graspRGB_annotation (line 195) | def _load_graspRGB_annotation(self, index):
method _write_graspRGB_results_file (line 255) | def _write_graspRGB_results_file(self, all_boxes):
method _do_matlab_eval (line 281) | def _do_matlab_eval(self, comp_id, output_dir='output'):
method evaluate_detections (line 295) | def evaluate_detections(self, all_boxes, output_dir):
method competition_mode (line 299) | def competition_mode(self, on):
FILE: lib/datasets/imdb.py
class imdb (line 20) | class imdb(object):
method __init__ (line 23) | def __init__(self, name, classes=None):
method name (line 38) | def name(self):
method num_classes (line 42) | def num_classes(self):
method classes (line 46) | def classes(self):
method image_index (line 50) | def image_index(self):
method roidb_handler (line 54) | def roidb_handler(self):
method roidb_handler (line 58) | def roidb_handler(self, val):
method set_proposal_method (line 61) | def set_proposal_method(self, method):
method roidb (line 66) | def roidb(self):
method cache_path (line 78) | def cache_path(self):
method num_images (line 85) | def num_images(self):
method image_path_at (line 88) | def image_path_at(self, i):
method default_roidb (line 91) | def default_roidb(self):
method evaluate_detections (line 94) | def evaluate_detections(self, all_boxes, output_dir=None):
method _get_widths (line 105) | def _get_widths(self):
method append_flipped_images (line 109) | def append_flipped_images(self):
method evaluate_recall (line 126) | def evaluate_recall(self, candidate_boxes=None, thresholds=None,
method create_roidb_from_box_list (line 216) | def create_roidb_from_box_list(self, box_list, gt_roidb):
method merge_roidbs (line 246) | def merge_roidbs(a, b):
method competition_mode (line 258) | def competition_mode(self, on):
FILE: lib/datasets/pascal_voc.py
class pascal_voc (line 26) | class pascal_voc(imdb):
method __init__ (line 27) | def __init__(self, image_set, year, devkit_path=None):
method image_path_at (line 60) | def image_path_at(self, i):
method image_path_from_index (line 66) | def image_path_from_index(self, index):
method _load_image_set_index (line 76) | def _load_image_set_index(self):
method _get_default_path (line 90) | def _get_default_path(self):
method gt_roidb (line 96) | def gt_roidb(self):
method rpn_roidb (line 120) | def rpn_roidb(self):
method _load_rpn_roidb (line 130) | def _load_rpn_roidb(self, gt_roidb):
method _load_pascal_annotation (line 139) | def _load_pascal_annotation(self, index):
method _get_comp_id (line 185) | def _get_comp_id(self):
method _get_voc_results_file_template (line 190) | def _get_voc_results_file_template(self):
method _write_voc_results_file (line 201) | def _write_voc_results_file(self, all_boxes):
method _do_python_eval (line 219) | def _do_python_eval(self, output_dir='output'):
method _do_matlab_eval (line 264) | def _do_matlab_eval(self, output_dir='output'):
method evaluate_detections (line 279) | def evaluate_detections(self, all_boxes, output_dir):
method competition_mode (line 291) | def competition_mode(self, on):
FILE: lib/datasets/tools/mcg_munge.py
function munge (line 15) | def munge(src_dir):
FILE: lib/datasets/voc_eval.py
function parse_rec (line 15) | def parse_rec(filename):
function voc_ap (line 35) | def voc_ap(rec, prec, use_07_metric=False):
function voc_eval (line 69) | def voc_eval(detpath,
FILE: lib/layer_utils/anchor_target_layer.py
function anchor_target_layer (line 18) | def anchor_target_layer(rpn_cls_score, gt_boxes, im_info, _feat_stride, ...
function _unmap (line 142) | def _unmap(data, count, inds, fill=0):
function _compute_targets (line 156) | def _compute_targets(ex_rois, gt_rois):
FILE: lib/layer_utils/generate_anchors.py
function generate_anchors (line 41) | def generate_anchors(base_size=16, ratios=[0.5, 1, 2],
function _whctrs (line 55) | def _whctrs(anchor):
function _mkanchors (line 67) | def _mkanchors(ws, hs, x_ctr, y_ctr):
function _ratio_enum (line 82) | def _ratio_enum(anchor, ratios):
function _scale_enum (line 96) | def _scale_enum(anchor, scales):
FILE: lib/layer_utils/proposal_layer.py
function proposal_layer (line 16) | def proposal_layer(rpn_cls_prob, rpn_bbox_pred, im_info, cfg_key, _feat_...
FILE: lib/layer_utils/proposal_target_layer.py
function proposal_target_layer (line 18) | def proposal_target_layer(rpn_rois, rpn_scores, gt_boxes, _num_classes):
function _get_bbox_regression_labels (line 58) | def _get_bbox_regression_labels(bbox_target_data, num_classes):
function _compute_targets (line 83) | def _compute_targets(ex_rois, gt_rois, labels):
function _sample_rois (line 99) | def _sample_rois(all_rois, all_scores, gt_boxes, fg_rois_per_image, rois...
FILE: lib/layer_utils/proposal_top_layer.py
function proposal_top_layer (line 15) | def proposal_top_layer(rpn_cls_prob, rpn_bbox_pred, im_info, _feat_strid...
FILE: lib/layer_utils/snippets.py
function generate_anchors_pre (line 17) | def generate_anchors_pre(height, width, feat_stride, anchor_scales=(8,16...
FILE: lib/model/bbox_transform.py
function bbox_transform (line 13) | def bbox_transform(ex_rois, gt_rois):
function bbox_transform_inv (line 34) | def bbox_transform_inv(boxes, deltas):
function clip_boxes (line 67) | def clip_boxes(boxes, im_shape):
FILE: lib/model/config.py
function get_output_dir (line 274) | def get_output_dir(imdb, weights_filename):
function get_output_tb_dir (line 290) | def get_output_tb_dir(imdb, weights_filename):
function _merge_a_into_b (line 306) | def _merge_a_into_b(a, b):
function cfg_from_file (line 339) | def cfg_from_file(filename):
function cfg_from_list (line 348) | def cfg_from_list(cfg_list):
FILE: lib/model/nms_wrapper.py
function nms (line 15) | def nms(dets, thresh, force_cpu=False):
FILE: lib/model/test.py
function _get_image_blob (line 27) | def _get_image_blob(im):
function _get_blobs (line 61) | def _get_blobs(im):
function _clip_boxes (line 68) | def _clip_boxes(boxes, im_shape):
function _rescale_boxes (line 80) | def _rescale_boxes(boxes, inds, scales):
function im_detect (line 87) | def im_detect(sess, net, im):
function apply_nms (line 113) | def apply_nms(all_boxes, thresh):
function test_net (line 142) | def test_net(sess, net, imdb, weights_filename, max_per_image=100, thres...
FILE: lib/model/train_val.py
class SolverWrapper (line 27) | class SolverWrapper(object):
method __init__ (line 32) | def __init__(self, sess, network, imdb, roidb, valroidb, output_dir, t...
method snapshot (line 45) | def snapshot(self, sess, iter):
method get_variables_in_checkpoint_file (line 82) | def get_variables_in_checkpoint_file(self, file_name):
method train_model (line 93) | def train_model(self, sess, max_iters):
function get_training_roidb (line 286) | def get_training_roidb(imdb):
function filter_roidb (line 300) | def filter_roidb(roidb):
function train_net (line 325) | def train_net(network, imdb, roidb, valroidb, output_dir, tb_dir,
FILE: lib/nets/network.py
class Network (line 25) | class Network(object):
method __init__ (line 26) | def __init__(self, batch_size=1):
method _add_image_summary (line 41) | def _add_image_summary(self, image, boxes):
method _add_act_summary (line 63) | def _add_act_summary(self, tensor):
method _add_score_summary (line 68) | def _add_score_summary(self, key, tensor):
method _add_train_summary (line 71) | def _add_train_summary(self, var):
method _reshape_layer (line 74) | def _reshape_layer(self, bottom, num_dim, name):
method _softmax_layer (line 86) | def _softmax_layer(self, bottom, name):
method _proposal_top_layer (line 94) | def _proposal_top_layer(self, rpn_cls_prob, rpn_bbox_pred, name):
method _proposal_layer (line 105) | def _proposal_layer(self, rpn_cls_prob, rpn_bbox_pred, name):
method _roi_pool_layer (line 117) | def _roi_pool_layer(self, bottom, rois, name):
method _crop_pool_layer (line 124) | def _crop_pool_layer(self, bottom, rois, name):
method _dropout_layer (line 142) | def _dropout_layer(self, bottom, name, ratio=0.5):
method _anchor_target_layer (line 145) | def _anchor_target_layer(self, rpn_cls_score, name):
method _proposal_target_layer (line 167) | def _proposal_target_layer(self, rois, roi_scores, name):
method _anchor_component (line 191) | def _anchor_component(self):
method build_network (line 205) | def build_network(self, sess, is_training=True):
method _smooth_l1_loss (line 208) | def _smooth_l1_loss(self, bbox_pred, bbox_targets, bbox_inside_weights...
method _add_losses (line 223) | def _add_losses(self, sigma_rpn=3.0):
method create_architecture (line 271) | def create_architecture(self, sess, mode, num_classes, tag=None,
method get_variables_to_restore (line 341) | def get_variables_to_restore(self, variables, var_keep_dic):
method fix_variables (line 344) | def fix_variables(self, sess, pretrained_model):
method extract_head (line 349) | def extract_head(self, sess, image):
method test_image (line 355) | def test_image(self, sess, image, im_info):
method get_summary (line 365) | def get_summary(self, sess, blobs):
method train_step (line 372) | def train_step(self, sess, blobs, train_op):
method train_step_with_summary (line 384) | def train_step_with_summary(self, sess, blobs, train_op):
method train_step_no_return (line 397) | def train_step_no_return(self, sess, blobs, train_op):
FILE: lib/nets/resnet_v1.py
function resnet_arg_scope (line 26) | def resnet_arg_scope(is_training=True,
class resnetv1 (line 53) | class resnetv1(Network):
method __init__ (line 54) | def __init__(self, batch_size=1, num_layers=50):
method _crop_pool_layer (line 59) | def _crop_pool_layer(self, bottom, rois, name):
method build_base (line 84) | def build_base(self):
method build_network (line 92) | def build_network(self, sess, is_training=True):
method get_variables_to_restore (line 241) | def get_variables_to_restore(self, variables, var_keep_dic):
method fix_variables (line 265) | def fix_variables(self, sess, pretrained_model):
FILE: lib/nets/vgg16.py
class vgg16 (line 19) | class vgg16(Network):
method __init__ (line 20) | def __init__(self, batch_size=1):
method build_network (line 23) | def build_network(self, sess, is_training=True):
method get_variables_to_restore (line 115) | def get_variables_to_restore(self, variables, var_keep_dic):
method fix_variables (line 133) | def fix_variables(self, sess, pretrained_model):
FILE: lib/nms/cpu_nms.c
type Py_ssize_t (line 61) | typedef int Py_ssize_t;
type Py_buffer (line 89) | typedef struct {
type Py_hash_t (line 241) | typedef long Py_hash_t;
function CYTHON_INLINE (line 310) | static CYTHON_INLINE float __PYX_NAN() {
type __Pyx_StringTabEntry (line 369) | typedef struct {PyObject **p; char *s; const Py_ssize_t n; const char* e...
function CYTHON_INLINE (line 409) | static CYTHON_INLINE size_t __Pyx_Py_UNICODE_strlen(const Py_UNICODE *u)
function __Pyx_init_sys_getdefaultencoding_params (line 435) | static int __Pyx_init_sys_getdefaultencoding_params(void) {
function __Pyx_init_sys_getdefaultencoding_params (line 484) | static int __Pyx_init_sys_getdefaultencoding_params(void) {
type __Pyx_StructField_ (line 559) | struct __Pyx_StructField_
type __Pyx_TypeInfo (line 561) | typedef struct {
type __Pyx_StructField (line 571) | typedef struct __Pyx_StructField_ {
type __Pyx_BufFmt_StackElem (line 576) | typedef struct {
type __Pyx_BufFmt_Context (line 580) | typedef struct {
type npy_int8 (line 601) | typedef npy_int8 __pyx_t_5numpy_int8_t;
type npy_int16 (line 610) | typedef npy_int16 __pyx_t_5numpy_int16_t;
type npy_int32 (line 619) | typedef npy_int32 __pyx_t_5numpy_int32_t;
type npy_int64 (line 628) | typedef npy_int64 __pyx_t_5numpy_int64_t;
type npy_uint8 (line 637) | typedef npy_uint8 __pyx_t_5numpy_uint8_t;
type npy_uint16 (line 646) | typedef npy_uint16 __pyx_t_5numpy_uint16_t;
type npy_uint32 (line 655) | typedef npy_uint32 __pyx_t_5numpy_uint32_t;
type npy_uint64 (line 664) | typedef npy_uint64 __pyx_t_5numpy_uint64_t;
type npy_float32 (line 673) | typedef npy_float32 __pyx_t_5numpy_float32_t;
type npy_float64 (line 682) | typedef npy_float64 __pyx_t_5numpy_float64_t;
type npy_long (line 691) | typedef npy_long __pyx_t_5numpy_int_t;
type npy_longlong (line 700) | typedef npy_longlong __pyx_t_5numpy_long_t;
type npy_longlong (line 709) | typedef npy_longlong __pyx_t_5numpy_longlong_t;
type npy_ulong (line 718) | typedef npy_ulong __pyx_t_5numpy_uint_t;
type npy_ulonglong (line 727) | typedef npy_ulonglong __pyx_t_5numpy_ulong_t;
type npy_ulonglong (line 736) | typedef npy_ulonglong __pyx_t_5numpy_ulonglong_t;
type npy_intp (line 745) | typedef npy_intp __pyx_t_5numpy_intp_t;
type npy_uintp (line 754) | typedef npy_uintp __pyx_t_5numpy_uintp_t;
type npy_double (line 763) | typedef npy_double __pyx_t_5numpy_float_t;
type npy_double (line 772) | typedef npy_double __pyx_t_5numpy_double_t;
type npy_longdouble (line 781) | typedef npy_longdouble __pyx_t_5numpy_longdouble_t;
type std (line 784) | typedef ::std::complex< float > __pyx_t_float_complex;
type __pyx_t_float_complex (line 786) | typedef float _Complex __pyx_t_float_complex;
type __pyx_t_float_complex (line 789) | typedef struct { float real, imag; } __pyx_t_float_complex;
type std (line 794) | typedef ::std::complex< double > __pyx_t_double_complex;
type __pyx_t_double_complex (line 796) | typedef double _Complex __pyx_t_double_complex;
type __pyx_t_double_complex (line 799) | typedef struct { double real, imag; } __pyx_t_double_complex;
type npy_cfloat (line 812) | typedef npy_cfloat __pyx_t_5numpy_cfloat_t;
type npy_cdouble (line 821) | typedef npy_cdouble __pyx_t_5numpy_cdouble_t;
type npy_clongdouble (line 830) | typedef npy_clongdouble __pyx_t_5numpy_clongdouble_t;
type npy_cdouble (line 839) | typedef npy_cdouble __pyx_t_5numpy_complex_t;
type __Pyx_RefNannyAPIStruct (line 844) | typedef struct {
function CYTHON_INLINE (line 903) | static CYTHON_INLINE PyObject* __Pyx_PyObject_GetAttrStr(PyObject* obj, ...
function CYTHON_INLINE (line 949) | static CYTHON_INLINE int __Pyx_PyList_Append(PyObject* list, PyObject* x) {
type __Pyx_Buf_DimInfo (line 979) | typedef struct {
type __Pyx_Buffer (line 982) | typedef struct {
type __Pyx_LocalBuf_ND (line 986) | typedef struct {
type __Pyx_CodeObjectCacheEntry (line 1126) | typedef struct {
type __Pyx_CodeObjectCache (line 1130) | struct __Pyx_CodeObjectCache {
type __Pyx_CodeObjectCache (line 1135) | struct __Pyx_CodeObjectCache
function CYTHON_INLINE (line 1341) | static CYTHON_INLINE __pyx_t_5numpy_float32_t __pyx_f_3nms_7cpu_nms_max(...
function CYTHON_INLINE (line 1384) | static CYTHON_INLINE __pyx_t_5numpy_float32_t __pyx_f_3nms_7cpu_nms_min(...
function PyObject (line 1430) | static PyObject *__pyx_pw_3nms_7cpu_nms_1cpu_nms(PyObject *__pyx_self, P...
function PyObject (line 1495) | static PyObject *__pyx_pf_3nms_7cpu_nms_cpu_nms(CYTHON_UNUSED PyObject *...
function CYTHON_UNUSED (line 2348) | static CYTHON_UNUSED int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObjec...
function __pyx_pf_5numpy_7ndarray___getbuffer__ (line 2359) | static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v...
function CYTHON_UNUSED (line 3144) | static CYTHON_UNUSED void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(Py...
function __pyx_pf_5numpy_7ndarray_2__releasebuffer__ (line 3153) | static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *_...
function CYTHON_INLINE (line 3222) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew1(PyOb...
function CYTHON_INLINE (line 3272) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew2(PyOb...
function CYTHON_INLINE (line 3322) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew3(PyOb...
function CYTHON_INLINE (line 3372) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew4(PyOb...
function CYTHON_INLINE (line 3422) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew5(PyOb...
function CYTHON_INLINE (line 3472) | static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Desc...
function CYTHON_INLINE (line 4176) | static CYTHON_INLINE void __pyx_f_5numpy_set_array_base(PyArrayObject *_...
function CYTHON_INLINE (line 4264) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_get_array_base(PyArrayObje...
type PyModuleDef (line 4325) | struct PyModuleDef
function __Pyx_InitCachedBuiltins (line 4397) | static int __Pyx_InitCachedBuiltins(void) {
function __Pyx_InitCachedConstants (line 4406) | static int __Pyx_InitCachedConstants(void) {
function __Pyx_InitGlobals (line 4575) | static int __Pyx_InitGlobals(void) {
function PyMODINIT_FUNC (line 4593) | PyMODINIT_FUNC PyInit_cpu_nms(void)
function __Pyx_RefNannyAPIStruct (line 4746) | static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modn...
function PyObject (line 4761) | static PyObject *__Pyx_GetBuiltinName(PyObject *name) {
function __Pyx_RaiseArgtupleInvalid (line 4774) | static void __Pyx_RaiseArgtupleInvalid(
function __Pyx_RaiseDoubleKeywordsError (line 4799) | static void __Pyx_RaiseDoubleKeywordsError(
function __Pyx_ParseOptionalKeywords (line 4812) | static int __Pyx_ParseOptionalKeywords(
function __Pyx_RaiseArgumentTypeInvalid (line 4913) | static void __Pyx_RaiseArgumentTypeInvalid(const char* name, PyObject *o...
function CYTHON_INLINE (line 4918) | static CYTHON_INLINE int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *...
function CYTHON_INLINE (line 4939) | static CYTHON_INLINE int __Pyx_IsLittleEndian(void) {
function __Pyx_BufFmt_Init (line 4943) | static void __Pyx_BufFmt_Init(__Pyx_BufFmt_Context* ctx,
function __Pyx_BufFmt_ParseNumber (line 4970) | static int __Pyx_BufFmt_ParseNumber(const char** ts) {
function __Pyx_BufFmt_ExpectNumber (line 4985) | static int __Pyx_BufFmt_ExpectNumber(const char **ts) {
function __Pyx_BufFmt_RaiseUnexpectedChar (line 4992) | static void __Pyx_BufFmt_RaiseUnexpectedChar(char ch) {
function __Pyx_BufFmt_TypeCharToStandardSize (line 5020) | static size_t __Pyx_BufFmt_TypeCharToStandardSize(char ch, int is_comple...
function __Pyx_BufFmt_TypeCharToNativeSize (line 5038) | static size_t __Pyx_BufFmt_TypeCharToNativeSize(char ch, int is_complex) {
type __Pyx_st_short (line 5057) | typedef struct { char c; short x; } __Pyx_st_short;
type __Pyx_st_int (line 5058) | typedef struct { char c; int x; } __Pyx_st_int;
type __Pyx_st_long (line 5059) | typedef struct { char c; long x; } __Pyx_st_long;
type __Pyx_st_float (line 5060) | typedef struct { char c; float x; } __Pyx_st_float;
type __Pyx_st_double (line 5061) | typedef struct { char c; double x; } __Pyx_st_double;
type __Pyx_st_longdouble (line 5062) | typedef struct { char c; long double x; } __Pyx_st_longdouble;
type __Pyx_st_void_p (line 5063) | typedef struct { char c; void *x; } __Pyx_st_void_p;
type __Pyx_st_longlong (line 5065) | typedef struct { char c; PY_LONG_LONG x; } __Pyx_st_longlong;
function __Pyx_BufFmt_TypeCharToAlignment (line 5067) | static size_t __Pyx_BufFmt_TypeCharToAlignment(char ch, CYTHON_UNUSED in...
type __Pyx_pad_short (line 5089) | typedef struct { short x; char c; } __Pyx_pad_short;
type __Pyx_pad_int (line 5090) | typedef struct { int x; char c; } __Pyx_pad_int;
type __Pyx_pad_long (line 5091) | typedef struct { long x; char c; } __Pyx_pad_long;
type __Pyx_pad_float (line 5092) | typedef struct { float x; char c; } __Pyx_pad_float;
type __Pyx_pad_double (line 5093) | typedef struct { double x; char c; } __Pyx_pad_double;
type __Pyx_pad_longdouble (line 5094) | typedef struct { long double x; char c; } __Pyx_pad_longdouble;
type __Pyx_pad_void_p (line 5095) | typedef struct { void *x; char c; } __Pyx_pad_void_p;
type __Pyx_pad_longlong (line 5097) | typedef struct { PY_LONG_LONG x; char c; } __Pyx_pad_longlong;
function __Pyx_BufFmt_TypeCharToPadding (line 5099) | static size_t __Pyx_BufFmt_TypeCharToPadding(char ch, CYTHON_UNUSED int ...
function __Pyx_BufFmt_TypeCharToGroup (line 5117) | static char __Pyx_BufFmt_TypeCharToGroup(char ch, int is_complex) {
function __Pyx_BufFmt_RaiseExpected (line 5138) | static void __Pyx_BufFmt_RaiseExpected(__Pyx_BufFmt_Context* ctx) {
function __Pyx_BufFmt_ProcessTypeChunk (line 5162) | static int __Pyx_BufFmt_ProcessTypeChunk(__Pyx_BufFmt_Context* ctx) {
function CYTHON_INLINE (line 5264) | static CYTHON_INLINE PyObject *
function CYTHON_INLINE (line 5437) | static CYTHON_INLINE void __Pyx_ZeroBuffer(Py_buffer* buf) {
function CYTHON_INLINE (line 5444) | static CYTHON_INLINE int __Pyx_GetBufferAndValidate(
function CYTHON_INLINE (line 5478) | static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info) {
function CYTHON_INLINE (line 5484) | static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *typ...
function CYTHON_INLINE (line 5497) | static CYTHON_INLINE PyObject* __Pyx_PyObject_Call(PyObject *func, PyObj...
function __Pyx_RaiseBufferIndexError (line 5536) | static void __Pyx_RaiseBufferIndexError(int axis) {
function CYTHON_INLINE (line 5541) | static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *val...
function CYTHON_INLINE (line 5558) | static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **val...
function CYTHON_INLINE (line 5737) | static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expec...
function CYTHON_INLINE (line 5742) | static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t inde...
function CYTHON_INLINE (line 5748) | static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void) {
function __Pyx_GetBuffer (line 5753) | static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags) {
function __Pyx_ReleaseBuffer (line 5779) | static void __Pyx_ReleaseBuffer(Py_buffer *view) {
function PyObject (line 5817) | static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, int l...
function CYTHON_INLINE (line 5899) | static CYTHON_INLINE PyObject* __Pyx_PyInt_From_int(int value) {
function CYTHON_INLINE (line 5946) | static CYTHON_INLINE int __Pyx_PyInt_As_int(PyObject *x) {
function CYTHON_INLINE (line 6041) | static CYTHON_INLINE PyObject* __Pyx_PyInt_From_long(long value) {
function CYTHON_INLINE (line 6069) | static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_pa...
function CYTHON_INLINE (line 6073) | static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_pa...
function CYTHON_INLINE (line 6078) | static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_pa...
function CYTHON_INLINE (line 6088) | static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex a, __pyx_t_fl...
function CYTHON_INLINE (line 6091) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_co...
function CYTHON_INLINE (line 6097) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_c...
function CYTHON_INLINE (line 6103) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_c...
function CYTHON_INLINE (line 6109) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_c...
function CYTHON_INLINE (line 6116) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_co...
function CYTHON_INLINE (line 6122) | static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex a) {
function CYTHON_INLINE (line 6125) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_c...
function CYTHON_INLINE (line 6132) | static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex z) {
function CYTHON_INLINE (line 6139) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_co...
function CYTHON_INLINE (line 6189) | static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_...
function CYTHON_INLINE (line 6193) | static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_...
function CYTHON_INLINE (line 6198) | static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_...
function CYTHON_INLINE (line 6208) | static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex a, __pyx_t_do...
function CYTHON_INLINE (line 6211) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_c...
function CYTHON_INLINE (line 6217) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_...
function CYTHON_INLINE (line 6223) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_...
function CYTHON_INLINE (line 6229) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_...
function CYTHON_INLINE (line 6236) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_c...
function CYTHON_INLINE (line 6242) | static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex a) {
function CYTHON_INLINE (line 6245) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_...
function CYTHON_INLINE (line 6252) | static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex z) {
function CYTHON_INLINE (line 6259) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_c...
function __Pyx_PyInt_As_long (line 6312) | static CYTHON_INLINE long __Pyx_PyInt_As_long(PyObject *x) {
function __Pyx_check_binary_version (line 6407) | static int __Pyx_check_binary_version(void) {
function PyObject (line 6428) | static PyObject *__Pyx_ImportModule(const char *name) {
function PyTypeObject (line 6445) | static PyTypeObject *__Pyx_ImportType(const char *module_name, const cha...
function __pyx_bisect_code_objects (line 6511) | static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries...
function PyCodeObject (line 6532) | static PyCodeObject *__pyx_find_code_object(int code_line) {
function __pyx_insert_code_object (line 6546) | static void __pyx_insert_code_object(int code_line, PyCodeObject* code_o...
function PyCodeObject (line 6593) | static PyCodeObject* __Pyx_CreateCodeObjectForTraceback(
function __Pyx_AddTraceback (line 6645) | static void __Pyx_AddTraceback(const char *funcname, int c_line,
function __Pyx_InitStrings (line 6673) | static int __Pyx_InitStrings(__Pyx_StringTabEntry *t) {
function CYTHON_INLINE (line 6703) | static CYTHON_INLINE PyObject* __Pyx_PyUnicode_FromString(char* c_str) {
function CYTHON_INLINE (line 6706) | static CYTHON_INLINE char* __Pyx_PyObject_AsString(PyObject* o) {
function CYTHON_INLINE (line 6770) | static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) {
function CYTHON_INLINE (line 6825) | static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) {
function CYTHON_INLINE (line 6854) | static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) {
FILE: lib/nms/gpu_nms.cpp
function CYTHON_INLINE (line 310) | static CYTHON_INLINE float __PYX_NAN() {
function CYTHON_INLINE (line 410) | static CYTHON_INLINE size_t __Pyx_Py_UNICODE_strlen(const Py_UNICODE *u)
function __Pyx_init_sys_getdefaultencoding_params (line 436) | static int __Pyx_init_sys_getdefaultencoding_params(void) {
function __Pyx_init_sys_getdefaultencoding_params (line 485) | static int __Pyx_init_sys_getdefaultencoding_params(void) {
type __Pyx_StructField_ (line 560) | struct __Pyx_StructField_
type __Pyx_StructField_ (line 564) | struct __Pyx_StructField_
type __Pyx_StructField_ (line 572) | struct __Pyx_StructField_ {
function CYTHON_INLINE (line 920) | static CYTHON_INLINE PyObject* __Pyx_PyObject_GetAttrStr(PyObject* obj, ...
type __Pyx_CodeObjectCache (line 1121) | struct __Pyx_CodeObjectCache {
type __Pyx_CodeObjectCache (line 1126) | struct __Pyx_CodeObjectCache
function PyObject (line 1287) | static PyObject *__pyx_pw_3nms_7gpu_nms_1gpu_nms(PyObject *__pyx_self, P...
function PyObject (line 1367) | static PyObject *__pyx_pf_3nms_7gpu_nms_gpu_nms(CYTHON_UNUSED PyObject *...
function CYTHON_UNUSED (line 1720) | static CYTHON_UNUSED int __pyx_pw_5numpy_7ndarray_1__getbuffer__(PyObjec...
function __pyx_pf_5numpy_7ndarray___getbuffer__ (line 1731) | static int __pyx_pf_5numpy_7ndarray___getbuffer__(PyArrayObject *__pyx_v...
function CYTHON_UNUSED (line 2516) | static CYTHON_UNUSED void __pyx_pw_5numpy_7ndarray_3__releasebuffer__(Py...
function __pyx_pf_5numpy_7ndarray_2__releasebuffer__ (line 2525) | static void __pyx_pf_5numpy_7ndarray_2__releasebuffer__(PyArrayObject *_...
function CYTHON_INLINE (line 2594) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew1(PyOb...
function CYTHON_INLINE (line 2644) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew2(PyOb...
function CYTHON_INLINE (line 2694) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew3(PyOb...
function CYTHON_INLINE (line 2744) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew4(PyOb...
function CYTHON_INLINE (line 2794) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_PyArray_MultiIterNew5(PyOb...
function CYTHON_INLINE (line 2844) | static CYTHON_INLINE char *__pyx_f_5numpy__util_dtypestring(PyArray_Desc...
function CYTHON_INLINE (line 3548) | static CYTHON_INLINE void __pyx_f_5numpy_set_array_base(PyArrayObject *_...
function CYTHON_INLINE (line 3636) | static CYTHON_INLINE PyObject *__pyx_f_5numpy_get_array_base(PyArrayObje...
type PyModuleDef (line 3697) | struct PyModuleDef
function __Pyx_InitCachedBuiltins (line 3750) | static int __Pyx_InitCachedBuiltins(void) {
function __Pyx_InitCachedConstants (line 3759) | static int __Pyx_InitCachedConstants(void) {
function __Pyx_InitGlobals (line 3883) | static int __Pyx_InitGlobals(void) {
function PyMODINIT_FUNC (line 3897) | PyMODINIT_FUNC PyInit_gpu_nms(void)
function __Pyx_RefNannyAPIStruct (line 4066) | static __Pyx_RefNannyAPIStruct *__Pyx_RefNannyImportAPI(const char *modn...
function __Pyx_RaiseArgtupleInvalid (line 4081) | static void __Pyx_RaiseArgtupleInvalid(
function __Pyx_RaiseDoubleKeywordsError (line 4106) | static void __Pyx_RaiseDoubleKeywordsError(
function __Pyx_ParseOptionalKeywords (line 4119) | static int __Pyx_ParseOptionalKeywords(
function __Pyx_RaiseArgumentTypeInvalid (line 4220) | static void __Pyx_RaiseArgumentTypeInvalid(const char* name, PyObject *o...
function CYTHON_INLINE (line 4225) | static CYTHON_INLINE int __Pyx_ArgTypeTest(PyObject *obj, PyTypeObject *...
function CYTHON_INLINE (line 4246) | static CYTHON_INLINE int __Pyx_IsLittleEndian(void) {
function __Pyx_BufFmt_Init (line 4250) | static void __Pyx_BufFmt_Init(__Pyx_BufFmt_Context* ctx,
function __Pyx_BufFmt_ParseNumber (line 4277) | static int __Pyx_BufFmt_ParseNumber(const char** ts) {
function __Pyx_BufFmt_ExpectNumber (line 4292) | static int __Pyx_BufFmt_ExpectNumber(const char **ts) {
function __Pyx_BufFmt_RaiseUnexpectedChar (line 4299) | static void __Pyx_BufFmt_RaiseUnexpectedChar(char ch) {
function __Pyx_BufFmt_TypeCharToStandardSize (line 4327) | static size_t __Pyx_BufFmt_TypeCharToStandardSize(char ch, int is_comple...
function __Pyx_BufFmt_TypeCharToNativeSize (line 4345) | static size_t __Pyx_BufFmt_TypeCharToNativeSize(char ch, int is_complex) {
function __Pyx_BufFmt_TypeCharToAlignment (line 4374) | static size_t __Pyx_BufFmt_TypeCharToAlignment(char ch, CYTHON_UNUSED in...
function __Pyx_BufFmt_TypeCharToPadding (line 4406) | static size_t __Pyx_BufFmt_TypeCharToPadding(char ch, CYTHON_UNUSED int ...
function __Pyx_BufFmt_TypeCharToGroup (line 4424) | static char __Pyx_BufFmt_TypeCharToGroup(char ch, int is_complex) {
function __Pyx_BufFmt_RaiseExpected (line 4445) | static void __Pyx_BufFmt_RaiseExpected(__Pyx_BufFmt_Context* ctx) {
function __Pyx_BufFmt_ProcessTypeChunk (line 4469) | static int __Pyx_BufFmt_ProcessTypeChunk(__Pyx_BufFmt_Context* ctx) {
function CYTHON_INLINE (line 4571) | static CYTHON_INLINE PyObject *
function CYTHON_INLINE (line 4744) | static CYTHON_INLINE void __Pyx_ZeroBuffer(Py_buffer* buf) {
function CYTHON_INLINE (line 4751) | static CYTHON_INLINE int __Pyx_GetBufferAndValidate(
function CYTHON_INLINE (line 4785) | static CYTHON_INLINE void __Pyx_SafeReleaseBuffer(Py_buffer* info) {
function PyObject (line 4791) | static PyObject *__Pyx_GetBuiltinName(PyObject *name) {
function CYTHON_INLINE (line 4822) | static CYTHON_INLINE PyObject* __Pyx_PyObject_Call(PyObject *func, PyObj...
function CYTHON_INLINE (line 4844) | static CYTHON_INLINE int __Pyx_TypeTest(PyObject *obj, PyTypeObject *typ...
function __Pyx_RaiseBufferIndexError (line 4856) | static void __Pyx_RaiseBufferIndexError(int axis) {
function CYTHON_INLINE (line 4861) | static CYTHON_INLINE PyObject* __Pyx_PyObject_GetSlice(
function __Pyx_RaiseBufferFallbackError (line 4958) | static void __Pyx_RaiseBufferFallbackError(void) {
function CYTHON_INLINE (line 4963) | static CYTHON_INLINE void __Pyx_ErrRestore(PyObject *type, PyObject *val...
function CYTHON_INLINE (line 4980) | static CYTHON_INLINE void __Pyx_ErrFetch(PyObject **type, PyObject **val...
function CYTHON_INLINE (line 5159) | static CYTHON_INLINE void __Pyx_RaiseTooManyValuesError(Py_ssize_t expec...
function CYTHON_INLINE (line 5164) | static CYTHON_INLINE void __Pyx_RaiseNeedMoreValuesError(Py_ssize_t inde...
function CYTHON_INLINE (line 5170) | static CYTHON_INLINE void __Pyx_RaiseNoneNotIterableError(void) {
function __Pyx_GetBuffer (line 5175) | static int __Pyx_GetBuffer(PyObject *obj, Py_buffer *view, int flags) {
function __Pyx_ReleaseBuffer (line 5201) | static void __Pyx_ReleaseBuffer(Py_buffer *view) {
function PyObject (line 5239) | static PyObject *__Pyx_Import(PyObject *name, PyObject *from_list, int l...
function CYTHON_INLINE (line 5342) | static CYTHON_INLINE npy_int32 __Pyx_PyInt_As_npy_int32(PyObject *x) {
function CYTHON_INLINE (line 5437) | static CYTHON_INLINE PyObject* __Pyx_PyInt_From_int(int value) {
function CYTHON_INLINE (line 5465) | static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_pa...
function CYTHON_INLINE (line 5469) | static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_pa...
function CYTHON_INLINE (line 5474) | static CYTHON_INLINE __pyx_t_float_complex __pyx_t_float_complex_from_pa...
function CYTHON_INLINE (line 5484) | static CYTHON_INLINE int __Pyx_c_eqf(__pyx_t_float_complex a, __pyx_t_fl...
function CYTHON_INLINE (line 5487) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_sumf(__pyx_t_float_co...
function CYTHON_INLINE (line 5493) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_difff(__pyx_t_float_c...
function CYTHON_INLINE (line 5499) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_prodf(__pyx_t_float_c...
function CYTHON_INLINE (line 5505) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_quotf(__pyx_t_float_c...
function CYTHON_INLINE (line 5512) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_negf(__pyx_t_float_co...
function CYTHON_INLINE (line 5518) | static CYTHON_INLINE int __Pyx_c_is_zerof(__pyx_t_float_complex a) {
function CYTHON_INLINE (line 5521) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_conjf(__pyx_t_float_c...
function CYTHON_INLINE (line 5528) | static CYTHON_INLINE float __Pyx_c_absf(__pyx_t_float_complex z) {
function CYTHON_INLINE (line 5535) | static CYTHON_INLINE __pyx_t_float_complex __Pyx_c_powf(__pyx_t_float_co...
function CYTHON_INLINE (line 5585) | static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_...
function CYTHON_INLINE (line 5589) | static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_...
function CYTHON_INLINE (line 5594) | static CYTHON_INLINE __pyx_t_double_complex __pyx_t_double_complex_from_...
function CYTHON_INLINE (line 5604) | static CYTHON_INLINE int __Pyx_c_eq(__pyx_t_double_complex a, __pyx_t_do...
function CYTHON_INLINE (line 5607) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_sum(__pyx_t_double_c...
function CYTHON_INLINE (line 5613) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_diff(__pyx_t_double_...
function CYTHON_INLINE (line 5619) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_prod(__pyx_t_double_...
function CYTHON_INLINE (line 5625) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_quot(__pyx_t_double_...
function CYTHON_INLINE (line 5632) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_neg(__pyx_t_double_c...
function CYTHON_INLINE (line 5638) | static CYTHON_INLINE int __Pyx_c_is_zero(__pyx_t_double_complex a) {
function CYTHON_INLINE (line 5641) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_conj(__pyx_t_double_...
function CYTHON_INLINE (line 5648) | static CYTHON_INLINE double __Pyx_c_abs(__pyx_t_double_complex z) {
function CYTHON_INLINE (line 5655) | static CYTHON_INLINE __pyx_t_double_complex __Pyx_c_pow(__pyx_t_double_c...
function CYTHON_INLINE (line 5708) | static CYTHON_INLINE int __Pyx_PyInt_As_int(PyObject *x) {
function CYTHON_INLINE (line 5803) | static CYTHON_INLINE PyObject* __Pyx_PyInt_From_long(long value) {
function __Pyx_PyInt_As_long (line 5834) | static CYTHON_INLINE long __Pyx_PyInt_As_long(PyObject *x) {
function __Pyx_check_binary_version (line 5929) | static int __Pyx_check_binary_version(void) {
function PyObject (line 5950) | static PyObject *__Pyx_ImportModule(const char *name) {
function PyTypeObject (line 5967) | static PyTypeObject *__Pyx_ImportType(const char *module_name, const cha...
function __pyx_bisect_code_objects (line 6033) | static int __pyx_bisect_code_objects(__Pyx_CodeObjectCacheEntry* entries...
function PyCodeObject (line 6054) | static PyCodeObject *__pyx_find_code_object(int code_line) {
function __pyx_insert_code_object (line 6068) | static void __pyx_insert_code_object(int code_line, PyCodeObject* code_o...
function PyCodeObject (line 6115) | static PyCodeObject* __Pyx_CreateCodeObjectForTraceback(
function __Pyx_AddTraceback (line 6167) | static void __Pyx_AddTraceback(const char *funcname, int c_line,
function __Pyx_InitStrings (line 6195) | static int __Pyx_InitStrings(__Pyx_StringTabEntry *t) {
function CYTHON_INLINE (line 6225) | static CYTHON_INLINE PyObject* __Pyx_PyUnicode_FromString(char* c_str) {
function CYTHON_INLINE (line 6228) | static CYTHON_INLINE char* __Pyx_PyObject_AsString(PyObject* o) {
function CYTHON_INLINE (line 6292) | static CYTHON_INLINE int __Pyx_PyObject_IsTrue(PyObject* x) {
function CYTHON_INLINE (line 6347) | static CYTHON_INLINE Py_ssize_t __Pyx_PyIndex_AsSsize_t(PyObject* b) {
function CYTHON_INLINE (line 6376) | static CYTHON_INLINE PyObject * __Pyx_PyInt_FromSize_t(size_t ival) {
FILE: lib/nms/py_cpu_nms.py
function py_cpu_nms (line 10) | def py_cpu_nms(dets, thresh):
FILE: lib/roi_data_layer/layer.py
class RoIDataLayer (line 21) | class RoIDataLayer(object):
method __init__ (line 24) | def __init__(self, roidb, num_classes, random=False):
method _shuffle_roidb_inds (line 32) | def _shuffle_roidb_inds(self):
method _get_next_minibatch_inds (line 64) | def _get_next_minibatch_inds(self):
method _get_next_minibatch (line 77) | def _get_next_minibatch(self):
method forward (line 87) | def forward(self):
FILE: lib/roi_data_layer/minibatch.py
function get_minibatch (line 19) | def get_minibatch(roidb, num_classes):
function _get_image_blob (line 54) | def _get_image_blob(roidb, scale_inds):
FILE: lib/roi_data_layer/roidb.py
function prepare_roidb (line 19) | def prepare_roidb(imdb):
FILE: lib/setup.py
function find_in_path (line 15) | def find_in_path(name, path):
function locate_cuda (line 24) | def locate_cuda():
function customize_compiler_for_nvcc (line 63) | def customize_compiler_for_nvcc(self):
class custom_build_ext (line 102) | class custom_build_ext(build_ext):
method build_extensions (line 103) | def build_extensions(self):
FILE: lib/utils/blob.py
function im_list_to_blob (line 17) | def im_list_to_blob(ims):
function prep_im_for_blob (line 33) | def prep_im_for_blob(im, pixel_means, target_size, max_size):
FILE: lib/utils/boxes_grid.py
function get_boxes_grid (line 16) | def get_boxes_grid(image_height, image_width):
FILE: lib/utils/nms.py
function nms (line 10) | def nms(dets, thresh):
FILE: lib/utils/timer.py
class Timer (line 10) | class Timer(object):
method __init__ (line 12) | def __init__(self):
method tic (line 19) | def tic(self):
method toc (line 24) | def toc(self, average=True):
FILE: tools/_init_paths.py
function add_path (line 4) | def add_path(path):
FILE: tools/demo_graspRGD.py
function Rotate2D (line 50) | def Rotate2D(pts,cnt,ang=scipy.pi/4):
function vis_detections (line 54) | def vis_detections(ax, image_name, im, class_name, dets, thresh=0.5):
function demo (line 105) | def demo(sess, net, image_name):
function parse_args (line 154) | def parse_args():
FILE: tools/demo_graspRGD_socket.py
function Rotate2D (line 66) | def Rotate2D(pts,cnt,ang=scipy.pi/4):
function vis_detections (line 70) | def vis_detections(ax, im, class_name, dets, thresh=0.5):
function compute_imgRot (line 121) | def compute_imgRot(frame):
function coordinate_img2table (line 171) | def coordinate_img2table(frame, u, v, rot):
function demo_process (line 237) | def demo_process(sess, net):
function parse_args (line 395) | def parse_args():
FILE: tools/demo_graspRGD_vis_mask.py
function Rotate2D (line 50) | def Rotate2D(pts,cnt,ang=scipy.pi/4):
function vis_detections (line 54) | def vis_detections(ax, image_name, im, class_name, dets, thresh=0.5):
function demo (line 106) | def demo(sess, net, image_name, mask_name):
function parse_args (line 159) | def parse_args():
FILE: tools/demo_graspRGD_vis_select.py
function Rotate2D (line 53) | def Rotate2D(pts,cnt,ang=scipy.pi/4):
function vis_detections (line 57) | def vis_detections(ax, image_name, im, class_name, dets, thresh=0.5):
function demo (line 114) | def demo(sess, net, image_name):
function parse_args (line 165) | def parse_args():
FILE: tools/trainval_net.py
function parse_args (line 24) | def parse_args():
function combined_roidb (line 62) | def combined_roidb(imdb_names):

================================================
CONDENSED PREVIEW (86 files: path, character count, and a content snippet each)
================================================
[
{
"path": "README.md",
"chars": 2603,
"preview": "# grasp_multiObject_multiGrasp\n\nThis is the implementation of our RA-L work 'Real-world Multi-object, Multi-grasp Detect"
},
{
"path": "data/scripts/dataPreprocessingTest_fasterrcnn_split.m",
"chars": 4488,
"preview": "%% script to test dataPreprocessing\r\n%% created by Fu-Jen Chu on 09/15/2016\r\n\r\nclose all\r\nclear\r\n\r\n%parpool(4)\r\naddpath("
},
{
"path": "data/scripts/dataPreprocessing_fasterrcnn.m",
"chars": 2649,
"preview": "function [imagesOut bbsOut] = dataPreprocessing( imageIn, bbsIn_all, cropSize, translationShiftNumber, roatateAngleNumbe"
},
{
"path": "data/scripts/fetch_faster_rcnn_models.sh",
"chars": 956,
"preview": "#!/bin/bash\n\nDIR=\"$( cd \"$( dirname \"${BASH_SOURCE[0]}\" )/../\" && pwd )\"\ncd $DIR\n\nNET=res101\nFILE=voc_0712_80k-110k.tgz\n"
},
{
"path": "experiments/cfgs/res101-lg.yml",
"chars": 472,
"preview": "EXP_DIR: res101-lg\nTRAIN:\n HAS_RPN: True\n IMS_PER_BATCH: 1\n BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True\n RPN_POSITIVE_O"
},
{
"path": "experiments/cfgs/res101.yml",
"chars": 347,
"preview": "EXP_DIR: res101\nTRAIN:\n HAS_RPN: True\n IMS_PER_BATCH: 1\n BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True\n RPN_POSITIVE_OVER"
},
{
"path": "experiments/cfgs/res50.yml",
"chars": 345,
"preview": "EXP_DIR: res50\nTRAIN:\n HAS_RPN: True\n IMS_PER_BATCH: 1\n BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True\n RPN_POSITIVE_OVERL"
},
{
"path": "experiments/cfgs/vgg16.yml",
"chars": 301,
"preview": "EXP_DIR: vgg16\nTRAIN:\n HAS_RPN: True\n IMS_PER_BATCH: 1\n BBOX_NORMALIZE_TARGETS_PRECOMPUTED: True\n RPN_POSITIVE_OVERL"
},
{
"path": "experiments/logs/.gitignore",
"chars": 8,
"preview": "*.txt.*\n"
},
{
"path": "experiments/scripts/convert_vgg16.sh",
"chars": 1701,
"preview": "#!/bin/bash\n\nset -x\nset -e\n\nexport PYTHONUNBUFFERED=\"True\"\n\nGPU_ID=$1\nDATASET=$2\nNET=vgg16\n\narray=( $@ )\nlen=${#array[@]"
},
{
"path": "experiments/scripts/test_faster_rcnn.sh",
"chars": 1743,
"preview": "#!/bin/bash\n\nset -x\nset -e\n\nexport PYTHONUNBUFFERED=\"True\"\n\nGPU_ID=$1\nDATASET=$2\nNET=$3\n\narray=( $@ )\nlen=${#array[@]}\nE"
},
{
"path": "experiments/scripts/train_faster_rcnn.sh",
"chars": 2441,
"preview": "#!/bin/bash\n\nset -x\nset -e\n\nexport PYTHONUNBUFFERED=\"True\"\n\nGPU_ID=$1\nDATASET=$2\nNET=$3\n\narray=( $@ )\nlen=${#array[@]}\nE"
},
{
"path": "experiments/scripts/train_faster_rcnn.sh~",
"chars": 2441,
"preview": "#!/bin/bash\n\nset -x\nset -e\n\nexport PYTHONUNBUFFERED=\"True\"\n\nGPU_ID=$1\nDATASET=$2\nNET=$3\n\narray=( $@ )\nlen=${#array[@]}\nE"
},
{
"path": "lib/Makefile",
"chars": 94,
"preview": "all:\n\tpython setup.py build_ext --inplace\n\trm -rf build\nclean:\n\trm -rf */*.pyc\n\trm -rf */*.so\n"
},
{
"path": "lib/datasets/VOCdevkit-matlab-wrapper/get_voc_opts.m",
"chars": 231,
"preview": "function VOCopts = get_voc_opts(path)\n\ntmp = pwd;\ncd(path);\ntry\n addpath('VOCcode');\n VOCinit;\ncatch\n rmpath('VOCcode"
},
{
"path": "lib/datasets/VOCdevkit-matlab-wrapper/voc_eval.m",
"chars": 1332,
"preview": "function res = voc_eval(path, comp_id, test_set, output_dir)\n\nVOCopts = get_voc_opts(path);\nVOCopts.testset = test_set;\n"
},
{
"path": "lib/datasets/VOCdevkit-matlab-wrapper/xVOCap.m",
"chars": 258,
"preview": "function ap = xVOCap(rec,prec)\r\n% From the PASCAL VOC 2011 devkit\r\n\r\nmrec=[0 ; rec ; 1];\r\nmpre=[0 ; prec ; 0];\r\nfor i=nu"
},
{
"path": "lib/datasets/__init__.py",
"chars": 248,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/datasets/coco.py",
"chars": 11804,
"preview": "# --------------------------------------------------------\n# Fast/er R-CNN\n# Licensed under The MIT License [see LICENSE"
},
{
"path": "lib/datasets/ds_utils.py",
"chars": 1402,
"preview": "# --------------------------------------------------------\n# Fast/er R-CNN\n# Licensed under The MIT License [see LICENSE"
},
{
"path": "lib/datasets/factory.py",
"chars": 1811,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/datasets/factory.py~",
"chars": 1811,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/datasets/graspRGB.py",
"chars": 12227,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/datasets/graspRGB.py~",
"chars": 12229,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/datasets/imdb.py",
"chars": 8972,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/datasets/pascal_voc.py",
"chars": 11182,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/datasets/tools/mcg_munge.py",
"chars": 1451,
"preview": "import os\nimport sys\n\n\"\"\"Hacky tool to convert file system layout of MCG boxes downloaded from\nhttp://www.eecs.berkeley."
},
{
"path": "lib/datasets/voc_eval.py",
"chars": 6635,
"preview": "# --------------------------------------------------------\n# Fast/er R-CNN\n# Licensed under The MIT License [see LICENSE"
},
{
"path": "lib/layer_utils/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lib/layer_utils/anchor_target_layer.py",
"chars": 6031,
"preview": "# --------------------------------------------------------\n# Faster R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed unde"
},
{
"path": "lib/layer_utils/generate_anchors.py",
"chars": 3129,
"preview": "# --------------------------------------------------------\n# Faster R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed unde"
},
{
"path": "lib/layer_utils/proposal_layer.py",
"chars": 1850,
"preview": "# --------------------------------------------------------\n# Faster R-CNN\n# Licensed under The MIT License [see LICENSE "
},
{
"path": "lib/layer_utils/proposal_target_layer.py",
"chars": 6017,
"preview": "# --------------------------------------------------------\n# Faster R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed unde"
},
{
"path": "lib/layer_utils/proposal_top_layer.py",
"chars": 1868,
"preview": "# --------------------------------------------------------\n# Faster R-CNN\n# Licensed under The MIT License [see LICENSE "
},
{
"path": "lib/layer_utils/snippets.py",
"chars": 1473,
"preview": "# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed under The MIT License [s"
},
{
"path": "lib/model/__init__.py",
"chars": 21,
"preview": "from . import config\n"
},
{
"path": "lib/model/bbox_transform.py",
"chars": 2544,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/model/config.py",
"chars": 11160,
"preview": "from __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport os\n"
},
{
"path": "lib/model/config.py~",
"chars": 11159,
"preview": "from __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport os\n"
},
{
"path": "lib/model/nms_wrapper.py",
"chars": 727,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/model/test.py",
"chars": 6518,
"preview": "# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed under The MIT License [s"
},
{
"path": "lib/model/test.py~",
"chars": 6534,
"preview": "# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed under The MIT License [s"
},
{
"path": "lib/model/train_val.py",
"chars": 13100,
"preview": "# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed under The MIT License [s"
},
{
"path": "lib/model/train_val.py~",
"chars": 13083,
"preview": "# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed under The MIT License [s"
},
{
"path": "lib/nets/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lib/nets/network.py",
"chars": 18402,
"preview": "# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed under The MIT License [s"
},
{
"path": "lib/nets/resnet_v1.py",
"chars": 13227,
"preview": "# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed under The MIT License [s"
},
{
"path": "lib/nets/resnet_v1.py~",
"chars": 13218,
"preview": "# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed under The MIT License [s"
},
{
"path": "lib/nets/vgg16.py",
"chars": 7550,
"preview": "# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed under The MIT License [s"
},
{
"path": "lib/nms/.gitignore",
"chars": 0,
"preview": ""
},
{
"path": "lib/nms/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "lib/nms/cpu_nms.c",
"chars": 292233,
"preview": "/* Generated by Cython 0.20.1 on Wed Oct 5 13:15:30 2016 */\n\n#define PY_SSIZE_T_CLEAN\n#ifndef CYTHON_USE_PYLONG_INTERNA"
},
{
"path": "lib/nms/cpu_nms.pyx",
"chars": 2241,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/nms/gpu_nms.cpp",
"chars": 269661,
"preview": "/* Generated by Cython 0.20.1 on Wed Oct 5 13:15:30 2016 */\n\n#define PY_SSIZE_T_CLEAN\n#ifndef CYTHON_USE_PYLONG_INTERNA"
},
{
"path": "lib/nms/gpu_nms.hpp",
"chars": 146,
"preview": "void _nms(int* keep_out, int* num_out, const float* boxes_host, int boxes_num,\n int boxes_dim, float nms_overla"
},
{
"path": "lib/nms/gpu_nms.pyx",
"chars": 1110,
"preview": "# --------------------------------------------------------\n# Faster R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed unde"
},
{
"path": "lib/nms/nms_kernel.cu",
"chars": 5064,
"preview": "// ------------------------------------------------------------------\n// Faster R-CNN\n// Copyright (c) 2015 Microsoft\n//"
},
{
"path": "lib/nms/py_cpu_nms.py",
"chars": 1051,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/roi_data_layer/__init__.py",
"chars": 248,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/roi_data_layer/layer.py",
"chars": 3001,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/roi_data_layer/minibatch.py",
"chars": 2602,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/roi_data_layer/minibatch.py~",
"chars": 2652,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/roi_data_layer/roidb.py",
"chars": 1963,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/setup.py",
"chars": 5587,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/setup.py~",
"chars": 5587,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/utils/.gitignore",
"chars": 20,
"preview": "*.c\n*.cpp\n*.h\n*.hpp\n"
},
{
"path": "lib/utils/__init__.py",
"chars": 248,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/utils/bbox.pyx",
"chars": 2994,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/utils/blob.py",
"chars": 1504,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/utils/boxes_grid.py",
"chars": 2599,
"preview": "# --------------------------------------------------------\n# Subcategory CNN\n# Copyright (c) 2015 CVGL Stanford\n# Licens"
},
{
"path": "lib/utils/nms.py",
"chars": 1008,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/utils/nms.pyx",
"chars": 4103,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "lib/utils/timer.py",
"chars": 948,
"preview": "# --------------------------------------------------------\n# Fast R-CNN\n# Copyright (c) 2015 Microsoft\n# Licensed under "
},
{
"path": "tools/_init_paths.py",
"chars": 324,
"preview": "import os.path as osp\nimport sys\n\ndef add_path(path):\n if path not in sys.path:\n sys.path.insert(0, path)\n\nthi"
},
{
"path": "tools/demo.py~",
"chars": 5396,
"preview": "#!/usr/bin/env python\n\n# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed u"
},
{
"path": "tools/demo_graspRGD.py",
"chars": 7889,
"preview": "#!/usr/bin/env python\n\n# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed u"
},
{
"path": "tools/demo_graspRGD.py~",
"chars": 7866,
"preview": "#!/usr/bin/env python\n\n# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed u"
},
{
"path": "tools/demo_graspRGD_socket.py",
"chars": 21248,
"preview": "#!/usr/bin/env python\n\n# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed u"
},
{
"path": "tools/demo_graspRGD_socket.py~",
"chars": 18681,
"preview": "#!/usr/bin/env python\n\n# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed u"
},
{
"path": "tools/demo_graspRGD_socket_drawer.py~",
"chars": 21574,
"preview": "#!/usr/bin/env python\n\n# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed u"
},
{
"path": "tools/demo_graspRGD_socket_save_to_rgbd.py~",
"chars": 18111,
"preview": "#!/usr/bin/env python\n\n# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed u"
},
{
"path": "tools/demo_graspRGD_vis_mask.py",
"chars": 8021,
"preview": "#!/usr/bin/env python\n\n# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed u"
},
{
"path": "tools/demo_graspRGD_vis_select.py",
"chars": 8270,
"preview": "#!/usr/bin/env python\n\n# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed u"
},
{
"path": "tools/eval_graspRGD.py~",
"chars": 11398,
"preview": "#!/usr/bin/env python\n\n# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed u"
},
{
"path": "tools/mask_gen.py",
"chars": 125,
"preview": "import numpy as np \nmask = np.zeros((227,227))\nmask[70:120,90:150]=1\nimport scipy.miscscipy.misc.imsave('mask.jpg', m"
},
{
"path": "tools/trainval_net.py",
"chars": 4531,
"preview": "# --------------------------------------------------------\n# Tensorflow Faster R-CNN\n# Licensed under The MIT License [s"
}
]
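One preview worth flagging: tools/mask_gen.py relies on scipy.misc.imsave, which has been removed from modern SciPy releases. A minimal runnable equivalent is sketched below, with imageio.imwrite substituted for the removed call; the substitution and the 0-255 scaling are my assumptions, not the repository's code.

import numpy as np
import imageio  # stands in for the removed scipy.misc.imsave

# 227x227 mask with a rectangular foreground region set to 1,
# matching the hard-coded indices in tools/mask_gen.py
mask = np.zeros((227, 227))
mask[70:120, 90:150] = 1

# Scale to 0-255 so the saved JPEG is visibly black and white
imageio.imwrite('mask.jpg', (mask * 255).astype(np.uint8))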
About this extraction
This document contains the full source code of the ivalab/grasp_multiObject_multiGrasp GitHub repository, formatted as plain text: 86 files (947.1 KB, approximately 298.3k tokens) plus a symbol index of 484 extracted functions, classes, methods, constants, and types.