[
  {
    "path": ".gitignore",
    "content": "__pycache*\nresult\n*.png"
  },
  {
    "path": "LICENSE",
    "content": "Copyright (c) 2018 DeNA Co., Ltd.\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\n of this software and associated documentation files (the \"Software\"), to deal \nin the Software without restriction, including without limitation the rights \nto use, copy, modify, merge, publish, distribute, and/or sublicense \ncopies of the Software, and to permit persons to whom the Software is \nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all \ncopies or substantial portions of the Software; and\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND,\nEXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF\nMERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.\nIN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY\nCLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,\nTORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE\nSOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.\n\n#####################################################################################\n# Chainer_Mask_R-CNN is designed based on chainercv's API.\n# Chainer_Mask_R-CNN's source code and documents contain the original chainercv ones.\n#####################################################################################\nCopyright (c) 2017 Yusuke Niitani.\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or without\nmodification, are permitted provided that the following conditions are\nmet:\n\n    * Redistributions of source code must retain the above copyright\n       notice, this list of conditions and the following disclaimer.\n\n    * Redistributions in binary form must reproduce the above\n       copyright notice, this list of conditions and the following\n       disclaimer in the documentation and/or other materials provided\n       with the 
distribution.\n\n    * Neither the name of the chainercv Developers nor the names of any\n       contributors may be used to endorse or promote products derived\n       from this software without specific prior written permission.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS\n\"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT\nLIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR\nA PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT\nOWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\nSPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\nLIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,\nDATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY\nTHEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\nOF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n\n#####################################################################################\n# Chainer_Mask_R-CNN is designed based on chainer's API.\n# Chainer_Mask_R-CNN's source code and documents contain the original chainer ones.\n#####################################################################################\nCopyright (c) 2015 Preferred Infrastructure, Inc.\nCopyright (c) 2015 Preferred Networks, Inc.\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with or without\nmodification, are permitted provided that the following conditions are\nmet:\n\n    * Redistributions of source code must retain the above copyright\n       notice, this list of conditions and the following disclaimer.\n\n    * Redistributions in binary form must reproduce the above\n       copyright notice, this list of conditions and the following\n       disclaimer in the documentation and/or other materials provided\n       with the distribution.\n\n    * Neither the name of the 
chainer Developers nor the names of any\n       contributors may be used to endorse or promote products derived\n       from this software without specific prior written permission.\n\nTHIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS\n\"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT\nLIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR\nA PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT\nOWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,\nSPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT\nLIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,\nDATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY\nTHEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT\n(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE\nOF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.\n"
  },
  {
    "path": "README.md",
    "content": "# Chainer\\_Mask\\_R-CNN   \nChainer implementation of Mask R-CNN - the multi-task network for object detection, object classification, and instance segmentation.\n(https://arxiv.org/abs/1703.06870)   \n<a href=\"README_JP.md\">日本語版 README</a>   \n\n## What's New\n\n- Training result for R-50-C4 model has been evaluated!\n- COCO box AP = 0.346 using our trainer (0.355 with official boxes) \n- COCO mask AP = 0.287 using our trainer (0.314 with official boxes) \n\n## Examples\n- to be updated\n\n## Requirements\n- [Chainer](https://github.com/pfnet/chainer)\n- [Chainercv](https://github.com/chainer/chainercv)\n- [Cupy](https://github.com/cupy/cupy)   \n(operable if your environment can run chainer > v3 with cuda and cudnn.)   \n(verified as operable: chainer==3.1.0, chainercv==0.7.0, cupy==1.0.3)\n```\n$ pip install chainer   \n$ pip install chainercv\n$ pip install cupy\n```   \n- Python 3.0+   \n- NumPy   \n- Matplotlib   \n- OpenCV   \n\n## TODOs\n- [x] Precision Evaluator (bbox, COCO metric)\n- [x] Detectron Model Parser \n- [x] Modify ROIAlign\n- [x] Mask inference using refined ROIs\n- [x] Precision Evaluator (mask, COCO metric)\n- [ ] Improve segmentation AP for R-50-C4 model\n- [ ] Feature Pyramid Network (R-50-FPN)\n- [ ] Keypoint Detection (R-50-FPN, Keypoints)\n\n## Benchmark Results\n\n<table><tbody>\n<tr><th align=\"left\" bgcolor=#f8f8f8> </th>     <td bgcolor=white> Box AP 50:95</td><td bgcolor=white> Segm AP 50:95</td></tr>\n<tr><th align=\"left\" bgcolor=#f8f8f8>Ours (1 GPU)</th> <td bgcolor=white> 0.346 </td><td bgcolor=white> 0.287 </td></tr>\n<tr><th align=\"left\" bgcolor=#f8f8f8>Detectron model</th> <td bgcolor=white> 0.350 </td><td bgcolor=white> 0.295 </td></tr>\n<tr><th align=\"left\" bgcolor=#f8f8f8>Detectron caffe2</th> <td bgcolor=white> 0.355 </td><td bgcolor=white> 0.314 </td></tr>\n</table></tbody>\n\n## Inference with Pretrained Models\n\n- Download the pretrained model from the [Model Zoo] 
(https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md)   \n (the `model` link in the `R-50-C4\tMask` row of `End-to-End Faster & Mask R-CNN Baselines`)   \n- Create a `modelfiles` directory and put the downloaded file `model_final.pkl` in it   \n- Execute:  \n```   \npython utils/detectron_parser.py\n```\n- The converted model file is saved in `modelfiles`\n- Run the demo:\n```\npython demo.py --bn2affine --modelfile modelfiles/e2e_mask_rcnn_R-50-C4_1x_d2c.npz --image <input image>\n```\n\n## Prerequisites for training\n- Download `ResNet-50-model.caffemodel` from the \"OneDrive download\" link of [ResNet pretrained models](https://github.com/KaimingHe/deep-residual-networks#models) \nfor model initialization and place it in `~/.chainer/dataset/pfnet/chainer/models/`\n\n- COCO 2017 dataset:\nDownload and unzip the COCO dataset with:\n```\nbash getcoco.sh\n```   \nSet up the COCO API:   \n```\ngit clone https://github.com/waleedka/coco\ncd coco/PythonAPI/\nmake\npython setup.py install\ncd ../../\n```\nNote: the official COCO repository is not Python 3 compatible.    \nUse the repository above to run our evaluation.    
\n\n## Train\n\n```\npython train.py \n```\nThe arguments and their defaults are defined as follows:\n```\n'--dataset', choices=('coco2017'), default='coco2017'   \n'--extractor', choices=('resnet50','resnet101'), default='resnet50', help='extractor network'\n'--gpu', '-g', type=int, default=0   \n'--lr', '-l', type=float, default=1e-4   \n'--batchsize', '-b', type=int, default=8   \n'--freeze_bn', action='store_true', default=False, help='freeze batchnorm gamma/beta'\n'--bn2affine', action='store_true', default=False, help='batchnorm to affine'\n'--out', '-o', default='result',  help='output directory'   \n'--seed', '-s', type=int, default=0   \n'--roialign', action='store_true', default=True, help='True: ROIAlign, False: ROIpooling'\n'--step_size', '-ss', type=int, default=400000  \n'--lr_step', '-ls', type=int, default=480000    \n'--lr_initialchange', '-li', type=int, default=800     \n'--pretrained', '-p', type=str, default='imagenet'   \n'--snapshot', type=int, default=4000   \n'--validation', type=int, default=30000   \n'--resume', type=str   \n'--iteration', '-i', type=int, default=800000   \n'--roi_size', '-r', type=int, default=14, help='ROI size for mask head input'\n'--gamma', type=float, default=1, help='mask loss balancing factor'   \n```\n\nNote that we use a subdivision-based updater, so a larger batch size (e.g. 8) can be specified with the GPU memory needed for a batch size of 1.\n\n\n## Demo\nSegment the objects in the input image by executing:   \n```\npython demo.py --image <input image> --modelfile result/snapshot_model.npz --contour\n```\n\n## Evaluation\n\nEvaluate the trained model with the COCO metrics (bounding box, segmentation):   \n```\npython train.py --lr 0 --iteration 1 --validation 1 --resume <trained_model> \n```\n\n## Citation\nPlease cite the original paper in your publications if it helps your research:    \n\n    @article{DBLP:journals/corr/HeGDG17,\n      author    = {Kaiming He and\n                  Georgia Gkioxari and\n                  Piotr Doll{\\'{a}}r and\n                  
Ross B. Girshick},\n      title     = {Mask {R-CNN}},\n      journal   = {CoRR},\n      volume    = {abs/1703.06870},\n      year      = {2017},\n      url       = {http://arxiv.org/abs/1703.06870},\n      archivePrefix = {arXiv},\n      eprint    = {1703.06870},\n      timestamp = {Wed, 07 Jun 2017 14:42:32 +0200},\n      biburl    = {http://dblp.org/rec/bib/journals/corr/HeGDG17},\n      bibsource = {dblp computer science bibliography, http://dblp.org}\n    }\n"
  },
  {
    "path": "README_JP.md",
    "content": "# Chainer\\_Mask\\_R-CNN   \nマルチタスク検出器Mask R-CNNのchainer実装\n(https://arxiv.org/abs/1703.06870)   \n\n## 実行例\n- 準備中\n\n## 必要環境\n- [Chainer](https://github.com/pfnet/chainer)\n- [Chainercv](https://github.com/chainer/chainercv)\n- [Cupy](https://github.com/cupy/cupy)   \n (動作確認済み: chainer==3.1.0, chainercv==0.7.0, verified: cupy==1.0.3)\n```\n$ pip install chainer   \n$ pip install chainercv\n$ pip install cupy==1.0.3\n```   \n- Python 3.0+   \n- NumPy   \n- Matplotlib   \n- OpenCV   \n\n## TODOs\n- [x] Precision Evaluator (bbox, COCO metric)\n- [x] Detectron Model Parser \n- [x] Modify ROIAlign\n- [x] Mask inference using refined ROIs\n- [x] Precision Evaluator (mask, COCO metric)\n- [ ] Feature Pyramid Network (R-50-FPN)\n- [ ] Keypoint Detection (R-50-FPN, Keypoints)\n\n## 学習済みモデルの使用\n\n- [Model Zoo] (https://github.com/facebookresearch/Detectron/blob/master/MODEL_ZOO.md) からモデルファイルをダウンロード\n ( `End-to-End Faster & Mask R-CNN Baselines` の `R-50-C4\tMask` 行の `model` リンク)   \n- `modelfiles` ディレクトリを作り、ダウンロードした `model_final.pkl` を置く\n- 以下を実行\n```   \npython utils/detectron_parser.py\n```\n- `modelfiles` の中に変換されたモデルファイルが保存されます。\n- 以下によりデモを実行\n```\npython demo.py --bn2affine --modelfile modelfiles/e2e_mask_rcnn_R-50-C4_1x_d2c.npz --image <input image>\n```\n\n## 学習のための準備\n- 学習済みモデルのダウンロード  \n・以下リンク先の'OneDrive download'から、ResNet-50-model.caffemodelをダウンロード\n [ResNet pretrained models](https://github.com/KaimingHe/deep-residual-networks#models)\n・~/.chainer/dataset/pfnet/chainer/models/　に置く\n\n- COCO 2017 データセット\nCOCOデータセットのダウンロードと解凍:   \n```\nbash getcoco.sh\n```\n- COCO APIのセットアップ:   \n```\ngit clone https://github.com/waleedka/coco\ncd coco/PythonAPI/\nmake\npython setup.py install\ncd ../../\n```\n\n## 学習\n\n```\npython train.py \n```\n引数は以下です:\n```\n'--dataset', choices=('coco2017'), default='coco2017'   \n'--extractor', choices=('resnet50','resnet101'), default='resnet50', help='extractor network'\n'--gpu', '-g', type=int, default=0   \n'--lr', '-l', 
type=float, default=1e-4   \n'--batchsize', '-b', type=int, default=8   \n'--freeze_bn', action='store_true', default=False, help='freeze batchnorm gamma/beta'\n'--bn2affine', action='store_true', default=False, help='batchnorm to affine'\n'--out', '-o', default='result',  help='output directory'   \n'--seed', '-s', type=int, default=0   \n'--roialign', action='store_true', default=True, help='True: ROIAlign, False: ROIpooling'\n'--step_size', '-ss', type=int, default=400000  \n'--lr_step', '-ls', type=int, default=480000    \n'--lr_initialchange', '-li', type=int, default=800     \n'--pretrained', '-p', type=str, default='imagenet'   \n'--snapshot', type=int, default=4000   \n'--validation', type=int, default=30000   \n'--resume', type=str   \n'--iteration', '-i', type=int, default=800000   \n'--roi_size', '-r', type=int, default=14, help='ROI size for mask head input'\n'--gamma', type=float, default=1, help='mask loss balancing factor'   \n```\n\n本実装ではsubdivisionを用いたupdateを行なっているため、batch size = 1 相当のGPUメモリでbatch size=8等を指定可能です\n\n## デモ\n入力画像のインスタンス・セグメンテーションを実行します:   \n```\npython demo.py --image <input image> --modelfile result/snapshot_model.npz --contour  \n```\n\n### 評価\n\nCOCO metric (Bounding Box, Segmentation) によるモデルの評価を実行します。\n\n```\npython train.py --lr 0 --iteration 1 --validation 1 --resume <trained_model> \n```\n\n## 引用\nPlease cite the original paper in your publications if it helps your research:    \n\n    @article{DBLP:journals/corr/HeGDG17,\n      author    = {Kaiming He and\n                  Georgia Gkioxari and\n                  Piotr Doll{\\'{a}}r and\n                  Ross B. 
Girshick},\n      title     = {Mask {R-CNN}},\n      journal   = {CoRR},\n      volume    = {abs/1703.06870},\n      year      = {2017},\n      url       = {http://arxiv.org/abs/1703.06870},\n      archivePrefix = {arXiv},\n      eprint    = {1703.06870},\n      timestamp = {Wed, 07 Jun 2017 14:42:32 +0200},\n      biburl    = {http://dblp.org/rec/bib/journals/corr/HeGDG17},\n      bibsource = {dblp computer science bibliography, http://dblp.org}\n    }\n"
  },
  {
    "path": "coco_dataset.py",
    "content": "import numpy as np\nfrom skimage.draw import polygon\nimport json\nimport os\nimport cv2\nimport pycocotools\nfrom pycocotools.coco import COCO\n\nimport chainer\nfrom chainercv.utils import read_image\nclass COCODataset(chainer.dataset.DatasetMixin):\n    def __init__(self, data_dir='COCO/', json_file='instances_train2017.json',\n                 name='train2017', id_list_file='train2017.txt', sizemin=10):\n        self.data_dir  = data_dir\n        self.json_file = json_file\n        self.coco = COCO(self.data_dir + 'annotations/'+self.json_file)\n        self.ids = self.coco.getImgIds()\n        self.name = name\n        self.sizemin = sizemin\n        self.class_ids = sorted(self.coco.getCatIds())\n\n    def __len__(self):\n        return len(self.ids)\n\n    def ann2rle(self, ann, height, width):\n        if isinstance(ann, list):\n            rles = pycocotools.mask.frPyObjects(ann, height, width)\n            rle = pycocotools.mask.merge(rles)\n        elif isinstance(ann['counts'], list):\n            rle = pycocotools.mask.frPyObjects(ann, height, width)\n        else:\n            rle = ann\n        return rle\n\n    def get_example(self, i):\n        #i = i % 500 # for limiting data size\n        numofboxes=0\n        while True:\n            id_ = self.ids[i]\n            annot_labels, annot_bboxes, annot_segs= list(), list(), list()\n            anno_ids = self.coco.getAnnIds(imgIds=[int(id_)], iscrowd=None)\n            annotations = self.coco.loadAnns(anno_ids)\n            for a in annotations:\n                if a['bbox'][2] > self.sizemin and a['bbox'][3] > self.sizemin \\\n                and a['iscrowd']==0:\n                    annot_labels.append(a['category_id'])\n                    annot_bboxes.append(a['bbox'])\n                    annot_segs.append(a['segmentation'])\n            numofboxes=len(annot_labels)\n            if numofboxes > 0 or chainer.config.train == False:\n                break\n            else:\n        
        i = i - 1\n        img_file = os.path.join(self.data_dir, self.name, '{:012}'.format(id_) + '.jpg')\n        img = read_image(img_file, color=True)\n        _, h, w = img.shape\n        annot_masks = []\n        for annot_seg_polygons in annot_segs:\n            rle = self.ann2rle(annot_seg_polygons, h, w)\n            annot_masks.append(pycocotools.mask.decode(rle))\n        if numofboxes > 0:\n            annot_masks = np.stack(annot_masks).astype(np.uint8) #y,x\n            annot_bboxes = np.stack(annot_bboxes).astype(np.float32)\n            annot_labels = np.stack(annot_labels).astype(np.int32)\n        else:\n            annot_labels, annot_bboxes, annot_masks = [], [], []\n\n        return img, annot_labels, annot_bboxes, annot_masks, i\n"
  },
  {
    "path": "demo.py",
    "content": "import argparse\r\nimport chainer\r\nimport numpy as np\r\nfrom mask_rcnn_train_chain import MaskRCNNTrainChain\r\nfrom utils.bn_utils import freeze_bn, bn_to_affine\r\n\r\ndef main():\r\n    parser = argparse.ArgumentParser()\r\n    parser.add_argument('--gpu', type=int, default=0)\r\n    parser.add_argument('--modelfile')\r\n    parser.add_argument('--image', type=str)\r\n    parser.add_argument('--roi_size', '-r', type=int, default=14, help='ROI size for mask head input')\r\n    parser.add_argument('--roialign', action='store_false', default=True, help='default: True')\r\n    parser.add_argument('--contour', action='store_true', default=False, help='visualize contour')\r\n    parser.add_argument('--background', action='store_true', default=False, help='background(no-display mode)')\r\n    parser.add_argument('--bn2affine', action='store_true', default=False, help='batchnorm to affine')\r\n    parser.add_argument('--extractor', choices=('resnet50','resnet101'),\r\n                        default='resnet50', help='extractor network')\r\n    args = parser.parse_args()\r\n\r\n    #network class id --> coco label id\r\n    test_class_ids = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, \\\r\n    27, 28, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, \\\r\n    57, 58, 59, 60, 61, 62, 63, 64, 65, 67, 70, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 84, 85, 86, 87, 88, 89, 90]\r\n\r\n    if args.background:\r\n        import matplotlib\r\n        matplotlib.use('Agg')\r\n    import matplotlib.pyplot as plot\r\n    from utils.vis_bbox import vis_bbox\r\n    from chainercv.datasets import voc_bbox_label_names\r\n    from mask_rcnn_resnet import MaskRCNNResNet\r\n    from chainercv import utils\r\n    if args.extractor=='resnet50':\r\n        model = MaskRCNNResNet(n_fg_class=80, roi_size=args.roi_size, pretrained_model=args.modelfile, n_layers=50, 
roi_align=args.roialign, class_ids=test_class_ids)\r\n    elif args.extractor=='resnet101':\r\n        model = MaskRCNNResNet(n_fg_class=80, roi_size=args.roi_size, pretrained_model=args.modelfile, n_layers=101, roi_align=args.roialign, class_ids=test_class_ids)\r\n\r\n    chainer.serializers.load_npz(args.modelfile, model)\r\n    if args.gpu >= 0:\r\n        chainer.cuda.get_device_from_id(args.gpu).use()\r\n        model.to_gpu()\r\n    if args.bn2affine:\r\n        bn_to_affine(model)\r\n    img = utils.read_image(args.image, color=True)\r\n    bboxes, labels, scores, masks = model.predict([img])\r\n    bbox, label, score, mask = bboxes[0], np.asarray(labels[0],dtype=np.int32), scores[0], masks[0]\r\n    #print(bbox, np.asarray(label,dtype=np.int32), score, mask)\r\n\r\n    coco_label_names=('background',  # class zero\r\n            'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',\r\n            'fire hydrant', 'street sign', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',\r\n            'elephant', 'bear', 'zebra', 'giraffe', 'hat', 'backpack', 'umbrella', 'shoe', 'eye glasses', 'handbag', 'tie', 'suitcase', 'frisbee',\r\n            'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle',\r\n            'plate', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',\r\n            'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',\r\n            'mirror', 'dining table', 'window', 'desk','toilet', 'door', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',\r\n            'toaster', 'sink', 'refrigerator', 'blender', 'book', 'clock', 'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'\r\n\r\n    )\r\n    vis_bbox(\r\n        img, bbox, label=label, score=score, 
mask=mask, label_names=coco_label_names, contour=args.contour, labeldisplay=True)\r\n    # save before show(): after the window is closed the figure may be empty\r\n    filename = \"output.png\"\r\n    plot.savefig(filename)\r\n    plot.show()\r\n\r\nif __name__ == '__main__':\r\n    main()\r\n"
  },
  {
    "path": "getcoco.sh",
    "content": "# get COCO dataset\nmkdir COCO\ncd COCO\n\nwget http://images.cocodataset.org/zips/train2017.zip\nwget http://images.cocodataset.org/zips/val2017.zip\nwget http://images.cocodataset.org/annotations/annotations_trainval2017.zip\n\nunzip train2017.zip\nunzip val2017.zip\nunzip annotations_trainval2017.zip\n\nrm -f train2017.zip\nrm -f val2017.zip\nrm -f annotations_trainval2017.zip"
  },
  {
    "path": "mask_rcnn.py",
    "content": "from __future__ import division\n\nimport numpy as np\n\nimport chainer\nfrom chainer import cuda\nimport chainer.functions as F\nfrom chainercv.links.model.faster_rcnn.utils.loc2bbox import loc2bbox\nfrom chainercv.utils import non_maximum_suppression\nfrom chainercv.transforms.image.resize import resize\nimport cv2\nimport pycocotools\nfrom utils.box_utils import bbox_yxyx2xywh, im_mask\n\nclass MaskRCNN(chainer.Chain):\n    def __init__(self, extractor, rpn, head, mean,\n                 min_size=600, max_size=1000,\n                 loc_normalize_mean=(0., 0., 0., 0.),\n                 loc_normalize_std=(0.1, 0.1, 0.2, 0.2),\n                 class_ids=[]\n                 ):\n        print(\"MaskRCNN initialization\")\n        super(MaskRCNN, self).__init__()\n        with self.init_scope():\n            self.extractor = extractor\n            self.rpn = rpn\n            self.head = head\n\n        self.mean = mean\n        self.min_size = min_size\n        self.max_size = max_size\n        self.loc_normalize_mean = loc_normalize_mean\n        self.loc_normalize_std = loc_normalize_std\n        self.use_preset('visualize')\n        if class_ids==[]:\n            raise ValueError('set class ids')\n        self.class_ids = class_ids\n        self.preset = 'visualize'\n    @property\n    def n_class(self):\n        return self.head.n_class\n\n    def __call__(self, x, scale=1.):\n        img_size = x.shape[2:]\n        h = self.extractor(x) #VGG\n        rpn_locs, rpn_scores, rois, roi_indices, anchor = \\\n            self.rpn(h, img_size, scale) #Region Proposal Network\n        hres5 = self.head.res5head(h, rois, roi_indices)\n        roi_cls_locs, roi_scores = self.head.boxhead(hres5)\n        return roi_cls_locs, roi_scores, rois, roi_indices, h\n\n    def use_preset(self, preset):\n        if preset == 'visualize':\n            self.nms_thresh = 0.3\n            self.score_thresh = 0.7\n            self.preset = 'visualize'\n        elif 
preset == 'evaluate':\n            self.nms_thresh = 0.5\n            self.score_thresh = 0.05\n            self.preset = 'evaluate'\n        else:\n            raise ValueError('preset must be visualize or evaluate')\n\n    def prepare(self, img):\n        _, H, W = img.shape\n        scale = self.min_size / min(H, W)\n        if scale * max(H, W) > self.max_size:\n            scale = self.max_size / max(H, W)\n        #img = resize(img, (int(H * scale), int(W * scale)))\n        img = img.transpose((1,2,0))\n        img = cv2.resize(img, None, None, fx=scale, fy=scale,\n                    interpolation=cv2.INTER_LINEAR)\n        img = img.transpose((2,0,1))\n        img = (img - self.mean).astype(np.float32, copy=False)\n        img = img[::-1, :, :] # RGB to BGR order for resnet pretrained model\n        return img\n\n    def _suppress(self, raw_cls_bbox, raw_cls_roi, raw_prob):\n        bbox = list()\n        roi = list()\n        label = list()\n        score = list()\n        mask = list()\n        for l in range(1, self.n_class):\n            cls_bbox_l = raw_cls_bbox.reshape((-1, self.n_class, 4))[:, l, :]\n            cls_roi_l = raw_cls_roi.reshape((-1, self.n_class, 4))[:, l, :]\n            prob_l = raw_prob[:, l]\n            lmask = prob_l > self.score_thresh\n            cls_bbox_l = cls_bbox_l[lmask]\n            cls_roi_l = cls_roi_l[lmask]\n            prob_l = prob_l[lmask]\n            keep = non_maximum_suppression(cls_bbox_l, self.nms_thresh, prob_l)\n            bbox.append(cls_bbox_l[keep])\n            roi.append(cls_roi_l[keep])\n            label.append((l - 1) * np.ones((len(keep),)))\n            score.append(prob_l[keep])\n        bbox = np.concatenate(bbox, axis=0).astype(np.float32)\n        roi = np.concatenate(roi, axis=0).astype(np.float32)\n        label = np.concatenate(label, axis=0).astype(np.float32)\n        score = np.concatenate(score, axis=0).astype(np.float32)\n        return bbox, roi, label, score\n\n    def 
predict(self, imgs):\n        prepared_imgs = list()\n        sizes = list()\n        #print(\"predicting!\")\n        for img in imgs:\n            size = img.shape[1:]\n            img = self.prepare(img.astype(np.float32))\n            prepared_imgs.append(img)\n            sizes.append(size)\n        bboxes = list()\n        out_rois = list()\n        labels = list()\n        scores = list()\n        masks = list()\n        for img, size in zip(prepared_imgs, sizes):\n            with chainer.using_config('train', False), \\\n                chainer.function.no_backprop_mode():\n                img_var = chainer.Variable(self.xp.asarray(img[None]))\n                scale = img_var.shape[3] / size[1]\n                roi_cls_locs, roi_scores, rois, _,  h = self.__call__(img_var, scale=scale)\n            #assuming batch size = 1\n            roi_cls_loc = roi_cls_locs.data\n            roi_score = roi_scores.data\n            roi = rois / scale\n            mean = self.xp.tile(self.xp.asarray(self.loc_normalize_mean), self.n_class)\n            std = self.xp.tile(self.xp.asarray(self.loc_normalize_std), self.n_class)\n            roi_cls_loc = (roi_cls_loc * std + mean).astype(np.float32)\n            roi_cls_loc = roi_cls_loc.reshape((-1, self.n_class, 4))\n            roi = self.xp.broadcast_to(roi[:, None], roi_cls_loc.shape).reshape((-1, 4))\n            cls_bbox = loc2bbox(roi, roi_cls_loc.reshape((-1, 4)))\n            cls_bbox = cls_bbox.reshape((-1, self.n_class * 4))\n            cls_roi = roi.reshape((-1, self.n_class * 4))\n            #clip the bbox\n            cls_bbox[:, 0::2] = self.xp.clip(cls_bbox[:, 0::2], 0, size[0])\n            cls_bbox[:, 1::2] = self.xp.clip(cls_bbox[:, 1::2], 0, size[1])\n            cls_roi[:, 0::2] = self.xp.clip(cls_roi[:, 0::2], 0, size[0])\n            cls_roi[:, 1::2] = self.xp.clip(cls_roi[:, 1::2], 0, size[1])\n\n            prob = F.softmax(roi_score).data\n            raw_cls_bbox = cuda.to_cpu(cls_bbox)\n      
      raw_cls_roi = cuda.to_cpu(cls_roi)\n            raw_prob = cuda.to_cpu(prob)\n            bbox, out_roi, label, score = self._suppress(raw_cls_bbox, raw_cls_roi, raw_prob)\n            mask=[]\n            if len(bbox) > 0:\n                # mask head, run on the refined boxes that survived NMS\n                roi_indices = self.xp.zeros((len(bbox),), dtype=np.int32)\n                with chainer.using_config('train', False), \\\n                    chainer.function.no_backprop_mode():\n                    # xp.asarray keeps the boxes on the model's device (CPU or GPU)\n                    hres5 = self.head.res5head(h, self.xp.asarray(bbox * scale), roi_indices)\n                    roi_masks = self.head.maskhead(hres5)\n                roi_mask = F.sigmoid(roi_masks).data\n                raw_mask = cuda.to_cpu(roi_mask)\n                # postprocess \n                if self.preset == 'evaluate':\n                    bboxes.append(bbox_yxyx2xywh(bbox))\n                    for m, b, l in zip(raw_mask, bbox, label):\n                        # channel 0 of the mask output is background, hence l+1\n                        wm = im_mask(m[int(l+1)], size, b)\n                        # encode the mask \n                        wm = pycocotools.mask.encode(np.asfortranarray(wm))\n                        wm['counts'] = wm['counts'].decode('ascii')\n                        mask.append(wm)\n                elif self.preset == 'visualize':\n                    bboxes.append(bbox)\n                    for m, b, l in zip(raw_mask, bbox, label):\n                        wm = im_mask(m[int(l+1)], size, b)\n                        mask.append(wm)\n            elif self.preset == 'evaluate':\n                # len(bbox) = 0\n                wm = np.zeros((size[0], size[1]), dtype=np.uint8)\n                wm = pycocotools.mask.encode(np.asfortranarray(wm))\n                wm['counts'] = wm['counts'].decode('ascii')\n                mask.append(wm)\n                bboxes.append(bbox_yxyx2xywh(bbox))\n            else:\n                # no detections in 'visualize': keep the per-image lists aligned\n                bboxes.append(bbox)\n            labels.append([self.class_ids[int(l)] for l in label.tolist()])\n            scores.append(score)\n     
       masks.append(mask)\n\n        return bboxes, labels, scores, masks\n\n"
  },
  {
    "path": "mask_rcnn_resnet.py",
    "content": "import numpy as np\r\n\r\nimport chainer\r\nimport chainer.functions as F\r\nimport chainer.links as L\r\nfrom mask_rcnn import MaskRCNN\r\n#from chainercv.links.model.faster_rcnn.region_proposal_network import \\\r\n#    RegionProposalNetwork\r\nfrom utils.region_proposal_network import RegionProposalNetwork\r\nfrom utils import roi_align_2d\r\nfrom chainer.links.model.vision.resnet import BuildingBlock, _retrieve\r\nfrom chainer.links.connection.convolution_2d import Convolution2D\r\nfrom chainer.links.connection.linear import Linear\r\nfrom chainer.links.normalization.batch_normalization import BatchNormalization\r\nfrom chainer.initializers import constant\r\n\r\nclass ExtractorResNet(chainer.link.Chain):\r\n    def __init__(self, pretrained_model='auto', n_layers=50, roi_size=14):\r\n        super(ExtractorResNet, self).__init__()\r\n        print('Extractor ResNet',n_layers,' initialization')\r\n        kwargs = {'initialW': constant.Zero()}\r\n        if pretrained_model=='auto':\r\n            if n_layers == 50:\r\n                pretrained_model = 'ResNet-50-model.caffemodel'\r\n                block = [3, 4, 6, 3]\r\n            elif n_layers == 101:\r\n                pretrained_model = 'ResNet-101-model.caffemodel'\r\n                block = [3, 4, 23, 3]    \r\n        with self.init_scope():\r\n            self.conv1 = Convolution2D(3, 64, 7, 2, 3, **kwargs, nobias=True)\r\n            self.bn1 = BatchNormalization(64)\r\n            self.res2 = BuildingBlock(block[0], 64, 64, 256, 1, **kwargs)\r\n            self.res3 = BuildingBlock(block[1], 256, 128, 512, 2, **kwargs)\r\n            self.res4 = BuildingBlock(block[2], 512, 256, 1024, 2, **kwargs)\r\n            self.res5 = BuildingBlock(block[3], 1024, 512, 2048, roi_size//7, **kwargs)\r\n            self.fc6 = Linear(2048, 1000)\r\n        if pretrained_model and pretrained_model.endswith('.caffemodel'):\r\n            _retrieve(n_layers, 
'ResNet-{}-model.npz'.format(n_layers),\r\n                      pretrained_model, self)\r\n        elif pretrained_model:\r\n            # 'npz' was never imported here; use the serializer through the chainer package\r\n            chainer.serializers.load_npz(pretrained_model, self)\r\n        del self.fc6\r\n    def __call__(self, x):\r\n        h = F.relu(self.bn1(self.conv1(x)))\r\n        _, _, H, W = h.shape\r\n        Hpool = (H + 1)//2\r\n        Wpool = (W + 1)//2\r\n        h = F.max_pooling_2d(h, ksize=3, stride=2, pad=1)\r\n        h = h[:, :, :Hpool, :Wpool]\r\n        h = self.res2(h)\r\n        h = self.res3(h)\r\n        h = self.res4(h)\r\n        return h\r\n\r\nclass MaskRCNNResNet(MaskRCNN):\r\n    feat_stride = 16\r\n    def __init__(self,\r\n                 n_fg_class=None,\r\n                 pretrained_model=None,\r\n                 min_size=800, max_size=1333,\r\n                 ratios=[0.5 ,1, 2], anchor_scales=[2, 4, 8, 16, 32],\r\n                 initialW=None, rpn_initialW=None,\r\n                 loc_initialW=None, score_initialW=None,\r\n                 proposal_creator_params={\"n_test_pre_nms\":6000,\"n_test_post_nms\": 1000,\"min_size\":4},\r\n                 roi_size=14,\r\n                 class_ids=[],\r\n                 n_layers=50, \r\n                 roi_align=True\r\n                 ):\r\n        print(\"MaskRCNNResNet initialization\")\r\n        if n_fg_class is None:\r\n            raise ValueError('supply n_fg_class!')\r\n        if loc_initialW is None:\r\n            loc_initialW = chainer.initializers.Normal(0.001)\r\n        if score_initialW is None:\r\n            score_initialW = chainer.initializers.Normal(0.01)\r\n        if rpn_initialW is None:\r\n            rpn_initialW = chainer.initializers.Normal(0.01)\r\n        if initialW is None:# and pretrained_model:\r\n            print(\"setting initialW\")\r\n            initialW = chainer.initializers.Normal(0.01)\r\n        self.roi_size=roi_size\r\n        if pretrained_model is not None:\r\n            pretrained_model = 'auto'\r\n        extractor = 
ExtractorResNet(pretrained_model, n_layers=n_layers, roi_size=roi_size)\r\n        rpn = RegionProposalNetwork(\r\n            1024, 1024,\r\n            ratios=ratios, anchor_scales=anchor_scales,\r\n            feat_stride=self.feat_stride,\r\n            initialW=rpn_initialW,\r\n            proposal_creator_params=proposal_creator_params,\r\n        )\r\n        head = MaskRCNNHead(\r\n            n_fg_class + 1,\r\n            roi_size=self.roi_size, spatial_scale=1. / self.feat_stride,\r\n            initialW=initialW, loc_initialW=loc_initialW, score_initialW=score_initialW,\r\n            roi_align=roi_align, reslayer=extractor.res5\r\n        )\r\n        del extractor.res5\r\n        super(MaskRCNNResNet, self).__init__(\r\n            extractor, rpn, head,\r\n            mean=np.array([122.7717, 115.9465, 102.9801], dtype=np.float32)[:, None, None],\r\n            min_size=min_size, max_size=max_size, class_ids=class_ids\r\n        )\r\n\r\nclass MaskRCNNHead(chainer.Chain):\r\n    def __init__(self, n_class, roi_size, spatial_scale,\r\n                 initialW=None, loc_initialW=None, score_initialW=None, roi_align=True, reslayer=None):\r\n        super(MaskRCNNHead, self).__init__()\r\n        with self.init_scope():\r\n            self.res5 = reslayer#BuildingBlock(3, 1024, 512, 2048, 1, initialW=initialW) \r\n            #class / loc branch\r\n            self.cls_loc = L.Linear(2048, n_class * 4, initialW=initialW)\r\n            self.score = L.Linear(2048, n_class, initialW=score_initialW)\r\n            #Mask-RCNN branch\r\n            self.deconvm1 = L.Deconvolution2D(2048, 256, 2, 2, initialW=initialW)\r\n            self.convm2 = L.Convolution2D(256, n_class, 1, 1, pad=0,initialW=initialW)\r\n\r\n        self.n_class = n_class\r\n        self.roi_size = roi_size\r\n        self.spatial_scale = spatial_scale\r\n        self.roi_align = roi_align\r\n        print(\"ROI Align=\",roi_align)\r\n\r\n    def res5head(self, x, rois, roi_indices):\r\n  
      # extracted feature map -> pooling -> res5 block \r\n        roi_indices = roi_indices.astype(np.float32)\r\n        indices_and_rois = self.xp.concatenate(\r\n            (roi_indices[:, None], rois), axis=1)\r\n        #x: (batch, channel, h, w)\r\n        #rois: (n_rois, 4), each row (y_min, x_min, y_max, x_max)\r\n        if self.roi_align:\r\n            pool = _roi_align_2d_yx(\r\n                x, indices_and_rois, self.roi_size,self.roi_size,\r\n                self.spatial_scale)\r\n        else:\r\n            pool = _roi_pooling_2d_yx(\r\n                x, indices_and_rois, self.roi_size,self.roi_size,\r\n                self.spatial_scale)\r\n        hres5 = self.res5(pool)\r\n        return hres5\r\n\r\n    def maskhead(self, hres5):\r\n        # mask branch\r\n        h = F.relu(self.deconvm1(hres5)) \r\n        masks=self.convm2(h)\r\n        return masks\r\n\r\n    def boxhead(self, hres5):\r\n        # box branch\r\n        h = F.average_pooling_2d(hres5, self.roi_size//2, stride=7)\r\n        roi_cls_locs = self.cls_loc(h)\r\n        roi_scores = self.score(h)\r\n        return roi_cls_locs, roi_scores\r\n\r\ndef _roi_pooling_2d_yx(x, indices_and_rois, outh, outw, spatial_scale):\r\n    # reorder (index, y_min, x_min, y_max, x_max) -> (index, x_min, y_min, x_max, y_max)\r\n    xy_indices_and_rois = indices_and_rois[:, [0, 2, 1, 4, 3]]\r\n    pool = F.roi_pooling_2d(\r\n        x, xy_indices_and_rois, outh, outw, spatial_scale)\r\n    return pool\r\n\r\ndef _roi_align_2d_yx(x, indices_and_rois, outh, outw, spatial_scale):\r\n    # reorder (index, y_min, x_min, y_max, x_max) -> (index, x_min, y_min, x_max, y_max)\r\n    xy_indices_and_rois = indices_and_rois[:, [0, 2, 1, 4, 3]]\r\n    pool = roi_align_2d.roi_align_2d(\r\n        x, xy_indices_and_rois, outh, outw, spatial_scale)\r\n    return pool\r\n"
  },
  {
    "path": "mask_rcnn_train_chain.py",
    "content": "import numpy as np\r\n\r\nimport chainer\r\nfrom chainer import cuda\r\nimport chainer.functions as F\r\n\r\nfrom chainercv.links.model.faster_rcnn.utils.anchor_target_creator import AnchorTargetCreator\r\nfrom utils.proposal_target_creator import ProposalTargetCreator\r\nfrom chainer import computational_graph as c\r\nfrom chainercv.links import PixelwiseSoftmaxClassifier\r\n\r\nclass MaskRCNNTrainChain(chainer.Chain):\r\n    def __init__(self, mask_rcnn, rpn_sigma=3., roi_sigma=1., gamma=1,\r\n                 anchor_target_creator=AnchorTargetCreator(),\r\n                 roi_size=14):\r\n        super(MaskRCNNTrainChain, self).__init__()\r\n        with self.init_scope():\r\n            self.mask_rcnn = mask_rcnn\r\n        self.rpn_sigma = rpn_sigma\r\n        self.roi_sigma = roi_sigma\r\n        self.anchor_target_creator = anchor_target_creator\r\n        self.proposal_target_creator = ProposalTargetCreator(roi_size=roi_size//2)\r\n        self.loc_normalize_mean = mask_rcnn.loc_normalize_mean\r\n        self.loc_normalize_std = mask_rcnn.loc_normalize_std\r\n        self.decayrate=0.99\r\n        self.avg_loss = None\r\n        self.gamma=gamma\r\n    def __call__(self, imgs, bboxes, labels, scale, masks, i):\r\n\r\n        if isinstance(bboxes, chainer.Variable):\r\n            bboxes = bboxes.data\r\n        if isinstance(labels, chainer.Variable):\r\n            labels = labels.data\r\n        if isinstance(scale, chainer.Variable):\r\n            scale = scale.data\r\n        if isinstance(masks, chainer.Variable):\r\n            masks = masks.data\r\n        scale = np.asscalar(cuda.to_cpu(scale))\r\n        n = bboxes.shape[0]\r\n        if n != 1:\r\n            raise ValueError('only batch size 1 is supported')\r\n        _, _, H, W = imgs.shape\r\n        img_size = (H, W)\r\n        #Extractor (VGG) : img -> features\r\n        with chainer.using_config('train', False):\r\n            features = 
self.mask_rcnn.extractor(imgs)\r\n\r\n        #Region Proposal Network : features -> rpn_locs, rpn_scores, rois\r\n        rpn_locs, rpn_scores, rois, roi_indices, anchor = self.mask_rcnn.rpn(\r\n            features, img_size, scale)\r\n        bbox, label, mask, rpn_score, rpn_loc, roi = \\\r\n            bboxes[0], labels[0], masks[0], rpn_scores[0], rpn_locs[0], rois # batch size=1\r\n\r\n        #proposal target : roi(proposed) , bbox(GT), label(GT) -> sample_roi, gt_roi_loc, gt_roi_label\r\n        #the targets are compared with the head output.\r\n        sample_roi, gt_roi_loc, gt_roi_label, gt_roi_mask = self.proposal_target_creator(\r\n            roi, bbox, label, mask, self.loc_normalize_mean, self.loc_normalize_std)\r\n        sample_roi_index = self.xp.zeros((len(sample_roi),), dtype=np.int32)\r\n\r\n        #Head Network : features, sample_roi -> roi_cls_loc, roi_score\r\n        with chainer.using_config('train', False):\r\n            hres5 = self.mask_rcnn.head.res5head(features, sample_roi, sample_roi_index)\r\n            roi_cls_loc, roi_score = self.mask_rcnn.head.boxhead(hres5)\r\n            roi_cls_mask = self.mask_rcnn.head.maskhead(hres5)\r\n            del(hres5)\r\n\r\n        #RPN losses\r\n        gt_rpn_loc, gt_rpn_label = self.anchor_target_creator(bbox, anchor, img_size)\r\n        rpn_loc_loss = _fast_rcnn_loc_loss(rpn_loc, gt_rpn_loc, gt_rpn_label, self.rpn_sigma)\r\n        rpn_cls_loss = F.sigmoid_cross_entropy(rpn_score, gt_rpn_label)\r\n\r\n        #Head output losses\r\n        n_sample = roi_cls_loc.shape[0]\r\n        roi_cls_loc = roi_cls_loc.reshape((n_sample, -1, 4))\r\n        roi_loc = roi_cls_loc[self.xp.arange(n_sample), gt_roi_label] \r\n        roi_mask = roi_cls_mask[self.xp.arange(n_sample), gt_roi_label]\r\n        roi_loc_loss = _fast_rcnn_loc_loss(roi_loc, gt_roi_loc, gt_roi_label, self.roi_sigma)\r\n        roi_cls_loss = F.softmax_cross_entropy(roi_score, gt_roi_label)\r\n\r\n        #mask loss:  average 
binary cross-entropy loss\r\n        mask_loss = F.sigmoid_cross_entropy(roi_mask[0:gt_roi_mask.shape[0]], gt_roi_mask)\r\n\r\n        #total loss\r\n        loss = rpn_loc_loss + rpn_cls_loss + roi_loc_loss + roi_cls_loss + self.gamma * mask_loss\r\n\r\n        #avg loss calculation\r\n        if self.avg_loss is None:\r\n            self.avg_loss = loss.data\r\n        else:\r\n            self.avg_loss = self.avg_loss * self.decayrate + loss.data*(1-self.decayrate)\r\n        chainer.reporter.report({'rpn_loc_loss':rpn_loc_loss,\r\n                                 'rpn_cls_loss':rpn_cls_loss,\r\n                                 'roi_loc_loss':roi_loc_loss,\r\n                                 'roi_cls_loss':roi_cls_loss,\r\n                                 'roi_mask_loss':self.gamma * mask_loss,\r\n                                 'avg_loss':self.avg_loss,\r\n                                 'loss':loss}, self)\r\n        return loss\r\n\r\n\r\ndef _smooth_l1_loss(x, t, in_weight, sigma):\r\n    sigma2 = sigma ** 2\r\n    diff = in_weight * (x - t)\r\n    abs_diff = F.absolute(diff)\r\n    flag = (abs_diff.data < (1. / sigma2)).astype(np.float32)\r\n    y = (flag * (sigma2 / 2.) * F.square(diff) +\r\n         (1 - flag) * (abs_diff - 0.5 / sigma2))\r\n    return F.sum(y)\r\n\r\ndef _fast_rcnn_loc_loss(pred_loc, gt_loc, gt_label, sigma):\r\n    xp = chainer.cuda.get_array_module(pred_loc)\r\n    in_weight = xp.zeros_like(gt_loc)\r\n    in_weight[gt_label > 0] = 1\r\n    loc_loss = _smooth_l1_loss(pred_loc, gt_loc, in_weight, sigma)\r\n    loc_loss /= xp.sum(gt_label >= 0)\r\n    return loc_loss\r\n"
  },
  {
    "path": "mask_rcnn_train_chain_batch.py",
    "content": "import numpy as np\r\n\r\nimport chainer\r\nfrom chainer import cuda\r\nimport chainer.functions as F\r\n\r\nfrom chainercv.links.model.faster_rcnn.utils.anchor_target_creator import AnchorTargetCreator\r\nfrom utils.proposal_target_creator import ProposalTargetCreator\r\nfrom chainer import computational_graph as c\r\nfrom chainercv.links import PixelwiseSoftmaxClassifier\r\n\r\nclass MaskRCNNTrainChain(chainer.Chain):\r\n    def __init__(self, mask_rcnn, rpn_sigma=3., roi_sigma=1., gamma=1,\r\n                 anchor_target_creator=AnchorTargetCreator(),\r\n                 roi_size=7):\r\n        super(MaskRCNNTrainChain, self).__init__()\r\n        with self.init_scope():\r\n            self.mask_rcnn = mask_rcnn\r\n        self.rpn_sigma = rpn_sigma\r\n        self.roi_sigma = roi_sigma\r\n        self.anchor_target_creator = anchor_target_creator\r\n        self.proposal_target_creator = ProposalTargetCreator(roi_size=roi_size)\r\n        self.loc_normalize_mean = mask_rcnn.loc_normalize_mean\r\n        self.loc_normalize_std = mask_rcnn.loc_normalize_std\r\n        self.decayrate=0.99\r\n        self.avg_loss = None\r\n        self.gamma=gamma\r\n    def __call__(self, imgs, bboxes, labels, scale, masks):\r\n\r\n        if isinstance(bboxes, chainer.Variable):\r\n            bboxes = bboxes.data\r\n        if isinstance(labels, chainer.Variable):\r\n            labels = labels.data\r\n        if isinstance(scale, chainer.Variable):\r\n            scale = scale.data\r\n        if isinstance(masks, chainer.Variable):\r\n            masks = masks.data\r\n        scale = np.asscalar(cuda.to_cpu(scale[0]))\r\n        n = bboxes.shape[0]\r\n        #if n != 1:\r\n        #    raise ValueError('only batch size 1 is supported')\r\n        _, _, H, W = imgs.shape\r\n        img_size = (H, W)\r\n        #Extractor (VGG) : img -> features\r\n        features = self.mask_rcnn.extractor(imgs)\r\n\r\n        #Region Proposal Network : features -> rpn_locs, 
rpn_scores, rois\r\n        rpn_loc_loss,rpn_cls_loss, roi_loc_loss, roi_cls_loss, mask_loss= 0,0,0,0,0    \r\n        for i in range(n):\r\n            rpn_locs, rpn_scores, rois, roi_indices, anchor = self.mask_rcnn.rpn(\r\n                features[i:i+1], img_size, scale)\r\n            bbox, label, mask, rpn_score, rpn_loc, roi = \\\r\n                bboxes[i], labels[i], masks[i], rpn_scores[0], rpn_locs[0], rois\r\n            mask[mask>1]=0\r\n            numdata = sum(label>=0)\r\n            label = label[0:numdata]\r\n            bbox = bbox[0:numdata]\r\n            mask = mask[0:numdata]\r\n            #proposal target : roi(proposed) , bbox(GT), label(GT) -> sample_roi, gt_roi_loc, gt_roi_label\r\n            #the targets are compared with the head output.\r\n            sample_roi, gt_roi_loc, gt_roi_label, gt_roi_mask = self.proposal_target_creator(\r\n            roi, bbox, label, mask, self.loc_normalize_mean, self.loc_normalize_std)\r\n            sample_roi_index = self.xp.zeros((len(sample_roi),), dtype=np.int32)\r\n\r\n            #Head Network : features, sample_roi -> roi_cls_loc, roi_score\r\n            roi_cls_loc, roi_score, roi_cls_mask = self.mask_rcnn.head(\r\n                features[i:i+1], sample_roi, sample_roi_index)\r\n\r\n            #RPN losses\r\n            gt_rpn_loc, gt_rpn_label = self.anchor_target_creator(bbox, anchor, img_size)\r\n            rpn_loc_loss += _fast_rcnn_loc_loss(rpn_loc, gt_rpn_loc, gt_rpn_label, self.rpn_sigma)\r\n            rpn_cls_loss += F.softmax_cross_entropy(rpn_score, gt_rpn_label)\r\n\r\n            #Head output losses\r\n            n_sample = roi_cls_loc.shape[0]\r\n            roi_cls_loc = roi_cls_loc.reshape((n_sample, -1, 4))\r\n            roi_loc = roi_cls_loc[self.xp.arange(n_sample), gt_roi_label] \r\n            roi_mask = roi_cls_mask[self.xp.arange(n_sample), gt_roi_label]\r\n            roi_loc_loss += _fast_rcnn_loc_loss(roi_loc, gt_roi_loc, gt_roi_label, self.roi_sigma)\r\n     
       roi_cls_loss += F.softmax_cross_entropy(roi_score, gt_roi_label)\r\n\r\n            #mask loss:  average binary cross-entropy loss\r\n            mask_loss += F.sigmoid_cross_entropy(roi_mask[0:gt_roi_mask.shape[0]], gt_roi_mask)\r\n\r\n        #total loss\r\n        loss = rpn_loc_loss + rpn_cls_loss + roi_loc_loss + roi_cls_loss + self.gamma * mask_loss\r\n        loss /= n\r\n\r\n        #avg loss calculation\r\n        if self.avg_loss is None:\r\n            self.avg_loss = loss.data\r\n        else:\r\n            self.avg_loss = self.avg_loss * self.decayrate + loss.data*(1-self.decayrate)\r\n        chainer.reporter.report({'rpn_loc_loss':rpn_loc_loss/n,\r\n                                 'rpn_cls_loss':rpn_cls_loss/n,\r\n                                 'roi_loc_loss':roi_loc_loss/n,\r\n                                 'roi_cls_loss':roi_cls_loss/n,\r\n                                 'roi_mask_loss':self.gamma * mask_loss/n,\r\n                                 'avg_loss':self.avg_loss,\r\n                                 'loss':loss}, self)\r\n        return loss\r\n\r\n\r\ndef _smooth_l1_loss(x, t, in_weight, sigma):\r\n    sigma2 = sigma ** 2\r\n    diff = in_weight * (x - t)\r\n    abs_diff = F.absolute(diff)\r\n    flag = (abs_diff.data < (1. / sigma2)).astype(np.float32)\r\n    y = (flag * (sigma2 / 2.) * F.square(diff) +\r\n         (1 - flag) * (abs_diff - 0.5 / sigma2))\r\n    return F.sum(y)\r\n\r\ndef _fast_rcnn_loc_loss(pred_loc, gt_loc, gt_label, sigma):\r\n    xp = chainer.cuda.get_array_module(pred_loc)\r\n    in_weight = xp.zeros_like(gt_loc)\r\n    in_weight[gt_label > 0] = 1\r\n    loc_loss = _smooth_l1_loss(pred_loc, gt_loc, in_weight, sigma)\r\n    loc_loss /= xp.sum(gt_label >= 0)\r\n    return loc_loss\r\n"
  },
  {
    "path": "train.py",
    "content": "import chainer\r\nfrom chainer import training\r\nfrom chainer.training import extensions, ParallelUpdater\r\nfrom chainer.training.triggers import ManualScheduleTrigger\r\nfrom chainer.datasets import TransformDataset\r\nfrom chainercv.datasets import VOCBboxDataset, voc_bbox_label_names\r\nfrom chainercv import transforms\r\nfrom chainercv.transforms.image.resize import resize\r\n\r\nimport argparse\r\nimport numpy as np\r\nimport time\r\n#from mask_rcnn_vgg import MaskRCNNVGG16\r\nfrom mask_rcnn_resnet import MaskRCNNResNet\r\nfrom coco_dataset import COCODataset\r\nfrom mask_rcnn_train_chain import MaskRCNNTrainChain\r\nfrom utils.bn_utils import freeze_bn, bn_to_affine\r\nfrom utils.cocoapi_evaluator import COCOAPIEvaluator\r\nfrom utils.detection_coco_evaluator import DetectionCOCOEvaluator\r\nimport logging\r\nimport traceback\r\nfrom utils.updater import SubDivisionUpdater\r\nimport cv2\r\n\r\ndef resize_bbox(bbox, in_size, out_size):\r\n    bbox_o = bbox.copy()\r\n    y_scale = float(out_size[0]) / in_size[0]\r\n    x_scale = float(out_size[1]) / in_size[1]\r\n    bbox_o[:, 0] = y_scale * bbox[:, 1]\r\n    bbox_o[:, 2] = y_scale * (bbox[:, 1]+bbox[:, 3])\r\n    bbox_o[:, 1] = x_scale * bbox[:, 0]\r\n    bbox_o[:, 3] = x_scale * (bbox[:, 0]+bbox[:, 2])\r\n    return bbox_o\r\n\r\ndef parse():\r\n    parser = argparse.ArgumentParser(\r\n        description='Mask RCNN trainer')\r\n    parser.add_argument('--dataset', choices=('coco2017'),\r\n                        default='coco2017')\r\n    parser.add_argument('--extractor', choices=('resnet50','resnet101'),\r\n                        default='resnet50', help='extractor network')\r\n    parser.add_argument('--gpu', '-g', type=int, default=0)\r\n    parser.add_argument('--lr', '-l', type=float, default=1e-4)\r\n    parser.add_argument('--batchsize', '-b', type=int, default=8)\r\n    parser.add_argument('--freeze_bn', action='store_true', default=False, help='freeze batchnorm gamma/beta')\r\n   
 parser.add_argument('--bn2affine', action='store_true', default=False, help='batchnorm to affine')\r\n    parser.add_argument('--out', '-o', default='result',\r\n                        help='Output directory')\r\n    parser.add_argument('--seed', '-s', type=int, default=0)\r\n    parser.add_argument('--roialign', action='store_false', default=True, help='default: True')\r\n    parser.add_argument('--lr_step', '-ls', type=int, default=120000)\r\n    parser.add_argument('--lr_initialchange', '-li', type=int, default=400)\r\n    parser.add_argument('--pretrained', '-p', type=str, default='imagenet')\r\n    parser.add_argument('--snapshot', type=int, default=4000)\r\n    parser.add_argument('--validation', type=int, default=30000)\r\n    parser.add_argument('--resume', type=str)\r\n    parser.add_argument('--iteration', '-i', type=int, default=180000)\r\n    parser.add_argument('--roi_size', '-r', type=int, default=14, help='ROI size for mask head input')\r\n    parser.add_argument('--gamma', type=float, default=1, help='mask loss weight')\r\n    return parser.parse_args()\r\n\r\nclass Transform(object):\r\n    def __init__(self, net, labelids):\r\n        self.net = net\r\n        self.labelids = labelids\r\n    def __call__(self, in_data):\r\n        if len(in_data)==5:\r\n            img, label, bbox, mask, i = in_data\r\n        elif len(in_data)==4:\r\n            img, bbox, label, i= in_data\r\n        label = [self.labelids.index(l) for l in label]\r\n        _, H, W = img.shape\r\n        if chainer.config.train:\r\n            img = self.net.prepare(img)\r\n        _, o_H, o_W = img.shape\r\n        scale = o_H / H\r\n        if len(bbox)==0:\r\n            return img, [],[],1\r\n        bbox = resize_bbox(bbox, (H, W), (o_H, o_W))\r\n        mask = resize(mask,(o_H, o_W))\r\n        if chainer.config.train:\r\n            #horizontal flip\r\n            img, params = transforms.random_flip(\r\n                img, x_random=True, return_param=True)\r\n       
     bbox = transforms.flip_bbox(\r\n                bbox, (o_H, o_W), x_flip=params['x_flip'])\r\n            mask = transforms.flip(mask, x_flip=params['x_flip'])\r\n        return img, bbox, label, scale, mask, i\r\n\r\ndef convert(batch, device):\r\n    return chainer.dataset.convert.concat_examples(batch, device, padding=-1)\r\n\r\ndef main():\r\n    args = parse()\r\n    np.random.seed(args.seed)\r\n    print('arguments: ', args)\r\n\r\n    # Model setup\r\n    if args.dataset == 'coco2017':\r\n        train_data = COCODataset()\r\n    test_data = COCODataset(json_file='instances_val2017.json', name='val2017', id_list_file='val2017.txt')\r\n    train_class_ids =train_data.class_ids\r\n    test_ids = test_data.ids\r\n    cocoanns = test_data.coco\r\n    if args.extractor=='vgg16':\r\n        mask_rcnn = MaskRCNNVGG16(n_fg_class=80, pretrained_model=args.pretrained, roi_size=args.roi_size, roi_align = args.roialign)\r\n    elif args.extractor=='resnet50':\r\n        mask_rcnn = MaskRCNNResNet(n_fg_class=80, pretrained_model=args.pretrained,roi_size=args.roi_size, n_layers=50, roi_align = args.roialign, class_ids=train_class_ids)\r\n    elif args.extractor=='resnet101':\r\n        mask_rcnn = MaskRCNNResNet(n_fg_class=80, pretrained_model=args.pretrained,roi_size=args.roi_size, n_layers=101, roi_align = args.roialign, class_ids=train_class_ids)\r\n    mask_rcnn.use_preset('evaluate')\r\n    model = MaskRCNNTrainChain(mask_rcnn, gamma=args.gamma, roi_size=args.roi_size)\r\n \r\n    # Trainer setup\r\n    if args.gpu >= 0:\r\n        chainer.cuda.get_device_from_id(args.gpu).use()\r\n        model.to_gpu()\r\n    optimizer = chainer.optimizers.MomentumSGD(lr=args.lr, momentum=0.9)\r\n    #optimizer = chainer.optimizers.Adam()#alpha=0.001, beta1=0.9, beta2=0.999 , eps=0.00000001)\r\n    optimizer.setup(model)\r\n    optimizer.add_hook(chainer.optimizer.WeightDecay(rate=0.0001))\r\n\r\n    train_data=TransformDataset(train_data, Transform(mask_rcnn, 
train_class_ids))\r\n    test_data=TransformDataset(test_data, Transform(mask_rcnn, train_class_ids))\r\n    train_iter = chainer.iterators.SerialIterator(\r\n        train_data, batch_size=args.batchsize)\r\n    test_iter = chainer.iterators.SerialIterator(\r\n        test_data, batch_size=1, repeat=False, shuffle=False)\r\n    updater = SubDivisionUpdater(train_iter, optimizer, device=args.gpu, subdivisions=args.batchsize)\r\n    #updater = ParallelUpdater(train_iter, optimizer, devices={\"main\": 0, \"second\": 1}, converter=convert ) #for training with multiple GPUs\r\n    trainer = training.Trainer(\r\n        updater, (args.iteration, 'iteration'), out=args.out)\r\n\r\n    # Extensions\r\n    trainer.extend(\r\n        extensions.snapshot_object(model.mask_rcnn, 'snapshot_model.npz'),\r\n        trigger=(args.snapshot, 'iteration'))\r\n    trainer.extend(extensions.ExponentialShift('lr', 10),\r\n                       trigger=ManualScheduleTrigger(\r\n                          [args.lr_initialchange], 'iteration'))\r\n    trainer.extend(extensions.ExponentialShift('lr', 0.1),\r\n                   trigger=(args.lr_step, 'iteration'))\r\n    if args.resume is not None:\r\n        chainer.serializers.load_npz(args.resume, model.mask_rcnn)\r\n    if args.freeze_bn:\r\n        freeze_bn(model.mask_rcnn)\r\n    if args.bn2affine:\r\n        bn_to_affine(model.mask_rcnn)\r\n    log_interval = 40, 'iteration'\r\n    plot_interval = 160, 'iteration'\r\n    print_interval = 40, 'iteration'\r\n\r\n    #trainer.extend(extensions.Evaluator(test_iter, model, device=args.gpu), trigger=(args.validation, 'iteration'))\r\n    #trainer.extend(DetectionCOCOEvaluator(test_iter, model.mask_rcnn), trigger=(args.validation, 'iteration')) #COCO AP Evaluator with VOC metric\r\n    trainer.extend(COCOAPIEvaluator(test_iter, model.mask_rcnn, test_ids, cocoanns), trigger=(args.validation, 'iteration')) #COCO AP Evaluator\r\n    
trainer.extend(chainer.training.extensions.observe_lr(),\r\n                   trigger=log_interval)\r\n    trainer.extend(extensions.LogReport(trigger=log_interval))\r\n    trainer.extend(extensions.PrintReport(\r\n        ['iteration', 'epoch', 'elapsed_time', 'lr',\r\n         'main/loss',\r\n         'main/avg_loss',\r\n         'main/roi_loc_loss',\r\n         'main/roi_cls_loss',\r\n         'main/roi_mask_loss',\r\n         'main/rpn_loc_loss',\r\n         'main/rpn_cls_loss',\r\n         'validation/main/loss',\r\n         'validation/main/map',\r\n         ]), trigger=print_interval)\r\n    trainer.extend(extensions.ProgressBar(update_interval=1000))\r\n    #trainer.extend(extensions.dump_graph('main/loss'))\r\n    try:\r\n        trainer.run()\r\n    except Exception:\r\n        # print the traceback for training errors; KeyboardInterrupt and SystemExit propagate\r\n        traceback.print_exc()\r\n\r\nif __name__ == '__main__':\r\n    main()\r\n"
  },
  {
    "path": "utils/__init__.py",
    "content": ""
  },
  {
    "path": "utils/bn_utils.py",
    "content": "import numpy as np\nimport cupy\n\ndef freeze_bn(model):\n    # freeze batchnorm update \n    def disableupdate(block):\n        for name in block._forward:\n            l = getattr(block, name)\n            l.bn1.disable_update()   \n            l.bn2.disable_update()   \n            l.bn3.disable_update()   \n            if name=='a':\n                l.bn4.disable_update()\n    model.extractor.bn1.disable_update()  \n    disableupdate(model.extractor.res2)\n    disableupdate(model.extractor.res3)\n    disableupdate(model.extractor.res4)\n    disableupdate(model.head.res5)\n    print(\"batchnorm update disabled!\")\n\ndef bn_to_affine(model):\n    # change batchnorm layers to affine layers (mean -> 0, var -> 1)\n    def bn_to_affine_block(block):\n        for name in block._forward:\n            l = getattr(block, name)\n            l.bn1.avg_mean = cupy.zeros(l.bn1.avg_mean.shape, dtype=np.float32)\n            l.bn1.avg_var = cupy.ones(l.bn1.avg_var.shape, dtype=np.float32) - l.bn1.eps\n            l.bn2.avg_mean = cupy.zeros(l.bn2.avg_mean.shape, dtype=np.float32)\n            l.bn2.avg_var = cupy.ones(l.bn2.avg_var.shape, dtype=np.float32) - l.bn1.eps   \n            l.bn3.avg_mean = cupy.zeros(l.bn3.avg_mean.shape, dtype=np.float32) \n            l.bn3.avg_var = cupy.ones(l.bn3.avg_var.shape, dtype=np.float32) - l.bn1.eps  \n            if name=='a':\n                l.bn4.avg_mean = cupy.zeros(l.bn4.avg_mean.shape, dtype=np.float32) \n                l.bn4.avg_var = cupy.ones(l.bn4.avg_var.shape, dtype=np.float32) - l.bn1.eps \n    model.extractor.bn1.avg_mean = cupy.zeros(model.extractor.bn1.avg_mean.shape, dtype=np.float32)\n    model.extractor.bn1.avg_var = cupy.ones(model.extractor.bn1.avg_var.shape, dtype=np.float32) - model.extractor.bn1.eps \n    bn_to_affine_block(model.extractor.res2)\n    bn_to_affine_block(model.extractor.res3)\n    bn_to_affine_block(model.extractor.res4)\n    bn_to_affine_block(model.head.res5)\n    
print(\"converted batchnorm to affine\")"
  },
  {
    "path": "utils/box_utils.py",
    "content": "import numpy as np\nimport cupy\nimport cv2\n\ndef resize_bbox(bbox, in_size, out_size):\n    bbox_o = bbox.copy()\n    y_scale = float(out_size[0]) / in_size[0]\n    x_scale = float(out_size[1]) / in_size[1]\n    bbox_o[:, 0] = y_scale * bbox[:, 1]\n    bbox_o[:, 2] = y_scale * (bbox[:, 1]+bbox[:, 3])\n    bbox_o[:, 1] = x_scale * bbox[:, 0]\n    bbox_o[:, 3] = x_scale * (bbox[:, 0]+bbox[:, 2])\n    return bbox_o\n\ndef bbox_yxyx2xywh(bbox):\n    bbox_o = bbox.copy()\n    bbox_o[:, 0] = bbox[:, 1]\n    bbox_o[:, 2] = bbox[:, 3] - bbox[:, 1]\n    bbox_o[:, 1] = bbox[:, 0]\n    bbox_o[:, 3] = bbox[:, 2] - bbox[:, 0]\n    return bbox_o\n\ndef im_mask(mask, size, bbox):\n    # bboxes are already clipped to [0, w], [0, h]\n    masksize = mask.shape[0]\n    # pad the mask to avoid cv2.resize artifacts \n    pmask = np.zeros((masksize + 2, masksize + 2), dtype=np.float32)\n    pmask[1:-1, 1:-1] = mask\n    # extend the boxhead\n    scale = (masksize + 2) / masksize\n    ex_w = (bbox[3] - bbox[1]) * scale\n    ex_h = (bbox[2] - bbox[0]) * scale\n    ex_x0 = (bbox[3] + bbox[1] - ex_w) / 2\n    ex_y0 = (bbox[2] + bbox[0] - ex_h) / 2\n    ex_x1 = (bbox[3] + bbox[1] + ex_w) / 2\n    ex_y1 = (bbox[2] + bbox[0] + ex_h) / 2\n    ex_bbox = np.asarray([ex_y0, ex_x0, ex_y1, ex_x1], dtype=np.int32)\n    # whole-image-sized mask \n    immask = np.zeros((size[0],size[1]), dtype=np.uint8)\n    x0, x1 = max(ex_bbox[1], 0), min(ex_bbox[3] + 1, size[1])\n    y0, y1= max(ex_bbox[0], 0), min(ex_bbox[2] + 1, size[0])\n    immask_roi = cv2.resize(pmask, (x1 - x0, y1 - y0))\n    immask[y0:y1, x0:x1] = np.round(immask_roi).astype(np.uint8)\n    return immask\n"
  },
  {
    "path": "utils/cocoapi_evaluator.py",
    "content": "import copy\nimport numpy as np\n\nfrom chainer import reporter\nimport chainer.training.extensions\n\nfrom utils import eval_detection_coco\nfrom chainercv.utils import apply_prediction_to_iterator\nimport pycocotools\nfrom pycocotools.coco import COCO\nfrom pycocotools.cocoeval import COCOeval\n\nclass COCOAPIEvaluator(chainer.training.extensions.Evaluator):\n    trigger = 1, 'epoch'\n    default_name = 'validation'\n    priority = chainer.training.PRIORITY_WRITER\n\n    def __init__(\n            self, iterator, target, ids, cocoanns, label_names=None):\n        super(COCOAPIEvaluator, self).__init__(\n            iterator, target)\n        self.ids = ids\n        self.cocoanns = cocoanns\n\n    def evaluate(self):\n        iterator = self._iterators['main']\n        target = self._targets['main']\n\n        annType = ['segm','bbox','keypoints']\n        if hasattr(iterator, 'reset'):\n            iterator.reset()\n            it = iterator\n        else:\n            it = copy.copy(iterator)\n\n        in_values, out_values, rest_values = apply_prediction_to_iterator(\n            target.predict, it)\n        # delete unused iterators explicitly\n        del in_values\n\n        pred_bboxes, pred_labels, pred_scores, pred_masks = out_values\n\n        if len(rest_values) == 3:\n            gt_bboxes, gt_labels, gt_difficults = rest_values\n        elif len(rest_values) == 2:\n            gt_bboxes, gt_labels = rest_values\n            gt_difficults = None\n        elif len(rest_values) == 5:\n            gt_bboxes, gt_labels, _, _, i = rest_values\n            gt_difficults = None\n        pred_bboxes = iter(list(pred_bboxes))\n        pred_labels = iter(list(pred_labels))\n        pred_scores = iter(list(pred_scores))\n        gt_bboxes = iter(list(gt_bboxes))\n        gt_labels = iter(list(gt_labels))\n        data_dict = []\n        for i, (pred_bbox, pred_label, pred_score, pred_mask) in \\\n            enumerate(zip(pred_bboxes, 
pred_labels, pred_scores, pred_masks)):\n            for bbox, label, score, mask in zip(pred_bbox, pred_label, pred_score, pred_mask):\n                A={\"image_id\":int(self.ids[i]), \"category_id\":int(label), \"bbox\":bbox.tolist(),\n                 \"score\":float(score), \"segmentation\": mask}\n                data_dict.append(A)\n        if len(data_dict)>0:\n            for i in range(2):  # 'segm','bbox'\n                cocoGt=self.cocoanns\n                cocoDt=cocoGt.loadRes(data_dict)\n                cocoEval = COCOeval(self.cocoanns, cocoDt, annType[i])\n                cocoEval.params.imgIds  = [int(id_) for id_ in self.ids]\n                cocoEval.evaluate()\n                cocoEval.accumulate()\n                cocoEval.summarize()\n            report = {'map': cocoEval.stats[0]} # report COCO AP (IoU=0.50:0.95) of the last evaluated type ('bbox')\n        else:\n            report = {'map': 0}\n        observation = {}\n        with reporter.report_scope(observation):\n            reporter.report(report, target)\n        return observation"
  },
  {
    "path": "utils/detection_coco_evaluator.py",
    "content": "import copy\nimport numpy as np\n\nfrom chainer import reporter\nimport chainer.training.extensions\n\nfrom utils import eval_detection_coco\nfrom chainercv.utils import apply_prediction_to_iterator\n\n\nclass DetectionCOCOEvaluator(chainer.training.extensions.Evaluator):\n\n    \"\"\"An extension that evaluates a detection model by PASCAL VOC metric.\n\n    This extension iterates over an iterator and evaluates the prediction\n    results by average precisions (APs) and mean of them\n    (mean Average Precision, mAP).\n    This extension reports the following values with keys.\n    Please note that :obj:`'ap/<label_names[l]>'` is reported only if\n    :obj:`label_names` is specified.\n\n    * :obj:`'map'`: Mean of average precisions (mAP).\n    * :obj:`'ap/<label_names[l]>'`: Average precision for class \\\n        :obj:`label_names[l]`, where :math:`l` is the index of the class. \\\n        For example, this evaluator reports :obj:`'ap/aeroplane'`, \\\n        :obj:`'ap/bicycle'`, etc. if :obj:`label_names` is \\\n        :obj:`~chainercv.datasets.voc_bbox_label_names`. \\\n        If there is no bounding box assigned to class :obj:`label_names[l]` \\\n        in either ground truth or prediction, it reports :obj:`numpy.nan` as \\\n        its average precision. \\\n        In this case, mAP is computed without this class.\n\n    Args:\n        iterator (chainer.Iterator): An iterator. Each sample should be\n            following tuple :obj:`img, bbox, label` or\n            :obj:`img, bbox, label, difficult`.\n            :obj:`img` is an image, :obj:`bbox` is coordinates of bounding\n            boxes, :obj:`label` is labels of the bounding boxes and\n            :obj:`difficult` is whether the bounding boxes are difficult or\n            not. If :obj:`difficult` is returned, difficult ground truth\n            will be ignored from evaluation.\n        target (chainer.Link): A detection link. 
This link must have\n            :meth:`predict` method that takes a list of images and returns\n            :obj:`bboxes`, :obj:`labels` and :obj:`scores`.\n        use_07_metric (bool): Whether to use PASCAL VOC 2007 evaluation metric\n            for calculating average precision. The default value is\n            :obj:`False`.\n        label_names (iterable of strings): An iterable of names of classes.\n            If this value is specified, average precision for each class is\n            also reported with the key :obj:`'ap/<label_names[l]>'`.\n\n    \"\"\"\n\n    trigger = 1, 'epoch'\n    default_name = 'validation'\n    priority = chainer.training.PRIORITY_WRITER\n\n    def __init__(\n            self, iterator, target, use_07_metric=False, label_names=None):\n        super(DetectionCOCOEvaluator, self).__init__(\n            iterator, target)\n        self.use_07_metric = use_07_metric\n        self.label_names = ['background',  # class zero\n            'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',\n            'fire hydrant', 'street sign', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',\n            'elephant', 'bear', 'zebra', 'giraffe', 'hat', 'backpack', 'umbrella', 'shoe', 'eye glasses', 'handbag', 'tie', 'suitcase', 'frisbee',\n            'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle',\n            'plate', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',\n            'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed',\n            'mirror', 'dining table', 'window', 'desk','toilet', 'door', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone', 'microwave', 'oven',\n            'toaster', 'sink', 'refrigerator', 'blender', 'book', 'clock', 'vase', 'scissors', 'teddy 
bear', 'hair drier', 'toothbrush']\n\n    def evaluate(self):\n        iterator = self._iterators['main']\n        target = self._targets['main']\n\n        if hasattr(iterator, 'reset'):\n            iterator.reset()\n            it = iterator\n        else:\n            it = copy.copy(iterator)\n\n        in_values, out_values, rest_values = apply_prediction_to_iterator(\n            target.predict, it)\n        # delete unused iterators explicitly\n        del in_values\n\n        pred_bboxes, _, pred_labels, pred_scores, _ = out_values\n\n        if len(rest_values) == 3:\n            gt_bboxes, gt_labels, gt_difficults = rest_values\n        elif len(rest_values) == 2:\n            gt_bboxes, gt_labels = rest_values\n            gt_difficults = None\n        elif len(rest_values) == 5:\n            gt_bboxes, gt_labels, _, _, i = rest_values\n            gt_difficults = None\n\n        result = eval_detection_coco.eval_detection_coco(\n            pred_bboxes, pred_labels, pred_scores,\n            gt_bboxes, gt_labels, gt_difficults,\n            use_07_metric=self.use_07_metric)\n\n        report = {'map': result['map']}\n\n        if self.label_names is not None:\n            for l, label_name in enumerate(self.label_names):\n                try:\n                    report['ap/{:s}'.format(label_name)] = result['ap'][l]\n                except IndexError:\n                    report['ap/{:s}'.format(label_name)] = np.nan\n        if True:\n            print(report)\n\n        observation = {}\n        with reporter.report_scope(observation):\n            reporter.report(report, target)\n        return observation"
  },
  {
    "path": "utils/detectron_parser.py",
    "content": "import numpy as np\nimport os\npath = os.path.join(os.path.dirname(__file__), '../')\nimport sys\nsys.path.append(path)\nfrom mask_rcnn_resnet import MaskRCNNResNet\nfrom chainer import serializers\nimport pickle\n\nmodel = MaskRCNNResNet(n_fg_class=80, roi_size=14, pretrained_model='auto', anchor_scales=[2, 4, 8, 16, 32], n_layers=50, class_ids=[[1]])\n\nmodeldir = \"modelfiles\"\nif os.path.exists(modeldir)==False:\n    os.mkdir(modeldir)\n    \n# resnet50, end-to-end, C4\nd_model_file = \"modelfiles/model_final.pkl\"\nc_model_file = \"modelfiles/e2e_mask_rcnn_R-50-C4_1x_d2c.npz\"\n\nwith open(d_model_file, 'rb') as f:\n    d = pickle.load(f, encoding='latin-1')['blobs']\nd_key  = sorted(d)\n\nparsecount = 0\nfor bl in d_key:\n    if 'res' in bl:\n        stage = bl[3] # resnet stage, 2, 3, 4, 5\n        block = bl[5] # resnet block, a or b\n        if stage=='_': # non-resnet layers\n            continue\n        else:\n            stage = int(stage) - 1\n            if stage == 4:\n                netname='head'\n            else:\n                netname='extractor'\n            if 'branch2a' in bl:\n                c_nlayer = 1\n            elif 'branch2b' in bl:\n                c_nlayer = 2\n            elif 'branch2c' in bl:\n                c_nlayer = 3\n            elif 'branch1' in bl:\n                c_nlayer = 4\n            else:\n                c_nlayer = 0\n            \n            # do not copy\n            if bl.endswith('_b') and 'bn_b' not in bl:\n                continue\n            if 'momentum' in bl:\n                continue\n            \n            # conv / bn gamma / bn beta\n            if '_w' in bl:\n                c_kind = 'conv%d.W' % c_nlayer\n            elif 'bn_s' in bl:\n                c_kind = 'bn%d.gamma' % c_nlayer\n            elif 'bn_b' in bl:\n                c_kind = 'bn%d.beta' % c_nlayer\n                \n            # chainer block kind\n            if block == '0':\n                c_block = 
'a'\n            else:\n                c_block = 'b'+block\n            \n            # shape checker\n            exec(\"c_shape = model.%s.res%d.%s.%s.data.shape\" % (netname, stage + 1, c_block, c_kind))\n            exec(\"d_shape = d['%s'].shape\" % bl)\n            if c_shape == d_shape:\n                # execute copy\n                txt = \"model.%s.res%d.%s.%s.data = d['%s']\" % (netname, stage + 1, c_block, c_kind, bl )\n                print(txt)\n                exec(txt)\n                parsecount += 1\n            else:\n                print(\"shape mismatch error!\")\n\n# copy the other layers\nlayer_pairs = \\\n[('extractor.conv1.W', 'conv1_w'), ('extractor.bn1.gamma', 'res_conv1_bn_s'), ('extractor.bn1.beta', 'res_conv1_bn_b'),\n ('rpn.conv1.W', 'conv_rpn_w'), ('rpn.conv1.b', 'conv_rpn_b'), \n ('rpn.loc.W', 'rpn_bbox_pred_w'), ('rpn.loc.b', 'rpn_bbox_pred_b'), \n ('rpn.score.W', 'rpn_cls_logits_w'), ('rpn.score.b', 'rpn_cls_logits_b'), \n ('head.score.W', 'cls_score_w'), ('head.score.b', 'cls_score_b'), \n ('head.cls_loc.W', 'bbox_pred_w'), ('head.cls_loc.b', 'bbox_pred_b'), \n ('head.deconvm1.W', 'conv5_mask_w'), ('head.deconvm1.b', 'conv5_mask_b'),\n ('head.convm2.W', 'mask_fcn_logits_w'), ('head.convm2.b', 'mask_fcn_logits_b'),\n]\n\ndef xytrans(src):\n    sh = src.shape\n    dst = src.reshape(sh[0]//4, 4, -1)[:,[1, 0, 3, 2]].reshape(sh)\n    return dst\n\nfor layer_pair in layer_pairs:\n    exec(\"c_shape = model.%s.data.shape\" % layer_pair[0])\n    exec(\"d_shape = d['%s'].shape\" % layer_pair[1])\n    if 'bbox_pred' in layer_pair[1]:\n        d[layer_pair[1]] = xytrans(d[layer_pair[1]])\n    if c_shape == d_shape:\n        txt = \"model.%s.data = d['%s']\" % layer_pair\n        print(txt)\n        exec(txt)\n        parsecount += 1\n    else:\n        print(\"shape mismatch error!\")\n\nprint(parsecount, \" layers copied\")\nserializers.save_npz(c_model_file, model)\nprint(\"save weights file to a chainer model\", c_model_file)"
  },
  {
    "path": "utils/eval_detection_coco.py",
    "content": "from __future__ import division\n\nfrom collections import defaultdict\nimport itertools\nimport numpy as np\nimport six\n\nfrom chainercv.utils.bbox.bbox_iou import bbox_iou\n\n\ndef eval_detection_coco(\n        pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,\n        gt_difficults=None,\n        iou_thresh=0.5, use_07_metric=False):\n    \"\"\"Calculate average precisions based on evaluation code of PASCAL VOC.\n\n    This function evaluates predicted bounding boxes obtained from a dataset\n    which has :math:`N` images by using average precision for each class.\n    The code is based on the evaluation code used in PASCAL VOC Challenge.\n\n    Args:\n        pred_bboxes (iterable of numpy.ndarray): An iterable of :math:`N`\n            sets of bounding boxes.\n            Its index corresponds to an index for the base dataset.\n            Each element of :obj:`pred_bboxes` is a set of coordinates\n            of bounding boxes. This is an array whose shape is :math:`(R, 4)`,\n            where :math:`R` corresponds\n            to the number of bounding boxes, which may vary among boxes.\n            The second axis corresponds to\n            :math:`y_{min}, x_{min}, y_{max}, x_{max}` of a bounding box.\n        pred_labels (iterable of numpy.ndarray): An iterable of labels.\n            Similar to :obj:`pred_bboxes`, its index corresponds to an\n            index for the base dataset. Its length is :math:`N`.\n        pred_scores (iterable of numpy.ndarray): An iterable of confidence\n            scores for predicted bounding boxes. Similar to :obj:`pred_bboxes`,\n            its index corresponds to an index for the base dataset.\n            Its length is :math:`N`.\n        gt_bboxes (iterable of numpy.ndarray): An iterable of ground truth\n            bounding boxes\n            whose length is :math:`N`. An element of :obj:`gt_bboxes` is a\n            bounding box whose shape is :math:`(R, 4)`. 
Note that the number of\n            bounding boxes in each image does not need to be same as the number\n            of corresponding predicted boxes.\n        gt_labels (iterable of numpy.ndarray): An iterable of ground truth\n            labels which are organized similarly to :obj:`gt_bboxes`.\n        gt_difficults (iterable of numpy.ndarray): An iterable of boolean\n            arrays which is organized similarly to :obj:`gt_bboxes`.\n            This tells whether the\n            corresponding ground truth bounding box is difficult or not.\n            By default, this is :obj:`None`. In that case, this function\n            considers all bounding boxes to be not difficult.\n        iou_thresh (float): A prediction is correct if its Intersection over\n            Union with the ground truth is above this value.\n        use_07_metric (bool): Whether to use PASCAL VOC 2007 evaluation metric\n            for calculating average precision. The default value is\n            :obj:`False`.\n\n    Returns:\n        dict:\n\n        The keys, value-types and the description of the values are listed\n        below.\n\n        * **ap** (*numpy.ndarray*): An array of average precisions. \\\n            The :math:`l`-th value corresponds to the average precision \\\n            for class :math:`l`. 
If class :math:`l` does not exist in \\\n            either :obj:`pred_labels` or :obj:`gt_labels`, the corresponding \\\n            value is set to :obj:`numpy.nan`.\n        * **map** (*float*): The average of Average Precisions over classes.\n\n    \"\"\"\n\n    prec, rec = calc_detection_coco_prec_rec(\n        pred_bboxes, pred_labels, pred_scores,\n        gt_bboxes, gt_labels, gt_difficults,\n        iou_thresh=iou_thresh)\n\n    ap = calc_detection_coco_ap(prec, rec, use_07_metric=use_07_metric)\n    #for name, ap0 in zip(coconames, ap):\n    #    if ~(ap0==ap0):\n    #        ap0 = -1\n    #    apresults.append([name, ap0])\n    #print(\"average precision evaluation results: \", apresults)\n\n\n    return {'ap': ap, 'map': np.nanmean(ap)}\n\n\ndef calc_detection_coco_prec_rec(\n        pred_bboxes, pred_labels, pred_scores, gt_bboxes, gt_labels,\n        gt_difficults=None,\n        iou_thresh=0.5):\n    \"\"\"Calculate precision and recall based on evaluation code of PASCAL VOC.\n\n    This function calculates precision and recall of\n    predicted bounding boxes obtained from a dataset which has :math:`N`\n    images.\n    The code is based on the evaluation code used in PASCAL VOC Challenge.\n\n    Args:\n        pred_bboxes (iterable of numpy.ndarray): An iterable of :math:`N`\n            sets of bounding boxes.\n            Its index corresponds to an index for the base dataset.\n            Each element of :obj:`pred_bboxes` is a set of coordinates\n            of bounding boxes. This is an array whose shape is :math:`(R, 4)`,\n            where :math:`R` corresponds\n            to the number of bounding boxes, which may vary among boxes.\n            The second axis corresponds to\n            :math:`y_{min}, x_{min}, y_{max}, x_{max}` of a bounding box.\n        pred_labels (iterable of numpy.ndarray): An iterable of labels.\n            Similar to :obj:`pred_bboxes`, its index corresponds to an\n            index for the base dataset. 
Its length is :math:`N`.\n        pred_scores (iterable of numpy.ndarray): An iterable of confidence\n            scores for predicted bounding boxes. Similar to :obj:`pred_bboxes`,\n            its index corresponds to an index for the base dataset.\n            Its length is :math:`N`.\n        gt_bboxes (iterable of numpy.ndarray): An iterable of ground truth\n            bounding boxes\n            whose length is :math:`N`. An element of :obj:`gt_bboxes` is a\n            bounding box whose shape is :math:`(R, 4)`. Note that the number of\n            bounding boxes in each image does not need to be same as the number\n            of corresponding predicted boxes.\n        gt_labels (iterable of numpy.ndarray): An iterable of ground truth\n            labels which are organized similarly to :obj:`gt_bboxes`.\n        gt_difficults (iterable of numpy.ndarray): An iterable of boolean\n            arrays which is organized similarly to :obj:`gt_bboxes`.\n            This tells whether the\n            corresponding ground truth bounding box is difficult or not.\n            By default, this is :obj:`None`. In that case, this function\n            considers all bounding boxes to be not difficult.\n        iou_thresh (float): A prediction is correct if its Intersection over\n            Union with the ground truth is above this value.\n\n    Returns:\n        tuple of two lists:\n        This function returns two lists: :obj:`prec` and :obj:`rec`.\n\n        * :obj:`prec`: A list of arrays. :obj:`prec[l]` is precision \\\n            for class :math:`l`. If class :math:`l` does not exist in \\\n            either :obj:`pred_labels` or :obj:`gt_labels`, :obj:`prec[l]` is \\\n            set to :obj:`None`.\n        * :obj:`rec`: A list of arrays. :obj:`rec[l]` is recall \\\n            for class :math:`l`. 
If class :math:`l` that is not marked as \\\n            difficult does not exist in \\\n            :obj:`gt_labels`, :obj:`rec[l]` is \\\n            set to :obj:`None`.\n\n    \"\"\"\n\n    pred_bboxes = iter(list(pred_bboxes))\n    pred_labels = iter(list(pred_labels))\n    pred_scores = iter(list(pred_scores))\n    gt_bboxes = iter(list(gt_bboxes))\n    gt_labels = iter(list(gt_labels))\n    if gt_difficults is None:\n        gt_difficults = itertools.repeat(None)\n    else:\n        gt_difficults = iter(gt_difficults)\n\n    n_pos = defaultdict(int)\n    score = defaultdict(list)\n    match = defaultdict(list)\n\n    for pred_bbox, pred_label, pred_score, gt_bbox, gt_label, gt_difficult in \\\n        six.moves.zip(\n            pred_bboxes, pred_labels, pred_scores,\n            gt_bboxes, gt_labels, gt_difficults):\n\n        if gt_difficult is None:\n            gt_difficult = np.zeros(gt_bbox.shape[0], dtype=bool)\n\n        for l in np.unique(np.concatenate((pred_label, gt_label)).astype(int)):\n            pred_mask_l = pred_label == l\n            pred_bbox_l = pred_bbox[pred_mask_l]\n            pred_score_l = pred_score[pred_mask_l]\n            # sort by score\n            order = pred_score_l.argsort()[::-1]\n            pred_bbox_l = pred_bbox_l[order]\n            pred_score_l = pred_score_l[order]\n\n            gt_mask_l = gt_label == l\n            gt_bbox_l = gt_bbox[gt_mask_l]\n            gt_difficult_l = gt_difficult[gt_mask_l]\n\n            n_pos[l] += np.logical_not(gt_difficult_l).sum()\n            score[l].extend(pred_score_l)\n\n            if len(pred_bbox_l) == 0:\n                continue\n            if len(gt_bbox_l) == 0:\n                match[l].extend((0,) * pred_bbox_l.shape[0])\n                continue\n\n            # VOC evaluation follows integer typed bounding boxes.\n            pred_bbox_l = pred_bbox_l.copy()\n            pred_bbox_l[:, 2:] += 1\n            gt_bbox_l = gt_bbox_l.copy()\n            gt_bbox_l[:, 
2:] += 1\n\n            iou = bbox_iou(pred_bbox_l, gt_bbox_l)\n            gt_index = iou.argmax(axis=1)\n            # set -1 if there is no matching ground truth\n            gt_index[iou.max(axis=1) < iou_thresh] = -1\n            del iou\n\n            selec = np.zeros(gt_bbox_l.shape[0], dtype=bool)\n            for gt_idx in gt_index:\n                if gt_idx >= 0:\n                    if gt_difficult_l[gt_idx]:\n                        match[l].append(-1)\n                    else:\n                        if not selec[gt_idx]:\n                            match[l].append(1)\n                        else:\n                            match[l].append(0)\n                    selec[gt_idx] = True\n                else:\n                    match[l].append(0)\n\n    for iter_ in (\n            pred_bboxes, pred_labels, pred_scores,\n            gt_bboxes, gt_labels, gt_difficults):\n        if next(iter_, None) is not None:\n            raise ValueError('Length of input iterables need to be same.')\n\n    n_fg_class = max(n_pos.keys()) + 1\n    prec = [None] * n_fg_class\n    rec = [None] * n_fg_class\n\n    for l in n_pos.keys():\n        score_l = np.array(score[l])\n        match_l = np.array(match[l], dtype=np.int8)\n\n        order = score_l.argsort()[::-1]\n        match_l = match_l[order]\n\n        tp = np.cumsum(match_l == 1)\n        fp = np.cumsum(match_l == 0)\n\n        # If an element of fp + tp is 0,\n        # the corresponding element of prec[l] is nan.\n        prec[l] = tp / (fp + tp)\n        # If n_pos[l] is 0, rec[l] is None.\n        if n_pos[l] > 0:\n            rec[l] = tp / n_pos[l]\n\n    return prec, rec\n\n\ndef calc_detection_coco_ap(prec, rec, use_07_metric=False):\n    \"\"\"Calculate average precisions based on evaluation code of PASCAL VOC.\n\n    This function calculates average precisions\n    from given precisions and recalls.\n    The code is based on the evaluation code used in PASCAL VOC Challenge.\n\n    Args:\n        
prec (list of numpy.array): A list of arrays.\n            :obj:`prec[l]` indicates precision for class :math:`l`.\n            If :obj:`prec[l]` is :obj:`None`, this function returns\n            :obj:`numpy.nan` for class :math:`l`.\n        rec (list of numpy.array): A list of arrays.\n            :obj:`rec[l]` indicates recall for class :math:`l`.\n            If :obj:`rec[l]` is :obj:`None`, this function returns\n            :obj:`numpy.nan` for class :math:`l`.\n        use_07_metric (bool): Whether to use PASCAL VOC 2007 evaluation metric\n            for calculating average precision. The default value is\n            :obj:`False`.\n\n    Returns:\n        ~numpy.ndarray:\n        This function returns an array of average precisions.\n        The :math:`l`-th value corresponds to the average precision\n        for class :math:`l`. If :obj:`prec[l]` or :obj:`rec[l]` is\n        :obj:`None`, the corresponding value is set to :obj:`numpy.nan`.\n\n    \"\"\"\n\n    n_fg_class = len(prec)\n    ap = np.empty(n_fg_class)\n    for l in six.moves.range(n_fg_class):\n        if prec[l] is None or rec[l] is None:\n            ap[l] = np.nan\n            continue\n\n        if use_07_metric:\n            # 11 point metric\n            ap[l] = 0\n            for t in np.arange(0., 1.1, 0.1):\n                if np.sum(rec[l] >= t) == 0:\n                    p = 0\n                else:\n                    p = np.max(np.nan_to_num(prec[l])[rec[l] >= t])\n                ap[l] += p / 11\n        else:\n            # correct AP calculation\n            # first append sentinel values at the end\n            mpre = np.concatenate(([0], np.nan_to_num(prec[l]), [0]))\n            mrec = np.concatenate(([0], rec[l], [1]))\n\n            mpre = np.maximum.accumulate(mpre[::-1])[::-1]\n\n            # to calculate area under PR curve, look for points\n            # where X axis (recall) changes value\n            i = np.where(mrec[1:] != mrec[:-1])[0]\n\n            # and sum 
(\\Delta recall) * prec\n            ap[l] = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])\n\n    return ap"
  },
  {
    "path": "utils/makecocolist.py",
    "content": "import glob\nfnames = glob.glob('COCO/train2017/*.jpg')\n\nwith open(\"COCO/train2017.txt\", \"w\") as f:\n    for fname in fnames:\n        f.write(fname.split('/')[-1].split('.')[0]+'\\n')\nf.close()\n\nfnames = glob.glob('COCO/val2017/*.jpg')\n\nwith open(\"COCO/val2017.txt\", \"w\") as f:\n    for i, fname in enumerate(fnames):\n        f.write(fname.split('/')[-1].split('.')[0]+'\\n')\n        if i > 1000:\n            break\nf.close()\n\n"
  },
  {
    "path": "utils/proposal_target_creator.py",
    "content": "import numpy as np\n\nfrom chainer import cuda\n\nfrom chainercv.links.model.faster_rcnn.utils.bbox2loc import bbox2loc\nfrom chainercv.utils.bbox.bbox_iou import bbox_iou\nimport cv2\n\n\nclass ProposalTargetCreator(object):\n    \"\"\"Assign ground truth bounding boxes to given RoIs.\n\n    The :meth:`__call__` of this class generates training targets\n    for each object proposal.\n    This is used to train Faster RCNN [#]_.\n\n    .. [#] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. \\\n    Faster R-CNN: Towards Real-Time Object Detection with \\\n    Region Proposal Networks. NIPS 2015.\n\n    Args:\n        n_sample (int): The number of sampled regions.\n        pos_ratio (float): Fraction of regions that is labeled as a\n            foreground.\n        pos_iou_thresh (float): IoU threshold for a RoI to be considered as a\n            foreground.\n        neg_iou_thresh_hi (float): RoI is considered to be the background\n            if IoU is in\n            [:obj:`neg_iou_thresh_hi`, :obj:`neg_iou_thresh_hi`).\n        neg_iou_thresh_lo (float): See above.\n\n    \"\"\"\n\n    def __init__(self,\n                 n_sample=128,\n                 pos_ratio=0.25, pos_iou_thresh=0.5,\n                 neg_iou_thresh_hi=0.5, neg_iou_thresh_lo=0.0,\n                 roi_size=7\n                 ):\n        self.roi_size=roi_size\n        self.n_sample = n_sample\n        self.pos_ratio = pos_ratio\n        self.pos_iou_thresh = pos_iou_thresh\n        self.neg_iou_thresh_hi = neg_iou_thresh_hi\n        self.neg_iou_thresh_lo = neg_iou_thresh_lo\n\n    def __call__(self, roi, bbox, label, mask,\n                 loc_normalize_mean=(0., 0., 0., 0.),\n                 loc_normalize_std=(0.1, 0.1, 0.2, 0.2)):\n        \"\"\"Assigns ground truth to sampled proposals.\n\n        This function samples total of :obj:`self.n_sample` RoIs\n        from the combination of :obj:`roi` and :obj:`bbox`.\n        The RoIs are assigned with the ground truth 
class labels as well as\n        bounding box offsets and scales to match the ground truth bounding\n        boxes. As many as :obj:`pos_ratio * self.n_sample` RoIs are\n        sampled as foregrounds.\n\n        Offsets and scales of bounding boxes are calculated using\n        :func:`chainercv.links.model.faster_rcnn.bbox2loc`.\n        Also, types of input arrays and output arrays are the same.\n\n        Here are notations.\n\n        * :math:`S` is the total number of sampled RoIs, which equals \\\n            :obj:`self.n_sample`.\n        * :math:`L` is the number of object classes possibly including the \\\n            background.\n\n        Args:\n            roi (array): Region of Interests (RoIs) from which we sample.\n                Its shape is :math:`(R, 4)`.\n            bbox (array): The coordinates of ground truth bounding boxes.\n                Its shape is :math:`(R', 4)`.\n            label (array): Ground truth bounding box labels. Its shape\n                is :math:`(R',)`. Its range is :math:`[0, L - 1]`, where\n                :math:`L` is the number of foreground classes.\n            mask (array): Ground truth instance masks, one per ground truth\n                bounding box. Its shape is :math:`(R', H, W)`.\n            loc_normalize_mean (tuple of four floats): Mean values to normalize\n                coordinates of bounding boxes.\n            loc_normalize_std (tuple of four floats): Standard deviation of\n                the coordinates of bounding boxes.\n\n        Returns:\n            (array, array, array):\n\n            * **sample_roi**: Regions of interests that are sampled. \\\n                Its shape is :math:`(S, 4)`.\n            * **gt_roi_loc**: Offsets and scales to match \\\n                the sampled RoIs to the ground truth bounding boxes. \\\n                Its shape is :math:`(S, 4)`.\n            * **gt_roi_label**: Labels assigned to sampled RoIs. Its shape is \\\n                :math:`(S,)`. Its range is :math:`[0, L]`. 
The label with \\\n                value 0 is the background.\n\n        \"\"\"\n        xp = cuda.get_array_module(roi)\n        roi = cuda.to_cpu(roi)\n        bbox = cuda.to_cpu(bbox)\n        label = cuda.to_cpu(label)\n        mask = cuda.to_cpu(mask)\n\n        n_bbox, _ = bbox.shape\n        roi = np.concatenate((roi, bbox), axis=0)\n\n        pos_roi_per_image = np.round(self.n_sample * self.pos_ratio)\n        iou = bbox_iou(roi, bbox)\n        gt_assignment = iou.argmax(axis=1)\n        max_iou = iou.max(axis=1)\n\n        # Offset range of classes from [0, n_fg_class - 1] to [1, n_fg_class].\n        # The label with value 0 is the background.\n        gt_roi_label = label[gt_assignment] + 1\n\n        # Select foreground RoIs as those with >= pos_iou_thresh IoU.\n        pos_index = np.where(max_iou >= self.pos_iou_thresh)[0]\n        pos_roi_per_this_image = int(min(pos_roi_per_image, pos_index.size))\n        if pos_index.size > 0:\n            pos_index = np.random.choice(\n                pos_index, size=pos_roi_per_this_image, replace=False)\n\n        # Select background RoIs as those within\n        # [neg_iou_thresh_lo, neg_iou_thresh_hi).\n        neg_index = np.where((max_iou < self.neg_iou_thresh_hi) &\n                             (max_iou >= self.neg_iou_thresh_lo))[0]\n        neg_roi_per_this_image = self.n_sample - pos_roi_per_this_image\n        neg_roi_per_this_image = int(min(neg_roi_per_this_image,\n                                         neg_index.size))\n        if neg_index.size > 0:\n            neg_index = np.random.choice(\n                neg_index, size=neg_roi_per_this_image, replace=False)\n\n        # The indices that we're selecting (both positive and negative).\n        keep_index = np.append(pos_index, neg_index)\n        gt_roi_label = gt_roi_label[keep_index]\n        gt_roi_label[pos_roi_per_this_image:] = 0  # negative labels --> 0\n        sample_roi = roi[keep_index]# sampled <- proposed\n\n        # Compute 
offsets and scales to match sampled RoIs to the GTs.\n        gt_roi_loc = bbox2loc(sample_roi, bbox[gt_assignment[keep_index]])\n        gt_roi_loc = ((gt_roi_loc - np.array(loc_normalize_mean, np.float32)\n                       ) / np.array(loc_normalize_std, np.float32))\n        \n        # Prepare groundtruth masks\n        gt_roi_mask=[]\n        _, h, w = mask.shape\n        for i , idx in enumerate(gt_assignment[pos_index]):\n            A=mask[idx, np.max((int(sample_roi[i,0]),0)):np.min((int(sample_roi[i,2]),h)), np.max((int(sample_roi[i,1]),0)):np.min((int(sample_roi[i,3]),w))]\n            gt_roi_mask.append(cv2.resize(A, (self.roi_size*2,self.roi_size*2)))\n        #debug: visualize masks\n        #cv2.imwrite(\"gt_assignment_mask.png\",mask[0,np.max((int(sample_roi[0,0]),0)):np.min((int(sample_roi[0,2]),h)), np.max((int(sample_roi[0,1]),0)):np.min((int(sample_roi[0,3]),w))]*255)\n        #cv2.imwrite(\"gt_roi_mask.png\",gt_roi_mask[0]*244)#\n\n        if xp != np:\n            sample_roi = cuda.to_gpu(sample_roi)\n            gt_roi_loc = cuda.to_gpu(gt_roi_loc)\n            gt_roi_label = cuda.to_gpu(gt_roi_label) \n            gt_roi_mask = cuda.to_gpu(np.stack(gt_roi_mask).astype(np.int32))\n        else:\n            gt_roi_mask = np.stack(gt_roi_mask).astype(np.int32)\n        return sample_roi, gt_roi_loc, gt_roi_label, gt_roi_mask\n"
  },
  {
    "path": "utils/region_proposal_network.py",
    "content": "import numpy as np\n\nimport chainer\nfrom chainer import cuda\nimport chainer.functions as F\nimport chainer.links as L\n\nfrom chainercv.links.model.faster_rcnn.utils.generate_anchor_base import \\\n    generate_anchor_base\nfrom chainercv.links.model.faster_rcnn.utils.proposal_creator import \\\n    ProposalCreator\n\nclass RegionProposalNetwork(chainer.Chain):\n\n    \"\"\"Region Proposal Network introduced in Faster R-CNN.\n\n    This is Region Proposal Network introduced in Faster R-CNN [#]_.\n    This takes features extracted from images and propose\n    class agnostic bounding boxes around \"objects\".\n\n    .. [#] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. \\\n    Faster R-CNN: Towards Real-Time Object Detection with \\\n    Region Proposal Networks. NIPS 2015.\n\n    Args:\n        in_channels (int): The channel size of input.\n        mid_channels (int): The channel size of the intermediate tensor.\n        ratios (list of floats): This is ratios of width to height of\n            the anchors.\n        anchor_scales (list of numbers): This is areas of anchors.\n            Those areas will be the product of the square of an element in\n            :obj:`anchor_scales` and the original area of the reference\n            window.\n        feat_stride (int): Stride size after extracting features from an\n            image.\n        initialW (callable): Initial weight value. If :obj:`None` then this\n            function uses Gaussian distribution scaled by 0.1 to\n            initialize weight.\n            May also be a callable that takes an array and edits its values.\n        proposal_creator_params (dict): Key valued paramters for\n            :class:`~chainercv.links.model.faster_rcnn.ProposalCreator`.\n\n    .. 
seealso::\n        :class:`~chainercv.links.model.faster_rcnn.ProposalCreator`\n\n    \"\"\"\n\n    def __init__(\n            self, in_channels=512, mid_channels=512, ratios=[0.5, 1, 2],\n            anchor_scales=[8, 16, 32], feat_stride=16,\n            initialW=None,\n            proposal_creator_params={},\n    ):\n        self.anchor_base = generate_anchor_base(\n            anchor_scales=anchor_scales, ratios=ratios)\n        self.feat_stride = feat_stride\n        self.proposal_layer = ProposalCreator(**proposal_creator_params)\n\n        n_anchor = self.anchor_base.shape[0]\n        super(RegionProposalNetwork, self).__init__()\n        with self.init_scope():\n            self.conv1 = L.Convolution2D(\n                in_channels, mid_channels, 3, 1, 1, initialW=initialW)\n            self.score = L.Convolution2D(\n                mid_channels, n_anchor * 1, 1, 1, 0, initialW=initialW)\n            self.loc = L.Convolution2D(\n                mid_channels, n_anchor * 4, 1, 1, 0, initialW=initialW)\n\n    def __call__(self, x, img_size, scale=1.):\n        \"\"\"Forward Region Proposal Network.\n\n        Here are notations.\n\n        * :math:`N` is batch size.\n        * :math:`C` is the channel size of the input.\n        * :math:`H` and :math:`W` are height and width of the input feature.\n        * :math:`A` is number of anchors assigned to each pixel.\n\n        Args:\n            x (~chainer.Variable): The features extracted from images.\n                Its shape is :math:`(N, C, H, W)`.\n            img_size (tuple of ints): A tuple :obj:`height, width`,\n                which contains image size after scaling.\n            scale (float): The amount of scaling done to the input images after\n                reading them from files.\n\n        Returns:\n            (~chainer.Variable, ~chainer.Variable, array, array, array):\n\n            This is a tuple of the following five values.\n\n            * **rpn_locs**: Predicted bounding box offsets and scales 
for \\\n                anchors. Its shape is :math:`(N, H W A, 4)`.\n            * **rpn_scores**:  Predicted foreground scores for \\\n                anchors. Its shape is :math:`(N, H W A)`.\n            * **rois**: A bounding box array containing coordinates of \\\n                proposal boxes.  This is a concatenation of bounding box \\\n                arrays from multiple images in the batch. \\\n                Its shape is :math:`(R', 4)`. Given :math:`R_i` predicted \\\n                bounding boxes from the :math:`i` th image, \\\n                :math:`R' = \\\\sum _{i=1} ^ N R_i`.\n            * **roi_indices**: An array containing indices of images to \\\n                which RoIs correspond. Its shape is :math:`(R',)`.\n            * **anchor**: Coordinates of enumerated shifted anchors. \\\n                Its shape is :math:`(H W A, 4)`.\n\n        \"\"\"\n        n, _, hh, ww = x.shape\n        anchor = _enumerate_shifted_anchor(\n            self.xp.array(self.anchor_base), self.feat_stride, hh, ww)\n        n_anchor = anchor.shape[0] // (hh * ww)\n        h = F.relu(self.conv1(x))\n\n        rpn_locs = self.loc(h)\n        rpn_scores = self.score(h)\n        \n        rpn_locs = rpn_locs.transpose((0, 2, 3, 1)).reshape((n, -1, 4))\n        rpn_scores = rpn_scores.transpose((0, 2, 3, 1))\n        rpn_fg_scores =\\\n            rpn_scores.reshape((n, hh, ww, n_anchor))[:, :, :, :] # modified from chainercv\n        rpn_fg_scores = rpn_fg_scores.reshape((n, -1))\n        rpn_scores = rpn_scores.reshape((n, -1)) # modified from chainercv\n\n        rois = []\n        roi_indices = []\n        for i in range(n):\n            roi = self.proposal_layer(\n                rpn_locs[i].array, rpn_fg_scores[i].array, anchor, img_size,\n                scale=scale)\n            batch_index = i * self.xp.ones((len(roi),), dtype=np.int32)\n            rois.append(roi)\n            roi_indices.append(batch_index)\n        rois = 
self.xp.concatenate(rois, axis=0)\n        roi_indices = self.xp.concatenate(roi_indices, axis=0)\n        return rpn_locs, rpn_scores, rois, roi_indices, anchor\n\n\ndef _enumerate_shifted_anchor(anchor_base, feat_stride, height, width):\n    # Enumerate all shifted anchors:\n    #\n    # add A anchors (1, A, 4) to\n    # cell K shifts (K, 1, 4) to get\n    # shift anchors (K, A, 4)\n    # reshape to (K*A, 4) shifted anchors\n    xp = cuda.get_array_module(anchor_base)\n    shift_y = xp.arange(0, height * feat_stride, feat_stride)\n    shift_x = xp.arange(0, width * feat_stride, feat_stride)\n    shift_x, shift_y = xp.meshgrid(shift_x, shift_y)\n    shift = xp.stack((shift_y.ravel(), shift_x.ravel(),\n                      shift_y.ravel(), shift_x.ravel()), axis=1)\n\n    A = anchor_base.shape[0]\n    K = shift.shape[0]\n    anchor = anchor_base.reshape((1, A, 4)) + \\\n        shift.reshape((1, K, 4)).transpose((1, 0, 2))\n    anchor = anchor.reshape((K * A, 4)).astype(np.float32)\n    return anchor"
  },
  {
    "path": "utils/roi_align_2d.py",
    "content": "# Modified work as ROIAlign:\r\n# -----------------------------------------------------------------------------\r\n# Copyright (c) 2018 DeNA\r\n# -----------------------------------------------------------------------------\r\n\r\n# Modified work:\r\n# -----------------------------------------------------------------------------\r\n# Copyright (c) 2015 Preferred Infrastructure, Inc.\r\n# Copyright (c) 2015 Preferred Networks, Inc.\r\n# -----------------------------------------------------------------------------\r\n\r\n# Original work of forward_gpu and backward_gpu:\r\n# -----------------------------------------------------------------------------\r\n# Fast R-CNN\r\n# Copyright (c) 2015 Microsoft\r\n# Licensed under The MIT License [see fast-rcnn/LICENSE for details]\r\n# Written by Ross Girshick\r\n# -----------------------------------------------------------------------------\r\n\r\nimport numpy\r\nimport six\r\n\r\nfrom chainer import cuda\r\nfrom chainer import function\r\nfrom chainer.utils import type_check\r\n\r\nclass ROIAlign2D(function.Function):\r\n\r\n    \"\"\"RoI align over a set of 2d planes.\"\"\"\r\n\r\n    def __init__(self, outh, outw, spatial_scale):\r\n        self.outh, self.outw = outh, outw\r\n        self.spatial_scale = spatial_scale\r\n\r\n    def check_type_forward(self, in_types):\r\n        type_check.expect(in_types.size() == 2)\r\n\r\n        x_type, roi_type = in_types\r\n        type_check.expect(\r\n            x_type.dtype == numpy.float32,\r\n            x_type.ndim == 4,\r\n            roi_type.dtype == numpy.float32,\r\n            roi_type.ndim == 2,\r\n            roi_type.shape[1] == 5,\r\n        )\r\n\r\n    def forward_gpu(self, inputs):\r\n        self.retain_inputs((1,))\r\n        self._bottom_data_shape = inputs[0].shape\r\n\r\n        bottom_data, bottom_rois = inputs\r\n        #e.g. 
(batch, channel, h, w)=(1, 512, 38, 53) (n_rois, )=(128, 5)\r\n        channels, height, width = bottom_data.shape[1:]\r\n        n_rois = bottom_rois.shape[0]\r\n        top_data = cuda.cupy.empty((n_rois, channels, self.outh,\r\n                                    self.outw), dtype=numpy.float32)\r\n        cuda.cupy.ElementwiseKernel(\r\n            '''\r\n            raw float32 bottom_data, float32 spatial_scale, int32 channels,\r\n            int32 height, int32 width, int32 pooled_height, int32 pooled_width,\r\n            raw float32 bottom_rois\r\n            ''',\r\n            'float32 top_data',\r\n            '''\r\n            // pos in output filter\r\n            int pw = i % pooled_width;\r\n            int ph = (i / pooled_width) % pooled_height;\r\n            int c = (i / pooled_width / pooled_height) % channels;\r\n            int num = i / pooled_width / pooled_height / channels;\r\n\r\n            // scale the ROI coordinates (1/16)\r\n            float roi_batch_ind = bottom_rois[num * 5 + 0];\r\n            float roi_start_w = bottom_rois[num * 5 + 1] * spatial_scale;\r\n            float roi_start_h = bottom_rois[num * 5 + 2] * spatial_scale;\r\n            float roi_end_w = bottom_rois[num * 5 + 3] * spatial_scale;\r\n            float roi_end_h = bottom_rois[num * 5 + 4] * spatial_scale;\r\n\r\n            // Force malformed ROIs to be 1x1\r\n            float roi_width = max(roi_end_w - roi_start_w, 1.0);\r\n            float roi_height = max(roi_end_h - roi_start_h, 1.0);\r\n\r\n            // float bin size \r\n            float bin_size_h = roi_height / static_cast<float>(pooled_height);\r\n            float bin_size_w = roi_width / static_cast<float>(pooled_width);\r\n            float maxval = 0;\r\n            int maxidx = -1;\r\n            \r\n            for (int j = 0; j < 4; j++) {\r\n                int ih = j / 2;\r\n                int iw = j % 2;\r\n                float val = 0;\r\n                // ROIAlign using the 
center of the bin\r\n                float fh = roi_start_h + (static_cast<float>(ph) + 0.25 + static_cast<float>(ih) * 0.5f) * bin_size_h;\r\n                float fw = roi_start_w + (static_cast<float>(pw) + 0.25 + static_cast<float>(iw) * 0.5f) * bin_size_w;\r\n                \r\n                if (fh < -1.0 || fh > height || fw < -1.0 || fw > width) {\r\n                    continue;\r\n                }\r\n\r\n                int hstart = static_cast<int>(floor(fh));\r\n                int wstart = static_cast<int>(floor(fw));\r\n                int hend = hstart + 1;\r\n                int wend = wstart + 1;\r\n\r\n                if (hstart >= height - 1) {\r\n                    hend = hstart = height - 1;\r\n                    fh = static_cast<float>(hstart);\r\n                } else {\r\n                    hend = hstart + 1;\r\n                }\r\n\r\n                if (wstart >= width - 1) {\r\n                    wend = wstart = width - 1;\r\n                    fw = static_cast<float>(wstart);\r\n                } else {\r\n                    wend = wstart + 1;\r\n                }\r\n                float dh = fh - static_cast<float>(hstart);\r\n                float dw = fw - static_cast<float>(wstart);\r\n\r\n                //compute the max value in the bin\r\n                int data_offset = (roi_batch_ind * channels + c) * height * width;\r\n\r\n                val += (1.0 - dh) * (1.0 - dw) * bottom_data[data_offset + hstart * width + wstart];\r\n                val += (1.0 - dh) * dw         * bottom_data[data_offset + hstart * width + wend];\r\n                val += dh * (1.0 - dw)         * bottom_data[data_offset + hend * width + wstart];\r\n                val += dh * dw                 * bottom_data[data_offset + hend * width + wend];\r\n\r\n                maxval += val;\r\n            }\r\n            top_data = maxval / 4;\r\n            \r\n            ''', 'roi_pooling_2d_fwd'\r\n        )(bottom_data, self.spatial_scale, 
channels, height, width,\r\n          self.outh, self.outw, bottom_rois, top_data)\r\n        return top_data,\r\n\r\n    def backward_gpu(self, inputs, gy):\r\n        bottom_rois = inputs[1]\r\n        channels, height, width = self._bottom_data_shape[1:]\r\n        bottom_diff = cuda.cupy.zeros(self._bottom_data_shape, numpy.float32)\r\n        cuda.cupy.ElementwiseKernel(\r\n            '''\r\n            raw float32 top_diff, int32 num_rois,\r\n            float32 spatial_scale, int32 channels, int32 height, int32 width,\r\n            int32 pooled_height, int32 pooled_width, raw float32 bottom_rois\r\n            ''',\r\n            'raw float32 bottom_diff',\r\n            '''\r\n            // pos in output filter\r\n            int pw = i % pooled_width;\r\n            int ph = (i / pooled_width) % pooled_height;\r\n            int c = (i / pooled_width / pooled_height) % channels;\r\n            int num = i / pooled_width / pooled_height / channels;\r\n\r\n            // scale the ROI coordinates (1/16)\r\n            float roi_batch_ind = bottom_rois[num * 5 + 0];\r\n            float roi_start_w = bottom_rois[num * 5 + 1] * spatial_scale;\r\n            float roi_start_h = bottom_rois[num * 5 + 2] * spatial_scale;\r\n            float roi_end_w = bottom_rois[num * 5 + 3] * spatial_scale;\r\n            float roi_end_h = bottom_rois[num * 5 + 4] * spatial_scale;\r\n\r\n            // Force malformed ROIs to be 1x1\r\n            float roi_width = max(roi_end_w - roi_start_w, 1.0);\r\n            float roi_height = max(roi_end_h - roi_start_h, 1.0);\r\n\r\n            // float bin size \r\n            float bin_size_h = roi_height / static_cast<float>(pooled_height);\r\n            float bin_size_w = roi_width / static_cast<float>(pooled_width);\r\n            int data_offset = (roi_batch_ind * channels + c) * height * width;\r\n            \r\n            for (int j = 0; j < 4; j++) {\r\n                int ih = j / 2;\r\n                int iw = j % 
2;\r\n                // ROIAlign using the center of the bin\r\n                float fh = roi_start_h + (static_cast<float>(ph) + 0.25 + static_cast<float>(ih) * 0.5f) * bin_size_h;\r\n                float fw = roi_start_w + (static_cast<float>(pw) + 0.25 + static_cast<float>(iw) * 0.5f) * bin_size_w;\r\n                \r\n                if (fh < -1.0 || fh > height || fw < -1.0 || fw > width) {\r\n                    continue;\r\n                }\r\n\r\n                int hstart = static_cast<int>(floor(fh));\r\n                int wstart = static_cast<int>(floor(fw));\r\n                int hend = hstart + 1;\r\n                int wend = wstart + 1;\r\n\r\n                if (hstart >= height - 1) {\r\n                    hend = hstart = height - 1;\r\n                    fh = static_cast<float>(hstart);\r\n                } else {\r\n                    hend = hstart + 1;\r\n                }\r\n\r\n                if (wstart >= width - 1) {\r\n                    wend = wstart = width - 1;\r\n                    fw = static_cast<float>(wstart);\r\n                } else {\r\n                    wend = wstart + 1;\r\n                }\r\n                float dh = fh - static_cast<float>(hstart);\r\n                float dw = fw - static_cast<float>(wstart);\r\n\r\n                //atomic add: pointer, value\r\n                atomicAdd(&bottom_diff[data_offset + hstart * width + wstart], top_diff[i] * (1.0 - dh) * (1.0 - dw) / 4);\r\n                atomicAdd(&bottom_diff[data_offset + hstart * width + wend], top_diff[i] * (1.0 - dh) * dw         / 4);\r\n                atomicAdd(&bottom_diff[data_offset + hend * width + wstart], top_diff[i] * dh         * (1.0 - dw) / 4);\r\n                atomicAdd(&bottom_diff[data_offset + hend * width + wend], top_diff[i] * dh         * dw         / 4);\r\n            }\r\n\r\n            ''', 'roi_pooling_2d_bwd'\r\n        )(gy[0], bottom_rois.shape[0], self.spatial_scale,\r\n          channels, height, width, 
self.outh, self.outw,\r\n          bottom_rois, bottom_diff, size=gy[0].size)\r\n        \r\n        return bottom_diff, None\r\n\r\n\r\ndef roi_align_2d(x, rois, outh, outw, spatial_scale):\r\n    \"\"\"Spatial Region of Interest (ROI) align function.\r\n\r\n    This function acts similarly to :class:`~functions.MaxPooling2D`, but\r\n    instead of taking a maximum it averages four bilinearly interpolated\r\n    samples in each output bin of the region of interest, for each channel.\r\n    Only the GPU implementation is provided.\r\n\r\n    Args:\r\n        x (~chainer.Variable): Input variable. The shape is expected to be\r\n            4 dimensional: (n: batch, c: channel, h: height, w: width).\r\n        rois (~chainer.Variable): Input roi variable. The shape is expected to\r\n            be (n: data size, 5), and each datum is set as below:\r\n            (batch_index, x_min, y_min, x_max, y_max).\r\n        outh (int): Height of the output after pooling.\r\n        outw (int): Width of the output after pooling.\r\n        spatial_scale (float): Factor by which the roi coordinates are\r\n            multiplied to map them onto the input feature map (e.g. 1/16).\r\n\r\n    Returns:\r\n        ~chainer.Variable: Output variable.\r\n\r\n    See the original paper proposing ROIPooling:\r\n    `Fast R-CNN <https://arxiv.org/abs/1504.08083>`_.\r\n\r\n    \"\"\"\r\n    return ROIAlign2D(outh, outw, spatial_scale)(x, rois)\r\n"
  },
  {
    "path": "utils/updater.py",
    "content": "import copy\r\nimport six\r\n\r\nfrom chainer.dataset import convert\r\nfrom chainer.dataset import iterator as iterator_module\r\nfrom chainer import function, variable\r\nfrom chainer.training.updater import StandardUpdater\r\nfrom chainer import reporter\r\nfrom chainer import cuda\r\n\r\nclass SubDivisionUpdater(StandardUpdater):\r\n\r\n\r\n    def __init__(self, iterator, optimizer, converter=convert.concat_examples,\r\n        subdivisions=1, device=None, loss_func=None):\r\n        super(SubDivisionUpdater, self).__init__(\r\n            iterator=iterator,\r\n            optimizer=optimizer,\r\n            converter=converter,\r\n            device=device,\r\n            loss_func=loss_func,\r\n        )\r\n        self._batchsize = self._iterators['main'].batch_size\r\n        self._subdivisions = subdivisions\r\n        self._n = int(self._batchsize / self._subdivisions)\r\n        assert self._batchsize % self._subdivisions == 0, (self._batchsize, self._subdivisions)\r\n\r\n    def update_core(self):\r\n        batch = self._iterators['main'].next()\r\n        #print(self._n)\r\n        in_arrays_list = []\r\n        for i in range(self._subdivisions):\r\n            in_arrays_list.append(self.converter(batch[i::self._subdivisions], self.device))\r\n            #in_arrays_list.append(self.converter(batch, self.device))\r\n        optimizer = self._optimizers['main']\r\n        loss_func = self.loss_func or optimizer.target\r\n        loss_func.cleargrads()\r\n\r\n        losses=[]\r\n\r\n        for i, in_arrays in enumerate(in_arrays_list):\r\n            if isinstance(in_arrays, tuple):\r\n                in_vars = list(variable.Variable(x) for x in in_arrays)\r\n                loss = loss_func(*in_vars)\r\n            elif isinstance(in_arrays, dict):\r\n                in_vars = {key: variable.Variable(x) for key, x in six.iteritems(in_arrays)}\r\n                loss = loss_func(in_vars)\r\n            else:\r\n                
raise TypeError('unsupported input type: {}'.format(type(in_arrays)))\r\n            loss.backward()\r\n            #loss = {k: cuda.to_cpu(v.data) for k, v in loss.items()} # for logging\r\n            loss = cuda.to_cpu(loss.data)\r\n            losses.append(loss)\r\n        \r\n        optimizer.update()\r\n        # minibatch average\r\n        if isinstance(loss, dict):\r\n            avg_loss = {k: 0. for k in losses[0].keys()}\r\n            for loss in losses:\r\n                for k, v in loss.items():\r\n                    avg_loss[k] += v\r\n            #avg_loss = {k: v / float(self._batchsize) for k, v in avg_loss.items()}\r\n            avg_loss = {k: v / float(len(losses)) for k, v in avg_loss.items()}\r\n            #avg_loss = {k: v for k, v in avg_loss.items()}\r\n\r\n            # report all the loss values\r\n            for k, v in avg_loss.items():\r\n                reporter.report({k: v}, loss_func)\r\n            reporter.report({'loss': sum(list(avg_loss.values()))}, loss_func)\r\n        else:\r\n            avg_loss = 0.\r\n            for loss in losses:\r\n                avg_loss += loss\r\n            avg_loss /= float(self._subdivisions)\r\n            reporter.report({'loss': avg_loss}, loss_func)"
  },
  {
    "path": "utils/vis_bbox.py",
    "content": "from chainercv.visualizations.vis_image import vis_image\r\nimport numpy as np\r\nfrom skimage.measure import find_contours\r\nfrom matplotlib.patches import Polygon\r\nimport cv2\r\n\r\ndef vis_bbox(img, bbox, label=None, score=None, mask=None, label_names=None, ax=None, contour=False, labeldisplay=True):\r\n    \"\"\"Visualize bounding boxes inside image.\r\n\r\n    Example:\r\n\r\n        >>> from chainercv.datasets import VOCDetectionDataset\r\n        >>> from chainercv.datasets import voc_bbox_label_names\r\n        >>> from chainercv.visualizations import vis_bbox\r\n        >>> import matplotlib.pyplot as plot\r\n        >>> dataset = VOCDetectionDataset()\r\n        >>> img, bbox, label = dataset[60]\r\n        >>> vis_bbox(img, bbox, label,\r\n        ...         label_names=voc_bbox_label_names)\r\n        >>> plot.show()\r\n\r\n    Args:\r\n        img (~numpy.ndarray): An array of shape :math:`(3, height, width)`.\r\n            This is in RGB format and the range of its value is\r\n            :math:`[0, 255]`.\r\n        bbox (~numpy.ndarray): An array of shape :math:`(R, 4)`, where\r\n            :math:`R` is the number of bounding boxes in the image.\r\n            Each element is organized\r\n            by :obj:`(y_min, x_min, y_max, x_max)` in the second axis.\r\n        label (~numpy.ndarray): An integer array of shape :math:`(R,)`.\r\n            The values correspond to id for label names stored in\r\n            :obj:`label_names`. This is optional.\r\n        score (~numpy.ndarray): A float array of shape :math:`(R,)`.\r\n             Each value indicates how confident the prediction is.\r\n             This is optional.\r\n        label_names (iterable of strings): Name of labels ordered according\r\n            to label ids. If this is :obj:`None`, labels will be skipped.\r\n        ax (matplotlib.axes.Axis): The visualization is displayed on this\r\n            axis. 
If this is :obj:`None` (default), a new axis is created.\r\n        mask (~numpy.ndarray): An array of shape :math:`(R, height, width)`.\r\n            Each mask is rounded to 0 or 1 and alpha-blended over the image.\r\n            This is optional.\r\n        contour (bool): If :obj:`True`, mask contours are drawn.\r\n        labeldisplay (bool): If :obj:`False`, captions are not drawn.\r\n\r\n    Returns:\r\n        ~matplotlib.axes.Axes:\r\n        Returns the Axes object with the plot for further tweaking.\r\n\r\n    \"\"\"\r\n    from matplotlib import pyplot as plot\r\n\r\n    if label is not None and not len(bbox) == len(label):\r\n        raise ValueError('The length of label must be same as that of bbox')\r\n    if score is not None and not len(bbox) == len(score):\r\n        raise ValueError('The length of score must be same as that of bbox')\r\n\r\n    # alpha-blend the masks (skipped when no mask is given)\r\n    COLOR=[(1,1,0), (1,0,1),(0,1,1),(0,0,1),(0,1,0), (1,0,0),(0.1,1,0.2)]\r\n    dst = img.astype(float)\r\n    for i, m in enumerate(mask if mask is not None else []):\r\n        alpha = np.tile(np.round(m), (3, 1, 1)).astype(float) * 0.4\r\n        src1 = np.ones(dst.shape).astype(float)\r\n        for j, col in enumerate(COLOR[i%len(COLOR)]):\r\n            src1[j] *= col * 255\r\n        dst = cv2.multiply(src1, alpha) + cv2.multiply(dst, 1 - alpha)\r\n\r\n    # Returns newly instantiated matplotlib.axes.Axes object if ax is None\r\n    ax = vis_image(dst, ax=ax)\r\n\r\n    # If there is no bounding box to display, visualize the image and exit.\r\n    if len(bbox) == 0:\r\n        return ax\r\n\r\n    # add boxes, contours and labels\r\n    for i, bb in enumerate(bbox):\r\n        # boxes\r\n        xy = (bb[1], bb[0])\r\n        height = int(bb[2]) - int(bb[0])\r\n        width = int(bb[3]) - int(bb[1])\r\n        ax.add_patch(plot.Rectangle(\r\n            xy, width, height, fill=False, edgecolor='red', linewidth=1))\r\n        \r\n        # contours\r\n        if contour and mask is not None:\r\n            Mcontours = find_contours(mask[i].T, 0.5)\r\n            for verts in Mcontours:\r\n                p = Polygon(verts, facecolor=\"none\", edgecolor=[0.5,0.5,0.5])\r\n                ax.add_patch(p)\r\n        \r\n        #labels\r\n        caption = list()\r\n        if label is not None and label_names is not None:\r\n            lb = 
label[i]\r\n            if not (0 <= lb < len(label_names)):\r\n                raise ValueError('No corresponding name is given')\r\n            caption.append(label_names[lb])\r\n        if score is not None:\r\n            sc = score[i]\r\n            caption.append('{:.2f}'.format(sc))\r\n\r\n        if len(caption) > 0 and labeldisplay:\r\n            ax.text(bb[1], bb[0],\r\n                    ': '.join(caption),\r\n                    style='italic',\r\n                    fontsize=8,\r\n                    color='white'\r\n                    )#'facecolor': 'white', 'alpha': 0.7, 'pad': 10})\r\n    return ax\r\n"
  }
]