[
  {
    "path": "README.md",
    "content": "# CondInst\nThis repository is an unofficial pytorch implementation of [Conditional Convolutions for Instance Segmentation](https://arxiv.org/abs/2003.05664). The model with ResNet-101 backbone achieves 37.1 mAP on COCO val2017 set.\n\n## Install\nThe code is based on [detectron2](https://github.com/facebookresearch/detectron2). Please check [Install.md](https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md) for installation instructions.\n\n## Training \nFollows the same way as detectron2.\n\nSingle GPU:\n```\npython train_net.py --config-file configs/CondInst/MS_R_101_3x.yaml\n```\nMulti GPU(for example 8):\n```\npython train_net.py --num-gpus 8 --config-file configs/CondInst/MS_R_101_3x.yaml\n```\nPlease adjust the IMS_PER_BATCH in the config file according to the GPU memory.\n\n\n\n## Notes\nI have replaced the original upsample with the aligned upsample according to the [author's issue](https://github.com/Epiphqny/CondInst/issues/1), and use the upsampled mask to calculate loss, this brings more gains but may cost more GPU memory, if you do not have much memory, use the original unupsampled version to calculate loss.\n\n## Inference\nFirst replace the original detectron2 installed postprocessing.py with the [file](https://github.com/Epiphqny/CondInst/blob/master/postprocessing.py) in this repository, as the original file only suit for ROI obatined masks.\nThe path should be like /miniconda3/envs/py37/lib/python3.7/site-packages/detectron2/modeling/postprocessing.py\n\nSingle GPU:\n```\npython train_net.py --config-file configs/CondInst/MS_R_101_3x.yaml --eval-only MODEL.WEIGHTS /path/to/checkpoint_file\n```\nMulti GPU(for example 8):\n```\npython train_net.py --num-gpus 8 --config-file configs/CondInst/MS_R_101_3x.yaml --eval-only MODEL.WEIGHTS /path/to/checkpoint_file\n```\n## Weights\nTrained model can be download in [Google drive](https://drive.google.com/file/d/17-g91zwJzt99G8APza0IaleWYLC3kTMK/view?usp=sharing)\n\n## 
Results\nAfter training 36 epochs on the coco dataset using the resnet-101 backbone, the mAP is 0.371 on COCO val2017 dataset:\n\n<img src=\"AP.jpg\">\n\n## Visualization\n\n<img src=\"condinst.png\" width=\"2000\">\n\n"
  },
  {
    "path": "configs/CondInst/Base-FCOS.yaml",
    "content": "MODEL:\n  META_ARCHITECTURE: \"OneStageDetector\"\n  BACKBONE:\n    NAME: \"build_fcos_resnet_fpn_backbone\"\n  RESNETS:\n    OUT_FEATURES: [\"res3\", \"res4\", \"res5\"]\n  FPN:\n    IN_FEATURES: [\"res3\", \"res4\", \"res5\"]\n  PROPOSAL_GENERATOR:\n    NAME: \"FCOS\"\n  # PIXEL_MEAN: [102.9801, 115.9465, 122.7717]\n  MASK_ON: True\nDATASETS:\n  TRAIN: (\"coco_2017_train\",)\n  TEST: (\"coco_2017_val\",)\nSOLVER:\n  IMS_PER_BATCH: 4\n  BASE_LR: 0.01  # Note that RetinaNet uses a different default learning rate\n  STEPS: (60000, 80000)\n  MAX_ITER: 90000\nINPUT:\n  MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800)\n  MASK_FORMAT: \"bitmask\"\n"
  },
  {
    "path": "configs/CondInst/MS_R_101_3x.yaml",
    "content": "_BASE_: \"Base-FCOS.yaml\"\nMODEL:\n  WEIGHTS: \"detectron2://ImageNetPretrained/MSRA/R-101.pkl\"\n  RESNETS:\n    DEPTH: 101\nSOLVER:\n  STEPS: (60000, 80000)\n  MAX_ITER: 90000\nOUTPUT_DIR: \"output/fcos/R_101_3x\"\n"
  },
  {
    "path": "configs/CondInst/MS_R_50_2x.yaml",
    "content": "_BASE_: \"Base-FCOS.yaml\"\nMODEL:\n  WEIGHTS: \"detectron2://ImageNetPretrained/MSRA/R-50.pkl\"\n  RESNETS:\n    DEPTH: 50\nSOLVER:\n  STEPS: (120000, 160000)\n  MAX_ITER: 180000\nOUTPUT_DIR: \"output/fcos/R_50_2x\"\n"
  },
  {
    "path": "configs/CondInst/MS_X_101_2x.yaml",
    "content": "_BASE_: \"Base-FCOS.yaml\"\nMODEL:\n  WEIGHTS: \"detectron2://ImageNetPretrained/FAIR/X-101-32x8d.pkl\"\n  PIXEL_STD: [57.375, 57.120, 58.395]\n  RESNETS:\n    STRIDE_IN_1X1: False  # this is a C2 model\n    NUM_GROUPS: 32\n    WIDTH_PER_GROUP: 8\n    DEPTH: 101\nSOLVER:\n  STEPS: (120000, 160000)\n  MAX_ITER: 180000\nOUTPUT_DIR: \"output/fcos/X_101_2x\"\n"
  },
  {
    "path": "configs/CondInst/R_50_1x.yaml",
    "content": "_BASE_: \"Base-FCOS.yaml\"\nMODEL:\n  WEIGHTS: \"detectron2://ImageNetPretrained/MSRA/R-50.pkl\"\n  RESNETS:\n    DEPTH: 50\nINPUT:\n  MIN_SIZE_TRAIN: (800,)\nSOLVER:\n  WARMUP_METHOD: \"constant\"\n  WARMUP_FACTOR: 0.3333\n  WARMUP_ITERS: 500\nOUTPUT_DIR: \"output/fcos/R_50_1x\"\n"
  },
  {
    "path": "configs/CondInst/vovnet/MS_V_39_3x.yaml",
    "content": "_BASE_: \"../Base-FCOS.yaml\"\nMODEL:\n  WEIGHTS: \"https://www.dropbox.com/s/q98pypf96rhtd8y/vovnet39_ese_detectron2.pth?dl=1\"\n  BACKBONE:\n    NAME: \"build_fcos_vovnet_fpn_backbone\"\n    FREEZE_AT: 0\n  VOVNET:\n    CONV_BODY : \"V-39-eSE\"\n    OUT_FEATURES: [\"stage3\", \"stage4\", \"stage5\"]\n  FPN:\n    IN_FEATURES: [\"stage3\", \"stage4\", \"stage5\"]\nSOLVER:\n  STEPS: (210000, 250000)\n  MAX_ITER: 270000\nOUTPUT_DIR: \"output/fcos/V_39_ms_3x\"\n"
  },
  {
    "path": "configs/CondInst/vovnet/MS_V_57_3x.yaml",
    "content": "_BASE_: \"../Base-FCOS.yaml\"\nMODEL:\n  WEIGHTS: \"https://www.dropbox.com/s/8xl0cb3jj51f45a/vovnet57_ese_detectron2.pth?dl=1\"\n  BACKBONE:\n    NAME: \"build_fcos_vovnet_fpn_backbone\"\n    FREEZE_AT: 0\n  VOVNET:\n    CONV_BODY : \"V-57-eSE\"\n    OUT_FEATURES: [\"stage3\", \"stage4\", \"stage5\"]\n  FPN:\n    IN_FEATURES: [\"stage3\", \"stage4\", \"stage5\"]\nSOLVER:\n  STEPS: (210000, 250000)\n  MAX_ITER: 270000\nOUTPUT_DIR: \"output/fcos/V_57_ms_3x\"\n"
  },
  {
    "path": "configs/CondInst/vovnet/MS_V_99_3x.yaml",
    "content": "_BASE_: \"../Base-FCOS.yaml\"\nMODEL:\n  WEIGHTS: \"https://www.dropbox.com/s/1mlv31coewx8trd/vovnet99_ese_detectron2.pth?dl=1\"\n  BACKBONE:\n    NAME: \"build_fcos_vovnet_fpn_backbone\"\n    FREEZE_AT: 0\n  VOVNET:\n    CONV_BODY : \"V-99-eSE\"\n    OUT_FEATURES: [\"stage3\", \"stage4\", \"stage5\"]\n  FPN:\n    IN_FEATURES: [\"stage3\", \"stage4\", \"stage5\"]\nSOLVER:\n  STEPS: (210000, 250000)\n  MAX_ITER: 270000\nOUTPUT_DIR: \"output/fcos/V_99_ms_3x\"\n"
  },
  {
    "path": "configs/CondInst/vovnet/README.md",
    "content": "# [VoVNet-v2](https://github.com/youngwanLEE/CenterMask) backbone networks in [FCOS](https://github.com/aim-uofa/adet)\n**Efficient Backbone Network for Object Detection and Segmentation**\\\nYoungwan Lee\n\n\n[[`vovnet-detectron2`](https://github.com/youngwanLEE/vovnet-detectron2)][[`CenterMask(code)`](https://github.com/youngwanLEE/CenterMask)] [[`VoVNet-v1(arxiv)`](https://arxiv.org/abs/1904.09730)] [[`VoVNet-v2(arxiv)`](https://arxiv.org/abs/1911.06667)] [[`BibTeX`](#CitingVoVNet)]\n\n\n<div align=\"center\">\n  <img src=\"https://dl.dropbox.com/s/jgi3c5828dzcupf/osa_updated.jpg\" width=\"700px\" />\n</div>\n\n  \n## Comparison with Faster R-CNN and ResNet\n\n### Note\n\nWe measure the inference time of all models with batch size 1 on the same V100 GPU machine.\n\n- pytorch1.3.1\n- CUDA 10.1\n- cuDNN 7.3\n\n\n|Method|Backbone|lr sched|inference time|AP|APs|APm|APl|download|\n|---|:--------:|:---:|:--:|--|----|----|---|--------|\n|Faster|R-50-FPN|3x|0.047|40.2|24.2|43.5|52.0|<a href=\"https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/model_final_f10217.pkl\">model</a>&nbsp;\\|&nbsp;<a href=\"https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x/137849600/metrics.json\">metrics</a>\n|Faster|**V2-39-FPN**|3x|0.047|42.7|27.1|45.6|54.0|<a href=\"https://dl.dropbox.com/s/dkto39ececze6l4/faster_V_39_eSE_ms_3x.pth\">model</a>&nbsp;\\|&nbsp;<a href=\"https://dl.dropbox.com/s/dx9qz1dn65ccrwd/faster_V_39_eSE_ms_3x_metrics.json\">metrics</a>\n|**FCOS**|**V2-39-FPN**|3x|0.045|43.5|28.1|47.2|54.5|<a href=\"https://dl.dropbox.com/s/t51vrqiekid49vp/fcos_V_39_eSE_FPN_ms_3x.pth\">model</a>&nbsp;\\|&nbsp;<a href=\"https://www.dropbox.com/s/jhu301a95o7lzw1/fcos_V_39_eSE_FPN_ms_3x_metrics.json\">metrics</a>\n||\n|Faster|R-101-FPN|3x|0.063|42.0|25.2|45.6|54.6|<a 
href=\"https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x/138205316/model_final_a3ec72.pkl\">model</a>&nbsp;\\|&nbsp;<a href=\"https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x/138205316/metrics.json\">metrics</a>\n|Faster|**V2-57-FPN**|3x|0.054|43.3|27.5|46.7|55.3|<a href=\"https://dl.dropbox.com/s/c7mb1mq10eo4pzk/faster_V_57_eSE_ms_3x.pth\">model</a>&nbsp;\\|&nbsp;<a href=\"https://dl.dropbox.com/s/3tsn218zzmuhyo8/faster_V_57_eSE_metrics.json\">metrics</a>\n|**FCOS**|**V2-57-FPN**|3x|0.051|44.4|28.8|47.2|56.3|<a href=\"https://dl.dropbox.com/s/c7mb1mq10eo4pzk/faster_V_57_eSE_ms_3x.pth\">model</a>&nbsp;\\|&nbsp;<a href=\"https://dl.dropbox.com/s/3tsn218zzmuhyo8/faster_V_57_eSE_metrics.json\">metrics</a>\n||\n|Faster|X-101-FPN|3x|0.120|43.0|27.2|46.1|54.9|<a href=\"https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/model_final_2d9806.pkl\">model</a>&nbsp;\\|&nbsp;<a href=\"https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_X_101_32x8d_FPN_3x/139653917/metrics.json\">metrics</a>|\n|Faster|**V2-99-FPN**|3x|0.073|44.1|28.1|47.0|56.4|<a href=\"https://dl.dropbox.com/s/v64mknwzfpmfcdh/faster_V_99_eSE_ms_3x.pth\">model</a>&nbsp;\\|&nbsp;<a href=\"https://dl.dropbox.com/s/zvaz9s8gvq2mhrd/faster_V_99_eSE_ms_3x_metrics.json\">metrics</a>|\n|**FCOS**|**V2-99-FPN**|3x|0.070|45.2|29.2|48.4|57.3|<a href=\"https://www.dropbox.com/s/cztd5jry52cy6vx/fcos_V_99_eSE_FPN_ms_3x.pth\">model</a>&nbsp;\\|&nbsp;<a href=\"https://www.dropbox.com/s/zdfb5zjl9lhi5p8/fcos_V_99_eSE_FPN_ms_3x_metrics.json\">metrics</a>|\n\n\n\n## <a name=\"CitingVoVNet\"></a>Citing VoVNet\n\nIf you use VoVNet, please use the following BibTeX entry.\n\n```BibTeX\n@inproceedings{lee2019energy,\n  title = {An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection},\n  author = {Lee, Youngwan and Hwang, Joong-won and Lee, 
Sangrok and Bae, Yuseok and Park, Jongyoul},\n  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops},\n  year = {2019}\n}\n\n@article{lee2019centermask,\n  title={CenterMask: Real-Time Anchor-Free Instance Segmentation},\n  author={Lee, Youngwan and Park, Jongyoul},\n  journal={arXiv preprint arXiv:1911.06667},\n  year={2019}\n}\n```\n"
  },
  {
    "path": "demo/demo.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport argparse\nimport glob\nimport multiprocessing as mp\nimport os\nimport time\nimport cv2\nimport tqdm\n\nfrom detectron2.data.detection_utils import read_image\nfrom detectron2.utils.logger import setup_logger\n\nfrom predictor import VisualizationDemo\nfrom adet.config import get_cfg\n\n# constants\nWINDOW_NAME = \"COCO detections\"\n\n\ndef setup_cfg(args):\n    # load config from file and command-line arguments\n    cfg = get_cfg()\n    cfg.merge_from_file(args.config_file)\n    cfg.merge_from_list(args.opts)\n    # Set score_threshold for builtin models\n    cfg.MODEL.RETINANET.SCORE_THRESH_TEST = args.confidence_threshold\n    cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = args.confidence_threshold\n    cfg.MODEL.FCOS.INFERENCE_TH_TEST = args.confidence_threshold\n    cfg.MODEL.PANOPTIC_FPN.COMBINE.INSTANCES_CONFIDENCE_THRESH = args.confidence_threshold\n    cfg.freeze()\n    return cfg\n\n\ndef get_parser():\n    parser = argparse.ArgumentParser(description=\"Detectron2 Demo\")\n    parser.add_argument(\n        \"--config-file\",\n        default=\"configs/quick_schedules/e2e_mask_rcnn_R_50_FPN_inference_acc_test.yaml\",\n        metavar=\"FILE\",\n        help=\"path to config file\",\n    )\n    parser.add_argument(\"--webcam\", action=\"store_true\", help=\"Take inputs from webcam.\")\n    parser.add_argument(\"--video-input\", help=\"Path to video file.\")\n    parser.add_argument(\"--input\", nargs=\"+\", help=\"A list of space separated input images\")\n    parser.add_argument(\n        \"--output\",\n        help=\"A file or directory to save output visualizations. 
\"\n        \"If not given, will show output in an OpenCV window.\",\n    )\n\n    parser.add_argument(\n        \"--confidence-threshold\",\n        type=float,\n        default=0.5,\n        help=\"Minimum score for instance predictions to be shown\",\n    )\n    parser.add_argument(\n        \"--opts\",\n        help=\"Modify config options using the command-line 'KEY VALUE' pairs\",\n        default=[],\n        nargs=argparse.REMAINDER,\n    )\n    return parser\n\n\nif __name__ == \"__main__\":\n    mp.set_start_method(\"spawn\", force=True)\n    args = get_parser().parse_args()\n    logger = setup_logger()\n    logger.info(\"Arguments: \" + str(args))\n\n    cfg = setup_cfg(args)\n\n    demo = VisualizationDemo(cfg)\n\n    if args.input:\n        if os.path.isdir(args.input[0]):\n            args.input = [os.path.join(args.input[0], fname) for fname in os.listdir(args.input[0])]\n        elif len(args.input) == 1:\n            args.input = glob.glob(os.path.expanduser(args.input[0]))\n            assert args.input, \"The input path(s) was not found\"\n        for path in tqdm.tqdm(args.input, disable=not args.output):\n            # use PIL, to be consistent with evaluation\n            img = read_image(path, format=\"BGR\")\n            start_time = time.time()\n            predictions, visualized_output = demo.run_on_image(img)\n            logger.info(\n                \"{}: detected {} instances in {:.2f}s\".format(\n                    path, len(predictions[\"instances\"]), time.time() - start_time\n                )\n            )\n\n            if args.output:\n                if os.path.isdir(args.output):\n                    assert os.path.isdir(args.output), args.output\n                    out_filename = os.path.join(args.output, os.path.basename(path))\n                else:\n                    assert len(args.input) == 1, \"Please specify a directory with args.output\"\n                    out_filename = args.output\n                
visualized_output.save(out_filename)\n            else:\n                cv2.imshow(WINDOW_NAME, visualized_output.get_image()[:, :, ::-1])\n                if cv2.waitKey(0) == 27:\n                    break  # esc to quit\n    elif args.webcam:\n        assert args.input is None, \"Cannot have both --input and --webcam!\"\n        cam = cv2.VideoCapture(0)\n        for vis in tqdm.tqdm(demo.run_on_video(cam)):\n            cv2.namedWindow(WINDOW_NAME, cv2.WINDOW_NORMAL)\n            cv2.imshow(WINDOW_NAME, vis)\n            if cv2.waitKey(1) == 27:\n                break  # esc to quit\n        cv2.destroyAllWindows()\n    elif args.video_input:\n        video = cv2.VideoCapture(args.video_input)\n        width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))\n        height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))\n        frames_per_second = video.get(cv2.CAP_PROP_FPS)\n        num_frames = int(video.get(cv2.CAP_PROP_FRAME_COUNT))\n        basename = os.path.basename(args.video_input)\n\n        if args.output:\n            if os.path.isdir(args.output):\n                output_fname = os.path.join(args.output, basename)\n                output_fname = os.path.splitext(output_fname)[0] + \".mkv\"\n            else:\n                output_fname = args.output\n            assert not os.path.isfile(output_fname), output_fname\n            output_file = cv2.VideoWriter(\n                filename=output_fname,\n                # some installation of opencv may not support x264 (due to its license),\n                # you can try other format (e.g. 
MPEG)\n                fourcc=cv2.VideoWriter_fourcc(*\"x264\"),\n                fps=float(frames_per_second),\n                frameSize=(width, height),\n                isColor=True,\n            )\n        assert os.path.isfile(args.video_input)\n        for vis_frame in tqdm.tqdm(demo.run_on_video(video), total=num_frames):\n            if args.output:\n                output_file.write(vis_frame)\n            else:\n                cv2.namedWindow(basename, cv2.WINDOW_NORMAL)\n                cv2.imshow(basename, vis_frame)\n                if cv2.waitKey(1) == 27:\n                    break  # esc to quit\n        video.release()\n        if args.output:\n            output_file.release()\n        else:\n            cv2.destroyAllWindows()\n"
  },
  {
    "path": "demo/predictor.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport numpy as np\nimport atexit\nimport bisect\nimport multiprocessing as mp\nfrom collections import deque\nimport cv2\nimport torch\nimport matplotlib.pyplot as plt\n\nfrom detectron2.data import MetadataCatalog\nfrom detectron2.engine.defaults import DefaultPredictor\nfrom detectron2.utils.video_visualizer import VideoVisualizer\nfrom detectron2.utils.visualizer import ColorMode, Visualizer\n\n\nclass VisualizationDemo(object):\n    def __init__(self, cfg, instance_mode=ColorMode.IMAGE, parallel=False):\n        \"\"\"\n        Args:\n            cfg (CfgNode):\n            instance_mode (ColorMode):\n            parallel (bool): whether to run the model in different processes from visualization.\n                Useful since the visualization logic can be slow.\n        \"\"\"\n        self.metadata = MetadataCatalog.get(\n            cfg.DATASETS.TEST[0] if len(cfg.DATASETS.TEST) else \"__unused\"\n        )\n        self.cpu_device = torch.device(\"cpu\")\n        self.instance_mode = instance_mode\n\n        self.parallel = parallel\n        if parallel:\n            num_gpu = torch.cuda.device_count()\n            self.predictor = AsyncPredictor(cfg, num_gpus=num_gpu)\n        else:\n            self.predictor = DefaultPredictor(cfg)\n\n    def run_on_image(self, image):\n        \"\"\"\n        Args:\n            image (np.ndarray): an image of shape (H, W, C) (in BGR order).\n                This is the format used by OpenCV.\n\n        Returns:\n            predictions (dict): the output of the model.\n            vis_output (VisImage): the visualized image output.\n        \"\"\"\n        vis_output = None\n        predictions = self.predictor(image)\n        # Convert image from OpenCV BGR format to Matplotlib RGB format.\n        image = image[:, :, ::-1]\n        visualizer = Visualizer(image, self.metadata, instance_mode=self.instance_mode)\n        if \"inst\" 
in predictions:\n            visualizer.vis_inst(predictions[\"inst\"])\n        if \"bases\" in predictions:\n            self.vis_bases(predictions[\"bases\"])\n        if \"panoptic_seg\" in predictions:\n            panoptic_seg, segments_info = predictions[\"panoptic_seg\"]\n            vis_output = visualizer.draw_panoptic_seg_predictions(\n                panoptic_seg.to(self.cpu_device), segments_info\n            )\n        else:\n            if \"sem_seg\" in predictions:\n                vis_output = visualizer.draw_sem_seg(\n                    predictions[\"sem_seg\"].argmax(dim=0).to(self.cpu_device))\n            if \"instances\" in predictions:\n                instances = predictions[\"instances\"].to(self.cpu_device)\n                vis_output = visualizer.draw_instance_predictions(predictions=instances)\n\n        return predictions, vis_output\n\n    def _frame_from_video(self, video):\n        while video.isOpened():\n            success, frame = video.read()\n            if success:\n                yield frame\n            else:\n                break\n\n    def vis_bases(self, bases):\n        basis_colors = [[2, 200, 255], [107, 220, 255], [30, 200, 255], [60, 220, 255]]\n        bases = bases[0].squeeze()\n        bases = (bases / 8).tanh().cpu().numpy()\n        num_bases = len(bases)\n        fig, axes = plt.subplots(nrows=num_bases // 2, ncols=2)\n        for i, basis in enumerate(bases):\n            basis = (basis + 1) / 2\n            basis = basis / basis.max()\n            basis_viz = np.zeros((basis.shape[0], basis.shape[1], 3), dtype=np.uint8)\n            basis_viz[:, :, 0] = basis_colors[i][0]\n            basis_viz[:, :, 1] = basis_colors[i][1]\n            basis_viz[:, :, 2] = np.uint8(basis * 255)\n            basis_viz = cv2.cvtColor(basis_viz, cv2.COLOR_HSV2RGB)\n            axes[i // 2][i % 2].imshow(basis_viz)\n        plt.show()\n\n    def run_on_video(self, video):\n        \"\"\"\n        Visualizes predictions on 
frames of the input video.\n\n        Args:\n            video (cv2.VideoCapture): a :class:`VideoCapture` object, whose source can be\n                either a webcam or a video file.\n\n        Yields:\n            ndarray: BGR visualizations of each video frame.\n        \"\"\"\n        video_visualizer = VideoVisualizer(self.metadata, self.instance_mode)\n\n        def process_predictions(frame, predictions):\n            frame = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)\n            if \"panoptic_seg\" in predictions:\n                panoptic_seg, segments_info = predictions[\"panoptic_seg\"]\n                vis_frame = video_visualizer.draw_panoptic_seg_predictions(\n                    frame, panoptic_seg.to(self.cpu_device), segments_info\n                )\n            elif \"instances\" in predictions:\n                predictions = predictions[\"instances\"].to(self.cpu_device)\n                vis_frame = video_visualizer.draw_instance_predictions(frame, predictions)\n            elif \"sem_seg\" in predictions:\n                vis_frame = video_visualizer.draw_sem_seg(\n                    frame, predictions[\"sem_seg\"].argmax(dim=0).to(self.cpu_device)\n                )\n\n            # Converts Matplotlib RGB format to OpenCV BGR format\n            vis_frame = cv2.cvtColor(vis_frame.get_image(), cv2.COLOR_RGB2BGR)\n            return vis_frame\n\n        frame_gen = self._frame_from_video(video)\n        if self.parallel:\n            buffer_size = self.predictor.default_buffer_size\n\n            frame_data = deque()\n\n            for cnt, frame in enumerate(frame_gen):\n                frame_data.append(frame)\n                self.predictor.put(frame)\n\n                if cnt >= buffer_size:\n                    frame = frame_data.popleft()\n                    predictions = self.predictor.get()\n                    yield process_predictions(frame, predictions)\n\n            while len(frame_data):\n                frame = 
frame_data.popleft()\n                predictions = self.predictor.get()\n                yield process_predictions(frame, predictions)\n        else:\n            for frame in frame_gen:\n                yield process_predictions(frame, self.predictor(frame))\n\n\nclass AsyncPredictor:\n    \"\"\"\n    A predictor that runs the model asynchronously, possibly on >1 GPUs.\n    Because rendering the visualization takes a considerable amount of time,\n    this helps improve throughput when rendering videos.\n    \"\"\"\n\n    class _StopToken:\n        pass\n\n    class _PredictWorker(mp.Process):\n        def __init__(self, cfg, task_queue, result_queue):\n            self.cfg = cfg\n            self.task_queue = task_queue\n            self.result_queue = result_queue\n            super().__init__()\n\n        def run(self):\n            predictor = DefaultPredictor(self.cfg)\n\n            while True:\n                task = self.task_queue.get()\n                if isinstance(task, AsyncPredictor._StopToken):\n                    break\n                idx, data = task\n                result = predictor(data)\n                self.result_queue.put((idx, result))\n\n    def __init__(self, cfg, num_gpus: int = 1):\n        \"\"\"\n        Args:\n            cfg (CfgNode):\n            num_gpus (int): if 0, will run on CPU\n        \"\"\"\n        num_workers = max(num_gpus, 1)\n        self.task_queue = mp.Queue(maxsize=num_workers * 3)\n        self.result_queue = mp.Queue(maxsize=num_workers * 3)\n        self.procs = []\n        for gpuid in range(max(num_gpus, 1)):\n            cfg = cfg.clone()\n            cfg.defrost()\n            cfg.MODEL.DEVICE = \"cuda:{}\".format(gpuid) if num_gpus > 0 else \"cpu\"\n            self.procs.append(\n                AsyncPredictor._PredictWorker(cfg, self.task_queue, self.result_queue)\n            )\n\n        self.put_idx = 0\n        self.get_idx = 0\n        self.result_rank = []\n        self.result_data = []\n\n        for p in self.procs:\n            p.start()\n        atexit.register(self.shutdown)\n\n    def put(self, image):\n        self.put_idx += 1\n        self.task_queue.put((self.put_idx, image))\n\n    def get(self):\n        self.get_idx += 1  # the index needed for this request\n        if len(self.result_rank) and self.result_rank[0] == self.get_idx:\n            res = self.result_data[0]\n            del self.result_data[0], self.result_rank[0]\n            return res\n\n        while True:\n            # make sure the results are returned in the correct order\n            idx, res = self.result_queue.get()\n            if idx == self.get_idx:\n                return res\n            insert = bisect.bisect(self.result_rank, idx)\n            self.result_rank.insert(insert, idx)\n            self.result_data.insert(insert, res)\n\n    def __len__(self):\n        return self.put_idx - self.get_idx\n\n    def __call__(self, image):\n        self.put(image)\n        return self.get()\n\n    def shutdown(self):\n        for _ in self.procs:\n            self.task_queue.put(AsyncPredictor._StopToken())\n\n    @property\n    def default_buffer_size(self):\n        return len(self.procs) * 5\n"
  },
  {
    "path": "fcos/__init__.py",
    "content": "from fcos import modeling\n\n__version__ = \"0.1.1\"\n"
  },
  {
    "path": "fcos/checkpoint/__init__.py",
    "content": "from .adet_checkpoint import AdetCheckpointer\n\n__all__ = [\"AdetCheckpointer\"]\n"
  },
  {
    "path": "fcos/checkpoint/adet_checkpoint.py",
    "content": "import pickle\nfrom fvcore.common.file_io import PathManager\nfrom detectron2.checkpoint import DetectionCheckpointer\n\n\nclass AdetCheckpointer(DetectionCheckpointer):\n    \"\"\"\n    Same as :class:`DetectronCheckpointer`, but is able to convert models\n    in AdelaiDet, such as LPF backbone.\n    \"\"\"\n    def _load_file(self, filename):\n        if filename.endswith(\".pkl\"):\n            with PathManager.open(filename, \"rb\") as f:\n                data = pickle.load(f, encoding=\"latin1\")\n            if \"model\" in data and \"__author__\" in data:\n                # file is in Detectron2 model zoo format\n                self.logger.info(\"Reading a file from '{}'\".format(data[\"__author__\"]))\n                return data\n            else:\n                # assume file is from Caffe2 / Detectron1 model zoo\n                if \"blobs\" in data:\n                    # Detection models have \"blobs\", but ImageNet models don't\n                    data = data[\"blobs\"]\n                data = {k: v for k, v in data.items() if not k.endswith(\"_momentum\")}\n                return {\"model\": data, \"__author__\": \"Caffe2\", \"matching_heuristics\": True}\n\n        loaded = super()._load_file(filename)  # load native pth checkpoint\n        if \"model\" not in loaded:\n            loaded = {\"model\": loaded}\n        if \"lpf\" in filename:\n            loaded[\"matching_heuristics\"] = True\n        return loaded\n"
  },
  {
    "path": "fcos/config/__init__.py",
    "content": "from .config import get_cfg\n\n__all__ = [\n    \"get_cfg\",\n]\n"
  },
  {
    "path": "fcos/config/config.py",
    "content": "from detectron2.config import CfgNode\n\n\ndef get_cfg() -> CfgNode:\n    \"\"\"\n    Get a copy of the default config.\n\n    Returns:\n        a detectron2 CfgNode instance.\n    \"\"\"\n    from .defaults import _C\n\n    return _C.clone()\n"
  },
  {
    "path": "fcos/config/defaults.py",
    "content": "from detectron2.config.defaults import _C\nfrom detectron2.config import CfgNode as CN\n\n\n# ---------------------------------------------------------------------------- #\n# Additional Configs\n# ---------------------------------------------------------------------------- #\n_C.MODEL.MOBILENET = False\n\n# ---------------------------------------------------------------------------- #\n# FCOS Head\n# ---------------------------------------------------------------------------- #\n_C.MODEL.FCOS = CN()\n\n# This is the number of foreground classes.\n_C.MODEL.FCOS.NUM_CLASSES = 80\n_C.MODEL.FCOS.IN_FEATURES = [\"p3\", \"p4\", \"p5\", \"p6\", \"p7\"]\n_C.MODEL.FCOS.FPN_STRIDES = [8, 16, 32, 64, 128]\n_C.MODEL.FCOS.PRIOR_PROB = 0.01\n_C.MODEL.FCOS.INFERENCE_TH_TRAIN = 0.05\n_C.MODEL.FCOS.INFERENCE_TH_TEST = 0.05\n_C.MODEL.FCOS.NMS_TH = 0.6\n_C.MODEL.FCOS.PRE_NMS_TOPK_TRAIN = 1000\n_C.MODEL.FCOS.PRE_NMS_TOPK_TEST = 1000\n_C.MODEL.FCOS.POST_NMS_TOPK_TRAIN = 100\n_C.MODEL.FCOS.POST_NMS_TOPK_TEST = 100\n_C.MODEL.FCOS.TOP_LEVELS = 2\n_C.MODEL.FCOS.NORM = \"GN\"  # Support GN or none\n_C.MODEL.FCOS.USE_SCALE = True\n\n# Multiply centerness before threshold\n# This will affect the final performance by about 0.05 AP but save some time\n_C.MODEL.FCOS.THRESH_WITH_CTR = False\n\n# Focal loss parameters\n_C.MODEL.FCOS.LOSS_ALPHA = 0.25\n_C.MODEL.FCOS.LOSS_GAMMA = 2.0\n_C.MODEL.FCOS.SIZES_OF_INTEREST = [64, 128, 256, 512]\n_C.MODEL.FCOS.USE_RELU = True\n_C.MODEL.FCOS.USE_DEFORMABLE = False\n\n# the number of convolutions used in the cls and bbox tower\n_C.MODEL.FCOS.NUM_CLS_CONVS = 4\n_C.MODEL.FCOS.NUM_BOX_CONVS = 4\n_C.MODEL.FCOS.NUM_SHARE_CONVS = 0\n_C.MODEL.FCOS.CENTER_SAMPLE = True\n_C.MODEL.FCOS.POS_RADIUS = 1.5\n_C.MODEL.FCOS.LOC_LOSS_TYPE = 'giou'\n\n\n# ---------------------------------------------------------------------------- #\n# VoVNet backbone\n# ---------------------------------------------------------------------------- #\n\n_C.MODEL.VOVNET = 
CN()\n_C.MODEL.VOVNET.CONV_BODY = \"V-39-eSE\"\n_C.MODEL.VOVNET.OUT_FEATURES = [\"stage2\", \"stage3\", \"stage4\", \"stage5\"]\n\n# Options: FrozenBN, GN, \"SyncBN\", \"BN\"\n_C.MODEL.VOVNET.NORM = \"FrozenBN\"\n_C.MODEL.VOVNET.OUT_CHANNELS = 256\n_C.MODEL.VOVNET.BACKBONE_OUT_CHANNELS = 256"
  },
  {
    "path": "fcos/data/__init__.py",
    "content": "from . import builtin  # ensure the builtin datasets are registered\n# from .dataset_mapper import DatasetMapperWithBasis\n\n\n# __all__ = [\"DatasetMapperWithBasis\"]\n"
  },
  {
    "path": "fcos/data/builtin.py",
    "content": "import os\n\nfrom detectron2.data.datasets.register_coco import register_coco_instances\n\n# register person in context dataset\n\n_PREDEFINED_SPLITS_PIC = {\n    \"pic_person_train\": (\"pic/image/train\", \"pic/annotations/train_person.json\"),\n    \"pic_person_val\": (\"pic/image/val\", \"pic/annotations/val_person.json\"),\n}\n\nmetadata = {\n    \"thing_classes\": [\"person\"]\n}\n\n\ndef register_all_coco(root=\"datasets\"):\n    for key, (image_root, json_file) in _PREDEFINED_SPLITS_PIC.items():\n        # Assume pre-defined datasets live in `./datasets`.\n        register_coco_instances(\n            key,\n            metadata,\n            os.path.join(root, json_file) if \"://\" not in json_file else json_file,\n            os.path.join(root, image_root),\n        )\n\nregister_all_coco()\n"
  },
  {
    "path": "fcos/layers/__init__.py",
    "content": "from .deform_conv import DFConv2d\nfrom .ml_nms import ml_nms\nfrom .iou_loss import IOULoss\nfrom .conv_with_kaiming_uniform import conv_with_kaiming_uniform\n\n__all__ = [k for k in globals().keys() if not k.startswith(\"_\")]\n"
  },
  {
    "path": "fcos/layers/conv_with_kaiming_uniform.py",
    "content": "from torch import nn\n\nfrom detectron2.layers import Conv2d\nfrom .deform_conv import DFConv2d\nfrom detectron2.layers.batch_norm import get_norm\n\n\ndef conv_with_kaiming_uniform(\n        norm=None, activation=None,\n        use_deformable=False, use_sep=False):\n    def make_conv(\n        in_channels, out_channels, kernel_size, stride=1, dilation=1\n    ):\n        if use_deformable:\n            conv_func = DFConv2d\n        else:\n            conv_func = Conv2d\n        if use_sep:\n            assert in_channels == out_channels\n            groups = in_channels\n        else:\n            groups = 1\n        conv = conv_func(\n            in_channels,\n            out_channels,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=dilation * (kernel_size - 1) // 2,\n            dilation=dilation,\n            groups=groups,\n            bias=(norm is None)\n        )\n        if not use_deformable:\n            # Caffe2 implementation uses XavierFill, which in fact\n            # corresponds to kaiming_uniform_ in PyTorch\n            nn.init.kaiming_uniform_(conv.weight, a=1)\n            if norm is None:\n                nn.init.constant_(conv.bias, 0)\n        module = [conv,]\n        if norm is not None:\n            if norm == \"GN\":\n                norm_module = nn.GroupNorm(32, out_channels)\n            else:\n                norm_module = get_norm(norm, out_channels)\n            module.append(norm_module)\n        if activation is not None:\n            module.append(nn.ReLU(inplace=True))\n        if len(module) > 1:\n            return nn.Sequential(*module)\n        return conv\n\n    return make_conv\n"
  },
  {
    "path": "fcos/layers/csrc/cuda_version.cu",
    "content": "#include <cuda_runtime_api.h>\n\nnamespace adet {\nint get_cudart_version() {\n  return CUDART_VERSION;\n}\n} // namespace adet\n"
  },
  {
    "path": "fcos/layers/csrc/ml_nms/ml_nms.cu",
    "content": "// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.\n#include <ATen/ATen.h>\n#include <ATen/cuda/CUDAContext.h>\n#include <THC/THC.h>\n#include <THC/THCDeviceUtils.cuh>\n\n#include <vector>\n#include <iostream>\n\nint const threadsPerBlock = sizeof(unsigned long long) * 8;\n\n__device__ inline float devIoU(float const * const a, float const * const b) {\n  if (a[5] != b[5]) {\n    return 0.0;\n  }\n  float left = max(a[0], b[0]), right = min(a[2], b[2]);\n  float top = max(a[1], b[1]), bottom = min(a[3], b[3]);\n  float width = max(right - left + 1, 0.f), height = max(bottom - top + 1, 0.f);\n  float interS = width * height;\n  float Sa = (a[2] - a[0] + 1) * (a[3] - a[1] + 1);\n  float Sb = (b[2] - b[0] + 1) * (b[3] - b[1] + 1);\n  return interS / (Sa + Sb - interS);\n}\n\n__global__ void ml_nms_kernel(const int n_boxes, const float nms_overlap_thresh,\n                           const float *dev_boxes, unsigned long long *dev_mask) {\n  const int row_start = blockIdx.y;\n  const int col_start = blockIdx.x;\n\n  // if (row_start > col_start) return;\n\n  const int row_size =\n        min(n_boxes - row_start * threadsPerBlock, threadsPerBlock);\n  const int col_size =\n        min(n_boxes - col_start * threadsPerBlock, threadsPerBlock);\n\n  __shared__ float block_boxes[threadsPerBlock * 6];\n  if (threadIdx.x < col_size) {\n    block_boxes[threadIdx.x * 6 + 0] =\n        dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 0];\n    block_boxes[threadIdx.x * 6 + 1] =\n        dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 1];\n    block_boxes[threadIdx.x * 6 + 2] =\n        dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 2];\n    block_boxes[threadIdx.x * 6 + 3] =\n        dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 3];\n    block_boxes[threadIdx.x * 6 + 4] =\n        dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 4];\n    block_boxes[threadIdx.x * 6 + 5] =\n        
dev_boxes[(threadsPerBlock * col_start + threadIdx.x) * 6 + 5];\n  }\n  __syncthreads();\n\n  if (threadIdx.x < row_size) {\n    const int cur_box_idx = threadsPerBlock * row_start + threadIdx.x;\n    const float *cur_box = dev_boxes + cur_box_idx * 6;\n    int i = 0;\n    unsigned long long t = 0;\n    int start = 0;\n    if (row_start == col_start) {\n      start = threadIdx.x + 1;\n    }\n    for (i = start; i < col_size; i++) {\n      if (devIoU(cur_box, block_boxes + i * 6) > nms_overlap_thresh) {\n        t |= 1ULL << i;\n      }\n    }\n    const int col_blocks = THCCeilDiv(n_boxes, threadsPerBlock);\n    dev_mask[cur_box_idx * col_blocks + col_start] = t;\n  }\n}\n\nnamespace adet {\n\n// boxes is a N x 6 tensor\nat::Tensor ml_nms_cuda(const at::Tensor boxes, const float nms_overlap_thresh) {\n  using scalar_t = float;\n  AT_ASSERTM(boxes.type().is_cuda(), \"boxes must be a CUDA tensor\");\n  auto scores = boxes.select(1, 4);\n  auto order_t = std::get<1>(scores.sort(0, /* descending=*/true));\n  auto boxes_sorted = boxes.index_select(0, order_t);\n\n  int boxes_num = boxes.size(0);\n\n  const int col_blocks = THCCeilDiv(boxes_num, threadsPerBlock);\n\n  scalar_t* boxes_dev = boxes_sorted.data<scalar_t>();\n\n  THCState *state = at::globalContext().lazyInitCUDA(); // TODO replace with getTHCState\n\n  unsigned long long* mask_dev = NULL;\n  //THCudaCheck(THCudaMalloc(state, (void**) &mask_dev,\n  //                      boxes_num * col_blocks * sizeof(unsigned long long)));\n\n  mask_dev = (unsigned long long*) THCudaMalloc(state, boxes_num * col_blocks * sizeof(unsigned long long));\n\n  dim3 blocks(THCCeilDiv(boxes_num, threadsPerBlock),\n              THCCeilDiv(boxes_num, threadsPerBlock));\n  dim3 threads(threadsPerBlock);\n  ml_nms_kernel<<<blocks, threads>>>(boxes_num,\n                                  nms_overlap_thresh,\n                                  boxes_dev,\n                                  mask_dev);\n\n  std::vector<unsigned long long> 
mask_host(boxes_num * col_blocks);\n  THCudaCheck(cudaMemcpy(&mask_host[0],\n                        mask_dev,\n                        sizeof(unsigned long long) * boxes_num * col_blocks,\n                        cudaMemcpyDeviceToHost));\n\n  std::vector<unsigned long long> remv(col_blocks);\n  memset(&remv[0], 0, sizeof(unsigned long long) * col_blocks);\n\n  at::Tensor keep = at::empty({boxes_num}, boxes.options().dtype(at::kLong).device(at::kCPU));\n  int64_t* keep_out = keep.data<int64_t>();\n\n  int num_to_keep = 0;\n  for (int i = 0; i < boxes_num; i++) {\n    int nblock = i / threadsPerBlock;\n    int inblock = i % threadsPerBlock;\n\n    if (!(remv[nblock] & (1ULL << inblock))) {\n      keep_out[num_to_keep++] = i;\n      unsigned long long *p = &mask_host[0] + i * col_blocks;\n      for (int j = nblock; j < col_blocks; j++) {\n        remv[j] |= p[j];\n      }\n    }\n  }\n\n  THCudaFree(state, mask_dev);\n  // TODO improve this part\n  return std::get<0>(order_t.index({\n                       keep.narrow(/*dim=*/0, /*start=*/0, /*length=*/num_to_keep).to(\n                         order_t.device(), keep.scalar_type())\n                     }).sort(0, false));\n}\n\n} // namespace adet"
  },
  {
    "path": "fcos/layers/csrc/ml_nms/ml_nms.h",
    "content": "#pragma once\n#include <torch/extension.h>\n\nnamespace adet {\n\n\n#ifdef WITH_CUDA\nat::Tensor ml_nms_cuda(\n    const at::Tensor dets,\n    const float threshold);\n#endif\n\nat::Tensor ml_nms(const at::Tensor& dets,\n                  const at::Tensor& scores,\n                  const at::Tensor& labels,\n                  const float threshold) {\n\n  if (dets.type().is_cuda()) {\n#ifdef WITH_CUDA\n    // TODO raise error if not compiled with CUDA\n    if (dets.numel() == 0)\n      return at::empty({0}, dets.options().dtype(at::kLong).device(at::kCPU));\n    auto b = at::cat({dets, scores.unsqueeze(1), labels.unsqueeze(1)}, 1);\n    return ml_nms_cuda(b, threshold);\n#else\n    AT_ERROR(\"Not compiled with GPU support\");\n#endif\n  }\n  AT_ERROR(\"CPU version not implemented\");\n}\n\n} // namespace adet\n"
  },
  {
    "path": "fcos/layers/csrc/vision.cpp",
    "content": "// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\n#include \"ml_nms/ml_nms.h\"\n\nnamespace adet {\n\n#ifdef WITH_CUDA\nextern int get_cudart_version();\n#endif\n\nstd::string get_cuda_version() {\n#ifdef WITH_CUDA\n  std::ostringstream oss;\n\n  // copied from\n  // https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/cuda/detail/CUDAHooks.cpp#L231\n  auto printCudaStyleVersion = [&](int v) {\n    oss << (v / 1000) << \".\" << (v / 10 % 100);\n    if (v % 10 != 0) {\n      oss << \".\" << (v % 10);\n    }\n  };\n  printCudaStyleVersion(get_cudart_version());\n  return oss.str();\n#else\n  return std::string(\"not available\");\n#endif\n}\n\n// similar to\n// https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Version.cpp\nstd::string get_compiler_version() {\n  std::ostringstream ss;\n#if defined(__GNUC__)\n#ifndef __clang__\n  { ss << \"GCC \" << __GNUC__ << \".\" << __GNUC_MINOR__; }\n#endif\n#endif\n\n#if defined(__clang_major__)\n  {\n    ss << \"clang \" << __clang_major__ << \".\" << __clang_minor__ << \".\"\n       << __clang_patchlevel__;\n  }\n#endif\n\n#if defined(_MSC_VER)\n  { ss << \"MSVC \" << _MSC_FULL_VER; }\n#endif\n  return ss.str();\n}\n\nPYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {\n  m.def(\"ml_nms\", &ml_nms, \"Multi-Label NMS\");\n}\n\n} // namespace adet\n"
  },
  {
    "path": "fcos/layers/deform_conv.py",
    "content": "import torch\nfrom torch import nn\n\nfrom detectron2.layers import Conv2d\n\n\nclass _NewEmptyTensorOp(torch.autograd.Function):\n    @staticmethod\n    def forward(ctx, x, new_shape):\n        ctx.shape = x.shape\n        return x.new_empty(new_shape)\n\n    @staticmethod\n    def backward(ctx, grad):\n        shape = ctx.shape\n        return _NewEmptyTensorOp.apply(grad, shape), None\n\n\nclass DFConv2d(nn.Module):\n    \"\"\"\n    Deformable convolutional layer with configurable\n    deformable groups, dilations and groups.\n\n    Code is from:\n    https://github.com/facebookresearch/maskrcnn-benchmark/blob/master/maskrcnn_benchmark/layers/misc.py\n\n\n    \"\"\"\n    def __init__(\n            self,\n            in_channels,\n            out_channels,\n            with_modulated_dcn=True,\n            kernel_size=3,\n            stride=1,\n            groups=1,\n            dilation=1,\n            deformable_groups=1,\n            bias=False,\n            padding=None\n    ):\n        super(DFConv2d, self).__init__()\n        if isinstance(kernel_size, (list, tuple)):\n            assert isinstance(stride, (list, tuple))\n            assert isinstance(dilation, (list, tuple))\n            assert len(kernel_size) == 2\n            assert len(stride) == 2\n            assert len(dilation) == 2\n            padding = (\n                dilation[0] * (kernel_size[0] - 1) // 2,\n                dilation[1] * (kernel_size[1] - 1) // 2\n            )\n            offset_base_channels = kernel_size[0] * kernel_size[1]\n        else:\n            padding = dilation * (kernel_size - 1) // 2\n            offset_base_channels = kernel_size * kernel_size\n        if with_modulated_dcn:\n            from .deform_conv import ModulatedDeformConv\n            offset_channels = offset_base_channels * 3  # default: 27\n            conv_block = ModulatedDeformConv\n        else:\n            from .deform_conv import DeformConv\n            offset_channels = 
offset_base_channels * 2  # default: 18\n            conv_block = DeformConv\n        self.offset = Conv2d(\n            in_channels,\n            deformable_groups * offset_channels,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=padding,\n            groups=1,\n            dilation=dilation\n        )\n        for l in [self.offset, ]:\n            nn.init.kaiming_uniform_(l.weight, a=1)\n            torch.nn.init.constant_(l.bias, 0.)\n        self.conv = conv_block(\n            in_channels,\n            out_channels,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=padding,\n            dilation=dilation,\n            groups=groups,\n            deformable_groups=deformable_groups,\n            bias=bias\n        )\n        self.with_modulated_dcn = with_modulated_dcn\n        self.kernel_size = kernel_size\n        self.stride = stride\n        self.padding = padding\n        self.dilation = dilation\n        self.offset_split = offset_base_channels * deformable_groups * 2\n\n    def forward(self, x, return_offset=False):\n        if x.numel() > 0:\n            if not self.with_modulated_dcn:\n                offset_mask = self.offset(x)\n                x = self.conv(x, offset_mask)\n            else:\n                offset_mask = self.offset(x)\n                offset = offset_mask[:, :self.offset_split, :, :]\n                mask = offset_mask[:, self.offset_split:, :, :].sigmoid()\n                x = self.conv(x, offset, mask)\n            if return_offset:\n                return x, offset_mask\n            return x\n        # get output shape\n        output_shape = [\n            (i + 2 * p - (di * (k - 1) + 1)) // d + 1\n            for i, p, di, k, d in zip(\n                x.shape[-2:],\n                self.padding,\n                self.dilation,\n                self.kernel_size,\n                self.stride\n            )\n        ]\n        output_shape = 
[x.shape[0], self.conv.weight.shape[0]] + output_shape\n        return _NewEmptyTensorOp.apply(x, output_shape)\n"
  },
  {
    "path": "fcos/layers/iou_loss.py",
    "content": "import torch\nfrom torch import nn\n\n\nclass IOULoss(nn.Module):\n    \"\"\"\n    Intersetion Over Union (IoU) loss which supports three\n    different IoU computations:\n\n    * IoU\n    * Linear IoU\n    * gIoU\n    \"\"\"\n    def __init__(self, loc_loss_type='iou'):\n        super(IOULoss, self).__init__()\n        self.loc_loss_type = loc_loss_type\n\n    def forward(self, pred, target, weight=None):\n        \"\"\"\n        Args:\n            pred: Nx4 predicted bounding boxes\n            target: Nx4 target bounding boxes\n            weight: N loss weight for each instance\n        \"\"\"\n        pred_left = pred[:, 0]\n        pred_top = pred[:, 1]\n        pred_right = pred[:, 2]\n        pred_bottom = pred[:, 3]\n\n        target_left = target[:, 0]\n        target_top = target[:, 1]\n        target_right = target[:, 2]\n        target_bottom = target[:, 3]\n\n        target_aera = (target_left + target_right) * \\\n                      (target_top + target_bottom)\n        pred_aera = (pred_left + pred_right) * \\\n                    (pred_top + pred_bottom)\n\n        w_intersect = torch.min(pred_left, target_left) + \\\n                      torch.min(pred_right, target_right)\n        h_intersect = torch.min(pred_bottom, target_bottom) + \\\n                      torch.min(pred_top, target_top)\n\n        g_w_intersect = torch.max(pred_left, target_left) + \\\n                        torch.max(pred_right, target_right)\n        g_h_intersect = torch.max(pred_bottom, target_bottom) + \\\n                        torch.max(pred_top, target_top)\n        ac_uion = g_w_intersect * g_h_intersect\n\n        area_intersect = w_intersect * h_intersect\n        area_union = target_aera + pred_aera - area_intersect\n\n        ious = (area_intersect + 1.0) / (area_union + 1.0)\n        gious = ious - (ac_uion - area_union) / ac_uion\n        if self.loc_loss_type == 'iou':\n            losses = -torch.log(ious)\n        elif 
self.loc_loss_type == 'linear_iou':\n            losses = 1 - ious\n        elif self.loc_loss_type == 'giou':\n            losses = 1 - gious\n        else:\n            raise NotImplementedError\n\n        if weight is not None:\n            return (losses * weight).sum()\n        else:\n            return losses.sum()\n"
  },
  {
    "path": "fcos/layers/ml_nms.py",
    "content": "from detectron2.layers import batched_nms\n\n\ndef ml_nms(boxlist, nms_thresh, max_proposals=-1,\n           score_field=\"scores\", label_field=\"labels\"):\n    \"\"\"\n    Performs non-maximum suppression on a boxlist, with scores specified\n    in a boxlist field via score_field.\n    \n    Args:\n        boxlist (detectron2.structures.Boxes): \n        nms_thresh (float): \n        max_proposals (int): if > 0, then only the top max_proposals are kept\n            after non-maximum suppression\n        score_field (str): \n    \"\"\"\n    if nms_thresh <= 0:\n        return boxlist\n    boxes = boxlist.pred_boxes.tensor\n    scores = boxlist.scores\n    labels = boxlist.pred_classes\n    keep = batched_nms(boxes, scores, labels, nms_thresh)\n    if max_proposals > 0:\n        keep = keep[: max_proposals]\n    boxlist = boxlist[keep]\n    return boxlist\n"
  },
  {
    "path": "fcos/modeling/__init__.py",
    "content": "from .fcos import FCOS\nfrom .backbone import build_fcos_resnet_fpn_backbone\nfrom .one_stage_detector import OneStageDetector\n\n_EXCLUDE = {\"torch\", \"ShapeSpec\"}\n__all__ = [k for k in globals().keys() if k not in _EXCLUDE and not k.startswith(\"_\")]\n"
  },
  {
    "path": "fcos/modeling/backbone/__init__.py",
    "content": "from .fpn import build_fcos_resnet_fpn_backbone\nfrom .vovnet import build_vovnet_fpn_backbone, build_vovnet_backbone\n"
  },
  {
    "path": "fcos/modeling/backbone/fpn.py",
    "content": "from torch import nn\nimport torch.nn.functional as F\nimport fvcore.nn.weight_init as weight_init\n\nfrom detectron2.modeling.backbone import FPN, build_resnet_backbone\nfrom detectron2.layers import ShapeSpec\nfrom detectron2.modeling.backbone.build import BACKBONE_REGISTRY\n\nfrom .mobilenet import build_mnv2_backbone\n\n\nclass LastLevelP6P7(nn.Module):\n    \"\"\"\n    This module is used in RetinaNet and FCOS to generate extra layers, P6 and P7 from\n    C5 or P5 feature.\n    \"\"\"\n\n    def __init__(self, in_channels, out_channels, in_features=\"res5\"):\n        super().__init__()\n        self.num_levels = 2\n        self.in_feature = in_features\n        self.p6 = nn.Conv2d(in_channels, out_channels, 3, 2, 1)\n        self.p7 = nn.Conv2d(out_channels, out_channels, 3, 2, 1)\n        for module in [self.p6, self.p7]:\n            weight_init.c2_xavier_fill(module)\n\n    def forward(self, x):\n        p6 = self.p6(x)\n        p7 = self.p7(F.relu(p6))\n        return [p6, p7]\n\n\nclass LastLevelP6(nn.Module):\n    \"\"\"\n    This module is used in FCOS to generate extra layers\n    \"\"\"\n\n    def __init__(self, in_channels, out_channels, in_features=\"res5\"):\n        super().__init__()\n        self.num_levels = 1\n        self.in_feature = in_features\n        self.p6 = nn.Conv2d(in_channels, out_channels, 3, 2, 1)\n        for module in [self.p6]:\n            weight_init.c2_xavier_fill(module)\n\n    def forward(self, x):\n        p6 = self.p6(x)\n        return [p6]\n\n\n@BACKBONE_REGISTRY.register()\ndef build_fcos_resnet_fpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    if cfg.MODEL.MOBILENET:\n        bottom_up = build_mnv2_backbone(cfg, input_shape)\n    else:\n        bottom_up = build_resnet_backbone(cfg, input_shape)\n    in_features = 
cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.FPN.OUT_CHANNELS\n    top_levels = cfg.MODEL.FCOS.TOP_LEVELS\n    in_channels_top = out_channels\n    if top_levels == 2:\n        top_block = LastLevelP6P7(in_channels_top, out_channels, \"p5\")\n    elif top_levels == 1:\n        top_block = LastLevelP6(in_channels_top, out_channels, \"p5\")\n    elif top_levels == 0:\n        top_block = None\n    else:\n        raise ValueError(\n            \"MODEL.FCOS.TOP_LEVELS must be 0, 1 or 2, got {}\".format(top_levels))\n    backbone = FPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        norm=cfg.MODEL.FPN.NORM,\n        top_block=top_block,\n        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,\n    )\n    return backbone\n"
  },
  {
    "path": "fcos/modeling/backbone/mobilenet.py",
    "content": "# taken from https://github.com/tonylins/pytorch-mobilenet-v2/\n# Published by Ji Lin, tonylins\n# licensed under the  Apache License, Version 2.0, January 2004\n\nfrom torch import nn\nfrom torch.nn import BatchNorm2d\n#from detectron2.layers.batch_norm import NaiveSyncBatchNorm as BatchNorm2d\nfrom detectron2.layers import Conv2d\nfrom detectron2.modeling.backbone.build import BACKBONE_REGISTRY\nfrom detectron2.modeling.backbone import Backbone\n\n\ndef conv_bn(inp, oup, stride):\n    return nn.Sequential(\n        Conv2d(inp, oup, 3, stride, 1, bias=False),\n        BatchNorm2d(oup),\n        nn.ReLU6(inplace=True)\n    )\n\n\ndef conv_1x1_bn(inp, oup):\n    return nn.Sequential(\n        Conv2d(inp, oup, 1, 1, 0, bias=False),\n        BatchNorm2d(oup),\n        nn.ReLU6(inplace=True)\n    )\n\n\nclass InvertedResidual(nn.Module):\n    def __init__(self, inp, oup, stride, expand_ratio):\n        super(InvertedResidual, self).__init__()\n        self.stride = stride\n        assert stride in [1, 2]\n\n        hidden_dim = int(round(inp * expand_ratio))\n        self.use_res_connect = self.stride == 1 and inp == oup\n\n        if expand_ratio == 1:\n            self.conv = nn.Sequential(\n                # dw\n                Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),\n                BatchNorm2d(hidden_dim),\n                nn.ReLU6(inplace=True),\n                # pw-linear\n                Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),\n                BatchNorm2d(oup),\n            )\n        else:\n            self.conv = nn.Sequential(\n                # pw\n                Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),\n                BatchNorm2d(hidden_dim),\n                nn.ReLU6(inplace=True),\n                # dw\n                Conv2d(hidden_dim, hidden_dim, 3, stride, 1, groups=hidden_dim, bias=False),\n                BatchNorm2d(hidden_dim),\n                nn.ReLU6(inplace=True),\n            
    # pw-linear\n                Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),\n                BatchNorm2d(oup),\n            )\n\n    def forward(self, x):\n        if self.use_res_connect:\n            return x + self.conv(x)\n        else:\n            return self.conv(x)\n\n\nclass MobileNetV2(Backbone):\n    \"\"\"\n    Should freeze bn\n    \"\"\"\n    def __init__(self, cfg, n_class=1000, input_size=224, width_mult=1.):\n        super(MobileNetV2, self).__init__()\n        block = InvertedResidual\n        input_channel = 32\n        interverted_residual_setting = [\n            # t, c, n, s\n            [1, 16, 1, 1],\n            [6, 24, 2, 2],\n            [6, 32, 3, 2],\n            [6, 64, 4, 2],\n            [6, 96, 3, 1],\n            [6, 160, 3, 2],\n            [6, 320, 1, 1],\n        ]\n\n        # building first layer\n        assert input_size % 32 == 0\n        input_channel = int(input_channel * width_mult)\n        self.return_features_indices = [3, 6, 13, 17]\n        self.return_features_num_channels = []\n        self.features = nn.ModuleList([conv_bn(3, input_channel, 2)])\n        # building inverted residual blocks\n        for t, c, n, s in interverted_residual_setting:\n            output_channel = int(c * width_mult)\n            for i in range(n):\n                if i == 0:\n                    self.features.append(block(input_channel, output_channel, s, expand_ratio=t))\n                else:\n                    self.features.append(block(input_channel, output_channel, 1, expand_ratio=t))\n                input_channel = output_channel\n                if len(self.features) - 1 in self.return_features_indices:\n                    self.return_features_num_channels.append(output_channel)\n\n        self._initialize_weights()\n        self._freeze_backbone(cfg.MODEL.BACKBONE.FREEZE_AT)\n\n    def _freeze_backbone(self, freeze_at):\n        for layer_index in range(freeze_at):\n            for p in 
self.features[layer_index].parameters():\n                p.requires_grad = False\n\n    def forward(self, x):\n        res = []\n        for i, m in enumerate(self.features):\n            x = m(x)\n            if i in self.return_features_indices:\n                res.append(x)\n        return {'res{}'.format(i + 2): r for i, r in enumerate(res)}\n\n    def _initialize_weights(self):\n        for m in self.modules():\n            if isinstance(m, Conv2d):\n                n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n                m.weight.data.normal_(0, (2. / n) ** 0.5)\n                if m.bias is not None:\n                    m.bias.data.zero_()\n            elif isinstance(m, BatchNorm2d):\n                m.weight.data.fill_(1)\n                m.bias.data.zero_()\n            elif isinstance(m, nn.Linear):\n                m.weight.data.normal_(0, 0.01)\n                m.bias.data.zero_()\n\n\n@BACKBONE_REGISTRY.register()\ndef build_mnv2_backbone(cfg, input_shape):\n    \"\"\"\n    Create a MobileNetV2 instance from config.\n\n    The output features reuse the ResNet names (\"res2\" ... \"res5\") so that\n    the backbone can be plugged into the FPN unchanged.\n\n    Returns:\n        MobileNetV2: a :class:`MobileNetV2` instance.\n    \"\"\"\n    out_features = cfg.MODEL.RESNETS.OUT_FEATURES\n\n    out_feature_channels = {\"res2\": 24, \"res3\": 32,\n                            \"res4\": 96, \"res5\": 320}\n    out_feature_strides = {\"res2\": 4, \"res3\": 8, \"res4\": 16, \"res5\": 32}\n    model = MobileNetV2(cfg)\n    model._out_features = out_features\n    model._out_feature_channels = out_feature_channels\n    model._out_feature_strides = out_feature_strides\n    return model\n"
  },
  {
    "path": "fcos/modeling/backbone/vovnet.py",
    "content": "# Copyright (c) Youngwan Lee (ETRI) All Rights Reserved.\nfrom collections import OrderedDict\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nimport fvcore.nn.weight_init as weight_init\nfrom detectron2.modeling.backbone import Backbone\nfrom detectron2.modeling.backbone.build import BACKBONE_REGISTRY\nfrom detectron2.modeling.backbone.fpn import FPN\nfrom detectron2.layers import (\n    Conv2d,\n    DeformConv,\n    FrozenBatchNorm2d,\n    ShapeSpec,\n    get_norm,\n)\nfrom .fpn import LastLevelP6, LastLevelP6P7\n\n__all__ = [\n    \"VoVNet\",\n    \"build_vovnet_backbone\",\n    \"build_vovnet_fpn_backbone\"\n]\n\n_NORM = False\n\nVoVNet19_eSE = {\n    'stage_conv_ch': [128, 160, 192, 224],\n    'stage_out_ch': [256, 512, 768, 1024],\n    'layer_per_block': 3,\n    'block_per_stage': [1, 1, 1, 1],\n    'eSE' : True\n}\n\nVoVNet39_eSE = {\n    'stage_conv_ch': [128, 160, 192, 224],\n    'stage_out_ch': [256, 512, 768, 1024],\n    'layer_per_block': 5,\n    'block_per_stage': [1, 1, 2, 2],\n    'eSE' : True\n}\n\nVoVNet57_eSE = {\n    'stage_conv_ch': [128, 160, 192, 224],\n    'stage_out_ch': [256, 512, 768, 1024],\n    'layer_per_block': 5,\n    'block_per_stage': [1, 1, 4, 3],\n    'eSE' : True\n}\n\nVoVNet99_eSE = {\n    'stage_conv_ch': [128, 160, 192, 224],\n    'stage_out_ch': [256, 512, 768, 1024],\n    'layer_per_block': 5,\n    'block_per_stage': [1, 3, 9, 3],\n    'eSE' : True\n}\n\n_STAGE_SPECS = {\n    \"V-19-eSE\": VoVNet19_eSE,\n    \"V-39-eSE\": VoVNet39_eSE,\n    \"V-57-eSE\": VoVNet57_eSE,\n    \"V-99-eSE\": VoVNet99_eSE\n}\n\ndef conv3x3(in_channels, out_channels, module_name, postfix, \n              stride=1, groups=1, kernel_size=3, padding=1):\n    \"\"\"3x3 convolution with padding\"\"\"\n    return [\n        (f'{module_name}_{postfix}/conv',\n         nn.Conv2d(in_channels, \n                    out_channels, \n                    kernel_size=kernel_size, \n                    stride=stride, \n      
              padding=padding, \n                    groups=groups, \n                    bias=False)),\n        (f'{module_name}_{postfix}/norm', get_norm(_NORM, out_channels)),\n        (f'{module_name}_{postfix}/relu', nn.ReLU(inplace=True))\n    ]\n\n\ndef conv1x1(in_channels, out_channels, module_name, postfix, \n              stride=1, groups=1, kernel_size=1, padding=0):\n    \"\"\"1x1 convolution with padding\"\"\"\n    return [\n        (f'{module_name}_{postfix}/conv',\n         nn.Conv2d(in_channels, \n                    out_channels, \n                    kernel_size=kernel_size, \n                    stride=stride, \n                    padding=padding, \n                    groups=groups,\n                    bias=False)),\n        (f'{module_name}_{postfix}/norm', get_norm(_NORM, out_channels)),\n        (f'{module_name}_{postfix}/relu', nn.ReLU(inplace=True))\n    ]\n\nclass Hsigmoid(nn.Module):\n    def __init__(self, inplace=True):\n        super(Hsigmoid, self).__init__()\n        self.inplace = inplace\n\n    def forward(self, x):\n        return F.relu6(x + 3., inplace=self.inplace) / 6.\n\n\nclass eSEModule(nn.Module):\n    def __init__(self, channel, reduction=4):\n        super(eSEModule, self).__init__()\n        self.avg_pool = nn.AdaptiveAvgPool2d(1)\n        self.fc = nn.Conv2d(channel,channel, kernel_size=1,\n                             padding=0)\n        self.hsigmoid = Hsigmoid()\n\n    def forward(self, x):\n        input = x\n        x = self.avg_pool(x)\n        x = self.fc(x)\n        x = self.hsigmoid(x)\n        return input * x\n\n\nclass _OSA_module(nn.Module):\n\n    def __init__(self, \n                 in_ch, \n                 stage_ch, \n                 concat_ch, \n                 layer_per_block, \n                 module_name, \n                 SE=False,\n                 identity=False):\n\n        super(_OSA_module, self).__init__()\n\n        self.identity = identity\n        self.layers = nn.ModuleList()\n    
    in_channel = in_ch\n        for i in range(layer_per_block):\n            self.layers.append(nn.Sequential(OrderedDict(conv3x3(in_channel, stage_ch, module_name, i))))\n            in_channel = stage_ch\n\n        # feature aggregation\n        in_channel = in_ch + layer_per_block * stage_ch\n        self.concat = nn.Sequential(OrderedDict(conv1x1(in_channel, concat_ch, module_name, 'concat')))\n\n        self.ese = eSEModule(concat_ch)\n\n    def forward(self, x):\n\n        identity_feat = x\n\n        output = []\n        output.append(x)\n        for layer in self.layers:\n            x = layer(x)\n            output.append(x)\n\n        x = torch.cat(output, dim=1)\n        xt = self.concat(x)\n\n        xt = self.ese(xt)\n\n        if self.identity:\n            xt = xt + identity_feat\n\n        return xt\n\n\nclass _OSA_stage(nn.Sequential):\n\n    def __init__(self, \n                 in_ch, \n                 stage_ch, \n                 concat_ch, \n                 block_per_stage, \n                 layer_per_block, \n                 stage_num,\n                 SE=False):\n        super(_OSA_stage, self).__init__()\n\n        if not stage_num == 2:\n            self.add_module('Pooling', nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True))\n\n        if block_per_stage !=1:\n            SE = False\n        module_name = f'OSA{stage_num}_1'\n        self.add_module(module_name, _OSA_module(in_ch, \n                                                 stage_ch, \n                                                 concat_ch, \n                                                 layer_per_block, \n                                                 module_name,\n                                                 SE))\n        for i in range(block_per_stage - 1):\n            if i != block_per_stage -2: #last block\n                SE = False\n            module_name = f'OSA{stage_num}_{i + 2}'\n            self.add_module(module_name,\n                           
 _OSA_module(concat_ch, \n                                        stage_ch, \n                                        concat_ch, \n                                        layer_per_block, \n                                        module_name, \n                                        SE,\n                                        identity=True))\n\n\n\nclass VoVNet(Backbone):\n\n    def __init__(self, cfg, input_ch, out_features=None):\n        \"\"\"\n        Args:\n            input_ch(int) : the number of input channel\n            out_features (list[str]): name of the layers whose outputs should\n                be returned in forward. Can be anything in \"stem\", \"stage2\" ...\n        \"\"\"\n        super(VoVNet, self).__init__()\n\n        global _NORM\n        _NORM = cfg.MODEL.VOVNET.NORM\n            \n        stage_specs = _STAGE_SPECS[cfg.MODEL.VOVNET.CONV_BODY]\n\n        config_stage_ch = stage_specs['stage_conv_ch']\n        config_concat_ch = stage_specs['stage_out_ch']\n        block_per_stage = stage_specs['block_per_stage']\n        layer_per_block = stage_specs['layer_per_block']\n        SE = stage_specs['eSE']\n\n        self._out_features = out_features\n\n\n        # Stem module\n        stem = conv3x3(input_ch, 64, 'stem', '1', 2)\n        stem += conv3x3(64, 64, 'stem', '2', 1)\n        stem += conv3x3(64, 128, 'stem', '3', 2)\n        self.add_module('stem', nn.Sequential((OrderedDict(stem))))\n        current_stirde = 4\n        self._out_feature_strides = {\"stem\": current_stirde, \"stage2\": current_stirde}\n        self._out_feature_channels = {\"stem\": 128}\n\n        stem_out_ch = [128]\n        in_ch_list = stem_out_ch + config_concat_ch[:-1]\n        # OSA stages\n        self.stage_names = []\n        for i in range(4):  # num_stages\n            name = 'stage%d' % (i + 2) # stage 2 ... 
stage 5\n            self.stage_names.append(name)\n            self.add_module(name, _OSA_stage(in_ch_list[i],\n                                             config_stage_ch[i],\n                                             config_concat_ch[i],\n                                             block_per_stage[i],\n                                             layer_per_block,\n                                             i + 2,\n                                             SE))\n            \n            self._out_feature_channels[name] = config_concat_ch[i]\n            if not i == 0:\n                self._out_feature_strides[name] = current_stirde = int(\n                    current_stirde * 2) \n\n        # initialize weights\n        self._initialize_weights()\n        # Optionally freeze (requires_grad=False) parts of the backbone\n        self._freeze_backbone(cfg.MODEL.BACKBONE.FREEZE_AT)\n\n\n    def _initialize_weights(self):\n        for m in self.modules():\n            if isinstance(m, nn.Conv2d):\n                nn.init.kaiming_normal_(m.weight)\n\n    def _freeze_backbone(self, freeze_at):\n        if freeze_at < 0:\n            return\n        # freeze BN layers\n        for m in self.modules():\n            if isinstance(m, nn.BatchNorm2d):\n                freeze_bn_params(m)\n        for stage_index in range(freeze_at):\n            if stage_index == 0:\n                m = self.stem # stage 0 is the stem\n            else:\n                m = getattr(self, \"stage\" + str(stage_index+1))\n            for p in m.parameters():\n                p.requires_grad = False\n                FrozenBatchNorm2d.convert_frozen_batchnorm(self)\n\n    def forward(self, x):\n        outputs = {}\n        x = self.stem(x)\n        if \"stem\" in self._out_features:\n            outputs[\"stem\"] = x\n        for name in self.stage_names:\n            x = getattr(self, name)(x)\n            if name in self._out_features:\n                outputs[name] = x\n\n        
return outputs\n\n    def output_shape(self):\n        return {\n            name: ShapeSpec(\n                channels=self._out_feature_channels[name], stride=self._out_feature_strides[name]\n            )\n            for name in self._out_features\n        }\n\n\n@BACKBONE_REGISTRY.register()\ndef build_vovnet_backbone(cfg, input_shape):\n    \"\"\"\n    Create a VoVNet instance from config.\n\n    Returns:\n        VoVNet: a :class:`VoVNet` instance.\n    \"\"\"\n    out_features = cfg.MODEL.VOVNET.OUT_FEATURES\n    return VoVNet(cfg, input_shape.channels, out_features=out_features)\n\n\n@BACKBONE_REGISTRY.register()\ndef build_vovnet_fpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = build_vovnet_backbone(cfg, input_shape)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.FPN.OUT_CHANNELS\n    backbone = FPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        norm=cfg.MODEL.FPN.NORM,\n        top_block=LastLevelMaxPool(),\n        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,\n    )\n    return backbone\n\n\n@BACKBONE_REGISTRY.register()\ndef build_fcos_vovnet_fpn_backbone(cfg, input_shape: ShapeSpec):\n    \"\"\"\n    Args:\n        cfg: a detectron2 CfgNode\n\n    Returns:\n        backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`.\n    \"\"\"\n    bottom_up = build_vovnet_backbone(cfg, input_shape)\n    in_features = cfg.MODEL.FPN.IN_FEATURES\n    out_channels = cfg.MODEL.FPN.OUT_CHANNELS\n    top_levels = cfg.MODEL.FCOS.TOP_LEVELS\n    in_channels_top = out_channels\n    if top_levels == 2:\n        top_block = LastLevelP6P7(in_channels_top, out_channels, \"p5\")\n    elif top_levels == 1:\n        top_block = LastLevelP6(in_channels_top, out_channels, \"p5\")\n    elif 
top_levels == 0:\n        top_block = None\n    backbone = FPN(\n        bottom_up=bottom_up,\n        in_features=in_features,\n        out_channels=out_channels,\n        norm=cfg.MODEL.FPN.NORM,\n        top_block=top_block,\n        fuse_type=cfg.MODEL.FPN.FUSE_TYPE,\n    )\n    return backbone\n"
  },
  {
    "path": "fcos/modeling/fcos/__init__.py",
    "content": "from .fcos import FCOS\n"
  },
  {
    "path": "fcos/modeling/fcos/fcos.py",
    "content": "import math\nfrom typing import List, Dict\nimport torch\nfrom torch import nn\nfrom torch.nn import functional as F\n\nfrom detectron2.layers import ShapeSpec\nfrom detectron2.modeling.proposal_generator.build import PROPOSAL_GENERATOR_REGISTRY\n\nfrom fcos.layers import DFConv2d, IOULoss\nfrom .fcos_outputs import FCOSOutputs\n\n\n__all__ = [\"FCOS\"]\n\nINF = 100000000\n\n\nclass Scale(nn.Module):\n    def __init__(self, init_value=1.0):\n        super(Scale, self).__init__()\n        self.scale = nn.Parameter(torch.FloatTensor([init_value]))\n\n    def forward(self, input):\n        return input * self.scale\n\n\n@PROPOSAL_GENERATOR_REGISTRY.register()\nclass FCOS(nn.Module):\n    \"\"\"\n    Implement FCOS (https://arxiv.org/abs/1904.01355).\n    \"\"\"\n    def __init__(self, cfg, input_shape: Dict[str, ShapeSpec]):\n        super().__init__()\n        # fmt: off\n        self.in_features          = cfg.MODEL.FCOS.IN_FEATURES\n        self.fpn_strides          = cfg.MODEL.FCOS.FPN_STRIDES\n        self.focal_loss_alpha     = cfg.MODEL.FCOS.LOSS_ALPHA\n        self.focal_loss_gamma     = cfg.MODEL.FCOS.LOSS_GAMMA\n        self.center_sample        = cfg.MODEL.FCOS.CENTER_SAMPLE\n        self.strides              = cfg.MODEL.FCOS.FPN_STRIDES\n        self.radius               = cfg.MODEL.FCOS.POS_RADIUS\n        self.pre_nms_thresh_train = cfg.MODEL.FCOS.INFERENCE_TH_TRAIN\n        self.pre_nms_thresh_test  = cfg.MODEL.FCOS.INFERENCE_TH_TEST\n        self.pre_nms_topk_train   = cfg.MODEL.FCOS.PRE_NMS_TOPK_TRAIN\n        self.pre_nms_topk_test    = cfg.MODEL.FCOS.PRE_NMS_TOPK_TEST\n        self.nms_thresh           = cfg.MODEL.FCOS.NMS_TH\n        self.post_nms_topk_train  = cfg.MODEL.FCOS.POST_NMS_TOPK_TRAIN\n        self.post_nms_topk_test   = cfg.MODEL.FCOS.POST_NMS_TOPK_TEST\n        self.thresh_with_ctr      = cfg.MODEL.FCOS.THRESH_WITH_CTR\n        # fmt: on\n        self.iou_loss = IOULoss(cfg.MODEL.FCOS.LOC_LOSS_TYPE)\n        # generate 
sizes of interest\n        soi = []\n        prev_size = -1\n        for s in cfg.MODEL.FCOS.SIZES_OF_INTEREST:\n            soi.append([prev_size, s])\n            prev_size = s\n        soi.append([prev_size, INF])\n        self.sizes_of_interest = soi\n        self.fcos_head = FCOSHead(cfg, [input_shape[f] for f in self.in_features])\n\n    def forward(self, images, features, gt_instances):\n        \"\"\"\n        Arguments:\n            images (ImageList): images to be processed\n            features (dict[str, Tensor]): feature maps keyed by name; only those\n                listed in `self.in_features` are used\n            gt_instances (list[Instances]): per-image ground truth with `gt_boxes`\n                and `gt_classes`; used only during training\n\n        Returns:\n            tuple: `(proposals, losses)`.\n                During training, `proposals` is None and `losses` is the first\n                value returned by `FCOSOutputs.losses()`.\n                During testing, `proposals` is the per-image list returned by\n                `FCOSOutputs.predict_proposals()` and `losses` is an empty dict.\n\n        \"\"\"\n        features = [features[f] for f in self.in_features]\n        locations = self.compute_locations(features)\n        logits_pred, reg_pred, ctrness_pred, bbox_towers, controllers, masks = self.fcos_head(features)\n\n        if self.training:\n            pre_nms_thresh = self.pre_nms_thresh_train\n            pre_nms_topk = self.pre_nms_topk_train\n            post_nms_topk = self.post_nms_topk_train\n        else:\n            pre_nms_thresh = self.pre_nms_thresh_test\n            pre_nms_topk = self.pre_nms_topk_test\n            post_nms_topk = self.post_nms_topk_test\n\n        outputs = FCOSOutputs(\n            images,\n            locations,\n            logits_pred,\n            reg_pred,\n            ctrness_pred,\n            self.focal_loss_alpha,\n            self.focal_loss_gamma,\n            self.iou_loss,\n            self.center_sample,\n            self.sizes_of_interest,\n            self.strides,\n            self.radius,\n            self.fcos_head.num_classes,\n            pre_nms_thresh,\n      
      pre_nms_topk,\n            self.nms_thresh,\n            post_nms_topk,\n            self.thresh_with_ctr,\n            controllers, \n            masks,\n            gt_instances\n        )\n\n        if self.training:\n            losses, _ = outputs.losses()\n            return None, losses\n        else:\n            proposals = outputs.predict_proposals()\n            return proposals, {}\n\n    def compute_locations(self, features):\n        locations = []\n        for level, feature in enumerate(features):\n            h, w = feature.size()[-2:]\n            locations_per_level = self.compute_locations_per_level(\n                h, w, self.fpn_strides[level],\n                feature.device\n            )\n            locations.append(locations_per_level)\n        return locations\n\n    def compute_locations_per_level(self, h, w, stride, device):\n        shifts_x = torch.arange(\n            0, w * stride, step=stride,\n            dtype=torch.float32, device=device\n        )\n        shifts_y = torch.arange(\n            0, h * stride, step=stride,\n            dtype=torch.float32, device=device\n        )\n        shift_y, shift_x = torch.meshgrid(shifts_y, shifts_x)\n        shift_x = shift_x.reshape(-1)\n        shift_y = shift_y.reshape(-1)\n        locations = torch.stack((shift_x, shift_y), dim=1) + stride // 2\n        return locations\n\n\nclass FCOSHead(nn.Module):\n    def __init__(self, cfg, input_shape: List[ShapeSpec]):\n        \"\"\"\n        Arguments:\n            in_channels (int): number of channels of the input feature\n        \"\"\"\n        super().__init__()\n        # TODO: Implement the sigmoid version first.\n        self.num_classes = cfg.MODEL.FCOS.NUM_CLASSES\n        self.fpn_strides = cfg.MODEL.FCOS.FPN_STRIDES\n        head_configs = {\"cls\": (cfg.MODEL.FCOS.NUM_CLS_CONVS,\n                                False),\n                        \"bbox\": (cfg.MODEL.FCOS.NUM_BOX_CONVS,\n                                 
cfg.MODEL.FCOS.USE_DEFORMABLE),\n                        \"share\": (cfg.MODEL.FCOS.NUM_SHARE_CONVS,\n                                  cfg.MODEL.FCOS.USE_DEFORMABLE),\n                        \"mask\": (8,False)}          \n        norm = None if cfg.MODEL.FCOS.NORM == \"none\" else cfg.MODEL.FCOS.NORM\n\n        in_channels = [s.channels for s in input_shape]\n        assert len(set(in_channels)) == 1, \"Each level must have the same channel!\"\n        in_channels = in_channels[0]\n\n        for head in head_configs:\n            tower = []\n            num_convs, use_deformable = head_configs[head]\n            if use_deformable:\n                conv_func = DFConv2d\n            else:\n                conv_func = nn.Conv2d\n            for i in range(num_convs):\n                tower.append(conv_func(\n                        in_channels, in_channels,\n                        kernel_size=3, stride=1,\n                        padding=1, bias=True\n                ))\n                if norm == \"GN\":\n                    tower.append(nn.GroupNorm(32, in_channels))\n                tower.append(nn.ReLU())\n            self.add_module('{}_tower'.format(head),\n                            nn.Sequential(*tower))\n\n        self.cls_logits = nn.Conv2d(\n            in_channels, self.num_classes,\n            kernel_size=3, stride=1,\n            padding=1\n        )\n        self.bbox_pred = nn.Conv2d(\n            in_channels, 4, kernel_size=3,\n            stride=1, padding=1\n        )\n        self.ctrness = nn.Conv2d(\n            in_channels, 1, kernel_size=3,\n            stride=1, padding=1\n        )\n        self.controller = nn.Conv2d(\n            in_channels, 169, kernel_size=3,\n            stride=1, padding=1\n        )\n        self.mask = nn.Conv2d(\n            in_channels, 8,\n            kernel_size=3, stride=1,\n            padding=1\n        )\n\n        if cfg.MODEL.FCOS.USE_SCALE:\n            self.scales = 
nn.ModuleList([Scale(init_value=1.0) for _ in self.fpn_strides])\n        else:\n            self.scales = None\n\n        for modules in [\n            self.cls_tower, self.bbox_tower,\n            self.share_tower, self.cls_logits,\n            self.bbox_pred, self.ctrness,\n            self.controller, self.mask, \n        ]:\n            for l in modules.modules():\n                if isinstance(l, nn.Conv2d):\n                    torch.nn.init.normal_(l.weight, std=0.01)\n                    torch.nn.init.constant_(l.bias, 0)\n\n        # initialize the bias for focal loss\n        prior_prob = cfg.MODEL.FCOS.PRIOR_PROB\n        bias_value = -math.log((1 - prior_prob) / prior_prob)\n        torch.nn.init.constant_(self.cls_logits.bias, bias_value)\n\n    def forward(self, x):\n        logits = []\n        bbox_reg = []\n        ctrness = []\n        bbox_towers = []\n        controllers = []\n        for l, feature in enumerate(x):\n            feature = self.share_tower(feature)\n            cls_tower = self.cls_tower(feature)\n            bbox_tower = self.bbox_tower(feature)\n\n            logits.append(self.cls_logits(cls_tower))\n            ctrness.append(self.ctrness(bbox_tower))\n            controllers.append(self.controller(bbox_tower))            \n            reg = self.bbox_pred(bbox_tower)\n            if self.scales is not None:\n                reg = self.scales[l](reg)\n            # Note that we use relu, as in the improved FCOS, instead of exp.\n            bbox_reg.append(F.relu(reg))\n        masks = x[0]\n        masks = self.mask_tower(masks)\n        masks = self.mask(masks)\n        return logits, bbox_reg, ctrness, bbox_towers, controllers, masks\n"
  },
  {
    "path": "fcos/modeling/fcos/fcos_outputs.py",
    "content": "import logging\nimport torch\nimport torch.nn.functional as F\n\nfrom detectron2.layers import cat\nfrom detectron2.structures import Instances, Boxes\nfrom fcos.utils.comm import get_world_size\nfrom fvcore.nn import sigmoid_focal_loss_jit\n\nfrom fcos.utils.comm import reduce_sum\nfrom fcos.layers import ml_nms\n#from detectron2.layers import interpolate\n\nlogger = logging.getLogger(__name__)\n\nINF = 100000000\n\n\"\"\"\nShape shorthand in this module:\n\n    N: number of images in the minibatch\n    L: number of feature maps per image on which RPN is run\n    Hi, Wi: height and width of the i-th feature map\n    4: size of the box parameterization\n\nNaming convention:\n\n    labels: refers to the ground-truth class of an position.\n\n    reg_targets: refers to the 4-d (left, top, right, bottom) distances that parameterize the ground-truth box.\n\n    logits_pred: predicted classification scores in [-inf, +inf];\n    \n    reg_pred: the predicted (left, top, right, bottom), corresponding to reg_targets \n\n    ctrness_pred: predicted centerness scores\n    \n\"\"\"\n\ndef aligned_bilinear(tensor, factor):\n    assert tensor.dim() == 4\n    assert factor >= 1\n    assert int(factor) == factor\n\n    if factor == 1:\n        return tensor\n\n    h, w = tensor.size()[2:]\n    tensor = F.pad(tensor, pad=(0, 1, 0, 1), mode=\"replicate\")\n    oh = factor * h + 1\n    ow = factor * w + 1\n    tensor = F.interpolate(\n        tensor, size=(oh, ow),\n        mode='bilinear',\n        align_corners=True\n    )\n    tensor = F.pad(\n        tensor, pad=(factor // 2, 0, factor // 2, 0),\n        mode=\"replicate\"\n    )\n\n    return tensor[:, :, :oh - 1, :ow - 1]\n        \n\ndef compute_ctrness_targets(reg_targets):\n    if len(reg_targets) == 0:\n        return reg_targets.new_zeros(len(reg_targets))\n    left_right = reg_targets[:, [0, 2]]\n    top_bottom = reg_targets[:, [1, 3]]\n    ctrness = (left_right.min(dim=-1)[0] / left_right.max(dim=-1)[0]) 
* \\\n                 (top_bottom.min(dim=-1)[0] / top_bottom.max(dim=-1)[0])\n    return torch.sqrt(ctrness)\n\n\nclass FCOSOutputs(object):\n    def __init__(\n            self,\n            images,\n            locations,\n            logits_pred,\n            reg_pred,\n            ctrness_pred,\n            focal_loss_alpha,\n            focal_loss_gamma,\n            iou_loss,\n            center_sample,\n            sizes_of_interest,\n            strides,\n            radius,\n            num_classes,\n            pre_nms_thresh,\n            pre_nms_top_n,\n            nms_thresh,\n            fpn_post_nms_top_n,\n            thresh_with_ctr,\n            controllers, \n            masks,\n            gt_instances=None,\n    ):\n        self.logits_pred = logits_pred\n        self.reg_pred = reg_pred\n        self.ctrness_pred = ctrness_pred\n        self.locations = locations\n\n        self.gt_instances = gt_instances\n        self.num_feature_maps = len(logits_pred)\n        self.num_images = len(images)\n        self.image_sizes = images.image_sizes\n        self.focal_loss_alpha = focal_loss_alpha\n        self.focal_loss_gamma = focal_loss_gamma\n        self.iou_loss = iou_loss\n        self.center_sample = center_sample\n        self.sizes_of_interest = sizes_of_interest\n        self.strides = strides\n        self.radius = radius\n        self.num_classes = num_classes\n        self.pre_nms_thresh = pre_nms_thresh\n        self.pre_nms_top_n = pre_nms_top_n\n        self.nms_thresh = nms_thresh\n        self.fpn_post_nms_top_n = fpn_post_nms_top_n\n        self.thresh_with_ctr = thresh_with_ctr\n        self.controllers = controllers\n        self.masks = masks\n\n    def _transpose(self, training_targets, num_loc_list):\n        '''\n        This function is used to transpose image first training targets to level first ones\n        :return: level first training targets\n        '''\n        for im_i in range(len(training_targets)):\n           
 training_targets[im_i] = torch.split(\n                training_targets[im_i], num_loc_list, dim=0\n            )\n\n        targets_level_first = []\n        for targets_per_level in zip(*training_targets):\n            targets_level_first.append(\n                torch.cat(targets_per_level, dim=0)\n            )\n        return targets_level_first\n\n    def _get_ground_truth(self):\n        num_loc_list = [len(loc) for loc in self.locations]\n        self.num_loc_list = num_loc_list\n\n        # compute locations to size ranges\n        loc_to_size_range = []\n        for l, loc_per_level in enumerate(self.locations):\n            loc_to_size_range_per_level = loc_per_level.new_tensor(self.sizes_of_interest[l])\n            loc_to_size_range.append(\n                loc_to_size_range_per_level[None].expand(num_loc_list[l], -1)\n            )\n\n        loc_to_size_range = torch.cat(loc_to_size_range, dim=0)\n        locations = torch.cat(self.locations, dim=0)\n\n        training_targets = self.compute_targets_for_locations(\n            locations, self.gt_instances, loc_to_size_range\n        )\n\n        # transpose im first training_targets to level first ones\n        training_targets = {\n            k: self._transpose(v, num_loc_list) for k, v in training_targets.items()\n        }\n\n        # we normalize reg_targets by FPN's strides here\n        reg_targets = training_targets[\"reg_targets\"]\n        for l in range(len(reg_targets)):\n            reg_targets[l] = reg_targets[l] / float(self.strides[l])\n\n        return training_targets\n\n    def get_sample_region(self, gt, strides, num_loc_list, loc_xs, loc_ys, radius=1):\n        num_gts = gt.shape[0]\n        K = len(loc_xs)\n        gt = gt[None].expand(K, num_gts, 4)\n        center_x = (gt[..., 0] + gt[..., 2]) / 2\n        center_y = (gt[..., 1] + gt[..., 3]) / 2\n        center_gt = gt.new_zeros(gt.shape)\n        # no gt\n        if center_x.numel() == 0 or center_x[..., 0].sum() == 0:\n   
         return loc_xs.new_zeros(loc_xs.shape, dtype=torch.uint8)\n        beg = 0\n        for level, num_loc in enumerate(num_loc_list):\n            end = beg + num_loc\n            stride = strides[level] * radius\n            xmin = center_x[beg:end] - stride\n            ymin = center_y[beg:end] - stride\n            xmax = center_x[beg:end] + stride\n            ymax = center_y[beg:end] + stride\n            # limit sample region in gt\n            center_gt[beg:end, :, 0] = torch.where(xmin > gt[beg:end, :, 0], xmin, gt[beg:end, :, 0])\n            center_gt[beg:end, :, 1] = torch.where(ymin > gt[beg:end, :, 1], ymin, gt[beg:end, :, 1])\n            center_gt[beg:end, :, 2] = torch.where(xmax > gt[beg:end, :, 2], gt[beg:end, :, 2], xmax)\n            center_gt[beg:end, :, 3] = torch.where(ymax > gt[beg:end, :, 3], gt[beg:end, :, 3], ymax)\n            beg = end\n        left = loc_xs[:, None] - center_gt[..., 0]\n        right = center_gt[..., 2] - loc_xs[:, None]\n        top = loc_ys[:, None] - center_gt[..., 1]\n        bottom = center_gt[..., 3] - loc_ys[:, None]\n        center_bbox = torch.stack((left, top, right, bottom), -1)\n        inside_gt_bbox_mask = center_bbox.min(-1)[0] > 0\n        return inside_gt_bbox_mask\n\n    def compute_targets_for_locations(self, locations, targets, size_ranges):\n        labels = []\n        reg_targets = []\n        matched_idxes = []\n        im_idxes = []\n        xs, ys = locations[:, 0], locations[:, 1]\n\n        for im_i in range(len(targets)):\n            targets_per_im = targets[im_i]\n            bboxes = targets_per_im.gt_boxes.tensor\n            labels_per_im = targets_per_im.gt_classes\n\n            # no gt\n            if bboxes.numel() == 0:\n                labels.append(labels_per_im.new_zeros(locations.size(0)) + self.num_classes)\n                reg_targets.append(locations.new_zeros((locations.size(0), 4)))\n                continue\n\n            area = targets_per_im.gt_boxes.area()\n\n    
        l = xs[:, None] - bboxes[:, 0][None]\n            t = ys[:, None] - bboxes[:, 1][None]\n            r = bboxes[:, 2][None] - xs[:, None]\n            b = bboxes[:, 3][None] - ys[:, None]\n            reg_targets_per_im = torch.stack([l, t, r, b], dim=2)\n\n            if self.center_sample:\n                is_in_boxes = self.get_sample_region(\n                    bboxes, self.strides, self.num_loc_list,\n                    xs, ys, radius=self.radius\n                )\n            else:\n                is_in_boxes = reg_targets_per_im.min(dim=2)[0] > 0\n\n            max_reg_targets_per_im = reg_targets_per_im.max(dim=2)[0]\n            # limit the regression range for each location\n            is_cared_in_the_level = \\\n                (max_reg_targets_per_im >= size_ranges[:, [0]]) & \\\n                (max_reg_targets_per_im <= size_ranges[:, [1]])\n\n            locations_to_gt_area = area[None].repeat(len(locations), 1)\n            locations_to_gt_area[is_in_boxes == 0] = INF\n            locations_to_gt_area[is_cared_in_the_level == 0] = INF\n\n            # if there are still more than one objects for a location,\n            # we choose the one with minimal area\n            locations_to_min_area, locations_to_gt_inds = locations_to_gt_area.min(dim=1)\n\n            reg_targets_per_im = reg_targets_per_im[range(len(locations)), locations_to_gt_inds]\n\n            labels_per_im = labels_per_im[locations_to_gt_inds]\n            labels_per_im[locations_to_min_area == INF] = self.num_classes\n\n            labels.append(labels_per_im)\n            reg_targets.append(reg_targets_per_im)\n            matched_idxes.append(locations_to_gt_inds)\n            im_idxes.append(torch.tensor([im_i]*len(labels_per_im)).to(locations_to_gt_inds.device))\n        return {\"labels\": labels, \"reg_targets\": reg_targets, \"matched_idxes\": matched_idxes, \"im_idxes\": im_idxes}\n\n    def losses(self):\n        \"\"\"\n        Return the losses from a set of 
FCOS predictions and their associated ground-truth.\n\n        Returns:\n            dict[loss name -> loss value]: A dict mapping from loss name to loss value.\n        \"\"\"\n\n        training_targets = self._get_ground_truth()\n        labels, reg_targets, matched_idxes, im_idxes = training_targets[\"labels\"], training_targets[\"reg_targets\"], training_targets[\"matched_idxes\"], training_targets[\"im_idxes\"]\n\n        # Collect all logits and regression predictions over feature maps\n        # and images to arrive at the same shape as the labels and targets\n        # The final ordering is L, N, H, W from slowest to fastest axis.\n        logits_pred = cat(\n            [\n                # Reshape: (N, C, Hi, Wi) -> (N, Hi, Wi, C) -> (N*Hi*Wi, C)\n                x.permute(0, 2, 3, 1).reshape(-1, self.num_classes)\n                for x in self.logits_pred\n            ], dim=0,)\n        reg_pred = cat(\n            [\n                # Reshape: (N, B, Hi, Wi) -> (N, Hi, Wi, B) -> (N*Hi*Wi, B)\n                x.permute(0, 2, 3, 1).reshape(-1, 4)\n                for x in self.reg_pred\n            ], dim=0,)\n        ctrness_pred = cat(\n            [\n                # Reshape: (N, 1, Hi, Wi) -> (N*Hi*Wi,)\n                x.reshape(-1) for x in self.ctrness_pred\n            ], dim=0,)\n\n        labels = cat(\n            [\n                # Reshape: (N, 1, Hi, Wi) -> (N*Hi*Wi,)\n                x.reshape(-1) for x in labels\n            ], dim=0,)\n\n        reg_targets = cat(\n            [\n                # Reshape: (N, Hi, Wi, 4) -> (N*Hi*Wi, 4)\n                x.reshape(-1, 4) for x in reg_targets\n            ], dim=0,)\n        \n        matched_idxes = cat(\n            [\n                x.reshape(-1) for x in matched_idxes\n            ], dim=0,)\n\n        im_idxes = cat(\n            [\n                x.reshape(-1) for x in im_idxes\n            ], dim=0,)\n\n        controllers_pred = cat(\n            [\n                
x.permute(0, 2, 3, 1).reshape(-1, 169) for x in self.controllers\n            ], dim=0,)\n\n        return self.fcos_losses(\n            labels,\n            reg_targets,\n            logits_pred,\n            reg_pred,\n            ctrness_pred,\n            controllers_pred,\n            self.focal_loss_alpha,\n            self.focal_loss_gamma,\n            self.iou_loss,\n            matched_idxes,\n            im_idxes\n        )\n\n    def predict_proposals(self):\n        sampled_boxes = []\n\n        bundle = (\n            self.locations, self.logits_pred,\n            self.reg_pred, self.ctrness_pred,\n            self.strides\n        )\n\n        for i, (l, o, r, c, s) in enumerate(zip(*bundle)):\n            # recall that during training, we normalize regression targets with FPN's stride.\n            # we denormalize them here.\n            r = r * s\n            controller = self.controllers[i]\n            sampled_boxes.append(\n                self.forward_for_single_feature_map(\n                    l, o, r, c, controller, self.image_sizes\n                )\n            )\n\n        boxlists = list(zip(*sampled_boxes))\n        boxlists = [Instances.cat(boxlist) for boxlist in boxlists]\n        boxlists = self.select_over_all_levels(boxlists)\n\n        # for CondInst\n        boxlists = self.forward_for_mask(boxlists)\n\n        return boxlists\n\n    def forward_for_mask(self, boxlists):\n        N, dim, h, w = self.masks.shape\n        # build the normalized coordinate maps on the same device as the mask features\n        # instead of hard-coding .cuda(), so that CPU inference also works\n        device = self.masks.device\n        grid_x = torch.arange(w, device=device).view(1,-1).float().repeat(h,1) / (w-1) * 2 - 1\n        grid_y = torch.arange(h, device=device).view(-1,1).float().repeat(1,w) / (h-1) * 2 - 1\n        x_map = grid_x.view(1, 1, h, w).repeat(N, 1, 1, 1)\n        y_map = grid_y.view(1, 1, h, w).repeat(N, 1, 1, 1)\n        masks_feat = torch.cat((self.masks, x_map, y_map), dim=1)\n        o_h = int(h * self.strides[0])\n        o_w = int(w * self.strides[0])\n        for im in range(N):\n            boxlist = boxlists[im]\n            
input_h, input_w = boxlist.image_size\n            mask = masks_feat[None, im]\n            ins_num = boxlist.controllers.shape[0]\n            weights1 = boxlist.controllers[:,:80].reshape(-1,8,10).reshape(-1,10).unsqueeze(-1).unsqueeze(-1)\n            bias1 = boxlist.controllers[:, 80:88].flatten()\n            weights2 = boxlist.controllers[:, 88:152].reshape(-1,8,8).reshape(-1,8).unsqueeze(-1).unsqueeze(-1)\n            bias2 = boxlist.controllers[:, 152:160].flatten()\n            weights3 = boxlist.controllers[:, 160:168].unsqueeze(-1).unsqueeze(-1)\n            bias3 = boxlist.controllers[:,168:169].flatten()\n            \n            conv1 = F.conv2d(mask,weights1,bias1).relu()\n            conv2 = F.conv2d(conv1, weights2, bias2, groups = ins_num).relu()\n            masks_per_image = F.conv2d(conv2, weights3, bias3, groups = ins_num)\n            #masks = interpolate(masks_per_image, size = (o_h,o_w), mode=\"bilinear\", align_corners=False).sigmoid()\n            masks = aligned_bilinear(masks_per_image, self.strides[0]).sigmoid()\n            masks = masks[:, :, :input_h, :input_w].permute(1,0,2,3)\n            boxlist.pred_masks = masks\n        return boxlists\n\n    def forward_for_single_feature_map(\n            self, locations, box_cls,\n            reg_pred, ctrness, controller, image_sizes\n    ):\n        N, C, H, W = box_cls.shape\n\n        # put in the same format as locations\n        box_cls = box_cls.view(N, C, H, W).permute(0, 2, 3, 1)\n        box_cls = box_cls.reshape(N, -1, C).sigmoid()\n        box_regression = reg_pred.view(N, 4, H, W).permute(0, 2, 3, 1)\n        box_regression = box_regression.reshape(N, -1, 4)\n        ctrness = ctrness.view(N, 1, H, W).permute(0, 2, 3, 1)\n        ctrness = ctrness.reshape(N, -1).sigmoid()\n        controller = controller.view(N, 169, H, W).permute(0, 2, 3, 1)\n        controller = controller.reshape(N, -1, 169)\n        # if self.thresh_with_ctr is True, we multiply the classification\n        
# scores with centerness scores before applying the threshold.\n        if self.thresh_with_ctr:\n            box_cls = box_cls * ctrness[:, :, None]\n        candidate_inds = box_cls > self.pre_nms_thresh\n        pre_nms_top_n = candidate_inds.view(N, -1).sum(1)\n        pre_nms_top_n = pre_nms_top_n.clamp(max=self.pre_nms_top_n)\n\n        if not self.thresh_with_ctr:\n            box_cls = box_cls * ctrness[:, :, None]\n\n        results = []\n        for i in range(N):\n            per_box_cls = box_cls[i]\n            per_candidate_inds = candidate_inds[i]\n            per_box_cls = per_box_cls[per_candidate_inds]\n\n            per_candidate_nonzeros = per_candidate_inds.nonzero()\n            per_box_loc = per_candidate_nonzeros[:, 0]\n            per_class = per_candidate_nonzeros[:, 1]\n\n            per_box_regression = box_regression[i]\n            per_box_regression = per_box_regression[per_box_loc]\n            per_locations = locations[per_box_loc]\n\n            per_controller = controller[i]\n            per_controller = per_controller[per_box_loc]\n            per_pre_nms_top_n = pre_nms_top_n[i]\n\n            if per_candidate_inds.sum().item() > per_pre_nms_top_n.item():\n                per_box_cls, top_k_indices = \\\n                    per_box_cls.topk(per_pre_nms_top_n, sorted=False)\n                per_class = per_class[top_k_indices]\n                per_box_regression = per_box_regression[top_k_indices]\n                per_locations = per_locations[top_k_indices]\n                per_controller = per_controller[top_k_indices]\n\n            detections = torch.stack([\n                per_locations[:, 0] - per_box_regression[:, 0],\n                per_locations[:, 1] - per_box_regression[:, 1],\n                per_locations[:, 0] + per_box_regression[:, 2],\n                per_locations[:, 1] + per_box_regression[:, 3],\n            ], dim=1)\n\n            boxlist = Instances(image_sizes[i])\n            boxlist.pred_boxes = 
Boxes(detections)\n            boxlist.scores = torch.sqrt(per_box_cls)\n            boxlist.pred_classes = per_class\n            boxlist.locations = per_locations\n            boxlist.controllers = per_controller\n\n            results.append(boxlist)\n\n        return results\n\n    def select_over_all_levels(self, boxlists):\n        num_images = len(boxlists)\n        results = []\n        for i in range(num_images):\n            # multiclass nms\n            result = ml_nms(boxlists[i], self.nms_thresh)\n            number_of_detections = len(result)\n\n            # Limit to max_per_image detections **over all classes**\n            if number_of_detections > self.fpn_post_nms_top_n > 0:\n                cls_scores = result.scores\n                image_thresh, _ = torch.kthvalue(\n                    cls_scores.cpu(),\n                    number_of_detections - self.fpn_post_nms_top_n + 1\n                )\n                keep = cls_scores >= image_thresh.item()\n                keep = torch.nonzero(keep).squeeze(1)\n                result = result[keep]\n            results.append(result)\n        return results\n\n    def prepare_masks(self, m_h, m_w, r_h, r_w, targets_masks):\n        masks = []\n        for im_i in range(len(targets_masks)):\n            mask_t = targets_masks[im_i]\n            if len(mask_t) == 0:\n                masks.append(mask_t.new_tensor([]))\n                continue\n            n, h, w = mask_t.shape\n            mask = mask_t.new_zeros((n, r_h, r_w))\n            mask[:, :h, :w] = mask_t\n            #resized_mask = aligned_bilinear(mask.float().unsqueeze(0), m_h/r_h)[0].gt(0)\n            #resized_mask = interpolate(\n            #    input=mask.float().unsqueeze(0), size=(m_h, m_w), mode=\"bilinear\", align_corners=False,\n            #    )[0].gt(0)\n            #masks.append(resized_mask)\n            masks.append(mask)\n        return masks\n\n    def dice_loss(self,input, target):\n        smooth = 1.\n        iflat 
= input.contiguous().view(-1)\n        tflat = target.contiguous().view(-1)\n        intersection = (iflat * tflat).sum()\n        return 1 - ((2. * intersection + smooth) /((iflat*iflat).sum() + (tflat*tflat).sum() + smooth))\n\n\n\n    def fcos_losses(\n        self,\n        labels,\n        reg_targets,\n        logits_pred,\n        reg_pred,\n        ctrness_pred,\n        controllers_pred,\n        focal_loss_alpha,\n        focal_loss_gamma,\n        iou_loss,\n        matched_idxes,\n        im_idxes\n    ):\n        num_classes = logits_pred.size(1)\n        labels = labels.flatten()\n\n        pos_inds = torch.nonzero(labels != num_classes).squeeze(1)\n        num_pos_local = pos_inds.numel()\n        num_gpus = get_world_size()\n        total_num_pos = reduce_sum(pos_inds.new_tensor([num_pos_local])).item()\n        num_pos_avg = max(total_num_pos / num_gpus, 1.0)\n\n        # prepare one_hot\n        class_target = torch.zeros_like(logits_pred)\n        class_target[pos_inds, labels[pos_inds]] = 1\n\n        class_loss = sigmoid_focal_loss_jit(\n            logits_pred,\n            class_target,\n            alpha=focal_loss_alpha,\n            gamma=focal_loss_gamma,\n            reduction=\"sum\",\n        ) / num_pos_avg\n\n        reg_pred = reg_pred[pos_inds]\n        reg_targets = reg_targets[pos_inds]\n        ctrness_pred = ctrness_pred[pos_inds]\n        controllers_pred = controllers_pred[pos_inds]\n        matched_idxes = matched_idxes[pos_inds]\n        im_idxes = im_idxes[pos_inds]\n\n        ctrness_targets = compute_ctrness_targets(reg_targets)\n        ctrness_targets_sum = ctrness_targets.sum()\n        ctrness_norm = max(reduce_sum(ctrness_targets_sum).item() / num_gpus, 1e-6)\n\n        reg_loss = iou_loss(\n            reg_pred,\n            reg_targets,\n            ctrness_targets\n        ) / ctrness_norm\n\n        ctrness_loss = F.binary_cross_entropy_with_logits(\n             ctrness_pred,\n             ctrness_targets,\n    
         reduction=\"sum\"\n         ) / num_pos_avg\n\n        # for CondInst\n        N, C, h, w = self.masks.shape \n        grid_x = torch.arange(w).view(1,-1).float().repeat(h,1).cuda() / (w-1) * 2 - 1\n        grid_y = torch.arange(h).view(-1,1).float().repeat(1,w).cuda() / (h-1) * 2 - 1\n        x_map = grid_x.view(1, 1, h, w).repeat(N, 1, 1, 1)\n        y_map = grid_y.view(1, 1, h, w).repeat(N, 1, 1, 1)\n        masks_feat = torch.cat((self.masks, x_map, y_map), dim=1)\n        r_h = int(h * self.strides[0])\n        r_w = int(w * self.strides[0])\n        targets_masks = [target_im.gt_masks.tensor for target_im in self.gt_instances]\n        masks_t = self.prepare_masks(h, w, r_h, r_w, targets_masks)\n        mask_loss = masks_feat[0].new_tensor(0.0)\n        batch_ins = im_idxes.shape[0] \n        # for each image\n        for i in range(N):\n            inds = (im_idxes==i).nonzero().flatten()\n            ins_num = inds.shape[0]\n            if ins_num > 0:\n                controllers = controllers_pred[inds]\n                mask_feat = masks_feat[None, i]\n                weights1 = controllers[:, :80].reshape(-1,8,10).reshape(-1,10).unsqueeze(-1).unsqueeze(-1)\n                bias1 = controllers[:, 80:88].flatten()            \n                weights2 = controllers[:, 88:152].reshape(-1,8,8).reshape(-1,8).unsqueeze(-1).unsqueeze(-1)\n                bias2 = controllers[:, 152:160].flatten()\n                weights3 = controllers[:, 160:168].unsqueeze(-1).unsqueeze(-1)\n                bias3 = controllers[:,168:169].flatten()\n                conv1 = F.conv2d(mask_feat,weights1,bias1).relu()\n                conv2 = F.conv2d(conv1, weights2, bias2, groups = ins_num).relu()\n                #masks_per_image = F.conv2d(conv2, weights3, bias3, groups = ins_num)[0].sigmoid()\n                masks_per_image = F.conv2d(conv2, weights3, bias3, groups = ins_num) \n                masks_per_image = aligned_bilinear(masks_per_image, 
self.strides[0])[0].sigmoid()         \n                for j in range(ins_num):\n                    ind = inds[j]\n                    mask_gt = masks_t[i][matched_idxes[ind]].float()\n                    mask_pred = masks_per_image[j]\n                    mask_loss += self.dice_loss(mask_pred, mask_gt)\n            \n        if batch_ins > 0:\n            mask_loss = mask_loss / batch_ins\n              \n\n        losses = {\n            \"loss_fcos_cls\": class_loss,\n            \"loss_fcos_loc\": reg_loss,\n            \"loss_fcos_ctr\": ctrness_loss,\n            \"loss_mask\": mask_loss\n        }\n        return losses, {}\n\n"
  },
  {
    "path": "fcos/modeling/one_stage_detector.py",
    "content": "from detectron2.modeling.meta_arch.build import META_ARCH_REGISTRY\nfrom detectron2.modeling import ProposalNetwork\n\n\n@META_ARCH_REGISTRY.register()\nclass OneStageDetector(ProposalNetwork):\n    \"\"\"\n    Same as :class:`detectron2.modeling.ProposalNetwork`.\n    Uses \"instances\" as the return key instead of using \"proposal\".\n    \"\"\"\n    def forward(self, batched_inputs):\n        if self.training:\n            return super().forward(batched_inputs)\n        processed_results = super().forward(batched_inputs)\n        processed_results = [{\"instances\": r[\"proposals\"]} for r in processed_results]\n        return processed_results\n"
  },
  {
    "path": "fcos/modeling/poolers.py",
    "content": "import sys\nimport torch\nfrom detectron2.layers import cat\n\nfrom detectron2.modeling.poolers import (\n    ROIPooler, convert_boxes_to_pooler_format, assign_boxes_to_levels\n)\n\n__all__ = [\"TopPooler\"]\n\n\ndef _box_max_size(boxes):\n    box = boxes.tensor\n    max_size = torch.max(box[:, 2] - box[:, 0], box[:, 3] - box[:, 1])\n    return max_size\n\n\ndef assign_boxes_to_levels_by_length(\n        box_lists, min_level, max_level, canonical_box_size, canonical_level):\n    \"\"\"\n    Map each box in `box_lists` to a feature map level index and return the assignment\n    vector.\n\n    Args:\n        box_lists (list[detectron2.structures.Boxes]): A list of N Boxes or N RotatedBoxes,\n            where N is the number of images in the batch.\n        min_level (int): Smallest feature map level index. The input is considered index 0,\n            the output of stage 1 is index 1, and so.\n        max_level (int): Largest feature map level index.\n        canonical_box_size (int): A canonical box size in pixels (sqrt(box area)).\n        canonical_level (int): The feature map level index on which a canonically-sized box\n            should be placed.\n\n    Returns:\n        A tensor of length M, where M is the total number of boxes aggregated over all\n            N batch images. The memory layout corresponds to the concatenation of boxes\n            from all images. 
Each element is the feature map index, as an offset from\n            `self.min_level`, for the corresponding box (so value i means the box is at\n            `self.min_level + i`).\n    \"\"\"\n    eps = sys.float_info.epsilon\n    box_sizes = cat([_box_max_size(boxes) for boxes in box_lists])\n    # Eqn.(1) in FPN paper\n    level_assignments = torch.floor(\n        canonical_level + torch.log2(box_sizes / canonical_box_size + eps)\n    )\n    level_assignments = torch.clamp(level_assignments, min=min_level, max=max_level)\n    return level_assignments.to(torch.int64) - min_level\n\n\nclass TopPooler(ROIPooler):\n    \"\"\"\n    ROIPooler with option to assign level by max length. Used by top modules.\n    \"\"\"\n    def __init__(self,\n                 output_size,\n                 scales,\n                 sampling_ratio,\n                 pooler_type,\n                 canonical_box_size=224,\n                 canonical_level=4,\n                 assign_crit=\"area\",):\n        super().__init__(output_size, scales, sampling_ratio, pooler_type,\n                         canonical_box_size=canonical_box_size,\n                         canonical_level=canonical_level)\n        self.assign_crit = assign_crit\n\n    def forward(self, x, box_lists):\n        \"\"\"\n        Args:\n            x (list[Tensor]): A list of feature maps of NCHW shape, with scales matching those\n                used to construct this module.\n            box_lists (list[Boxes] | list[RotatedBoxes]):\n                A list of N Boxes or N RotatedBoxes, where N is the number of images in the batch.\n                The box coordinates are defined on the original image and\n                will be scaled by the `scales` argument of :class:`ROIPooler`.\n\n        Returns:\n            Tensor:\n                A tensor of shape (M, C, output_size, output_size) where M is the total number of\n                boxes aggregated over all N batch images and C is the number of channels in 
`x`.\n        \"\"\"\n        num_level_assignments = len(self.level_poolers)\n\n        assert isinstance(x, list) and isinstance(\n            box_lists, list\n        ), \"Arguments to pooler must be lists\"\n        assert (\n            len(x) == num_level_assignments\n        ), \"unequal value, num_level_assignments={}, but x is list of {} Tensors\".format(\n            num_level_assignments, len(x)\n        )\n\n        assert len(box_lists) == x[0].size(\n            0\n        ), \"unequal value, x[0] batch dim 0 is {}, but box_list has length {}\".format(\n            x[0].size(0), len(box_lists)\n        )\n\n        pooler_fmt_boxes = convert_boxes_to_pooler_format(box_lists)\n\n        if num_level_assignments == 1:\n            return self.level_poolers[0](x[0], pooler_fmt_boxes)\n\n        if self.assign_crit == \"length\":\n            assign_method = assign_boxes_to_levels_by_length\n        else:\n            assign_method = assign_boxes_to_levels\n\n        level_assignments = assign_method(\n            box_lists, self.min_level, self.max_level,\n            self.canonical_box_size, self.canonical_level)\n\n        num_boxes = len(pooler_fmt_boxes)\n        num_channels = x[0].shape[1]\n        output_size = self.output_size[0]\n\n        dtype, device = x[0].dtype, x[0].device\n        output = torch.zeros(\n            (num_boxes, num_channels, output_size, output_size), dtype=dtype, device=device\n        )\n\n        for level, (x_level, pooler) in enumerate(zip(x, self.level_poolers)):\n            inds = torch.nonzero(level_assignments == level).squeeze(1)\n            pooler_fmt_boxes_level = pooler_fmt_boxes[inds]\n            output[inds] = pooler(x_level, pooler_fmt_boxes_level)\n\n        return output\n"
  },
  {
    "path": "fcos/utils/comm.py",
    "content": "import torch.distributed as dist\nfrom detectron2.utils.comm import get_world_size\n\n\ndef reduce_sum(tensor):\n    world_size = get_world_size()\n    if world_size < 2:\n        return tensor\n    tensor = tensor.clone()\n    dist.all_reduce(tensor, op=dist.ReduceOp.SUM)\n    return tensor\n"
  },
  {
    "path": "fcos/utils/measures.py",
    "content": "# coding: utf-8\n# Adapted from https://github.com/ShichenLiu/CondenseNet/blob/master/utils.py\nfrom __future__ import absolute_import\nfrom __future__ import unicode_literals\nfrom __future__ import print_function\nfrom __future__ import division\n\nimport operator\n\nfrom functools import reduce\n\n\ndef get_num_gen(gen):\n    return sum(1 for x in gen)\n\n\ndef is_pruned(layer):\n    try:\n        layer.mask\n        return True\n    except AttributeError:\n        return False\n\n\ndef is_leaf(model):\n    return get_num_gen(model.children()) == 0\n\n\ndef get_layer_info(layer):\n    layer_str = str(layer)\n    type_name = layer_str[:layer_str.find('(')].strip()\n    return type_name\n\n\ndef get_layer_param(model):\n    return sum([reduce(operator.mul, i.size(), 1) for i in model.parameters()])\n\n\n### The input batch size should be 1 to call this function\ndef measure_layer(layer, *args):\n    global count_ops, count_params\n\n    for x in args:\n        delta_ops = 0\n        delta_params = 0\n        multi_add = 1\n        type_name = get_layer_info(layer)\n\n        ### ops_conv\n        if type_name in ['Conv2d']:\n            out_h = int((x.size()[2] + 2 * layer.padding[0] / layer.dilation[0] - layer.kernel_size[0]) /\n                        layer.stride[0] + 1)\n            out_w = int((x.size()[3] + 2 * layer.padding[1] / layer.dilation[1] - layer.kernel_size[1]) /\n                        layer.stride[1] + 1)\n            delta_ops = layer.in_channels * layer.out_channels * layer.kernel_size[0] * layer.kernel_size[1] * out_h * out_w / layer.groups * multi_add\n            delta_params = get_layer_param(layer)\n\n        elif type_name in ['ConvTranspose2d']:\n            _, _, in_h, in_w = x.size()\n            out_h = int((in_h-1)*layer.stride[0] - 2 * layer.padding[0] + layer.kernel_size[0] + layer.output_padding[0])\n            out_w = int((in_w-1)*layer.stride[1] - 2 * layer.padding[1] + layer.kernel_size[1] + 
layer.output_padding[1])\n            delta_ops = layer.in_channels * layer.out_channels * layer.kernel_size[0] *  \\\n                        layer.kernel_size[1] * out_h * out_w / layer.groups * multi_add\n            delta_params = get_layer_param(layer)\n\n        ### ops_learned_conv\n        elif type_name in ['LearnedGroupConv']:\n            measure_layer(layer.relu, x)\n            measure_layer(layer.norm, x)\n            conv = layer.conv\n            out_h = int((x.size()[2] + 2 * conv.padding[0] - conv.kernel_size[0]) /\n                        conv.stride[0] + 1)\n            out_w = int((x.size()[3] + 2 * conv.padding[1] - conv.kernel_size[1]) /\n                        conv.stride[1] + 1)\n            delta_ops = conv.in_channels * conv.out_channels * conv.kernel_size[0] * conv.kernel_size[1] * out_h * out_w / layer.condense_factor * multi_add\n            delta_params = get_layer_param(conv) / layer.condense_factor\n\n        ### ops_nonlinearity\n        elif type_name in ['ReLU', 'ReLU6']:\n            delta_ops = x.numel()\n            delta_params = get_layer_param(layer)\n\n        ### ops_pooling\n        elif type_name in ['AvgPool2d', 'MaxPool2d']:\n            in_w = x.size()[2]\n            kernel_ops = layer.kernel_size * layer.kernel_size\n            out_w = int((in_w + 2 * layer.padding - layer.kernel_size) / layer.stride + 1)\n            out_h = int((in_w + 2 * layer.padding - layer.kernel_size) / layer.stride + 1)\n            delta_ops = x.size()[0] * x.size()[1] * out_w * out_h * kernel_ops\n            delta_params = get_layer_param(layer)\n\n        elif type_name in ['LastLevelMaxPool']:\n            pass\n\n        elif type_name in ['AdaptiveAvgPool2d']:\n            delta_ops = x.size()[0] * x.size()[1] * x.size()[2] * x.size()[3]\n            delta_params = get_layer_param(layer)\n\n        elif type_name in ['ZeroPad2d', 'RetinaNetPostProcessor']:\n            pass\n            #delta_ops = x.size()[0] * x.size()[1] * 
x.size()[2] * x.size()[3]\n            #delta_params = get_layer_param(layer)\n\n        ### ops_linear\n        elif type_name in ['Linear']:\n            weight_ops = layer.weight.numel() * multi_add\n            bias_ops = layer.bias.numel()\n            delta_ops = x.size()[0] * (weight_ops + bias_ops)\n            delta_params = get_layer_param(layer)\n\n        ### ops_nothing\n        elif type_name in ['BatchNorm2d', 'Dropout2d', 'DropChannel', 'Dropout', 'FrozenBatchNorm2d', 'GroupNorm']:\n            delta_params = get_layer_param(layer)\n\n        elif type_name in ['SumTwo']:\n            delta_ops = x.numel()\n\n        elif type_name in ['AggregateCell']:\n            if not layer.pre_transform:\n                delta_ops = 2 * x.numel() # twice for each input\n            else:\n                measure_layer(layer.branch_1, x)\n                measure_layer(layer.branch_2, x)\n                delta_params = get_layer_param(layer)\n\n        elif type_name in ['Identity', 'Zero']:\n            pass\n\n        elif type_name in ['Scale']:\n            delta_params = get_layer_param(layer)\n            delta_ops = x.numel()\n\n        elif type_name in ['FCOSPostProcessor', 'RPNPostProcessor', 'KeypointPostProcessor',\n                           'ROIAlign', 'PostProcessor', 'KeypointRCNNPredictor', \n                           'NaiveSyncBatchNorm', 'Upsample', 'Sequential']:\n            pass\n\n        elif type_name in ['DeformConv']:\n            # don't count bilinear\n            offset_conv = list(layer.parameters())[0]\n            delta_ops = reduce(operator.mul, offset_conv.size(), x.size()[2] * x.size()[3])\n            out_h = int((x.size()[2] + 2 * layer.padding[0] / layer.dilation[0]\n                         - layer.kernel_size[0]) / layer.stride[0] + 1)\n            out_w = int((x.size()[3] + 2 * layer.padding[1] / layer.dilation[1]\n                         - layer.kernel_size[1]) / layer.stride[1] + 1)\n            delta_ops += 
layer.in_channels * layer.out_channels * layer.kernel_size[0] * layer.kernel_size[1] * out_h * out_w / layer.groups * multi_add\n            delta_params = get_layer_param(layer)\n\n        ### unknown layer type\n        else:\n            raise TypeError('unknown layer type: %s' % type_name)\n\n        count_ops += delta_ops\n        count_params += delta_params\n    return\n\n\ndef measure_model(model, x):\n    global count_ops, count_params\n    count_ops = 0\n    count_params = 0\n\n    def should_measure(x):\n        return is_leaf(x) or is_pruned(x)\n\n    def modify_forward(model):\n        for child in model.children():\n            if should_measure(child):\n                def new_forward(m):\n                    def lambda_forward(*args):\n                        measure_layer(m, *args)\n                        return m.old_forward(*args)\n                    return lambda_forward\n                child.old_forward = child.forward\n                child.forward = new_forward(child)\n            else:\n                modify_forward(child)\n\n    def restore_forward(model):\n        for child in model.children():\n            # leaf node\n            if is_leaf(child) and hasattr(child, 'old_forward'):\n                child.forward = child.old_forward\n                child.old_forward = None\n            else:\n                restore_forward(child)\n\n    modify_forward(model)\n    out = model.forward(x)\n    restore_forward(model)\n\n    return out, count_ops, count_params\n"
  },
  {
    "path": "postprocessing.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nfrom torch.nn import functional as F\n\nfrom detectron2.layers import paste_masks_in_image\nfrom detectron2.structures import Instances\n\n\ndef detector_postprocess(results, output_height, output_width, mask_threshold=0.5):\n    \"\"\"\n    Resize the output instances.\n    The input images are often resized when entering an object detector.\n    As a result, we often need the outputs of the detector in a different\n    resolution from its inputs.\n\n    This function will resize the raw outputs of an R-CNN detector\n    to produce outputs according to the desired output resolution.\n\n    Args:\n        results (Instances): the raw outputs from the detector.\n            `results.image_size` contains the input image resolution the detector sees.\n            This object might be modified in-place.\n        output_height, output_width: the desired output resolution.\n\n    Returns:\n        Instances: the resized output from the model, based on the output resolution\n    \"\"\"\n    scale_x, scale_y = (output_width / results.image_size[1], output_height / results.image_size[0])\n    results = Instances((output_height, output_width), **results.get_fields())\n\n    if results.has(\"pred_boxes\"):\n        output_boxes = results.pred_boxes\n    elif results.has(\"proposal_boxes\"):\n        output_boxes = results.proposal_boxes\n\n    output_boxes.scale(scale_x, scale_y)\n    output_boxes.clip(results.image_size)\n\n    results = results[output_boxes.nonempty()]\n    \n    if results.has(\"pred_masks\"):\n        if results.pred_masks.shape[0]:\n            results.pred_masks = F.interpolate(input=results.pred_masks, size=results.image_size,mode=\"bilinear\", align_corners=False).gt(0.5).squeeze(1)\n        #results.pred_masks = paste_masks_in_image(\n        #    results.pred_masks[:, 0, :, :],  # N, 1, M, M\n        #    results.pred_boxes,\n        #    results.image_size,\n    
    #    threshold=mask_threshold,\n        #)\n\n    if results.has(\"pred_keypoints\"):\n        results.pred_keypoints[:, :, 0] *= scale_x\n        results.pred_keypoints[:, :, 1] *= scale_y\n\n    return results\n\n\ndef sem_seg_postprocess(result, img_size, output_height, output_width):\n    \"\"\"\n    Return semantic segmentation predictions in the original resolution.\n\n    The input images are often resized when entering a semantic segmentor. Moreover, in some\n    cases, they are also padded inside the segmentor to be divisible by the maximum network stride.\n    As a result, we often need the predictions of the segmentor in a different\n    resolution from its inputs.\n\n    Args:\n        result (Tensor): semantic segmentation prediction logits. A tensor of shape (C, H, W),\n            where C is the number of classes, and H, W are the height and width of the prediction.\n        img_size (tuple): image size that the segmentor takes as input.\n        output_height, output_width: the desired output resolution.\n\n    Returns:\n        semantic segmentation prediction (Tensor): A tensor of the shape\n            (C, output_height, output_width) that contains per-pixel soft predictions.\n    \"\"\"\n    result = result[:, : img_size[0], : img_size[1]].expand(1, -1, -1, -1)\n    result = F.interpolate(\n        result, size=(output_height, output_width), mode=\"bilinear\", align_corners=False\n    )[0]\n    return result\n"
  },
  {
    "path": "tools/compute_flops.py",
    "content": "import torch\nfrom detectron2.engine import default_argument_parser, default_setup\n\nfrom adet.config import get_cfg\nfrom adet.utils.measures import measure_model\n\nfrom train_net import Trainer\n\n\ndef setup(args):\n    \"\"\"\n    Create configs and perform basic setups.\n    \"\"\"\n    cfg = get_cfg()\n    cfg.merge_from_file(args.config_file)\n    cfg.merge_from_list(args.opts)\n    cfg.freeze()\n    default_setup(cfg, args)\n    return cfg\n\n\ndef main(args):\n    cfg = setup(args)\n\n    model = Trainer.build_model(cfg)\n    model.eval().cuda()\n    input_size = (3, 512, 512)\n    image = torch.zeros(*input_size)\n    batched_input = {\"image\": image}\n    ops, params = measure_model(model, [batched_input])\n    print('ops: {:.2f}G\\tparams: {:.2f}M'.format(ops / 2**30, params / 2**20))\n\n\nif __name__ == \"__main__\":\n    args = default_argument_parser().parse_args()\n    print(\"Command Line Args:\", args)\n    main(args)\n"
  },
  {
    "path": "tools/convert_fcos_weight.py",
    "content": "import argparse\nfrom collections import OrderedDict\n\nimport torch\n\n\ndef get_parser():\n    parser = argparse.ArgumentParser(description=\"FCOS Detectron2 Converter\")\n    parser.add_argument(\n        \"--model\",\n        default=\"weights/fcos_R_50_1x_official.pth\",\n        metavar=\"FILE\",\n        help=\"path to model weights\",\n    )\n    parser.add_argument(\n        \"--output\",\n        default=\"weights/fcos_R_50_1x_converted.pth\",\n        metavar=\"FILE\",\n        help=\"path to model weights\",\n    )\n    return parser\n\n\ndef rename_resnet_param_names(ckpt_state_dict):\n    converted_state_dict = OrderedDict()\n    for key in ckpt_state_dict.keys():\n        value = ckpt_state_dict[key]\n\n        key = key.replace(\"module.\", \"\")\n        key = key.replace(\"body\", \"bottom_up\")\n\n        # adding a . ahead to avoid renaming the fpn modules\n        # this can happen after fpn renaming\n        key = key.replace(\".layer1\", \".res2\")\n        key = key.replace(\".layer2\", \".res3\")\n        key = key.replace(\".layer3\", \".res4\")\n        key = key.replace(\".layer4\", \".res5\")\n        key = key.replace(\"downsample.0\", \"shortcut\")\n        key = key.replace(\"downsample.1\", \"shortcut.norm\")\n        key = key.replace(\"bn1\", \"conv1.norm\")\n        key = key.replace(\"bn2\", \"conv2.norm\")\n        key = key.replace(\"bn3\", \"conv3.norm\")\n        key = key.replace(\"fpn_inner2\", \"fpn_lateral3\")\n        key = key.replace(\"fpn_inner3\", \"fpn_lateral4\")\n        key = key.replace(\"fpn_inner4\", \"fpn_lateral5\")\n        key = key.replace(\"fpn_layer2\", \"fpn_output3\")\n        key = key.replace(\"fpn_layer3\", \"fpn_output4\")\n        key = key.replace(\"fpn_layer4\", \"fpn_output5\")\n        key = key.replace(\"top_blocks\", \"top_block\")\n        key = key.replace(\"fpn.\", \"\")\n        key = key.replace(\"rpn\", \"proposal_generator\")\n        key = key.replace(\"head\", 
\"fcos_head\")\n\n        converted_state_dict[key] = value\n    return converted_state_dict\n\n\nif __name__ == \"__main__\":\n    args = get_parser().parse_args()\n    ckpt = torch.load(args.model)\n    model = rename_resnet_param_names(ckpt[\"model\"])\n    torch.save(model, args.output)\n"
  },
  {
    "path": "tools/remove_optim_from_ckpt.py",
    "content": "import argparse\n\nimport torch\n\n\ndef get_parser():\n    parser = argparse.ArgumentParser(description=\"Keep only model in ckpt\")\n    parser.add_argument(\n        \"--path\",\n        default=\"output/person/blendmask/R_50_1x/\",\n        help=\"path to model weights\",\n    )\n    parser.add_argument(\n        \"--name\",\n        default=\"R_50_1x.pth\",\n        help=\"name of output file\",\n    )\n    return parser\n\n\nif __name__ == \"__main__\":\n    args = get_parser().parse_args()\n    ckpt = torch.load(args.path + 'model_final.pth')\n    model = ckpt[\"model\"]\n    torch.save(model, args.path + args.name)\n"
  },
  {
    "path": "train_net.py",
    "content": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n\"\"\"\nDetection Training Script.\n\nThis scripts reads a given config file and runs the training or evaluation.\nIt is an entry point that is made to train standard models in detectron2.\n\nIn order to let one script support training of many models,\nthis script contains logic that are specific to these built-in models and therefore\nmay not be suitable for your own project.\nFor example, your research project perhaps only needs a single \"evaluator\".\n\nTherefore, we recommend you to use detectron2 as an library and take\nthis file as an example of how to use the library.\nYou may want to write your own script with your datasets and other customizations.\n\"\"\"\n\nimport logging\nimport os\nfrom collections import OrderedDict\nimport torch\nfrom torch.nn.parallel import DistributedDataParallel\n\nimport detectron2.utils.comm as comm\nfrom detectron2.data import MetadataCatalog, build_detection_train_loader\nfrom detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, hooks, launch\nfrom detectron2.utils.events import EventStorage\nfrom detectron2.evaluation import (\n    CityscapesEvaluator,\n    COCOEvaluator,\n    COCOPanopticEvaluator,\n    DatasetEvaluators,\n    LVISEvaluator,\n    PascalVOCDetectionEvaluator,\n    SemSegEvaluator,\n    verify_results,\n)\nfrom detectron2.modeling import GeneralizedRCNNWithTTA\n\nfrom detectron2.data.dataset_mapper import DatasetMapper\nfrom fcos.config import get_cfg\nfrom fcos.checkpoint import AdetCheckpointer\n\n\nclass Trainer(DefaultTrainer):\n    \"\"\"\n    This is the same Trainer except that we rewrite the\n    `build_train_loader` method.\n    \"\"\"\n\n    def __init__(self, cfg):\n        \"\"\"\n        Args:\n            cfg (CfgNode):\n        Use the custom checkpointer, which loads other backbone models\n        with matching heuristics.\n        \"\"\"\n        # Assume these objects must be 
constructed in this order.\n        model = self.build_model(cfg)\n        optimizer = self.build_optimizer(cfg, model)\n        data_loader = self.build_train_loader(cfg)\n\n        # For training, wrap with DDP. But don't need this for inference.\n        if comm.get_world_size() > 1:\n            model = DistributedDataParallel(\n                model, device_ids=[comm.get_local_rank()], broadcast_buffers=False\n            )\n        super(DefaultTrainer, self).__init__(model, data_loader, optimizer)\n\n        self.scheduler = self.build_lr_scheduler(cfg, optimizer)\n        # Assume no other objects need to be checkpointed.\n        # We can later make it checkpoint the stateful hooks\n        self.checkpointer = AdetCheckpointer(\n            # Assume you want to save checkpoints together with logs/statistics\n            model,\n            cfg.OUTPUT_DIR,\n            optimizer=optimizer,\n            scheduler=self.scheduler,\n        )\n        self.start_iter = 0\n        self.max_iter = cfg.SOLVER.MAX_ITER\n        self.cfg = cfg\n\n        self.register_hooks(self.build_hooks())\n\n    def train_loop(self, start_iter: int, max_iter: int):\n        \"\"\"\n        Args:\n            start_iter, max_iter (int): See docs above\n        \"\"\"\n        logger = logging.getLogger(__name__)\n        logger.info(\"Starting training from iteration {}\".format(start_iter))\n\n        self.iter = self.start_iter = start_iter\n        self.max_iter = max_iter\n\n        with EventStorage(start_iter) as self.storage:\n            self.before_train()\n            for self.iter in range(start_iter, max_iter):\n                self.before_step()\n                self.run_step()\n                self.after_step()\n            self.after_train()\n\n    def train(self):\n        \"\"\"\n        Run training.\n\n        Returns:\n            OrderedDict of results, if evaluation is enabled. 
Otherwise None.\n        \"\"\"\n        self.train_loop(self.start_iter, self.max_iter)\n        if hasattr(self, \"_last_eval_results\") and comm.is_main_process():\n            verify_results(self.cfg, self._last_eval_results)\n            return self._last_eval_results\n\n    @classmethod\n    def build_train_loader(cls, cfg):\n        \"\"\"\n        Returns:\n            iterable\n\n        It calls :func:`detectron2.data.build_detection_train_loader` with a customized\n        DatasetMapper, which adds categorical labels as a semantic mask.\n        \"\"\"\n        mapper = DatasetMapper(cfg, True)\n        return build_detection_train_loader(cfg, mapper)\n\n    @classmethod\n    def build_evaluator(cls, cfg, dataset_name, output_folder=None):\n        \"\"\"\n        Create evaluator(s) for a given dataset.\n        This uses the special metadata \"evaluator_type\" associated with each builtin dataset.\n        For your own dataset, you can simply create an evaluator manually in your\n        script and do not have to worry about the hacky if-else logic here.\n        \"\"\"\n        if output_folder is None:\n            output_folder = os.path.join(cfg.OUTPUT_DIR, \"inference\")\n        evaluator_list = []\n        evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type\n        if evaluator_type in [\"sem_seg\", \"coco_panoptic_seg\"]:\n            evaluator_list.append(\n                SemSegEvaluator(\n                    dataset_name,\n                    distributed=True,\n                    num_classes=cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES,\n                    ignore_label=cfg.MODEL.SEM_SEG_HEAD.IGNORE_VALUE,\n                    output_dir=output_folder,\n                )\n            )\n        if evaluator_type in [\"coco\", \"coco_panoptic_seg\"]:\n            evaluator_list.append(COCOEvaluator(dataset_name, cfg, True, output_folder))\n        if evaluator_type == \"coco_panoptic_seg\":\n            
evaluator_list.append(COCOPanopticEvaluator(dataset_name, output_folder))\n        if evaluator_type == \"cityscapes\":\n            assert (\n                torch.cuda.device_count() >= comm.get_rank()\n            ), \"CityscapesEvaluator currently does not work with multiple machines.\"\n            return CityscapesEvaluator(dataset_name)\n        if evaluator_type == \"pascal_voc\":\n            return PascalVOCDetectionEvaluator(dataset_name)\n        if evaluator_type == \"lvis\":\n            return LVISEvaluator(dataset_name, cfg, True, output_folder)\n        if len(evaluator_list) == 0:\n            raise NotImplementedError(\n                \"no Evaluator for the dataset {} with the type {}\".format(\n                    dataset_name, evaluator_type\n                )\n            )\n        if len(evaluator_list) == 1:\n            return evaluator_list[0]\n        return DatasetEvaluators(evaluator_list)\n\n    @classmethod\n    def test_with_TTA(cls, cfg, model):\n        logger = logging.getLogger(\"detectron2.trainer\")\n        # At the end of training, run an evaluation with TTA.\n        # Only some R-CNN models are supported.\n        logger.info(\"Running inference with test-time augmentation ...\")\n        model = GeneralizedRCNNWithTTA(cfg, model)\n        evaluators = [\n            cls.build_evaluator(\n                cfg, name, output_folder=os.path.join(cfg.OUTPUT_DIR, \"inference_TTA\")\n            )\n            for name in cfg.DATASETS.TEST\n        ]\n        res = cls.test(cfg, model, evaluators)\n        res = OrderedDict({k + \"_TTA\": v for k, v in res.items()})\n        return res\n\n\ndef setup(args):\n    \"\"\"\n    Create configs and perform basic setups.\n    \"\"\"\n    cfg = get_cfg()\n    cfg.merge_from_file(args.config_file)\n    cfg.merge_from_list(args.opts)\n    cfg.freeze()\n    default_setup(cfg, args)\n    return cfg\n\n\ndef main(args):\n    cfg = setup(args)\n\n    if args.eval_only:\n        model = 
Trainer.build_model(cfg)\n        AdetCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load(\n            cfg.MODEL.WEIGHTS, resume=args.resume\n        )\n        res = Trainer.test(cfg, model)\n        if comm.is_main_process():\n            verify_results(cfg, res)\n        if cfg.TEST.AUG.ENABLED:\n            res.update(Trainer.test_with_TTA(cfg, model))\n        return res\n\n    \"\"\"\n    If you'd like to do anything fancier than the standard training logic,\n    consider writing your own training loop or subclassing the trainer.\n    \"\"\"\n    trainer = Trainer(cfg)\n    trainer.resume_or_load(resume=args.resume)\n    if cfg.TEST.AUG.ENABLED:\n        trainer.register_hooks(\n            [hooks.EvalHook(0, lambda: trainer.test_with_TTA(cfg, trainer.model))]\n        )\n    return trainer.train()\n\n\nif __name__ == \"__main__\":\n    args = default_argument_parser().parse_args()\n    print(\"Command Line Args:\", args)\n    launch(\n        main,\n        args.num_gpus,\n        num_machines=args.num_machines,\n        machine_rank=args.machine_rank,\n        dist_url=args.dist_url,\n        args=(args,),\n    )\n"
  }
]