## Updates
* 11/26/2020: v1.0 release
* 02/25/2021: improved camera and LiDAR calibration parameters
* 03/04/2021: updated ROS bags with new tf (v1.1 release)
* 06/14/2021: fixed missing point cloud labels and wrong poses
* 01/24/2022: added Velodyne point clouds in KITTI format, with labels transferred from the Ouster scans
## Overview
Semantic scene understanding is crucial for robust and safe autonomous navigation, particularly so in off-road environments. Recent deep learning advances for 3D semantic segmentation rely heavily on large sets of training data; however, existing autonomy datasets represent urban environments or lack multimodal off-road data. We fill this gap with RELLIS-3D, a multimodal dataset collected in an off-road environment containing annotations for **13,556 LiDAR scans** and **6,235 images**. The data was collected on the Rellis Campus of Texas A\&M University and presents challenges to existing algorithms related to class imbalance and environmental topography. Additionally, we evaluate current state-of-the-art deep learning semantic segmentation models on this dataset. Experimental results show that RELLIS-3D presents challenges for algorithms designed for segmentation in urban environments. In addition to the annotated data, the dataset also provides full-stack sensor data in ROS bag format, including **RGB camera images**, **LiDAR point clouds**, **a pair of stereo images**, **high-precision GPS measurements**, and **IMU data**. This novel dataset provides the resources needed by researchers to develop more advanced algorithms and investigate new research directions to enhance autonomous navigation in off-road environments.

### Recording Platform
* [Clearpath Robotics Warthog](https://clearpathrobotics.com/warthog-unmanned-ground-vehicle-robot/)
### Sensor Setup
* 64-channel LiDAR: [Ouster OS1](https://ouster.com/products/os1-lidar-sensor)
* 32-channel LiDAR: [Velodyne Ultra Puck](https://velodynelidar.com/vlp-32c.html)
* 3D Stereo Camera: [Nerian Karmin2](https://nerian.com/products/karmin2-3d-stereo-camera/) + [Nerian SceneScan](https://nerian.com/products/scenescan-stereo-vision/) [(Sensor Configuration)](https://nerian.com/support/calculator/?1,10,0,6,2,1600,1200,1,1,1,800,592,1,6,2,0,61.4,0,0,1,1,0,5,0,0,0,0.66,0,1,25,1,0,1,256,0.25,256,4.0,5.0,0,1.5,1,#results)
* RGB Camera: [Basler acA1920-50gc](https://www.baslerweb.com/en/products/cameras/area-scan-cameras/ace/aca1920-50gc/) + [Edmund Optics 16mm/F1.8 86-571](https://www.edmundoptics.com/p/16mm-focal-length-hp-series-fixed-focal-length-lens/28990/)
* Inertial Navigation System (GPS/IMU): [Vectornav VN-300 Dual Antenna GNSS/INS](https://www.vectornav.com/products/vn-300)

## Folder structure
```
Rellis-3D/
├── pt_train.lst
├── pt_val.lst
├── pt_test.lst
└── 00000/ -- one folder per sequence (00000 to 00004)
    ├── os1_cloud_node_kitti_bin/ -- ".bin" files with Ouster 64-channel point clouds
    ├── os1_cloud_node_semantickitti_label_id/ -- ".label" files with manually annotated semantic labels for the Ouster point clouds
    ├── vel_cloud_node_kitti_bin/ -- ".bin" files with Velodyne 32-channel point clouds
    ├── vel_cloud_node_semantickitti_label_id/ -- ".label" files for the Velodyne point clouds, transferred from the Ouster annotations
    ├── pylon_camera_node/ -- ".png" images from the color camera
    ├── pylon_camera_node_label_color/ -- color-format image labels
    ├── pylon_camera_node_label_id/ -- id-format image labels
    ├── calib.txt -- Velodyne-to-camera calibration, needed to project point clouds into the camera frame
    └── poses.txt -- poses of every scan
```
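The `.bin` and `.label` files follow the KITTI/SemanticKITTI binary conventions, so they can be read with plain NumPy. A minimal sketch (the helper names and example paths are illustrative, not part of the dataset tooling):

```python
import numpy as np

def read_scan(bin_path):
    """Read a KITTI-format .bin scan: N x 4 float32 (x, y, z, intensity)."""
    return np.fromfile(bin_path, dtype=np.float32).reshape(-1, 4)

def read_labels(label_path):
    """Read a SemanticKITTI-format .label file: one uint32 per point.

    By the SemanticKITTI convention, the lower 16 bits hold the semantic
    label id and the upper 16 bits hold the instance id.
    """
    raw = np.fromfile(label_path, dtype=np.uint32)
    return raw & 0xFFFF, raw >> 16

# Usage (paths follow the folder structure above):
# points = read_scan("00000/os1_cloud_node_kitti_bin/000000.bin")
# sem, inst = read_labels("00000/os1_cloud_node_semantickitti_label_id/000000.label")
```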
## Download Link on Baidu Pan:
Link: https://pan.baidu.com/s/1akqSm7mpIMyUJhn_qwg3-w?pwd=4gk3 (access code: 4gk3). Copy the link and open it in the Baidu Netdisk mobile app for easier access.
## Annotated Data:
### Ontology:
With the goal of providing multimodal data to enhance autonomous off-road navigation, we defined an ontology of object and terrain classes that largely derives from [the RUGD dataset](http://rugd.vision/) but also includes unique terrain and object classes not present in RUGD. Specifically, sequences from this dataset include classes such as mud, man-made barriers, and rubble piles. Additionally, this dataset provides a finer-grained class structure for water sources, i.e., puddle and deep water, as these two classes present different traversability scenarios for most robotic platforms. Overall, 20 classes (including the void class) are present in the data.
**Ontology Definition** ([Download 18KB](https://drive.google.com/file/d/1K8Zf0ju_xI5lnx3NTDLJpVTs59wmGPI6/view?usp=sharing))
### Image Statistics:

Note: Due to Google Drive's download limits, access may be temporarily constrained. Please wait 24 hours and try again. If you still cannot access the files, please email maskjp@tamu.edu with the subject "RELLIS-3D Access Request".
### Image Download:
**Image with Annotation Examples** ([Download 3MB](https://drive.google.com/file/d/1wIig-LCie571DnK72p2zNAYYWeclEz1D/view?usp=sharing))
**Full Images** ([Download 11GB](https://drive.google.com/file/d/1F3Leu0H_m6aPVpZITragfreO_SGtL2yV/view?usp=sharing))
**Full Image Annotations Color Format** ([Download 119MB](https://drive.google.com/file/d/1HJl8Fi5nAjOr41DPUFmkeKWtDXhCZDke/view?usp=sharing))
**Full Image Annotations ID Format** ([Download 94MB](https://drive.google.com/file/d/16URBUQn_VOGvUqfms-0I8HHKMtjPHsu5/view?usp=sharing))
**Image Split File** ([44KB](https://drive.google.com/file/d/1zHmnVaItcYJAWat3Yti1W_5Nfux194WQ/view?usp=sharing))
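The id-format annotations store one class id per pixel, while the color-format annotations store an RGB rendering of the same labels. Converting between the two is a palette lookup; a minimal sketch, where the two palette entries are placeholders (the real id-to-color mapping is defined in the ontology file above):

```python
import numpy as np

# Example id -> RGB palette. The actual mapping comes from the ontology
# definition; the entries below are illustrative placeholders only.
PALETTE = np.zeros((256, 3), dtype=np.uint8)
PALETTE[1] = (0, 102, 0)   # placeholder color for class id 1
PALETTE[2] = (0, 255, 0)   # placeholder color for class id 2

def id_to_color(label_img):
    """Map an HxW id-format label image to an HxWx3 color image."""
    return PALETTE[label_img]
```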
### LiDAR Scan Statistics:

### LiDAR Download:
**Ouster LiDAR with Annotation Examples** ([Download 24MB](https://drive.google.com/file/d/1QikPnpmxneyCuwefr6m50fBOSB2ny4LC/view?usp=sharing))
**Ouster LiDAR with Color Annotation PLY Format** ([Download 26GB](https://drive.google.com/file/d/1BZWrPOeLhbVItdN0xhzolfsABr6ymsRr/view?usp=sharing))
The header of the PLY file is as follows:
```
element vertex
property float x
property float y
property float z
property float intensity
property uint t
property ushort reflectivity
property uchar ring
property ushort noise
property uint range
property uchar label
property uchar red
property uchar green
property uchar blue
```
To visualize the colors in the PLY files, please use [CloudCompare](https://www.danielgm.net/cc/) or [Open3D](http://www.open3d.org/). MeshLab has problems rendering the colors.
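For programmatic access to the per-point fields, the header above maps directly onto a NumPy structured dtype. A sketch, assuming the files are binary little-endian PLY (`float` -> float32, `uint` -> uint32, `ushort` -> uint16, `uchar` -> uint8); a PLY library such as `plyfile` will parse the header for you, whereas this dtype is only for reading the raw record block yourself:

```python
import numpy as np

# Structured dtype mirroring the PLY properties listed above, in order.
ply_point = np.dtype([
    ("x", "<f4"), ("y", "<f4"), ("z", "<f4"),
    ("intensity", "<f4"),
    ("t", "<u4"),
    ("reflectivity", "<u2"),
    ("ring", "u1"),
    ("noise", "<u2"),
    ("range", "<u4"),
    ("label", "u1"),
    ("red", "u1"), ("green", "u1"), ("blue", "u1"),
])
# Each point record is ply_point.itemsize bytes; read them with
# np.fromfile(f, dtype=ply_point) after seeking past the ASCII header.
```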
**Ouster LiDAR SemanticKITTI Format** ([Download 14GB](https://drive.google.com/file/d/1lDSVRf_kZrD0zHHMsKJ0V1GN9QATR4wH/view?usp=sharing))
To visualize the datasets using the SemanticKITTI tools, please use this fork: [https://github.com/unmannedlab/point_labeler](https://github.com/unmannedlab/point_labeler)
**Ouster LiDAR Annotation SemanticKITTI Format** ([Download 174MB](https://drive.google.com/file/d/12bsblHXtob60KrjV7lGXUQTdC5PhV8Er/view?usp=sharing))
**Ouster LiDAR Scan Poses files** ([Download 174MB](https://drive.google.com/file/d/1V3PT_NJhA41N7TBLp5AbW31d0ztQDQOX/view?usp=sharing))
**Ouster LiDAR Split File** ([75KB](https://drive.google.com/file/d/1raQJPySyqDaHpc53KPnJVl3Bln6HlcVS/view?usp=sharing))
**Velodyne LiDAR SemanticKITTI Format** ([Download 5.58GB](https://drive.google.com/file/d/1PiQgPQtJJZIpXumuHSig5Y6kxhAzz1cz/view?usp=sharing))
**Velodyne LiDAR Annotation SemanticKITTI Format** ([Download 143.6MB](https://drive.google.com/file/d/1n-9FkpiH4QUP7n0PnQBp-s7nzbSzmxp8/view?usp=sharing))
### Calibration Download:
**Camera Intrinsics** ([Download 2KB](https://drive.google.com/file/d/1NAigZTJYocRSOTfgFBddZYnDsI_CSpwK/view?usp=sharing))
**Basler Camera to Ouster LiDAR** ([Download 3KB](https://drive.google.com/file/d/19EOqWS9fDUFp4nsBrMCa69xs9LgIlS2e/view?usp=sharing))
**Velodyne LiDAR to Ouster LiDAR** ([Download 3KB](https://drive.google.com/file/d/1T6yPwcdzJoU-ifFRelLtDLPuPQswIQwf/view?usp=sharing))
**Stereo Calibration** ([Download 3KB](https://drive.google.com/file/d/1cP5-l_nYt3kZ4hZhEAHEdpt2fzToar0R/view?usp=sharing))
**Calibration Raw Data** ([Download 774MB](https://drive.google.com/drive/folders/1VAb-98lh6HWEe_EKLhUC1Xle0jkpp2Fl?usp=sharing))
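The LiDAR-to-camera calibration above is what enables projecting point clouds into the image, as noted for `calib.txt` in the folder structure. A minimal sketch of such a projection, assuming a 4x4 extrinsic matrix `T_cam_lidar` (LiDAR frame to camera frame) and a 3x3 intrinsic matrix `K` have already been parsed from the calibration files (the function name and argument layout are illustrative):

```python
import numpy as np

def project_lidar_to_image(points, T_cam_lidar, K):
    """Project Nx3 LiDAR points into pixel coordinates.

    Returns Nx2 pixel coordinates and a boolean mask of points that lie
    in front of the camera (positive depth).
    """
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])  # Nx4 homogeneous
    cam = (T_cam_lidar @ pts_h.T).T[:, :3]                      # Nx3 in camera frame
    in_front = cam[:, 2] > 0
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                                 # perspective divide
    return uv, in_front
```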
## Benchmarks
### Image Semantic Segmentation
models | sky | grass | tree | bush | concrete | mud | person | puddle | rubble | barrier | log | fence | vehicle | object | pole | water | asphalt | building | mean
-------| ----| ------|------|------|----------|-----| -------| -------|--------|---------|-----|-------| --------| -------|------|-------|---------|----------| ----
[HRNet+OCR](https://github.com/HRNet/HRNet-Semantic-Segmentation/tree/HRNet-OCR) | 96.94 | 90.20 | 80.53 | 76.76 | 84.22 | 43.29 | 89.48 | 73.94 | 62.03 | 54.86 | 0.00 | 39.52 | 41.54 | 46.44 | 9.51 | 0.72 | 33.25 | 4.60 | 48.83
[GSCNN](https://github.com/nv-tlabs/GSCNN) | 97.02 | 84.95 | 78.52 | 70.33 | 83.82 | 45.52 | 90.31 | 71.49 | 66.03 | 55.12 | 2.92 | 41.86 | 46.51 | 54.64 | 6.90 | 0.94 | 44.18 | 11.47 | 50.13
[Video](https://www.youtube.com/watch?v=vr3g6lCTKRM)
### LiDAR Semantic Segmentation
models | sky | grass | tree | bush | concrete | mud | person | puddle | rubble | barrier | log | fence | vehicle | object | pole | water | asphalt | building | mean
-------| ----| ------|------|------|----------|-----| -------| -------|--------|---------|-----|-------| --------| -------|------|-------|---------|----------| ----
[SalsaNext](https://github.com/Halmstad-University/SalsaNext) | - | 64.74 | 79.04 | 72.90 | 75.27 | 9.58 | 83.17 | 23.20 | 5.01 | 75.89 | 18.76 | 16.13| 23.12 | - | 56.26 | 0.00 | - | - | 40.20
[KPConv](https://github.com/HuguesTHOMAS/KPConv) | - | 56.41 | 49.25 | 58.45 | 33.91 | 0.00 | 81.20 | 0.00 | 0.00 | 0.00 | 0.00 | 0.40 | 0.00 | - | 0.00 | 0.00 | - | - | 18.64
[Video](https://www.youtube.com/watch?v=wkm8UiVNGao)
### Benchmark Reproduction
To reproduce the results, please refer to [here](./benchmarks/README.md)
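The per-class numbers in the tables above are intersection-over-union (IoU) percentages, and "mean" is their average over the evaluated classes. A minimal sketch of the standard computation from a confusion matrix (the function name is illustrative; the benchmark code linked above has its own evaluation scripts):

```python
import numpy as np

def per_class_iou(conf):
    """Per-class IoU from a CxC confusion matrix.

    Rows are ground truth, columns are predictions:
    IoU_c = TP_c / (TP_c + FP_c + FN_c).
    """
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp  # predicted as c but not c
    fn = conf.sum(axis=1) - tp  # actually c but missed
    denom = tp + fp + fn
    # Guard against empty classes (denom == 0).
    return np.where(denom > 0, tp / np.maximum(denom, 1), 0.0)
```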
## ROS Bag Raw Data
Data included in raw ROS bagfiles:
Topic Name | Message Type | Message Description
------------ | ------------- | ---------------------------------
/img_node/intensity_image | sensor_msgs/Image | Intensity image generated by the Ouster LiDAR
/img_node/noise_image | sensor_msgs/Image | Noise image generated by the Ouster LiDAR
/img_node/range_image | sensor_msgs/Image | Range image generated by the Ouster LiDAR
/imu/data | sensor_msgs/Imu | Filtered IMU data from the Warthog's embedded IMU
/imu/data_raw | sensor_msgs/Imu | Raw IMU data from the Warthog's embedded IMU
/imu/mag | sensor_msgs/MagneticField | Raw magnetic field data from the Warthog's embedded IMU
/left_drive/status/battery_current | std_msgs/Float64 |
/left_drive/status/battery_voltage | std_msgs/Float64 |
/mcu/status | warthog_msgs/Status |
/nerian/left/camera_info | sensor_msgs/CameraInfo |
/nerian/left/image_raw | sensor_msgs/Image | Left image from Nerian Karmin2
/nerian/right/camera_info | sensor_msgs/CameraInfo |
/nerian/right/image_raw | sensor_msgs/Image | Right image from Nerian Karmin2
/odometry/filtered | nav_msgs/Odometry | A filtered localization estimate based on wheel odometry (encoders) and integrated IMU data from the Warthog
/os1_cloud_node/imu | sensor_msgs/Imu | Raw IMU data from the Ouster LiDAR's embedded IMU
/os1_cloud_node/points | sensor_msgs/PointCloud2 | Point cloud data from the Ouster LiDAR
/os1_node/imu_packets | ouster_ros/PacketMsg | Raw IMU packets from the Ouster LiDAR
/os1_node/lidar_packets | ouster_ros/PacketMsg | Raw LiDAR packets from the Ouster LiDAR
/pylon_camera_node/camera_info | sensor_msgs/CameraInfo |
/pylon_camera_node/image_raw | sensor_msgs/Image |
/right_drive/status/battery_current | std_msgs/Float64 |
/right_drive/status/battery_voltage | std_msgs/Float64 |
/tf | tf2_msgs/TFMessage |
/tf_static | tf2_msgs/TFMessage |
/vectornav/GPS | sensor_msgs/NavSatFix | INS data from the VectorNav VN-300
/vectornav/IMU | sensor_msgs/Imu | IMU data from the VectorNav VN-300
/vectornav/Mag | sensor_msgs/MagneticField | Raw magnetic field data from the VectorNav VN-300
/vectornav/Odom | nav_msgs/Odometry | Odometry from the VectorNav VN-300
/vectornav/Pres | sensor_msgs/FluidPressure |
/vectornav/Temp | sensor_msgs/Temperature |
/velodyne_points | sensor_msgs/PointCloud2 | Point cloud data from the Velodyne LiDAR
/warthog_velocity_controller/cmd_vel | geometry_msgs/Twist |
/warthog_velocity_controller/odom | nav_msgs/Odometry |
### ROS Bag Download
The following are links to the ROS bag files.
* Synced data (60-second example, [2 GB](https://drive.google.com/file/d/13EHwiJtU0aAWBQn-ZJhTJwC1Yx2zDVUv/view?usp=sharing)): includes synced */os1_cloud_node/points*, */pylon_camera_node/camera_info* and */pylon_camera_node/image_raw*
* Full-stack merged data (60-second example, [4.2 GB](https://drive.google.com/file/d/1qSeOoY6xbQGjcrZycgPM8Ty37eKDjpJL/view?usp=sharing)): includes all data in the table above, plus extrinsic calibration info embedded in the tf tree.
* Full-stack split raw data (60-second example, [4.3 GB](https://drive.google.com/file/d/1-TDpelP4wKTWUDTIn0dNuZIT3JkBoZ_R/view?usp=sharing)): the original data recorded by the `rosbag record` command.
**Sequence 00000**: Synced data: ([12GB](https://drive.google.com/file/d/1bIb-6fWbaiI9Q8Pq9paANQwXWn7GJDtl/view?usp=sharing)) Full-stack Merged data: ([23GB](https://drive.google.com/file/d/1grcYRvtAijiA0Kzu-AV_9K4k2C1Kc3Tn/view?usp=sharing)) Full-stack Split Raw data: ([29GB](https://drive.google.com/drive/folders/1IZ-Tn_kzkp82mNbOL_4sNAniunD7tsYU?usp=sharing))
[Video](https://www.youtube.com/watch?v=Qc7IepWGKr8)
**Sequence 00001**: Synced data: ([8GB](https://drive.google.com/file/d/1xNjAFE3cv6X8n046irm8Bo5QMerNbwP1/view?usp=sharing)) Full-stack Merged data: ([16GB](https://drive.google.com/file/d/1geoU45pPavnabQ0arm4ILeHSsG3cU6ti/view?usp=sharing)) Full-stack Split Raw data: ([22GB](https://drive.google.com/drive/folders/1hf-vF5zyTKcCLqIiddIGdemzKT742T1t?usp=sharing))
[Video](https://www.youtube.com/watch?v=nO5JADjDWQ0)
**Sequence 00002**: Synced data: ([14GB](https://drive.google.com/file/d/1gy0ehP9Buj-VkpfvU9Qwyz1euqXXQ_mj/view?usp=sharing)) Full-stack Merged data: ([28GB](https://drive.google.com/file/d/1h0CVg62jTXiJ91LnR6md-WrUBDxT543n/view?usp=sharing)) Full-stack Split Raw data: ([37GB](https://drive.google.com/drive/folders/1R8jP5Qo7Z6uKPoG9XUvFCStwJu6rtliu?usp=sharing))
[Video](https://www.youtube.com/watch?v=aXaOmzjHmNE)
**Sequence 00003**: Synced data: ([8GB](https://drive.google.com/file/d/1vCeZusijzyn1ZrZbg4JaHKYSc2th7GEt/view?usp=sharing)) Full-stack Merged data: ([15GB](https://drive.google.com/file/d/1glJzgnTYLIB_ar3CgHpc_MBp5AafQpy9/view?usp=sharing)) Full-stack Split Raw data: ([19GB](https://drive.google.com/drive/folders/1iP0k6dbmPdAH9kkxs6ugi6-JbrkGhm5o?usp=sharing))
[Video](https://www.youtube.com/watch?v=Kjo3tGDSbtU)
**Sequence 00004**: Synced data: ([7GB](https://drive.google.com/file/d/1gxODhAd8CBM5AGvsoyuqN7yGpWazzmVy/view?usp=sharing)) Full-stack Merged data: ([14GB](https://drive.google.com/file/d/1AuEjX0do3jGZhGKPszSEUNoj85YswNya/view?usp=sharing)) Full-stack Split Raw data: ([17GB](https://drive.google.com/drive/folders/1WV9pecF2beESyM7N29W-nhi-JaoKvEqc?usp=sharing))
[Video](https://www.youtube.com/watch?v=lLLYTI4TCD4)
### ROS Environment Installation
The ROS workspace includes a platform description package that provides a rough tf tree for running the rosbags.
To run Cartographer on RELLIS-3D, please refer to [here](https://github.com/unmannedlab/cartographer)

## Full Data Download:
[Access Link](https://drive.google.com/drive/folders/1aZ1tJ3YYcWuL3oWKnrTIC5gq46zx1bMc?usp=sharing)
## Citation
```
@misc{jiang2020rellis3d,
title={RELLIS-3D Dataset: Data, Benchmarks and Analysis},
author={Peng Jiang and Philip Osteen and Maggie Wigness and Srikanth Saripalli},
year={2020},
eprint={2011.12954},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## Collaborator
## License
All datasets and code on this page are copyrighted by us and published under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License.
## Related Work
[SemanticUSL: A Dataset for Semantic Segmentation Domain Adaptation](https://unmannedlab.github.io/research/SemanticUSL)
[LiDARNet: A Boundary-Aware Domain Adaptation Model for Lidar Point Cloud Semantic Segmentation](https://unmannedlab.github.io/research/LiDARNet)
[A RUGD Dataset for Autonomous Navigation and Visual Perception in Unstructured Outdoor Environments](http://rugd.vision/)
================================================
FILE: benchmarks/GSCNN-master/.gitignore
================================================
logs/
network/pretrained_models/
__pycache__/
================================================
FILE: benchmarks/GSCNN-master/Dockerfile
================================================
FROM pytorch/pytorch:1.0-cuda10.0-cudnn7-devel
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get update \
&& apt-get install -y software-properties-common wget \
&& add-apt-repository -y ppa:ubuntu-toolchain-r/test \
&& apt-get update \
&& apt-get install -y make git curl vim vim-gnome
# Install apt packages
RUN apt-get install -y python3-pip python3-dev vim htop python3-tk pkg-config
RUN pip3 install --upgrade pip==9.0.1
# Install from pip
RUN pip3 install pyyaml \
scipy==1.1.0 \
numpy \
tensorflow \
scikit-learn \
scikit-image \
matplotlib \
opencv-python \
torch==1.0.0 \
torchvision==0.2.0 \
torch-encoding==1.0.1 \
tensorboardX \
tqdm
================================================
FILE: benchmarks/GSCNN-master/LICENSE
================================================
Copyright (C) 2019 NVIDIA Corporation. Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler
All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
Permission to use, copy, modify, and distribute this software and its documentation
for any non-commercial purpose is hereby granted without fee, provided that the above
copyright notice appear in all copies and that both that copyright notice and this
permission notice appear in supporting documentation, and that the name of the author
not be used in advertising or publicity pertaining to distribution of the software
without specific, written prior permission.
THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
================================================
FILE: benchmarks/GSCNN-master/README.md
================================================
# GSCNN
This is the official code for:
#### Gated-SCNN: Gated Shape CNNs for Semantic Segmentation
[Towaki Takikawa](https://tovacinni.github.io), [David Acuna](http://www.cs.toronto.edu/~davidj/), [Varun Jampani](https://varunjampani.github.io), [Sanja Fidler](http://www.cs.toronto.edu/~fidler/)
ICCV 2019
**[[Paper](https://arxiv.org/abs/1907.05740)] [[Project Page](https://nv-tlabs.github.io/GSCNN/)]**

Based on https://github.com/NVIDIA/semantic-segmentation.
## License
```
Copyright (C) 2019 NVIDIA Corporation. Towaki Takikawa, David Acuna, Varun Jampani, Sanja Fidler
All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
Permission to use, copy, modify, and distribute this software and its documentation
for any non-commercial purpose is hereby granted without fee, provided that the above
copyright notice appear in all copies and that both that copyright notice and this
permission notice appear in supporting documentation, and that the name of the author
not be used in advertising or publicity pertaining to distribution of the software
without specific, written prior permission.
THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL
DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
```
## Usage
##### Clone this repo
```bash
git clone https://github.com/nv-tlabs/GSCNN
cd GSCNN
```
#### Python requirements
Currently, the code supports Python 3
* numpy
* PyTorch (>=1.1.0)
* torchvision
* scipy
* scikit-image
* tensorboardX
* tqdm
* torch-encoding
* opencv
* PyYAML
#### Download pretrained models
Download the pretrained model from the [Google Drive Folder](https://drive.google.com/file/d/1wlhAXg-PfoUM-rFy2cksk43Ng3PpsK2c/view), and save it in 'checkpoints/'
#### Download inferred images
Download (if needed) the inferred images from the [Google Drive Folder](https://drive.google.com/file/d/105WYnpSagdlf5-ZlSKWkRVeq-MyKLYOV/view)
#### Evaluation (Cityscapes)
```bash
python train.py --evaluate --snapshot checkpoints/best_cityscapes_checkpoint.pth
```
#### Training
A note on training: we train on 8 NVIDIA GPUs, and as such, training WiderResNet38 will be an issue if you try to train on a single GPU.
If you use this code, please cite:
```
@article{takikawa2019gated,
title={Gated-SCNN: Gated Shape CNNs for Semantic Segmentation},
author={Takikawa, Towaki and Acuna, David and Jampani, Varun and Fidler, Sanja},
journal={ICCV},
year={2019}
}
```
================================================
FILE: benchmarks/GSCNN-master/config.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Code adapted from:
# https://github.com/facebookresearch/Detectron/blob/master/detectron/core/config.py
Source License
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import copy
import six
import os.path as osp
from ast import literal_eval
import numpy as np
import yaml
import torch
import torch.nn as nn
from torch.nn import init
from utils.AttrDict import AttrDict
__C = AttrDict()
# Consumers can get config by:
# from fast_rcnn_config import cfg
cfg = __C
__C.EPOCH = 0
__C.CLASS_UNIFORM_PCT=0.0
__C.BATCH_WEIGHTING=False
__C.BORDER_WINDOW=1
__C.REDUCE_BORDER_EPOCH= -1
__C.STRICTBORDERCLASS= None
__C.DATASET =AttrDict()
__C.DATASET.CITYSCAPES_DIR='/home/usl/Datasets/cityscapes/'
__C.DATASET.RELLIS_DIR='/path/to/RELLIS-3D/'
__C.DATASET.CV_SPLITS=3
__C.MODEL = AttrDict()
__C.MODEL.BN = 'regularnorm'
__C.MODEL.BNFUNC = torch.nn.BatchNorm2d
__C.MODEL.BIGMEMORY = False
def assert_and_infer_cfg(args, make_immutable=True):
"""Call this function in your script after you have finished setting all cfg
values that are necessary (e.g., merging a config from a file, merging
command line config options, etc.). By default, this function will also
mark the global cfg as immutable to prevent changing the global cfg settings
during script execution (which can lead to hard to debug errors or code
that's harder to understand than is necessary).
"""
if args.batch_weighting:
__C.BATCH_WEIGHTING=True
if args.syncbn:
import encoding
__C.MODEL.BN = 'syncnorm'
__C.MODEL.BNFUNC = encoding.nn.BatchNorm2d
else:
__C.MODEL.BNFUNC = torch.nn.BatchNorm2d
print('Using regular batch norm')
if make_immutable:
cfg.immutable(True)
================================================
FILE: benchmarks/GSCNN-master/datasets/__init__.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
from datasets import cityscapes, rellis
import torchvision.transforms as standard_transforms
import torchvision.utils as vutils
import transforms.joint_transforms as joint_transforms
import transforms.transforms as extended_transforms
from torch.utils.data import DataLoader
def setup_loaders(args):
'''
input: argument passed by the user
return: training data loader, validation data loader loader, train_set
'''
if args.dataset == 'cityscapes':
args.dataset_cls = cityscapes
args.train_batch_size = args.bs_mult * args.ngpu
if args.bs_mult_val > 0:
args.val_batch_size = args.bs_mult_val * args.ngpu
else:
args.val_batch_size = args.bs_mult * args.ngpu
mean_std = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
elif args.dataset == "rellis":
args.dataset_cls = rellis
args.train_batch_size = args.bs_mult * args.ngpu
if args.bs_mult_val > 0:
args.val_batch_size = args.bs_mult_val * args.ngpu
else:
args.val_batch_size = args.bs_mult * args.ngpu
mean_std = ([0.496588, 0.59493099, 0.53358843], [0.496588, 0.59493099, 0.53358843])
else:
raise ValueError('unknown dataset: {}'.format(args.dataset))
args.num_workers = 4 * args.ngpu
if args.test_mode:
args.num_workers = 0 #1
# Geometric image transformations
train_joint_transform_list = [
joint_transforms.RandomSizeAndCrop(args.crop_size,
False,
pre_size=args.pre_size,
scale_min=args.scale_min,
scale_max=args.scale_max,
ignore_index=args.dataset_cls.ignore_label),
joint_transforms.Resize(args.crop_size),
joint_transforms.RandomHorizontallyFlip()]
#if args.rotate:
# train_joint_transform_list += [joint_transforms.RandomRotate(args.rotate)]
train_joint_transform = joint_transforms.Compose(train_joint_transform_list)
# Image appearance transformations
train_input_transform = []
if args.color_aug:
train_input_transform += [extended_transforms.ColorJitter(
brightness=args.color_aug,
contrast=args.color_aug,
saturation=args.color_aug,
hue=args.color_aug)]
if args.bblur:
train_input_transform += [extended_transforms.RandomBilateralBlur()]
elif args.gblur:
train_input_transform += [extended_transforms.RandomGaussianBlur()]
else:
pass
train_input_transform += [standard_transforms.ToTensor(),
standard_transforms.Normalize(*mean_std)]
train_input_transform = standard_transforms.Compose(train_input_transform)
val_input_transform = standard_transforms.Compose([
standard_transforms.ToTensor(),
standard_transforms.Normalize(*mean_std)
])
target_transform = extended_transforms.MaskToTensor()
target_train_transform = extended_transforms.MaskToTensor()
if args.dataset == 'cityscapes':
city_mode = 'train' ## Can be trainval
city_quality = 'fine'
train_set = args.dataset_cls.CityScapes(
city_quality, city_mode, 0,
joint_transform=train_joint_transform,
transform=train_input_transform,
target_transform=target_train_transform,
dump_images=args.dump_augmentation_images,
cv_split=args.cv)
val_set = args.dataset_cls.CityScapes('fine', 'val', 0,
transform=val_input_transform,
target_transform=target_transform,
cv_split=args.cv)
elif args.dataset == 'rellis':
if args.mode != "test":
city_mode = 'train'
train_set = args.dataset_cls.Rellis(
city_mode,
joint_transform=train_joint_transform,
transform=train_input_transform,
target_transform=target_train_transform,
dump_images=args.dump_augmentation_images,
cv_split=args.cv)
val_set = args.dataset_cls.Rellis('val',
transform=val_input_transform,
target_transform=target_transform,
cv_split=args.cv)
else:
city_mode = 'test'
train_set = args.dataset_cls.Rellis('test',
transform=val_input_transform,
target_transform=target_transform,
cv_split=args.cv)
val_set = args.dataset_cls.Rellis('test',
transform=val_input_transform,
target_transform=target_transform,
cv_split=args.cv)
else:
raise ValueError('unknown dataset: {}'.format(args.dataset))
train_sampler = None
val_sampler = None
train_loader = DataLoader(train_set, batch_size=args.train_batch_size,
num_workers=args.num_workers, shuffle=(train_sampler is None), drop_last=True, sampler = train_sampler)
val_loader = DataLoader(val_set, batch_size=args.val_batch_size,
num_workers=args.num_workers // 2 , shuffle=False, drop_last=False, sampler = val_sampler)
return train_loader, val_loader, train_set
================================================
FILE: benchmarks/GSCNN-master/datasets/cityscapes.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
import os
import numpy as np
import torch
from PIL import Image
from torch.utils import data
from collections import defaultdict
import math
import logging
import datasets.cityscapes_labels as cityscapes_labels
import json
from config import cfg
import torchvision.transforms as transforms
import datasets.edge_utils as edge_utils
trainid_to_name = cityscapes_labels.trainId2name
id_to_trainid = cityscapes_labels.label2trainid
num_classes = 19
ignore_label = 255
root = cfg.DATASET.CITYSCAPES_DIR
palette = [128, 64, 128, 244, 35, 232, 70, 70, 70, 102, 102, 156, 190, 153, 153,
153, 153, 153, 250, 170, 30,
220, 220, 0, 107, 142, 35, 152, 251, 152, 70, 130, 180, 220, 20, 60,
255, 0, 0, 0, 0, 142, 0, 0, 70,
0, 60, 100, 0, 80, 100, 0, 0, 230, 119, 11, 32]
zero_pad = 256 * 3 - len(palette)
for i in range(zero_pad):
palette.append(0)
def colorize_mask(mask):
# mask: numpy array of the mask
new_mask = Image.fromarray(mask.astype(np.uint8)).convert('P')
new_mask.putpalette(palette)
return new_mask
def add_items(items, aug_items, cities, img_path, mask_path, mask_postfix, mode, maxSkip):
for c in cities:
c_items = [name.split('_leftImg8bit.png')[0] for name in
os.listdir(os.path.join(img_path, c))]
for it in c_items:
item = (os.path.join(img_path, c, it + '_leftImg8bit.png'),
os.path.join(mask_path, c, it + mask_postfix))
items.append(item)
def make_cv_splits(img_dir_name):
'''
Create splits of train/val data.
A split is a lists of cities.
split0 is aligned with the default Cityscapes train/val.
'''
trn_path = os.path.join(root, img_dir_name, 'leftImg8bit', 'train')
val_path = os.path.join(root, img_dir_name, 'leftImg8bit', 'val')
trn_cities = ['train/' + c for c in os.listdir(trn_path)]
val_cities = ['val/' + c for c in os.listdir(val_path)]
# want reproducible randomly shuffled
trn_cities = sorted(trn_cities)
all_cities = val_cities + trn_cities
num_val_cities = len(val_cities)
num_cities = len(all_cities)
cv_splits = []
for split_idx in range(cfg.DATASET.CV_SPLITS):
split = {}
split['train'] = []
split['val'] = []
offset = split_idx * num_cities // cfg.DATASET.CV_SPLITS
for j in range(num_cities):
if j >= offset and j < (offset + num_val_cities):
split['val'].append(all_cities[j])
else:
split['train'].append(all_cities[j])
cv_splits.append(split)
return cv_splits
def make_split_coarse(img_path):
'''
Create a train/val split for coarse
return: city split in train
'''
all_cities = os.listdir(img_path)
all_cities = sorted(all_cities) # needs to always be the same
val_cities = [] # Can manually set cities to not be included into train split
split = {}
split['val'] = val_cities
split['train'] = [c for c in all_cities if c not in val_cities]
return split
def make_test_split(img_dir_name):
test_path = os.path.join(root, img_dir_name, 'leftImg8bit', 'test')
test_cities = ['test/' + c for c in os.listdir(test_path)]
return test_cities
def make_dataset(quality, mode, maxSkip=0, fine_coarse_mult=6, cv_split=0):
'''
Assemble list of images + mask files
fine - modes: train/val/test/trainval cv:0,1,2
coarse - modes: train/val cv:na
path examples:
leftImg8bit_trainextra/leftImg8bit/train_extra/augsburg
gtCoarse/gtCoarse/train_extra/augsburg
'''
items = []
aug_items = []
if quality == 'fine':
assert mode in ['train', 'val', 'test', 'trainval']
img_dir_name = 'leftImg8bit_trainvaltest'
img_path = os.path.join(root, img_dir_name, 'leftImg8bit')
mask_path = os.path.join(root, 'gtFine_trainvaltest', 'gtFine')
mask_postfix = '_gtFine_labelIds.png'
cv_splits = make_cv_splits(img_dir_name)
if mode == 'trainval':
modes = ['train', 'val']
else:
modes = [mode]
for mode in modes:
if mode == 'test':
cv_splits = make_test_split(img_dir_name)
add_items(items, aug_items, cv_splits, img_path, mask_path,
mask_postfix, mode, maxSkip)
else:
logging.info('{} fine cities: '.format(mode) + str(cv_splits[cv_split][mode]))
add_items(items, aug_items, cv_splits[cv_split][mode], img_path, mask_path,
mask_postfix, mode, maxSkip)
else:
raise ValueError('unknown cityscapes quality {}'.format(quality))
logging.info('Cityscapes-{}: {} images'.format(mode, len(items)+len(aug_items)))
return items, aug_items
class CityScapes(data.Dataset):
def __init__(self, quality, mode, maxSkip=0, joint_transform=None, sliding_crop=None,
transform=None, target_transform=None, dump_images=False,
cv_split=None, eval_mode=False,
eval_scales=None, eval_flip=False):
self.quality = quality
self.mode = mode
self.maxSkip = maxSkip
self.joint_transform = joint_transform
self.sliding_crop = sliding_crop
self.transform = transform
self.target_transform = target_transform
self.dump_images = dump_images
self.eval_mode = eval_mode
self.eval_flip = eval_flip
self.eval_scales = None
if eval_scales is not None:
self.eval_scales = [float(scale) for scale in eval_scales.split(",")]
if cv_split:
self.cv_split = cv_split
assert cv_split < cfg.DATASET.CV_SPLITS, \
'expected cv_split {} to be < CV_SPLITS {}'.format(
cv_split, cfg.DATASET.CV_SPLITS)
else:
self.cv_split = 0
self.imgs, _ = make_dataset(quality, mode, self.maxSkip, cv_split=self.cv_split)
if len(self.imgs) == 0:
raise RuntimeError('Found 0 images, please check the data set')
self.mean_std = ([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
def _eval_get_item(self, img, mask, scales, flip_bool):
return_imgs = []
for flip in range(int(flip_bool)+1):
imgs = []
if flip :
img = img.transpose(Image.FLIP_LEFT_RIGHT)
for scale in scales:
w,h = img.size
target_w, target_h = int(w * scale), int(h * scale)
resize_img = img.resize((target_w, target_h))
tensor_img = transforms.ToTensor()(resize_img)
final_tensor = transforms.Normalize(*self.mean_std)(tensor_img)
imgs.append(final_tensor)
return_imgs.append(imgs)
return return_imgs, mask
def __getitem__(self, index):
img_path, mask_path = self.imgs[index]
img, mask = Image.open(img_path).convert('RGB'), Image.open(mask_path)
img_name = os.path.splitext(os.path.basename(img_path))[0]
mask = np.array(mask)
mask_copy = mask.copy()
for k, v in id_to_trainid.items():
mask_copy[mask == k] = v
if self.eval_mode:
return self._eval_get_item(img, mask_copy, self.eval_scales, self.eval_flip), img_name
mask = Image.fromarray(mask_copy.astype(np.uint8))
# Image Transformations
if self.joint_transform is not None:
img, mask = self.joint_transform(img, mask)
if self.transform is not None:
img = self.transform(img)
if self.target_transform is not None:
mask = self.target_transform(mask)
_edgemap = mask.numpy()
_edgemap = edge_utils.mask_to_onehot(_edgemap, num_classes)
_edgemap = edge_utils.onehot_to_binary_edges(_edgemap, 2, num_classes)
edgemap = torch.from_numpy(_edgemap).float()
# Debug
if self.dump_images:
outdir = '../../dump_imgs_{}'.format(self.mode)
os.makedirs(outdir, exist_ok=True)
out_img_fn = os.path.join(outdir, img_name + '.png')
out_msk_fn = os.path.join(outdir, img_name + '_mask.png')
mask_img = colorize_mask(np.array(mask))
img.save(out_img_fn)
mask_img.save(out_msk_fn)
return img, mask, edgemap, img_name
def __len__(self):
return len(self.imgs)
def make_dataset_video():
img_dir_name = 'leftImg8bit_demoVideo'
img_path = os.path.join(root, img_dir_name, 'leftImg8bit/demoVideo')
items = []
categories = os.listdir(img_path)
for c in categories[1:]:
c_items = [name.split('_leftImg8bit.png')[0] for name in
os.listdir(os.path.join(img_path, c))]
for it in c_items:
item = os.path.join(img_path, c, it + '_leftImg8bit.png')
items.append(item)
return items
class CityScapesVideo(data.Dataset):
def __init__(self, transform=None):
self.imgs = make_dataset_video()
if len(self.imgs) == 0:
raise RuntimeError('Found 0 images, please check the data set')
self.transform = transform
def __getitem__(self, index):
img_path = self.imgs[index]
img = Image.open(img_path).convert('RGB')
img_name = os.path.splitext(os.path.basename(img_path))[0]
if self.transform is not None:
img = self.transform(img)
return img, img_name
def __len__(self):
return len(self.imgs)
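The rotating cross-validation logic in `make_cv_splits` above slides a fixed-size validation window across the sorted city list. A standalone sketch, where `CV_SPLITS` and the city names are hypothetical stand-ins for `cfg.DATASET.CV_SPLITS` and the directory listing:

```python
# Hypothetical stand-in for cfg.DATASET.CV_SPLITS.
CV_SPLITS = 3

def make_cv_splits(all_cities, num_val_cities):
    # Rotate a validation window of num_val_cities across the city list;
    # split 0 keeps the original val cities (listed first) as validation.
    splits = []
    n = len(all_cities)
    for split_idx in range(CV_SPLITS):
        offset = split_idx * n // CV_SPLITS
        val = [c for j, c in enumerate(all_cities)
               if offset <= j < offset + num_val_cities]
        train = [c for j, c in enumerate(all_cities)
                 if not (offset <= j < offset + num_val_cities)]
        splits.append({'train': train, 'val': val})
    return splits

cities = ['val/a', 'train/b', 'train/c', 'train/d', 'train/e', 'train/f']
splits = make_cv_splits(cities, num_val_cities=1)
```

As in the original, the window does not wrap around the end of the list, so `num_val_cities` should stay small relative to the total.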
================================================
FILE: benchmarks/GSCNN-master/datasets/cityscapes_labels.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# File taken from https://github.com/mcordts/cityscapesScripts/
# License File Available at:
# https://github.com/mcordts/cityscapesScripts/blob/master/license.txt
# ----------------------
# The Cityscapes Dataset
# ----------------------
#
#
# License agreement
# -----------------
#
# This dataset is made freely available to academic and non-academic entities for non-commercial purposes such as academic research, teaching, scientific publications, or personal experimentation. Permission is granted to use the data given that you agree:
#
# 1. That the dataset comes "AS IS", without express or implied warranty. Although every effort has been made to ensure accuracy, we (Daimler AG, MPI Informatics, TU Darmstadt) do not accept any responsibility for errors or omissions.
# 2. That you include a reference to the Cityscapes Dataset in any work that makes use of the dataset. For research papers, cite our preferred publication as listed on our website; for other media cite our preferred publication as listed on our website or link to the Cityscapes website.
# 3. That you do not distribute this dataset or modified versions. It is permissible to distribute derivative works in as far as they are abstract representations of this dataset (such as models trained on it or additional annotations that do not directly include any of our data) and do not allow to recover the dataset or something similar in character.
# 4. That you may not use the dataset or any derivative work for commercial purposes as, for example, licensing or selling the data, or using the data with a purpose to procure a commercial gain.
# 5. That all rights not expressly granted to you are reserved by us (Daimler AG, MPI Informatics, TU Darmstadt).
#
#
# Contact
# -------
#
# Marius Cordts, Mohamed Omran
# www.cityscapes-dataset.net
"""
from collections import namedtuple
#--------------------------------------------------------------------------------
# Definitions
#--------------------------------------------------------------------------------
# a label and all meta information
Label = namedtuple( 'Label' , [
'name' , # The identifier of this label, e.g. 'car', 'person', ... .
# We use them to uniquely name a class
'id' , # An integer ID that is associated with this label.
# The IDs are used to represent the label in ground truth images
# An ID of -1 means that this label does not have an ID and thus
# is ignored when creating ground truth images (e.g. license plate).
# Do not modify these IDs, since exactly these IDs are expected by the
# evaluation server.
'trainId' , # Feel free to modify these IDs as suitable for your method. Then create
# ground truth images with train IDs, using the tools provided in the
# 'preparation' folder. However, make sure to validate or submit results
# to our evaluation server using the regular IDs above!
# For trainIds, multiple labels might have the same ID. Then, these labels
# are mapped to the same class in the ground truth images. For the inverse
# mapping, we use the label that is defined first in the list below.
# For example, mapping all void-type classes to the same ID in training,
# might make sense for some approaches.
# Max value is 255!
'category' , # The name of the category that this label belongs to
'categoryId' , # The ID of this category. Used to create ground truth images
# on category level.
'hasInstances', # Whether this label distinguishes between single instances or not
'ignoreInEval', # Whether pixels having this class as ground truth label are ignored
# during evaluations or not
'color' , # The color of this label
] )
#--------------------------------------------------------------------------------
# A list of all labels
#--------------------------------------------------------------------------------
# Please adapt the train IDs as appropriate for your approach.
# Note that you might want to ignore labels with ID 255 during training.
# Further note that the current train IDs are only a suggestion. You can use whatever you like.
# Make sure to provide your results using the original IDs and not the training IDs.
# Note that many IDs are ignored in evaluation and thus you never need to predict these!
labels = [
# name id trainId category catId hasInstances ignoreInEval color
Label( 'unlabeled' , 0 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ),
Label( 'ego vehicle' , 1 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ),
Label( 'rectification border' , 2 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ),
Label( 'out of roi' , 3 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ),
Label( 'static' , 4 , 255 , 'void' , 0 , False , True , ( 0, 0, 0) ),
Label( 'dynamic' , 5 , 255 , 'void' , 0 , False , True , (111, 74, 0) ),
Label( 'ground' , 6 , 255 , 'void' , 0 , False , True , ( 81, 0, 81) ),
Label( 'road' , 7 , 0 , 'flat' , 1 , False , False , (128, 64,128) ),
Label( 'sidewalk' , 8 , 1 , 'flat' , 1 , False , False , (244, 35,232) ),
Label( 'parking' , 9 , 255 , 'flat' , 1 , False , True , (250,170,160) ),
Label( 'rail track' , 10 , 255 , 'flat' , 1 , False , True , (230,150,140) ),
Label( 'building' , 11 , 2 , 'construction' , 2 , False , False , ( 70, 70, 70) ),
Label( 'wall' , 12 , 3 , 'construction' , 2 , False , False , (102,102,156) ),
Label( 'fence' , 13 , 4 , 'construction' , 2 , False , False , (190,153,153) ),
Label( 'guard rail' , 14 , 255 , 'construction' , 2 , False , True , (180,165,180) ),
Label( 'bridge' , 15 , 255 , 'construction' , 2 , False , True , (150,100,100) ),
Label( 'tunnel' , 16 , 255 , 'construction' , 2 , False , True , (150,120, 90) ),
Label( 'pole' , 17 , 5 , 'object' , 3 , False , False , (153,153,153) ),
Label( 'polegroup' , 18 , 255 , 'object' , 3 , False , True , (153,153,153) ),
Label( 'traffic light' , 19 , 6 , 'object' , 3 , False , False , (250,170, 30) ),
Label( 'traffic sign' , 20 , 7 , 'object' , 3 , False , False , (220,220, 0) ),
Label( 'vegetation' , 21 , 8 , 'nature' , 4 , False , False , (107,142, 35) ),
Label( 'terrain' , 22 , 9 , 'nature' , 4 , False , False , (152,251,152) ),
Label( 'sky' , 23 , 10 , 'sky' , 5 , False , False , ( 70,130,180) ),
Label( 'person' , 24 , 11 , 'human' , 6 , True , False , (220, 20, 60) ),
Label( 'rider' , 25 , 12 , 'human' , 6 , True , False , (255, 0, 0) ),
Label( 'car' , 26 , 13 , 'vehicle' , 7 , True , False , ( 0, 0,142) ),
Label( 'truck' , 27 , 14 , 'vehicle' , 7 , True , False , ( 0, 0, 70) ),
Label( 'bus' , 28 , 15 , 'vehicle' , 7 , True , False , ( 0, 60,100) ),
Label( 'caravan' , 29 , 255 , 'vehicle' , 7 , True , True , ( 0, 0, 90) ),
Label( 'trailer' , 30 , 255 , 'vehicle' , 7 , True , True , ( 0, 0,110) ),
Label( 'train' , 31 , 16 , 'vehicle' , 7 , True , False , ( 0, 80,100) ),
Label( 'motorcycle' , 32 , 17 , 'vehicle' , 7 , True , False , ( 0, 0,230) ),
Label( 'bicycle' , 33 , 18 , 'vehicle' , 7 , True , False , (119, 11, 32) ),
Label( 'license plate' , -1 , -1 , 'vehicle' , 7 , False , True , ( 0, 0,142) ),
Label( 'license plate' , 34 , 255 , 'vehicle' , 7 , False , True , ( 0, 0,142) ),
]
#--------------------------------------------------------------------------------
# Create dictionaries for a fast lookup
#--------------------------------------------------------------------------------
# Please refer to the main method below for example usages!
# name to label object
name2label = { label.name : label for label in labels }
# id to label object
id2label = { label.id : label for label in labels }
# trainId to label object
trainId2label = { label.trainId : label for label in reversed(labels) }
# label2trainid
label2trainid = { label.id : label.trainId for label in labels }
# trainId to label object
trainId2name = { label.trainId : label.name for label in labels }
trainId2color = { label.trainId : label.color for label in labels }
# category to list of label objects
category2labels = {}
for label in labels:
category = label.category
if category in category2labels:
category2labels[category].append(label)
else:
category2labels[category] = [label]
#--------------------------------------------------------------------------------
# Assure single instance name
#--------------------------------------------------------------------------------
# returns the label name that describes a single instance (if possible)
# e.g. input | output
# ----------------------
# car | car
# cargroup | car
# foo | None
# foogroup | None
# skygroup | None
def assureSingleInstanceName( name ):
# if the name is known, it is not a group
if name in name2label:
return name
# test if the name actually denotes a group
if not name.endswith("group"):
return None
# remove group
name = name[:-len("group")]
# test if the new name exists
if not name in name2label:
return None
# test if the new name denotes a label that actually has instances
if not name2label[name].hasInstances:
return None
# all good then
return name
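The group-name resolution above can be exercised on its own. This sketch inlines a hypothetical two-entry subset of the label table to show the three possible outcomes (known name, instance group, everything else):

```python
from collections import namedtuple

MiniLabel = namedtuple('MiniLabel', ['name', 'hasInstances'])

# Hypothetical two-entry subset of the full Cityscapes label table.
name2label = {
    'car': MiniLabel('car', True),    # instance class
    'sky': MiniLabel('sky', False),   # stuff class
}

def assureSingleInstanceName(name):
    # known names are not groups
    if name in name2label:
        return name
    # only '...group' suffixes can denote groups
    if not name.endswith('group'):
        return None
    name = name[:-len('group')]
    if name not in name2label:
        return None
    # groups only make sense for classes that have instances
    if not name2label[name].hasInstances:
        return None
    return name
```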
#--------------------------------------------------------------------------------
# Main for testing
#--------------------------------------------------------------------------------
# just a dummy main
if __name__ == "__main__":
# Print all the labels
print("List of cityscapes labels:")
print("")
print((" {:>21} | {:>3} | {:>7} | {:>14} | {:>10} | {:>12} | {:>12}".format( 'name', 'id', 'trainId', 'category', 'categoryId', 'hasInstances', 'ignoreInEval' )))
print((" " + ('-' * 98)))
for label in labels:
print((" {:>21} | {:>3} | {:>7} | {:>14} | {:>10} | {:>12} | {:>12}".format( label.name, label.id, label.trainId, label.category, label.categoryId, label.hasInstances, label.ignoreInEval )))
print("")
print("Example usages:")
# Map from name to label
name = 'car'
id = name2label[name].id
print(("ID of label '{name}': {id}".format( name=name, id=id )))
# Map from ID to label
category = id2label[id].category
print(("Category of label with ID '{id}': {category}".format( id=id, category=category )))
# Map from trainID to label
trainId = 0
name = trainId2label[trainId].name
print(("Name of label with trainID '{id}': {name}".format( id=trainId, name=name )))
================================================
FILE: benchmarks/GSCNN-master/datasets/edge_utils.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
import os
import numpy as np
from PIL import Image
from scipy.ndimage import distance_transform_edt
def mask_to_onehot(mask, num_classes):
"""
Converts a segmentation mask (H,W) to (K,H,W), where the first
dimension is a one-hot encoding over the K classes
"""
_mask = [mask == (i + 1) for i in range(num_classes)]
return np.array(_mask).astype(np.uint8)
def onehot_to_mask(mask):
"""
Converts a mask (K,H,W) to (H,W)
"""
_mask = np.argmax(mask, axis=0)
_mask[_mask != 0] += 1
return _mask
def onehot_to_multiclass_edges(mask, radius, num_classes):
"""
Converts a segmentation mask (K,H,W) to an edgemap (K,H,W)
"""
if radius < 0:
return mask
# We need to pad the borders for boundary conditions
mask_pad = np.pad(mask, ((0, 0), (1, 1), (1, 1)), mode='constant', constant_values=0)
channels = []
for i in range(num_classes):
dist = distance_transform_edt(mask_pad[i, :])+distance_transform_edt(1.0-mask_pad[i, :])
dist = dist[1:-1, 1:-1]
dist[dist > radius] = 0
dist = (dist > 0).astype(np.uint8)
channels.append(dist)
return np.array(channels)
def onehot_to_binary_edges(mask, radius, num_classes):
"""
Converts a segmentation mask (K,H,W) to a binary edgemap (H,W)
"""
if radius < 0:
return mask
# We need to pad the borders for boundary conditions
mask_pad = np.pad(mask, ((0, 0), (1, 1), (1, 1)), mode='constant', constant_values=0)
edgemap = np.zeros(mask.shape[1:])
for i in range(num_classes):
dist = distance_transform_edt(mask_pad[i, :])+distance_transform_edt(1.0-mask_pad[i, :])
dist = dist[1:-1, 1:-1]
dist[dist > radius] = 0
edgemap += dist
edgemap = np.expand_dims(edgemap, axis=0)
edgemap = (edgemap > 0).astype(np.uint8)
return edgemap
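Assuming scipy is available, the one-hot to binary-edge pipeline above can be exercised on a toy mask. Note the `mask == (i + 1)` convention: class labels are assumed to start at 1, and the edge map marks pixels within `radius` of a class boundary (the padded image border counts as background):

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def mask_to_onehot(mask, num_classes):
    # (H, W) int mask -> (K, H, W) one-hot; labels assumed to start at 1
    return np.array([mask == (i + 1) for i in range(num_classes)]).astype(np.uint8)

def onehot_to_binary_edges(mask, radius, num_classes):
    # (K, H, W) one-hot -> (1, H, W) binary edge map within `radius` pixels
    mask_pad = np.pad(mask, ((0, 0), (1, 1), (1, 1)), mode='constant')
    edgemap = np.zeros(mask.shape[1:])
    for i in range(num_classes):
        # sum of distances to the region and its complement peaks at boundaries
        dist = distance_transform_edt(mask_pad[i]) + distance_transform_edt(1.0 - mask_pad[i])
        dist = dist[1:-1, 1:-1]
        dist[dist > radius] = 0
        edgemap += dist
    return (edgemap[None] > 0).astype(np.uint8)

mask = np.array([[1, 1, 2, 2]] * 4)   # vertical class boundary between cols 1 and 2
edges = onehot_to_binary_edges(mask_to_onehot(mask, 2), 2, 2)
```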
================================================
FILE: benchmarks/GSCNN-master/datasets/rellis.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
import os
import numpy as np
import torch
from PIL import Image
from torch.utils import data
from collections import defaultdict
import math
import logging
import datasets.cityscapes_labels as cityscapes_labels
import json
from config import cfg
import torchvision.transforms as transforms
import datasets.edge_utils as edge_utils
trainid_to_name = cityscapes_labels.trainId2name
id_to_trainid = cityscapes_labels.label2trainid
num_classes = 19
ignore_label = 0
root = cfg.DATASET.RELLIS_DIR
list_paths = {'train':'train.lst','val':"val.lst",'test':'test.lst'}
palette = [128, 64, 128, 244, 35, 232, 70, 70, 70, 102, 102, 156, 190, 153, 153,
153, 153, 153, 250, 170, 30,
220, 220, 0, 107, 142, 35, 152, 251, 152, 70, 130, 180, 220, 20, 60,
255, 0, 0, 0, 0, 142, 0, 0, 70,
0, 60, 100, 0, 80, 100, 0, 0, 230, 119, 11, 32]
zero_pad = 256 * 3 - len(palette)
for i in range(zero_pad):
palette.append(0)
def colorize_mask(mask):
# mask: numpy array of the mask
new_mask = Image.fromarray(mask.astype(np.uint8)).convert('P')
new_mask.putpalette(palette)
return new_mask
class Rellis(data.Dataset):
def __init__(self, mode, joint_transform=None, sliding_crop=None,
transform=None, target_transform=None, dump_images=False,
cv_split=None, eval_mode=False,
eval_scales=None, eval_flip=False):
self.mode = mode
self.joint_transform = joint_transform
self.sliding_crop = sliding_crop
self.transform = transform
self.target_transform = target_transform
self.dump_images = dump_images
self.eval_mode = eval_mode
self.eval_flip = eval_flip
self.eval_scales = None
self.root = root
if eval_scales is not None:
self.eval_scales = [float(scale) for scale in eval_scales.split(",")]
self.list_path = list_paths[mode]
self.img_list = [line.strip().split() for line in open(root+self.list_path)]
self.files = self.read_files()
if len(self.files) == 0:
raise RuntimeError('Found 0 images, please check the data set')
self.mean_std = ([0.54218053, 0.64250553, 0.56620195], [0.54218052, 0.64250552, 0.56620194])
self.label_mapping = {0: 0,
1: 0,
3: 1,
4: 2,
5: 3,
6: 4,
7: 5,
8: 6,
9: 7,
10: 8,
12: 9,
15: 10,
17: 11,
18: 12,
19: 13,
23: 14,
27: 15,
29: 1,
30: 1,
31: 16,
32: 4,
33: 17,
34: 18}
def _eval_get_item(self, img, mask, scales, flip_bool):
return_imgs = []
for flip in range(int(flip_bool)+1):
imgs = []
if flip :
img = img.transpose(Image.FLIP_LEFT_RIGHT)
for scale in scales:
w,h = img.size
target_w, target_h = int(w * scale), int(h * scale)
resize_img = img.resize((target_w, target_h))
tensor_img = transforms.ToTensor()(resize_img)
final_tensor = transforms.Normalize(*self.mean_std)(tensor_img)
imgs.append(final_tensor)
return_imgs.append(imgs)
return return_imgs, mask
def read_files(self):
files = []
# if 'test' in self.mode:
# for item in self.img_list:
# image_path = item
# name = os.path.splitext(os.path.basename(image_path[0]))[0]
# files.append({
# "img": image_path[0],
# "name": name,
# })
# else:
for item in self.img_list:
image_path, label_path = item
name = os.path.splitext(os.path.basename(label_path))[0]
files.append({
"img": image_path,
"label": label_path,
"name": name,
"weight": 1
})
return files
def convert_label(self, label, inverse=False):
temp = label.copy()
if inverse:
for v, k in self.label_mapping.items():
label[temp == k] = v
else:
for k, v in self.label_mapping.items():
label[temp == k] = v
return label
def __getitem__(self, index):
item = self.files[index]
img_name = item["name"]
img_path = self.root + item['img']
label_path = self.root + item["label"]
img = Image.open(img_path).convert('RGB')
mask = np.array(Image.open(label_path))
mask = mask[:, :]
mask_copy = self.convert_label(mask)
if self.eval_mode:
return self._eval_get_item(img, mask_copy, self.eval_scales, self.eval_flip), img_name
mask = Image.fromarray(mask_copy.astype(np.uint8))
# Image Transformations
if self.joint_transform is not None:
img, mask = self.joint_transform(img, mask)
if self.transform is not None:
img = self.transform(img)
if self.target_transform is not None:
mask = self.target_transform(mask)
if self.mode == 'test':
return img, mask, img_name, item['img']
_edgemap = mask.numpy()
_edgemap = edge_utils.mask_to_onehot(_edgemap, num_classes)
_edgemap = edge_utils.onehot_to_binary_edges(_edgemap, 2, num_classes)
edgemap = torch.from_numpy(_edgemap).float()
# Debug
if self.dump_images:
outdir = '../../dump_imgs_{}'.format(self.mode)
os.makedirs(outdir, exist_ok=True)
out_img_fn = os.path.join(outdir, img_name + '.png')
out_msk_fn = os.path.join(outdir, img_name + '_mask.png')
mask_img = colorize_mask(np.array(mask))
img.save(out_img_fn)
mask_img.save(out_msk_fn)
return img, mask, edgemap, img_name
def __len__(self):
return len(self.files)
def make_dataset_video():
img_dir_name = 'leftImg8bit_demoVideo'
img_path = os.path.join(root, img_dir_name, 'leftImg8bit/demoVideo')
items = []
categories = os.listdir(img_path)
for c in categories[1:]:
c_items = [name.split('_leftImg8bit.png')[0] for name in
os.listdir(os.path.join(img_path, c))]
for it in c_items:
item = os.path.join(img_path, c, it + '_leftImg8bit.png')
items.append(item)
return items
class CityScapesVideo(data.Dataset):
def __init__(self, transform=None):
self.imgs = make_dataset_video()
if len(self.imgs) == 0:
raise RuntimeError('Found 0 images, please check the data set')
self.transform = transform
def __getitem__(self, index):
img_path = self.imgs[index]
img = Image.open(img_path).convert('RGB')
img_name = os.path.splitext(os.path.basename(img_path))[0]
if self.transform is not None:
img = self.transform(img)
return img, img_name
def __len__(self):
return len(self.imgs)
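The `convert_label` remapping above reads from an unmodified copy (`temp`) so that chained mapping entries cannot re-match pixels already rewritten. A standalone sketch using a hypothetical three-entry subset of the RELLIS `label_mapping`:

```python
import numpy as np

# Hypothetical three-entry subset of the full RELLIS label_mapping above.
label_mapping = {0: 0, 3: 1, 4: 2}

def convert_label(label, inverse=False):
    # Match against an unmodified snapshot so earlier rewrites cannot
    # be picked up by later mapping entries.
    temp = label.copy()
    if inverse:
        for v, k in label_mapping.items():
            label[temp == k] = v   # train id -> raw id
    else:
        for k, v in label_mapping.items():
            label[temp == k] = v   # raw id -> train id
    return label

raw = np.array([[0, 3], [4, 3]], dtype=np.uint8)
train = convert_label(raw.copy())
```

Note the inverse direction is only well-defined when the mapping is injective; in the full table several raw ids (e.g. 29 and 30) share a train id, so round-tripping those is lossy.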
================================================
FILE: benchmarks/GSCNN-master/docs/index.html
================================================
Gated Shape CNN
Gated-SCNN: Gated Shape CNNs for Semantic Segmentation
Current state-of-the-art methods for image segmentation form a dense image representation where the color, shape and texture information are all processed together inside a deep CNN. This, however, may not be ideal, as these cues carry very different types of information relevant for recognition. We propose a new architecture that adds a shape stream to the classical CNN architecture. The two streams process the image in parallel, and their information gets fused in the very top layers. Key to this architecture is a new type of gate that connects the intermediate layers of the two streams. Specifically, we use the higher-level activations in the classical stream to gate the lower-level activations in the shape stream, effectively removing noise and helping the shape stream focus only on processing the relevant boundary-related information. This enables us to use a very shallow architecture for the shape stream that operates at the image-level resolution. Our experiments show that this leads to a highly effective architecture that produces sharper predictions around object boundaries and significantly boosts performance on thinner and smaller objects. Our method achieves state-of-the-art performance on the Cityscapes benchmark, in terms of both mask (mIoU) and boundary (F-score) quality, improving by 2% and 4% over strong baselines.
Evaluation at different distances, measured by crop factor.
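The gating described above (classical-stream activations modulating the shape stream) can be illustrated with a small numpy sketch. In the real GSCNN the gate is produced by a learned 1x1 convolution followed by a sigmoid; the fixed elementwise sum here is only a stand-in for that learned projection:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
H = W = 8
shape_feat = rng.standard_normal((1, H, W))  # low-level shape-stream activation
seg_feat = rng.standard_normal((1, H, W))    # higher-level classical-stream activation

# Stand-in for the learned 1x1-conv projection over concatenated features.
gate = sigmoid(shape_feat + seg_feat)

# Elementwise gating: activations are attenuated wherever the gate is small,
# so the shape stream keeps only boundary-relevant responses.
gated = shape_feat * gate
```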
================================================
FILE: benchmarks/GSCNN-master/docs/resources/bibtex.txt
================================================
@inproceedings{Takikawa2019GatedSCNNGS,
  title={Gated-SCNN: Gated Shape CNNs for Semantic Segmentation},
  author={Towaki Takikawa and David Acuna and Varun Jampani and Sanja Fidler},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2019}
}
================================================
FILE: benchmarks/GSCNN-master/gscnn.txt
================================================
absl-py==0.10.0
aiohttp==3.6.2
astor==0.8.1
astunparse==1.6.3
async-timeout==3.0.1
attrs==20.2.0
cachetools==4.1.1
certifi==2020.6.20
chardet==3.0.4
cycler==0.10.0
decorator==4.4.2
future==0.18.2
gast==0.2.2
google-auth==1.22.0
google-auth-oauthlib==0.4.1
google-pasta==0.2.0
grpcio==1.32.0
h5py==2.10.0
idna==2.10
idna-ssl==1.1.0
imageio==2.9.0
imageio-ffmpeg==0.4.2
importlib-metadata==2.0.0
joblib==0.16.0
Keras-Applications==1.0.8
Keras-Preprocessing==1.1.2
kiwisolver==1.2.0
Markdown==3.2.2
matplotlib==3.3.2
multidict==4.7.6
networkx==2.5
ninja==1.10.0.post2
nose==1.3.7
numpy==1.18.5
oauthlib==3.1.0
opencv-python==4.4.0.44
opt-einsum==3.3.0
Pillow==7.2.0
portalocker==2.0.0
protobuf==3.13.0
pyasn1==0.4.8
pyasn1-modules==0.2.8
pyparsing==2.4.7
python-dateutil==2.8.1
PyWavelets==1.1.1
PyYAML==5.3.1
requests==2.24.0
requests-oauthlib==1.3.0
rsa==4.6
scikit-image==0.17.2
scikit-learn==0.23.2
scipy==1.5.2
six==1.15.0
tensorboard==1.15.0
tensorboard-plugin-wit==1.7.0
tensorboardX==2.1
tensorflow-estimator==1.15.1
tensorflow-gpu==1.15.0
termcolor==1.1.0
threadpoolctl==2.1.0
tifffile==2020.9.3
torch==1.4.0
torch-encoding @ git+https://github.com/zhanghang1989/PyTorch-Encoding/@ced288d6fa10d4780fa5205a2f239c84022e71a3
torchvision==0.5.0
tqdm==4.49.0
typing-extensions==3.7.4.3
urllib3==1.25.10
Werkzeug==1.0.1
wrapt==1.12.1
yacs==0.1.8
yarl==1.6.0
zipp==3.2.0
================================================
FILE: benchmarks/GSCNN-master/loss.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import logging
import numpy as np
from config import cfg
from my_functionals.DualTaskLoss import DualTaskLoss
def get_loss(args):
'''
Get the criterion based on the loss function
args:
return: criterion
'''
if args.img_wt_loss:
criterion = ImageBasedCrossEntropyLoss2d(
classes=args.dataset_cls.num_classes, size_average=True,
ignore_index=args.dataset_cls.ignore_label,
upper_bound=args.wt_bound).cuda()
elif args.joint_edgeseg_loss:
criterion = JointEdgeSegLoss(classes=args.dataset_cls.num_classes,
ignore_index=args.dataset_cls.ignore_label, upper_bound=args.wt_bound,
edge_weight=args.edge_weight, seg_weight=args.seg_weight, att_weight=args.att_weight, dual_weight=args.dual_weight).cuda()
else:
criterion = CrossEntropyLoss2d(size_average=True,
ignore_index=args.dataset_cls.ignore_label).cuda()
criterion_val = JointEdgeSegLoss(classes=args.dataset_cls.num_classes, mode='val',
ignore_index=args.dataset_cls.ignore_label, upper_bound=args.wt_bound,
edge_weight=args.edge_weight, seg_weight=args.seg_weight).cuda()
return criterion, criterion_val
class JointEdgeSegLoss(nn.Module):
def __init__(self, classes, weight=None, reduction='mean', ignore_index=255,
norm=False, upper_bound=1.0, mode='train',
edge_weight=1, seg_weight=1, att_weight=1, dual_weight=1, edge='none'):
super(JointEdgeSegLoss, self).__init__()
self.num_classes = classes
if mode == 'train':
self.seg_loss = ImageBasedCrossEntropyLoss2d(
classes=classes, ignore_index=ignore_index, upper_bound=upper_bound).cuda()
elif mode == 'val':
self.seg_loss = CrossEntropyLoss2d(size_average=True,
ignore_index=ignore_index).cuda()
self.ignore_index = ignore_index
self.edge_weight = edge_weight
self.seg_weight = seg_weight
self.att_weight = att_weight
self.dual_weight = dual_weight
self.dual_task = DualTaskLoss()
def bce2d(self, input, target):
n, c, h, w = input.size()
log_p = input.transpose(1, 2).transpose(2, 3).contiguous().view(1, -1)
target_t = target.transpose(1, 2).transpose(2, 3).contiguous().view(1, -1)
target_trans = target_t.clone()
pos_index = (target_t == 1)
neg_index = (target_t == 0)
ignore_index = (target_t > 1)
target_trans[pos_index] = 1
target_trans[neg_index] = 0
pos_index = pos_index.data.cpu().numpy().astype(bool)
neg_index = neg_index.data.cpu().numpy().astype(bool)
ignore_index=ignore_index.data.cpu().numpy().astype(bool)
weight = torch.Tensor(log_p.size()).fill_(0)
weight = weight.numpy()
pos_num = pos_index.sum()
neg_num = neg_index.sum()
sum_num = pos_num + neg_num
weight[pos_index] = neg_num*1.0 / sum_num
weight[neg_index] = pos_num*1.0 / sum_num
weight[ignore_index] = 0
weight = torch.from_numpy(weight)
weight = weight.cuda()
loss = F.binary_cross_entropy_with_logits(log_p, target_t, weight, reduction='mean')
return loss
def edge_attention(self, input, target, edge):
n, c, h, w = input.size()
filler = torch.ones_like(target) * self.ignore_index
return self.seg_loss(input,
torch.where(edge.max(1)[0] > 0.8, target, filler))
def forward(self, inputs, targets):
segin, edgein = inputs
segmask, edgemask = targets
losses = {}
losses['seg_loss'] = self.seg_weight * self.seg_loss(segin, segmask)
losses['edge_loss'] = self.edge_weight * 20 * self.bce2d(edgein, edgemask)
losses['att_loss'] = self.att_weight * self.edge_attention(segin, segmask, edgein)
losses['dual_loss'] = self.dual_weight * self.dual_task(segin, segmask,ignore_pixel=self.ignore_index)
return losses
#Img Weighted Loss
class ImageBasedCrossEntropyLoss2d(nn.Module):
def __init__(self, classes, weight=None, size_average=True, ignore_index=255,
norm=False, upper_bound=1.0):
super(ImageBasedCrossEntropyLoss2d, self).__init__()
logging.info("Using Per Image based weighted loss")
self.num_classes = classes
self.nll_loss = nn.NLLLoss2d(weight, size_average, ignore_index)
self.norm = norm
self.upper_bound = upper_bound
self.batch_weights = cfg.BATCH_WEIGHTING
def calculateWeights(self, target):
hist = np.histogram(target.flatten(), range(
self.num_classes + 1), density=True)[0]
if self.norm:
hist = ((hist != 0) * self.upper_bound * (1 / hist)) + 1
else:
hist = ((hist != 0) * self.upper_bound * (1 - hist)) + 1
return hist
def forward(self, inputs, targets):
target_cpu = targets.data.cpu().numpy()
#print("loss",np.unique(target_cpu))
if self.batch_weights:
weights = self.calculateWeights(target_cpu)
self.nll_loss.weight = torch.Tensor(weights).cuda()
loss = 0.0
for i in range(0, inputs.shape[0]):
if not self.batch_weights:
weights = self.calculateWeights(target_cpu[i])
self.nll_loss.weight = torch.Tensor(weights).cuda()
loss += self.nll_loss(F.log_softmax(inputs[i].unsqueeze(0), dim=1),
targets[i].unsqueeze(0))
return loss
#Cross Entroply NLL Loss
class CrossEntropyLoss2d(nn.Module):
def __init__(self, weight=None, size_average=True, ignore_index=255):
super(CrossEntropyLoss2d, self).__init__()
logging.info("Using Cross Entropy Loss")
self.nll_loss = nn.NLLLoss2d(weight, size_average, ignore_index)
def forward(self, inputs, targets):
return self.nll_loss(F.log_softmax(inputs, dim=1), targets)
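The per-image weighting in `ImageBasedCrossEntropyLoss2d` above histograms the target labels and upweights rare classes toward `1 + upper_bound` while frequent classes stay near 1. A standalone sketch (using `density=True`, which for these unit-width bins is equivalent to the deprecated `normed=True`):

```python
import numpy as np

def calculate_weights(target, num_classes, upper_bound=1.0):
    # Normalized per-class frequency over this target; bins have unit
    # width, so density=True yields plain class proportions.
    hist = np.histogram(target.flatten(), range(num_classes + 1), density=True)[0]
    # Rare classes (small hist) approach 1 + upper_bound, frequent
    # classes approach 1; absent classes (hist == 0) stay at exactly 1.
    return ((hist != 0) * upper_bound * (1 - hist)) + 1

target = np.array([0] * 90 + [1] * 10)   # class 0 dominates this image
w = calculate_weights(target, num_classes=2)
```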
================================================
FILE: benchmarks/GSCNN-master/my_functionals/DualTaskLoss.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Code adapted from:
# https://github.com/ericjang/gumbel-softmax/blob/3c8584924603869e90ca74ac20a6a03d99a91ef9/Categorical%20VAE.ipynb
#
# MIT License
#
# Copyright (c) 2016 Eric Jang
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
"""
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
from my_functionals.custom_functional import compute_grad_mag
def perturbate_input_(input, n_elements=200):
N, C, H, W = input.shape
assert N == 1
c_ = np.random.randint(0, C, n_elements)  # random_integers was removed from NumPy; randint's upper bound is exclusive
h_ = np.random.randint(0, H, n_elements)
w_ = np.random.randint(0, W, n_elements)
for c_idx in c_:
for h_idx in h_:
for w_idx in w_:
input[0, c_idx, h_idx, w_idx] = 1
return input
def _sample_gumbel(shape, eps=1e-10):
"""
Sample from Gumbel(0, 1)
based on
https://github.com/ericjang/gumbel-softmax/blob/3c8584924603869e90ca74ac20a6a03d99a91ef9/Categorical%20VAE.ipynb ,
(MIT license)
"""
U = torch.rand(shape).cuda()
return - torch.log(eps - torch.log(U + eps))
def _gumbel_softmax_sample(logits, tau=1, eps=1e-10):
"""
Draw a sample from the Gumbel-Softmax distribution
based on
https://github.com/ericjang/gumbel-softmax/blob/3c8584924603869e90ca74ac20a6a03d99a91ef9/Categorical%20VAE.ipynb
(MIT license)
"""
assert logits.dim() == 3
gumbel_noise = _sample_gumbel(logits.size(), eps=eps)
y = logits + gumbel_noise
return F.softmax(y / tau, 1)
def _one_hot_embedding(labels, num_classes):
"""Embedding labels to one-hot form.
Args:
labels: (LongTensor) class labels, sized [N,].
num_classes: (int) number of classes.
Returns:
(tensor) encoded labels, sized [N, #classes].
"""
y = torch.eye(num_classes).cuda()
return y[labels].permute(0,3,1,2)
class DualTaskLoss(nn.Module):
def __init__(self, cuda=False):
super(DualTaskLoss, self).__init__()
self._cuda = cuda
return
def forward(self, input_logits, gts, ignore_pixel=255):
"""
:param input_logits: NxCxHxW
:param gts: NxHxW ground-truth semantic labels
:return: final loss
"""
N, C, H, W = input_logits.shape
th = 1e-8 # 1e-10
eps = 1e-10
ignore_mask = (gts == ignore_pixel).detach()
input_logits = torch.where(ignore_mask.view(N, 1, H, W).expand(N, C, H, W),
torch.zeros(N,C,H,W).cuda(),
input_logits)
gt_semantic_masks = gts.detach()
gt_semantic_masks = torch.where(ignore_mask, torch.zeros(N,H,W).long().cuda(), gt_semantic_masks)
gt_semantic_masks = _one_hot_embedding(gt_semantic_masks, C).detach()
g = _gumbel_softmax_sample(input_logits.view(N, C, -1), tau=0.5)
g = g.reshape((N, C, H, W))
g = compute_grad_mag(g, cuda=self._cuda)
g_hat = compute_grad_mag(gt_semantic_masks, cuda=self._cuda)
g = g.view(N, -1)
#g_hat = g_hat.view(N, -1)
g_hat = g_hat.reshape(N, -1)
loss_ewise = F.l1_loss(g, g_hat, reduction='none')  # the deprecated `reduce` flag is redundant with reduction='none'
p_plus_g_mask = (g >= th).detach().float()
loss_p_plus_g = torch.sum(loss_ewise * p_plus_g_mask) / (torch.sum(p_plus_g_mask) + eps)
p_plus_g_hat_mask = (g_hat >= th).detach().float()
loss_p_plus_g_hat = torch.sum(loss_ewise * p_plus_g_hat_mask) / (torch.sum(p_plus_g_hat_mask) + eps)
total_loss = 0.5 * loss_p_plus_g + 0.5 * loss_p_plus_g_hat
return total_loss
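The `DualTaskLoss` above relies on the Gumbel-Softmax trick to draw a differentiable, near-one-hot sample from the segmentation logits before computing boundary gradients. A minimal NumPy sketch of that sampling step (names and the example logits are illustrative):

```python
import numpy as np

def gumbel_softmax_sample(logits, tau=1.0, eps=1e-10, seed=None):
    """One Gumbel-Softmax sample: softmax((logits + Gumbel noise) / tau)."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(size=np.shape(logits))
    gumbel = -np.log(-np.log(u + eps) + eps)   # Gumbel(0, 1) noise
    y = (np.asarray(logits, dtype=float) + gumbel) / tau
    y = y - y.max(axis=-1, keepdims=True)      # stabilize the softmax
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

probs = gumbel_softmax_sample([2.0, 1.0, 0.1], tau=0.5, seed=0)
# `probs` is a valid probability vector; as tau -> 0 it approaches a one-hot
# sample from the categorical distribution defined by the logits
```

The loss above uses `tau=0.5`, trading off between a sharp (one-hot-like) sample and usable gradients.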
================================================
FILE: benchmarks/GSCNN-master/my_functionals/GatedSpatialConv.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
import torch.nn as nn
import torch
import torch.nn.functional as F
from torch.nn.modules.conv import _ConvNd
from torch.nn.modules.utils import _pair
import numpy as np
import math
import network.mynn as mynn
import my_functionals.custom_functional as myF
class GatedSpatialConv2d(_ConvNd):
def __init__(self, in_channels, out_channels, kernel_size=1, stride=1,
padding=0, dilation=1, groups=1, bias=False):
"""
:param in_channels:
:param out_channels:
:param kernel_size:
:param stride:
:param padding:
:param dilation:
:param groups:
:param bias:
"""
kernel_size = _pair(kernel_size)
stride = _pair(stride)
padding = _pair(padding)
dilation = _pair(dilation)
super(GatedSpatialConv2d, self).__init__(
in_channels, out_channels, kernel_size, stride, padding, dilation,
False, _pair(0), groups, bias, 'zeros')
self._gate_conv = nn.Sequential(
mynn.Norm2d(in_channels+1),
nn.Conv2d(in_channels+1, in_channels+1, 1),
nn.ReLU(),
nn.Conv2d(in_channels+1, 1, 1),
mynn.Norm2d(1),
nn.Sigmoid()
)
def forward(self, input_features, gating_features):
"""
:param input_features: [NxCxHxW] features coming from the shape branch (canny branch).
:param gating_features: [Nx1xHxW] features coming from the texture branch (resnet). Only one channel feature map.
:return:
"""
alphas = self._gate_conv(torch.cat([input_features, gating_features], dim=1))
input_features = (input_features * (alphas + 1))
return F.conv2d(input_features, self.weight, self.bias, self.stride,
self.padding, self.dilation, self.groups)
def reset_parameters(self):
nn.init.xavier_normal_(self.weight)
if self.bias is not None:
nn.init.zeros_(self.bias)
class Conv2dPad(nn.Conv2d):
def forward(self, input):
return myF.conv2d_same(input,self.weight,self.groups)
class HighFrequencyGatedSpatialConv2d(_ConvNd):
def __init__(self, in_channels, out_channels, kernel_size=1, stride=1,
padding=0, dilation=1, groups=1, bias=False):
"""
:param in_channels:
:param out_channels:
:param kernel_size:
:param stride:
:param padding:
:param dilation:
:param groups:
:param bias:
"""
kernel_size = _pair(kernel_size)
stride = _pair(stride)
padding = _pair(padding)
dilation = _pair(dilation)
super(HighFrequencyGatedSpatialConv2d, self).__init__(
in_channels, out_channels, kernel_size, stride, padding, dilation,
False, _pair(0), groups, bias, 'zeros')  # padding_mode argument added, matching the _ConvNd call in GatedSpatialConv2d
self._gate_conv = nn.Sequential(
mynn.Norm2d(in_channels+1),
nn.Conv2d(in_channels+1, in_channels+1, 1),
nn.ReLU(),
nn.Conv2d(in_channels+1, 1, 1),
mynn.Norm2d(1),
nn.Sigmoid()
)
kernel_size = 7
sigma = 3
x_cord = torch.arange(kernel_size).float()
x_grid = x_cord.repeat(kernel_size).view(kernel_size, kernel_size).float()
y_grid = x_grid.t().float()
xy_grid = torch.stack([x_grid, y_grid], dim=-1).float()
mean = (kernel_size - 1)/2.
variance = sigma**2.
gaussian_kernel = (1./(2.*math.pi*variance)) *\
torch.exp(
-torch.sum((xy_grid - mean)**2., dim=-1) /\
(2*variance)
)
gaussian_kernel = gaussian_kernel / torch.sum(gaussian_kernel)
gaussian_kernel = gaussian_kernel.view(1, 1, kernel_size, kernel_size)
gaussian_kernel = gaussian_kernel.repeat(in_channels, 1, 1, 1)
self.gaussian_filter = nn.Conv2d(in_channels=in_channels, out_channels=in_channels, padding=3,
kernel_size=kernel_size, groups=in_channels, bias=False)
self.gaussian_filter.weight.data = gaussian_kernel
self.gaussian_filter.weight.requires_grad = False
self.cw = nn.Conv2d(in_channels * 2, in_channels, 1)
self.procdog = nn.Sequential(
nn.Conv2d(in_channels, in_channels, 1),
mynn.Norm2d(in_channels),
nn.Sigmoid()
)
def forward(self, input_features, gating_features):
"""
:param input_features: [NxCxHxW] features coming from the shape branch (canny branch).
:param gating_features: [Nx1xHxW] features coming from the texture branch (resnet). Only one channel feature map.
:return:
"""
n, c, h, w = input_features.size()
smooth_features = self.gaussian_filter(input_features)
dog_features = input_features - smooth_features
dog_features = self.cw(torch.cat((dog_features, input_features), dim=1))
alphas = self._gate_conv(torch.cat([input_features, gating_features], dim=1))
dog_features = dog_features * (alphas + 1)
return F.conv2d(dog_features, self.weight, self.bias, self.stride,
self.padding, self.dilation, self.groups)
def reset_parameters(self):
nn.init.xavier_normal_(self.weight)
if self.bias is not None:
nn.init.zeros_(self.bias)
def t():
import matplotlib.pyplot as plt
canny_map_filters_in = 8
canny_map = np.random.normal(size=(1, canny_map_filters_in, 10, 10)) # NxCxHxW
resnet_map = np.random.normal(size=(1, 1, 10, 10)) # NxCxHxW
plt.imshow(canny_map[0, 0])
plt.show()
canny_map = torch.from_numpy(canny_map).float()
resnet_map = torch.from_numpy(resnet_map).float()
gconv = GatedSpatialConv2d(canny_map_filters_in, canny_map_filters_in,
kernel_size=3, stride=1, padding=1)
output_map = gconv(canny_map, resnet_map)
print('done')
if __name__ == "__main__":
t()
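`HighFrequencyGatedSpatialConv2d` above builds a fixed 7x7 Gaussian kernel whose blurred output is subtracted from the input to isolate high frequencies (a difference-of-Gaussians). A standalone NumPy sketch of that kernel construction (function name is illustrative):

```python
import numpy as np

def gaussian_kernel_2d(kernel_size=7, sigma=3.0):
    """2D Gaussian evaluated on an integer grid, normalized to sum to 1."""
    coords = np.arange(kernel_size, dtype=float)
    xx, yy = np.meshgrid(coords, coords)
    mean = (kernel_size - 1) / 2.0          # center of the grid
    var = sigma ** 2
    kernel = np.exp(-((xx - mean) ** 2 + (yy - mean) ** 2) / (2 * var))
    kernel /= 2 * np.pi * var               # Gaussian normalizing constant
    return kernel / kernel.sum()            # renormalize so weights sum to 1

k = gaussian_kernel_2d()
# the peak sits at the center entry k[3, 3], and k sums to 1
```

Because the weights sum to 1, convolving with this kernel is a pure low-pass blur, so `input - blur(input)` in the forward pass keeps only edges and fine texture.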
================================================
FILE: benchmarks/GSCNN-master/my_functionals/__init__.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
================================================
FILE: benchmarks/GSCNN-master/my_functionals/custom_functional.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
import torch
import torch.nn.functional as F
from torchvision.transforms.functional import pad
import numpy as np
def calc_pad_same(in_siz, out_siz, stride, ksize):
"""Calculate same padding width.
Args:
ksize: kernel size [I, J].
Returns:
pad_: Actual padding width.
"""
return (out_siz - 1) * stride + ksize - in_siz
def conv2d_same(input, kernel, groups,bias=None,stride=1,padding=0,dilation=1):
n, c, h, w = input.shape
kout, ki_c_g, kh, kw = kernel.shape
pw = calc_pad_same(w, w, 1, kw)
ph = calc_pad_same(h, h, 1, kh)
pw_l = pw // 2
pw_r = pw - pw_l
ph_t = ph // 2
ph_b = ph - ph_t
input_ = F.pad(input, (pw_l, pw_r, ph_t, ph_b))
result = F.conv2d(input_, kernel, bias=bias, stride=stride, padding=padding, dilation=dilation, groups=groups)
assert result.shape == input.shape
return result
def gradient_central_diff(input, cuda):
return input, input  # NOTE: early return; the central-difference code below is unreachable
kernel = [[1, 0, -1]]
kernel_t = 0.5 * torch.Tensor(kernel) * -1. # pytorch implements correlation instead of conv
if type(cuda) is int:
if cuda != -1:
kernel_t = kernel_t.cuda(device=cuda)
else:
if cuda is True:
kernel_t = kernel_t.cuda()
n, c, h, w = input.shape
x = conv2d_same(input, kernel_t.unsqueeze(0).unsqueeze(0).repeat([c, 1, 1, 1]), c)
y = conv2d_same(input, kernel_t.t().unsqueeze(0).unsqueeze(0).repeat([c, 1, 1, 1]), c)
return x, y
def compute_single_sided_diferences(o_x, o_y, input):
# n,c,h,w
#input = input.clone()
o_y[:, :, 0, :] = input[:, :, 1, :].clone() - input[:, :, 0, :].clone()
o_x[:, :, :, 0] = input[:, :, :, 1].clone() - input[:, :, :, 0].clone()
# --
o_y[:, :, -1, :] = input[:, :, -1, :].clone() - input[:, :, -2, :].clone()
o_x[:, :, :, -1] = input[:, :, :, -1].clone() - input[:, :, :, -2].clone()
return o_x, o_y
def numerical_gradients_2d(input, cuda=False):
"""
Numerical gradients over batches using the torch grouped-conv operator.
The single-sided differences are re-computed later.
Matches np.gradient(image), except the output order here is (x, y) while np.gradient returns (y, x).
:param input: N,C,H,W
:param cuda: whether or not use cuda
:return: X,Y
"""
n, c, h, w = input.shape
assert h > 1 and w > 1
x, y = gradient_central_diff(input, cuda)
return x, y
def convTri(input, r, cuda=False):
"""
Convolves an image by a 2D triangle filter (the 1D triangle filter f is
[1:r r+1 r:-1:1]/(r+1)^2, the 2D version is simply conv2(f,f'))
:param input:
:param r: integer filter radius
:param cuda: move the kernel to gpu
:return:
"""
if (r <= 1):
raise ValueError()
n, c, h, w = input.shape
return input  # NOTE: early return; the triangle-filter code below is unreachable
f = list(range(1, r + 1)) + [r + 1] + list(reversed(range(1, r + 1)))
kernel = torch.Tensor([f]) / (r + 1) ** 2
if type(cuda) is int:
if cuda != -1:
kernel = kernel.cuda(device=cuda)
else:
if cuda is True:
kernel = kernel.cuda()
# padding w
input_ = F.pad(input, (1, 1, 0, 0), mode='replicate')
input_ = F.pad(input_, (r, r, 0, 0), mode='reflect')
input_ = [input_[:, :, :, :r], input, input_[:, :, :, -r:]]
input_ = torch.cat(input_, 3)
t = input_
# padding h
input_ = F.pad(input_, (0, 0, 1, 1), mode='replicate')
input_ = F.pad(input_, (0, 0, r, r), mode='reflect')
input_ = [input_[:, :, :r, :], t, input_[:, :, -r:, :]]
input_ = torch.cat(input_, 2)
output = F.conv2d(input_,
kernel.unsqueeze(0).unsqueeze(0).repeat([c, 1, 1, 1]),
padding=0, groups=c)
output = F.conv2d(output,
kernel.t().unsqueeze(0).unsqueeze(0).repeat([c, 1, 1, 1]),
padding=0, groups=c)
return output
def compute_normal(E, cuda=False):
if torch.sum(torch.isnan(E)) != 0:
print('nans found here')
import ipdb;
ipdb.set_trace()
E_ = convTri(E, 4, cuda)
Ox, Oy = numerical_gradients_2d(E_, cuda)
Oxx, _ = numerical_gradients_2d(Ox, cuda)
Oxy, Oyy = numerical_gradients_2d(Oy, cuda)
aa = Oyy * torch.sign(-(Oxy + 1e-5)) / (Oxx + 1e-5)
t = torch.atan(aa)
O = torch.remainder(t, np.pi)
if torch.sum(torch.isnan(O)) != 0:
print('nans found here')
import ipdb;
ipdb.set_trace()
return O
def compute_normal_2(E, cuda=False):
if torch.sum(torch.isnan(E)) != 0:
print('nans found here')
import ipdb;
ipdb.set_trace()
E_ = convTri(E, 4, cuda)
Ox, Oy = numerical_gradients_2d(E_, cuda)
Oxx, _ = numerical_gradients_2d(Ox, cuda)
Oxy, Oyy = numerical_gradients_2d(Oy, cuda)
aa = Oyy * torch.sign(-(Oxy + 1e-5)) / (Oxx + 1e-5)
t = torch.atan(aa)
O = torch.remainder(t, np.pi)
if torch.sum(torch.isnan(O)) != 0:
print('nans found here')
import ipdb;
ipdb.set_trace()
return O, (Oyy, Oxx)
def compute_grad_mag(E, cuda=False):
E_ = convTri(E, 4, cuda)
Ox, Oy = numerical_gradients_2d(E_, cuda)
mag = torch.sqrt(torch.mul(Ox,Ox) + torch.mul(Oy,Oy) + 1e-6)
mag = mag / mag.max()
return mag
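`conv2d_same` above relies on `calc_pad_same` to compute the total padding that preserves spatial size, splitting it asymmetrically when it is odd. A minimal sketch of that arithmetic, copied from the formula in the file:

```python
def calc_pad_same(in_siz, out_siz, stride, ksize):
    # Total padding so a convolution with this stride and kernel size maps
    # in_siz -> out_siz; for stride 1 this is the usual 'same' padding.
    return (out_siz - 1) * stride + ksize - in_siz

pad_3x3 = calc_pad_same(10, 10, 1, 3)   # 2: split as 1 left / 1 right
pad_5x5 = calc_pad_same(10, 10, 1, 5)   # 4: split as 2 left / 2 right
```

In `conv2d_same` the left/top half is `pad // 2` and the remainder goes to the right/bottom, which matches how even kernels need uneven padding.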
================================================
FILE: benchmarks/GSCNN-master/network/Resnet.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Code Adapted from:
# https://github.com/pytorch/vision/blob/master/torchvision/models/resnet.py
#
# BSD 3-Clause License
#
# Copyright (c) 2017,
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
import torch.nn as nn
import math
import torch.utils.model_zoo as model_zoo
import network.mynn as mynn
__all__ = ['ResNet', 'resnet18', 'resnet34', 'resnet50', 'resnet101',
'resnet152']
model_urls = {
'resnet18': 'https://download.pytorch.org/models/resnet18-5c106cde.pth',
'resnet34': 'https://download.pytorch.org/models/resnet34-333f7ec4.pth',
'resnet50': 'https://download.pytorch.org/models/resnet50-19c8e357.pth',
'resnet101': 'https://download.pytorch.org/models/resnet101-5d3b4d8f.pth',
'resnet152': 'https://download.pytorch.org/models/resnet152-b121ed2d.pth',
}
def conv3x3(in_planes, out_planes, stride=1):
"""3x3 convolution with padding"""
return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
padding=1, bias=False)
class BasicBlock(nn.Module):
expansion = 1
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(BasicBlock, self).__init__()
self.conv1 = conv3x3(inplanes, planes, stride)
self.bn1 = mynn.Norm2d(planes)
self.relu = nn.ReLU(inplace=True)
self.conv2 = conv3x3(planes, planes)
self.bn2 = mynn.Norm2d(planes)
self.downsample = downsample
self.stride = stride
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
class Bottleneck(nn.Module):
expansion = 4
def __init__(self, inplanes, planes, stride=1, downsample=None):
super(Bottleneck, self).__init__()
self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
self.bn1 = mynn.Norm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
padding=1, bias=False)
self.bn2 = mynn.Norm2d(planes)
self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
self.bn3 = mynn.Norm2d(planes * self.expansion)
self.relu = nn.ReLU(inplace=True)
self.downsample = downsample
self.stride = stride
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
residual = self.downsample(x)
out += residual
out = self.relu(out)
return out
class ResNet(nn.Module):
def __init__(self, block, layers, num_classes=1000):
self.inplanes = 64
super(ResNet, self).__init__()
self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3,
bias=False)
self.bn1 = mynn.Norm2d(64)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(block, 64, layers[0])
self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
self.avgpool = nn.AvgPool2d(7, stride=1)
self.fc = nn.Linear(512 * block.expansion, num_classes)
for m in self.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
elif isinstance(m, nn.BatchNorm2d):
nn.init.constant_(m.weight, 1)
nn.init.constant_(m.bias, 0)
def _make_layer(self, block, planes, blocks, stride=1):
downsample = None
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.inplanes, planes * block.expansion,
kernel_size=1, stride=stride, bias=False),
mynn.Norm2d(planes * block.expansion),
)
layers = []
layers.append(block(self.inplanes, planes, stride, downsample))
self.inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(self.inplanes, planes))
return nn.Sequential(*layers)
def forward(self, x):
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
x = self.avgpool(x)
x = x.view(x.size(0), -1)
x = self.fc(x)
return x
def resnet18(pretrained=True, **kwargs):
"""Constructs a ResNet-18 model.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
"""
model = ResNet(BasicBlock, [2, 2, 2, 2], **kwargs)
if pretrained:
model.load_state_dict(model_zoo.load_url(model_urls['resnet18']))
return model
def resnet34(pretrained=True, **kwargs):
"""Constructs a ResNet-34 model.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
"""
model = ResNet(BasicBlock, [3, 4, 6, 3], **kwargs)
if pretrained:
model.load_state_dict(model_zoo.load_url(model_urls['resnet34']))
return model
def resnet50(pretrained=True, **kwargs):
"""Constructs a ResNet-50 model.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
"""
model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs)
if pretrained:
model.load_state_dict(model_zoo.load_url(model_urls['resnet50']))
return model
def resnet101(pretrained=True, **kwargs):
"""Constructs a ResNet-101 model.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
"""
model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs)
if pretrained:
model.load_state_dict(model_zoo.load_url(model_urls['resnet101']))
return model
def resnet152(pretrained=True, **kwargs):
"""Constructs a ResNet-152 model.
Args:
pretrained (bool): If True, returns a model pre-trained on ImageNet
"""
model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs)
if pretrained:
model.load_state_dict(model_zoo.load_url(model_urls['resnet152']))
return model
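The `_make_layer` logic above attaches a 1x1 downsample projection whenever the stride or channel count changes, and after each stage `inplanes` becomes `planes * block.expansion`. A small sketch of that bookkeeping for the Bottleneck case (expansion 4, as in `resnet50`); the helper name is illustrative:

```python
def bottleneck_stage_channels(layers, base=64, expansion=4):
    """Output channels after each ResNet stage, mirroring _make_layer's
    inplanes bookkeeping (inplanes = planes * block.expansion)."""
    channels = []
    for i, _blocks in enumerate(layers):
        planes = base * (2 ** i)          # 64, 128, 256, 512 per stage
        channels.append(planes * expansion)
    return channels

resnet50_channels = bottleneck_stage_channels([3, 4, 6, 3])
# [256, 512, 1024, 2048]; the final fc layer above sees 512 * expansion = 2048
```

The same function with `expansion=1` reproduces the BasicBlock (ResNet-18/34) channel progression [64, 128, 256, 512].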
================================================
FILE: benchmarks/GSCNN-master/network/SEresnext.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Code adapted from:
# https://github.com/Cadene/pretrained-models.pytorch
#
# BSD 3-Clause License
#
# Copyright (c) 2017, Remi Cadene
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
from collections import OrderedDict
import math
import network.mynn as mynn
import torch.nn as nn
from torch.utils import model_zoo
__all__ = ['SENet', 'senet154', 'se_resnet50', 'se_resnet101', 'se_resnet152',
'se_resnext50_32x4d', 'se_resnext101_32x4d']
pretrained_settings = {
'se_resnext50_32x4d': {
'imagenet': {
'url': 'http://data.lip6.fr/cadene/pretrainedmodels/se_resnext50_32x4d-a260b3a4.pth',
'input_space': 'RGB',
'input_size': [3, 224, 224],
'input_range': [0, 1],
'mean': [0.485, 0.456, 0.406],
'std': [0.229, 0.224, 0.225],
'num_classes': 1000
}
},
'se_resnext101_32x4d': {
'imagenet': {
'url': 'http://data.lip6.fr/cadene/pretrainedmodels/se_resnext101_32x4d-3b2fe3d8.pth',
'input_space': 'RGB',
'input_size': [3, 224, 224],
'input_range': [0, 1],
'mean': [0.485, 0.456, 0.406],
'std': [0.229, 0.224, 0.225],
'num_classes': 1000
}
},
}
class SEModule(nn.Module):
def __init__(self, channels, reduction):
super(SEModule, self).__init__()
self.avg_pool = nn.AdaptiveAvgPool2d(1)
self.fc1 = nn.Conv2d(channels, channels // reduction, kernel_size=1,
padding=0)
self.relu = nn.ReLU(inplace=True)
self.fc2 = nn.Conv2d(channels // reduction, channels, kernel_size=1,
padding=0)
self.sigmoid = nn.Sigmoid()
def forward(self, x):
module_input = x
x = self.avg_pool(x)
x = self.fc1(x)
x = self.relu(x)
x = self.fc2(x)
x = self.sigmoid(x)
return module_input * x
class Bottleneck(nn.Module):
"""
Base class for bottlenecks that implements `forward()` method.
"""
def forward(self, x):
residual = x
out = self.conv1(x)
out = self.bn1(out)
out = self.relu(out)
out = self.conv2(out)
out = self.bn2(out)
out = self.relu(out)
out = self.conv3(out)
out = self.bn3(out)
if self.downsample is not None:
residual = self.downsample(x)
out = self.se_module(out) + residual
out = self.relu(out)
return out
class SEBottleneck(Bottleneck):
"""
Bottleneck for SENet154.
"""
expansion = 4
def __init__(self, inplanes, planes, groups, reduction, stride=1,
downsample=None):
super(SEBottleneck, self).__init__()
self.conv1 = nn.Conv2d(inplanes, planes * 2, kernel_size=1, bias=False)
self.bn1 = mynn.Norm2d(planes * 2)
self.conv2 = nn.Conv2d(planes * 2, planes * 4, kernel_size=3,
stride=stride, padding=1, groups=groups,
bias=False)
self.bn2 = mynn.Norm2d(planes * 4)
self.conv3 = nn.Conv2d(planes * 4, planes * 4, kernel_size=1,
bias=False)
self.bn3 = mynn.Norm2d(planes * 4)
self.relu = nn.ReLU(inplace=True)
self.se_module = SEModule(planes * 4, reduction=reduction)
self.downsample = downsample
self.stride = stride
class SEResNetBottleneck(Bottleneck):
"""
ResNet bottleneck with a Squeeze-and-Excitation module. It follows Caffe
implementation and uses `stride=stride` in `conv1` and not in `conv2`
(the latter is used in the torchvision implementation of ResNet).
"""
expansion = 4
def __init__(self, inplanes, planes, groups, reduction, stride=1,
downsample=None):
super(SEResNetBottleneck, self).__init__()
self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False,
stride=stride)
self.bn1 = mynn.Norm2d(planes)
self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, padding=1,
groups=groups, bias=False)
self.bn2 = mynn.Norm2d(planes)
self.conv3 = nn.Conv2d(planes, planes * 4, kernel_size=1, bias=False)
self.bn3 = mynn.Norm2d(planes * 4)
self.relu = nn.ReLU(inplace=True)
self.se_module = SEModule(planes * 4, reduction=reduction)
self.downsample = downsample
self.stride = stride
class SEResNeXtBottleneck(Bottleneck):
"""
ResNeXt bottleneck type C with a Squeeze-and-Excitation module.
"""
expansion = 4
def __init__(self, inplanes, planes, groups, reduction, stride=1,
downsample=None, base_width=4):
super(SEResNeXtBottleneck, self).__init__()
width = math.floor(planes * (base_width / 64)) * groups
self.conv1 = nn.Conv2d(inplanes, width, kernel_size=1, bias=False,
stride=1)
self.bn1 = mynn.Norm2d(width)
self.conv2 = nn.Conv2d(width, width, kernel_size=3, stride=stride,
padding=1, groups=groups, bias=False)
self.bn2 = mynn.Norm2d(width)
self.conv3 = nn.Conv2d(width, planes * 4, kernel_size=1, bias=False)
self.bn3 = mynn.Norm2d(planes * 4)
self.relu = nn.ReLU(inplace=True)
self.se_module = SEModule(planes * 4, reduction=reduction)
self.downsample = downsample
self.stride = stride
class SENet(nn.Module):
def __init__(self, block, layers, groups, reduction, dropout_p=0.2,
inplanes=128, input_3x3=True, downsample_kernel_size=3,
downsample_padding=1, num_classes=1000):
"""
Parameters
----------
block (nn.Module): Bottleneck class.
- For SENet154: SEBottleneck
- For SE-ResNet models: SEResNetBottleneck
- For SE-ResNeXt models: SEResNeXtBottleneck
layers (list of ints): Number of residual blocks for 4 layers of the
network (layer1...layer4).
groups (int): Number of groups for the 3x3 convolution in each
bottleneck block.
- For SENet154: 64
- For SE-ResNet models: 1
- For SE-ResNeXt models: 32
reduction (int): Reduction ratio for Squeeze-and-Excitation modules.
- For all models: 16
dropout_p (float or None): Drop probability for the Dropout layer.
If `None` the Dropout layer is not used.
- For SENet154: 0.2
- For SE-ResNet models: None
- For SE-ResNeXt models: None
inplanes (int): Number of input channels for layer1.
- For SENet154: 128
- For SE-ResNet models: 64
- For SE-ResNeXt models: 64
input_3x3 (bool): If `True`, use three 3x3 convolutions instead of
a single 7x7 convolution in layer0.
- For SENet154: True
- For SE-ResNet models: False
- For SE-ResNeXt models: False
downsample_kernel_size (int): Kernel size for downsampling convolutions
in layer2, layer3 and layer4.
- For SENet154: 3
- For SE-ResNet models: 1
- For SE-ResNeXt models: 1
downsample_padding (int): Padding for downsampling convolutions in
layer2, layer3 and layer4.
- For SENet154: 1
- For SE-ResNet models: 0
- For SE-ResNeXt models: 0
num_classes (int): Number of outputs in `last_linear` layer.
- For all models: 1000
"""
super(SENet, self).__init__()
self.inplanes = inplanes
if input_3x3:
layer0_modules = [
('conv1', nn.Conv2d(3, 64, 3, stride=2, padding=1,
bias=False)),
('bn1', mynn.Norm2d(64)),
('relu1', nn.ReLU(inplace=True)),
('conv2', nn.Conv2d(64, 64, 3, stride=1, padding=1,
bias=False)),
('bn2', mynn.Norm2d(64)),
('relu2', nn.ReLU(inplace=True)),
('conv3', nn.Conv2d(64, inplanes, 3, stride=1, padding=1,
bias=False)),
('bn3', mynn.Norm2d(inplanes)),
('relu3', nn.ReLU(inplace=True)),
]
else:
layer0_modules = [
('conv1', nn.Conv2d(3, inplanes, kernel_size=7, stride=2,
padding=3, bias=False)),
('bn1', mynn.Norm2d(inplanes)),
('relu1', nn.ReLU(inplace=True)),
]
# To preserve compatibility with Caffe weights `ceil_mode=True`
# is used instead of `padding=1`.
layer0_modules.append(('pool', nn.MaxPool2d(3, stride=2,
ceil_mode=True)))
self.layer0 = nn.Sequential(OrderedDict(layer0_modules))
self.layer1 = self._make_layer(
block,
planes=64,
blocks=layers[0],
groups=groups,
reduction=reduction,
downsample_kernel_size=1,
downsample_padding=0
)
self.layer2 = self._make_layer(
block,
planes=128,
blocks=layers[1],
stride=2,
groups=groups,
reduction=reduction,
downsample_kernel_size=downsample_kernel_size,
downsample_padding=downsample_padding
)
self.layer3 = self._make_layer(
block,
planes=256,
blocks=layers[2],
stride=1,
groups=groups,
reduction=reduction,
downsample_kernel_size=downsample_kernel_size,
downsample_padding=downsample_padding
)
self.layer4 = self._make_layer(
block,
planes=512,
blocks=layers[3],
stride=1,
groups=groups,
reduction=reduction,
downsample_kernel_size=downsample_kernel_size,
downsample_padding=downsample_padding
)
self.avg_pool = nn.AvgPool2d(7, stride=1)
self.dropout = nn.Dropout(dropout_p) if dropout_p is not None else None
self.last_linear = nn.Linear(512 * block.expansion, num_classes)
def _make_layer(self, block, planes, blocks, groups, reduction, stride=1,
downsample_kernel_size=1, downsample_padding=0):
downsample = None
if stride != 1 or self.inplanes != planes * block.expansion:
downsample = nn.Sequential(
nn.Conv2d(self.inplanes, planes * block.expansion,
kernel_size=downsample_kernel_size, stride=stride,
padding=downsample_padding, bias=False),
mynn.Norm2d(planes * block.expansion),
)
layers = []
layers.append(block(self.inplanes, planes, groups, reduction, stride,
downsample))
self.inplanes = planes * block.expansion
for i in range(1, blocks):
layers.append(block(self.inplanes, planes, groups, reduction))
return nn.Sequential(*layers)
def features(self, x):
x = self.layer0(x)
x = self.layer1(x)
x = self.layer2(x)
x = self.layer3(x)
x = self.layer4(x)
return x
def logits(self, x):
x = self.avg_pool(x)
if self.dropout is not None:
x = self.dropout(x)
x = x.view(x.size(0), -1)
x = self.last_linear(x)
return x
def forward(self, x):
x = self.features(x)
x = self.logits(x)
return x
def initialize_pretrained_model(model, num_classes, settings):
assert num_classes == settings['num_classes'], \
'num_classes should be {}, but is {}'.format(
settings['num_classes'], num_classes)
weights = model_zoo.load_url(settings['url'])
model.load_state_dict(weights)
model.input_space = settings['input_space']
model.input_size = settings['input_size']
model.input_range = settings['input_range']
model.mean = settings['mean']
model.std = settings['std']
def se_resnext50_32x4d(num_classes=1000):
model = SENet(SEResNeXtBottleneck, [3, 4, 6, 3], groups=32, reduction=16,
dropout_p=None, inplanes=64, input_3x3=False,
downsample_kernel_size=1, downsample_padding=0,
num_classes=num_classes)
settings = pretrained_settings['se_resnext50_32x4d']['imagenet']
initialize_pretrained_model(model, num_classes, settings)
return model
def se_resnext101_32x4d(num_classes=1000):
model = SENet(SEResNeXtBottleneck, [3, 4, 23, 3], groups=32, reduction=16,
dropout_p=None, inplanes=64, input_3x3=False,
downsample_kernel_size=1, downsample_padding=0,
num_classes=num_classes)
settings = pretrained_settings['se_resnext101_32x4d']['imagenet']
initialize_pretrained_model(model, num_classes, settings)
return model
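The SENet constructors above build each stage as an `nn.Sequential` over an `OrderedDict` of named layers (`('conv1', …)`, `('bn1', …)`, …) so that pretrained weights can be matched by key. A minimal stdlib sketch of that named-pipeline pattern (no torch; `Pipeline` is a hypothetical stand-in for `nn.Sequential`):

```python
from collections import OrderedDict

class Pipeline:
    """Hypothetical stand-in for nn.Sequential: applies named steps in order."""
    def __init__(self, steps):
        self.steps = steps

    def __call__(self, x):
        for name, fn in self.steps.items():
            x = fn(x)
        return x

# Named steps, mirroring ('conv1', ...), ('bn1', ...), ('relu1', ...) above.
layer0 = Pipeline(OrderedDict([
    ('scale', lambda x: x * 2),       # placeholder for a conv
    ('shift', lambda x: x + 1),       # placeholder for a norm layer
    ('clip',  lambda x: max(x, 0)),   # placeholder for ReLU
]))

print(layer0(3))   # (3*2 + 1) clipped at 0 -> 7
```

Because the steps are named, a serialized state keyed by `'scale'`, `'shift'`, `'clip'` could be re-attached to the right step, which is exactly why the original code preserves Caffe-compatible layer names.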
================================================
FILE: benchmarks/GSCNN-master/network/__init__.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
import importlib
import torch
import logging
def get_net(args, criterion):
net = get_model(network=args.arch, num_classes=args.dataset_cls.num_classes,
criterion=criterion, trunk=args.trunk)
num_params = sum([param.nelement() for param in net.parameters()])
logging.info('Model params = {:2.1f}M'.format(num_params / 1000000))
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
net = torch.nn.DataParallel(net).to(device)
if args.checkpoint_path:
print(f"Loading state_dict from {args.checkpoint_path}")
net.load_state_dict(torch.load(args.checkpoint_path)["state_dict"])
return net
def get_model(network, num_classes, criterion, trunk):
module = network[:network.rfind('.')]
model = network[network.rfind('.')+1:]
mod = importlib.import_module(module)
net_func = getattr(mod, model)
net = net_func(num_classes=num_classes, trunk=trunk, criterion=criterion)
return net
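`get_model` above resolves a dotted string such as `'network.gscnn.GSCNN'` into a callable by splitting at the last dot, importing the module, and fetching the attribute. The same split-and-`getattr` idiom works for any importable attribute; a self-contained sketch using only the standard library:

```python
import importlib

def resolve(dotted_path):
    """Split 'pkg.module.attr' at the last dot and fetch the attribute,
    mirroring what get_model() does with the --arch string."""
    module_name, _, attr_name = dotted_path.rpartition('.')
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)

loads = resolve('json.loads')
print(loads('{"a": 1}'))  # {'a': 1}
```

This keeps the training script decoupled from any particular architecture: adding a new network only requires a new importable class, not a registry edit.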
================================================
FILE: benchmarks/GSCNN-master/network/gscnn.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Code Adapted from:
# https://github.com/sthalles/deeplab_v3
#
# MIT License
#
# Copyright (c) 2018 Thalles Santos Silva
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
"""
import torch
import torch.nn.functional as F
from torch import nn
from network import SEresnext
from network import Resnet
from network.wider_resnet import wider_resnet38_a2
from config import cfg
from network.mynn import initialize_weights, Norm2d
from torch.autograd import Variable
from my_functionals import GatedSpatialConv as gsc
import cv2
import numpy as np
class Crop(nn.Module):
def __init__(self, axis, offset):
super(Crop, self).__init__()
self.axis = axis
self.offset = offset
def forward(self, x, ref):
"""
:param x: input tensor to crop
:param ref: reference tensor (usually the network input) whose spatial size to match
:return: x cropped along the trailing axes to match ref
"""
for axis in range(self.axis, x.dim()):
ref_size = ref.size(axis)
indices = torch.arange(self.offset, self.offset + ref_size).long()
indices = x.data.new().resize_(indices.size()).copy_(indices).long()
x = x.index_select(axis, Variable(indices))
return x
class MyIdentity(nn.Module):
def __init__(self, axis, offset):
super(MyIdentity, self).__init__()
self.axis = axis
self.offset = offset
def forward(self, x, ref):
"""
:param x: input tensor
:param ref: unused reference tensor (kept for interface compatibility with Crop)
:return: x unchanged
"""
return x
class SideOutputCrop(nn.Module):
"""
This is the original implementation ConvTranspose2d (fixed) and crops
"""
def __init__(self, num_output, kernel_sz=None, stride=None, upconv_pad=0, do_crops=True):
super(SideOutputCrop, self).__init__()
self._do_crops = do_crops
self.conv = nn.Conv2d(num_output, out_channels=1, kernel_size=1, stride=1, padding=0, bias=True)
if kernel_sz is not None:
self.upsample = True
self.upsampled = nn.ConvTranspose2d(1, out_channels=1, kernel_size=kernel_sz, stride=stride,
padding=upconv_pad,
bias=False)
# crop the upsampled output back to the reference size
if self._do_crops:
self.crops = Crop(2, offset=kernel_sz // 4)
else:
self.crops = MyIdentity(None, None)
else:
self.upsample = False
def forward(self, res, reference=None):
side_output = self.conv(res)
if self.upsample:
side_output = self.upsampled(side_output)
side_output = self.crops(side_output, reference)
return side_output
class _AtrousSpatialPyramidPoolingModule(nn.Module):
'''
operations performed:
1x1 x depth
3x3 x depth dilation 6
3x3 x depth dilation 12
3x3 x depth dilation 18
image pooling
concatenate all together
Final 1x1 conv
'''
def __init__(self, in_dim, reduction_dim=256, output_stride=16, rates=[6, 12, 18]):
super(_AtrousSpatialPyramidPoolingModule, self).__init__()
# Check if we are using distributed BN and use the nn from encoding.nn
# library rather than using standard pytorch.nn
if output_stride == 8:
rates = [2 * r for r in rates]
elif output_stride == 16:
pass
else:
raise ValueError('output stride of {} not supported'.format(output_stride))
self.features = []
# 1x1
self.features.append(
nn.Sequential(nn.Conv2d(in_dim, reduction_dim, kernel_size=1, bias=False),
Norm2d(reduction_dim), nn.ReLU(inplace=True)))
# other rates
for r in rates:
self.features.append(nn.Sequential(
nn.Conv2d(in_dim, reduction_dim, kernel_size=3,
dilation=r, padding=r, bias=False),
Norm2d(reduction_dim),
nn.ReLU(inplace=True)
))
self.features = torch.nn.ModuleList(self.features)
# img level features
self.img_pooling = nn.AdaptiveAvgPool2d(1)
self.img_conv = nn.Sequential(
nn.Conv2d(in_dim, reduction_dim, kernel_size=1, bias=False),
Norm2d(reduction_dim), nn.ReLU(inplace=True))
self.edge_conv = nn.Sequential(
nn.Conv2d(1, reduction_dim, kernel_size=1, bias=False),
Norm2d(reduction_dim), nn.ReLU(inplace=True))
def forward(self, x, edge):
x_size = x.size()
img_features = self.img_pooling(x)
img_features = self.img_conv(img_features)
img_features = F.interpolate(img_features, x_size[2:],
mode='bilinear',align_corners=True)
out = img_features
edge_features = F.interpolate(edge, x_size[2:],
mode='bilinear',align_corners=True)
edge_features = self.edge_conv(edge_features)
out = torch.cat((out, edge_features), 1)
for f in self.features:
y = f(x)
out = torch.cat((out, y), 1)
return out
class GSCNN(nn.Module):
'''
Wide_resnet version of DeepLabV3
mod1
pool2
mod2 str2
pool3
mod3-7
structure: [3, 3, 6, 3, 1, 1]
channels = [(128, 128), (256, 256), (512, 512), (512, 1024), (512, 1024, 2048),
(1024, 2048, 4096)]
'''
def __init__(self, num_classes, trunk=None, criterion=None):
super(GSCNN, self).__init__()
self.criterion = criterion
self.num_classes = num_classes
wide_resnet = wider_resnet38_a2(classes=1000, dilation=True)
wide_resnet = torch.nn.DataParallel(wide_resnet)
try:
checkpoint = torch.load('./network/pretrained_models/wider_resnet38.pth.tar', map_location='cpu')
wide_resnet.load_state_dict(checkpoint['state_dict'])
del checkpoint
except Exception:
print("Please download the ImageNet weights of WideResNet38 from our repo to ./network/pretrained_models/wider_resnet38.pth.tar.")
raise RuntimeError("Could not load ImageNet weights of WideResNet38 network.")
wide_resnet = wide_resnet.module
self.mod1 = wide_resnet.mod1
self.mod2 = wide_resnet.mod2
self.mod3 = wide_resnet.mod3
self.mod4 = wide_resnet.mod4
self.mod5 = wide_resnet.mod5
self.mod6 = wide_resnet.mod6
self.mod7 = wide_resnet.mod7
self.pool2 = wide_resnet.pool2
self.pool3 = wide_resnet.pool3
self.interpolate = F.interpolate
del wide_resnet
self.dsn1 = nn.Conv2d(64, 1, 1)
self.dsn3 = nn.Conv2d(256, 1, 1)
self.dsn4 = nn.Conv2d(512, 1, 1)
self.dsn7 = nn.Conv2d(4096, 1, 1)
self.res1 = Resnet.BasicBlock(64, 64, stride=1, downsample=None)
self.d1 = nn.Conv2d(64, 32, 1)
self.res2 = Resnet.BasicBlock(32, 32, stride=1, downsample=None)
self.d2 = nn.Conv2d(32, 16, 1)
self.res3 = Resnet.BasicBlock(16, 16, stride=1, downsample=None)
self.d3 = nn.Conv2d(16, 8, 1)
self.fuse = nn.Conv2d(8, 1, kernel_size=1, padding=0, bias=False)
self.cw = nn.Conv2d(2, 1, kernel_size=1, padding=0, bias=False)
self.gate1 = gsc.GatedSpatialConv2d(32, 32)
self.gate2 = gsc.GatedSpatialConv2d(16, 16)
self.gate3 = gsc.GatedSpatialConv2d(8, 8)
self.aspp = _AtrousSpatialPyramidPoolingModule(4096, 256,
output_stride=8)
self.bot_fine = nn.Conv2d(128, 48, kernel_size=1, bias=False)
self.bot_aspp = nn.Conv2d(1280 + 256, 256, kernel_size=1, bias=False)
self.final_seg = nn.Sequential(
nn.Conv2d(256 + 48, 256, kernel_size=3, padding=1, bias=False),
Norm2d(256),
nn.ReLU(inplace=True),
nn.Conv2d(256, 256, kernel_size=3, padding=1, bias=False),
Norm2d(256),
nn.ReLU(inplace=True),
nn.Conv2d(256, num_classes, kernel_size=1, bias=False))
self.sigmoid = nn.Sigmoid()
initialize_weights(self.final_seg)
def forward(self, inp, gts=None):
x_size = inp.size()
# res 1
m1 = self.mod1(inp)
# res 2
m2 = self.mod2(self.pool2(m1))
# res 3
m3 = self.mod3(self.pool3(m2))
# res 4-7
m4 = self.mod4(m3)
m5 = self.mod5(m4)
m6 = self.mod6(m5)
m7 = self.mod7(m6)
s3 = F.interpolate(self.dsn3(m3), x_size[2:],
mode='bilinear', align_corners=True)
s4 = F.interpolate(self.dsn4(m4), x_size[2:],
mode='bilinear', align_corners=True)
s7 = F.interpolate(self.dsn7(m7), x_size[2:],
mode='bilinear', align_corners=True)
m1f = F.interpolate(m1, x_size[2:], mode='bilinear', align_corners=True)
im_arr = inp.cpu().numpy().transpose((0,2,3,1)).astype(np.uint8)
canny = np.zeros((x_size[0], 1, x_size[2], x_size[3]))
for i in range(x_size[0]):
canny[i] = cv2.Canny(im_arr[i],10,100)
canny = torch.from_numpy(canny).cuda().float()
cs = self.res1(m1f)
cs = F.interpolate(cs, x_size[2:],
mode='bilinear', align_corners=True)
cs = self.d1(cs)
cs = self.gate1(cs, s3)
cs = self.res2(cs)
cs = F.interpolate(cs, x_size[2:],
mode='bilinear', align_corners=True)
cs = self.d2(cs)
cs = self.gate2(cs, s4)
cs = self.res3(cs)
cs = F.interpolate(cs, x_size[2:],
mode='bilinear', align_corners=True)
cs = self.d3(cs)
cs = self.gate3(cs, s7)
cs = self.fuse(cs)
cs = F.interpolate(cs, x_size[2:],
mode='bilinear', align_corners=True)
edge_out = self.sigmoid(cs)
cat = torch.cat((edge_out, canny), dim=1)
acts = self.cw(cat)
acts = self.sigmoid(acts)
# aspp
x = self.aspp(m7, acts)
dec0_up = self.bot_aspp(x)
dec0_fine = self.bot_fine(m2)
dec0_up = self.interpolate(dec0_up, m2.size()[2:], mode='bilinear',align_corners=True)
dec0 = [dec0_fine, dec0_up]
dec0 = torch.cat(dec0, 1)
dec1 = self.final_seg(dec0)
seg_out = self.interpolate(dec1, x_size[2:], mode='bilinear')
if self.training:
return self.criterion((seg_out, edge_out), gts)
else:
return seg_out, edge_out
================================================
FILE: benchmarks/GSCNN-master/network/mynn.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
from config import cfg
import torch.nn as nn
from math import sqrt
import torch
from torch.autograd.function import InplaceFunction
from itertools import repeat
from torch.nn.modules import Module
from torch.utils.checkpoint import checkpoint
def Norm2d(in_channels):
"""
Custom Norm Function to allow flexible switching
"""
layer = getattr(cfg.MODEL,'BNFUNC')
normalizationLayer = layer(in_channels)
return normalizationLayer
def initialize_weights(*models):
for model in models:
for module in model.modules():
if isinstance(module, nn.Conv2d) or isinstance(module, nn.Linear):
nn.init.kaiming_normal_(module.weight)
if module.bias is not None:
module.bias.data.zero_()
elif isinstance(module, nn.BatchNorm2d):
module.weight.data.fill_(1)
module.bias.data.zero_()
================================================
FILE: benchmarks/GSCNN-master/network/wider_resnet.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Code adapted from:
# https://github.com/mapillary/inplace_abn/
#
# BSD 3-Clause License
#
# Copyright (c) 2017, mapillary
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
"""
import sys
from collections import OrderedDict
from functools import partial
import torch.nn as nn
import torch
import network.mynn as mynn
def bnrelu(channels):
return nn.Sequential(mynn.Norm2d(channels),
nn.ReLU(inplace=True))
class GlobalAvgPool2d(nn.Module):
def __init__(self):
"""Global average pooling over the input's spatial dimensions"""
super(GlobalAvgPool2d, self).__init__()
def forward(self, inputs):
in_size = inputs.size()
return inputs.view((in_size[0], in_size[1], -1)).mean(dim=2)
class IdentityResidualBlock(nn.Module):
def __init__(self,
in_channels,
channels,
stride=1,
dilation=1,
groups=1,
norm_act=bnrelu,
dropout=None,
dist_bn=False
):
"""Configurable identity-mapping residual block
Parameters
----------
in_channels : int
Number of input channels.
channels : list of int
Number of channels in the internal feature maps.
Can either have two or three elements: if three construct
a residual block with two `3 x 3` convolutions,
otherwise construct a bottleneck block with `1 x 1`, then
`3 x 3` then `1 x 1` convolutions.
stride : int
Stride of the first `3 x 3` convolution
dilation : int
Dilation to apply to the `3 x 3` convolutions.
groups : int
Number of convolution groups.
This is used to create ResNeXt-style blocks and is only compatible with
bottleneck blocks.
norm_act : callable
Function to create normalization / activation Module.
dropout: callable
Function to create Dropout Module.
dist_bn: Boolean
A variable to enable or disable use of distributed BN
"""
super(IdentityResidualBlock, self).__init__()
self.dist_bn = dist_bn
# Check if we are using distributed BN and use the nn from encoding.nn
# library rather than using standard pytorch.nn
# Check parameters for inconsistencies
if len(channels) != 2 and len(channels) != 3:
raise ValueError("channels must contain either two or three values")
if len(channels) == 2 and groups != 1:
raise ValueError("groups > 1 are only valid if len(channels) == 3")
is_bottleneck = len(channels) == 3
need_proj_conv = stride != 1 or in_channels != channels[-1]
self.bn1 = norm_act(in_channels)
if not is_bottleneck:
layers = [
("conv1", nn.Conv2d(in_channels,
channels[0],
3,
stride=stride,
padding=dilation,
bias=False,
dilation=dilation)),
("bn2", norm_act(channels[0])),
("conv2", nn.Conv2d(channels[0], channels[1],
3,
stride=1,
padding=dilation,
bias=False,
dilation=dilation))
]
if dropout is not None:
layers = layers[0:2] + [("dropout", dropout())] + layers[2:]
else:
layers = [
("conv1",
nn.Conv2d(in_channels,
channels[0],
1,
stride=stride,
padding=0,
bias=False)),
("bn2", norm_act(channels[0])),
("conv2", nn.Conv2d(channels[0],
channels[1],
3, stride=1,
padding=dilation, bias=False,
groups=groups,
dilation=dilation)),
("bn3", norm_act(channels[1])),
("conv3", nn.Conv2d(channels[1], channels[2],
1, stride=1, padding=0, bias=False))
]
if dropout is not None:
layers = layers[0:4] + [("dropout", dropout())] + layers[4:]
self.convs = nn.Sequential(OrderedDict(layers))
if need_proj_conv:
self.proj_conv = nn.Conv2d(
in_channels, channels[-1], 1, stride=stride, padding=0, bias=False)
def forward(self, x):
"""
This is the standard forward function for non-distributed batch norm
"""
if hasattr(self, "proj_conv"):
bn1 = self.bn1(x)
shortcut = self.proj_conv(bn1)
else:
shortcut = x.clone()
bn1 = self.bn1(x)
out = self.convs(bn1)
out.add_(shortcut)
return out
class WiderResNet(nn.Module):
def __init__(self,
structure,
norm_act=bnrelu,
classes=0
):
"""Wider ResNet with pre-activation (identity mapping) blocks
Parameters
----------
structure : list of int
Number of residual blocks in each of the six modules of the network.
norm_act : callable
Function to create normalization / activation Module.
classes : int
If not `0`, also include global average pooling and a
fully-connected layer with `classes` outputs at the end
of the network.
"""
super(WiderResNet, self).__init__()
self.structure = structure
if len(structure) != 6:
raise ValueError("Expected a structure with six values")
# Initial layers
self.mod1 = nn.Sequential(OrderedDict([
("conv1", nn.Conv2d(3, 64, 3, stride=1, padding=1, bias=False))
]))
# Groups of residual blocks
in_channels = 64
channels = [(128, 128), (256, 256), (512, 512), (512, 1024),
(512, 1024, 2048), (1024, 2048, 4096)]
for mod_id, num in enumerate(structure):
# Create blocks for module
blocks = []
for block_id in range(num):
blocks.append((
"block%d" % (block_id + 1),
IdentityResidualBlock(in_channels, channels[mod_id],
norm_act=norm_act)
))
# Update channels and p_keep
in_channels = channels[mod_id][-1]
# Create module
if mod_id <= 4:
self.add_module("pool%d" %
(mod_id + 2), nn.MaxPool2d(3, stride=2, padding=1))
self.add_module("mod%d" % (mod_id + 2), nn.Sequential(OrderedDict(blocks)))
# Pooling and predictor
self.bn_out = norm_act(in_channels)
if classes != 0:
self.classifier = nn.Sequential(OrderedDict([
("avg_pool", GlobalAvgPool2d()),
("fc", nn.Linear(in_channels, classes))
]))
def forward(self, img):
out = self.mod1(img)
out = self.mod2(self.pool2(out))
out = self.mod3(self.pool3(out))
out = self.mod4(self.pool4(out))
out = self.mod5(self.pool5(out))
out = self.mod6(self.pool6(out))
out = self.mod7(out)
out = self.bn_out(out)
if hasattr(self, "classifier"):
out = self.classifier(out)
return out
class WiderResNetA2(nn.Module):
def __init__(self,
structure,
norm_act=bnrelu,
classes=0,
dilation=False,
dist_bn=False
):
"""Wider ResNet with pre-activation (identity mapping) blocks
This variant uses down-sampling by max-pooling in the first two blocks and
by strided convolution in the others.
Parameters
----------
structure : list of int
Number of residual blocks in each of the six modules of the network.
norm_act : callable
Function to create normalization / activation Module.
classes : int
If not `0`, also include global average pooling and a fully-connected layer
with `classes` outputs at the end of the network.
dilation : bool
If `True`, apply dilation to the last three modules and change the
down-sampling factor from 32 to 8.
"""
super(WiderResNetA2, self).__init__()
self.dist_bn = dist_bn
# If using distributed batch norm, use encoding.nn as opposed to torch.nn
nn.Dropout = nn.Dropout2d
norm_act = bnrelu
self.structure = structure
self.dilation = dilation
if len(structure) != 6:
raise ValueError("Expected a structure with six values")
# Initial layers
self.mod1 = torch.nn.Sequential(OrderedDict([
("conv1", nn.Conv2d(3, 64, 3, stride=1, padding=1, bias=False))
]))
# Groups of residual blocks
in_channels = 64
channels = [(128, 128), (256, 256), (512, 512), (512, 1024), (512, 1024, 2048),
(1024, 2048, 4096)]
for mod_id, num in enumerate(structure):
# Create blocks for module
blocks = []
for block_id in range(num):
if not dilation:
dil = 1
stride = 2 if block_id == 0 and 2 <= mod_id <= 4 else 1
else:
if mod_id == 3:
dil = 2
elif mod_id > 3:
dil = 4
else:
dil = 1
stride = 2 if block_id == 0 and mod_id == 2 else 1
if mod_id == 4:
drop = partial(nn.Dropout, p=0.3)
elif mod_id == 5:
drop = partial(nn.Dropout, p=0.5)
else:
drop = None
blocks.append((
"block%d" % (block_id + 1),
IdentityResidualBlock(in_channels,
channels[mod_id], norm_act=norm_act,
stride=stride, dilation=dil,
dropout=drop, dist_bn=self.dist_bn)
))
# Update channels and p_keep
in_channels = channels[mod_id][-1]
# Create module
if mod_id < 2:
self.add_module("pool%d" %
(mod_id + 2), nn.MaxPool2d(3, stride=2, padding=1))
self.add_module("mod%d" % (mod_id + 2), nn.Sequential(OrderedDict(blocks)))
# Pooling and predictor
self.bn_out = norm_act(in_channels)
if classes != 0:
self.classifier = nn.Sequential(OrderedDict([
("avg_pool", GlobalAvgPool2d()),
("fc", nn.Linear(in_channels, classes))
]))
def forward(self, img):
out = self.mod1(img)
out = self.mod2(self.pool2(out))
out = self.mod3(self.pool3(out))
out = self.mod4(out)
out = self.mod5(out)
out = self.mod6(out)
out = self.mod7(out)
out = self.bn_out(out)
if hasattr(self, "classifier"):
return self.classifier(out)
else:
return out
_NETS = {
"16": {"structure": [1, 1, 1, 1, 1, 1]},
"20": {"structure": [1, 1, 1, 3, 1, 1]},
"38": {"structure": [3, 3, 6, 3, 1, 1]},
}
__all__ = []
for name, params in _NETS.items():
net_name = "wider_resnet" + name
setattr(sys.modules[__name__], net_name, partial(WiderResNet, **params))
__all__.append(net_name)
for name, params in _NETS.items():
net_name = "wider_resnet" + name + "_a2"
setattr(sys.modules[__name__], net_name, partial(WiderResNetA2, **params))
__all__.append(net_name)
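The loops above synthesize `wider_resnet16/20/38` (and their `_a2` variants) by binding each `structure` list into `functools.partial` and registering the result as a module-level name. A minimal sketch of the same factory-registration idiom, with a hypothetical `make_net` standing in for the `WiderResNet` constructor:

```python
from functools import partial

def make_net(structure, classes=0):
    """Hypothetical constructor; returns a description instead of a model."""
    return {'structure': structure, 'classes': classes}

_DEMO_NETS = {
    "16": {"structure": [1, 1, 1, 1, 1, 1]},
    "38": {"structure": [3, 3, 6, 3, 1, 1]},
}

for name, params in _DEMO_NETS.items():
    # Registers demo_net16 and demo_net38 as callables in this namespace,
    # analogous to setattr(sys.modules[__name__], ...) above.
    globals()["demo_net" + name] = partial(make_net, **params)

print(demo_net38(classes=1000))
```

Each generated callable carries its structure pre-bound, so callers only supply the remaining arguments (`classes`, `dilation`, …), exactly how `wider_resnet38_a2(classes=1000, dilation=True)` is used in `gscnn.py`.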
================================================
FILE: benchmarks/GSCNN-master/optimizer.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
import torch
from torch import optim
import math
import logging
from config import cfg
def get_optimizer(args, net):
param_groups = net.parameters()
if args.sgd:
optimizer = optim.SGD(param_groups,
lr=args.lr,
weight_decay=args.weight_decay,
momentum=args.momentum,
nesterov=False)
elif args.adam:
amsgrad=False
if args.amsgrad:
amsgrad=True
optimizer = optim.Adam(param_groups,
lr=args.lr,
weight_decay=args.weight_decay,
amsgrad=amsgrad
)
else:
raise ValueError('Not a valid optimizer')
if args.lr_schedule == 'poly':
lambda1 = lambda epoch: math.pow(1 - epoch / args.max_epoch, args.poly_exp)
scheduler = optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda1)
else:
raise ValueError('unknown lr schedule {}'.format(args.lr_schedule))
if args.snapshot:
logging.info('Loading weights from model {}'.format(args.snapshot))
net, optimizer = restore_snapshot(args, net, optimizer, args.snapshot)
else:
logging.info('Loaded weights from ImageNet classifier')
return optimizer, scheduler
def restore_snapshot(args, net, optimizer, snapshot):
checkpoint = torch.load(snapshot, map_location=torch.device('cpu'))
logging.info("Load complete")
if args.sgd_finetuned:
print('skipping load optimizer')
else:
if 'optimizer' in checkpoint and args.restore_optimizer:
optimizer.load_state_dict(checkpoint['optimizer'])
if 'state_dict' in checkpoint:
net = forgiving_state_restore(net, checkpoint['state_dict'])
else:
net = forgiving_state_restore(net, checkpoint)
return net, optimizer
def forgiving_state_restore(net, loaded_dict):
# Handle partial loading when some tensors don't match up in size,
# e.g. when reusing a model that was trained with a different
# number of classes.
net_state_dict = net.state_dict()
new_loaded_dict = {}
for k in net_state_dict:
if k in loaded_dict and net_state_dict[k].size() == loaded_dict[k].size():
new_loaded_dict[k] = loaded_dict[k]
else:
logging.info('Skipped loading parameter {}'.format(k))
net_state_dict.update(new_loaded_dict)
net.load_state_dict(net_state_dict)
return net
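`forgiving_state_restore` above keeps only those checkpoint tensors whose key exists in the model and whose shape matches, so a checkpoint trained with a different class count still loads for everything but the mismatched head. A sketch of the merge with plain dicts standing in for state_dicts (entries are hypothetical `(shape, value)` pairs rather than tensors):

```python
import logging

def forgiving_merge(model_state, checkpoint):
    """Copy checkpoint entries whose key exists and whose shape matches;
    leave everything else at the model's initial values."""
    merged = dict(model_state)
    for key, (shape, _value) in model_state.items():
        if key in checkpoint and checkpoint[key][0] == shape:
            merged[key] = checkpoint[key]
        else:
            logging.info('Skipped loading parameter %s', key)
    return merged

model = {'backbone.w': ((64, 3), 0.0), 'head.w': ((19, 64), 0.0)}
ckpt  = {'backbone.w': ((64, 3), 1.5), 'head.w': ((35, 64), 2.0)}  # 35-class head
merged = forgiving_merge(model, ckpt)
print(merged['backbone.w'][1], merged['head.w'][1])  # 1.5 0.0
```

Here the backbone weights transfer while the 35-class head is skipped and stays at its fresh initialization, which is the behavior RELLIS-3D fine-tuning relies on when reusing Cityscapes-pretrained checkpoints.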
================================================
FILE: benchmarks/GSCNN-master/run_gscnn.sh
================================================
#!/bin/bash
export PYTHONPATH=/home/usl/Code/Peng/data_collection/benchmarks/GSCNN-master/:$PYTHONPATH
echo $PYTHONPATH
python train.py --dataset rellis --bs_mult 3 --lr 0.001 --exp final
================================================
FILE: benchmarks/GSCNN-master/run_gscnn_eval.sh
================================================
#!/bin/bash
export PYTHONPATH=/home/usl/Code/PengJiang/RELLIS-3D/benchmarks/GSCNN-master/:$PYTHONPATH
echo $PYTHONPATH
python train.py --dataset rellis --bs_mult 3 --lr 0.001 --exp final \
--checkpoint_path /home/usl/Downloads/best_epoch_84_mean-iu_0.46839.pth \
--mode test \
--viz \
--data-cfg /home/usl/Code/Peng/data_collection/benchmarks/SalsaNext/train/tasks/semantic/config/labels/rellis.yaml \
--test_sv_path /home/usl/Datasets/prediction
================================================
FILE: benchmarks/GSCNN-master/train.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
from __future__ import absolute_import
from __future__ import division
import argparse
from functools import partial
from config import cfg, assert_and_infer_cfg
import logging
import math
import os
import sys
import torch
import numpy as np
import yaml
from utils.misc import AverageMeter, prep_experiment, evaluate_eval, fast_hist
from utils.f_boundary import eval_mask_boundary
import datasets
import loss
import network
import optimizer
from tqdm import tqdm
from PIL import Image
# Argument Parser
parser = argparse.ArgumentParser(description='GSCNN')
parser.add_argument('--lr', type=float, default=0.01)
parser.add_argument('--arch', type=str, default='network.gscnn.GSCNN')
parser.add_argument('--dataset', type=str, default='cityscapes')
parser.add_argument('--cv', type=int, default=0,
help='cross validation split')
parser.add_argument('--joint_edgeseg_loss', action='store_true', default=True,
help='joint loss')
parser.add_argument('--img_wt_loss', action='store_true', default=False,
help='per-image class-weighted loss')
parser.add_argument('--batch_weighting', action='store_true', default=False,
help='Batch weighting for class')
parser.add_argument('--eval_thresholds', type=str, default='0.0005,0.001875,0.00375,0.005',
help='Thresholds for boundary evaluation')
parser.add_argument('--rescale', type=float, default=1.0,
help='Rescaled LR Rate')
parser.add_argument('--repoly', type=float, default=1.5,
help='Rescaled Poly')
parser.add_argument('--edge_weight', type=float, default=1.0,
help='Edge loss weight for joint loss')
parser.add_argument('--seg_weight', type=float, default=1.0,
help='Segmentation loss weight for joint loss')
parser.add_argument('--att_weight', type=float, default=1.0,
help='Attention loss weight for joint loss')
parser.add_argument('--dual_weight', type=float, default=1.0,
help='Dual loss weight for joint loss')
parser.add_argument('--evaluate', action='store_true', default=False)
parser.add_argument("--local_rank", default=0, type=int)
parser.add_argument('--sgd', action='store_true', default=True)
parser.add_argument('--sgd_finetuned',action='store_true',default=False)
parser.add_argument('--adam', action='store_true', default=False)
parser.add_argument('--amsgrad', action='store_true', default=False)
parser.add_argument('--trunk', type=str, default='resnet101',
help='trunk model, can be: resnet101 (default), resnet50')
parser.add_argument('--max_epoch', type=int, default=175)
parser.add_argument('--start_epoch', type=int, default=0)
parser.add_argument('--color_aug', type=float,
default=0.25, help='level of color augmentation')
parser.add_argument('--rotate', type=float,
default=0, help='rotation')
parser.add_argument('--gblur', action='store_true', default=True)
parser.add_argument('--bblur', action='store_true', default=False)
parser.add_argument('--lr_schedule', type=str, default='poly',
help='name of lr schedule: poly')
parser.add_argument('--poly_exp', type=float, default=1.0,
help='polynomial LR exponent')
parser.add_argument('--bs_mult', type=int, default=1)
parser.add_argument('--bs_mult_val', type=int, default=2)
parser.add_argument('--crop_size', type=int, default=720,
help='training crop size')
parser.add_argument('--pre_size', type=int, default=None,
help='resize image shorter edge to this before augmentation')
parser.add_argument('--scale_min', type=float, default=0.5,
help='dynamically scale training images down to this size')
parser.add_argument('--scale_max', type=float, default=2.0,
help='dynamically scale training images up to this size')
parser.add_argument('--weight_decay', type=float, default=1e-4)
parser.add_argument('--momentum', type=float, default=0.9)
parser.add_argument('--snapshot', type=str, default=None)
parser.add_argument('--restore_optimizer', action='store_true', default=False)
parser.add_argument('--exp', type=str, default='default',
help='experiment directory name')
parser.add_argument('--tb_tag', type=str, default='',
help='add tag to tb dir')
parser.add_argument('--ckpt', type=str, default='logs/ckpt')
parser.add_argument('--tb_path', type=str, default='logs/tb')
parser.add_argument('--syncbn', action='store_true', default=True,
help='Synchronized BN')
parser.add_argument('--dump_augmentation_images', action='store_true', default=False,
help='dump augmented images to disk for inspection')
parser.add_argument('--test_mode', action='store_true', default=False,
help='minimum testing (1 epoch run) to verify nothing failed')
parser.add_argument('--mode',type=str,default="train")
parser.add_argument('--test_sv_path', type=str, default="")
parser.add_argument('--checkpoint_path',type=str,default="")
parser.add_argument('-wb', '--wt_bound', type=float, default=1.0)
parser.add_argument('--maxSkip', type=int, default=0)
parser.add_argument('--data-cfg', help='data config (kitti format)',
default='config/rellis.yaml',
type=str)
parser.add_argument('--viz', dest='viz',
help="Save color predictions to disk",
action='store_true')
args = parser.parse_args()
args.best_record = {'epoch': -1, 'iter': 0, 'val_loss': 1e10, 'acc': 0,
'acc_cls': 0, 'mean_iu': 0, 'fwavacc': 0}
def convert_label(label, inverse=False):
label_mapping = {0: 0,
1: 0,
3: 1,
4: 2,
5: 3,
6: 4,
7: 5,
8: 6,
9: 7,
10: 8,
12: 9,
15: 10,
17: 11,
18: 12,
19: 13,
23: 14,
27: 15,
# 29: 1,
# 30: 1,
31: 16,
# 32: 4,
33: 17,
34: 18}
temp = label.copy()
if inverse:
for v,k in label_mapping.items():
temp[label == k] = v
else:
for k, v in label_mapping.items():
temp[label == k] = v
return temp
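As a quick illustration of how this remapping behaves, here is a minimal, self-contained sketch that uses a hypothetical four-entry mapping in place of the full RELLIS-3D table above (the full mapping is analogous):

```python
import numpy as np

# Miniature copy of the remapping logic above, with a hypothetical
# four-entry mapping (raw dataset id -> contiguous training id).
label_mapping = {0: 0, 1: 0, 3: 1, 4: 2}

def convert_label(label, inverse=False):
    temp = label.copy()
    if inverse:
        for v, k in label_mapping.items():
            temp[label == k] = v
    else:
        for k, v in label_mapping.items():
            temp[label == k] = v
    return temp

raw = np.array([0, 1, 3, 4])
train = convert_label(raw)                    # -> [0, 0, 1, 2]
restored = convert_label(train, inverse=True)
# Raw ids 0 and 1 both collapse onto training id 0, so the inverse
# pass cannot distinguish them: the last mapping entry written wins.
```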
def convert_color(label, color_map):
temp = np.zeros(label.shape + (3,)).astype(np.uint8)
for k,v in color_map.items():
temp[label == k] = v
return temp
#Enable CUDNN Benchmarking optimization
torch.backends.cudnn.benchmark = True
args.world_size = 1
# Test mode: run two epochs with a few iterations of training and val
if args.test_mode:
args.max_epoch = 2
if 'WORLD_SIZE' in os.environ:
args.world_size = int(os.environ['WORLD_SIZE'])
print("Total world size: ", int(os.environ['WORLD_SIZE']))
def main():
'''
Main Function
'''
#Set up the Arguments, Tensorboard Writer, Dataloader, Loss Fn, Optimizer
assert_and_infer_cfg(args)
writer = prep_experiment(args,parser)
train_loader, val_loader, train_obj = datasets.setup_loaders(args)
criterion, criterion_val = loss.get_loss(args)
net = network.get_net(args, criterion)
optim, scheduler = optimizer.get_optimizer(args, net)
torch.cuda.empty_cache()
if args.mode=="test":
test_sv_path = args.test_sv_path
print(f"Saving prediction {test_sv_path}")
net.eval()
try:
print("Opening config file %s" % args.data_cfg)
CFG = yaml.safe_load(open(args.data_cfg, 'r'))
except Exception as e:
print(e)
print("Error opening yaml file.")
quit()
id_color_map = CFG["color_map"]
for vi, data in enumerate(tqdm(val_loader)):
input, mask, img_name, img_path = data
assert len(input.size()) == 4 and len(mask.size()) == 3
assert input.size()[2:] == mask.size()[1:]
b, h, w = mask.size()
batch_pixel_size = input.size(0) * input.size(2) * input.size(3)
input, mask_cuda = input.cuda(), mask.cuda()
with torch.no_grad():
seg_out, edge_out = net(input) # output = (1, 19, 713, 713)
seg_predictions = seg_out.data.cpu().numpy()
edge_predictions = edge_out.cpu().numpy()
for i in range(b):
_,file_name = os.path.split(img_path[i])
file_name = file_name.replace("jpg","png")
seq = img_path[i][:5]
seg_path = os.path.join(test_sv_path,"gscnn","labels",seq)
if not os.path.exists(seg_path):
os.makedirs(seg_path)
seg_arg = np.argmax(seg_predictions[i],axis=0).astype(np.uint8)
seg_arg = convert_label(seg_arg,True)
seg_img = np.stack((seg_arg,seg_arg,seg_arg),axis=2)
seg_img = Image.fromarray(seg_img)
seg_img.save(os.path.join(seg_path,file_name))
if args.viz:
edge_arg = np.argmax(edge_predictions[i],axis=0).astype(np.uint8)
edge_img = np.stack((edge_arg,edge_arg,edge_arg),axis=2)
edge_path = os.path.join(test_sv_path,"gscnn","edge",seq)
#edgenp_path = os.path.join(test_sv_path,"gscnn","edgenp",seq)
if not os.path.exists(edge_path):
os.makedirs(edge_path)
#os.makedirs(edgenp_path)
edge_img = Image.fromarray(edge_img)
edge_img.save(os.path.join(edge_path,file_name))
color_label = convert_color(seg_arg,id_color_map)
color_path = os.path.join(test_sv_path,"gscnn","color",seq)
if not os.path.exists(color_path):
os.makedirs(color_path)
color_label = Image.fromarray(color_label)
color_label.save(os.path.join(color_path,file_name))
return
if args.evaluate:
# Early evaluation for benchmarking
default_eval_epoch = 1
validate(val_loader, net, criterion_val,
optim, default_eval_epoch, writer)
evaluate(val_loader, net)
return
#Main Loop
for epoch in range(args.start_epoch, args.max_epoch):
# Update EPOCH CTR
cfg.immutable(False)
cfg.EPOCH = epoch
cfg.immutable(True)
train(train_loader, net, criterion, optim, epoch, writer)
validate(val_loader, net, criterion_val,
optim, epoch, writer)
# Step the LR scheduler after the optimizer updates (PyTorch >= 1.1 ordering)
scheduler.step()
def train(train_loader, net, criterion, optimizer, curr_epoch, writer):
'''
Runs the training loop per epoch
train_loader: Data loader for train
net: the network
criterion: loss fn
optimizer: optimizer
curr_epoch: current epoch
writer: tensorboard writer
return: val_avg for step function if required
'''
net.train()
train_main_loss = AverageMeter()
train_edge_loss = AverageMeter()
train_seg_loss = AverageMeter()
train_att_loss = AverageMeter()
train_dual_loss = AverageMeter()
curr_iter = curr_epoch * len(train_loader)
for i, data in enumerate(train_loader):
if i==0:
print('running....')
inputs, mask, edge, _img_name = data
if torch.sum(torch.isnan(inputs)) > 0:
import pdb; pdb.set_trace()
batch_pixel_size = inputs.size(0) * inputs.size(2) * inputs.size(3)
inputs, mask, edge = inputs.cuda(), mask.cuda(), edge.cuda()
if i==0:
print('forward done')
optimizer.zero_grad()
main_loss = None
loss_dict = None
if args.joint_edgeseg_loss:
loss_dict = net(inputs, gts=(mask, edge))
if args.seg_weight > 0:
log_seg_loss = loss_dict['seg_loss'].mean().clone().detach_()
train_seg_loss.update(log_seg_loss.item(), batch_pixel_size)
main_loss = loss_dict['seg_loss']
if args.edge_weight > 0:
log_edge_loss = loss_dict['edge_loss'].mean().clone().detach_()
train_edge_loss.update(log_edge_loss.item(), batch_pixel_size)
if main_loss is not None:
main_loss += loss_dict['edge_loss']
else:
main_loss = loss_dict['edge_loss']
if args.att_weight > 0:
log_att_loss = loss_dict['att_loss'].mean().clone().detach_()
train_att_loss.update(log_att_loss.item(), batch_pixel_size)
if main_loss is not None:
main_loss += loss_dict['att_loss']
else:
main_loss = loss_dict['att_loss']
if args.dual_weight > 0:
log_dual_loss = loss_dict['dual_loss'].mean().clone().detach_()
train_dual_loss.update(log_dual_loss.item(), batch_pixel_size)
if main_loss is not None:
main_loss += loss_dict['dual_loss']
else:
main_loss = loss_dict['dual_loss']
else:
main_loss = net(inputs, gts=mask)
main_loss = main_loss.mean()
log_main_loss = main_loss.clone().detach_()
train_main_loss.update(log_main_loss.item(), batch_pixel_size)
main_loss.backward()
optimizer.step()
if i==0:
print('step 1 done')
curr_iter += 1
if args.local_rank == 0:
msg = '[epoch {}], [iter {} / {}], [train main loss {:0.6f}], [seg loss {:0.6f}], [edge loss {:0.6f}], [lr {:0.6f}]'.format(
curr_epoch, i + 1, len(train_loader), train_main_loss.avg, train_seg_loss.avg, train_edge_loss.avg, optimizer.param_groups[-1]['lr'] )
logging.info(msg)
# Log tensorboard metrics for each iteration of the training phase
writer.add_scalar('training/loss', (train_main_loss.val),
curr_iter)
writer.add_scalar('training/lr', optimizer.param_groups[-1]['lr'],
curr_iter)
if args.joint_edgeseg_loss:
writer.add_scalar('training/seg_loss', (train_seg_loss.val),
curr_iter)
writer.add_scalar('training/edge_loss', (train_edge_loss.val),
curr_iter)
writer.add_scalar('training/att_loss', (train_att_loss.val),
curr_iter)
writer.add_scalar('training/dual_loss', (train_dual_loss.val),
curr_iter)
if i > 5 and args.test_mode:
return
def validate(val_loader, net, criterion, optimizer, curr_epoch, writer):
'''
Runs the validation loop after each training epoch
val_loader: Data loader for validation
net: the network
criterion: loss fn
optimizer: optimizer
curr_epoch: current epoch
writer: tensorboard writer
return:
'''
net.eval()
val_loss = AverageMeter()
mf_score = AverageMeter()
IOU_acc = 0
dump_images = []
heatmap_images = []
for vi, data in enumerate(val_loader):
input, mask, edge, img_names = data
assert len(input.size()) == 4 and len(mask.size()) == 3
assert input.size()[2:] == mask.size()[1:]
h, w = mask.size()[1:]
batch_pixel_size = input.size(0) * input.size(2) * input.size(3)
input, mask_cuda, edge_cuda = input.cuda(), mask.cuda(), edge.cuda()
with torch.no_grad():
seg_out, edge_out = net(input) # output = (1, 19, 713, 713)
if args.joint_edgeseg_loss:
loss_dict = criterion((seg_out, edge_out), (mask_cuda, edge_cuda))
val_loss.update(sum(loss_dict.values()).item(), batch_pixel_size)
else:
val_loss.update(criterion(seg_out, mask_cuda).item(), batch_pixel_size)
# Collect data from different GPU to a single GPU since
# encoding.parallel.criterionparallel function calculates distributed loss
# functions
seg_predictions = seg_out.data.max(1)[1].cpu()
edge_predictions = edge_out.max(1)[0].cpu()
#Logging
if vi % 20 == 0:
if args.local_rank == 0:
logging.info('validating: %d / %d' % (vi + 1, len(val_loader)))
if vi > 10 and args.test_mode:
break
_edge = edge.max(1)[0]
#Image Dumps
if vi < 10:
dump_images.append([mask, seg_predictions, img_names])
heatmap_images.append([_edge, edge_predictions, img_names])
IOU_acc += fast_hist(seg_predictions.numpy().flatten(), mask.numpy().flatten(),
args.dataset_cls.num_classes)
del seg_out, edge_out, vi, data
if args.local_rank == 0:
evaluate_eval(args, net, optimizer, val_loss, mf_score, IOU_acc, dump_images, heatmap_images,
writer, curr_epoch, args.dataset_cls)
return val_loss.avg
def evaluate(val_loader, net):
'''
Runs the evaluation loop and prints F score
val_loader: Data loader for validation
net: the network
return:
'''
net.eval()
for thresh in args.eval_thresholds.split(','):
mf_score1 = AverageMeter()
mf_pc_score1 = AverageMeter()
ap_score1 = AverageMeter()
ap_pc_score1 = AverageMeter()
Fpc = np.zeros((args.dataset_cls.num_classes))
Fc = np.zeros((args.dataset_cls.num_classes))
for vi, data in enumerate(val_loader):
input, mask, edge, img_names = data
assert len(input.size()) == 4 and len(mask.size()) == 3
assert input.size()[2:] == mask.size()[1:]
h, w = mask.size()[1:]
batch_pixel_size = input.size(0) * input.size(2) * input.size(3)
input, mask_cuda, edge_cuda = input.cuda(), mask.cuda(), edge.cuda()
with torch.no_grad():
seg_out, edge_out = net(input)
seg_predictions = seg_out.data.max(1)[1].cpu()
edge_predictions = edge_out.max(1)[0].cpu()
logging.info('evaluating: %d / %d' % (vi + 1, len(val_loader)))
_Fpc, _Fc = eval_mask_boundary(seg_predictions.numpy(), mask.numpy(), args.dataset_cls.num_classes, bound_th=float(thresh))
Fc += _Fc
Fpc += _Fpc
del seg_out, edge_out, vi, data
logging.info('Threshold: ' + thresh)
logging.info('F_Score: ' + str(np.sum(Fpc/Fc)/args.dataset_cls.num_classes))
logging.info('F_Score (Classwise): ' + str(Fpc/Fc))
if __name__ == '__main__':
main()
================================================
FILE: benchmarks/GSCNN-master/transforms/joint_transforms.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Code borrowed from:
# https://github.com/zijundeng/pytorch-semantic-segmentation/blob/master/utils/joint_transforms.py
#
#
# MIT License
#
# Copyright (c) 2017 ZijunDeng
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
"""
import math
import numbers
import random
from PIL import Image, ImageOps
import numpy as np
class Compose(object):
def __init__(self, transforms):
self.transforms = transforms
def __call__(self, img, mask):
assert img.size == mask.size
for t in self.transforms:
img, mask = t(img, mask)
return img, mask
class RandomCrop(object):
'''
Take a random crop from the image.
First the image or crop size may need to be adjusted if the incoming image
is too small...
If the image is smaller than the crop, then:
the image is padded up to the size of the crop
unless 'nopad', in which case the crop size is shrunk to fit the image
A random crop is taken such that the crop fits within the image.
If a centroid is passed in, the crop must intersect the centroid.
'''
def __init__(self, size, ignore_index=0, nopad=True):
if isinstance(size, numbers.Number):
self.size = (int(size), int(size))
else:
self.size = size
self.ignore_index = ignore_index
self.nopad = nopad
self.pad_color = (0, 0, 0)
def __call__(self, img, mask, centroid=None):
assert img.size == mask.size
w, h = img.size
# ASSUME H, W
th, tw = self.size
if w == tw and h == th:
return img, mask
if self.nopad:
if th > h or tw > w:
# Instead of padding, adjust crop size to the shorter edge of image.
shorter_side = min(w, h)
th, tw = shorter_side, shorter_side
else:
# Check if we need to pad img to fit for crop_size.
if th > h:
pad_h = (th - h) // 2 + 1
else:
pad_h = 0
if tw > w:
pad_w = (tw - w) // 2 + 1
else:
pad_w = 0
border = (pad_w, pad_h, pad_w, pad_h)
if pad_h or pad_w:
img = ImageOps.expand(img, border=border, fill=self.pad_color)
mask = ImageOps.expand(mask, border=border, fill=self.ignore_index)
w, h = img.size
if centroid is not None:
# Need to ensure that centroid is covered by crop and that crop
# sits fully within the image
c_x, c_y = centroid
max_x = w - tw
max_y = h - th
x1 = random.randint(c_x - tw, c_x)
x1 = min(max_x, max(0, x1))
y1 = random.randint(c_y - th, c_y)
y1 = min(max_y, max(0, y1))
else:
if w == tw:
x1 = 0
else:
x1 = random.randint(0, w - tw)
if h == th:
y1 = 0
else:
y1 = random.randint(0, h - th)
return img.crop((x1, y1, x1 + tw, y1 + th)), mask.crop((x1, y1, x1 + tw, y1 + th))
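A small sketch of the nopad branch above: when the requested crop exceeds the image, the crop size collapses to the image's shorter side instead of padding. This is a standalone restatement of the arithmetic, not an import of the class:

```python
def effective_crop(w, h, th, tw, nopad=True):
    # Mirrors RandomCrop above: with nopad, an oversized crop request is
    # shrunk to the shorter image side; otherwise the image gets padded.
    if nopad and (th > h or tw > w):
        shorter_side = min(w, h)
        th, tw = shorter_side, shorter_side
    return th, tw

# A 100x60 image with a 128x128 crop request shrinks to a 60x60 crop.
```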
class ResizeHeight(object):
def __init__(self, size, interpolation=Image.BICUBIC):
self.target_h = size
self.interpolation = interpolation
def __call__(self, img, mask):
w, h = img.size
target_w = int(w / h * self.target_h)
return (img.resize((target_w, self.target_h), self.interpolation),
mask.resize((target_w, self.target_h), Image.NEAREST))
class CenterCrop(object):
def __init__(self, size):
if isinstance(size, numbers.Number):
self.size = (int(size), int(size))
else:
self.size = size
def __call__(self, img, mask):
assert img.size == mask.size
w, h = img.size
th, tw = self.size
x1 = int(round((w - tw) / 2.))
y1 = int(round((h - th) / 2.))
return img.crop((x1, y1, x1 + tw, y1 + th)), mask.crop((x1, y1, x1 + tw, y1 + th))
class CenterCropPad(object):
def __init__(self, size, ignore_index=0):
if isinstance(size, numbers.Number):
self.size = (int(size), int(size))
else:
self.size = size
self.ignore_index = ignore_index
def __call__(self, img, mask):
assert img.size == mask.size
w, h = img.size
if isinstance(self.size, tuple):
tw, th = self.size[0], self.size[1]
else:
th, tw = self.size, self.size
if w < tw:
pad_x = tw - w
else:
pad_x = 0
if h < th:
pad_y = th - h
else:
pad_y = 0
if pad_x or pad_y:
# left, top, right, bottom
img = ImageOps.expand(img, border=(pad_x, pad_y, pad_x, pad_y), fill=0)
mask = ImageOps.expand(mask, border=(pad_x, pad_y, pad_x, pad_y),
fill=self.ignore_index)
x1 = int(round((w - tw) / 2.))
y1 = int(round((h - th) / 2.))
return img.crop((x1, y1, x1 + tw, y1 + th)), mask.crop((x1, y1, x1 + tw, y1 + th))
class PadImage(object):
def __init__(self, size, ignore_index):
self.size = size
self.ignore_index = ignore_index
def __call__(self, img, mask):
assert img.size == mask.size
th, tw = self.size, self.size
w, h = img.size
if w > tw or h > th :
wpercent = (tw/float(w))
target_h = int((float(img.size[1])*float(wpercent)))
img, mask = img.resize((tw, target_h), Image.BICUBIC), mask.resize((tw, target_h), Image.NEAREST)
w, h = img.size
##Pad
img = ImageOps.expand(img, border=(0,0,tw-w, th-h), fill=0)
mask = ImageOps.expand(mask, border=(0,0,tw-w, th-h), fill=self.ignore_index)
return img, mask
class RandomHorizontallyFlip(object):
def __call__(self, img, mask):
if random.random() < 0.5:
return img.transpose(Image.FLIP_LEFT_RIGHT), mask.transpose(
Image.FLIP_LEFT_RIGHT)
return img, mask
class FreeScale(object):
def __init__(self, size):
self.size = tuple(reversed(size)) # size: (h, w)
def __call__(self, img, mask):
assert img.size == mask.size
return img.resize(self.size, Image.BICUBIC), mask.resize(self.size, Image.NEAREST)
class Scale(object):
'''
Scale image such that longer side is == size
'''
def __init__(self, size):
self.size = size
def __call__(self, img, mask):
assert img.size == mask.size
w, h = img.size
if (w >= h and w == self.size) or (h >= w and h == self.size):
return img, mask
if w > h:
ow = self.size
oh = int(self.size * h / w)
return img.resize((ow, oh), Image.BICUBIC), mask.resize(
(ow, oh), Image.NEAREST)
else:
oh = self.size
ow = int(self.size * w / h)
return img.resize((ow, oh), Image.BICUBIC), mask.resize(
(ow, oh), Image.NEAREST)
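The output geometry of Scale can be stated as a small pure function; this sketch restates the arithmetic above, including its integer truncation:

```python
def scaled_size(w, h, size):
    # Mirrors Scale above: resize so the *longer* side equals `size`,
    # preserving aspect ratio (with the same int() truncation).
    if (w >= h and w == size) or (h >= w and h == size):
        return w, h
    if w > h:
        return size, int(size * h / w)
    return int(size * w / h), size

# e.g. scaled_size(2048, 1024, 512) -> (512, 256)
```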
class ScaleMin(object):
'''
Scale image such that shorter side is == size
'''
def __init__(self, size):
self.size = size
def __call__(self, img, mask):
assert img.size == mask.size
w, h = img.size
if (w <= h and w == self.size) or (h <= w and h == self.size):
return img, mask
if w < h:
ow = self.size
oh = int(self.size * h / w)
return img.resize((ow, oh), Image.BICUBIC), mask.resize(
(ow, oh), Image.NEAREST)
else:
oh = self.size
ow = int(self.size * w / h)
return img.resize((ow, oh), Image.BICUBIC), mask.resize(
(ow, oh), Image.NEAREST)
class Resize(object):
'''
Resize image to exact size of crop
'''
def __init__(self, size):
self.size = (size, size)
def __call__(self, img, mask):
assert img.size == mask.size
w, h = img.size
if (w, h) == self.size:
return img, mask
return (img.resize(self.size, Image.BICUBIC),
mask.resize(self.size, Image.NEAREST))
class RandomSizedCrop(object):
def __init__(self, size):
self.size = size
def __call__(self, img, mask):
assert img.size == mask.size
for attempt in range(10):
area = img.size[0] * img.size[1]
target_area = random.uniform(0.45, 1.0) * area
aspect_ratio = random.uniform(0.5, 2)
w = int(round(math.sqrt(target_area * aspect_ratio)))
h = int(round(math.sqrt(target_area / aspect_ratio)))
if random.random() < 0.5:
w, h = h, w
if w <= img.size[0] and h <= img.size[1]:
x1 = random.randint(0, img.size[0] - w)
y1 = random.randint(0, img.size[1] - h)
img = img.crop((x1, y1, x1 + w, y1 + h))
mask = mask.crop((x1, y1, x1 + w, y1 + h))
assert (img.size == (w, h))
return img.resize((self.size, self.size), Image.BICUBIC),\
mask.resize((self.size, self.size), Image.NEAREST)
# Fallback
scale = Scale(self.size)
crop = CenterCrop(self.size)
return crop(*scale(img, mask))
class RandomRotate(object):
def __init__(self, degree):
self.degree = degree
def __call__(self, img, mask):
rotate_degree = random.random() * 2 * self.degree - self.degree
return img.rotate(rotate_degree, Image.BICUBIC), mask.rotate(
rotate_degree, Image.NEAREST)
class RandomSizeAndCrop(object):
def __init__(self, size, crop_nopad,
scale_min=0.5, scale_max=2.0, ignore_index=0, pre_size=None):
self.size = size
self.crop = RandomCrop(self.size, ignore_index=ignore_index, nopad=crop_nopad)
self.scale_min = scale_min
self.scale_max = scale_max
self.pre_size = pre_size
def __call__(self, img, mask, centroid=None):
assert img.size == mask.size
# first, resize such that shorter edge is pre_size
if self.pre_size is None:
scale_amt = 1.
elif img.size[1] < img.size[0]:
scale_amt = self.pre_size / img.size[1]
else:
scale_amt = self.pre_size / img.size[0]
scale_amt *= random.uniform(self.scale_min, self.scale_max)
w, h = [int(i * scale_amt) for i in img.size]
if centroid is not None:
centroid = [int(c * scale_amt) for c in centroid]
img, mask = img.resize((w, h), Image.BICUBIC), mask.resize((w, h), Image.NEAREST)
return self.crop(img, mask, centroid)
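The scale factor computed in __call__ above can be isolated as a small function. This is a sketch; the `rng` parameter is introduced here for deterministic testing and is not part of the class above:

```python
import random

def scale_factor(w, h, pre_size, scale_min, scale_max, rng=random):
    # Mirrors RandomSizeAndCrop above: first normalise the *shorter*
    # edge to pre_size (if given), then jitter by a uniform factor.
    if pre_size is None:
        amt = 1.0
    elif h < w:
        amt = pre_size / h
    else:
        amt = pre_size / w
    return amt * rng.uniform(scale_min, scale_max)
```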
class SlidingCropOld(object):
def __init__(self, crop_size, stride_rate, ignore_label):
self.crop_size = crop_size
self.stride_rate = stride_rate
self.ignore_label = ignore_label
def _pad(self, img, mask):
h, w = img.shape[: 2]
pad_h = max(self.crop_size - h, 0)
pad_w = max(self.crop_size - w, 0)
img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), 'constant')
mask = np.pad(mask, ((0, pad_h), (0, pad_w)), 'constant',
constant_values=self.ignore_label)
return img, mask
def __call__(self, img, mask):
assert img.size == mask.size
w, h = img.size
long_size = max(h, w)
img = np.array(img)
mask = np.array(mask)
if long_size > self.crop_size:
stride = int(math.ceil(self.crop_size * self.stride_rate))
h_step_num = int(math.ceil((h - self.crop_size) / float(stride))) + 1
w_step_num = int(math.ceil((w - self.crop_size) / float(stride))) + 1
img_sublist, mask_sublist = [], []
for yy in range(h_step_num):
for xx in range(w_step_num):
sy, sx = yy * stride, xx * stride
ey, ex = sy + self.crop_size, sx + self.crop_size
img_sub = img[sy: ey, sx: ex, :]
mask_sub = mask[sy: ey, sx: ex]
img_sub, mask_sub = self._pad(img_sub, mask_sub)
img_sublist.append(
Image.fromarray(
img_sub.astype(
np.uint8)).convert('RGB'))
mask_sublist.append(
Image.fromarray(
mask_sub.astype(
np.uint8)).convert('P'))
return img_sublist, mask_sublist
else:
img, mask = self._pad(img, mask)
img = Image.fromarray(img.astype(np.uint8)).convert('RGB')
mask = Image.fromarray(mask.astype(np.uint8)).convert('P')
return img, mask
class SlidingCrop(object):
def __init__(self, crop_size, stride_rate, ignore_label):
self.crop_size = crop_size
self.stride_rate = stride_rate
self.ignore_label = ignore_label
def _pad(self, img, mask):
h, w = img.shape[: 2]
pad_h = max(self.crop_size - h, 0)
pad_w = max(self.crop_size - w, 0)
img = np.pad(img, ((0, pad_h), (0, pad_w), (0, 0)), 'constant')
mask = np.pad(mask, ((0, pad_h), (0, pad_w)), 'constant',
constant_values=self.ignore_label)
return img, mask, h, w
def __call__(self, img, mask):
assert img.size == mask.size
w, h = img.size
long_size = max(h, w)
img = np.array(img)
mask = np.array(mask)
if long_size > self.crop_size:
stride = int(math.ceil(self.crop_size * self.stride_rate))
h_step_num = int(math.ceil((h - self.crop_size) / float(stride))) + 1
w_step_num = int(math.ceil((w - self.crop_size) / float(stride))) + 1
img_slices, mask_slices, slices_info = [], [], []
for yy in range(h_step_num):
for xx in range(w_step_num):
sy, sx = yy * stride, xx * stride
ey, ex = sy + self.crop_size, sx + self.crop_size
img_sub = img[sy: ey, sx: ex, :]
mask_sub = mask[sy: ey, sx: ex]
img_sub, mask_sub, sub_h, sub_w = self._pad(img_sub, mask_sub)
img_slices.append(
Image.fromarray(
img_sub.astype(
np.uint8)).convert('RGB'))
mask_slices.append(
Image.fromarray(
mask_sub.astype(
np.uint8)).convert('P'))
slices_info.append([sy, ey, sx, ex, sub_h, sub_w])
return img_slices, mask_slices, slices_info
else:
img, mask, sub_h, sub_w = self._pad(img, mask)
img = Image.fromarray(img.astype(np.uint8)).convert('RGB')
mask = Image.fromarray(mask.astype(np.uint8)).convert('P')
return [img], [mask], [[0, sub_h, 0, sub_w, sub_h, sub_w]]
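The number of windows SlidingCrop produces follows directly from its stride arithmetic; as a standalone sketch of that computation:

```python
import math

def num_windows(h, w, crop_size, stride_rate):
    # Mirrors SlidingCrop above: overlapping crop_size windows at
    # stride = ceil(crop_size * stride_rate), once the long side
    # exceeds crop_size; otherwise the image is padded to one window.
    if max(h, w) <= crop_size:
        return 1
    stride = int(math.ceil(crop_size * stride_rate))
    h_steps = int(math.ceil((h - crop_size) / float(stride))) + 1
    w_steps = int(math.ceil((w - crop_size) / float(stride))) + 1
    return h_steps * w_steps

# e.g. a 1024x2048 image with 713-px crops at a 2/3 stride rate
# yields 2 x 4 = 8 windows.
```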
class ClassUniform(object):
def __init__(self, size, crop_nopad, scale_min=0.5, scale_max=2.0, ignore_index=0,
class_list=[16, 15, 14]):
"""
This is the initialization for class uniform sampling
:param size: crop size (int)
:param crop_nopad: Padding or no padding (bool)
:param scale_min: Minimum Scale (float)
:param scale_max: Maximum Scale (float)
:param ignore_index: The index value to ignore in the GT images (unsigned int)
:param class_list: A list of classes to sample around; by default Truck, Train, Bus
"""
self.size = size
self.crop = RandomCrop(self.size, ignore_index=ignore_index, nopad=crop_nopad)
# class_list may be a comma-separated string (from the CLI) or a list of ints
if isinstance(class_list, str):
    class_list = class_list.replace(" ", "").split(",")
self.class_list = class_list
self.scale_min = scale_min
self.scale_max = scale_max
def detect_peaks(self, image):
"""
Takes an image and detect the peaks usingthe local maximum filter.
Returns a boolean mask of the peaks (i.e. 1 when
the pixel's value is the neighborhood maximum, 0 otherwise)
:param image: An 2d input images
:return: Binary output images of the same size as input with pixel value equal
to 1 indicating that there is peak at that point
"""
# define an 8-connected neighborhood
neighborhood = generate_binary_structure(2, 2)
# apply the local maximum filter; all pixel of maximal value
# in their neighborhood are set to 1
local_max = maximum_filter(image, footprint=neighborhood) == image
# local_max is a mask that contains the peaks we are
# looking for, but also the background.
# In order to isolate the peaks we must remove the background from the mask.
# we create the mask of the background
background = (image == 0)
# a little technicality: we must erode the background in order to
# successfully subtract it from local_max, otherwise a line will
# appear along the background border (artifact of the local maximum filter)
eroded_background = binary_erosion(background, structure=neighborhood,
border_value=1)
# we obtain the final mask, containing only peaks,
# by removing the background from the local_max mask (xor operation)
detected_peaks = local_max ^ eroded_background
return detected_peaks
def __call__(self, img, mask):
"""
:param img: PIL Input Image
:param mask: PIL Input Mask
:return: PIL output PIL (mask, crop) of self.crop_size
"""
assert img.size == mask.size
scale_amt = random.uniform(self.scale_min, self.scale_max)
w = int(scale_amt * img.size[0])
h = int(scale_amt * img.size[1])
if scale_amt < 1.0:
img, mask = img.resize((w, h), Image.BICUBIC), mask.resize((w, h),
Image.NEAREST)
return self.crop(img, mask)
else:
# Smart Crop ( Class Uniform's ABN)
origw, origh = mask.size
img_new, mask_new = \
img.resize((w, h), Image.BICUBIC), mask.resize((w, h), Image.NEAREST)
interested_class = self.class_list # [16, 15, 14] # Train, Truck, Bus
data = np.array(mask)
arr = np.zeros((1024, 2048))  # note: hard-coded to 1024x2048 (Cityscapes-sized) masks
for class_of_interest in interested_class:
# hist = np.histogram(data==class_of_interest)
map = np.where(data == class_of_interest, data, 0)
map = map.astype('float64') / map.sum() / class_of_interest
map[np.isnan(map)] = 0
arr = arr + map
origarr = arr
window_size = 250
# Given a list of classes of interest find the points on the image that are
# of interest to crop from
sum_arr = np.zeros((1024, 2048)).astype('float32')
tmp = np.zeros((1024, 2048)).astype('float32')
for x in range(0, arr.shape[0] - window_size, window_size):
for y in range(0, arr.shape[1] - window_size, window_size):
sum_arr[int(x + window_size / 2), int(y + window_size / 2)] = origarr[
x:x + window_size,
y:y + window_size].sum()
tmp[x:x + window_size, y:y + window_size] = \
origarr[x:x + window_size, y:y + window_size].sum()
# Scaling Ratios in X and Y for non-uniform images
ratio = (float(origw) / w, float(origh) / h)
output = self.detect_peaks(sum_arr)
coord = (np.column_stack(np.where(output))).tolist()
# Check if there are any peaks in the images to crop from if not do standard
# cropping behaviour
if len(coord) == 0:
return self.crop(img_new, mask_new)
else:
# If peaks are detected, random peak selection followed by peak
# coordinate scaling to new scaled image and then random
# cropping around the peak point in the scaled image
randompick = np.random.randint(len(coord))
y, x = coord[randompick]
y, x = int(y * ratio[0]), int(x * ratio[1])
window_size = window_size * ratio[0]
cropx = random.uniform(
max(0, (x - window_size / 2) - (self.size - window_size)),
max((x - window_size / 2), (x - window_size / 2) - (
(w - window_size) - x + window_size / 2)))
cropy = random.uniform(
max(0, (y - window_size / 2) - (self.size - window_size)),
max((y - window_size / 2), (y - window_size / 2) - (
(h - window_size) - y + window_size / 2)))
return_img = img_new.crop(
(cropx, cropy, cropx + self.size, cropy + self.size))
return_mask = mask_new.crop(
(cropx, cropy, cropx + self.size, cropy + self.size))
return (return_img, return_mask)
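As a sanity check of the detect_peaks recipe above, the same filter/erosion/xor steps can be run standalone on a toy array (assuming the scipy.ndimage names the class relies on):

```python
import numpy as np
from scipy.ndimage import (generate_binary_structure, maximum_filter,
                           binary_erosion)

# Two isolated peaks on a zero background.
image = np.zeros((5, 5))
image[1, 1] = 3.0
image[3, 3] = 5.0

neighborhood = generate_binary_structure(2, 2)   # 8-connected
local_max = maximum_filter(image, footprint=neighborhood) == image
background = (image == 0)
eroded_background = binary_erosion(background, structure=neighborhood,
                                   border_value=1)
detected_peaks = local_max ^ eroded_background
# True exactly at (1, 1) and (3, 3): flat background and the pixels
# bordering each peak are both excluded by the xor.
```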
================================================
FILE: benchmarks/GSCNN-master/transforms/transforms.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Code borrowed from:
# https://github.com/zijundeng/pytorch-semantic-segmentation/blob/master/utils/transforms.py
#
#
# MIT License
#
# Copyright (c) 2017 ZijunDeng
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
"""
import random
import numpy as np
from skimage.filters import gaussian
from skimage.restoration import denoise_bilateral
import torch
from PIL import Image, ImageFilter, ImageEnhance
import torchvision.transforms as torch_tr
from scipy import ndimage
from config import cfg
from scipy.ndimage import shift
#from scipy.misc import imsave
import imageio
from skimage.segmentation import find_boundaries
class RandomVerticalFlip(object):
def __call__(self, img):
if random.random() < 0.5:
return img.transpose(Image.FLIP_TOP_BOTTOM)
return img
class DeNormalize(object):
def __init__(self, mean, std):
self.mean = mean
self.std = std
def __call__(self, tensor):
for t, m, s in zip(tensor, self.mean, self.std):
t.mul_(s).add_(m)
return tensor
class MaskToTensor(object):
def __call__(self, img):
return torch.from_numpy(np.array(img, dtype=np.int32)).long()
class RelaxedBoundaryLossToTensor(object):
def __init__(self,ignore_id, num_classes):
self.ignore_id=ignore_id
self.num_classes= num_classes
def new_one_hot_converter(self,a):
ncols = self.num_classes+1
out = np.zeros( (a.size,ncols), dtype=np.uint8)
out[np.arange(a.size),a.ravel()] = 1
out.shape = a.shape + (ncols,)
return out
def __call__(self,img):
img_arr = np.array(img)
# imageio.imwrite('orig.png', img_arr)  # debug dump; disabled to avoid a disk write on every call
img_arr[img_arr==self.ignore_id]=self.num_classes
if cfg.STRICTBORDERCLASS is not None:
one_hot_orig = self.new_one_hot_converter(img_arr)
mask = np.zeros((img_arr.shape[0],img_arr.shape[1]))
for cls in cfg.STRICTBORDERCLASS:
mask = np.logical_or(mask,(img_arr == cls))
one_hot = 0
#print(cfg.EPOCH, "Non Reduced", cfg.TRAIN.REDUCE_RELAXEDITERATIONCOUNT)
border = cfg.BORDER_WINDOW
if (cfg.REDUCE_BORDER_EPOCH !=-1 and cfg.EPOCH > cfg.REDUCE_BORDER_EPOCH):
border = border // 2
border_prediction = find_boundaries(img_arr, mode='thick').astype(np.uint8)
for i in range(-border,border+1):
for j in range(-border, border+1):
shifted= shift(img_arr,(i,j), cval=self.num_classes)
one_hot += self.new_one_hot_converter(shifted)
one_hot[one_hot>1] = 1
        if cfg.STRICTBORDERCLASS is not None:
one_hot = np.where(np.expand_dims(mask,2), one_hot_orig, one_hot)
one_hot = np.moveaxis(one_hot,-1,0)
if (cfg.REDUCE_BORDER_EPOCH !=-1 and cfg.EPOCH > cfg.REDUCE_BORDER_EPOCH):
one_hot = np.where(border_prediction,2*one_hot,1*one_hot)
return torch.from_numpy(one_hot).byte()
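The flatten-and-scatter trick used by `new_one_hot_converter` can be exercised in isolation; a minimal sketch (NumPy only, the helper name is illustrative):

```python
import numpy as np

def one_hot(labels, ncols):
    # Scatter a 1 at (flat_pixel_index, class_id), then restore the spatial
    # shape, mirroring RelaxedBoundaryLossToTensor.new_one_hot_converter.
    out = np.zeros((labels.size, ncols), dtype=np.uint8)
    out[np.arange(labels.size), labels.ravel()] = 1
    out.shape = labels.shape + (ncols,)
    return out

labels = np.array([[0, 1],
                   [2, 1]])
oh = one_hot(labels, ncols=3)
assert oh.shape == (2, 2, 3)          # one channel per class
assert oh[1, 0, 2] == 1               # pixel (1, 0) carries class 2
assert (oh.sum(axis=-1) == 1).all()   # exactly one hot entry per pixel
```

Summing several such maps over shifted label images, then clamping to 1, is what produces the relaxed multi-hot border targets above.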
class ResizeHeight(object):
def __init__(self, size, interpolation=Image.BILINEAR):
self.target_h = size
self.interpolation = interpolation
def __call__(self, img):
w, h = img.size
target_w = int(w / h * self.target_h)
return img.resize((target_w, self.target_h), self.interpolation)
class FreeScale(object):
def __init__(self, size, interpolation=Image.BILINEAR):
self.size = tuple(reversed(size)) # size: (h, w)
self.interpolation = interpolation
def __call__(self, img):
return img.resize(self.size, self.interpolation)
class FlipChannels(object):
def __call__(self, img):
img = np.array(img)[:, :, ::-1]
return Image.fromarray(img.astype(np.uint8))
class RandomGaussianBlur(object):
def __call__(self, img):
sigma = 0.15 + random.random() * 1.15
blurred_img = gaussian(np.array(img), sigma=sigma, multichannel=True)
blurred_img *= 255
return Image.fromarray(blurred_img.astype(np.uint8))
class RandomBilateralBlur(object):
def __call__(self, img):
sigma = random.uniform(0.05,0.75)
blurred_img = denoise_bilateral(np.array(img), sigma_spatial=sigma, multichannel=True)
blurred_img *= 255
return Image.fromarray(blurred_img.astype(np.uint8))
try:
import accimage
except ImportError:
accimage = None
def _is_pil_image(img):
if accimage is not None:
return isinstance(img, (Image.Image, accimage.Image))
else:
return isinstance(img, Image.Image)
def adjust_brightness(img, brightness_factor):
"""Adjust brightness of an Image.
Args:
img (PIL Image): PIL Image to be adjusted.
brightness_factor (float): How much to adjust the brightness. Can be
any non negative number. 0 gives a black image, 1 gives the
original image while 2 increases the brightness by a factor of 2.
Returns:
PIL Image: Brightness adjusted image.
"""
if not _is_pil_image(img):
raise TypeError('img should be PIL Image. Got {}'.format(type(img)))
enhancer = ImageEnhance.Brightness(img)
img = enhancer.enhance(brightness_factor)
return img
def adjust_contrast(img, contrast_factor):
"""Adjust contrast of an Image.
Args:
img (PIL Image): PIL Image to be adjusted.
contrast_factor (float): How much to adjust the contrast. Can be any
non negative number. 0 gives a solid gray image, 1 gives the
original image while 2 increases the contrast by a factor of 2.
Returns:
PIL Image: Contrast adjusted image.
"""
if not _is_pil_image(img):
raise TypeError('img should be PIL Image. Got {}'.format(type(img)))
enhancer = ImageEnhance.Contrast(img)
img = enhancer.enhance(contrast_factor)
return img
def adjust_saturation(img, saturation_factor):
"""Adjust color saturation of an image.
Args:
img (PIL Image): PIL Image to be adjusted.
saturation_factor (float): How much to adjust the saturation. 0 will
give a black and white image, 1 will give the original image while
2 will enhance the saturation by a factor of 2.
Returns:
PIL Image: Saturation adjusted image.
"""
if not _is_pil_image(img):
raise TypeError('img should be PIL Image. Got {}'.format(type(img)))
enhancer = ImageEnhance.Color(img)
img = enhancer.enhance(saturation_factor)
return img
def adjust_hue(img, hue_factor):
"""Adjust hue of an image.
The image hue is adjusted by converting the image to HSV and
cyclically shifting the intensities in the hue channel (H).
The image is then converted back to original image mode.
`hue_factor` is the amount of shift in H channel and must be in the
interval `[-0.5, 0.5]`.
See https://en.wikipedia.org/wiki/Hue for more details on Hue.
Args:
img (PIL Image): PIL Image to be adjusted.
hue_factor (float): How much to shift the hue channel. Should be in
[-0.5, 0.5]. 0.5 and -0.5 give complete reversal of hue channel in
HSV space in positive and negative direction respectively.
0 means no shift. Therefore, both -0.5 and 0.5 will give an image
with complementary colors while 0 gives the original image.
Returns:
PIL Image: Hue adjusted image.
"""
if not(-0.5 <= hue_factor <= 0.5):
        raise ValueError('hue_factor ({}) is not in [-0.5, 0.5].'.format(hue_factor))
if not _is_pil_image(img):
raise TypeError('img should be PIL Image. Got {}'.format(type(img)))
input_mode = img.mode
if input_mode in {'L', '1', 'I', 'F'}:
return img
h, s, v = img.convert('HSV').split()
np_h = np.array(h, dtype=np.uint8)
# uint8 addition take cares of rotation across boundaries
with np.errstate(over='ignore'):
np_h += np.uint8(hue_factor * 255)
h = Image.fromarray(np_h, 'L')
img = Image.merge('HSV', (h, s, v)).convert(input_mode)
return img
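The uint8 overflow that `adjust_hue` relies on wraps the hue channel cyclically rather than clipping; a small check of that arithmetic:

```python
import numpy as np

np_h = np.array([250, 10], dtype=np.uint8)
with np.errstate(over='ignore'):
    np_h += np.uint8(0.1 * 255)  # hue_factor = 0.1 -> shift of 25
# uint8 addition wraps modulo 256: 250 + 25 -> 19 (cyclic hue), 10 + 25 -> 35
assert np_h[0] == 19 and np_h[1] == 35
```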
class ColorJitter(object):
"""Randomly change the brightness, contrast and saturation of an image.
Args:
brightness (float): How much to jitter brightness. brightness_factor
is chosen uniformly from [max(0, 1 - brightness), 1 + brightness].
contrast (float): How much to jitter contrast. contrast_factor
is chosen uniformly from [max(0, 1 - contrast), 1 + contrast].
saturation (float): How much to jitter saturation. saturation_factor
is chosen uniformly from [max(0, 1 - saturation), 1 + saturation].
hue(float): How much to jitter hue. hue_factor is chosen uniformly from
[-hue, hue]. Should be >=0 and <= 0.5.
"""
def __init__(self, brightness=0, contrast=0, saturation=0, hue=0):
self.brightness = brightness
self.contrast = contrast
self.saturation = saturation
self.hue = hue
@staticmethod
def get_params(brightness, contrast, saturation, hue):
"""Get a randomized transform to be applied on image.
Arguments are same as that of __init__.
Returns:
Transform which randomly adjusts brightness, contrast and
saturation in a random order.
"""
transforms = []
if brightness > 0:
brightness_factor = np.random.uniform(max(0, 1 - brightness), 1 + brightness)
transforms.append(
torch_tr.Lambda(lambda img: adjust_brightness(img, brightness_factor)))
if contrast > 0:
contrast_factor = np.random.uniform(max(0, 1 - contrast), 1 + contrast)
transforms.append(
torch_tr.Lambda(lambda img: adjust_contrast(img, contrast_factor)))
if saturation > 0:
saturation_factor = np.random.uniform(max(0, 1 - saturation), 1 + saturation)
transforms.append(
torch_tr.Lambda(lambda img: adjust_saturation(img, saturation_factor)))
if hue > 0:
hue_factor = np.random.uniform(-hue, hue)
transforms.append(
torch_tr.Lambda(lambda img: adjust_hue(img, hue_factor)))
np.random.shuffle(transforms)
transform = torch_tr.Compose(transforms)
return transform
def __call__(self, img):
"""
Args:
img (PIL Image): Input image.
Returns:
PIL Image: Color jittered image.
"""
transform = self.get_params(self.brightness, self.contrast,
self.saturation, self.hue)
return transform(img)
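Each transform that `get_params` composes is just a PIL enhancer applied with a factor drawn from the jitter range; a minimal sketch of one such step using PIL alone (the `0.4` strength is an example value, matching `brightness=0.4`):

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

img = Image.fromarray(np.full((8, 8, 3), 100, dtype=np.uint8))
# brightness_factor ~ U[max(0, 1 - 0.4), 1 + 0.4], as in ColorJitter.get_params
factor = random.uniform(0.6, 1.4)
out = ImageEnhance.Brightness(img).enhance(factor)
assert out.size == img.size and out.mode == 'RGB'
```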
================================================
FILE: benchmarks/GSCNN-master/utils/AttrDict.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Code adapted from:
# https://github.com/facebookresearch/Detectron/blob/master/detectron/utils/collections.py
Source License
# Copyright (c) 2017-present, Facebook, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
##############################################################################
#
# Based on:
# --------------------------------------------------------
# Fast R-CNN
# Copyright (c) 2015 Microsoft
# Licensed under The MIT License [see LICENSE for details]
# Written by Ross Girshick
# --------------------------------------------------------
"""
class AttrDict(dict):
IMMUTABLE = '__immutable__'
def __init__(self, *args, **kwargs):
super(AttrDict, self).__init__(*args, **kwargs)
self.__dict__[AttrDict.IMMUTABLE] = False
def __getattr__(self, name):
if name in self.__dict__:
return self.__dict__[name]
elif name in self:
return self[name]
else:
raise AttributeError(name)
def __setattr__(self, name, value):
if not self.__dict__[AttrDict.IMMUTABLE]:
if name in self.__dict__:
self.__dict__[name] = value
else:
self[name] = value
else:
raise AttributeError(
'Attempted to set "{}" to "{}", but AttrDict is immutable'.
format(name, value)
)
def immutable(self, is_immutable):
"""Set immutability to is_immutable and recursively apply the setting
to all nested AttrDicts.
"""
self.__dict__[AttrDict.IMMUTABLE] = is_immutable
# Recursively set immutable state
for v in self.__dict__.values():
if isinstance(v, AttrDict):
v.immutable(is_immutable)
for v in self.values():
if isinstance(v, AttrDict):
v.immutable(is_immutable)
def is_immutable(self):
return self.__dict__[AttrDict.IMMUTABLE]
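Typical usage looks like the sketch below; the class is re-declared inline, in condensed form, only so the snippet is self-contained:

```python
class AttrDict(dict):
    # Condensed copy of the class above, for demonstration only.
    IMMUTABLE = '__immutable__'
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.__dict__[AttrDict.IMMUTABLE] = False
    def __getattr__(self, name):
        if name in self:
            return self[name]
        raise AttributeError(name)
    def __setattr__(self, name, value):
        if self.__dict__[AttrDict.IMMUTABLE]:
            raise AttributeError('AttrDict is immutable')
        self[name] = value
    def immutable(self, is_immutable):
        self.__dict__[AttrDict.IMMUTABLE] = is_immutable

cfg = AttrDict()
cfg.LR = 0.01                  # attribute writes land in the underlying dict
assert cfg['LR'] == 0.01
cfg.immutable(True)            # freeze, e.g. once config parsing is done
try:
    cfg.LR = 0.1
    raised = False
except AttributeError:
    raised = True
assert raised
```

Freezing the config after startup is what prevents training code from silently mutating hyperparameters mid-run.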
================================================
FILE: benchmarks/GSCNN-master/utils/f_boundary.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
# Code adapted from:
# https://github.com/fperazzi/davis/blob/master/python/lib/davis/measures/f_boundary.py
#
# Source License
#
# BSD 3-Clause License
#
# Copyright (c) 2017,
# All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
##############################################################################
#
# Based on:
# ----------------------------------------------------------------------------
# A Benchmark Dataset and Evaluation Methodology for Video Object Segmentation
# Copyright (c) 2016 Federico Perazzi
# Licensed under the BSD License [see LICENSE for details]
# Written by Federico Perazzi
# ----------------------------------------------------------------------------
"""
import numpy as np
from multiprocessing import Pool
from tqdm import tqdm
""" Utilities for computing, reading and saving benchmark evaluation."""
def eval_mask_boundary(seg_mask, gt_mask, num_classes, num_proc=10, bound_th=0.008):
    """
    Compute boundary F scores for a batch of segmentation masks.
    Arguments:
        seg_mask (ndarray): segmentation mask predictions, shape (batch, H, W)
        gt_mask (ndarray): segmentation mask ground truth, shape (batch, H, W)
        num_classes (int): number of classes
        num_proc (int): number of worker processes
        bound_th (float): boundary tolerance; pixels if >= 1, otherwise a
            fraction of the image diagonal
    Returns:
        Fpc (ndarray): per-class sum of F scores over valid frames
        Fc (ndarray): per-class count of valid (non-NaN) frames
    """
p = Pool(processes=num_proc)
batch_size = seg_mask.shape[0]
Fpc = np.zeros(num_classes)
Fc = np.zeros(num_classes)
for class_id in tqdm(range(num_classes)):
args = [((seg_mask[i] == class_id).astype(np.uint8),
(gt_mask[i] == class_id).astype(np.uint8),
gt_mask[i] == 255,
bound_th)
for i in range(batch_size)]
temp = p.map(db_eval_boundary_wrapper, args)
temp = np.array(temp)
Fs = temp[:,0]
_valid = ~np.isnan(Fs)
Fc[class_id] = np.sum(_valid)
Fs[np.isnan(Fs)] = 0
Fpc[class_id] = sum(Fs)
return Fpc, Fc
#def db_eval_boundary_wrapper_wrapper(args):
# seg_mask, gt_mask, class_id, batch_size, Fpc = args
# print("class_id:" + str(class_id))
# p = Pool(processes=10)
# args = [((seg_mask[i] == class_id).astype(np.uint8),
# (gt_mask[i] == class_id).astype(np.uint8))
# for i in range(batch_size)]
# Fs = p.map(db_eval_boundary_wrapper, args)
# Fpc[class_id] = sum(Fs)
# return
def db_eval_boundary_wrapper(args):
foreground_mask, gt_mask, ignore, bound_th = args
return db_eval_boundary(foreground_mask, gt_mask,ignore, bound_th)
def db_eval_boundary(foreground_mask, gt_mask, ignore_mask, bound_th=0.008):
    """
    Compute the boundary F-measure and precision for a single frame,
    matching predicted and ground-truth boundaries with morphological
    dilation to speed it up.
    Arguments:
        foreground_mask (ndarray): binary segmentation image.
        gt_mask (ndarray): binary annotated image.
        ignore_mask (ndarray): pixels excluded from the evaluation.
        bound_th (float): tolerance; pixels if >= 1, otherwise a fraction of
            the image diagonal.
    Returns:
        F (float): boundaries F-measure
        precision (float): boundaries precision
    """
assert np.atleast_3d(foreground_mask).shape[2] == 1
bound_pix = bound_th if bound_th >= 1 else \
np.ceil(bound_th*np.linalg.norm(foreground_mask.shape))
foreground_mask[ignore_mask] = 0
gt_mask[ignore_mask] = 0
# Get the pixel boundaries of both masks
    fg_boundary = seg2bmap(foreground_mask)
    gt_boundary = seg2bmap(gt_mask)
from skimage.morphology import binary_dilation,disk
fg_dil = binary_dilation(fg_boundary,disk(bound_pix))
gt_dil = binary_dilation(gt_boundary,disk(bound_pix))
# Get the intersection
gt_match = gt_boundary * fg_dil
fg_match = fg_boundary * gt_dil
# Area of the intersection
n_fg = np.sum(fg_boundary)
n_gt = np.sum(gt_boundary)
#% Compute precision and recall
if n_fg == 0 and n_gt > 0:
precision = 1
recall = 0
elif n_fg > 0 and n_gt == 0:
precision = 0
recall = 1
elif n_fg == 0 and n_gt == 0:
precision = 1
recall = 1
else:
precision = np.sum(fg_match)/float(n_fg)
recall = np.sum(gt_match)/float(n_gt)
# Compute F measure
if precision + recall == 0:
F = 0
else:
        F = 2 * precision * recall / (precision + recall)
return F, precision
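The dilate-and-intersect matching above can be illustrated on toy boundaries. A sketch using `scipy.ndimage.binary_dilation` in place of skimage's `binary_dilation`/`disk`, with a fixed 1-pixel tolerance:

```python
import numpy as np
from scipy.ndimage import binary_dilation

# Predicted boundary is one row off from the ground truth.
fg = np.zeros((9, 9), dtype=bool); fg[4, 2:7] = True
gt = np.zeros((9, 9), dtype=bool); gt[5, 2:7] = True

# Match boundary pixels within a 1-pixel tolerance, as db_eval_boundary does
# by dilating each boundary map before intersecting.
tol = np.ones((3, 3), dtype=bool)
precision = (fg & binary_dilation(gt, tol)).sum() / fg.sum()
recall = (gt & binary_dilation(fg, tol)).sum() / gt.sum()
F = 2 * precision * recall / (precision + recall)
assert F == 1.0  # off-by-one boundaries still match within the tolerance
```

Without the dilation step, the same masks would score precision = recall = 0, which is exactly the brittleness the tolerance is there to avoid.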
def seg2bmap(seg,width=None,height=None):
"""
From a segmentation, compute a binary boundary map with 1 pixel wide
boundaries. The boundary pixels are offset by 1/2 pixel towards the
origin from the actual segment boundary.
Arguments:
seg : Segments labeled from 1..k.
width : Width of desired bmap <= seg.shape[1]
height : Height of desired bmap <= seg.shape[0]
Returns:
bmap (ndarray): Binary boundary map.
David Martin
January 2003
"""
    seg = seg.astype(bool)  # np.bool was removed in NumPy 1.24
seg[seg>0] = 1
assert np.atleast_3d(seg).shape[2] == 1
width = seg.shape[1] if width is None else width
height = seg.shape[0] if height is None else height
h,w = seg.shape[:2]
ar1 = float(width) / float(height)
ar2 = float(w) / float(h)
    # Note: the original used bitwise `|`, which binds tighter than `>` in
    # Python and silently evaluated the wrong expression.
    assert not (width > w or height > h or abs(ar1 - ar2) > 0.01), \
        "Can't convert %dx%d seg to %dx%d bmap." % (w, h, width, height)
e = np.zeros_like(seg)
s = np.zeros_like(seg)
se = np.zeros_like(seg)
e[:,:-1] = seg[:,1:]
s[:-1,:] = seg[1:,:]
se[:-1,:-1] = seg[1:,1:]
b = seg^e | seg^s | seg^se
b[-1,:] = seg[-1,:]^e[-1,:]
b[:,-1] = seg[:,-1]^s[:,-1]
b[-1,-1] = 0
if w == width and h == height:
bmap = b
else:
bmap = np.zeros((height,width))
for x in range(w):
for y in range(h):
                if b[y, x]:
                    # The DAVIS copy used 1+floor((y-1)+height/h) with an
                    # unimported floor: a known transcription bug of MATLAB's
                    # 1+floor((y-1)*height/h). Map coordinates proportionally.
                    j = int(np.floor(y * height / h))
                    i = int(np.floor(x * width / w))
                    bmap[j, i] = 1
return bmap
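The XOR-with-shifted-copies step at the heart of `seg2bmap` can be checked in isolation; a reduced sketch using only the east and south neighbours (the full function also XORs the south-east diagonal):

```python
import numpy as np

seg = np.zeros((5, 5), dtype=bool)
seg[1:4, 1:4] = True              # filled 3x3 square
e = np.zeros_like(seg); s = np.zeros_like(seg)
e[:, :-1] = seg[:, 1:]            # east neighbour
s[:-1, :] = seg[1:, :]            # south neighbour
b = (seg ^ e) | (seg ^ s)         # pixel differs from a neighbour => boundary
assert not b[2, 2]                # interior of the square is not boundary
assert b[1, 3] and b[3, 1]        # right and bottom edges are
```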
================================================
FILE: benchmarks/GSCNN-master/utils/image_page.py
================================================
"""
Copyright (C) 2019 NVIDIA Corporation. All rights reserved.
Licensed under the CC BY-NC-SA 4.0 license (https://creativecommons.org/licenses/by-nc-sa/4.0/legalcode).
"""
import glob
import os
class ImagePage(object):
'''
This creates an HTML page of embedded images, useful for showing evaluation results.
Usage:
ip = ImagePage(html_fn)
# Add a table with N images ...
ip.add_table((img, descr), (img, descr), ...)
# Generate html page
ip.write_page()
'''
def __init__(self, experiment_name, html_filename):
self.experiment_name = experiment_name
self.html_filename = html_filename
self.outfile = open(self.html_filename, 'w')
self.items = []
    def _print_header(self):
        # Assumed minimal HTML scaffolding: this copy of the file lost the
        # markup inside the string literals, so only the tag skeleton is
        # reconstructed here.
        header = '''<html>
<head><title>Experiment = {}</title></head>
<body>'''.format(self.experiment_name)
        self.outfile.write(header)
    def _print_footer(self):
        self.outfile.write('''</body>
</html>''')
def _print_table_header(self, table_name):
table_hdr = '''