[
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2021 Chen Min\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# ORFD: A Dataset and Benchmark for Off-Road Freespace Detection\n\nRepository for our ICRA 2022 paper [\"ORFD: A Dataset and Benchmark for Off-Road Freespace Detection\"](https://arxiv.org/abs/2206.09907).\n\nEmail:mincheng@ict.ac.cn\n\n## 📰 News\n\n* **[2025.12.29]**  Our new dataset [ORAD-3D](https://github.com/chaytonmin/ORAD-3D-Dataset-For-Off-Road-Autonomous-Driving) for off-road autonomous driving is publicly released.\n\n## Introduction\nFreespace detection is an essential component of autonomous driving technology and plays an important role in trajectory planning. In the last decade, deep learning based freespace detection methods have been proved feasible. However, these efforts were focused on urban road environments and few deep learning based methods were specifically designed for off-road freespace detection due to the lack of off-road dataset and benchmark. In this paper, we present the ORFD dataset, which, to our knowledge, is the first off-road freespace detection dataset. The dataset was collected in different scenes (woodland, farmland, grassland and countryside), ndifferent weather conditions (sunny, rainy, foggy and snowy) and different light conditions (bright light, daylight, twilight, darkness), which totally contains 12,198 LiDAR point cloud and RGB image pairs with the traversable area, non-traversable area and unreachable area annotated in detail. We propose\na novel network named OFF-Net, which unifies Transformer architecture to aggregate local and global information, to meet the requirement of large receptive fields for freespace detection task. We also propose the cross-attention to dynamically fuse LiDAR and RGB image information for accurate off-road freespace detection.\n<p align=\"center\">\n<img src=\"doc/demo1.gif\" width=\"100%\"/>demo 1\n</p>\n<p align=\"center\">\n<img src=\"doc/demo2.gif\" width=\"100%\"/>demo 2\n</p>\n<p align=\"center\">\n<img src=\"doc/demo3.gif\" width=\"100%\"/>demo 3\n</p>\n<p align=\"center\">\n<img src=\"doc/flowchart.png\" width=\"75%\"/>\n</p>\n\n\n## Requirements\n\n- python 3.6\n- pytorch 1.7\n- pip install mmcv, mmsegmentation\n- pip install mmcv-full==1.3.16 -f https://download.openmmlab.com/mmcv/dist/cu110/torch1.7.0/index.html\n\n## Pretrained models\n\nThe pretrained models of our OFF-Net trained on ORFD dataset can be download [here](https://drive.google.com/drive/folders/1lnm2M1HEkVs9W3-FSEX3ddE9GYz4rqCU). \n\n## Prepare data\n\nThe proposed off-road freespace detection dataset ORFD can be found [torrent](https://academictorrents.com/collection/traversability-analysis-datasets), [BaiduYun](https://pan.baidu.com/s/1GCy2unPzs4K7_PBgZLPO4w?pwd=1234) (code:1234, about 30GB), and [Google Drive](https://github.com/chaytonmin/Off-Road-Freespace-Detection/issues/6#issuecomment-1347324143). 
Extract and organize as follows:\n\n```\n|-- datasets\n |  |-- ORFD\n |  |  |-- training\n |  |  |  |-- sequence   |-- calib\n |  |  |                 |-- sparse_depth\n |  |  |                 |-- dense_depth\n |  |  |                 |-- lidar_data\n |  |  |                 |-- image_data\n |  |  |                 |-- gt_image\n ......\n |  |  |-- validation\n ......\n |  |  |-- testing\n ......\n```\nThe LiDAR coordinate system is x facing left, y facing forward, and z facing up.\n\nGenerate the depth maps with `python road_hesai40_process.py`.\n\n## Usage\n\n### Demo\n```\nbash ./scripts/demo.sh\n```\n### Training\n```\nbash ./scripts/train.sh\n```\n### Testing\n\n```\nbash ./scripts/test.sh\n```\n## License\n\nOur code and dataset are released under the Apache 2.0 license.\n\n## Acknowledgement\n\nThis repository is based on [SNE-RoadSeg](https://github.com/hlwang1124/SNE-RoadSeg) [1] and [SegFormer](https://github.com/NVlabs/SegFormer) [2].\n\n## References\n\n[1] Fan, Rui, et al. \"SNE-RoadSeg: Incorporating surface normal information into semantic segmentation for accurate freespace detection.\" *European Conference on Computer Vision*. Springer, Cham, 2020.\n\n[2] Xie, Enze, et al. \"SegFormer: Simple and efficient design for semantic segmentation with transformers.\" *Advances in Neural Information Processing Systems* 34 (2021).\n"
  },
  {
    "path": "data/ORFD_dataset.py",
    "content": "import os.path\nimport torchvision.transforms as transforms\nimport torch\nimport cv2\nimport numpy as np\nimport glob\nfrom data.base_dataset import BaseDataset\nfrom models.sne_model import SNE\n\n\nclass orfdCalibInfo():\n    \"\"\"\n    Read calibration files in the ORFD dataset,\n    we need to use the intrinsic parameter\n    \"\"\"\n    def __init__(self, filepath):\n        \"\"\"\n        Args:\n            filepath ([str]): calibration file path (AAA.txt)\n        \"\"\"\n        self.data = self._load_calib(filepath)\n\n    def get_cam_param(self):\n        \"\"\"\n        Returns:\n            [numpy.array]: intrinsic parameter \n        \"\"\"\n        return self.data['K']\n\n    def _load_calib(self, filepath):\n        rawdata = self._read_calib_file(filepath)\n        data = {}\n        K = np.reshape(rawdata['cam_K'], (3,3))\n        data['K'] = K\n        return data\n\n    def _read_calib_file(self, filepath):\n        \"\"\"Read in a calibration file and parse into a dictionary.\"\"\"\n        data = {}\n\n        with open(filepath, 'r') as f:\n            for line in f.readlines():\n                key, value = line.split(':', 1)\n                # The only non-float values in these files are dates, which\n                # we don't care about anyway\n                try:\n                    data[key] = np.array([float(x) for x in value.split()])\n                except ValueError:\n                    pass\n        return data\n\n\nclass orfddataset(BaseDataset):\n    \"\"\"dataloader for ORFD dataset\"\"\"\n    @staticmethod\n    def modify_commandline_options(parser, is_train):\n        return parser\n\n    def initialize(self, opt):\n        self.opt = opt\n        self.batch_size = opt.batch_size\n        self.root = opt.dataroot # path for the dataset\n        self.use_sne = opt.use_sne\n        self.num_labels = 2\n        self.use_size = (opt.useWidth, opt.useHeight)\n        if self.use_sne:\n            self.sne_model = SNE()\n\n        if opt.phase == \"train\":\n            self.image_list = sorted(glob.glob(os.path.join(self.root, 'training', '*','image_data', '*.png'))) # shape: 1280*720\n        elif opt.phase == \"val\":\n            self.image_list = sorted(glob.glob(os.path.join(self.root, 'validation', '*','image_data', '*.png')))\n        else:\n            self.image_list = sorted(glob.glob(os.path.join(self.root, 'testing', '*','image_data', '*.png')))\n\n    def __getitem__(self, index):\n        useDir = \"/\".join(self.image_list[index].split('/')[:-2])\n        name = self.image_list[index].split('/')[-1]\n\n        rgb_image = cv2.cvtColor(cv2.imread(os.path.join(useDir, 'image_data', name)), cv2.COLOR_BGR2RGB)\n        depth_image = cv2.imread(os.path.join(useDir, 'dense_depth', name), cv2.IMREAD_ANYDEPTH)\n        oriHeight, oriWidth, _ = rgb_image.shape\n        if self.opt.phase == 'test' and self.opt.no_label:\n            # Since we have no gt label (e.g., kitti submission), we generate pseudo gt labels to\n            # avoid destroying the code architecture\n            label = np.zeros((oriHeight, oriWidth), dtype=np.uint8)\n        else:\n            #label_image = cv2.cvtColor(cv2.imread(os.path.join(useDir, 'gt_image_2', name[:-10]+'road_'+name[-10:])), cv2.COLOR_BGR2RGB)\n            label_img_name = name.split('.')[0]+\"_fillcolor.png\"\n            label_dir = os.path.join(useDir, 'gt_image', label_img_name)\n            label_image = cv2.cvtColor(cv2.imread(label_dir), cv2.COLOR_BGR2RGB)\n            \n      
\n            label = np.zeros((oriHeight, oriWidth), dtype=np.uint8)\n            label[label_image[:,:,2] > 200] = 1\n\n\n        # resize the image so its dimensions are divisible by 32\n        rgb_image = cv2.resize(rgb_image, self.use_size)\n        # the network predicts at 1/4 of the input resolution, so downsample labels to match\n        label = cv2.resize(label, (int(self.use_size[0]/4), int(self.use_size[1]/4)), interpolation=cv2.INTER_NEAREST)\n\n        # another_image is the surface normal map when using SNE, otherwise the depth map\n        if self.use_sne:\n            calib = orfdCalibInfo(os.path.join(useDir, 'calib', name.split('.')[0] +'.txt'))\n            camParam = torch.tensor(calib.get_cam_param(), dtype=torch.float32)\n            camParam[1, 2] = camParam[1, 2] - 8  # adjust cy for the 720 -> 704 row crop (720-16=704)\n            \n            #normal = self.sne_model(torch.tensor(depth_image.astype(np.float32)/1000), camParam)\n            normal = self.sne_model(torch.tensor(depth_image.astype(np.float32)/256), camParam)\n            another_image = normal.cpu().numpy()\n            another_image = np.transpose(another_image, [1, 2, 0])\n            another_image = cv2.resize(another_image, self.use_size)\n        else:\n            another_image = depth_image.astype(np.float32)/65535\n            another_image = cv2.resize(another_image, self.use_size)\n            another_image = another_image[:,:,np.newaxis]\n\n        label[label > 0] = 1\n        rgb_image = rgb_image.astype(np.float32) / 255\n\n        rgb_image = transforms.ToTensor()(rgb_image)\n        another_image = transforms.ToTensor()(another_image)\n\n        label = torch.from_numpy(label)\n        label = label.type(torch.LongTensor)\n\n        # return a dictionary containing useful information:\n        # 'rgb_image', 'another_image', 'label': network inputs and ground truth for training\n        # 'path': image name for saving predictions\n        # 'oriSize': original image size for evaluating and saving predictions\n        return {'rgb_image': rgb_image, 'another_image': another_image, 'label': label,\n                'path': name, 'oriSize': (oriWidth, oriHeight)}\n\n    def __len__(self):\n        return len(self.image_list)\n\n    def name(self):\n        return 'orfd'\n"
  },
  {
    "path": "data/__init__.py",
    "content": "import importlib\nimport torch.utils.data\nfrom data.base_data_loader import BaseDataLoader\nfrom data.base_dataset import BaseDataset\nimport numpy\n\n\ndef find_dataset_using_name(dataset_name):\n    # Given the option --dataset [datasetname],\n    # the file \"data/datasetname_dataset.py\"\n    # will be imported.\n    dataset_filename = \"data.\" + dataset_name + \"_dataset\"\n    datasetlib = importlib.import_module(dataset_filename)\n\n    # In the file, the class called DatasetNameDataset() will\n    # be instantiated. It has to be a subclass of BaseDataset,\n    # and it is case-insensitive.\n    dataset = None\n    target_dataset_name = dataset_name.replace('_', '') + 'dataset'\n    for name, cls in datasetlib.__dict__.items():\n        if name.lower() == target_dataset_name.lower() \\\n           and issubclass(cls, BaseDataset):\n            dataset = cls\n\n    if dataset is None:\n        print(\"In %s.py, there should be a subclass of BaseDataset with class name that matches %s in lowercase.\" % (dataset_filename, target_dataset_name))\n        exit(0)\n\n    return dataset\n\ndef get_option_setter(dataset_name):\n    dataset_class = find_dataset_using_name(dataset_name)\n    return dataset_class.modify_commandline_options\n\ndef create_dataset(opt):\n    dataset = find_dataset_using_name(opt.dataset)\n    instance = dataset()\n    instance.initialize(opt)\n    print(\"dataset [%s] was created\" % (instance.name()))\n    return instance\n\ndef CreateDataLoader(opt):\n    data_loader = CustomDatasetDataLoader()\n    data_loader.initialize(opt)\n    return data_loader\n\n\n# Wrapper class of Dataset class that performs\n# multi-threaded data loading\nclass CustomDatasetDataLoader(BaseDataLoader):\n    def name(self):\n        return 'CustomDatasetDataLoader'\n\n    def initialize(self, opt):\n        BaseDataLoader.initialize(self, opt)\n        self.dataset = create_dataset(opt)\n\n        '''\n        self.dataloader = torch.utils.data.DataLoader(\n            self.dataset,\n            batch_size=opt.batch_size,\n            shuffle=not opt.serial_batches,\n            num_workers=int(opt.num_threads),\n            worker_init_fn=lambda worker_id: numpy.random.seed(opt.seed + worker_id))\n        '''\n        \n        self.dataloader = torch.utils.data.DataLoader(\n            self.dataset,\n            batch_size=opt.batch_size,\n            shuffle=not opt.serial_batches,\n            num_workers= 0,\n            worker_init_fn=lambda worker_id: numpy.random.seed(opt.seed + worker_id))\n\n    def load_data(self):\n        return self\n\n    def __len__(self):\n        return len(self.dataset)\n\n    def __iter__(self):\n        for i, data in enumerate(self.dataloader):\n            yield data\n"
  },
  {
    "path": "data/base_data_loader.py",
    "content": "class BaseDataLoader():\n    def __init__(self):\n        pass\n\n    def initialize(self, opt):\n        self.opt = opt\n        pass\n\n    def load_data():\n        return None\n"
  },
  {
    "path": "data/base_dataset.py",
    "content": "import torch.utils.data as data\n\n\nclass BaseDataset(data.Dataset):\n    def __init__(self):\n        super(BaseDataset, self).__init__()\n\n    def name(self):\n        return 'BaseDataset'\n\n    @staticmethod\n    def modify_commandline_options(parser, is_train):\n        return parser\n\n    def initialize(self, opt):\n        pass\n\n    def __len__(self):\n        return 0\n"
  },
  {
    "path": "datasets/palette.txt",
    "content": "0\t0\t0\n0\t128\t0\n128\t0\t0\n128\t128\t0\n0\t0\t128\n128\t0\t128\n0\t128\t128\n128\t128\t128\n64\t0\t0\n192\t0\t0\n64\t128\t0\n192\t128\t0\n64\t0\t128\n192\t0\t128\n64\t128\t128\n192\t128\t128\n0\t64\t0\n128\t64\t0\n0\t192\t0\n128\t192\t0\n0\t64\t128\n128\t64\t128\n0\t192\t128\n128\t192\t128\n64\t64\t0\n192\t64\t0\n64\t192\t0\n192\t192\t0\n64\t64\t128\n192\t64\t128\n64\t192\t128\n192\t192\t128\n0\t0\t64\n128\t0\t64\n0\t128\t64\n128\t128\t64\n0\t0\t192\n128\t0\t192\n0\t128\t192\n128\t128\t192\n64\t0\t64\n192\t0\t64\n64\t128\t64\n192\t128\t64\n64\t0\t192\n192\t0\t192\n64\t128\t192\n192\t128\t192\n0\t64\t64\n128\t64\t64\n0\t192\t64\n128\t192\t64\n0\t64\t192\n128\t64\t192\n0\t192\t192\n128\t192\t192\n64\t64\t64\n192\t64\t64\n64\t192\t64\n192\t192\t64\n64\t64\t192\n192\t64\t192\n64\t192\t192\n192\t192\t192\n32\t0\t0\n160\t0\t0\n32\t128\t0\n160\t128\t0\n32\t0\t128\n160\t0\t128\n32\t128\t128\n160\t128\t128\n96\t0\t0\n224\t0\t0\n96\t128\t0\n224\t128\t0\n96\t0\t128\n224\t0\t128\n96\t128\t128\n224\t128\t128\n32\t64\t0\n160\t64\t0\n32\t192\t0\n160\t192\t0\n32\t64\t128\n160\t64\t128\n32\t192\t128\n160\t192\t128\n96\t64\t0\n224\t64\t0\n96\t192\t0\n224\t192\t0\n96\t64\t128\n224\t64\t128\n96\t192\t128\n224\t192\t128\n32\t0\t64\n160\t0\t64\n32\t128\t64\n160\t128\t64\n32\t0\t192\n160\t0\t192\n32\t128\t192\n160\t128\t192\n96\t0\t64\n224\t0\t64\n96\t128\t64\n224\t128\t64\n96\t0\t192\n224\t0\t192\n96\t128\t192\n224\t128\t192\n32\t64\t64\n160\t64\t64\n32\t192\t64\n160\t192\t64\n32\t64\t192\n160\t64\t192\n32\t192\t192\n160\t192\t192\n96\t64\t64\n224\t64\t64\n96\t192\t64\n224\t192\t64\n96\t64\t192\n224\t64\t192\n96\t192\t192\n224\t192\t192\n0\t32\t0\n128\t32\t0\n0\t160\t0\n128\t160\t0\n0\t32\t128\n128\t32\t128\n0\t160\t128\n128\t160\t128\n64\t32\t0\n192\t32\t0\n64\t160\t0\n192\t160\t0\n64\t32\t128\n192\t32\t128\n64\t160\t128\n192\t160\t128\n0\t96\t0\n128\t96\t0\n0\t224\t0\n128\t224\t0\n0\t96\t128\n128\t96\t128\n0\t224\t128\n128\t224\t128\n64\t96\t0\n192\t96\t0\n64\t224\t0\n192\t224\t0\n64\t96\t128\n192\t96\t128\n64\t224\t128\n192\t224\t128\n0\t32\t64\n128\t32\t64\n0\t160\t64\n128\t160\t64\n0\t32\t192\n128\t32\t192\n0\t160\t192\n128\t160\t192\n64\t32\t64\n192\t32\t64\n64\t160\t64\n192\t160\t64\n64\t32\t192\n192\t32\t192\n64\t160\t192\n192\t160\t192\n0\t96\t64\n128\t96\t64\n0\t224\t64\n128\t224\t64\n0\t96\t192\n128\t96\t192\n0\t224\t192\n128\t224\t192\n64\t96\t64\n192\t96\t64\n64\t224\t64\n192\t224\t64\n64\t96\t192\n192\t96\t192\n64\t224\t192\n192\t224\t192\n32\t32\t0\n160\t32\t0\n32\t160\t0\n160\t160\t0\n32\t32\t128\n160\t32\t128\n32\t160\t128\n160\t160\t128\n96\t32\t0\n224\t32\t0\n96\t160\t0\n224\t160\t0\n96\t32\t128\n224\t32\t128\n96\t160\t128\n224\t160\t128\n32\t96\t0\n160\t96\t0\n32\t224\t0\n160\t224\t0\n32\t96\t128\n160\t96\t128\n32\t224\t128\n160\t224\t128\n96\t96\t0\n224\t96\t0\n96\t224\t0\n224\t224\t0\n96\t96\t128\n224\t96\t128\n96\t224\t128\n224\t224\t128\n32\t32\t64\n160\t32\t64\n32\t160\t64\n160\t160\t64\n32\t32\t192\n160\t32\t192\n32\t160\t192\n160\t160\t192\n96\t32\t64\n224\t32\t64\n96\t160\t64\n224\t160\t64\n96\t32\t192\n224\t32\t192\n96\t160\t192\n224\t160\t192\n32\t96\t64\n160\t96\t64\n32\t224\t64\n160\t224\t64\n32\t96\t192\n160\t96\t192\n32\t224\t192\n160\t224\t192\n96\t96\t64\n224\t96\t64\n96\t224\t64\n224\t224\t64\n96\t96\t192\n224\t96\t192\n96\t224\t192\n224\t224\t192\n"
  },
  {
    "path": "demo.py",
    "content": "import os\nfrom options.test_options import TestOptions\nfrom models import create_model\nfrom util.util import tensor2labelim, tensor2confidencemap\nfrom models.sne_model import SNE\nimport torchvision.transforms as transforms\nimport torch\nimport numpy as np\nimport cv2\nimport copy\nimport tqdm\nimport glob\n\n\nclass dataset():\n    def __init__(self):\n        self.num_labels = 2\n\n\ndef load_calib(filepath):\n\n    rawdata = read_calib_file(filepath)\n    K = np.reshape(rawdata['cam_K'], (3,3))\n    K[1, 2] = K[1, 2] - 8  # 720-16=704\n\n    return K\n\ndef read_calib_file(filepath):\n    \"\"\"Read in a calibration file and parse into a dictionary.\"\"\"\n    data = {}\n\n    with open(filepath, 'r') as f:\n        for line in f.readlines():\n            key, value = line.split(':', 1)\n            # The only non-float values in these files are dates, which\n            # we don't care about anyway\n            try:\n                data[key] = np.array([float(x) for x in value.split()])\n            except ValueError:\n                pass\n    return data\n\n\nif __name__ == '__main__':\n    opt = TestOptions().parse()\n    opt.num_threads = 1\n    opt.batch_size = 1\n    opt.serial_batches = True  # no shuffle\n    opt.isTrain = False\n\n    example_dataset = dataset()\n    model = create_model(opt, example_dataset)\n    model.setup(opt)\n    model.eval()\n\n    # if you want to use your own data, please modify seq_name, image_data, dense_depth, calib and use_size correspondingly.\n    use_size = (1280, 704)\n    root_dir = 'examples'\n    seq_name = 'y0613_1242' # Need to be changed\n\n    save_dir = os.path.join(root_dir, seq_name, 'results')\n    os.makedirs(save_dir, exist_ok=True)\n\n    img_list = glob.glob(os.path.join(root_dir, seq_name, 'image_data', '*.png'))\n\n    for img_list_i in img_list:\n        img_name = img_list_i.split('/')[-1].split('.')[0]\n        print('img_name:',img_name)\n        rgb_image = cv2.cvtColor(cv2.imread(os.path.join(root_dir, seq_name, 'image_data', img_name+'.png')), cv2.COLOR_BGR2RGB)\n        depth_image = cv2.imread(os.path.join(root_dir, seq_name, 'dense_depth', img_name+'.png'), cv2.IMREAD_ANYDEPTH) \n        oriHeight, oriWidth, _ = rgb_image.shape\n        oriSize = (oriWidth, oriHeight)\n        rgb_image_save = copy.copy(rgb_image)\n        rgb_image_save_ = copy.copy(rgb_image)\n\n        # resize image to enable sizes divide 32\n\n        rgb_image = cv2.resize(rgb_image, use_size)\n        \n        rgb_image = rgb_image.astype(np.float32) / 255\n\n        # compute normal using SNE\n        sne_model = SNE()\n        calib_dir = os.path.join(root_dir, seq_name, 'calib', img_name+'.txt')\n        camParam = load_calib(calib_dir)\n        \n        normal = sne_model(torch.tensor(depth_image.astype(np.float32)/256), camParam)\n        normal_image = normal.cpu().numpy()\n        normal_image = np.transpose(normal_image, [1, 2, 0])\n        cv2.imwrite(os.path.join(os.path.join(save_dir, img_name+'_normal.png')), cv2.cvtColor(255*(1+normal_image)/2, cv2.COLOR_RGB2BGR))\n        normal_image_save = cv2.cvtColor(255*(1+normal_image)/2, cv2.COLOR_RGB2BGR)\n        normal_image = cv2.resize(normal_image, use_size)\n\n        rgb_image = transforms.ToTensor()(rgb_image).unsqueeze(dim=0)\n        normal_image = transforms.ToTensor()(normal_image).unsqueeze(dim=0)\n\n        with torch.no_grad():\n            pred = model.netRoadSeg(rgb_image, normal_image)\n\n            palet_file = 'datasets/palette.txt'\n           
            # the palette maps each class index to an RGB triplet for visualization\n            impalette = list(np.genfromtxt(palet_file, dtype=np.uint8).reshape(3*256))\n            pred_img = tensor2labelim(pred, impalette)\n            pred_img = cv2.resize(pred_img, oriSize)\n            prob_map = tensor2confidencemap(pred)\n            prob_map = cv2.resize(prob_map, oriSize)\n            cv2.imwrite(os.path.join(save_dir, img_name+'_pred.png'), pred_img)\n            cv2.imwrite(os.path.join(save_dir, img_name+'_probmap.png'), prob_map)\n\n            rgb_image_save = rgb_image_save.transpose(2,0,1)\n            pred_img = pred_img.transpose(2,0,1)\n\n            # overlay the prediction on pixels with confidence above 128\n            inds = prob_map > 128\n            rgb_image_save[:,inds] = pred_img[:,inds]\n            rgb_image_save = rgb_image_save.transpose(1,2,0)\n\n            cv2.imwrite(os.path.join(save_dir, img_name+'_mask.png'), rgb_image_save)\n\n            img_cat = np.concatenate((rgb_image_save_, normal_image_save, rgb_image_save), axis=1)\n            img_cat = cv2.resize(img_cat, (int(img_cat.shape[1]*0.5), int(img_cat.shape[0]*0.5)))\n            cv2.imwrite(os.path.join(root_dir, seq_name, img_name+'.png'), img_cat)\n            #cv2.imshow('img_cat', img_cat)\n            #cv2.waitKey(0)\n            #cv2.destroyAllWindows()\n    print('Done!')\n"
  },
  {
    "path": "examples/y0613_1242/calib/1623721491895.txt",
    "content": "cam_K: 1487.752825 0.0 625.011716 0.0 1471.063437 376.483801 0.0 0.0 1.0 \ncam_RT: 0.9990242442017794 -0.03870134489718302 -0.02127828471028577 0.0019129833672195673 -0.025608957095939373 -0.1150890189011354 -0.9930250243799661 0.1408030241727829 0.03598250704885683 0.9926009891348704 -0.11596782120465826 0.08308956027030945 0.0 0.0 0.0 1.0 \nlidar_R: -0.996479 -0.0834525 0.00805512 0.0837999 -0.994374 0.0647862 0.00260323 0.0652331 0.997867 \nlidar_T: 0 0 0 "
  },
  {
    "path": "examples/y0613_1242/calib/1623721491991.txt",
    "content": "cam_K: 1487.752825 0.0 625.011716 0.0 1471.063437 376.483801 0.0 0.0 1.0 \ncam_RT: 0.9990242442017794 -0.03870134489718302 -0.02127828471028577 0.0019129833672195673 -0.025608957095939373 -0.1150890189011354 -0.9930250243799661 0.1408030241727829 0.03598250704885683 0.9926009891348704 -0.11596782120465826 0.08308956027030945 0.0 0.0 0.0 1.0 \nlidar_R: -0.996479 -0.0834525 0.00805512 0.0837999 -0.994374 0.0647862 0.00260323 0.0652331 0.997867 \nlidar_T: 0 0 0 "
  },
  {
    "path": "examples/y0613_1242/calib/1623721492091.txt",
    "content": "cam_K: 1487.752825 0.0 625.011716 0.0 1471.063437 376.483801 0.0 0.0 1.0 \ncam_RT: 0.9990242442017794 -0.03870134489718302 -0.02127828471028577 0.0019129833672195673 -0.025608957095939373 -0.1150890189011354 -0.9930250243799661 0.1408030241727829 0.03598250704885683 0.9926009891348704 -0.11596782120465826 0.08308956027030945 0.0 0.0 0.0 1.0 \nlidar_R: -0.996479 -0.0834525 0.00805512 0.0837999 -0.994374 0.0647862 0.00260323 0.0652331 0.997867 \nlidar_T: 0 0 0 "
  },
  {
    "path": "examples/y0613_1242/calib/1623721492191.txt",
    "content": "cam_K: 1487.752825 0.0 625.011716 0.0 1471.063437 376.483801 0.0 0.0 1.0 \ncam_RT: 0.9990242442017794 -0.03870134489718302 -0.02127828471028577 0.0019129833672195673 -0.025608957095939373 -0.1150890189011354 -0.9930250243799661 0.1408030241727829 0.03598250704885683 0.9926009891348704 -0.11596782120465826 0.08308956027030945 0.0 0.0 0.0 1.0 \nlidar_R: -0.996479 -0.0834525 0.00805512 0.0837999 -0.994374 0.0647862 0.00260323 0.0652331 0.997867 \nlidar_T: 0 0 0 "
  },
  {
    "path": "examples/y0613_1242/calib/1623721492290.txt",
    "content": "cam_K: 1487.752825 0.0 625.011716 0.0 1471.063437 376.483801 0.0 0.0 1.0 \ncam_RT: 0.9990242442017794 -0.03870134489718302 -0.02127828471028577 0.0019129833672195673 -0.025608957095939373 -0.1150890189011354 -0.9930250243799661 0.1408030241727829 0.03598250704885683 0.9926009891348704 -0.11596782120465826 0.08308956027030945 0.0 0.0 0.0 1.0 \nlidar_R: -0.996479 -0.0834525 0.00805512 0.0837999 -0.994374 0.0647862 0.00260323 0.0652331 0.997867 \nlidar_T: 0 0 0 "
  },
  {
    "path": "examples/y0613_1242/calib/1623721492790.txt",
    "content": "cam_K: 1487.752825 0.0 625.011716 0.0 1471.063437 376.483801 0.0 0.0 1.0 \ncam_RT: 0.9990242442017794 -0.03870134489718302 -0.02127828471028577 0.0019129833672195673 -0.025608957095939373 -0.1150890189011354 -0.9930250243799661 0.1408030241727829 0.03598250704885683 0.9926009891348704 -0.11596782120465826 0.08308956027030945 0.0 0.0 0.0 1.0 \nlidar_R: -0.996479 -0.0834525 0.00805512 0.0837999 -0.994374 0.0647862 0.00260323 0.0652331 0.997867 \nlidar_T: 0 0 0 "
  },
  {
    "path": "models/__init__.py",
    "content": "import importlib\nfrom models.base_model import BaseModel\n\n\ndef find_model_using_name(model_name):\n    # Given the option --model [modelname],\n    # the file \"models/modelname_model.py\"\n    # will be imported.\n    model_filename = \"models.\" + model_name + \"_model\"\n    modellib = importlib.import_module(model_filename)\n\n    # In the file, the class called ModelNameModel() will\n    # be instantiated. It has to be a subclass of BaseModel,\n    # and it is case-insensitive.\n    model = None\n    target_model_name = model_name.replace('_', '') + 'model'\n    for name, cls in modellib.__dict__.items():\n        if name.lower() == target_model_name.lower() \\\n           and issubclass(cls, BaseModel):\n            model = cls\n    \n    if model is None:\n        print(\"In %s.py, there should be a subclass of BaseModel with class name that matches %s in lowercase.\" % (model_filename, target_model_name))\n        exit(0)\n\n    return model\n\ndef get_option_setter(model_name):\n    model_class = find_model_using_name(model_name)\n    return model_class.modify_commandline_options\n\ndef create_model(opt, dataset):\n    model = find_model_using_name(opt.model)\n    \n    instance = model()\n    instance.initialize(opt, dataset)\n    print(\"model [%s] was created\" % (instance.name()))\n    return instance\n"
  },
  {
    "path": "models/base_model.py",
    "content": "import os\nimport torch\nfrom collections import OrderedDict\nfrom . import loss\n\n\nclass BaseModel():\n    # modify parser to add command line options,\n    # and also change the default values if needed\n    @staticmethod\n    def modify_commandline_options(parser, is_train):\n        return parser\n\n    def name(self):\n        return 'BaseModel'\n\n    def initialize(self, opt):\n        self.opt = opt\n        self.gpu_ids = opt.gpu_ids\n        self.isTrain = opt.isTrain\n        self.device = torch.device('cuda:{}'.format(self.gpu_ids[0])) if self.gpu_ids else torch.device('cpu')\n        self.save_dir = os.path.join(opt.checkpoints_dir, opt.name)\n        self.loss_names = []\n        self.model_names = []\n        self.visual_names = []\n        self.image_names = []\n        self.image_oriSize = []\n\n    def set_input(self, input):\n        self.input = input\n\n    def forward(self):\n        pass\n\n    # load and print networks; create schedulers\n    def setup(self, opt, parser=None):\n        if self.isTrain:\n            self.schedulers = [loss.get_scheduler(optimizer, opt) for optimizer in self.optimizers]\n\n        if not self.isTrain or opt.continue_train:\n            self.load_networks(opt.epoch)\n        self.print_networks(opt.verbose)\n\n    # make models eval mode during test time\n    def eval(self):\n        for name in self.model_names:\n            if isinstance(name, str):\n                net = getattr(self, 'net' + name)\n                net.eval()\n\n    def train(self):\n        for name in self.model_names:\n            if isinstance(name, str):\n                net = getattr(self, 'net' + name)\n                net.train()\n\n    # used in test time, wrapping `forward` in no_grad() so we don't save\n    # intermediate steps for backprop\n    def test(self):\n        with torch.no_grad():\n            self.forward()\n\n    # get image names\n    def get_image_names(self):\n        return self.image_names\n\n    # get image original size\n    def get_image_oriSize(self):\n        return self.image_oriSize\n\n    def optimize_parameters(self):\n        pass\n\n    # update learning rate (called once every epoch)\n    def update_learning_rate(self):\n        for scheduler in self.schedulers:\n            scheduler.step()\n        lr = self.optimizers[0].param_groups[0]['lr']\n        print('learning rate = %.7f' % lr)\n\n    # return visualization images. train.py will display these images in tensorboardX\n    def get_current_visuals(self):\n        visual_ret = OrderedDict()\n        for name in self.visual_names:\n            if isinstance(name, str):\n                visual_ret[name] = getattr(self, name)\n        return visual_ret\n\n    # return traning losses/errors. train.py will print out these errors as debugging information\n    def get_current_losses(self):\n        errors_ret = OrderedDict()\n        for name in self.loss_names:\n            if isinstance(name, str):\n                # float(...) 
works for both scalar tensor and float number\n                errors_ret[name] = float(getattr(self, 'loss_' + name))\n        return errors_ret\n\n    # save models to the disk\n    def save_networks(self, epoch):\n        for name in self.model_names:\n            if isinstance(name, str):\n                save_filename = '%s_net_%s.pth' % (epoch, name)\n                save_path = os.path.join(self.save_dir, save_filename)\n                net = getattr(self, 'net' + name)\n\n                if len(self.gpu_ids) > 0 and torch.cuda.is_available():\n                    torch.save(net.module.cpu().state_dict(), save_path)\n                    net.cuda(self.gpu_ids[0])\n                else:\n                    torch.save(net.cpu().state_dict(), save_path)\n\n    def __patch_instance_norm_state_dict(self, state_dict, module, keys, i=0):\n        key = keys[i]\n        if i + 1 == len(keys):  # at the end, pointing to a parameter/buffer\n            if module.__class__.__name__.startswith('InstanceNorm') and \\\n                    (key == 'running_mean' or key == 'running_var'):\n                if getattr(module, key) is None:\n                    state_dict.pop('.'.join(keys))\n            if module.__class__.__name__.startswith('InstanceNorm') and \\\n               (key == 'num_batches_tracked'):\n                state_dict.pop('.'.join(keys))\n        else:\n            self.__patch_instance_norm_state_dict(state_dict, getattr(module, key), keys, i + 1)\n\n    # load models from the disk\n    def load_networks(self, epoch):\n        for name in self.model_names:\n            if isinstance(name, str):\n                load_filename = '%s_net_%s.pth' % (epoch, name)\n                load_path = os.path.join(self.save_dir, load_filename)\n                net = getattr(self, 'net' + name)\n                if isinstance(net, torch.nn.DataParallel):\n                    net = net.module\n                print('loading the model from %s' % load_path)\n                # if you are using PyTorch newer than 0.4 (e.g., built from\n                # GitHub source), you can remove str() on self.device\n                state_dict = torch.load(load_path, map_location=str(self.device))\n                if hasattr(state_dict, '_metadata'):\n                    del state_dict._metadata\n\n                for key in list(state_dict.keys()):\n                    self.__patch_instance_norm_state_dict(state_dict, net, key.split('.'))\n                net.load_state_dict(state_dict)\n\n    # print network information\n    def print_networks(self, verbose):\n        print('---------- Networks initialized -------------')\n        for name in self.model_names:\n            if isinstance(name, str):\n                net = getattr(self, 'net' + name)\n                num_params = 0\n                for param in net.parameters():\n                    num_params += param.numel()\n                if verbose:\n                    print(net)\n                print('[Network %s] Total number of parameters : %.3f M' % (name, num_params / 1e6))\n        print('-----------------------------------------------')\n\n    # set requies_grad=Fasle to avoid computation\n    def set_requires_grad(self, nets, requires_grad=False):\n        if not isinstance(nets, list):\n            nets = [nets]\n        for net in nets:\n            if net is not None:\n                for param in net.parameters():\n                    param.requires_grad = requires_grad\n"
  },
  {
    "path": "models/loss.py",
    "content": "import torch\nimport torch.nn as nn\nfrom torch.nn import init\nimport torchvision\nfrom torch.optim import lr_scheduler\nimport torch.nn.functional as F\n\ndef get_scheduler(optimizer, opt):\n    if opt.lr_policy == 'lambda':\n        lambda_rule = lambda epoch: opt.lr_gamma ** ((epoch+1) // opt.lr_decay_epochs)\n        scheduler = lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda_rule)\n    elif opt.lr_policy == 'step':\n        scheduler = lr_scheduler.StepLR(optimizer,step_size=opt.lr_decay_iters, gamma=0.1)\n    else:\n        return NotImplementedError('learning rate policy [%s] is not implemented', opt.lr_policy)\n    return scheduler\n\nclass SegmantationLoss(nn.Module):\n    def __init__(self, class_weights=None):\n        super(SegmantationLoss, self).__init__()\n        self.loss = nn.CrossEntropyLoss(weight=class_weights)\n    def __call__(self, output, target, pixel_average=True):\n        if pixel_average:\n            return self.loss(output, target) #/ target.data.sum()\n        else:\n            return self.loss(output, target)\n"
  },
  {
    "path": "models/roadseg_model.py",
    "content": "import torch\nfrom .base_model import BaseModel\nfrom . import loss\nfrom .transformer_models.backbones import transformer\nimport numpy as np\nimport os\n\n\nclass RoadSegModel(BaseModel):\n    def name(self):\n        return 'RoadSegModel'\n\n    @staticmethod\n    def modify_commandline_options(parser, is_train=True):\n        # changing the default values\n        if is_train:\n            parser.add_argument('--lambda_L1', type=float, default=100.0, help='weight for L1 loss')\n        return parser\n\n    def initialize(self, opt, dataset):\n        BaseModel.initialize(self, opt)\n        self.isTrain = opt.isTrain\n        # specify the training losses you want to print out. The program will call base_model.get_current_losses\n        self.loss_names = ['segmentation']\n        # specify the images you want to save/display. The program will call base_model.get_current_visuals\n        self.visual_names = ['rgb_image', 'another_image', 'label', 'output']\n        # specify the models you want to save to the disk. The program will call base_model.save_networks and base_model.load_networks\n        self.model_names = ['RoadSeg']\n\n        # load/define networks\n\n        # mix_transformer\n        self.netRoadSeg = transformer.define_RoadSeg(dataset.num_labels, use_sne=opt.use_sne, init_type=opt.init_type, init_gain= opt.init_gain, gpu_ids= self.gpu_ids)\n\n        # define loss functions\n        self.criterionSegmentation = loss.SegmantationLoss(class_weights=None).to(self.device)\n\n        if self.isTrain:\n            # initialize optimizers\n            self.optimizers = []\n            self.optimizer_RoadSeg = torch.optim.SGD(self.netRoadSeg.parameters(), lr=opt.lr, momentum=opt.momentum, weight_decay=opt.weight_decay)\n            #self.optimizer_RoadSeg = torch.optim.Adam(self.netRoadSeg.parameters(), lr=opt.lr, weight_decay=opt.weight_decay)\n            self.optimizers.append(self.optimizer_RoadSeg)\n            self.set_requires_grad(self.netRoadSeg, True)\n\n    def set_input(self, input):\n        self.rgb_image = input['rgb_image'].to(self.device)\n        self.another_image = input['another_image'].to(self.device)\n        self.label = input['label'].to(self.device)\n        self.image_names = input['path']\n        self.image_oriSize = input['oriSize']\n\n    def forward(self):\n        self.output = self.netRoadSeg(self.rgb_image, self.another_image)\n\n    def get_loss(self):\n        self.loss_segmentation = self.criterionSegmentation(self.output, self.label)\n\n    def backward(self):\n        self.loss_segmentation.backward()\n\n    def optimize_parameters(self):\n        self.forward()\n        self.optimizer_RoadSeg.zero_grad()\n        self.get_loss()\n        self.backward()\n        self.optimizer_RoadSeg.step()\n"
  },
  {
    "path": "models/sne_model.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass SNE(nn.Module):\n    \"\"\"SNE takes depth and camera intrinsic parameters as input,\n    and outputs normal estimations.\n    \"\"\"\n    def __init__(self):\n        super(SNE, self).__init__()\n\n    def forward(self, depth, camParam):\n        h,w = depth.size()\n        v_map, u_map = torch.meshgrid(torch.arange(h), torch.arange(w))\n        v_map = v_map.type(torch.float32)\n        u_map = u_map.type(torch.float32)\n\n        Z = depth   # h, w\n        Y = Z.mul((v_map - camParam[1,2])) / camParam[0,0]  # h, w\n        X = Z.mul((u_map - camParam[0,2])) / camParam[0,0]  # h, w\n        # half part\n        #Z[Y <= 0] = 0\n        #Y[Y <= 0] = 0 \n        Z[torch.isnan(Z)] = 0\n        D = torch.div(torch.ones(h, w), Z)  # h, w\n\n        Gx = torch.tensor([[0,0,0],[-1,0,1],[0,0,0]], dtype=torch.float32)\n        Gy = torch.tensor([[0,-1,0],[0,0,0],[0,1,0]], dtype=torch.float32)\n\n        Gu = F.conv2d(D.view(1,1,h,w), Gx.view(1,1,3,3), padding=1)\n        Gv = F.conv2d(D.view(1,1,h,w), Gy.view(1,1,3,3), padding=1)\n\n        nx_t = Gu * camParam[0,0]   # 1, 1, h, w\n        ny_t = Gv * camParam[1,1]   # 1, 1, h, w\n\n        phi = torch.atan(torch.div(ny_t, nx_t)) + torch.ones([1,1,h,w])*3.141592657\n        a = torch.cos(phi)\n        b = torch.sin(phi)\n\n        diffKernelArray = torch.tensor([[-1, 0, 0, 0, 1, 0, 0, 0, 0],\n                                        [ 0,-1, 0, 0, 1, 0, 0, 0, 0],\n                                        [ 0, 0,-1, 0, 1, 0, 0, 0, 0],\n                                        [ 0, 0, 0,-1, 1, 0, 0, 0, 0],\n                                        [ 0, 0, 0, 0, 1,-1, 0, 0, 0],\n                                        [ 0, 0, 0, 0, 1, 0,-1, 0, 0],\n                                        [ 0, 0, 0, 0, 1, 0, 0,-1, 0],\n                                        [ 0, 0, 0, 0, 1, 0, 0, 0,-1]], dtype=torch.float32)\n\n        sum_nx = torch.zeros((1,1,h,w), dtype=torch.float32)\n        sum_ny = torch.zeros((1,1,h,w), dtype=torch.float32)\n        sum_nz = torch.zeros((1,1,h,w), dtype=torch.float32)\n\n        for i in range(8):\n            diffKernel = diffKernelArray[i].view(1,1,3,3)\n            X_d = F.conv2d(X.view(1,1,h,w), diffKernel, padding=1)\n            Y_d = F.conv2d(Y.view(1,1,h,w), diffKernel, padding=1)\n            Z_d = F.conv2d(Z.view(1,1,h,w), diffKernel, padding=1)\n\n            nz_i = torch.div((torch.mul(nx_t, X_d) + torch.mul(ny_t, Y_d)), Z_d)\n            norm = torch.sqrt(torch.mul(nx_t, nx_t) + torch.mul(ny_t, ny_t) + torch.mul(nz_i, nz_i))\n            nx_t_i = torch.div(nx_t, norm)\n            ny_t_i = torch.div(ny_t, norm)\n            nz_t_i = torch.div(nz_i, norm)\n\n            nx_t_i[torch.isnan(nx_t_i)] = 0\n            ny_t_i[torch.isnan(ny_t_i)] = 0\n            nz_t_i[torch.isnan(nz_t_i)] = 0\n\n            sum_nx = sum_nx + nx_t_i\n            sum_ny = sum_ny + ny_t_i\n            sum_nz = sum_nz + nz_t_i\n\n        theta = -torch.atan(torch.div((torch.mul(sum_nx, a) + torch.mul(sum_ny, b)), sum_nz))\n        nx = torch.mul(torch.sin(theta), torch.cos(phi))\n        ny = torch.mul(torch.sin(theta), torch.sin(phi))\n        nz = torch.cos(theta)\n\n        nx[torch.isnan(nz)] = 0\n        ny[torch.isnan(nz)] = 0\n        nz[torch.isnan(nz)] = -1\n\n        sign = torch.ones((1,1,h,w), dtype=torch.float32)\n        sign[ny > 0] = -1\n\n        nx = torch.mul(nx, sign).squeeze(dim=0)\n        ny = torch.mul(ny, 
sign).squeeze(dim=0)\n        nz = torch.mul(nz, sign).squeeze(dim=0)\n\n        return torch.cat([nx, ny, nz], dim=0)\n"
  },
  {
    "path": "models/transformer_models/__init__.py",
    "content": ""
  },
  {
    "path": "models/transformer_models/backbones/__init__.py",
    "content": ""
  },
  {
    "path": "models/transformer_models/backbones/transformer.py",
    "content": "# ---------------------------------------------------------------\n# Copyright (c) 2021, NVIDIA Corporation. All rights reserved.\n#\n# This work is licensed under the NVIDIA Source Code License\n# ---------------------------------------------------------------\nimport torch\nimport torch.nn as nn\nfrom torch.nn import init\nimport torch.nn.functional as F\nfrom functools import partial\n\nfrom timm.models.layers import DropPath, to_2tuple, trunc_normal_\nfrom timm.models.registry import register_model\nfrom timm.models.vision_transformer import _cfg\nfrom mmseg.models.builder import BACKBONES\nfrom mmseg.utils import get_root_logger\nfrom mmcv.runner import load_checkpoint\nimport math\n\nfrom ..decode_heads.head import Head\n\nclass Mlp(nn.Module):\n    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):\n        super().__init__()\n        out_features = out_features or in_features\n        hidden_features = hidden_features or in_features\n        self.fc1 = nn.Linear(in_features, hidden_features)\n        self.dwconv = DWConv(hidden_features)\n        self.act = act_layer()\n        self.fc2 = nn.Linear(hidden_features, out_features)\n        self.drop = nn.Dropout(drop)\n\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            trunc_normal_(m.weight, std=.02)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n        elif isinstance(m, nn.Conv2d):\n            fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n            fan_out //= m.groups\n            m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))\n            if m.bias is not None:\n                m.bias.data.zero_()\n\n    def forward(self, x, H, W):\n        x = self.fc1(x)\n        x = self.dwconv(x, H, W)\n        x = self.act(x)\n        x = self.drop(x)\n        x = self.fc2(x)\n        x = self.drop(x)\n        return x\n\n\nclass Attention(nn.Module):\n    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0., sr_ratio=1):\n        super().__init__()\n        assert dim % num_heads == 0, f\"dim {dim} should be divided by num_heads {num_heads}.\"\n\n        self.dim = dim\n        self.num_heads = num_heads\n        head_dim = dim // num_heads\n        self.scale = qk_scale or head_dim ** -0.5\n\n        self.q = nn.Linear(dim, dim, bias=qkv_bias)\n        self.kv = nn.Linear(dim, dim * 2, bias=qkv_bias)\n        self.attn_drop = nn.Dropout(attn_drop)\n        self.proj = nn.Linear(dim, dim)\n        self.proj_drop = nn.Dropout(proj_drop)\n\n        self.sr_ratio = sr_ratio\n        if sr_ratio > 1:\n            self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)\n            self.norm = nn.LayerNorm(dim)\n\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            trunc_normal_(m.weight, std=.02)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n        elif isinstance(m, nn.Conv2d):\n            fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n            
fan_out //= m.groups\n            m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))\n            if m.bias is not None:\n                m.bias.data.zero_()\n\n    def forward(self, x, H, W):\n        B, N, C = x.shape\n        q = self.q(x).reshape(B, N, self.num_heads, C // self.num_heads).permute(0, 2, 1, 3)\n\n        if self.sr_ratio > 1:\n            x_ = x.permute(0, 2, 1).reshape(B, C, H, W)\n            x_ = self.sr(x_).reshape(B, C, -1).permute(0, 2, 1)\n            x_ = self.norm(x_)\n            kv = self.kv(x_).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)\n        else:\n            kv = self.kv(x).reshape(B, -1, 2, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4)\n        k, v = kv[0], kv[1]\n\n        attn = (q @ k.transpose(-2, -1)) * self.scale\n        attn = attn.softmax(dim=-1)\n        attn = self.attn_drop(attn)\n\n        x = (attn @ v).transpose(1, 2).reshape(B, N, C)\n        x = self.proj(x)\n        x = self.proj_drop(x)\n\n        return x\n\n\nclass Block(nn.Module):\n\n    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,\n                 drop_path=0., act_layer=nn.GELU, norm_layer=nn.LayerNorm, sr_ratio=1):\n        super().__init__()\n        self.norm1 = norm_layer(dim)\n        self.attn = Attention(\n            dim,\n            num_heads=num_heads, qkv_bias=qkv_bias, qk_scale=qk_scale,\n            attn_drop=attn_drop, proj_drop=drop, sr_ratio=sr_ratio)\n        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here\n        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()\n        self.norm2 = norm_layer(dim)\n        mlp_hidden_dim = int(dim * mlp_ratio)\n        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)\n\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            trunc_normal_(m.weight, std=.02)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n        elif isinstance(m, nn.Conv2d):\n            fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n            fan_out //= m.groups\n            m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))\n            if m.bias is not None:\n                m.bias.data.zero_()\n\n    def forward(self, x, H, W):\n        x = x + self.drop_path(self.attn(self.norm1(x), H, W))\n        x = x + self.drop_path(self.mlp(self.norm2(x), H, W))\n\n        return x\n\nclass _PositionAttentionModule(nn.Module):\n    \"\"\" Position attention module\"\"\"\n\n    def __init__(self, in_channels, **kwargs):\n        super(_PositionAttentionModule, self).__init__()\n        self.conv_b = nn.Conv2d(in_channels, in_channels // 8, 1)\n        self.conv_c = nn.Conv2d(in_channels, in_channels // 8, 1)\n        self.conv_d = nn.Conv2d(in_channels, in_channels, 1)\n        self.alpha = nn.Parameter(torch.zeros(1))\n        self.softmax = nn.Softmax(dim=-1)\n\n    def forward(self, x):\n        batch_size, _, height, width = x.size()\n        feat_b = self.conv_b(x).view(batch_size, -1, height * width).permute(0, 2, 1)\n        feat_c = self.conv_c(x).view(batch_size, -1, height * width)\n        attention_s = self.softmax(torch.bmm(feat_b, feat_c))\n        
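# attention_s has shape (B, H*W, H*W): pairwise spatial affinities that reweight the value features below\n        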
feat_d = self.conv_d(x).view(batch_size, -1, height * width)\n        feat_e = torch.bmm(feat_d, attention_s.permute(0, 2, 1)).view(batch_size, -1, height, width)\n        out = self.alpha * feat_e + x\n\n        return out\n\nclass OverlapPatchEmbed(nn.Module):\n    \"\"\" Image to Patch Embedding\n    \"\"\"\n\n    def __init__(self, img_size=224, patch_size=7, stride=4, in_chans=3, embed_dim=768):\n        super().__init__()\n        img_size = to_2tuple(img_size)\n        patch_size = to_2tuple(patch_size)\n\n        self.img_size = img_size\n        self.patch_size = patch_size\n        self.H, self.W = img_size[0] // patch_size[0], img_size[1] // patch_size[1]\n        self.num_patches = self.H * self.W\n        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=stride,\n                              padding=(patch_size[0] // 2, patch_size[1] // 2))\n        self.norm = nn.LayerNorm(embed_dim)\n\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            trunc_normal_(m.weight, std=.02)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n        elif isinstance(m, nn.Conv2d):\n            fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n            fan_out //= m.groups\n            m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))\n            if m.bias is not None:\n                m.bias.data.zero_()\n\n    def forward(self, x):\n        x = self.proj(x)\n        _, _, H, W = x.shape\n        x = x.flatten(2).transpose(1, 2)\n        x = self.norm(x)\n\n        return x, H, W\n\ndef init_weights(net, init_type='normal', gain=0.02):\n        net = net\n        def init_func(m):\n            classname = m.__class__.__name__\n            if hasattr(m, 'weight') and (classname.find('Conv') != -1 or classname.find('Linear') != -1):\n                if init_type == 'normal':\n                    init.normal_(m.weight.data, 0.0, gain)\n                elif init_type == 'xavier':\n                    init.xavier_normal_(m.weight.data, gain=gain)\n                elif init_type == 'kaiming':\n                    init.kaiming_normal_(m.weight.data, a=0, mode='fan_in')\n                elif init_type == 'orthogonal':\n                    init.orthogonal_(m.weight.data, gain=gain)\n                elif init_type == 'pretrained':\n                    pass\n                else:\n                    raise NotImplementedError('initialization method [%s] is not implemented' % init_type)\n                if hasattr(m, 'bias') and m.bias is not None and init_type != 'pretrained':\n                    init.constant_(m.bias.data, 0.0)\n            elif classname.find('BatchNorm2d') != -1:\n                init.normal_(m.weight.data, 1.0, gain)\n                init.constant_(m.bias.data, 0.0)\n\n        net.apply(init_func)\n\ndef init_net(net, init_type='normal', init_gain=0.02, gpu_ids=[]):\n    if len(gpu_ids) > 0:\n        assert(torch.cuda.is_available())\n        net.to(gpu_ids[0])\n        net = torch.nn.DataParallel(net, gpu_ids)\n\n    for root_child in net.children():\n        for children in root_child.children():\n            init_weights(children, init_type, gain=init_gain)\n\n    return net\n\ndef define_RoadSeg(num_labels, use_sne=True, init_type='xavier', init_gain=0.02, gpu_ids=[]):\n\n    net= 
mit_b2()  # or another variant (mit_b0 ... mit_b5); num_labels and use_sne are currently unused here\n\n    return init_net(net, init_type, init_gain, gpu_ids)\n\nclass crossAttentionModule(nn.Module):\n    \"\"\"Cross-attention module: a sigmoid channel gate computed from the sum of the two modal features\"\"\"\n\n    def __init__(self, in_features, hidden_features):\n        super(crossAttentionModule, self).__init__()\n        self.fc = nn.Linear(in_features, hidden_features)\n        self.sigmoid = nn.Sigmoid()\n\n    def forward(self, x, y):\n\n        cross_attention = self.sigmoid(self.fc(x+y))\n\n        return cross_attention\n\nclass MixVisionTransformer(nn.Module):\n    def __init__(self, img_size=224, patch_size=16, in_chans=3, num_classes=1000, embed_dims=[64, 128, 320, 512],\n                 num_heads=[1, 2, 4, 8], mlp_ratios=[4, 4, 4, 4], qkv_bias=False, qk_scale=None, drop_rate=0.,\n                 attn_drop_rate=0., drop_path_rate=0., norm_layer=nn.LayerNorm,\n                 depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1]):\n        super().__init__()\n        self.num_classes = num_classes\n        self.depths = depths\n\n        self.pam_1 = _PositionAttentionModule(embed_dims[0])\n        self.pam_2 = _PositionAttentionModule(embed_dims[1])\n        self.pam_3 = _PositionAttentionModule(embed_dims[2])\n        self.pam_4 = _PositionAttentionModule(embed_dims[3])\n        \n        self.cam = crossAttentionModule(embed_dims[0], embed_dims[0])\n\n        # patch_embed\n        self.patch_embed1 = OverlapPatchEmbed(img_size=img_size, patch_size=7, stride=4, in_chans=in_chans,\n                                              embed_dim=embed_dims[0])\n        self.patch_embed2 = OverlapPatchEmbed(img_size=img_size // 4, patch_size=3, stride=2, in_chans=embed_dims[0],\n                                              embed_dim=embed_dims[1])\n        self.patch_embed3 = OverlapPatchEmbed(img_size=img_size // 8, patch_size=3, stride=2, in_chans=embed_dims[1],\n                                              embed_dim=embed_dims[2])\n        self.patch_embed4 = OverlapPatchEmbed(img_size=img_size // 16, patch_size=3, stride=2, in_chans=embed_dims[2],\n                                              embed_dim=embed_dims[3])\n\n        # transformer encoder\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))]  # stochastic depth decay rule\n        cur = 0\n        self.block1 = nn.ModuleList([Block(\n            dim=embed_dims[0], num_heads=num_heads[0], mlp_ratio=mlp_ratios[0], qkv_bias=qkv_bias, qk_scale=qk_scale,\n            drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer,\n            sr_ratio=sr_ratios[0])\n            for i in range(depths[0])])\n        self.norm1 = norm_layer(embed_dims[0])\n\n        cur += depths[0]\n        self.block2 = nn.ModuleList([Block(\n            dim=embed_dims[1], num_heads=num_heads[1], mlp_ratio=mlp_ratios[1], qkv_bias=qkv_bias, qk_scale=qk_scale,\n            drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer,\n            sr_ratio=sr_ratios[1])\n            for i in range(depths[1])])\n        self.norm2 = norm_layer(embed_dims[1])\n\n        cur += depths[1]\n        self.block3 = nn.ModuleList([Block(\n            dim=embed_dims[2], num_heads=num_heads[2], mlp_ratio=mlp_ratios[2], qkv_bias=qkv_bias, qk_scale=qk_scale,\n            drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer,\n            sr_ratio=sr_ratios[2])\n            for i in range(depths[2])])\n        self.norm3 = norm_layer(embed_dims[2])\n\n        cur += 
depths[2]\n        self.block4 = nn.ModuleList([Block(\n            dim=embed_dims[3], num_heads=num_heads[3], mlp_ratio=mlp_ratios[3], qkv_bias=qkv_bias, qk_scale=qk_scale,\n            drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[cur + i], norm_layer=norm_layer,\n            sr_ratio=sr_ratios[3])\n            for i in range(depths[3])])\n        self.norm4 = norm_layer(embed_dims[3])\n\n        self.head = Head()\n\n        self.apply(self._init_weights)\n\n    def _init_weights(self, m):\n        if isinstance(m, nn.Linear):\n            trunc_normal_(m.weight, std=.02)\n            if isinstance(m, nn.Linear) and m.bias is not None:\n                nn.init.constant_(m.bias, 0)\n        elif isinstance(m, nn.LayerNorm):\n            nn.init.constant_(m.bias, 0)\n            nn.init.constant_(m.weight, 1.0)\n        elif isinstance(m, nn.Conv2d):\n            fan_out = m.kernel_size[0] * m.kernel_size[1] * m.out_channels\n            fan_out //= m.groups\n            m.weight.data.normal_(0, math.sqrt(2.0 / fan_out))\n            if m.bias is not None:\n                m.bias.data.zero_()\n\n    def init_weights(self, pretrained=None):\n        if isinstance(pretrained, str):\n            logger = get_root_logger()\n            load_checkpoint(self, pretrained, map_location='cpu', strict=False, logger=logger)\n\n    def reset_drop_path(self, drop_path_rate):\n        dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(self.depths))]\n        cur = 0\n        for i in range(self.depths[0]):\n            self.block1[i].drop_path.drop_prob = dpr[cur + i]\n\n        cur += self.depths[0]\n        for i in range(self.depths[1]):\n            self.block2[i].drop_path.drop_prob = dpr[cur + i]\n\n        cur += self.depths[1]\n        for i in range(self.depths[2]):\n            self.block3[i].drop_path.drop_prob = dpr[cur + i]\n\n        cur += self.depths[2]\n        for i in range(self.depths[3]):\n            self.block4[i].drop_path.drop_prob = dpr[cur + i]\n\n    def freeze_patch_emb(self):\n        self.patch_embed1.requires_grad = False\n\n    @torch.jit.ignore\n    def no_weight_decay(self):\n        return {'pos_embed1', 'pos_embed2', 'pos_embed3', 'pos_embed4', 'cls_token'}  # has pos_embed may be better\n\n    def get_classifier(self):\n        return self.head\n\n    def reset_classifier(self, num_classes, global_pool=''):\n        self.num_classes = num_classes\n        self.head = nn.Linear(self.embed_dim, num_classes) if num_classes > 0 else nn.Identity()\n\n    def forward_features(self, x, y):\n        B = x.shape[0]\n        outs = []\n\n        # stage 1\n        x, H, W = self.patch_embed1(x)\n        for i, blk in enumerate(self.block1):\n            x = blk(x, H, W)\n        x = self.norm1(x)\n        \n\n        y, H, W = self.patch_embed1(y)\n        for i, blk in enumerate(self.block1):\n            y = blk(y, H, W)\n        y = self.norm1(y)\n\n        cross_attention  = self.cam(x, y)\n        x = cross_attention*x + x\n        y = (1-cross_attention)*y + y\n\n        x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous()\n        y = y.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous()\n    \n        outs.append(x)\n\n        # stage 2\n        x, H, W = self.patch_embed2(x+y)\n        for i, blk in enumerate(self.block2):\n            x = blk(x, H, W)\n        x = self.norm2(x)\n        x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous()\n\n        outs.append(x)\n\n\n        # stage 3\n        x, H, W = self.patch_embed3(x)\n 
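# stages 3 and 4, like stage 2, run on the single fused stream (the RGB and normal branches were merged after stage 1)\n 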
       for i, blk in enumerate(self.block3):\n            x = blk(x, H, W)\n        x = self.norm3(x)\n        x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous()\n       \n        outs.append(x)\n\n        # stage 4\n        x, H, W = self.patch_embed4(x)\n        for i, blk in enumerate(self.block4):\n            x = blk(x, H, W)\n        x = self.norm4(x)\n        x = x.reshape(B, H, W, -1).permute(0, 3, 1, 2).contiguous()\n        \n        outs.append(x)\n\n        return outs\n\n    def forward(self, x, y):\n        x = self.forward_features(x, y)       \n        x = self.head(x)\n        \n        return x\n\n\nclass DWConv(nn.Module):\n    def __init__(self, dim=768):\n        super(DWConv, self).__init__()\n        self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, bias=True, groups=dim)\n\n    def forward(self, x, H, W):\n        B, N, C = x.shape\n        x = x.transpose(1, 2).view(B, C, H, W)\n        x = self.dwconv(x)\n        x = x.flatten(2).transpose(1, 2)\n\n        return x\n\n\nclass mit_b0(MixVisionTransformer):\n    def __init__(self, **kwargs):\n        super(mit_b0, self).__init__(\n            patch_size=4, embed_dims=[32, 64, 160, 256], num_heads=[1, 2, 5, 8], mlp_ratios=[4, 4, 4, 4],\n            qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 2, 2], sr_ratios=[8, 4, 2, 1],\n            drop_rate=0.0, drop_path_rate=0.1)\n\nclass mit_b1(MixVisionTransformer):\n    def __init__(self, **kwargs):\n        super(mit_b1, self).__init__(\n            patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[4, 4, 4, 4],\n            qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[2, 2, 2, 2], sr_ratios=[8, 4, 2, 1],\n            drop_rate=0.0, drop_path_rate=0.1)\n\nclass mit_b2(MixVisionTransformer):\n    def __init__(self, **kwargs):\n        super(mit_b2, self).__init__(\n            patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[4, 4, 4, 4],\n            qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3, 4, 6, 3], sr_ratios=[8, 4, 2, 1],\n            drop_rate=0.0, drop_path_rate=0.1)\n\n\nclass mit_b3(MixVisionTransformer):\n    def __init__(self, **kwargs):\n        super(mit_b3, self).__init__(\n            patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[4, 4, 4, 4],\n            qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3, 4, 18, 3], sr_ratios=[8, 4, 2, 1],\n            drop_rate=0.0, drop_path_rate=0.1)\n\nclass mit_b4(MixVisionTransformer):\n    def __init__(self, **kwargs):\n        super(mit_b4, self).__init__(\n            patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[4, 4, 4, 4],\n            qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3, 8, 27, 3], sr_ratios=[8, 4, 2, 1],\n            drop_rate=0.0, drop_path_rate=0.1)\n\nclass mit_b5(MixVisionTransformer):\n    def __init__(self, **kwargs):\n        super(mit_b5, self).__init__(\n            patch_size=4, embed_dims=[64, 128, 320, 512], num_heads=[1, 2, 5, 8], mlp_ratios=[4, 4, 4, 4],\n            qkv_bias=True, norm_layer=partial(nn.LayerNorm, eps=1e-6), depths=[3, 6, 40, 3], sr_ratios=[8, 4, 2, 1],\n            drop_rate=0.0, drop_path_rate=0.1)"
  },
  {
    "path": "models/transformer_models/decode_heads/__init__.py",
    "content": ""
  },
  {
    "path": "models/transformer_models/decode_heads/decode_head.py",
    "content": "from abc import ABCMeta, abstractmethod\n\nimport torch\nimport torch.nn as nn\nfrom mmcv.cnn import normal_init\nfrom mmcv.runner import auto_fp16, force_fp32\n\nfrom mmseg.core import build_pixel_sampler\nfrom mmseg.ops import resize\n\n\nnorm_cfg = dict(type='BN', requires_grad=True)\n\nclass BaseDecodeHead(nn.Module, metaclass=ABCMeta):\n\n    def __init__(self,\n                 in_channels = [64, 128, 320, 512],\n                 channels = 128,\n                 *,\n                 num_classes = 2,\n                 dropout_ratio=0.1,\n                 conv_cfg=None,\n                 norm_cfg=norm_cfg,\n                 act_cfg=dict(type='ReLU'),\n                 in_index=[0, 1, 2, 3],\n                 input_transform=None,\n                 decoder_params=dict(embed_dim=256),\n                 ignore_index=255,\n                 sampler=None,\n                 align_corners=False):\n        super(BaseDecodeHead, self).__init__()\n        self._init_inputs(in_channels, in_index, input_transform)\n        self.channels = channels\n        self.num_classes = num_classes\n        self.dropout_ratio = dropout_ratio\n        self.conv_cfg = conv_cfg\n        self.norm_cfg = norm_cfg\n        self.act_cfg = act_cfg\n        self.in_index = in_index\n        self.ignore_index = ignore_index\n        self.align_corners = align_corners\n\n        if sampler is not None:\n            self.sampler = build_pixel_sampler(sampler, context=self)\n        else:\n            self.sampler = None\n\n        self.conv_seg = nn.Conv2d(channels, num_classes, kernel_size=1)\n        if dropout_ratio > 0:\n            self.dropout = nn.Dropout2d(dropout_ratio)\n        else:\n            self.dropout = None\n        self.fp16_enabled = False\n\n    def extra_repr(self):\n        \"\"\"Extra repr.\"\"\"\n        s = f'input_transform={self.input_transform}, ' \\\n            f'ignore_index={self.ignore_index}, ' \\\n            f'align_corners={self.align_corners}'\n        return s\n\n    def _init_inputs(self, in_channels, in_index, input_transform):\n        \"\"\"Check and initialize input transforms.\n\n        The in_channels, in_index and input_transform must match.\n        Specifically, when input_transform is None, only single feature map\n        will be selected. 
So in_channels and in_index must be of type int.\n        When input_transform is not None, in_channels and in_index must be\n        list or tuple, with the same length.\n\n        Args:\n            in_channels (int|Sequence[int]): Input channels.\n            in_index (int|Sequence[int]): Input feature index.\n            input_transform (str|None): Transformation type of input features.\n                Options: 'resize_concat', 'multiple_select', None.\n                'resize_concat': Multiple feature maps will be resized to the\n                    same size as the first one and then concatenated together.\n                    Usually used in FCN head of HRNet.\n                'multiple_select': Multiple feature maps will be bundled into\n                    a list and passed into decode head.\n                None: Only one selected feature map is allowed.\n        \"\"\"\n\n        if input_transform is not None:\n            assert input_transform in ['resize_concat', 'multiple_select']\n        self.input_transform = input_transform\n        self.in_index = in_index\n        if input_transform is not None:\n            assert isinstance(in_channels, (list, tuple))\n            assert isinstance(in_index, (list, tuple))\n            assert len(in_channels) == len(in_index)\n            if input_transform == 'resize_concat':\n                self.in_channels = sum(in_channels)\n            else:\n                self.in_channels = in_channels\n        else:\n            assert isinstance(in_channels, int)\n            assert isinstance(in_index, int)\n            self.in_channels = in_channels\n\n    def init_weights(self):\n        \"\"\"Initialize weights of classification layer.\"\"\"\n        normal_init(self.conv_seg, mean=0, std=0.01)\n\n    def _transform_inputs(self, inputs):\n        \"\"\"Transform inputs for decoder.\n\n        Args:\n            inputs (list[Tensor]): List of multi-level img features.\n\n        Returns:\n            Tensor: The transformed inputs\n        \"\"\"\n\n        if self.input_transform == 'resize_concat':\n            inputs = [inputs[i] for i in self.in_index]\n            upsampled_inputs = [\n                resize(\n                    input=x,\n                    size=inputs[0].shape[2:],\n                    mode='bilinear',\n                    align_corners=self.align_corners) for x in inputs\n            ]\n            inputs = torch.cat(upsampled_inputs, dim=1)\n        elif self.input_transform == 'multiple_select':\n            inputs = [inputs[i] for i in self.in_index]\n        else:\n            inputs = inputs[self.in_index]\n\n        return inputs\n\n    @auto_fp16()\n    @abstractmethod\n    def forward(self, inputs):\n        \"\"\"Placeholder of forward function.\"\"\"\n        pass\n\n    def forward_train(self, inputs, img_metas, gt_semantic_seg, train_cfg):\n        \"\"\"Forward function for training.\n        Args:\n            inputs (list[Tensor]): List of multi-level img features.\n            img_metas (list[dict]): List of image info dict where each dict\n                has: 'img_shape', 'scale_factor', 'flip', and may also contain\n                'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'.\n                For details on the values of these keys see\n                `mmseg/datasets/pipelines/formatting.py:Collect`.\n            gt_semantic_seg (Tensor): Semantic segmentation masks\n                used if the architecture supports semantic segmentation task.\n            train_cfg (dict): The training config.\n\n        Returns:\n            dict[str, Tensor]: a dictionary of loss components\n   
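\n\n        Example (an illustrative sketch; assumes ``head`` is a concrete\n        subclass such as ``Head`` in decode_heads/head.py and that the\n        loss computation is provided by the surrounding training code)::\n\n            >>> losses = head.forward_train(inputs, img_metas, gt_semantic_seg, train_cfg)\n   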
     \"\"\"\n        seg_logits = self.forward(inputs)\n        losses = self.losses(seg_logits, gt_semantic_seg)\n        return losses\n\n    def forward_test(self, inputs, img_metas, test_cfg):\n        \"\"\"Forward function for testing.\n\n        Args:\n            inputs (list[Tensor]): List of multi-level img features.\n            img_metas (list[dict]): List of image info dict where each dict\n                has: 'img_shape', 'scale_factor', 'flip', and may also contain\n                'filename', 'ori_shape', 'pad_shape', and 'img_norm_cfg'.\n                For details on the values of these keys see\n                `mmseg/datasets/pipelines/formatting.py:Collect`.\n            test_cfg (dict): The testing config.\n\n        Returns:\n            Tensor: Output segmentation map.\n        \"\"\"\n        return self.forward(inputs)\n\n    def cls_seg(self, feat):\n        \"\"\"Classify each pixel.\"\"\"\n        if self.dropout is not None:\n            feat = self.dropout(feat)\n        output = self.conv_seg(feat)\n        return output"
  },
  {
    "path": "models/transformer_models/decode_heads/head.py",
    "content": "# ---------------------------------------------------------------\n# Copyright (c) 2021, NVIDIA Corporation. All rights reserved.\n#\n# This work is licensed under the NVIDIA Source Code License\n# ---------------------------------------------------------------\nimport numpy as np\nimport torch.nn as nn\nimport torch\nfrom mmcv.cnn import ConvModule, DepthwiseSeparableConvModule\nfrom collections import OrderedDict\n\nfrom mmseg.ops import resize, Upsample\nfrom .decode_head import BaseDecodeHead\nfrom mmseg.models.utils import *\nimport attr\n\nfrom IPython import embed\n\nclass MLP(nn.Module):\n    \"\"\"\n    Linear Embedding\n    \"\"\"\n    def __init__(self, input_dim=2048, embed_dim=768):\n        super().__init__()\n        self.proj = nn.Linear(input_dim, embed_dim)\n\n    def forward(self, x):\n        x = x.flatten(2).transpose(1, 2)\n        x = self.proj(x)\n\n        return x\n\nclass Head(BaseDecodeHead):\n    def __init__(self, feature_strides=[4, 8, 16, 32], **kwargs):\n        super(Head, self).__init__(input_transform='multiple_select', **kwargs)\n        assert len(feature_strides) == len(self.in_channels)\n        assert min(feature_strides) == feature_strides[0]\n        self.feature_strides = feature_strides\n\n        c1_in_channels, c2_in_channels, c3_in_channels, c4_in_channels = self.in_channels\n\n        decoder_params = dict(embed_dim=256)\n        embedding_dim = decoder_params['embed_dim']\n\n        self.linear_c4 = MLP(input_dim=c4_in_channels, embed_dim=embedding_dim)\n        self.linear_c3 = MLP(input_dim=c3_in_channels, embed_dim=embedding_dim)\n        self.linear_c2 = MLP(input_dim=c2_in_channels, embed_dim=embedding_dim)\n        self.linear_c1 = MLP(input_dim=c1_in_channels, embed_dim=embedding_dim)\n        \n        self.linear_fuse = ConvModule(\n            in_channels=embedding_dim*4,\n            out_channels=embedding_dim,\n            kernel_size=1,\n            norm_cfg=dict(type='BN', requires_grad=True)\n        )\n\n        self.linear_pred = nn.Conv2d(embedding_dim, self.num_classes, kernel_size=1)\n        self.Upsample = Upsample(scale_factor=4)\n\n    def forward(self, inputs):  \n        x = self._transform_inputs(inputs)  \n\n        c1, c2, c3, c4 = x\n\n        n, _, h, w = c4.shape\n       \n        _c4 = self.linear_c4(c4).permute(0,2,1).reshape(n, -1, c4.shape[2], c4.shape[3])\n       \n        _c4 = resize(_c4, size=c1.size()[2:],mode='bilinear',align_corners=False)\n        \n        _c3 = self.linear_c3(c3).permute(0,2,1).reshape(n, -1, c3.shape[2], c3.shape[3])\n       \n        _c3 = resize(_c3, size=c1.size()[2:],mode='bilinear',align_corners=False)\n\n        _c2 = self.linear_c2(c2).permute(0,2,1).reshape(n, -1, c2.shape[2], c2.shape[3])\n        \n        _c2 = resize(_c2, size=c1.size()[2:],mode='bilinear',align_corners=False)\n\n        _c1 = self.linear_c1(c1).permute(0,2,1).reshape(n, -1, c1.shape[2], c1.shape[3])\n        \n        _c1 = resize(_c1, size=c1.size()[2:],mode='bilinear',align_corners=False)\n\n        x = self.linear_fuse(torch.cat([_c4, _c3, _c2, _c1], dim=1))\n\n        x = self.linear_pred(x)\n\n        return x\n"
  },
  {
    "path": "models/transformer_models/utils/__init__.py",
    "content": "from .inverted_residual import InvertedResidual, InvertedResidualV3\nfrom .make_divisible import make_divisible\nfrom .res_layer import ResLayer\nfrom .self_attention_block import SelfAttentionBlock\nfrom .up_conv_block import UpConvBlock\n\n__all__ = [\n    'ResLayer', 'SelfAttentionBlock', 'make_divisible', 'InvertedResidual',\n    'UpConvBlock', 'InvertedResidualV3'\n]\n"
  },
  {
    "path": "models/transformer_models/utils/drop.py",
    "content": "\"\"\" DropBlock, DropPath\nPyTorch implementations of DropBlock and DropPath (Stochastic Depth) regularization layers.\nPapers:\nDropBlock: A regularization method for convolutional networks (https://arxiv.org/abs/1810.12890)\nDeep Networks with Stochastic Depth (https://arxiv.org/abs/1603.09382)\nCode:\nDropBlock impl inspired by two Tensorflow impl that I liked:\n - https://github.com/tensorflow/tpu/blob/master/models/official/resnet/resnet_model.py#L74\n - https://github.com/clovaai/assembled-cnn/blob/master/nets/blocks.py\nHacked together by / Copyright 2020 Ross Wightman\n\"\"\"\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\ndef drop_block_2d(\n        x, drop_prob: float = 0.1, block_size: int = 7,  gamma_scale: float = 1.0,\n        with_noise: bool = False, inplace: bool = False, batchwise: bool = False):\n    \"\"\" DropBlock. See https://arxiv.org/pdf/1810.12890.pdf\n    DropBlock with an experimental gaussian noise option. This layer has been tested on a few training\n    runs with success, but needs further validation and possibly optimization for lower runtime impact.\n    \"\"\"\n    B, C, H, W = x.shape\n    total_size = W * H\n    clipped_block_size = min(block_size, min(W, H))\n    # seed_drop_rate, the gamma parameter\n    gamma = gamma_scale * drop_prob * total_size / clipped_block_size ** 2 / (\n        (W - block_size + 1) * (H - block_size + 1))\n\n    # Forces the block to be inside the feature map.\n    w_i, h_i = torch.meshgrid(torch.arange(W).to(x.device), torch.arange(H).to(x.device))\n    valid_block = ((w_i >= clipped_block_size // 2) & (w_i < W - (clipped_block_size - 1) // 2)) & \\\n                  ((h_i >= clipped_block_size // 2) & (h_i < H - (clipped_block_size - 1) // 2))\n    valid_block = torch.reshape(valid_block, (1, 1, H, W)).to(dtype=x.dtype)\n\n    if batchwise:\n        # one mask for whole batch, quite a bit faster\n        uniform_noise = torch.rand((1, C, H, W), dtype=x.dtype, device=x.device)\n    else:\n        uniform_noise = torch.rand_like(x)\n    block_mask = ((2 - gamma - valid_block + uniform_noise) >= 1).to(dtype=x.dtype)\n    block_mask = -F.max_pool2d(\n        -block_mask,\n        kernel_size=clipped_block_size,  # block_size,\n        stride=1,\n        padding=clipped_block_size // 2)\n\n    if with_noise:\n        normal_noise = torch.randn((1, C, H, W), dtype=x.dtype, device=x.device) if batchwise else torch.randn_like(x)\n        if inplace:\n            x.mul_(block_mask).add_(normal_noise * (1 - block_mask))\n        else:\n            x = x * block_mask + normal_noise * (1 - block_mask)\n    else:\n        normalize_scale = (block_mask.numel() / block_mask.to(dtype=torch.float32).sum().add(1e-7)).to(x.dtype)\n        if inplace:\n            x.mul_(block_mask * normalize_scale)\n        else:\n            x = x * block_mask * normalize_scale\n    return x\n\n\ndef drop_block_fast_2d(\n        x: torch.Tensor, drop_prob: float = 0.1, block_size: int = 7,\n        gamma_scale: float = 1.0, with_noise: bool = False, inplace: bool = False, batchwise: bool = False):\n    \"\"\" DropBlock. See https://arxiv.org/pdf/1810.12890.pdf\n    DropBlock with an experimental gaussian noise option. 
Simplified from above without concern for valid\n    block mask at edges.\n    \"\"\"\n    B, C, H, W = x.shape\n    total_size = W * H\n    clipped_block_size = min(block_size, min(W, H))\n    gamma = gamma_scale * drop_prob * total_size / clipped_block_size ** 2 / (\n            (W - block_size + 1) * (H - block_size + 1))\n\n    if batchwise:\n        # one mask for whole batch, quite a bit faster\n        block_mask = torch.rand((1, C, H, W), dtype=x.dtype, device=x.device) < gamma\n    else:\n        # mask per batch element\n        block_mask = torch.rand_like(x) < gamma\n    block_mask = F.max_pool2d(\n        block_mask.to(x.dtype), kernel_size=clipped_block_size, stride=1, padding=clipped_block_size // 2)\n\n    if with_noise:\n        normal_noise = torch.randn((1, C, H, W), dtype=x.dtype, device=x.device) if batchwise else torch.randn_like(x)\n        if inplace:\n            x.mul_(1. - block_mask).add_(normal_noise * block_mask)\n        else:\n            x = x * (1. - block_mask) + normal_noise * block_mask\n    else:\n        block_mask = 1 - block_mask\n        normalize_scale = (block_mask.numel() / block_mask.to(dtype=torch.float32).sum().add(1e-7)).to(dtype=x.dtype)\n        if inplace:\n            x.mul_(block_mask * normalize_scale)\n        else:\n            x = x * block_mask * normalize_scale\n    return x\n\n\nclass DropBlock2d(nn.Module):\n    \"\"\" DropBlock. See https://arxiv.org/pdf/1810.12890.pdf\n    \"\"\"\n    def __init__(self,\n                 drop_prob=0.1,\n                 block_size=7,\n                 gamma_scale=1.0,\n                 with_noise=False,\n                 inplace=False,\n                 batchwise=False,\n                 fast=True):\n        super(DropBlock2d, self).__init__()\n        self.drop_prob = drop_prob\n        self.gamma_scale = gamma_scale\n        self.block_size = block_size\n        self.with_noise = with_noise\n        self.inplace = inplace\n        self.batchwise = batchwise\n        self.fast = fast  # FIXME finish comparisons of fast vs not\n\n    def forward(self, x):\n        if not self.training or not self.drop_prob:\n            return x\n        if self.fast:\n            return drop_block_fast_2d(\n                x, self.drop_prob, self.block_size, self.gamma_scale, self.with_noise, self.inplace, self.batchwise)\n        else:\n            return drop_block_2d(\n                x, self.drop_prob, self.block_size, self.gamma_scale, self.with_noise, self.inplace, self.batchwise)\n\n\ndef drop_path(x, drop_prob: float = 0., training: bool = False):\n    \"\"\"Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).\n    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,\n    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...\n    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for\n    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use\n    'survival rate' as the argument.\n    \"\"\"\n
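    # A worked example of the scaling (illustrative): with drop_prob=0.1, each\n    # sample's branch output is zeroed with probability 0.1 and otherwise\n    # divided by keep_prob=0.9, so the expected value of the output matches x.\n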
    if drop_prob == 0. or not training:\n        return x\n    keep_prob = 1 - drop_prob\n    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets\n    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)\n    random_tensor.floor_()  # binarize\n    output = x.div(keep_prob) * random_tensor\n    return output\n\n\nclass DropPath(nn.Module):\n    \"\"\"Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).\n    \"\"\"\n    def __init__(self, drop_prob=None):\n        super(DropPath, self).__init__()\n        self.drop_prob = drop_prob\n\n    def forward(self, x):\n        return drop_path(x, self.drop_prob, self.training)\n"
  },
  {
    "path": "models/transformer_models/utils/inverted_residual.py",
    "content": "from mmcv.cnn import ConvModule\nfrom torch import nn as nn\nfrom torch.utils import checkpoint as cp\n\nfrom .se_layer import SELayer\n\n\nclass InvertedResidual(nn.Module):\n    \"\"\"InvertedResidual block for MobileNetV2.\n\n    Args:\n        in_channels (int): The input channels of the InvertedResidual block.\n        out_channels (int): The output channels of the InvertedResidual block.\n        stride (int): Stride of the middle (first) 3x3 convolution.\n        expand_ratio (int): Adjusts number of channels of the hidden layer\n            in InvertedResidual by this amount.\n        dilation (int): Dilation rate of depthwise conv. Default: 1\n        conv_cfg (dict): Config dict for convolution layer.\n            Default: None, which means using conv2d.\n        norm_cfg (dict): Config dict for normalization layer.\n            Default: dict(type='BN').\n        act_cfg (dict): Config dict for activation layer.\n            Default: dict(type='ReLU6').\n        with_cp (bool): Use checkpoint or not. Using checkpoint will save some\n            memory while slowing down the training speed. Default: False.\n\n    Returns:\n        Tensor: The output tensor.\n    \"\"\"\n\n    def __init__(self,\n                 in_channels,\n                 out_channels,\n                 stride,\n                 expand_ratio,\n                 dilation=1,\n                 conv_cfg=None,\n                 norm_cfg=dict(type='BN'),\n                 act_cfg=dict(type='ReLU6'),\n                 with_cp=False):\n        super(InvertedResidual, self).__init__()\n        self.stride = stride\n        assert stride in [1, 2], f'stride must in [1, 2]. ' \\\n            f'But received {stride}.'\n        self.with_cp = with_cp\n        self.use_res_connect = self.stride == 1 and in_channels == out_channels\n        hidden_dim = int(round(in_channels * expand_ratio))\n\n        layers = []\n        if expand_ratio != 1:\n            layers.append(\n                ConvModule(\n                    in_channels=in_channels,\n                    out_channels=hidden_dim,\n                    kernel_size=1,\n                    conv_cfg=conv_cfg,\n                    norm_cfg=norm_cfg,\n                    act_cfg=act_cfg))\n        layers.extend([\n            ConvModule(\n                in_channels=hidden_dim,\n                out_channels=hidden_dim,\n                kernel_size=3,\n                stride=stride,\n                padding=dilation,\n                dilation=dilation,\n                groups=hidden_dim,\n                conv_cfg=conv_cfg,\n                norm_cfg=norm_cfg,\n                act_cfg=act_cfg),\n            ConvModule(\n                in_channels=hidden_dim,\n                out_channels=out_channels,\n                kernel_size=1,\n                conv_cfg=conv_cfg,\n                norm_cfg=norm_cfg,\n                act_cfg=None)\n        ])\n        self.conv = nn.Sequential(*layers)\n\n    def forward(self, x):\n\n        def _inner_forward(x):\n            if self.use_res_connect:\n                return x + self.conv(x)\n            else:\n                return self.conv(x)\n\n        if self.with_cp and x.requires_grad:\n            out = cp.checkpoint(_inner_forward, x)\n        else:\n            out = _inner_forward(x)\n\n        return out\n\n\nclass InvertedResidualV3(nn.Module):\n    \"\"\"Inverted Residual Block for MobileNetV3.\n\n    Args:\n        in_channels (int): The input channels of this Module.\n        out_channels (int): The 
output channels of this Module.\n        mid_channels (int): The input channels of the depthwise convolution.\n        kernel_size (int): The kernel size of the depthwise convolution.\n            Default: 3.\n        stride (int): The stride of the depthwise convolution. Default: 1.\n        se_cfg (dict): Config dict for se layer. Default: None, which means no\n            se layer.\n        with_expand_conv (bool): Use expand conv or not. If set False,\n            mid_channels must be the same as in_channels. Default: True.\n        conv_cfg (dict): Config dict for convolution layer. Default: None,\n            which means using conv2d.\n        norm_cfg (dict): Config dict for normalization layer.\n            Default: dict(type='BN').\n        act_cfg (dict): Config dict for activation layer.\n            Default: dict(type='ReLU').\n        with_cp (bool): Use checkpoint or not. Using checkpoint will save some\n            memory while slowing down the training speed. Default: False.\n\n    Returns:\n        Tensor: The output tensor.\n    \"\"\"\n\n    def __init__(self,\n                 in_channels,\n                 out_channels,\n                 mid_channels,\n                 kernel_size=3,\n                 stride=1,\n                 se_cfg=None,\n                 with_expand_conv=True,\n                 conv_cfg=None,\n                 norm_cfg=dict(type='BN'),\n                 act_cfg=dict(type='ReLU'),\n                 with_cp=False):\n        super(InvertedResidualV3, self).__init__()\n        self.with_res_shortcut = (stride == 1 and in_channels == out_channels)\n        assert stride in [1, 2]\n        self.with_cp = with_cp\n        self.with_se = se_cfg is not None\n        self.with_expand_conv = with_expand_conv\n\n        if self.with_se:\n            assert isinstance(se_cfg, dict)\n        if not self.with_expand_conv:\n            assert mid_channels == in_channels\n\n        if self.with_expand_conv:\n            self.expand_conv = ConvModule(\n                in_channels=in_channels,\n                out_channels=mid_channels,\n                kernel_size=1,\n                stride=1,\n                padding=0,\n                conv_cfg=conv_cfg,\n                norm_cfg=norm_cfg,\n                act_cfg=act_cfg)\n        self.depthwise_conv = ConvModule(\n            in_channels=mid_channels,\n            out_channels=mid_channels,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=kernel_size // 2,\n            groups=mid_channels,\n            conv_cfg=dict(\n                type='Conv2dAdaptivePadding') if stride == 2 else conv_cfg,\n            norm_cfg=norm_cfg,\n            act_cfg=act_cfg)\n\n        if self.with_se:\n            self.se = SELayer(**se_cfg)\n\n        self.linear_conv = ConvModule(\n            in_channels=mid_channels,\n            out_channels=out_channels,\n            kernel_size=1,\n            stride=1,\n            padding=0,\n            conv_cfg=conv_cfg,\n            norm_cfg=norm_cfg,\n            act_cfg=None)\n\n    def forward(self, x):\n\n        def _inner_forward(x):\n            out = x\n\n            if self.with_expand_conv:\n                out = self.expand_conv(out)\n\n            out = self.depthwise_conv(out)\n\n            if self.with_se:\n                out = self.se(out)\n\n            out = self.linear_conv(out)\n\n            if self.with_res_shortcut:\n                return x + out\n            else:\n                return out\n\n
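        # with_cp enables gradient checkpointing: the block is recomputed\n        # during the backward pass instead of caching activations, trading\n        # extra compute for lower memory use.\n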
        if self.with_cp and x.requires_grad:\n            out = cp.checkpoint(_inner_forward, x)\n        else:\n            out = _inner_forward(x)\n\n        return out\n"
  },
  {
    "path": "models/transformer_models/utils/make_divisible.py",
    "content": "def make_divisible(value, divisor, min_value=None, min_ratio=0.9):\n    \"\"\"Make divisible function.\n\n    This function rounds the channel number to the nearest value that can be\n    divisible by the divisor. It is taken from the original tf repo. It ensures\n    that all layers have a channel number that is divisible by divisor. It can\n    be seen here: https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py  # noqa\n\n    Args:\n        value (int): The original channel number.\n        divisor (int): The divisor to fully divide the channel number.\n        min_value (int): The minimum value of the output channel.\n            Default: None, means that the minimum value equal to the divisor.\n        min_ratio (float): The minimum ratio of the rounded channel number to\n            the original channel number. Default: 0.9.\n\n    Returns:\n        int: The modified output channel number.\n    \"\"\"\n\n    if min_value is None:\n        min_value = divisor\n    new_value = max(min_value, int(value + divisor / 2) // divisor * divisor)\n    # Make sure that round down does not go down by more than (1-min_ratio).\n    if new_value < min_ratio * value:\n        new_value += divisor\n    return new_value\n"
  },
  {
    "path": "models/transformer_models/utils/norm.py",
    "content": "import torch\nimport math\nimport warnings\n\n\ndef _no_grad_trunc_normal_(tensor, mean, std, a, b):\n    # Cut & paste from PyTorch official master until it's in a few official releases - RW\n    # Method based on https://people.sc.fsu.edu/~jburkardt/presentations/truncated_normal.pdf\n    def norm_cdf(x):\n        # Computes standard normal cumulative distribution function\n        return (1. + math.erf(x / math.sqrt(2.))) / 2.\n\n    if (mean < a - 2 * std) or (mean > b + 2 * std):\n        warnings.warn(\"mean is more than 2 std from [a, b] in nn.init.trunc_normal_. \"\n                      \"The distribution of values may be incorrect.\",\n                      stacklevel=2)\n\n    with torch.no_grad():\n        # Values are generated by using a truncated uniform distribution and\n        # then using the inverse CDF for the normal distribution.\n        # Get upper and lower cdf values\n        l = norm_cdf((a - mean) / std)\n        u = norm_cdf((b - mean) / std)\n\n        # Uniformly fill tensor with values from [l, u], then translate to\n        # [2l-1, 2u-1].\n        tensor.uniform_(2 * l - 1, 2 * u - 1)\n\n        # Use inverse cdf transform for normal distribution to get truncated\n        # standard normal\n        tensor.erfinv_()\n\n        # Transform to proper mean, std\n        tensor.mul_(std * math.sqrt(2.))\n        tensor.add_(mean)\n\n        # Clamp to ensure it's in the proper range\n        tensor.clamp_(min=a, max=b)\n        return tensor\n\n\ndef trunc_normal_(tensor, mean=0., std=1., a=-2., b=2.):\n    # type: (Tensor, float, float, float, float) -> Tensor\n    r\"\"\"Fills the input Tensor with values drawn from a truncated\n    normal distribution. The values are effectively drawn from the\n    normal distribution :math:`\\mathcal{N}(\\text{mean}, \\text{std}^2)`\n    with values outside :math:`[a, b]` redrawn until they are within\n    the bounds. The method used for generating the random values works\n    best when :math:`a \\leq \\text{mean} \\leq b`.\n    Args:\n        tensor: an n-dimensional `torch.Tensor`\n        mean: the mean of the normal distribution\n        std: the standard deviation of the normal distribution\n        a: the minimum cutoff value\n        b: the maximum cutoff value\n    Examples:\n        >>> w = torch.empty(3, 5)\n        >>> nn.init.trunc_normal_(w)\n    \"\"\"\n    return _no_grad_trunc_normal_(tensor, mean, std, a, b)"
  },
  {
    "path": "models/transformer_models/utils/res_layer.py",
    "content": "from mmcv.cnn import build_conv_layer, build_norm_layer\nfrom torch import nn as nn\n\n\nclass ResLayer(nn.Sequential):\n    \"\"\"ResLayer to build ResNet style backbone.\n\n    Args:\n        block (nn.Module): block used to build ResLayer.\n        inplanes (int): inplanes of block.\n        planes (int): planes of block.\n        num_blocks (int): number of blocks.\n        stride (int): stride of the first block. Default: 1\n        avg_down (bool): Use AvgPool instead of stride conv when\n            downsampling in the bottleneck. Default: False\n        conv_cfg (dict): dictionary to construct and config conv layer.\n            Default: None\n        norm_cfg (dict): dictionary to construct and config norm layer.\n            Default: dict(type='BN')\n        multi_grid (int | None): Multi grid dilation rates of last\n            stage. Default: None\n        contract_dilation (bool): Whether contract first dilation of each layer\n            Default: False\n    \"\"\"\n\n    def __init__(self,\n                 block,\n                 inplanes,\n                 planes,\n                 num_blocks,\n                 stride=1,\n                 dilation=1,\n                 avg_down=False,\n                 conv_cfg=None,\n                 norm_cfg=dict(type='BN'),\n                 multi_grid=None,\n                 contract_dilation=False,\n                 **kwargs):\n        self.block = block\n\n        downsample = None\n        if stride != 1 or inplanes != planes * block.expansion:\n            downsample = []\n            conv_stride = stride\n            if avg_down:\n                conv_stride = 1\n                downsample.append(\n                    nn.AvgPool2d(\n                        kernel_size=stride,\n                        stride=stride,\n                        ceil_mode=True,\n                        count_include_pad=False))\n            downsample.extend([\n                build_conv_layer(\n                    conv_cfg,\n                    inplanes,\n                    planes * block.expansion,\n                    kernel_size=1,\n                    stride=conv_stride,\n                    bias=False),\n                build_norm_layer(norm_cfg, planes * block.expansion)[1]\n            ])\n            downsample = nn.Sequential(*downsample)\n\n        layers = []\n        if multi_grid is None:\n            if dilation > 1 and contract_dilation:\n                first_dilation = dilation // 2\n            else:\n                first_dilation = dilation\n        else:\n            first_dilation = multi_grid[0]\n        layers.append(\n            block(\n                inplanes=inplanes,\n                planes=planes,\n                stride=stride,\n                dilation=first_dilation,\n                downsample=downsample,\n                conv_cfg=conv_cfg,\n                norm_cfg=norm_cfg,\n                **kwargs))\n        inplanes = planes * block.expansion\n        for i in range(1, num_blocks):\n            layers.append(\n                block(\n                    inplanes=inplanes,\n                    planes=planes,\n                    stride=1,\n                    dilation=dilation if multi_grid is None else multi_grid[i],\n                    conv_cfg=conv_cfg,\n                    norm_cfg=norm_cfg,\n                    **kwargs))\n        super(ResLayer, self).__init__(*layers)\n"
  },
  {
    "path": "models/transformer_models/utils/se_layer.py",
    "content": "import mmcv\nimport torch.nn as nn\nfrom mmcv.cnn import ConvModule\n\nfrom .make_divisible import make_divisible\n\n\nclass SELayer(nn.Module):\n    \"\"\"Squeeze-and-Excitation Module.\n\n    Args:\n        channels (int): The input (and output) channels of the SE layer.\n        ratio (int): Squeeze ratio in SELayer, the intermediate channel will be\n            ``int(channels/ratio)``. Default: 16.\n        conv_cfg (None or dict): Config dict for convolution layer.\n            Default: None, which means using conv2d.\n        act_cfg (dict or Sequence[dict]): Config dict for activation layer.\n            If act_cfg is a dict, two activation layers will be configurated\n            by this dict. If act_cfg is a sequence of dicts, the first\n            activation layer will be configurated by the first dict and the\n            second activation layer will be configurated by the second dict.\n            Default: (dict(type='ReLU'), dict(type='HSigmoid', bias=3.0,\n            divisor=6.0)).\n    \"\"\"\n\n    def __init__(self,\n                 channels,\n                 ratio=16,\n                 conv_cfg=None,\n                 act_cfg=(dict(type='ReLU'),\n                          dict(type='HSigmoid', bias=3.0, divisor=6.0))):\n        super(SELayer, self).__init__()\n        if isinstance(act_cfg, dict):\n            act_cfg = (act_cfg, act_cfg)\n        assert len(act_cfg) == 2\n        assert mmcv.is_tuple_of(act_cfg, dict)\n        self.global_avgpool = nn.AdaptiveAvgPool2d(1)\n        self.conv1 = ConvModule(\n            in_channels=channels,\n            out_channels=make_divisible(channels // ratio, 8),\n            kernel_size=1,\n            stride=1,\n            conv_cfg=conv_cfg,\n            act_cfg=act_cfg[0])\n        self.conv2 = ConvModule(\n            in_channels=make_divisible(channels // ratio, 8),\n            out_channels=channels,\n            kernel_size=1,\n            stride=1,\n            conv_cfg=conv_cfg,\n            act_cfg=act_cfg[1])\n\n    def forward(self, x):\n        out = self.global_avgpool(x)\n        out = self.conv1(out)\n        out = self.conv2(out)\n        return x * out\n"
  },
  {
    "path": "models/transformer_models/utils/self_attention_block.py",
    "content": "import torch\nfrom mmcv.cnn import ConvModule, constant_init\nfrom torch import nn as nn\nfrom torch.nn import functional as F\n\n\nclass SelfAttentionBlock(nn.Module):\n    \"\"\"General self-attention block/non-local block.\n\n    Please refer to https://arxiv.org/abs/1706.03762 for details about key,\n    query and value.\n\n    Args:\n        key_in_channels (int): Input channels of key feature.\n        query_in_channels (int): Input channels of query feature.\n        channels (int): Output channels of key/query transform.\n        out_channels (int): Output channels.\n        share_key_query (bool): Whether share projection weight between key\n            and query projection.\n        query_downsample (nn.Module): Query downsample module.\n        key_downsample (nn.Module): Key downsample module.\n        key_query_num_convs (int): Number of convs for key/query projection.\n        value_num_convs (int): Number of convs for value projection.\n        matmul_norm (bool): Whether normalize attention map with sqrt of\n            channels\n        with_out (bool): Whether use out projection.\n        conv_cfg (dict|None): Config of conv layers.\n        norm_cfg (dict|None): Config of norm layers.\n        act_cfg (dict|None): Config of activation layers.\n    \"\"\"\n\n    def __init__(self, key_in_channels, query_in_channels, channels,\n                 out_channels, share_key_query, query_downsample,\n                 key_downsample, key_query_num_convs, value_out_num_convs,\n                 key_query_norm, value_out_norm, matmul_norm, with_out,\n                 conv_cfg, norm_cfg, act_cfg):\n        super(SelfAttentionBlock, self).__init__()\n        if share_key_query:\n            assert key_in_channels == query_in_channels\n        self.key_in_channels = key_in_channels\n        self.query_in_channels = query_in_channels\n        self.out_channels = out_channels\n        self.channels = channels\n        self.share_key_query = share_key_query\n        self.conv_cfg = conv_cfg\n        self.norm_cfg = norm_cfg\n        self.act_cfg = act_cfg\n        self.key_project = self.build_project(\n            key_in_channels,\n            channels,\n            num_convs=key_query_num_convs,\n            use_conv_module=key_query_norm,\n            conv_cfg=conv_cfg,\n            norm_cfg=norm_cfg,\n            act_cfg=act_cfg)\n        if share_key_query:\n            self.query_project = self.key_project\n        else:\n            self.query_project = self.build_project(\n                query_in_channels,\n                channels,\n                num_convs=key_query_num_convs,\n                use_conv_module=key_query_norm,\n                conv_cfg=conv_cfg,\n                norm_cfg=norm_cfg,\n                act_cfg=act_cfg)\n        self.value_project = self.build_project(\n            key_in_channels,\n            channels if with_out else out_channels,\n            num_convs=value_out_num_convs,\n            use_conv_module=value_out_norm,\n            conv_cfg=conv_cfg,\n            norm_cfg=norm_cfg,\n            act_cfg=act_cfg)\n        if with_out:\n            self.out_project = self.build_project(\n                channels,\n                out_channels,\n                num_convs=value_out_num_convs,\n                use_conv_module=value_out_norm,\n                conv_cfg=conv_cfg,\n                norm_cfg=norm_cfg,\n                act_cfg=act_cfg)\n        else:\n            self.out_project = None\n\n        self.query_downsample = 
query_downsample\n        self.key_downsample = key_downsample\n        self.matmul_norm = matmul_norm\n\n        self.init_weights()\n\n    def init_weights(self):\n        \"\"\"Initialize weight of later layer.\"\"\"\n        if self.out_project is not None:\n            if not isinstance(self.out_project, ConvModule):\n                constant_init(self.out_project, 0)\n\n    def build_project(self, in_channels, channels, num_convs, use_conv_module,\n                      conv_cfg, norm_cfg, act_cfg):\n        \"\"\"Build projection layer for key/query/value/out.\"\"\"\n        if use_conv_module:\n            convs = [\n                ConvModule(\n                    in_channels,\n                    channels,\n                    1,\n                    conv_cfg=conv_cfg,\n                    norm_cfg=norm_cfg,\n                    act_cfg=act_cfg)\n            ]\n            for _ in range(num_convs - 1):\n                convs.append(\n                    ConvModule(\n                        channels,\n                        channels,\n                        1,\n                        conv_cfg=conv_cfg,\n                        norm_cfg=norm_cfg,\n                        act_cfg=act_cfg))\n        else:\n            convs = [nn.Conv2d(in_channels, channels, 1)]\n            for _ in range(num_convs - 1):\n                convs.append(nn.Conv2d(channels, channels, 1))\n        if len(convs) > 1:\n            convs = nn.Sequential(*convs)\n        else:\n            convs = convs[0]\n        return convs\n\n    def forward(self, query_feats, key_feats):\n        \"\"\"Forward function.\"\"\"\n        batch_size = query_feats.size(0)\n        query = self.query_project(query_feats)\n        if self.query_downsample is not None:\n            query = self.query_downsample(query)\n        query = query.reshape(*query.shape[:2], -1)\n        query = query.permute(0, 2, 1).contiguous()\n\n        key = self.key_project(key_feats)\n        value = self.value_project(key_feats)\n        if self.key_downsample is not None:\n            key = self.key_downsample(key)\n            value = self.key_downsample(value)\n        key = key.reshape(*key.shape[:2], -1)\n        value = value.reshape(*value.shape[:2], -1)\n        value = value.permute(0, 2, 1).contiguous()\n\n        sim_map = torch.matmul(query, key)\n        if self.matmul_norm:\n            sim_map = (self.channels**-.5) * sim_map\n        sim_map = F.softmax(sim_map, dim=-1)\n\n        context = torch.matmul(sim_map, value)\n        context = context.permute(0, 2, 1).contiguous()\n        context = context.reshape(batch_size, -1, *query_feats.shape[2:])\n        if self.out_project is not None:\n            context = self.out_project(context)\n        return context\n"
  },
  {
    "path": "models/transformer_models/utils/up_conv_block.py",
    "content": "import torch\nimport torch.nn as nn\nfrom mmcv.cnn import ConvModule, build_upsample_layer\n\n\nclass UpConvBlock(nn.Module):\n    \"\"\"Upsample convolution block in decoder for UNet.\n\n    This upsample convolution block consists of one upsample module\n    followed by one convolution block. The upsample module expands the\n    high-level low-resolution feature map and the convolution block fuses\n    the upsampled high-level low-resolution feature map and the low-level\n    high-resolution feature map from encoder.\n\n    Args:\n        conv_block (nn.Sequential): Sequential of convolutional layers.\n        in_channels (int): Number of input channels of the high-level\n        skip_channels (int): Number of input channels of the low-level\n        high-resolution feature map from encoder.\n        out_channels (int): Number of output channels.\n        num_convs (int): Number of convolutional layers in the conv_block.\n            Default: 2.\n        stride (int): Stride of convolutional layer in conv_block. Default: 1.\n        dilation (int): Dilation rate of convolutional layer in conv_block.\n            Default: 1.\n        with_cp (bool): Use checkpoint or not. Using checkpoint will save some\n            memory while slowing down the training speed. Default: False.\n        conv_cfg (dict | None): Config dict for convolution layer.\n            Default: None.\n        norm_cfg (dict | None): Config dict for normalization layer.\n            Default: dict(type='BN').\n        act_cfg (dict | None): Config dict for activation layer in ConvModule.\n            Default: dict(type='ReLU').\n        upsample_cfg (dict): The upsample config of the upsample module in\n            decoder. Default: dict(type='InterpConv'). If the size of\n            high-level feature map is the same as that of skip feature map\n            (low-level feature map from encoder), it does not need upsample the\n            high-level feature map and the upsample_cfg is None.\n        dcn (bool): Use deformable convoluton in convolutional layer or not.\n            Default: None.\n        plugins (dict): plugins for convolutional layers. 
Default: None.\n    \"\"\"\n\n    def __init__(self,\n                 conv_block,\n                 in_channels,\n                 skip_channels,\n                 out_channels,\n                 num_convs=2,\n                 stride=1,\n                 dilation=1,\n                 with_cp=False,\n                 conv_cfg=None,\n                 norm_cfg=dict(type='BN'),\n                 act_cfg=dict(type='ReLU'),\n                 upsample_cfg=dict(type='InterpConv'),\n                 dcn=None,\n                 plugins=None):\n        super(UpConvBlock, self).__init__()\n        assert dcn is None, 'Not implemented yet.'\n        assert plugins is None, 'Not implemented yet.'\n\n        self.conv_block = conv_block(\n            in_channels=2 * skip_channels,\n            out_channels=out_channels,\n            num_convs=num_convs,\n            stride=stride,\n            dilation=dilation,\n            with_cp=with_cp,\n            conv_cfg=conv_cfg,\n            norm_cfg=norm_cfg,\n            act_cfg=act_cfg,\n            dcn=None,\n            plugins=None)\n        if upsample_cfg is not None:\n            self.upsample = build_upsample_layer(\n                cfg=upsample_cfg,\n                in_channels=in_channels,\n                out_channels=skip_channels,\n                with_cp=with_cp,\n                norm_cfg=norm_cfg,\n                act_cfg=act_cfg)\n        else:\n            self.upsample = ConvModule(\n                in_channels,\n                skip_channels,\n                kernel_size=1,\n                stride=1,\n                padding=0,\n                conv_cfg=conv_cfg,\n                norm_cfg=norm_cfg,\n                act_cfg=act_cfg)\n\n    def forward(self, skip, x):\n        \"\"\"Forward function.\"\"\"\n\n        x = self.upsample(x)\n        out = torch.cat([skip, x], dim=1)\n        out = self.conv_block(out)\n\n        return out\n"
  },
  {
    "path": "options/__init__.py",
    "content": ""
  },
  {
    "path": "options/base_options.py",
    "content": "import argparse\nimport os\nfrom util import util\nimport torch\nimport models\nimport data\n\n\nclass BaseOptions():\n    def __init__(self):\n        self.initialized = False\n\n    def initialize(self, parser):\n        parser.add_argument('--dataroot', required=True, help='path to images, should have training, validation and testing')\n        parser.add_argument('--batch_size', type=int, default=2, help='input batch size')\n        parser.add_argument('--useWidth', type=int, default=1280, help='scale images to this width 640')\n        parser.add_argument('--useHeight', type=int, default=704, help='scale images to this height 352')\n        parser.add_argument('--gpu_ids', type=str, default='0', help='gpu ids: e.g. 0  0,1,2, 0,2. use -1 for CPU')\n        parser.add_argument('--name', type=str, default='experiment_name', help='name of the experiment. It decides where to store samples and models')\n        parser.add_argument('--use_sne', action='store_true', help='chooses if using sne')\n        parser.add_argument('--dataset', type=str, default='ORFD', help='chooses which dataset to load.')\n        parser.add_argument('--model', type=str, default='roadseg', help='chooses which model to use.')\n        parser.add_argument('--epoch', type=str, default='latest', help='chooses which epoch to load')\n        parser.add_argument('--num_threads', default=0, type=int, help='# default 8, threads for loading data')\n        parser.add_argument('--checkpoints_dir', type=str, default='./checkpoints', help='models are saved here')\n        parser.add_argument('--norm', type=str, default='instance', help='instance normalization or batch normalization')\n        parser.add_argument('--serial_batches', action='store_true', help='if true, takes images in order to make batches, otherwise takes them randomly')\n        parser.add_argument('--init_type', type=str, default='kaiming', help='network initialization [normal|xavier|kaiming|orthogonal]')\n        parser.add_argument('--init_gain', type=float, default=0.02, help='scaling factor for normal, xavier and orthogonal.')\n        parser.add_argument('--verbose', action='store_true', help='if specified, print more debugging information')\n        parser.add_argument('--seed', type=int, default=0, help='seed for random generators')\n        self.initialized = True\n        return parser\n\n    def gather_options(self):\n        # initialize parser with basic options\n        if not self.initialized:\n            parser = argparse.ArgumentParser(\n                formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n            parser = self.initialize(parser)\n\n        # get the basic options\n        opt, _ = parser.parse_known_args()\n\n        # modify model-related parser options\n        model_name = opt.model\n        model_option_setter = models.get_option_setter(model_name)\n        parser = model_option_setter(parser, self.isTrain)\n        opt, _ = parser.parse_known_args()  # parse again with the new defaults\n\n        # modify dataset-related parser options\n        dataset_name = opt.dataset\n        dataset_option_setter = data.get_option_setter(dataset_name)\n        parser = dataset_option_setter(parser, self.isTrain)\n\n        self.parser = parser\n\n        return parser.parse_args()\n\n    def print_options(self, opt):\n        message = ''\n        message += '----------------- Options ---------------\\n'\n        for k, v in sorted(vars(opt).items()):\n            comment = ''\n            default = 
            default = self.parser.get_default(k)\n            if v != default:\n                comment = '\\t[default: %s]' % str(default)\n            message += '{:>25}: {:<30}{}\\n'.format(str(k), str(v), comment)\n        message += '----------------- End -------------------'\n        print(message)\n\n        # save to the disk\n        expr_dir = os.path.join(opt.checkpoints_dir, opt.name)\n        util.mkdirs(expr_dir)\n        file_name = os.path.join(expr_dir, 'opt.txt')\n        with open(file_name, 'wt') as opt_file:\n            opt_file.write(message)\n            opt_file.write('\\n')\n\n    def parse(self):\n        opt = self.gather_options()\n        opt.isTrain = self.isTrain   # train or test\n\n        self.print_options(opt)\n\n        # set gpu ids\n        str_ids = opt.gpu_ids.split(',')\n        opt.gpu_ids = []\n        for str_id in str_ids:\n            id = int(str_id)\n            if id >= 0:\n                opt.gpu_ids.append(id)\n        if len(opt.gpu_ids) > 0:\n            torch.cuda.set_device(opt.gpu_ids[0])\n\n        self.opt = opt\n        return self.opt\n"
  },
  {
    "path": "options/test_options.py",
    "content": "from .base_options import BaseOptions\n\nclass TestOptions(BaseOptions):\n    def initialize(self, parser):\n        parser = BaseOptions.initialize(self, parser)\n        parser.add_argument('--results_dir', type=str, default='./testresults/', help='saves results here.')\n        parser.add_argument('--phase', type=str, default='test', help='train, val, test')\n        parser.add_argument('--prob_map', action='store_true', help='chooses outputting prob maps or binary predictions')\n        parser.add_argument('--no_label', action='store_true', help='chooses if we have gt labels in testing phase')\n        self.isTrain = False\n        return parser\n"
  },
  {
    "path": "options/train_options.py",
    "content": "from .base_options import BaseOptions\n\nclass TrainOptions(BaseOptions):\n    def initialize(self, parser):\n        parser = BaseOptions.initialize(self, parser)\n        parser.add_argument('--print_freq', type=int, default=10, help='frequency of showing training results on console')\n        parser.add_argument('--continue_train', action='store_true', help='continue training: load the latest model')\n        parser.add_argument('--epoch_count', type=int, default=1, help='the starting epoch count')\n        parser.add_argument('--phase', type=str, default='train', help='train, val, test')\n        parser.add_argument('--nepoch', type=int, default=30, help='maximum epochs')\n        parser.add_argument('--beta1', type=float, default=0.5, help='momentum term of adam')\n        parser.add_argument('--lr', type=float, default=0.001, help='initial learning rate for optimizer')\n        parser.add_argument('--momentum', type=float, default=0.9, help='momentum factor for SGD')\n        parser.add_argument('--weight_decay', type=float, default=0.0005, help='momentum factor for optimizer')\n        parser.add_argument('--lr_policy', type=str, default='lambda', help='learning rate policy: lambda|step|plateau|cosine')\n        parser.add_argument('--lr_decay_iters', type=int, default=5000000, help='multiply by a gamma every lr_decay_iters iterations')\n        parser.add_argument('--lr_decay_epochs', type=int, default=25, help='multiply by a gamma every lr_decay_epoch epochs')\n        parser.add_argument('--lr_gamma', type=float, default=0.9, help='gamma factor for lr_scheduler')\n        self.isTrain = True\n        return parser\n"
  },
  {
    "path": "road_hesai40_process.py",
    "content": "from __future__ import absolute_import, division, print_function\n\nimport os\nimport copy\nimport glob\nimport numpy as np\nfrom collections import Counter\nfrom scipy import interpolate\nimport skimage.transform\nimport cv2\nif not (\"DISPLAY\" in os.environ):\n    import matplotlib as mpl\n    mpl.use('Agg')\nimport matplotlib.pyplot as plt\nfrom PIL import Image\nimport mayavi.mlab\n\nimport multiprocessing as mp\n \n\ncmap = plt.cm.jet\ncmap2 = plt.cm.nipy_spectral\n\n\ndef show_velo(pointcloud):\n    x = pointcloud[:, 0]  # x position of point\n    y = pointcloud[:, 1]  # y position of point\n    z = pointcloud[:, 2]  # z position of point\n    \n    d = np.sqrt(x ** 2 + y ** 2)  # Map Distance from sensor\n    \n    degr = np.degrees(np.arctan(z / d))\n    \n    vals = 'height'\n    if vals == \"height\":\n        col = z\n    else:\n        col = d\n    \n    fig = mayavi.mlab.figure(bgcolor=(0, 0, 0), size=(640, 500))\n    mayavi.mlab.points3d(x, y, z,\n                        col,  # Values used for Color\n                        mode=\"point\",\n                        colormap='spectral',  # 'bone', 'copper', 'gnuplot'\n                        # color=(0, 1, 0),   # Used a fixed (r,g,b) instead\n                        figure=fig,\n                        )\n    \n    mayavi.mlab.show()\n\ndef depth_colorize(depth):\n    depth = (depth - np.min(depth)) / (np.max(depth) - np.min(depth))\n    depth = 255 * cmap(depth)[:, :, :3]  # H, W, C\n    return depth.astype('uint8')\n\ndef load_velodyne_points(filename):\n    \"\"\"Load 3D point cloud from KITTI file format\n    (adapted from https://github.com/hunse/kitti)\n    \"\"\"\n    #points = np.fromfile(filename, dtype=np.float32).reshape(-1, 4) #kitti\n    points = np.fromfile(filename, dtype=np.float32).reshape(-1, 5)[:,:4]  #jk hesai40 data\n    points[:, 3] = 1.0  # homogeneous\n    return points\n\n\ndef read_calib_file(path):\n    \"\"\"Read KITTI calibration file\n    (from https://github.com/hunse/kitti)\n    \"\"\"\n    float_chars = set(\"0123456789.e+- \")\n    data = {}\n    with open(path, 'r') as f:\n        for line in f.readlines():\n            key, value = line.split(':', 1)\n            value = value.strip()\n            data[key] = value\n            if float_chars.issuperset(value):\n                # try to cast to float array\n                try:\n                    data[key] = np.array(list(map(float, value.split(' '))))\n                except ValueError:\n                    # casting error: data[key] already eq. 
value, so pass\n                    pass\n\n    return data\n\n\ndef sub2ind(matrixSize, rowSub, colSub):\n    \"\"\"Convert row, col matrix subscripts to linear indices\n    \"\"\"\n    m, n = matrixSize\n    return rowSub * (n-1) + colSub - 1\n\n\ndef generate_depth_map(velo_filename):\n    \"\"\"Generate a depth map from jk hesai40 data\n    \"\"\"\n\n    # set image shape\n    im_shape = np.array([720,1280],dtype=np.int32)\n\n    # load velodyne points\n    velo = load_velodyne_points(velo_filename)\n    #show_velo(velo)\n    \n    # camera parameter\n    K = np.array([ 1472.919866, 0.000000, 614.779599, 0.000000, 1452.953534, 353.800982, 0.000000, 0.000000, 1.000000 ]).reshape(3,3)\n    \n    RT = np.array([ 9.9954895655531806e-01, -2.9636912774399400e-02,\n       4.8514791948404291e-03, 5.4418414831161499e-02,\n       2.6634465658627177e-03, -7.3426033150586920e-02,\n       -9.9729710904432078e-01, -1.2367740273475647e-01,\n       2.9913032303096956e-02, 9.9686020637648665e-01,\n       -7.3313978486113582e-02, -1.0199587792158127e-01, 0., 0., 0., 1. ]).reshape(4,4)\n    \n    '''\n    # the default RT of camera is camera to car, so inv is needed \n    print('R before:',RT)\n    RTC = np.linalg.inv(RT)\n    print('R after:',RTC)\n    '''\n    \n\n    # velodyne parameter from velodyne to car\n    R_velo = np.array([-0.996842,-0.0793231,0.00385075,\n        0.0794014,-0.994533,0.067813,\n        -0.00154944,0.0679046,0.997691]).reshape(3,3)\n    T_velo = np.array([0, 0, 0]).reshape(3,1)\n    \n    # project velodyne to car coordinate\n    velo = R_velo @ velo[:,:3].T + T_velo\n    velo = velo.T\n\n    # remove all behind image plane (approximation)\n    velo = velo[velo[:, 1] >= 0, :]  \n    #show_velo(velo)\n\n    '''\n    One_array = np.ones(velo.shape[0]).reshape(velo.shape[0],1)\n    velo = np.hstack((velo,One_array))\n    print('velo.shape{}'.format(velo.shape))\n    '''\n\n    # project the points to the camera\n    # projection from car to image\n    '''\n    # version 1:\n    P_car2im = K @ RT[:3, :4]\n    velo_pts_im = np.dot(P_car2im, velo.T).T\n    '''\n\n    # version 2:\n    R = RT[:3,:3]\n    T = RT[:3,-1].reshape(3,1)\n\n    velo_pts_cam = (R @ velo.T + T).T\n    #show_velo(velo_pts_cam)\n\n    velo_pts_im = (K @ velo_pts_cam.T).T\n   \n    #show_velo(velo_pts_im)\n    #velo_pts_im = np.array((velo_pts_im[:,0],velo_pts_im[:,1],velo_pts_im[:,2])).T#*(-1)\n\n    velo_pts_im[:, :2] = velo_pts_im[:, :2] / velo_pts_im[:, 2][..., np.newaxis]\n    \n    # check if in bounds\n    # use minus 1 to get the exact same value as KITTI matlab code\n    velo_pts_im[:, 0] = np.round(velo_pts_im[:, 0]) - 1\n    velo_pts_im[:, 1] = np.round(velo_pts_im[:, 1]) - 1\n    #velo_pts_im[:, 0] \n    val_inds = (velo_pts_im[:, 0] >= 0) & (velo_pts_im[:, 1] >= 0)\n    val_inds = val_inds & (velo_pts_im[:, 0] < im_shape[1]) & (velo_pts_im[:, 1] < im_shape[0])\n    velo_pts_im = velo_pts_im[val_inds, :]\n\n    \n\n    # project to image\n    depth = np.zeros((im_shape[:2]))\n    depth[velo_pts_im[:, 1].astype(np.int), velo_pts_im[:, 0].astype(np.int)] = velo_pts_im[:, 2]#*(-1)\n\n    # find the duplicate points and choose the closest depth\n    inds = sub2ind(depth.shape, velo_pts_im[:, 1], velo_pts_im[:, 0])\n    dupe_inds = [item for item, count in Counter(inds).items() if count > 1]\n    for dd in dupe_inds:\n        pts = np.where(inds == dd)[0]\n        x_loc = int(velo_pts_im[pts[0], 0])\n        y_loc = int(velo_pts_im[pts[0], 1])\n        depth[y_loc, x_loc] = velo_pts_im[pts, 2].min()\n    
depth[depth < 0] = 0\n    print('depth.max:{}, depth.min:{}:'.format(depth.max(),depth.min()))\n\n    # interpolate the depth map to fill in holes\n    depth_inter = lin_interp(im_shape, velo_pts_im)\n    #depth_inter = lstsq_interp(im_shape, velo_pts_im,valid=False)\n\n    '''\n    # transfor depth to pointclouds\n    depth_nonzero = np.where(depth_inter>0)\n    #depth_nonzero = np.where(depth>0)\n    u = depth_nonzero[1]\n    v = depth_nonzero[0]\n    z = depth_inter[v,u]\n    \n    #print('depth_nonzero:',depth_nonzero)\n    #print('u:',u)\n    #print('v:',v)\n    #print('z:',z)\n\n    #uvz = np.vstack((u,v,z))\n    uvz = np.vstack((u,v,z))\n    \n    velo_c = np.linalg.inv(K) @ uvz\n    print('velo_c:',velo_c,velo_c.shape)\n\n    velo_w = R.T @ (velo_c - T)\n    print('velo_w:',velo_w,velo_w.shape)\n\n    show_velo(velo_w.T)\n    '''\n    return depth,depth_inter\n\ndef lin_interp(shape, xyd):\n    from scipy.interpolate import LinearNDInterpolator\n\n    m, n = shape\n    ij, d = xyd[:, 1::-1], xyd[:, 2]\n    f = LinearNDInterpolator(ij, d, fill_value=0)\n    J, I = np.meshgrid(np.arange(n), np.arange(m))\n    IJ = np.vstack([I.flatten(), J.flatten()]).T\n    disparity = f(IJ).reshape(shape)\n    return disparity\n\n\ndef main(velo_filename):\n    \"\"\"\n    process hesai40 lidar to sparse depth image and dense image\n    \"\"\"\n    path1_list = glob.glob(velo_filename+'/*') \n    for path2 in path1_list:\n        lidar_path = os.path.join(path2,'lidar_data')\n        rgb_path = os.path.join(path2,'image_data')\n\n        sparse_depth_dir = os.path.join(path2,'sparse_depth')\n        dense_depth_dir = os.path.join(path2,'dense_depth')\n        if not os.path.exists(sparse_depth_dir):\n            os.mkdir(sparse_depth_dir)\n        if not os.path.exists(dense_depth_dir):\n            os.mkdir(dense_depth_dir)\n\n        # depth img name and dirs\n        lidar_data_list = glob.glob(lidar_path+'/*')\n        for velo_filename in lidar_data_list:\n            print('velo_filename:',velo_filename)\n            img_name = velo_filename.split('/')[-1].split('.')[0]\n            sparse_depth_save_dir = os.path.join(sparse_depth_dir,img_name+'.png')\n            dense_depth_save_dir = os.path.join(dense_depth_dir,img_name+'.png')\n            rgb_img_dir = os.path.join(rgb_path,img_name+'.png')\n        \n\n            # TODO: prepare for multi process \n            # lidar to depth\n            sparse_depth_pred, dense_depth_pred = generate_depth_map(velo_filename)\n\n            sparse_depth = copy.deepcopy(sparse_depth_pred)\n            dense_depth = copy.deepcopy(dense_depth_pred)\n            \n            # save depth in unit16 format\n            img = (sparse_depth * 256.0).astype('uint16')\n            img_buffer = img.tobytes()\n            imgsave = Image.new(\"I\", img.T.shape)\n            imgsave.frombytes(img_buffer, 'raw', \"I;16\")\n            imgsave.save(sparse_depth_save_dir)\n\n            img = (dense_depth * 256.0).astype('uint16')\n            img_buffer = img.tobytes()\n            imgsave = Image.new(\"I\", img.T.shape)\n            imgsave.frombytes(img_buffer, 'raw', \"I;16\")\n            imgsave.save(dense_depth_save_dir)\n            \n            '''\n            # show depth \n            sparse_depth = depth_colorize(sparse_depth)\n            sparse_depth = cv2.cvtColor(sparse_depth, cv2.COLOR_RGB2BGR)\n\n            dense_depth = depth_colorize(dense_depth)\n            dense_depth = cv2.cvtColor(dense_depth, cv2.COLOR_RGB2BGR)\n            \n          
  rgb = cv2.imread(rgb_img_dir) # the correspoing rgb img\n\n            depth = np.concatenate((rgb,dense_depth,sparse_depth),axis=1)\n            depth = cv2.resize(depth, (0,0),fx=0.5, fy=0.5)\n\n            \n            #cv2.imshow('depth_gt',depth_new)\n            cv2.imshow('depth',depth)\n            \n            # show image with sparse to check the project consistency\n            rgb = np.transpose(rgb, (2, 0, 1))\n            ind = sparse_depth_pred>0\n            rgb[0][ind] = 0 #sparse_depth[ind]\n            rgb[1][ind] = 0 #sparse_depth[ind]\n            rgb[2][ind] = 0  #sparse_depth[ind]\n            rgb = rgb.transpose(1,2,0)\n\n            cv2.imshow('rgb+sparse depth',rgb)\n            cv2.waitKey(1)\n            '''\n\n\nif __name__=='__main__':\n    velo_filename = 'xxx' \n    main(velo_filename)"
  },
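  {
    "path": "doc/examples/read_depth_png_sketch.py",
    "content": "# NOTE: hedged sketch, not part of the original release. road_hesai40_process.py\n# saves sparse and dense depth as 16-bit PNGs after scaling metres by 256 (the\n# KITTI depth-completion convention), so reading a map back just inverts that\n# scaling. The file path below is a placeholder assumption.\nimport cv2\nimport numpy as np\n\n\ndef read_depth_png(path):\n    # IMREAD_UNCHANGED keeps the 16-bit channel instead of squashing it to 8 bits\n    depth_png = cv2.imread(path, cv2.IMREAD_UNCHANGED)\n    assert depth_png is not None, 'could not read %s' % path\n    assert depth_png.dtype == np.uint16, 'expected a 16-bit depth PNG'\n    # undo the *256 applied at save time; 0 marks pixels with no LiDAR return\n    return depth_png.astype(np.float32) / 256.0\n\n\nif __name__ == '__main__':\n    depth = read_depth_png('sparse_depth/000000.png')  # placeholder path\n    print('valid pixels:', int((depth > 0).sum()), 'max depth (m):', float(depth.max()))\n"
  },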
  {
    "path": "scripts/demo.sh",
    "content": "python3 demo.py --dataroot examples --name ORFD --use_sne --epoch best\n"
  },
  {
    "path": "scripts/test.sh",
    "content": "CUDA_VISIBLE_DEVICES=0 python3 test.py --dataroot '/workspace/data/' --dataset ORFD --name ORFD --use_sne --prob_map  --epoch best"
  },
  {
    "path": "scripts/train.sh",
    "content": "python3 train.py --dataroot '/workspace/data' --dataset ORFD --name ORFD --use_sne --batch_size 8  --gpu_ids 0,1,2,3"
  },
  {
    "path": "test.py",
    "content": "import os\nfrom options.test_options import TestOptions\nfrom data import CreateDataLoader\nfrom models import create_model\nfrom util.util import confusion_matrix, getScores, save_images\nimport torch\nimport numpy as np\nimport cv2\n\nif __name__ == '__main__':\n    opt = TestOptions().parse()\n    opt.num_threads = 1\n    opt.batch_size = 1\n    opt.serial_batches = True  # no shuffle\n    opt.isTrain = False\n\n    save_dir = os.path.join(opt.results_dir, opt.name, opt.phase + '_' + opt.epoch)\n    if not os.path.exists(save_dir):\n        os.makedirs(save_dir)\n\n    data_loader = CreateDataLoader(opt)\n    dataset = data_loader.load_data()\n    model = create_model(opt, dataset.dataset)\n    model.setup(opt)\n    model.eval()\n\n    test_loss_iter = []\n    epoch_iter = 0\n    conf_mat = np.zeros((dataset.dataset.num_labels, dataset.dataset.num_labels), dtype=np.float)\n    with torch.no_grad():\n        for i, data in enumerate(dataset):\n            model.set_input(data)\n            model.forward()\n            model.get_loss()\n            epoch_iter += opt.batch_size\n            gt = model.label.cpu().int().numpy()\n            _, pred = torch.max(model.output.data.cpu(), 1)\n            pred = pred.float().detach().int().numpy()\n            save_images(save_dir, model.get_current_visuals(), model.get_image_names(), model.get_image_oriSize(), opt.prob_map)\n\n            # Resize images to the original size for evaluation\n            image_size = model.get_image_oriSize()\n            oriSize = (image_size[0].item(), image_size[1].item())\n            gt = np.expand_dims(cv2.resize(np.squeeze(gt, axis=0), oriSize, interpolation=cv2.INTER_NEAREST), axis=0)\n            pred = np.expand_dims(cv2.resize(np.squeeze(pred, axis=0), oriSize, interpolation=cv2.INTER_NEAREST), axis=0)\n            conf_mat += confusion_matrix(gt, pred, dataset.dataset.num_labels)\n\n            test_loss_iter.append(model.loss_segmentation)\n            print('Epoch {0:}, iters: {1:}/{2:}, loss: {3:.3f} '.format(opt.epoch,\n                                                                        epoch_iter,\n                                                                        len(dataset) * opt.batch_size,\n                                                                        test_loss_iter[-1]), end='\\r')\n\n        avg_test_loss = torch.mean(torch.stack(test_loss_iter))\n        print ('Epoch {0:} test loss: {1:.3f} '.format(opt.epoch, avg_test_loss))\n        globalacc, pre, recall, F_score, iou = getScores(conf_mat)\n        print ('Epoch {0:} glob acc : {1:.3f}, pre : {2:.3f}, recall : {3:.3f}, F_score : {4:.3f}, IoU : {5:.3f}'.format(opt.epoch, globalacc, pre, recall, F_score, iou))\n"
  },
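  {
    "path": "doc/examples/metrics_sketch.py",
    "content": "# NOTE: hedged sketch, not part of the original release. It works through the\n# scores that test.py derives from the accumulated confusion matrix via\n# util.util.getScores, on a toy 2x2 matrix (rows = ground truth, columns =\n# prediction, class 1 = freespace). The counts are made up for illustration.\nimport numpy as np\n\nconf_mat = np.array([[50., 10.],    # gt non-freespace: 50 correct, 10 false alarms\n                     [ 5., 100.]])  # gt freespace: 5 missed, 100 correct\n\ntp = conf_mat[1, 1]                                    # true positives for freespace\nglobalacc = np.diag(conf_mat).sum() / conf_mat.sum()   # (50+100)/165\npre = tp / conf_mat[:, 1].sum()       # 100/110: column sum = predicted positives\nrecall = tp / conf_mat[1, :].sum()    # 100/105: row sum = actual positives\nf_score = 2 * pre * recall / (pre + recall)\niou = tp / (conf_mat[1, :].sum() + conf_mat[:, 1].sum() - tp)  # 100/115\n\nprint('acc %.3f pre %.3f recall %.3f F %.3f IoU %.3f'\n      % (globalacc, pre, recall, f_score, iou))\n# expected roughly: acc 0.909 pre 0.909 recall 0.952 F 0.930 IoU 0.870\n"
  },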
  {
    "path": "train.py",
    "content": "import time\nfrom options.train_options import TrainOptions\nfrom data import CreateDataLoader\nfrom models import create_model\nfrom util.util import confusion_matrix, getScores, tensor2labelim, tensor2im, print_current_losses\nimport numpy as np\nimport random\nimport torch\nimport cv2\nfrom tensorboardX import SummaryWriter\n\n\nif __name__ == '__main__':\n    train_opt = TrainOptions().parse()\n\n    np.random.seed(train_opt.seed)\n    random.seed(train_opt.seed)\n    torch.manual_seed(train_opt.seed)\n    torch.cuda.manual_seed(train_opt.seed)\n\n    train_data_loader = CreateDataLoader(train_opt)\n    train_dataset = train_data_loader.load_data()\n    train_dataset_size = len(train_data_loader)\n    print('#training images = %d' % train_dataset_size)\n\n    valid_opt = TrainOptions().parse()\n    valid_opt.phase = 'val'\n    valid_opt.batch_size = 1\n    valid_opt.num_threads = 1\n    valid_opt.serial_batches = True\n    valid_opt.isTrain = False\n    valid_data_loader = CreateDataLoader(valid_opt)\n    valid_dataset = valid_data_loader.load_data()\n    valid_dataset_size = len(valid_data_loader)\n    print('#validation images = %d' % valid_dataset_size)\n\n    writer = SummaryWriter()\n\n    model = create_model(train_opt, train_dataset.dataset)\n    model.setup(train_opt)\n    \n    total_steps = 0\n    tfcount = 0\n    F_score_max = 0\n    for epoch in range(train_opt.epoch_count, train_opt.nepoch + 1):\n        ### Training on the training set ###\n        model.train()\n        epoch_start_time = time.time()\n        iter_data_time = time.time()\n        epoch_iter = 0\n        train_loss_iter = []\n        for i, data in enumerate(train_dataset):\n            iter_start_time = time.time()\n            if total_steps % train_opt.print_freq == 0:\n                t_data = iter_start_time - iter_data_time\n            total_steps += train_opt.batch_size\n            epoch_iter += train_opt.batch_size\n            model.set_input(data)\n            model.optimize_parameters()\n\n            if total_steps % train_opt.print_freq == 0:\n                tfcount = tfcount + 1\n                losses = model.get_current_losses()\n                train_loss_iter.append(losses[\"segmentation\"])\n                t = (time.time() - iter_start_time) / train_opt.batch_size\n                print_current_losses(epoch, epoch_iter, losses, t, t_data)\n                # There are several whole_loss values shown in tensorboard in one epoch,\n                # to help better see the optimization phase\n                writer.add_scalar('train/whole_loss', losses[\"segmentation\"], tfcount)\n\n            iter_data_time = time.time()\n\n        mean_loss = np.mean(train_loss_iter)\n        # One average training loss value in tensorboard in one epoch\n        writer.add_scalar('train/mean_loss', mean_loss, epoch)\n\n        palet_file = 'datasets/palette.txt'\n        impalette = list(np.genfromtxt(palet_file,dtype=np.uint8).reshape(3*256))\n        tempDict = model.get_current_visuals()\n        rgb = tensor2im(tempDict['rgb_image'])\n        if train_opt.use_sne:\n            another = tensor2im((tempDict['another_image']+1)/2)    # color normal images\n        else:\n            another = tensor2im(tempDict['another_image'])\n        label = tensor2labelim(tempDict['label'], impalette)\n        output = tensor2labelim(tempDict['output'], impalette)\n        #image_numpy = np.concatenate((rgb, another, label, output), axis=1)\n        image_numpy = np.concatenate((rgb, another), 
axis=1)\n        image_numpy = image_numpy.astype(np.float64) / 255\n        writer.add_image('Epoch' + str(epoch), image_numpy, dataformats='HWC')  # show training images in tensorboard\n\n        print('End of epoch %d / %d \\t Time Taken: %d sec' %   (epoch, train_opt.nepoch, time.time() - epoch_start_time))\n        model.update_learning_rate()\n\n        ### Evaluation on the validation set ###\n        model.eval()\n        valid_loss_iter = []\n        epoch_iter = 0\n        conf_mat = np.zeros((valid_dataset.dataset.num_labels, valid_dataset.dataset.num_labels), dtype=np.float)\n        with torch.no_grad():\n            for i, data in enumerate(valid_dataset):\n                model.set_input(data)\n                model.forward()\n                model.get_loss()\n                epoch_iter += valid_opt.batch_size\n                gt = model.label.cpu().int().numpy()\n                _, pred = torch.max(model.output.data.cpu(), 1)\n                pred = pred.float().detach().int().numpy()\n\n                # Resize images to the original size for evaluation\n                image_size = model.get_image_oriSize()\n                oriSize = (image_size[0].item(), image_size[1].item())\n                gt = np.expand_dims(cv2.resize(np.squeeze(gt, axis=0), oriSize, interpolation=cv2.INTER_NEAREST), axis=0)\n                pred = np.expand_dims(cv2.resize(np.squeeze(pred, axis=0), oriSize, interpolation=cv2.INTER_NEAREST), axis=0)\n\n                conf_mat += confusion_matrix(gt, pred, valid_dataset.dataset.num_labels)\n                losses = model.get_current_losses()\n                valid_loss_iter.append(model.loss_segmentation)\n                print('valid epoch {0:}, iters: {1:}/{2:} '.format(epoch, epoch_iter, len(valid_dataset) * valid_opt.batch_size), end='\\r')\n\n        avg_valid_loss = torch.mean(torch.stack(valid_loss_iter))\n        globalacc, pre, recall, F_score, iou = getScores(conf_mat)\n\n        # Record performance on the validation set\n        writer.add_scalar('valid/loss', avg_valid_loss, epoch)\n        writer.add_scalar('valid/global_acc', globalacc, epoch)\n        writer.add_scalar('valid/pre', pre, epoch)\n        writer.add_scalar('valid/recall', recall, epoch)\n        writer.add_scalar('valid/F_score', F_score, epoch)\n        writer.add_scalar('valid/iou', iou, epoch)\n\n        # Save the best model according to the F-score, and record corresponding epoch number in tensorboard\n        if F_score > F_score_max:\n            print('saving the model at the end of epoch %d, iters %d' % (epoch, total_steps))\n            model.save_networks('best')\n            F_score_max = F_score\n            writer.add_text('best model', str(epoch))\n"
  },
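  {
    "path": "doc/examples/label_resize_sketch.py",
    "content": "# NOTE: hedged sketch, not part of the original release. train.py and test.py\n# resize predictions and ground truth back to the original resolution with\n# cv2.INTER_NEAREST before scoring; this toy example shows why nearest-neighbour\n# is the right choice for label maps, while bilinear invents values that belong\n# to no class.\nimport cv2\nimport numpy as np\n\nlabel = np.array([[0, 1], [1, 1]], dtype=np.float32)  # toy binary label map\nup_nn = cv2.resize(label, (4, 4), interpolation=cv2.INTER_NEAREST)\nup_bl = cv2.resize(label, (4, 4), interpolation=cv2.INTER_LINEAR)\n\nprint(np.unique(up_nn))  # [0. 1.] -> still a valid label map\nprint(np.unique(up_bl))  # fractional values appear -> no longer valid labels\n"
  },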
  {
    "path": "util/__init__.py",
    "content": ""
  },
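  {
    "path": "doc/examples/prob_map_sketch.py",
    "content": "# NOTE: hedged sketch, not part of the original release. tensor2confidencemap\n# in util/util.py (below) turns the two-channel network output into the\n# probability map written by test.py when --prob_map is set: assuming the two\n# channels are raw logits, exp(c1) / (exp(c0) + exp(c1)) is exactly the softmax\n# probability of the freespace class, evaluated per pixel.\nimport numpy as np\nimport torch\n\nlogits = torch.randn(1, 2, 4, 4)                 # toy 2-class output, NCHW\nprob_torch = torch.softmax(logits, dim=1)[0, 1]  # p(freespace) per pixel\n\nc = logits[0].numpy()\nprob_manual = np.exp(c[1]) / (np.exp(c[0]) + np.exp(c[1]))\n\nprint(np.allclose(prob_torch.numpy(), prob_manual, atol=1e-6))  # True\n"
  },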
  {
    "path": "util/util.py",
    "content": "from __future__ import print_function\nimport torch\nimport numpy as np\nfrom PIL import Image\nimport os\nimport cv2\n\n\ndef save_images(save_dir, visuals, image_name, image_size, prob_map):\n    \"\"\"save images to disk\"\"\"\n    image_name = image_name[0]\n    oriSize = (image_size[0].item(), image_size[1].item())\n    palet_file = 'datasets/palette.txt'\n    impalette = list(np.genfromtxt(palet_file, dtype=np.uint8).reshape(3*256))\n\n    for label, im_data in visuals.items():\n        if label == 'output':\n            if prob_map:\n                im = tensor2confidencemap(im_data)\n                im = cv2.resize(im, oriSize)\n                cv2.imwrite(os.path.join(save_dir, image_name), im)\n            else:\n                im = tensor2labelim(im_data, impalette)\n                im = cv2.resize(im, oriSize)\n                cv2.imwrite(os.path.join(save_dir, image_name), cv2.cvtColor(im, cv2.COLOR_RGB2BGR))\n\ndef tensor2im(input_image, imtype=np.uint8):\n    \"\"\"Converts a image Tensor into an image array (numpy)\"\"\"\n    if isinstance(input_image, torch.Tensor):\n        image_tensor = input_image.data\n    else:\n        return input_image\n    image_numpy = image_tensor[0].cpu().float().numpy()\n    if image_numpy.shape[0] == 1:\n        image_numpy = np.tile(image_numpy, (3, 1, 1))\n    image_numpy = (np.transpose(image_numpy, (1, 2, 0)))* 255.0\n    return image_numpy.astype(imtype)\n\ndef tensor2labelim(label_tensor, impalette, imtype=np.uint8):\n    \"\"\"Converts a label Tensor into an image array (numpy),\n    we use a palette to color the label images\"\"\"\n    if len(label_tensor.shape) == 4:\n        _, label_tensor = torch.max(label_tensor.data.cpu(), 1)\n\n    label_numpy = label_tensor[0].cpu().float().detach().numpy()\n    label_image = Image.fromarray(label_numpy.astype(np.uint8))\n    label_image = label_image.convert(\"P\")\n    label_image.putpalette(impalette)\n    label_image = label_image.convert(\"RGB\")\n    return np.array(label_image).astype(imtype)\n\ndef tensor2confidencemap(label_tensor, imtype=np.uint8):\n    \"\"\"Converts a prediction Tensor into an image array (numpy)\"\"\"\n    softmax_numpy = label_tensor[0].cpu().float().detach().numpy()\n    softmax_numpy = np.exp(softmax_numpy)\n    label_image = np.true_divide(softmax_numpy[1], softmax_numpy[0] + softmax_numpy[1])\n    label_image = np.floor(255 * (label_image - label_image.min()) / (label_image.max() - label_image.min()))\n    return np.array(label_image).astype(imtype)\n\n\ndef print_current_losses(epoch, i, losses, t, t_data):\n    message = '(epoch: %d, iters: %d, time: %.3f, data: %.3f) ' % (epoch, i, t, t_data)\n    for k, v in losses.items():\n        message += '%s: %.3f ' % (k, v)\n    print(message)\n\n\ndef mkdirs(paths):\n    if isinstance(paths, list) and not isinstance(paths, str):\n        for path in paths:\n            mkdir(path)\n    else:\n        mkdir(paths)\n\ndef mkdir(path):\n    if not os.path.exists(path):\n        os.makedirs(path)\n\n\ndef confusion_matrix(x, y, n, ignore_label=None, mask=None):\n    if mask is None:\n        mask = np.ones_like(x) == 1\n    k = (x >= 0) & (y < n) & (x != ignore_label) & (mask.astype(np.bool))\n    return np.bincount(n * x[k].astype(int) + y[k], minlength=n**2).reshape(n, n)\n\ndef getScores(conf_matrix):\n    if conf_matrix.sum() == 0:\n        return 0, 0, 0, 0, 0\n    with np.errstate(divide='ignore',invalid='ignore'):\n        globalacc = np.diag(conf_matrix).sum() / np.float(conf_matrix.sum())\n  
      classpre = np.diag(conf_matrix) / conf_matrix.sum(0).astype(np.float)\n        classrecall = np.diag(conf_matrix) / conf_matrix.sum(1).astype(np.float)\n        IU = np.diag(conf_matrix) / (conf_matrix.sum(1) + conf_matrix.sum(0) - np.diag(conf_matrix)).astype(np.float)\n        pre = classpre[1]\n        recall = classrecall[1]\n        iou = IU[1]\n        F_score = 2*(recall*pre)/(recall+pre)\n    return globalacc, pre, recall, F_score, iou\n"
  }
]