[
  {
    "path": ".gitignore",
    "content": "## General\r\n\r\n# Compiled Object files\r\n*.slo\r\n*.lo\r\n*.o\r\n*.cuo\r\n\r\n# Compiled Dynamic libraries\r\n*.so\r\n*.dylib\r\n\r\n# Compiled Static libraries\r\n*.lai\r\n*.la\r\n*.a\r\n\r\n# Compiled protocol buffers\r\n*.pb.h\r\n*.pb.cc\r\n*_pb2.py\r\n\r\n# Compiled python\r\n*.pyc\r\n\r\n# Compiled MATLAB\r\n*.mex*\r\n\r\n# IPython notebook checkpoints\r\n.ipynb_checkpoints\r\n\r\n# Editor temporaries\r\n*.swp\r\n*~\r\n\r\n# Sublime Text settings\r\n*.sublime-workspace\r\n*.sublime-project\r\n\r\n# Eclipse Project settings\r\n*.*project\r\n.settings\r\n\r\n# QtCreator files\r\n*.user\r\n\r\n# PyCharm files\r\n.idea\r\n\r\n# Visual Studio Code files\r\n.vscode\r\n\r\n# OSX dir files\r\n.DS_Store\r\n\r\n# personal\r\n*.log\r\n*.pth\r\n*.caffemodel\r\nexp/\r\nsummary/\r\n__pycache__/\r\n# data/\r\nback/\r\n*.png\r\n*.jpg\r\n*.log\r\n*.pth\r\nevents*\r\nconfig/\r\ninitmodel/\r\n*.ninja_deps\r\n*.ninja_log\r\n*.ninja\r\n*.yaml\r\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2019 Hengshuang Zhao\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE.\n"
  },
  {
    "path": "README.md",
    "content": "# PointWeb: Enhancing Local Neighborhood Features for Point Cloud Processing\n\nby Hengshuang Zhao\\*, Li Jiang*, Chi-Wing Fu, and Jiaya Jia, details are in [paper](http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_PointWeb_Enhancing_Local_Neighborhood_Features_for_Point_Cloud_Processing_CVPR_2019_paper.pdf).\n\n### Introduction\n\nThis repository is build for PointWeb in point cloud scene understanding.\n\n<img src=\"./figure/pointweb.jpg\" width=\"900\"/>\n\n### Usage\n\n1. Requirement:\n\n   - Hardware: 4 GPUs (better with >=11G GPU memory)\n   - Software: PyTorch>=1.0.0, Python3, CUDA>=9.0, [tensorboardX](https://github.com/lanpa/tensorboardX)\n\n2. Clone the repository and build the ops:\n\n   ```shell\n   git clone https://github.com/hszhao/PointWeb.git\n   cd PointWeb\n   cd lib/pointops && python setup.py install && cd ../../\n   ```\n\n3. Train:\n\n   - Download related [datasets](https://drive.google.com/open?id=1Jpi2IP58zHs6Ppv05kqvwJhBnl-Kge2q) and symlink the paths to them as follows (you can alternatively modify the relevant paths specified in folder `config`):\n\n     ```\n     mkdir -p dataset\n     ln -s /path_to_s3dis_dataset dataset/s3dis\n     ```\n\n   - Specify the gpu used in config and then do training:\n\n     ```shell\n     sh tool/train.sh s3dis pointweb\n     ```\n\n4. Test:\n\n   - Download trained segmentation models and put them under folder specified in config or modify the specified paths.\n\n   - For full testing (get listed performance):\n\n     ```shell\n     sh tool/test.sh s3dis pointweb\n     ```\n\n5. Visualization: [tensorboardX](https://github.com/lanpa/tensorboardX) incorporated for better visualization.\n\n   ```shell\n   tensorboard --logdir=run1:$EXP1,run2:$EXP2 --port=6789\n   ```\n\n6. Other:\n\n   - Resources: GoogleDrive [LINK](https://drive.google.com/open?id=1IFoKe5TM3ZO38LT4VXCaHKvCNkXfgtBf) contains shared models, predictions and part of the related datasets.\n   - Video predictions: Youtube [LINK](https://youtu.be/CaobqpsUP_4).\n\n### Performance\n\nDescription: **mIoU/mAcc/aAcc/voxAcc** stands for mean IoU, mean accuracy of each class, all pixel accuracy , and voxel label accuracy respectively. \n\nmIoU/mAcc/aAcc of PointWeb on S3DIS dataset: 0.6055/0.6682/0.8658.\n\nmIoU/mAcc/aAcc/voxAcc of PointWeb on ScanNet dataset: 0.5063/0.6061/0.8529/0.8568.\n\n### Citation\n\nIf you find the code or trained models useful, please consider citing:\n\n```\n@inproceedings{zhao2019pointweb,\n  title={{PointWeb}: Enhancing Local Neighborhood Features for Point Cloud Processing},\n  author={Zhao, Hengshuang and Jiang, Li and Fu, Chi-Wing and Jia, Jiaya},\n  booktitle={CVPR},\n  year={2019}\n}\n```\n"
  },
  {
    "path": "data/s3dis/s3dis_names.txt",
    "content": "ceiling\nfloor\nwall\nbeam\ncolumn\nwindow\ndoor\nchair\ntable\nbookcase\nsofa\nboard\nclutter\n"
  },
  {
    "path": "data/scannet/scannet_names.txt",
    "content": "bathtub\nbed\nbookshelf\ncabinet\nchair\ncounter\ncurtain\ndesk\ndoor\nfloor\notherfurniture\npicture\nrefrigerator\nshowercurtain\nsink\nsofa\ntable\ntoilet\nwall\nwindow\n"
  },
  {
    "path": "lib/__init__.py",
    "content": ""
  },
  {
    "path": "lib/pointops/__init__.py",
    "content": ""
  },
  {
    "path": "lib/pointops/functions/__init__.py",
    "content": ""
  },
  {
    "path": "lib/pointops/functions/pointops.py",
    "content": "from typing import Tuple\n\nimport torch\nfrom torch.autograd import Function\nimport torch.nn as nn\n\nimport pointops_cuda\n\n\nclass FurthestSampling(Function):\n    @staticmethod\n    def forward(ctx, xyz, m):\n        \"\"\"\n        input: xyz: (b, n, 3) and n > m, m: int32\n        output: idx: (b, m)\n        \"\"\"\n        assert xyz.is_contiguous()\n        b, n, _ = xyz.size()\n        idx = torch.cuda.IntTensor(b, m)\n        temp = torch.cuda.FloatTensor(b, n).fill_(1e10)\n        pointops_cuda.furthestsampling_cuda(b, n, m, xyz, temp, idx)\n        return idx\n\n    @staticmethod\n    def backward(xyz, a=None):\n        return None, None\n\nfurthestsampling = FurthestSampling.apply\n\n\nclass Gathering(Function):\n    @staticmethod\n    def forward(ctx, features, idx):\n        \"\"\"\n        input: features: (b, c, n), idx : (b, m) tensor\n        output: (b, c, m)\n        \"\"\"\n        assert features.is_contiguous()\n        assert idx.is_contiguous()\n        b, c, n = features.size()\n        m = idx.size(1)\n        output = torch.cuda.FloatTensor(b, c, m)\n        pointops_cuda.gathering_forward_cuda(b, c, n, m, features, idx, output)\n        ctx.for_backwards = (idx, c, n)\n        return output\n\n    @staticmethod\n    def backward(ctx, grad_out):\n        idx, c, n = ctx.for_backwards\n        b, m = idx.size()\n        grad_features = torch.cuda.FloatTensor(b, c, n).zero_()\n        grad_out_data = grad_out.data.contiguous()\n        pointops_cuda.gathering_backward_cuda(b, c, n, m, grad_out_data, idx, grad_features.data)\n        return grad_features, None\n\ngathering = Gathering.apply\n\n\nclass NearestNeighbor(Function):\n    @staticmethod\n    def forward(ctx, unknown: torch.Tensor, known: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:\n        \"\"\"\n        Find the three nearest neighbors of unknown in known\n        input: unknown: (b, n, 3), known: (b, m, 3)\n        output: dist2: (b, n, 3) l2 distance to the three nearest neighbors\n                idx: (b, n, 3) index of 3 nearest neighbors\n        \"\"\"\n        assert unknown.is_contiguous()\n        assert known.is_contiguous()\n        b, n, _ = unknown.size()\n        m = known.size(1)\n        dist2 = torch.cuda.FloatTensor(b, n, 3)\n        idx = torch.cuda.IntTensor(b, n, 3)\n        pointops_cuda.nearestneighbor_cuda(b, n, m, unknown, known, dist2, idx)\n        return torch.sqrt(dist2), idx\n\n    @staticmethod\n    def backward(ctx, a=None, b=None):\n        return None, None\n\nnearestneighbor = NearestNeighbor.apply\n\n\nclass Interpolation(Function):\n    @staticmethod\n    def forward(ctx, features: torch.Tensor, idx: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:\n        \"\"\"\n        Performs weight linear interpolation on 3 features\n        input: features: (b, c, m) features descriptors to be interpolated from\n               idx: (b, n, 3) three nearest neighbors of the target features in features\n               weight: (b, n, 3) weights\n        output: (b, c, n) tensor of the interpolated features\n        \"\"\"\n        assert features.is_contiguous()\n        assert idx.is_contiguous()\n        assert weight.is_contiguous()\n        b, c, m = features.size()\n        n = idx.size(1)\n        ctx.interpolation_for_backward = (idx, weight, m)\n        output = torch.cuda.FloatTensor(b, c, n)\n        pointops_cuda.interpolation_forward_cuda(b, c, m, n, features, idx, weight, output)\n        return output\n\n    @staticmethod\n    def 
backward(ctx, grad_out: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:\n        \"\"\"\n        input: grad_out: (b, c, n)\n        output: grad_features: (b, c, m), None, None\n        \"\"\"\n        idx, weight, m = ctx.interpolation_for_backward\n        b, c, n = grad_out.size()\n        grad_features = torch.cuda.FloatTensor(b, c, m).zero_()\n        grad_out_data = grad_out.data.contiguous()\n        pointops_cuda.interpolation_backward_cuda(b, c, n, m, grad_out_data, idx, weight, grad_features.data)\n        return grad_features, None, None\n\ninterpolation = Interpolation.apply\n\n\nclass Grouping(Function):\n    @staticmethod\n    def forward(ctx, features: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:\n        \"\"\"\n        input: features: (b, c, n), idx : (b, m, nsample) containing the indicies of features to group with\n        output: (b, c, m, nsample)\n        \"\"\"\n        assert features.is_contiguous()\n        assert idx.is_contiguous()\n        b, c, n = features.size()\n        _, m, nsample = idx.size()\n        output = torch.cuda.FloatTensor(b, c, m, nsample)\n        pointops_cuda.grouping_forward_cuda(b, c, n, m, nsample, features, idx, output)\n        ctx.for_backwards = (idx, n)\n        return output\n\n    @staticmethod\n    def backward(ctx, grad_out: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:\n        \"\"\"\n        input: grad_out: (b, c, m, nsample)\n        output: (b, c, n), None\n        \"\"\"\n        idx, n = ctx.for_backwards\n        b, c, m, nsample = grad_out.size()\n        grad_features = torch.cuda.FloatTensor(b, c, n).zero_()\n        grad_out_data = grad_out.data.contiguous()\n        pointops_cuda.grouping_backward_cuda(b, c, n, m, nsample, grad_out_data, idx, grad_features.data)\n        return grad_features, None\n\ngrouping = Grouping.apply\n\n\nclass GroupingInt(Function):\n    @staticmethod\n    def forward(ctx, features: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:\n        \"\"\"\n        input: features: (b, c, n), idx : (b, m, nsample) containing the indicies of features to group with\n        output: (b, c, m, nsample)\n        \"\"\"\n        assert features.is_contiguous()\n        assert idx.is_contiguous()\n        b, c, n = features.size()\n        _, m, nsample = idx.size()\n        output = torch.cuda.LongTensor(b, c, m, nsample)\n        pointops_cuda.grouping_int_forward_cuda(b, c, n, m, nsample, features, idx, output)\n        return output\n\n    @staticmethod\n    def backward(ctx, a=None):\n        return None, None\n\ngrouping_int = GroupingInt.apply\n\n\nclass BallQuery(Function):\n    @staticmethod\n    def forward(ctx, radius: float, nsample: int, xyz: torch.Tensor, new_xyz: torch.Tensor) -> torch.Tensor:\n        \"\"\"\n        input: radius: float, radius of the balls\n               nsample: int, maximum number of features in the balls\n               xyz: torch.Tensor, (b, n, 3) xyz coordinates of the features\n               new_xyz: torch.Tensor, (b, m, 3) centers of the ball query\n        output: (b, m, nsample) tensor with the indicies of the features that form the query balls\n        \"\"\"\n        assert xyz.is_contiguous()\n        assert new_xyz.is_contiguous()\n        b, n, _ = xyz.size()\n        m = new_xyz.size(1)\n        idx = torch.cuda.IntTensor(b, m, nsample).zero_()\n        pointops_cuda.ballquery_cuda(b, n, m, radius, nsample, new_xyz, xyz, idx)\n        return idx\n\n    @staticmethod\n    def backward(ctx, a=None):\n        return None, 
None, None, None\n\nballquery = BallQuery.apply\n\n\nclass FeatureDistribute(Function):\n    @staticmethod\n    def forward(ctx, max_xyz: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:\n        \"\"\"\n        :param ctx:\n        :param max_xyz: (b, n, 3)\n        :param xyz: (b, m, 3)\n        :return: distribute_idx: (b, m)\n        \"\"\"\n        assert max_xyz.is_contiguous()\n        assert xyz.is_contiguous()\n        b, n, _ = max_xyz.size()\n        m = xyz.size(1)\n        distribute_idx = torch.cuda.IntTensor(b, m).zero_()\n        pointops_cuda.featuredistribute_cuda(b, n, m, max_xyz, xyz, distribute_idx)\n        return distribute_idx\n\n    @staticmethod\n    def backward(ctx, a=None):\n        return None, None\n\nfeaturedistribute = FeatureDistribute.apply\n\n\nclass FeatureGather(Function):\n    @staticmethod\n    def forward(ctx, max_feature: torch.Tensor, distribute_idx: torch.Tensor) -> torch.Tensor:\n        '''\n        :param ctx:\n        :param max_feature: (b, c, n)\n        :param distribute_idx: (b, m)\n        :return: distribute_feature: (b, c, m)\n        '''\n        assert max_feature.is_contiguous()\n        assert distribute_idx.is_contiguous()\n        b, c, n = max_feature.size()\n        m = distribute_idx.size(1)\n        distribute_feature = torch.cuda.FloatTensor(b, c, m).zero_()\n        pointops_cuda.featuregather_forward_cuda(b, n, m, c, max_feature, distribute_idx, distribute_feature)\n        ctx.for_backwards = (distribute_idx, n)\n        return distribute_feature\n\n    @staticmethod\n    def backward(ctx, grad_distribute_feature: torch.Tensor):\n        '''\n        :param ctx:\n        :param grad_distribute_feature: (b, c, m)\n        :return: grad_max_feature: (b, c, n),    None\n        '''\n        distribute_idx, n = ctx.for_backwards\n        b, c, m = grad_distribute_feature.size()\n        grad_max_feature = torch.cuda.FloatTensor(b, c, n).zero_()\n        grad_distribute_feature_data = grad_distribute_feature.data.contiguous()\n        pointops_cuda.featuregather_backward_cuda(b, n, m, c, grad_distribute_feature_data, distribute_idx, grad_max_feature.data)\n        return grad_max_feature, None\n\nfeaturegather = FeatureGather.apply\n\n\nclass LabelStatBallRange(Function):\n    @staticmethod\n    def forward(ctx, radius: float, xyz: torch.Tensor, new_xyz: torch.Tensor, label_stat: torch.Tensor) -> torch.Tensor:\n        '''\n        :param ctx:\n        :param radius:\n        :param xyz: (b, n, 3)\n        :param new_xyz: (b, m, 3)\n        :param label_stat: (b, n, nclass)\n        :return: new_label_stat: (b, m, nclass)\n        '''\n        assert xyz.is_contiguous()\n        assert new_xyz.is_contiguous()\n        assert label_stat.is_contiguous()\n\n        b, n, nclass = label_stat.size()\n        m = new_xyz.size(1)\n        new_label_stat = torch.cuda.IntTensor(b, m, nclass).zero_()\n        pointops_cuda.labelstat_ballrange_cuda(b, n, m, radius, nclass, new_xyz, xyz, label_stat, new_label_stat)\n\n        return new_label_stat\n\n    @staticmethod\n    def backward(ctx, a=None):\n        return None, None, None, None\n\nlabelstat_ballrange = LabelStatBallRange.apply\n\n\nclass LabelStatIdx(Function):\n    @staticmethod\n    def forward(ctx, nsample: int, label_stat: torch.Tensor, idx: torch.Tensor) -> torch.Tensor:\n        '''\n        :param ctx:\n        :param nsample:\n        :param label_stat: (b, n, nclass)\n        :param idx: (b, m, nsample)\n        :return: new_label_stat: (b, m, nclass)\n        '''\n    
    assert label_stat.is_contiguous()\n        assert idx.is_contiguous()\n\n        b, n, nclass = label_stat.size()\n        m = idx.size(1)\n        new_label_stat = torch.cuda.IntTensor(b, m, nclass).zero_()\n        pointops_cuda.labelstat_idx_cuda(b, n, m, nsample, nclass, label_stat, idx, new_label_stat)\n\n        return new_label_stat\n\n    @staticmethod\n    def backward(ctx, a=None):\n        return None, None, None\n\nlabelstat_idx = LabelStatIdx.apply\n\n\nclass LabelStatAndBallQuery(Function):\n    @staticmethod\n    def forward(ctx, radius: float, nsample: int, xyz: torch.Tensor, new_xyz: torch.Tensor, label_stat: torch.Tensor):\n        '''\n        :param ctx:\n        :param radius:\n        :param nsample:\n        :param xyz: (b, n, 3)\n        :param new_xyz: (b, m, 3)\n        :param label_stat: (b, n, nclass)\n        :return: new_label_stat: (b, m, nclass)  idx: (b, m, nsample)\n        '''\n        assert xyz.is_contiguous()\n        assert new_xyz.is_contiguous()\n        assert label_stat.is_contiguous()\n\n        b, n, nclass = label_stat.size()\n        m = new_xyz.size(1)\n        new_label_stat = torch.cuda.IntTensor(b, m, nclass).zero_()\n        idx = torch.cuda.IntTensor(b, m, nsample).zero_()\n\n        pointops_cuda.labelstat_and_ballquery_cuda(b, n, m, radius, nsample, nclass, new_xyz, xyz, label_stat, idx, new_label_stat)\n\n        return new_label_stat, idx\n\n    @staticmethod\n    def backward(ctx, a=None, b=None):\n        return None, None, None, None, None\n\nlabelstat_and_ballquery = LabelStatAndBallQuery.apply\n\n\ndef pairwise_distances(x, y=None):\n    '''\n    Input: x is a Nxd matrix\n           y is an optional Mxd matirx\n    Output: dist is a NxM matrix where dist[i,j] is the square norm between x[i,:] and y[j,:]\n            if y is not given then use 'y=x'.\n    i.e. 
dist[i,j] = ||x[i,:]-y[j,:]||^2\n    '''\n    x_norm = (x ** 2).sum(1).view(-1, 1)\n    if y is not None:\n        y_t = torch.transpose(y, 0, 1)\n        y_norm = (y ** 2).sum(1).view(1, -1)\n    else:\n        y_t = torch.transpose(x, 0, 1)\n        y_norm = x_norm.view(1, -1)\n    dist = x_norm + y_norm - 2.0 * torch.mm(x, y_t)\n    import numpy as np\n    return torch.clamp(dist, 0.0, np.inf)\n\n\nclass KNNQueryNaive(Function):\n    @staticmethod\n    def forward(ctx, nsample: int, xyz: torch.Tensor, new_xyz: torch.Tensor = None) -> Tuple[torch.Tensor]:\n        \"\"\"\n        KNN Indexing\n        input: nsample: int32, Number of neighbor\n               xyz: (b, n, 3) coordinates of the features\n               new_xyz: (b, m, 3) centriods\n            output: idx: (b, m, nsample)\n        \"\"\"\n        if new_xyz is None:\n            new_xyz = xyz\n        b, m, _ = new_xyz.size()\n        n = xyz.size(1)\n\n        '''\n        idx = torch.zeros(b, m, nsample).int().cuda()\n        for i in range(b):\n            dist = pairwise_distances(new_xyz[i, :, :], xyz[i, :, :])\n            [_, idxs] = torch.sort(dist, dim=1)\n            idx[i, :, :] = idxs[:, 0:nsample]\n        '''\n\n        # '''\n        # new_xyz_repeat = new_xyz.repeat(1, 1, n).view(b, m * n, 3)\n        # xyz_repeat = xyz.repeat(1, m, 1).view(b, m * n, 3)\n        # dist = (new_xyz_repeat - xyz_repeat).pow(2).sum(dim=2).view(b, m, n)\n        dist = (new_xyz.repeat(1, 1, n).view(b, m * n, 3) - xyz.repeat(1, m, 1).view(b, m * n, 3)).pow(2).sum(dim=2).view(b, m, n)\n        [_, idxs] = torch.sort(dist, dim=2)\n        idx = idxs[:, :, 0:nsample].int()\n        # '''\n        return idx\n\n    @staticmethod\n    def backward(ctx):\n        return None, None, None\n\nknnquery_naive = KNNQueryNaive.apply\n\n\nclass KNNQuery(Function):\n    @staticmethod\n    def forward(ctx, nsample: int, xyz: torch.Tensor, new_xyz: torch.Tensor = None) -> Tuple[torch.Tensor]:\n        \"\"\"\n        KNN Indexing\n        input: nsample: int32, Number of neighbor\n               xyz: (b, n, 3) coordinates of the features\n               new_xyz: (b, m, 3) centriods\n            output: idx: (b, m, nsample)\n                   ( dist2: (b, m, nsample) )\n        \"\"\"\n        if new_xyz is None:\n            new_xyz = xyz\n        assert xyz.is_contiguous()\n        assert new_xyz.is_contiguous()\n        b, m, _ = new_xyz.size()\n        n = xyz.size(1)\n        idx = torch.cuda.IntTensor(b, m, nsample).zero_()\n        dist2 = torch.cuda.FloatTensor(b, m, nsample).zero_()\n        pointops_cuda.knnquery_cuda(b, n, m, nsample, xyz, new_xyz, idx, dist2)\n        return idx\n\n    @staticmethod\n    def backward(ctx, a=None):\n        return None, None, None\n\nknnquery = KNNQuery.apply\n\n\nclass KNNQueryExclude(Function):\n    @staticmethod\n    def forward(ctx, nsample: int, xyz: torch.Tensor, new_xyz: torch.Tensor = None) -> Tuple[torch.Tensor]:\n        \"\"\"\n        KNN Indexing\n        input: nsample: int32, Number of neighbor\n               xyz: (b, n, 3) coordinates of the features\n               new_xyz: (b, m, 3) centriods\n            output: new_features: (b, m, nsample)\n        \"\"\"\n        if new_xyz is None:\n            new_xyz = xyz\n        b, m, _ = new_xyz.size()\n        n = xyz.size(1)\n\n        '''\n        idx = torch.zeros(b, m, nsample).int().cuda()\n        for i in range(b):\n            dist = pairwise_distances(new_xyz[i, :, :], xyz[i, :, :])\n            [_, idxs] = torch.sort(dist, 
dim=1)\n            idx[i, :, :] = idxs[:, 0:nsample]\n        '''\n\n        # '''\n        # new_xyz_repeat = new_xyz.repeat(1, 1, n).view(b, m * n, 3)\n        # xyz_repeat = xyz.repeat(1, m, 1).view(b, m * n, 3)\n        # dist = (new_xyz_repeat - xyz_repeat).pow(2).sum(dim=2).view(b, m, n)\n        dist = (new_xyz.repeat(1, 1, n).view(b, m * n, 3) - xyz.repeat(1, m, 1).view(b, m * n, 3)).pow(2).sum(dim=2).view(b, m, n)\n        [_, idxs] = torch.sort(dist, dim=2)\n        idx = idxs[:, :, 1:nsample+1].int()\n        # '''\n        return idx\n\n    @staticmethod\n    def backward(ctx):\n        return None, None, None\n\nknnquery_exclude = KNNQueryExclude.apply\n\n\nclass QueryAndGroup(nn.Module):\n    \"\"\"\n    Groups with a ball query of radius\n    parameters:\n        radius: float32, Radius of ball\n        nsample: int32, Maximum number of features to gather in the ball\n    \"\"\"\n    def __init__(self, radius=None, nsample=32, use_xyz=True):\n        super(QueryAndGroup, self).__init__()\n        self.radius, self.nsample, self.use_xyz = radius, nsample, use_xyz\n\n    def forward(self, xyz: torch.Tensor, new_xyz: torch.Tensor = None, features: torch.Tensor = None, idx: torch.Tensor = None) -> torch.Tensor:\n        \"\"\"\n        input: xyz: (b, n, 3) coordinates of the features\n               new_xyz: (b, m, 3) centriods\n               features: (b, c, n)\n               idx: idx of neighbors\n               # idxs: (b, n)\n        output: new_features: (b, c+3, m, nsample)\n              #  grouped_idxs: (b, m, nsample)\n        \"\"\"\n        if new_xyz is None:\n            new_xyz = xyz\n        if idx is None:\n            if self.radius is not None:\n                idx = ballquery(self.radius, self.nsample, xyz, new_xyz)\n            else:\n                # idx = knnquery_naive(self.nsample, xyz, new_xyz)   # (b, m, nsample)\n                idx = knnquery(self.nsample, xyz, new_xyz)  # (b, m, nsample)\n        xyz_trans = xyz.transpose(1, 2).contiguous()\n        grouped_xyz = grouping(xyz_trans, idx)  # (b, 3, m, nsample)\n        # grouped_idxs = grouping(idxs.unsqueeze(1).float(), idx).squeeze(1).int()  # (b, m, nsample)\n\n        grouped_xyz -= new_xyz.transpose(1, 2).unsqueeze(-1)\n        if features is not None:\n            grouped_features = grouping(features, idx)\n            if self.use_xyz:\n                new_features = torch.cat([grouped_xyz, grouped_features], dim=1)  # (b, c+3, m, nsample)\n            else:\n                new_features = grouped_features\n        else:\n            assert self.use_xyz, \"Cannot have not features and not use xyz as a feature!\"\n            new_features = grouped_xyz\n        return new_features\n\n\nclass GroupAll(nn.Module):\n    \"\"\"\n    Groups all features\n    \"\"\"\n    def __init__(self, use_xyz: bool = True):\n        super(GroupAll, self).__init__()\n        self.use_xyz = use_xyz\n\n    def forward(self, xyz: torch.Tensor, new_xyz: torch.Tensor, features: torch.Tensor = None) -> Tuple[torch.Tensor]:\n        \"\"\"\n        input: xyz: (b, n, 3) coordinates of the features\n               new_xyz: ignored torch\n               features: (b, c, n) descriptors of the features\n        output: new_features: (b, c+3, 1, N) tensor\n        \"\"\"\n        grouped_xyz = xyz.transpose(1, 2).unsqueeze(2)\n        if features is not None:\n            grouped_features = features.unsqueeze(2)\n            if self.use_xyz:\n                new_features = torch.cat([grouped_xyz, grouped_features], dim=1) 
 # (b, c+3, 1, n)\n            else:\n                new_features = grouped_features\n        else:\n            new_features = grouped_xyz\n        return new_features\n\n\n"
  },
  {
    "path": "lib/pointops/setup.py",
    "content": "#python3 setup.py install\nfrom setuptools import setup\nfrom torch.utils.cpp_extension import BuildExtension, CUDAExtension\n\nsetup(\n    name='pointops',\n    ext_modules=[\n        CUDAExtension('pointops_cuda', [\n            'src/pointops_api.cpp',\n\n            'src/ballquery/ballquery_cuda.cpp',\n            'src/ballquery/ballquery_cuda_kernel.cu',\n            'src/knnquery/knnquery_cuda.cpp',\n            'src/knnquery/knnquery_cuda_kernel.cu',\n            'src/grouping/grouping_cuda.cpp',\n            'src/grouping/grouping_cuda_kernel.cu',\n            'src/grouping_int/grouping_int_cuda.cpp',\n            'src/grouping_int/grouping_int_cuda_kernel.cu',\n            'src/interpolation/interpolation_cuda.cpp',\n            'src/interpolation/interpolation_cuda_kernel.cu',\n            'src/sampling/sampling_cuda.cpp',\n            'src/sampling/sampling_cuda_kernel.cu',\n\n            'src/labelstat/labelstat_cuda.cpp',\n            'src/labelstat/labelstat_cuda_kernel.cu',\n\n            'src/featuredistribute/featuredistribute_cuda.cpp',\n            'src/featuredistribute/featuredistribute_cuda_kernel.cu'\n        ],\n                      extra_compile_args={'cxx': ['-g'],\n                                          'nvcc': ['-O2']})\n    ],\n    cmdclass={'build_ext': BuildExtension})\n"
  },
  {
    "path": "lib/pointops/src/__init__.py",
    "content": ""
  },
  {
    "path": "lib/pointops/src/ballquery/ballquery_cuda.cpp",
    "content": "#include <torch/serialize/tensor.h>\n#include <vector>\n#include <THC/THC.h>\n#include <ATen/cuda/CUDAContext.h>\n\n#include \"ballquery_cuda_kernel.h\"\n\nextern THCState *state;\n\n#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, \" must be a CUDAtensor \")\n#define CHECK_CONTIGUOUS(x) AT_CHECK(x.is_contiguous(), #x, \" must be contiguous \")\n#define CHECK_INPUT(x) CHECK_CUDA(x);CHECK_CONTIGUOUS(x)\n\nvoid ballquery_cuda(int b, int n, int m, float radius, int nsample, at::Tensor new_xyz_tensor, at::Tensor xyz_tensor, at::Tensor idx_tensor)\n{\n    const float *new_xyz = new_xyz_tensor.data<float>();\n    const float *xyz = xyz_tensor.data<float>();\n    int *idx = idx_tensor.data<int>();\n\n    ballquery_cuda_launcher(b, n, m, radius, nsample, new_xyz, xyz, idx);\n}\n\n\nvoid ballquery_cuda_fast(int b, int n, int m, float radius, int nsample, at::Tensor new_xyz_tensor, at::Tensor xyz_tensor, at::Tensor idx_tensor)\n{\n    CHECK_INPUT(new_xyz_tensor);\n    CHECK_INPUT(xyz_tensor);\n\n    const float *new_xyz = new_xyz_tensor.data<float>();\n    const float *xyz = xyz_tensor.data<float>();\n    int *idx = idx_tensor.data<int>();\n\n    cudaStream_t stream = THCState_getCurrentStream(state);\n\n    ballquery_cuda_launcher_fast(b, n, m, radius, nsample, new_xyz, xyz, idx, stream);\n}\n"
  },
  {
    "path": "lib/pointops/src/ballquery/ballquery_cuda_kernel.cu",
    "content": "#include \"../cuda_utils.h\"\n#include \"ballquery_cuda_kernel.h\"\n\n// input: new_xyz(b, m, 3) xyz(b, n, 3)\n// output: idx(b, m, nsample)\n__global__ void ballquery_cuda_kernel(int b, int n, int m, float radius, int nsample, const float *new_xyz, const float *xyz, int *idx)\n{\n    int batch_index = blockIdx.x;\n    xyz += batch_index * n * 3;\n    new_xyz += batch_index * m * 3;\n    idx += m * nsample * batch_index;\n    int index = threadIdx.x;\n    int stride = blockDim.x;\n\n    float radius2 = radius * radius;\n    for (int j = index; j < m; j += stride)\n    {\n        float new_x = new_xyz[j * 3 + 0];\n        float new_y = new_xyz[j * 3 + 1];\n        float new_z = new_xyz[j * 3 + 2];\n        for (int k = 0, cnt = 0; k < n && cnt < nsample; ++k)\n        {\n            float x = xyz[k * 3 + 0];\n            float y = xyz[k * 3 + 1];\n            float z = xyz[k * 3 + 2];\n            float d2 = (new_x - x) * (new_x - x) + (new_y - y) * (new_y - y) + (new_z - z) * (new_z - z);\n            if (d2 < radius2)\n            {\n                if (cnt == 0)\n                {\n                    for (int l = 0; l < nsample; ++l)\n                        idx[j * nsample + l] = k;\n                }\n                idx[j * nsample + cnt] = k;\n                ++cnt;\n            }\n        }\n    }\n}\n\nvoid ballquery_cuda_launcher(int b, int n, int m, float radius, int nsample, const float *new_xyz, const float *xyz, int *idx)\n{\n    ballquery_cuda_kernel<<<b, opt_n_threads(m), 0>>>(b, n, m, radius, nsample, new_xyz, xyz, idx);\n}\n\n\n__global__ void ballquery_cuda_kernel_fast(int b, int n, int m, float radius, int nsample, const float *__restrict__ new_xyz, const float *__restrict__ xyz, int *__restrict__ idx) {\n    int bs_idx = blockIdx.y;\n    int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;\n    if (bs_idx >= b || pt_idx >= m) return;\n\n    new_xyz += bs_idx * m * 3 + pt_idx * 3;\n    xyz += bs_idx * n * 3;\n    idx += bs_idx * m * nsample + pt_idx * nsample;\n\n    float radius2 = radius * radius;\n    float new_x = new_xyz[0];\n    float new_y = new_xyz[1];\n    float new_z = new_xyz[2];\n\n    int cnt = 0;\n    for (int k = 0; k < n; ++k) {\n        float x = xyz[k * 3 + 0];\n        float y = xyz[k * 3 + 1];\n        float z = xyz[k * 3 + 2];\n        float d2 = (new_x - x) * (new_x - x) + (new_y - y) * (new_y - y) + (new_z - z) * (new_z - z);\n        if (d2 < radius2){\n            if (cnt == 0){\n                for (int l = 0; l < nsample; ++l) {\n                    idx[l] = k;\n                }\n            }\n            idx[cnt] = k;\n            ++cnt;\n            if (cnt >= nsample){\n                break;\n            }\n        }\n    }\n}\n\n\nvoid ballquery_cuda_launcher_fast(int b, int n, int m, float radius, int nsample, const float *new_xyz, const float *xyz, int *idx, cudaStream_t stream) {\n    // param new_xyz: (B, m, 3)\n    // param xyz: (B, n, 3)\n    // param idx: (B, m, nsample)\n\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(m, THREADS_PER_BLOCK), b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    ballquery_cuda_kernel_fast<<<blocks, threads, 0, stream>>>(b, n, m, radius, nsample, new_xyz, xyz, idx);\n    // cudaDeviceSynchronize();  // for using printf in kernel function\n\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}\n"
  },
  {
    "path": "lib/pointops/src/ballquery/ballquery_cuda_kernel.h",
    "content": "#ifndef _BALLQUERY_CUDA_KERNEL\n#define _BALLQUERY_CUDA_KERNEL\n#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n\nvoid ballquery_cuda(int b, int n, int m, float radius, int nsample, at::Tensor new_xyz_tensor, at::Tensor xyz_tensor, at::Tensor idx_tensor);\n\nvoid ballquery_cuda_fast(int b, int n, int m, float radius, int nsample, at::Tensor new_xyz_tensor, at::Tensor xyz_tensor, at::Tensor idx_tensor);\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\nvoid ballquery_cuda_launcher(int b, int n, int m, float radius, int nsample, const float *xyz, const float *new_xyz, int *idx);\n\nvoid ballquery_cuda_launcher_fast(int b, int n, int m, float radius, int nsample, const float *new_xyz, const float *xyz, int *idx, cudaStream_t stream);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "lib/pointops/src/cuda_utils.h",
    "content": "#ifndef _CUDA_UTILS_H\n#define _CUDA_UTILS_H\n\n#include <cmath>\n\n#define TOTAL_THREADS 1024\n\n#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, \" must be a CUDAtensor \")\n#define CHECK_CONTIGUOUS(x) AT_CHECK(x.is_contiguous(), #x, \" must be contiguous \")\n#define CHECK_INPUT(x) CHECK_CUDA(x);CHECK_CONTIGUOUS(x)\n\n#define THREADS_PER_BLOCK 256\n#define DIVUP(m,n) ((m) / (n) + ((m) % (n) > 0))\n\ninline int opt_n_threads(int work_size) {\n    const int pow_2 = std::log(static_cast<double>(work_size)) / std::log(2.0);\n    return max(min(1 << pow_2, TOTAL_THREADS), 1);\n}\n\ninline dim3 opt_block_config(int x, int y) {\n    const int x_threads = opt_n_threads(x);\n    const int y_threads = max(min(opt_n_threads(y), TOTAL_THREADS / x_threads), 1);\n    dim3 block_config(x_threads, y_threads, 1);\n    return block_config;\n}\n\n#endif"
  },
  {
    "path": "lib/pointops/src/featuredistribute/featuredistribute_cuda.cpp",
    "content": "#include <torch/serialize/tensor.h>\n#include <vector>\n#include <THC/THC.h>\n#include <ATen/cuda/CUDAContext.h>\n\n#include \"featuredistribute_cuda_kernel.h\"\n\nextern THCState *state;\n\n#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, \" must be a CUDAtensor \")\n#define CHECK_CONTIGUOUS(x) AT_CHECK(x.is_contiguous(), #x, \" must be contiguous \")\n#define CHECK_INPUT(x) CHECK_CUDA(x);CHECK_CONTIGUOUS(x)\n\n\nvoid featuredistribute_cuda(int b, int n, int m, at::Tensor max_xyz_tensor, at::Tensor xyz_tensor, at::Tensor distribute_idx_tensor)\n{\n    CHECK_INPUT(max_xyz_tensor);\n    CHECK_INPUT(xyz_tensor);\n\n    const float *max_xyz = max_xyz_tensor.data<float>();\n    const float *xyz = xyz_tensor.data<float>();\n    int *distribute_idx = distribute_idx_tensor.data<int>();\n\n    cudaStream_t stream = THCState_getCurrentStream(state);\n\n    featuredistribute_cuda_launcher(b, n, m, max_xyz, xyz, distribute_idx, stream);\n}\n\n\nvoid featuregather_forward_cuda(int b, int n, int m, int c, at::Tensor max_feature_tensor, at::Tensor distribute_idx_tensor, at::Tensor distribute_feature_tensor)\n{\n    CHECK_INPUT(max_feature_tensor);\n    CHECK_INPUT(distribute_idx_tensor);\n\n    const float *max_feature = max_feature_tensor.data<float>();\n    const int *distribute_idx = distribute_idx_tensor.data<int>();\n    float *distribute_feature = distribute_feature_tensor.data<float>();\n\n    cudaStream_t stream = THCState_getCurrentStream(state);\n\n    featuregather_forward_cuda_launcher(b, n, m, c, max_feature, distribute_idx, distribute_feature, stream);\n}\n\n\nvoid featuregather_backward_cuda(int b, int n, int m, int c, at::Tensor grad_distribute_feature_tensor, at::Tensor distribute_idx_tensor, at::Tensor grad_max_feature_tensor)\n{\n    CHECK_INPUT(grad_distribute_feature_tensor);\n    CHECK_INPUT(distribute_idx_tensor);\n\n    const float *grad_distribute_feature = grad_distribute_feature_tensor.data<float>();\n    const int *distribute_idx = distribute_idx_tensor.data<int>();\n    float *grad_max_feature = grad_max_feature_tensor.data<float>();\n\n    cudaStream_t stream = THCState_getCurrentStream(state);\n\n    featuregather_backward_cuda_launcher(b, n, m, c, grad_distribute_feature, distribute_idx, grad_max_feature, stream);\n}"
  },
  {
    "path": "lib/pointops/src/featuredistribute/featuredistribute_cuda_kernel.cu",
    "content": "#include \"../cuda_utils.h\"\n#include \"featuredistribute_cuda_kernel.h\"\n\n__global__ void featuredistribute_cuda_kernel(int b, int n, int m, const float *max_xyz, const float *xyz, int *distribute_idx) {\n    int bs_idx = blockIdx.y;\n    int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;\n    if (bs_idx >= b || pt_idx >= m) return;\n\n    max_xyz += bs_idx * n * 3;\n    xyz += bs_idx * m * 3 + pt_idx * 3;\n    distribute_idx += bs_idx * m + pt_idx;\n\n    float x = xyz[0];\n    float y = xyz[1];\n    float z = xyz[2];\n\n    float min_dist2 = 100000;\n    int min_dist_idx = -1;\n    for (int k = 0; k < n; ++k) {\n        float max_x = max_xyz[k * 3 + 0];\n        float max_y = max_xyz[k * 3 + 1];\n        float max_z = max_xyz[k * 3 + 2];\n        float d2 = (max_x - x) * (max_x - x) + (max_y - y) * (max_y - y) + (max_z - z) * (max_z - z);\n        if (d2 < min_dist2){\n            min_dist_idx = k;\n            min_dist2 = d2;\n        }\n    }\n    distribute_idx[0] = min_dist_idx;\n}\n\n\nvoid featuredistribute_cuda_launcher(int b, int n, int m, const float *max_xyz, const float *xyz, int *distribute_idx, cudaStream_t stream) {\n    // param max_xyz: (b, n, 3)\n    // param xyz: (b, m, 3)\n    // return distribute_idx: (b, m)\n\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(m, THREADS_PER_BLOCK), b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    featuredistribute_cuda_kernel<<<blocks, threads, 0, stream>>>(b, n, m, max_xyz, xyz, distribute_idx);\n    // cudaDeviceSynchronize();  // for using printf in kernel function\n\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}\n\n__global__ void featuregather_forward_cuda_kernel(int b, int n, int m, int c, const float *max_feature, const int *distribute_idx, float *distribute_feature) {\n    int bs_idx = blockIdx.z;\n    int c_idx = blockIdx.y;\n    int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;\n    if (bs_idx >= b || c_idx >= c || pt_idx >= m) return;\n\n    max_feature += bs_idx * c * n + c_idx * n;\n    distribute_idx += bs_idx * m + pt_idx;\n    distribute_feature += bs_idx * c * m + c_idx * m + pt_idx;\n\n    int idx = distribute_idx[0];\n    distribute_feature[0] = max_feature[idx];\n}\n\n\nvoid featuregather_forward_cuda_launcher(int b, int n, int m, int c, const float *max_feature, const int *distribute_idx, float *distribute_feature, cudaStream_t stream){\n    // param max_feature: (b, c, n)\n    // param distribute_idx: (b, m)\n    // return distribute_feature: (b, c, m)\n\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(m, THREADS_PER_BLOCK), c, b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    featuregather_forward_cuda_kernel<<<blocks, threads, 0, stream>>>(b, n, m, c, max_feature, distribute_idx, distribute_feature);\n    // cudaDeviceSynchronize();  // for using printf in kernel function\n\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}\n\n\n__global__ void featuregather_backward_cuda_kernel(int b, int n, int m, int c, const float *grad_distribute_feature, const int *distribute_idx, float *grad_max_feature){\n    int bs_idx = blockIdx.z;\n    int c_idx = blockIdx.y;\n    int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;\n    if(bs_idx >= b || c_idx >= c || pt_idx >= m) return;\n\n    
grad_distribute_feature += bs_idx * c * m + c_idx * m + pt_idx;\n    distribute_idx += bs_idx * m + pt_idx;\n    grad_max_feature += bs_idx * c * n + c_idx * n;\n\n    int idx = distribute_idx[0];\n    atomicAdd(grad_max_feature + idx, grad_distribute_feature[0]);\n}\n\n\nvoid featuregather_backward_cuda_launcher(int b, int n, int m, int c, const float *grad_distribute_feature, const int *distribute_idx, float *grad_max_feature, cudaStream_t stream){\n    // param grad_distribute_feature: (b, c, m)\n    // param distribute_idx: (b, m)\n    // return grad_max_feature: (b, c, n)\n\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(m, THREADS_PER_BLOCK), c, b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    featuregather_backward_cuda_kernel<<<blocks, threads, 0, stream>>>(b, n, m, c, grad_distribute_feature, distribute_idx, grad_max_feature);\n    // cudaDeviceSynchronize();  // for using printf in kernel function\n\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}"
  },
  {
    "path": "lib/pointops/src/featuredistribute/featuredistribute_cuda_kernel.h",
    "content": "#ifndef _FEATUREDISTRIBUTE_CUDA_KERNEL\n#define _FEATUREDISTRIBUTE_CUDA_KERNEL\n#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n\nvoid featuredistribute_cuda(int b, int n, int m, at::Tensor max_xyz_tensor, at::Tensor xyz_tensor, at::Tensor distribute_idx_tensor);\nvoid featuregather_forward_cuda(int b, int n, int m, int c, at::Tensor max_feature_tensor, at::Tensor distribute_idx_tensor, at::Tensor distribute_feature_tensor);\nvoid featuregather_backward_cuda(int b, int n, int m, int c, at::Tensor grad_distribute_feature_tensor, at::Tensor distribute_idx_tensor, at::Tensor grad_max_feature_tensor);\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\nvoid featuredistribute_cuda_launcher(int b, int n, int m, const float *max_xyz, const float *xyz, int *distribute_idx, cudaStream_t stream);\nvoid featuregather_forward_cuda_launcher(int b, int n, int m, int c, const float *max_feature, const int *distribute_idx, float *distribute_feature, cudaStream_t stream);\nvoid featuregather_backward_cuda_launcher(int b, int n, int m, int c, const float *grad_distribute_feature, const int *distribute_idx, float *grad_max_feature, cudaStream_t stream);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
  {
    "path": "lib/pointops/src/grouping/grouping_cuda.cpp",
    "content": "#include <torch/serialize/tensor.h>\n#include <ATen/cuda/CUDAContext.h>\n#include <vector>\n#include <THC/THC.h>\n\n#include \"grouping_cuda_kernel.h\"\n\nextern THCState *state;\n\nvoid grouping_forward_cuda(int b, int c, int n, int m, int nsample, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out_tensor)\n{\n    const float *points = points_tensor.data<float>();\n    const int *idx = idx_tensor.data<int>();\n    float *out = out_tensor.data<float>();\n    grouping_forward_cuda_launcher(b, c, n, m, nsample, points, idx, out);\n}\n\nvoid grouping_backward_cuda(int b, int c, int n, int m, int nsample, at::Tensor grad_out_tensor, at::Tensor idx_tensor, at::Tensor grad_points_tensor)\n{\n    float *grad_points = grad_points_tensor.data<float>();\n    const int *idx = idx_tensor.data<int>();\n    const float *grad_out = grad_out_tensor.data<float>();\n    grouping_backward_cuda_launcher(b, c, n, m, nsample, grad_out, idx, grad_points);\n}\n\nvoid grouping_forward_cuda_fast(int b, int c, int n, int npoints, int nsample, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out_tensor) {\n\n    const float *points = points_tensor.data<float>();\n    const int *idx = idx_tensor.data<int>();\n    float *out = out_tensor.data<float>();\n    grouping_forward_cuda_launcher_fast(b, c, n, npoints, nsample, points, idx, out);\n}"
  },
  {
    "path": "lib/pointops/src/grouping/grouping_cuda_kernel.cu",
    "content": "#include \"../cuda_utils.h\"\n#include \"grouping_cuda_kernel.h\"\n\n// input: points(b, c, n) idx(b, m, nsample)\n// output: out(b, c, m, nsample)\n__global__ void grouping_forward_cuda_kernel(int b, int c, int n, int m, int nsample, const float *points, const int *idx, float *out)\n{\n    int batch_index = blockIdx.x;\n    points += batch_index * n * c;\n    idx += batch_index * m * nsample;\n    out += batch_index * m * nsample * c;\n    const int index = threadIdx.y * blockDim.x + threadIdx.x;\n    const int stride = blockDim.y * blockDim.x;\n    for (int i = index; i < c * m; i += stride)\n    {\n        const int l = i / m;\n        const int j = i % m;\n        for (int k = 0; k < nsample; ++k)\n        {\n            int ii = idx[j * nsample + k];\n            out[(l * m + j) * nsample + k] = points[l * n + ii];\n        }\n    }\n}\n\n// input: grad_out(b, c, m, nsample), idx(b, m, nsample)\n// output: grad_points(b, c, n)\n__global__ void grouping_backward_cuda_kernel(int b, int c, int n, int m, int nsample, const float *grad_out, const int *idx, float *grad_points)\n{\n    int batch_index = blockIdx.x;\n    grad_out += batch_index * m * nsample * c;\n    idx += batch_index * m * nsample;\n    grad_points += batch_index * n * c;\n    const int index = threadIdx.y * blockDim.x + threadIdx.x;\n    const int stride = blockDim.y * blockDim.x;\n    for (int i = index; i < c * m; i += stride)\n    {\n        const int l = i / m;\n        const int j = i % m;\n        for (int k = 0; k < nsample; ++k)\n        {\n            int ii = idx[j * nsample + k];\n            atomicAdd(grad_points + l * n + ii, grad_out[(l * m + j) * nsample + k]);\n        }\n    }\n}\n\nvoid grouping_forward_cuda_launcher(int b, int c, int n, int m, int nsample, const float *points, const int *idx, float *out)\n{\n    grouping_forward_cuda_kernel<<<b, opt_block_config(m, c), 0>>>(b, c, n, m, nsample, points, idx, out);\n}\n\nvoid grouping_backward_cuda_launcher(int b, int c, int n, int m, int nsample, const float *grad_out, const int *idx, float *grad_points)\n{\n    grouping_backward_cuda_kernel<<<b, opt_block_config(m, c), 0>>>(b, c, n, m, nsample, grad_out, idx, grad_points);\n}\n\n// input: points(b, c, n) idx(b, npoints, nsample)\n// output: out(b, c, npoints, nsample)\n__global__ void grouping_forward_cuda_kernel_fast(int b, int c, int n, int npoints, int nsample, const float *__restrict__ points, const int *__restrict__ idx, float *__restrict__ out) {\n    int bs_idx = blockIdx.z;\n    int c_idx = blockIdx.y;\n    int index = blockIdx.x * blockDim.x + threadIdx.x;\n    int pt_idx = index / nsample;\n    if (bs_idx >= b || c_idx >= c || pt_idx >= npoints) return;\n\n    int sample_idx = index % nsample;\n\n    idx += bs_idx * npoints * nsample + pt_idx * nsample + sample_idx;\n    int in_idx = bs_idx * c * n + c_idx * n + idx[0];\n    int out_idx = bs_idx * c * npoints * nsample + c_idx * npoints * nsample + pt_idx * nsample + sample_idx;\n\n    out[out_idx] = points[in_idx];\n}\n\n// input: points(b, c, n) idx(b, npoints, nsample)\n// output: out(b, c, npoints, nsample)\nvoid grouping_forward_cuda_launcher_fast(int b, int c, int n, int npoints, int nsample, const float *points, const int *idx, float *out) {\n\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(npoints * nsample, THREADS_PER_BLOCK), c, b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    grouping_forward_cuda_kernel_fast<<<blocks, threads, 0>>>(b, c, n, npoints, nsample, points, idx, out);\n   
 // cudaDeviceSynchronize();  // for using printf in kernel function\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}\n\n\n"
  },
  {
    "path": "lib/pointops/src/grouping/grouping_cuda_kernel.h",
    "content": "#ifndef _GROUPING_CUDA_KERNEL\n#define _GROUPING_CUDA_KERNEL\n#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n\nvoid grouping_forward_cuda(int b, int c, int n, int m, int nsample, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out);\nvoid grouping_backward_cuda(int b, int c, int n, int m, int nsample, at::Tensor grad_out_tensor, at::Tensor idx_tensor, at::Tensor grad_points_tensor);\n\nvoid grouping_forward_cuda_fast(int b, int c, int n, int npoints, int nsample, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out_tensor);\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\nvoid grouping_forward_cuda_launcher(int b, int c, int n, int m, int nsample, const float *points, const int *idx, float *out);\nvoid grouping_backward_cuda_launcher(int b, int c, int n, int m, int nsample, const float *grad_out, const int *idx, float *grad_points);\n\nvoid grouping_forward_cuda_launcher_fast(int b, int c, int n, int npoints, int nsample, const float *points, const int *idx, float *out);\n\n#ifdef __cplusplus\n}\n#endif\n#endif\n"
  },
  {
    "path": "lib/pointops/src/grouping_int/grouping_int_cuda.cpp",
    "content": "#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n#include <THC/THC.h>\n\n#include \"grouping_int_cuda_kernel.h\"\n\nextern THCState *state;\n\nvoid grouping_int_forward_cuda(int b, int c, int n, int m, int nsample, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out_tensor)\n{\n    const long int *points = points_tensor.data<long int>();\n    const int *idx = idx_tensor.data<int>();\n    long int *out = out_tensor.data<long int>();\n    grouping_int_forward_cuda_launcher(b, c, n, m, nsample, points, idx, out);\n}\n\nvoid grouping_int_forward_cuda_fast(int b, int c, int n, int m, int nsample, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out_tensor)\n{\n    const long int *points = points_tensor.data<long int>();\n    const int *idx = idx_tensor.data<int>();\n    long int *out = out_tensor.data<long int>();\n    grouping_int_forward_cuda_launcher_fast(b, c, n, m, nsample, points, idx, out);\n}"
  },
  {
    "path": "lib/pointops/src/grouping_int/grouping_int_cuda_kernel.cu",
    "content": "#include \"../cuda_utils.h\"\n#include \"grouping_int_cuda_kernel.h\"\n\n// input: points(b, c, n) idx(b, m, nsample)\n// output: out(b, c, m, nsample)\n__global__ void grouping_int_forward_cuda_kernel(int b, int c, int n, int m, int nsample, const long int *points, const int *idx, long int *out)\n{\n    int batch_index = blockIdx.x;\n    points += batch_index * n * c;\n    idx += batch_index * m * nsample;\n    out += batch_index * m * nsample * c;\n    const int index = threadIdx.y * blockDim.x + threadIdx.x;\n    const int stride = blockDim.y * blockDim.x;\n    for (int i = index; i < c * m; i += stride)\n    {\n        const int l = i / m;\n        const int j = i % m;\n        for (int k = 0; k < nsample; ++k)\n        {\n            int ii = idx[j * nsample + k];\n            out[(l * m + j) * nsample + k] = points[l * n + ii];\n        }\n    }\n}\n\n\nvoid grouping_int_forward_cuda_launcher(int b, int c, int n, int m, int nsample, const long int *points, const int *idx, long int *out)\n{\n    grouping_int_forward_cuda_kernel<<<b, opt_block_config(m, c), 0>>>(b, c, n, m, nsample, points, idx, out);\n}\n\n\n__global__ void grouping_int_forward_cuda_kernel_fast(int b, int c, int n, int npoints, int nsample, const long int *__restrict__ points, const int *__restrict__ idx, long int *__restrict__ out)\n{\n    int bs_idx = blockIdx.z;\n    int c_idx = blockIdx.y;\n    int index = blockIdx.x * blockDim.x + threadIdx.x;\n    int pt_idx = index / nsample;\n    if (bs_idx >= b || c_idx >= c || pt_idx >= npoints) return;\n\n    int sample_idx = index % nsample;\n\n    idx += bs_idx * npoints * nsample + pt_idx * nsample + sample_idx;\n    int in_idx = bs_idx * c * n + c_idx * n + idx[0];\n    int out_idx = bs_idx * c * npoints * nsample + c_idx * npoints * nsample + pt_idx * nsample + sample_idx;\n\n    out[out_idx] = points[in_idx];\n}\n\n\nvoid grouping_int_forward_cuda_launcher_fast(int b, int c, int n, int npoints, int nsample, const long int *points, const int *idx, long int *out)\n{\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(npoints * nsample, THREADS_PER_BLOCK), c, b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    grouping_int_forward_cuda_kernel_fast<<<blocks, threads, 0>>>(b, c, n, npoints, nsample, points, idx, out);\n    // cudaDeviceSynchronize();  // for using printf in kernel function\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}"
  },
  {
    "path": "lib/pointops/src/grouping_int/grouping_int_cuda_kernel.h",
    "content": "#ifndef _GROUPING_INT_CUDA_KERNEL\n#define _GROUPING_INT_CUDA_KERNEL\n#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n\nvoid grouping_int_forward_cuda(int b, int c, int n, int m, int nsample, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out);\n\nvoid grouping_int_forward_cuda_fast(int b, int c, int n, int m, int nsample, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out_tensor);\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\nvoid grouping_int_forward_cuda_launcher(int b, int c, int n, int m, int nsample, const long int *points, const int *idx, long int *out);\n\nvoid grouping_int_forward_cuda_launcher_fast(int b, int c, int n, int npoints, int nsample, const long int *points, const int *idx, long int *out);\n\n#ifdef __cplusplus\n}\n#endif\n#endif\n"
  },
  {
    "path": "lib/pointops/src/interpolation/interpolation_cuda.cpp",
    "content": "#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n#include <THC/THC.h>\n#include \"interpolation_cuda_kernel.h\"\n\nextern THCState *state;\n\nvoid nearestneighbor_cuda(int b, int n, int m, at::Tensor unknown_tensor, at::Tensor known_tensor, at::Tensor dist2_tensor, at::Tensor idx_tensor)\n{\n    const float *unknown = unknown_tensor.data<float>();\n    const float *known = known_tensor.data<float>();\n    float *dist2 = dist2_tensor.data<float>();\n    int *idx = idx_tensor.data<int>();\n    nearestneighbor_cuda_launcher(b, n, m, unknown, known, dist2, idx);\n}\n\nvoid interpolation_forward_cuda(int b, int c, int m, int n, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor out_tensor)\n{\n    const float *points = points_tensor.data<float>();\n    const float *weight = weight_tensor.data<float>();\n    float *out = out_tensor.data<float>();\n    const int *idx = idx_tensor.data<int>();\n    interpolation_forward_cuda_launcher(b, c, m, n, points, idx, weight, out);\n}\n\nvoid interpolation_backward_cuda(int b, int c, int n, int m, at::Tensor grad_out_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor grad_points_tensor)\n{\n    const float *grad_out = grad_out_tensor.data<float>();\n    const float *weight = weight_tensor.data<float>();\n    float *grad_points = grad_points_tensor.data<float>();\n    const int *idx = idx_tensor.data<int>();\n    interpolation_backward_cuda_launcher(b, c, n, m, grad_out, idx, weight, grad_points);\n}\n\nvoid nearestneighbor_cuda_fast(int b, int n, int m, at::Tensor unknown_tensor, at::Tensor known_tensor, at::Tensor dist2_tensor, at::Tensor idx_tensor) {\n    const float *unknown = unknown_tensor.data<float>();\n    const float *known = known_tensor.data<float>();\n    float *dist2 = dist2_tensor.data<float>();\n    int *idx = idx_tensor.data<int>();\n    nearestneighbor_cuda_launcher_fast(b, n, m, unknown, known, dist2, idx);\n}\n\nvoid interpolation_forward_cuda_fast(int b, int c, int m, int n, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor out_tensor) {\n\n    const float *points = points_tensor.data<float>();\n    const float *weight = weight_tensor.data<float>();\n    float *out = out_tensor.data<float>();\n    const int *idx = idx_tensor.data<int>();\n    interpolation_forward_cuda_launcher_fast(b, c, m, n, points, idx, weight, out);\n}"
  },
  {
    "path": "lib/pointops/src/interpolation/interpolation_cuda_kernel.cu",
    "content": "#include \"../cuda_utils.h\"\n#include \"interpolation_cuda_kernel.h\"\n\n// input: unknown(b, n, 3) known(b, m, 3)\n// output: dist2(b, n, 3), idx(b, n, 3)\n__global__ void nearestneighbor_cuda_kernel(int b, int n, int m, const float *unknown, const float *known, float *dist2, int *idx)\n{\n    int batch_index = blockIdx.x;\n    unknown += batch_index * n * 3;\n    known += batch_index * m * 3;\n    dist2 += batch_index * n * 3;\n    idx += batch_index * n * 3;\n\n    int index = threadIdx.x;\n    int stride = blockDim.x;\n    for (int j = index; j < n; j += stride)\n    {\n\t\tfloat ux = unknown[j * 3 + 0];\n\t\tfloat uy = unknown[j * 3 + 1];\n\t\tfloat uz = unknown[j * 3 + 2];\n\n\t\tdouble best1 = 1e40, best2 = 1e40, best3 = 1e40;\n\t\tint besti1 = 0, besti2 = 0, besti3 = 0;\n\t\tfor (int k = 0; k < m; ++k)\n\t\t{\n\t\t    float x = known[k * 3 + 0];\n\t\t    float y = known[k * 3 + 1];\n\t\t    float z = known[k * 3 + 2];\n\t\t    float d =\n\t\t\t(ux - x) * (ux - x) + (uy - y) * (uy - y) + (uz - z) * (uz - z);\n\t\t    if (d < best1)\n\t\t    {\n\t\t\t\tbest3 = best2;\n\t\t\t\tbesti3 = besti2;\n\t\t\t\tbest2 = best1;\n\t\t\t\tbesti2 = besti1;\n\t\t\t\tbest1 = d;\n\t\t\t\tbesti1 = k;\n\t\t\t}\n\t\t\telse if (d < best2)\n\t\t\t{\n\t\t\t\tbest3 = best2;\n\t\t\t\tbesti3 = besti2;\n\t\t\t\tbest2 = d;\n\t\t\t\tbesti2 = k;\n\t\t\t}\n\t\t\telse if (d < best3)\n\t\t\t{\n\t\t\t\tbest3 = d;\n\t\t\t\tbesti3 = k;\n\t\t    }\n\t\t}\n\t\tdist2[j * 3 + 0] = best1;\n\t\tdist2[j * 3 + 1] = best2;\n\t\tdist2[j * 3 + 2] = best3;\n\t\tidx[j * 3 + 0] = besti1;\n\t\tidx[j * 3 + 1] = besti2;\n\t\tidx[j * 3 + 2] = besti3;\n    }\n}\n\n// input: points(b, c, m), idx(b, n, 3), weight(b, n, 3)\n// output: out(b, c, n)\n__global__ void interpolation_forward_cuda_kernel(int b, int c, int m, int n, const float *points, const int *idx, const float *weight, float *out)\n{\n    int batch_index = blockIdx.x;\n    points += batch_index * m * c;\n    idx += batch_index * n * 3;\n    weight += batch_index * n * 3;\n    out += batch_index * n * c;\n\n    const int index = threadIdx.y * blockDim.x + threadIdx.x;\n    const int stride = blockDim.y * blockDim.x;\n    for (int i = index; i < c * n; i += stride)\n    {\n\t\tconst int l = i / n;\n\t\tconst int j = i % n;\n\t\tfloat w1 = weight[j * 3 + 0];\n\t\tfloat w2 = weight[j * 3 + 1];\n\t\tfloat w3 = weight[j * 3 + 2];\n\t\tint i1 = idx[j * 3 + 0];\n\t\tint i2 = idx[j * 3 + 1];\n\t\tint i3 = idx[j * 3 + 2];\n\t\tout[i] = points[l * m + i1] * w1 + points[l * m + i2] * w2 + points[l * m + i3] * w3;\n    }\n}\n\n// input: grad_out(b, c, n), idx(b, n, 3), weight(b, n, 3)\n// output: grad_points(b, c, m)\n__global__ void interpolation_backward_cuda_kernel( int b, int c, int n, int m, const float *grad_out, const int *idx, const float *weight, float *grad_points)\n{\n    int batch_index = blockIdx.x;\n    grad_out += batch_index * n * c;\n    idx += batch_index * n * 3;\n    weight += batch_index * n * 3;\n    grad_points += batch_index * m * c;\n\n    const int index = threadIdx.y * blockDim.x + threadIdx.x;\n    const int stride = blockDim.y * blockDim.x;\n    for (int i = index; i < c * n; i += stride)\n    {\n\t\tconst int l = i / n;\n\t\tconst int j = i % n;\n\t\tfloat w1 = weight[j * 3 + 0];\n\t\tfloat w2 = weight[j * 3 + 1];\n\t\tfloat w3 = weight[j * 3 + 2];\n\t\tint i1 = idx[j * 3 + 0];\n\t\tint i2 = idx[j * 3 + 1];\n\t\tint i3 = idx[j * 3 + 2];\n\t\tatomicAdd(grad_points + l * m + i1, grad_out[i] * w1);\n\t\tatomicAdd(grad_points + l * m + i2, 
grad_out[i] * w2);\n\t\tatomicAdd(grad_points + l * m + i3, grad_out[i] * w3);\n    }\n}\n\nvoid nearestneighbor_cuda_launcher(int b, int n, int m, const float *unknown, const float *known, float *dist2, int *idx)\n{\n    nearestneighbor_cuda_kernel<<<b, opt_n_threads(n), 0>>>(b, n, m, unknown, known, dist2, idx);\n}\n\nvoid interpolation_forward_cuda_launcher(int b, int c, int m, int n, const float *points, const int *idx, const float *weight, float *out)\n{\n    interpolation_forward_cuda_kernel<<<b, opt_block_config(n, c), 0>>>(b, c, m, n, points, idx, weight, out);\n}\n\nvoid interpolation_backward_cuda_launcher(int b, int n, int c, int m, const float *grad_out, const int *idx, const float *weight, float *grad_points)\n{\n    interpolation_backward_cuda_kernel<<<b, opt_block_config(n, c), 0>>>(b, n, c, m, grad_out, idx, weight, grad_points);\n}\n\n\n// input: unknown(b, n, 3) known(b, m, 3)\n// output: dist2(b, n, 3), idx(b, n, 3)\n__global__ void nearestneighbor_cuda_kernel_fast(int b, int n, int m, const float *__restrict__ unknown, const float *__restrict__ known, float *__restrict__ dist2, int *__restrict__ idx) {\n\n    int bs_idx = blockIdx.y;\n    int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;\n    if (bs_idx >= b || pt_idx >= n) return;\n\n    unknown += bs_idx * n * 3 + pt_idx * 3;\n    known += bs_idx * m * 3;\n    dist2 += bs_idx * n * 3 + pt_idx * 3;\n    idx += bs_idx * n * 3 + pt_idx * 3;\n\n    float ux = unknown[0];\n    float uy = unknown[1];\n    float uz = unknown[2];\n\n    double best1 = 1e40, best2 = 1e40, best3 = 1e40;\n    int besti1 = 0, besti2 = 0, besti3 = 0;\n    for (int k = 0; k < m; ++k) {\n        float x = known[k * 3 + 0];\n        float y = known[k * 3 + 1];\n        float z = known[k * 3 + 2];\n        float d = (ux - x) * (ux - x) + (uy - y) * (uy - y) + (uz - z) * (uz - z);\n        if (d < best1) {\n            best3 = best2; besti3 = besti2;\n            best2 = best1; besti2 = besti1;\n            best1 = d; besti1 = k;\n        }\n        else if (d < best2) {\n            best3 = best2; besti3 = besti2;\n            best2 = d; besti2 = k;\n        }\n        else if (d < best3) {\n            best3 = d; besti3 = k;\n        }\n    }\n    dist2[0] = best1;\n    dist2[1] = best2;\n    dist2[2] = best3;\n\n    idx[0] = besti1;\n    idx[1] = besti2;\n    idx[2] = besti3;\n}\n\n\n// input: points(b, c, m), idx(b, n, 3), weight(b, n, 3)\n// output: out(b, c, n)\n__global__ void interpolation_forward_cuda_kernel_fast(int b, int c, int m, int n, const float *__restrict__ points, const int *__restrict__ idx, const float *__restrict__ weight, float *__restrict__ out) {\n\n    int bs_idx = blockIdx.z;\n    int c_idx = blockIdx.y;\n    int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;\n\n    if (bs_idx >= b || c_idx >= c || pt_idx >= n) return;\n\n    weight += bs_idx * n * 3 + pt_idx * 3;\n    points += bs_idx * c * m + c_idx * m;\n    idx += bs_idx * n * 3 + pt_idx * 3;\n    out += bs_idx * c * n + c_idx * n;\n\n    out[pt_idx] = weight[0] * points[idx[0]] + weight[1] * points[idx[1]] + weight[2] * points[idx[2]];\n}\n\n\nvoid nearestneighbor_cuda_launcher_fast(int b, int n, int m, const float *unknown, const float *known, float *dist2, int *idx)\n{\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(n, THREADS_PER_BLOCK), b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    nearestneighbor_cuda_kernel_fast<<<blocks, threads, 0>>>(b, n, m, unknown, known, dist2, idx);\n\n    err = cudaGetLastError();\n    if 
(cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}\n\nvoid interpolation_forward_cuda_launcher_fast(int b, int c, int m, int n, const float *points, const int *idx, const float *weight, float *out) {\n\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(n, THREADS_PER_BLOCK), c, b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n    interpolation_forward_cuda_kernel_fast<<<blocks, threads, 0>>>(b, c, m, n, points, idx, weight, out);\n\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\",\n        cudaGetErrorString(err));\n        exit(-1);\n    }\n}\n"
  },
  {
    "path": "lib/pointops/src/interpolation/interpolation_cuda_kernel.h",
    "content": "#ifndef _INTERPOLATION_CUDA_KERNEL\n#define _INTERPOLATION_CUDA_KERNEL\n#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n\nvoid nearestneighbor_cuda(int b, int n, int m, at::Tensor unknown_tensor, at::Tensor known_tensor, at::Tensor dist2_tensor, at::Tensor idx_tensor);\nvoid interpolation_forward_cuda(int b, int c, int m, int n, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor out_tensor);\nvoid interpolation_backward_cuda(int b, int c, int n, int m, at::Tensor grad_out_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor grad_points_tensor);\n\nvoid nearestneighbor_cuda_fast(int b, int n, int m, at::Tensor unknown_tensor, at::Tensor known_tensor, at::Tensor dist2_tensor, at::Tensor idx_tensor);\nvoid interpolation_forward_cuda_fast(int b, int c, int m, int n, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor weight_tensor, at::Tensor out_tensor);\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\nvoid nearestneighbor_cuda_launcher(int b, int n, int m, const float *unknown, const float *known, float *dist2, int *idx);\nvoid interpolation_forward_cuda_launcher(int b, int c, int m, int n, const float *points, const int *idx, const float *weight, float *out);\nvoid interpolation_backward_cuda_launcher(int b, int c, int n, int m, const float *grad_out, const int *idx, const float *weight, float *grad_points);\n\nvoid nearestneighbor_cuda_launcher_fast(int b, int n, int m, const float *unknown, const float *known, float *dist2, int *idx);\nvoid interpolation_forward_cuda_launcher_fast(int b, int c, int m, int n, const float *points, const int *idx, const float *weight, float *out);\n\n#ifdef __cplusplus\n}\n#endif\n#endif\n"
  },
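As a cross-check for the forward kernel above: interpolation_forward_cuda_kernel computes, for every unknown point, a weighted sum of the features of its three nearest known points. Below is a minimal PyTorch reference of that computation, useful for unit-testing the extension; the helper name is illustrative and not part of this repository, and the usual inverse-distance construction of `weight` from the squared distances returned by the nearest-neighbor kernel is assumed rather than shown in these files.

```python
# CPU reference sketch of what interpolation_forward_cuda_kernel computes
# (for sanity checks only; not library code).
import torch

def interpolation_forward_reference(points, idx, weight):
    # points: (B, C, m) features at known positions
    # idx:    (B, n, 3) indices of the 3 nearest known points per unknown point
    # weight: (B, n, 3) interpolation weights
    # returns (B, C, n): out[b, c, j] = sum_k points[b, c, idx[b, j, k]] * weight[b, j, k]
    B, C, m = points.shape
    _, n, _ = idx.shape
    gathered = torch.gather(
        points.unsqueeze(3).expand(B, C, m, 3),       # (B, C, m, 3)
        2,
        idx.unsqueeze(1).expand(B, C, n, 3).long(),   # (B, C, n, 3)
    )                                                 # (B, C, n, 3)
    return (gathered * weight.unsqueeze(1)).sum(dim=3)
```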
  {
    "path": "lib/pointops/src/knnquery/__init__.py",
    "content": ""
  },
  {
    "path": "lib/pointops/src/knnquery/knnquery_cuda.cpp",
    "content": "#include <torch/serialize/tensor.h>\n#include <vector>\n#include <THC/THC.h>\n#include <ATen/cuda/CUDAContext.h>\n\n#include \"knnquery_cuda_kernel.h\"\n\nextern THCState *state;\n\n#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, \" must be a CUDAtensor \")\n#define CHECK_CONTIGUOUS(x) AT_CHECK(x.is_contiguous(), #x, \" must be contiguous \")\n#define CHECK_INPUT(x) CHECK_CUDA(x);CHECK_CONTIGUOUS(x)\n\n\nvoid knnquery_cuda(int b, int n, int m, int nsample, at::Tensor xyz_tensor, at::Tensor new_xyz_tensor, at::Tensor idx_tensor, at::Tensor dist2_tensor)\n{\n    CHECK_INPUT(new_xyz_tensor);\n    CHECK_INPUT(xyz_tensor);\n\n    const float *new_xyz = new_xyz_tensor.data<float>();\n    const float *xyz = xyz_tensor.data<float>();\n    int *idx = idx_tensor.data<int>();\n    float *dist2 = dist2_tensor.data<float>();\n\n    cudaStream_t stream = THCState_getCurrentStream(state);\n\n    knnquery_cuda_launcher(b, n, m, nsample, xyz, new_xyz, idx, dist2, stream);\n}\n"
  },
  {
    "path": "lib/pointops/src/knnquery/knnquery_cuda_kernel.cu",
    "content": "#include \"../cuda_utils.h\"\n#include \"knnquery_cuda_kernel.h\"\n\n// input: xyz (b, n, 3) new_xyz (b, m, 3)\n// output: idx (b, m, nsample) dist2 (b, m, nsample)\n__global__ void knnquery_cuda_kernel(int b, int n, int m, int nsample, const float *__restrict__ xyz, const float *__restrict__ new_xyz, int *__restrict__ idx, float *__restrict__ dist2) {\n    int bs_idx = blockIdx.y;\n    int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;\n    if (bs_idx >= b || pt_idx >= m) return;\n\n    new_xyz += bs_idx * m * 3 + pt_idx * 3;\n    xyz += bs_idx * n * 3;\n    idx += bs_idx * m * nsample + pt_idx * nsample;\n\n    float new_x = new_xyz[0];\n    float new_y = new_xyz[1];\n    float new_z = new_xyz[2];\n\n    //double* best = new double[nsample];\n    //int* besti = new int[nsample];\n    double best[200];\n    int besti[200];\n    for(int i = 0; i < nsample; i++){\n        best[i] = 1e40;\n        besti[i] = 0;\n    }\n    for(int k = 0; k < n; k++){\n        float x = xyz[k * 3 + 0];\n        float y = xyz[k * 3 + 1];\n        float z = xyz[k * 3 + 2];\n        float d2 = (new_x - x) * (new_x - x) + (new_y - y) * (new_y - y) + (new_z - z) * (new_z - z);\n        for(int j = 0; j < nsample; j++){\n            if(d2 < best[j]){\n                for(int i = nsample - 1; i > j; i--){\n                    best[i] = best[i - 1];\n                    besti[i] = besti[i - 1];\n                }\n                best[j] = d2;\n                besti[j] = k;\n                break;\n            }\n        }\n    }\n    for(int i = 0; i < nsample; i++){\n        idx[i] = besti[i];\n        dist2[i] = best[i];\n    }\n    //delete []best;\n    //delete []besti;\n}\n\n\nvoid knnquery_cuda_launcher(int b, int n, int m, int nsample, const float *xyz, const float *new_xyz, int *idx, float *dist2, cudaStream_t stream) {\n    // param new_xyz: (B, m, 3)\n    // param xyz: (B, n, 3)\n    // param idx: (B, m, nsample)\n\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(m, THREADS_PER_BLOCK), b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    knnquery_cuda_kernel<<<blocks, threads, 0, stream>>>(b, n, m, nsample, xyz, new_xyz, idx, dist2);\n    // cudaDeviceSynchronize();  // for using printf in kernel function\n\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}"
  },
  {
    "path": "lib/pointops/src/knnquery/knnquery_cuda_kernel.h",
    "content": "#ifndef _KNNQUERY_CUDA_KERNEL\n#define _KNNQUERY_CUDA_KERNEL\n\n#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n\nvoid knnquery_cuda(int b, int n, int m, int nsample, at::Tensor xyz_tensor, at::Tensor new_xyz_tensor, at::Tensor idx_tensor, at::Tensor dist2_tensor);\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\nvoid knnquery_cuda_launcher(int b, int n, int m, int nsample, const float *xyz, const float *new_xyz, int *idx, float *dist2, cudaStream_t stream);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif"
  },
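The kNN kernel above keeps, per query point, an insertion-sorted list of the nsample smallest squared distances (nsample is capped at 200 by the fixed-size best/besti buffers). Below is a brute-force NumPy sketch of the same per-batch-element computation, handy for checking the CUDA output; the helper name is illustrative and not part of the repository, and ties may be ordered differently than the kernel's stable insertion.

```python
# NumPy reference sketch for knnquery_cuda_kernel on one batch element
# (brute-force k nearest neighbours by squared distance; testing aid only).
import numpy as np

def knn_query_reference(xyz, new_xyz, nsample):
    # xyz:     (n, 3) support points
    # new_xyz: (m, 3) query points
    # returns idx (m, nsample) int32 and dist2 (m, nsample) float
    d2 = ((new_xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(-1)  # (m, n)
    order = np.argsort(d2, axis=1)[:, :nsample]                  # (m, nsample)
    return order.astype(np.int32), np.take_along_axis(d2, order, axis=1)
```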
  {
    "path": "lib/pointops/src/labelstat/labelstat_cuda.cpp",
    "content": "#include <torch/serialize/tensor.h>\n#include <vector>\n#include <THC/THC.h>\n#include <ATen/cuda/CUDAContext.h>\n\n#include \"labelstat_cuda_kernel.h\"\n\nextern THCState *state;\n\n#define CHECK_CUDA(x) AT_CHECK(x.type().is_cuda(), #x, \" must be a CUDAtensor \")\n#define CHECK_CONTIGUOUS(x) AT_CHECK(x.is_contiguous(), #x, \" must be contiguous \")\n#define CHECK_INPUT(x) CHECK_CUDA(x);CHECK_CONTIGUOUS(x)\n\nvoid labelstat_idx_cuda_fast(int b, int n, int m, int nsample, int nclass,\n    at::Tensor label_stat_tensor, at::Tensor idx_tensor, at::Tensor new_label_stat_tensor)\n{\n    CHECK_INPUT(label_stat_tensor);\n    CHECK_INPUT(idx_tensor);\n\n    const int *label_stat = label_stat_tensor.data<int>();\n    const int *idx = idx_tensor.data<int>();\n    int *new_label_stat = new_label_stat_tensor.data<int>();\n\n    cudaStream_t stream = THCState_getCurrentStream(state);\n\n    labelstat_idx_cuda_launcher_fast(b, n, m, nsample, nclass, label_stat, idx, new_label_stat, stream);\n}\n\nvoid labelstat_ballrange_cuda_fast(int b, int n, int m, float radius, int nclass,\n    at::Tensor new_xyz_tensor, at::Tensor xyz_tensor, at::Tensor label_stat_tensor, at::Tensor new_label_stat_tensor)\n{\n    CHECK_INPUT(new_xyz_tensor);\n    CHECK_INPUT(xyz_tensor);\n    CHECK_INPUT(label_stat_tensor);\n\n    const float *new_xyz = new_xyz_tensor.data<float>();\n    const float *xyz = xyz_tensor.data<float>();\n    const int *label_stat = label_stat_tensor.data<int>();\n    int *new_label_stat = new_label_stat_tensor.data<int>();\n\n    cudaStream_t stream = THCState_getCurrentStream(state);\n\n    labelstat_ballrange_cuda_launcher_fast(b, n, m, radius, nclass, new_xyz, xyz, label_stat, new_label_stat, stream);\n}\n\nvoid labelstat_and_ballquery_cuda_fast(int b, int n, int m, float radius, int nsample, int nclass,\n    at::Tensor new_xyz_tensor, at::Tensor xyz_tensor, at::Tensor label_stat_tensor, at::Tensor idx_tensor, at::Tensor new_label_stat_tensor)\n{\n    CHECK_INPUT(new_xyz_tensor);\n    CHECK_INPUT(xyz_tensor);\n    CHECK_INPUT(label_stat_tensor);\n    CHECK_INPUT(idx_tensor);\n\n    const float *new_xyz = new_xyz_tensor.data<float>();\n    const float *xyz = xyz_tensor.data<float>();\n    const int *label_stat = label_stat_tensor.data<int>();\n    int *idx = idx_tensor.data<int>();\n    int *new_label_stat = new_label_stat_tensor.data<int>();\n\n    cudaStream_t stream = THCState_getCurrentStream(state);\n\n    labelstat_and_ballquery_cuda_launcher_fast(b, n, m, radius, nsample, nclass, new_xyz, xyz, label_stat, idx, new_label_stat, stream);\n}\n"
  },
  {
    "path": "lib/pointops/src/labelstat/labelstat_cuda_kernel.cu",
    "content": "#include \"../cuda_utils.h\"\n#include \"labelstat_cuda_kernel.h\"\n\n// input: new_xyz(b, m, 3) xyz(b, n, 3) label_stat(b, n, nclass)\n// output: idx(b, m, nsample)  new_label_stat(b, m, nclass)\n__global__ void labelstat_and_ballquery_cuda_kernel_fast(int b, int n, int m, float radius, int nsample, int nclass,\n    const float *new_xyz, const float *xyz, const int *label_stat, int *idx, int *new_label_stat) {\n    int bs_idx = blockIdx.y;\n    int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;\n    if (bs_idx >= b || pt_idx >= m) return;\n\n    new_xyz += bs_idx * m * 3 + pt_idx * 3;\n    xyz += bs_idx * n * 3;\n    idx += bs_idx * m * nsample + pt_idx * nsample;\n    label_stat += bs_idx * n * nclass;\n    new_label_stat += bs_idx * m * nclass + pt_idx * nclass;\n\n    for(int i = 0; i < nclass; i++){\n        new_label_stat[i] = 0;\n    }\n\n    float radius2 = radius * radius;\n    float new_x = new_xyz[0];\n    float new_y = new_xyz[1];\n    float new_z = new_xyz[2];\n\n    int cnt = 0;\n    for (int k = 0; k < n; ++k) {\n        float x = xyz[k * 3 + 0];\n        float y = xyz[k * 3 + 1];\n        float z = xyz[k * 3 + 2];\n        float d2 = (new_x - x) * (new_x - x) + (new_y - y) * (new_y - y) + (new_z - z) * (new_z - z);\n        if (d2 < radius2){\n            for(int i = 0; i < nclass; i++){\n                new_label_stat[i] += label_stat[k * nclass + i];\n            }\n            if (cnt == 0){\n                for (int l = 0; l < nsample; ++l) {\n                    idx[l] = k;\n                }\n            }\n            idx[cnt] = k;\n            ++cnt;\n            if (cnt >= nsample){\n                break;\n            }\n        }\n    }\n}\n\nvoid labelstat_and_ballquery_cuda_launcher_fast(int b, int n, int m, float radius, int nsample, int nclass,\n    const float *new_xyz, const float *xyz, const int *label_stat, int *idx, int *new_label_stat, cudaStream_t stream) {\n    // param new_xyz: (B, m, 3)\n    // param xyz: (B, n, 3)\n    // param idx: (B, m, nsample)\n\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(m, THREADS_PER_BLOCK), b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    labelstat_and_ballquery_cuda_kernel_fast<<<blocks, threads, 0, stream>>>(b, n, m, radius, nsample, nclass, new_xyz, xyz, label_stat, idx, new_label_stat);\n    // cudaDeviceSynchronize();  // for using printf in kernel function\n\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}\n\n// input: new_xyz(b, m, 3) xyz(b, n, 3) label_stat(b, n, nclass)\n// output: new_label_stat(b, m, nclass)\n__global__ void labelstat_ballrange_cuda_kernel_fast(int b, int n, int m, float radius, int nclass,\n    const float *new_xyz, const float *xyz, const int *label_stat, int *new_label_stat) {\n    int bs_idx = blockIdx.y;\n    int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;\n    if (bs_idx >= b || pt_idx >= m) return;\n\n    new_xyz += bs_idx * m * 3 + pt_idx * 3;\n    xyz += bs_idx * n * 3;\n    label_stat += bs_idx * n * nclass;\n    new_label_stat += bs_idx * m * nclass + pt_idx * nclass;\n\n    for(int i = 0; i < nclass; i++){\n        new_label_stat[i] = 0;\n    }\n\n    float radius2 = radius * radius;\n    float new_x = new_xyz[0];\n    float new_y = new_xyz[1];\n    float new_z = new_xyz[2];\n\n    for (int k = 0; k < n; ++k) {\n        float x = xyz[k * 3 + 0];\n        float y = xyz[k * 3 + 1];\n        float z = 
xyz[k * 3 + 2];\n        float d2 = (new_x - x) * (new_x - x) + (new_y - y) * (new_y - y) + (new_z - z) * (new_z - z);\n        if (d2 < radius2){\n            for(int i = 0; i < nclass; i++){\n                new_label_stat[i] += label_stat[k * nclass + i];\n            }\n        }\n    }\n}\n\n\nvoid labelstat_ballrange_cuda_launcher_fast(int b, int n, int m, float radius, int nclass,\n    const float *new_xyz, const float *xyz, const int *label_stat, int *new_label_stat, cudaStream_t stream) {\n    // param new_xyz: (B, m, 3)\n    // param xyz: (B, n, 3)\n    // param idx: (B, m, nsample)\n\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(m, THREADS_PER_BLOCK), b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    labelstat_ballrange_cuda_kernel_fast<<<blocks, threads, 0, stream>>>(b, n, m, radius, nclass, new_xyz, xyz, label_stat, new_label_stat);\n    // cudaDeviceSynchronize();  // for using printf in kernel function\n\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}\n\n// input: idx(b, m, nsample) label_stat(b, n, nclass)\n// output: new_label_stat(b, m, nclass)\n__global__ void labelstat_idx_cuda_kernel_fast(int b, int n, int m, int nsample, int nclass,\n    const int *label_stat, const int *idx, int *new_label_stat) {\n    int bs_idx = blockIdx.y;\n    int pt_idx = blockIdx.x * blockDim.x + threadIdx.x;\n    if (bs_idx >= b || pt_idx >= m) return;\n\n    idx += bs_idx * m * nsample + pt_idx * nsample;\n    label_stat += bs_idx * n * nclass;\n    new_label_stat += bs_idx * m * nclass + pt_idx * nclass;\n\n    for(int i = 0; i < nclass; i++){\n        new_label_stat[i] = 0;\n    }\n\n    for(int k = 0; k < nsample; k++){\n        const int *label_stat_k = label_stat + idx[k] * nclass;\n        for(int i = 0; i < nclass; i++){\n            new_label_stat[i] += label_stat_k[i];\n        }\n    }\n}\n\n\nvoid labelstat_idx_cuda_launcher_fast(int b, int n, int m, int nsample, int nclass,\n    const int *label_stat, const int *idx, int *new_label_stat, cudaStream_t stream) {\n    // param new_xyz: (B, m, 3)\n    // param xyz: (B, n, 3)\n    // param idx: (B, m, nsample)\n\n    cudaError_t err;\n\n    dim3 blocks(DIVUP(m, THREADS_PER_BLOCK), b);  // blockIdx.x(col), blockIdx.y(row)\n    dim3 threads(THREADS_PER_BLOCK);\n\n    labelstat_idx_cuda_kernel_fast<<<blocks, threads, 0, stream>>>(b, n, m, nsample, nclass, label_stat, idx, new_label_stat);\n    // cudaDeviceSynchronize();  // for using printf in kernel function\n\n    err = cudaGetLastError();\n    if (cudaSuccess != err) {\n        fprintf(stderr, \"CUDA kernel failed : %s\\n\", cudaGetErrorString(err));\n        exit(-1);\n    }\n}"
  },
  {
    "path": "lib/pointops/src/labelstat/labelstat_cuda_kernel.h",
    "content": "#ifndef _LABELSTAT_CUDA_KERNEL\n#define _LABELSTAT_CUDA_KERNEL\n#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n\nvoid labelstat_and_ballquery_cuda_fast(int b, int n, int m, float radius, int nsample, int nclass,\n    at::Tensor new_xyz_tensor, at::Tensor xyz_tensor, at::Tensor label_stat_tensor, at::Tensor idx_tensor, at::Tensor new_label_stat_tensor);\n\nvoid labelstat_ballrange_cuda_fast(int b, int n, int m, float radius, int nclass,\n    at::Tensor new_xyz_tensor, at::Tensor xyz_tensor, at::Tensor label_stat_tensor, at::Tensor new_label_stat_tensor);\n\nvoid labelstat_idx_cuda_fast(int b, int n, int m, int nsample, int nclass,\n    at::Tensor label_stat_tensor, at::Tensor idx_tensor, at::Tensor new_label_stat_tensor);\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\nvoid labelstat_and_ballquery_cuda_launcher_fast(int b, int n, int m, float radius, int nsample, int nclass, \\\n    const float *new_xyz, const float *xyz, const int *label_stat, int *idx, int *new_label_stat, cudaStream_t stream);\n\nvoid labelstat_ballrange_cuda_launcher_fast(int b, int n, int m, float radius, int nclass, \\\n    const float *new_xyz, const float *xyz, const int *label_stat, int *new_label_stat, cudaStream_t stream);\n\nvoid labelstat_idx_cuda_launcher_fast(int b, int n, int m, int nsample, int nclass, \\\n    const int *label_stat, const int *idx, int *new_label_stat, cudaStream_t stream);\n\n#ifdef __cplusplus\n}\n#endif\n\n#endif\n"
  },
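labelstat_idx_cuda_kernel_fast simply accumulates, for each query point, the per-class label histograms of its nsample indexed neighbours. A small PyTorch sketch of the same reduction is given below for sanity checks; the helper name is illustrative and not part of the repository.

```python
# Reference sketch for labelstat_idx_cuda_kernel_fast:
# new_label_stat[b, j, c] = sum_k label_stat[b, idx[b, j, k], c]
import torch

def labelstat_idx_reference(label_stat, idx):
    # label_stat: (B, n, nclass) integer histogram per point
    # idx:        (B, m, nsample) neighbour indices
    B, n, nclass = label_stat.shape
    _, m, nsample = idx.shape
    flat = idx.reshape(B, m * nsample, 1).expand(B, m * nsample, nclass).long()
    gathered = torch.gather(label_stat, 1, flat)               # (B, m*nsample, nclass)
    return gathered.view(B, m, nsample, nclass).sum(dim=2)     # (B, m, nclass)
```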
  {
    "path": "lib/pointops/src/pointops_api.cpp",
    "content": "#include <torch/serialize/tensor.h>\n#include <torch/extension.h>\n\n#include \"ballquery/ballquery_cuda_kernel.h\"\n#include \"grouping/grouping_cuda_kernel.h\"\n#include \"grouping_int/grouping_int_cuda_kernel.h\"\n#include \"sampling/sampling_cuda_kernel.h\"\n#include \"interpolation/interpolation_cuda_kernel.h\"\n#include \"knnquery/knnquery_cuda_kernel.h\"\n\n#include \"labelstat/labelstat_cuda_kernel.h\"\n#include \"featuredistribute/featuredistribute_cuda_kernel.h\"\n\n\nPYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {\n    m.def(\"ballquery_cuda\", &ballquery_cuda_fast, \"ballquery_cuda_fast\");   // name in python, cpp function address, docs\n\n    m.def(\"knnquery_cuda\", &knnquery_cuda, \"knnquery_cuda\");\n\n    m.def(\"grouping_forward_cuda\", &grouping_forward_cuda_fast, \"grouping_forward_cuda_fast\");\n    m.def(\"grouping_backward_cuda\", &grouping_backward_cuda, \"grouping_backward_cuda\");\n\n    m.def(\"grouping_int_forward_cuda\", &grouping_int_forward_cuda_fast, \"grouping_int_forward_cuda_fast\");\n\n    m.def(\"gathering_forward_cuda\", &gathering_forward_cuda, \"gathering_forward_cuda\");\n    m.def(\"gathering_backward_cuda\", &gathering_backward_cuda, \"gathering_backward_cuda\");\n    m.def(\"furthestsampling_cuda\", &furthestsampling_cuda, \"furthestsampling_cuda\");\n\n    m.def(\"nearestneighbor_cuda\", &nearestneighbor_cuda_fast, \"nearestneighbor_cuda_fast\");\n    m.def(\"interpolation_forward_cuda\", &interpolation_forward_cuda_fast, \"interpolation_forward_cuda_fast\");\n    m.def(\"interpolation_backward_cuda\", &interpolation_backward_cuda, \"interpolation_backward_cuda\");\n\n    m.def(\"labelstat_idx_cuda\", &labelstat_idx_cuda_fast, \"labelstat_idx_cuda_fast\");\n    m.def(\"labelstat_ballrange_cuda\", &labelstat_ballrange_cuda_fast, \"labelstat_ballrange_cuda_fast\");\n    m.def(\"labelstat_and_ballquery_cuda\", &labelstat_and_ballquery_cuda_fast, \"labelstat_and_ballquery_cuda_fast\");\n\n    m.def(\"featuredistribute_cuda\", &featuredistribute_cuda, \"featuredistribute_cuda\");\n    m.def(\"featuregather_forward_cuda\", &featuregather_forward_cuda, \"featuregather_forward_cuda\");\n    m.def(\"featuregather_backward_cuda\", &featuregather_backward_cuda, \"featuregather_backward_cuda\");\n}\n"
  },
  {
    "path": "lib/pointops/src/sampling/sampling_cuda.cpp",
    "content": "#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n#include <THC/THC.h>\n#include \"sampling_cuda_kernel.h\"\n\nextern THCState *state;\n\nvoid gathering_forward_cuda(int b, int c, int n, int m, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out_tensor)\n{\n    const float *points = points_tensor.data<float>();\n    const int *idx = idx_tensor.data<int>();\n    float *out = out_tensor.data<float>();\n    gathering_forward_cuda_launcher(b, c, n, m, points, idx, out);\n}\n\nvoid gathering_backward_cuda(int b, int c, int n, int m, at::Tensor grad_out_tensor, at::Tensor idx_tensor, at::Tensor grad_points_tensor)\n{\n\n    const float *grad_out = grad_out_tensor.data<float>();\n    const int *idx = idx_tensor.data<int>();\n    float *grad_points = grad_points_tensor.data<float>();\n    gathering_backward_cuda_launcher(b, c, n, m, grad_out, idx, grad_points);\n}\n\nvoid furthestsampling_cuda(int b, int n, int m, at::Tensor points_tensor, at::Tensor temp_tensor, at::Tensor idx_tensor)\n{\n    const float *points = points_tensor.data<float>();\n    float *temp = temp_tensor.data<float>();\n    int *idx = idx_tensor.data<int>();\n    furthestsampling_cuda_launcher(b, n, m, points, temp, idx);\n}\n"
  },
  {
    "path": "lib/pointops/src/sampling/sampling_cuda_kernel.cu",
    "content": "#include \"../cuda_utils.h\"\n#include \"sampling_cuda_kernel.h\"\n\n// input: points(b, c, n) idx(b, m)\n// output: out(b, c, m)\n__global__ void gathering_forward_cuda_kernel(int b, int c, int n, int m, const float *points, const int *idx, float *out)\n{\n    for (int i = blockIdx.x; i < b; i += gridDim.x)\n    {\n        for (int l = blockIdx.y; l < c; l += gridDim.y)\n        {\n            for (int j = threadIdx.x; j < m; j += blockDim.x)\n            {\n                int a = idx[i * m + j];\n                out[(i * c + l) * m + j] = points[(i * c + l) * n + a];\n            }\n        }\n    }\n}\n\n// input: grad_out(b, c, m) idx(b, m)\n// output: grad_points(b, c, n)\n__global__ void gathering_backward_cuda_kernel(int b, int c, int n, int m, const float *grad_out, const int *idx, float *grad_points)\n{\n    for (int i = blockIdx.x; i < b; i += gridDim.x)\n    {\n        for (int l = blockIdx.y; l < c; l += gridDim.y)\n        {\n            for (int j = threadIdx.x; j < m; j += blockDim.x)\n            {\n                int a = idx[i * m + j];\n                atomicAdd(grad_points + (i * c + l) * n + a, grad_out[(i * c + l) * m + j]);\n            }\n        }\n    }\n}\n\nvoid gathering_forward_cuda_launcher(int b, int c, int n, int m, const float *points, const int *idx, float *out)\n{\n    gathering_forward_cuda_kernel<<<dim3(b, c, 1), opt_n_threads(m), 0>>>(b, c, n, m, points, idx, out);\n}\n\nvoid gathering_backward_cuda_launcher(int b, int c, int n, int m, const float *grad_out, const int *idx, float *grad_points)\n{\n    gathering_backward_cuda_kernel<<<dim3(b, c, 1), opt_n_threads(m), 0>>>(b, c, n, m, grad_out, idx, grad_points);\n}\n\n__device__ void __update(float *dists, int *dists_i,\n\t\t\t int idx1, int idx2) {\n    const float v1 = dists[idx1], v2 = dists[idx2];\n    const int i1 = dists_i[idx1], i2 = dists_i[idx2];\n    dists[idx1] = max(v1, v2);\n    dists_i[idx1] = v2 > v1 ? i2 : i1;\n}\n\n// Input dataset: (b, n, 3), tmp: (b, n)\n// Ouput idxs (b, m)\ntemplate <unsigned int block_size>\n__global__ void furthestsampling_cuda_kernel(int b, int n, int m, const float *dataset, float *temp, int *idxs)\n{\n    if (m <= 0)\n\t    return;\n    __shared__ float dists[block_size];\n    __shared__ int dists_i[block_size];\n\n    int batch_index = blockIdx.x;\n    dataset += batch_index * n * 3;\n    temp += batch_index * n;\n    idxs += batch_index * m;\n    int tid = threadIdx.x;\n    const int stride = block_size;\n    int old = 0;\n    if (threadIdx.x == 0)\n\t    idxs[0] = old;\n\n    __syncthreads();\n    for (int j = 1; j < m; j++)\n    {\n        int besti = 0;\n        float best = -1;\n        float x1 = dataset[old * 3 + 0];\n        float y1 = dataset[old * 3 + 1];\n        float z1 = dataset[old * 3 + 2];\n        for (int k = tid; k < n; k += stride)\n        {\n            float x2, y2, z2;\n            x2 = dataset[k * 3 + 0];\n            y2 = dataset[k * 3 + 1];\n            z2 = dataset[k * 3 + 2];\n            //float mag = (x2 * x2) + (y2 * y2) + (z2 * z2);\n            //if (mag <= 1e-3)\n            //    continue;\n            float d = (x2 - x1) * (x2 - x1) + (y2 - y1) * (y2 - y1) + (z2 - z1) * (z2 - z1);\n            float d2 = min(d, temp[k]);\n            temp[k] = d2;\n            besti = d2 > best ? k : besti;\n            best = d2 > best ? 
d2 : best;\n        }\n        dists[tid] = best;\n        dists_i[tid] = besti;\n        __syncthreads();\n\n        if (block_size >= 1024) {\n            if (tid < 512) {\n            __update(dists, dists_i, tid, tid + 512);\n            }\n            __syncthreads();\n        }\n        if (block_size >= 512) {\n            if (tid < 256) {\n            __update(dists, dists_i, tid, tid + 256);\n            }\n            __syncthreads();\n        }\n        if (block_size >= 256) {\n            if (tid < 128) {\n            __update(dists, dists_i, tid, tid + 128);\n            }\n            __syncthreads();\n        }\n        if (block_size >= 128) {\n            if (tid < 64) {\n            __update(dists, dists_i, tid, tid + 64);\n            }\n            __syncthreads();\n        }\n        if (block_size >= 64) {\n            if (tid < 32) {\n            __update(dists, dists_i, tid, tid + 32);\n            }\n            __syncthreads();\n        }\n        if (block_size >= 32) {\n            if (tid < 16) {\n            __update(dists, dists_i, tid, tid + 16);\n            }\n            __syncthreads();\n        }\n        if (block_size >= 16) {\n            if (tid < 8) {\n            __update(dists, dists_i, tid, tid + 8);\n            }\n            __syncthreads();\n        }\n        if (block_size >= 8) {\n            if (tid < 4) {\n            __update(dists, dists_i, tid, tid + 4);\n            }\n            __syncthreads();\n        }\n        if (block_size >= 4) {\n            if (tid < 2) {\n            __update(dists, dists_i, tid, tid + 2);\n            }\n            __syncthreads();\n        }\n        if (block_size >= 2) {\n            if (tid < 1) {\n            __update(dists, dists_i, tid, tid + 1);\n            }\n            __syncthreads();\n        }\n\n        old = dists_i[0];\n        if (tid == 0)\n            idxs[j] = old;\n    }\n}\n\nvoid furthestsampling_cuda_launcher(int b, int n, int m, const float *dataset, float *temp, int *idxs)\n{   \n\tunsigned int n_threads = opt_n_threads(n);\n\tswitch (n_threads) {\n\t    case 1024:\n\t        furthestsampling_cuda_kernel<1024><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t        break;\n\t\tcase 512:\n\t\t\tfurthestsampling_cuda_kernel<512><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t\t\tbreak;\n    \tcase 256:\n\t\t\tfurthestsampling_cuda_kernel<256><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t\t\tbreak;\n    \tcase 128:\n\t\t\tfurthestsampling_cuda_kernel<128><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t\t\tbreak;\n    \tcase 64:\n\t\t\tfurthestsampling_cuda_kernel<64><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t\t\tbreak;\n    \tcase 32:\n\t\t\tfurthestsampling_cuda_kernel<32><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t\t\tbreak;\n    \tcase 16:\n\t\t\tfurthestsampling_cuda_kernel<16><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t\t\tbreak;\n    \tcase 8:\n\t\t\tfurthestsampling_cuda_kernel<8><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t\t\tbreak;\n\t    case 4:\n\t\t\tfurthestsampling_cuda_kernel<4><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t\t\tbreak;\n\t    case 2:\n\t\t\tfurthestsampling_cuda_kernel<2><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t\t\tbreak;\n\t    case 1:\n\t\t\tfurthestsampling_cuda_kernel<1><<<b, n_threads, 0>>>(b, n, m, dataset, temp, idxs);\n\t\t\tbreak;\n\t    default:\n\t\t\tfurthestsampling_cuda_kernel<512><<<b, n_threads, 0>>>(b, n, m, dataset, temp, 
idxs);\n\t    }\n}\n"
  },
  {
    "path": "lib/pointops/src/sampling/sampling_cuda_kernel.h",
    "content": "#ifndef _SAMPLING_CUDA_KERNEL\n#define _SAMPLING_CUDA_KERNEL\n#include <torch/serialize/tensor.h>\n#include <vector>\n#include <ATen/cuda/CUDAContext.h>\n\nvoid gathering_forward_cuda(int b, int c, int n, int m, at::Tensor points_tensor, at::Tensor idx_tensor, at::Tensor out_tensor);\nvoid gathering_backward_cuda(int b, int c, int n, int m, at::Tensor grad_out_tensor, at::Tensor idx_tensor, at::Tensor grad_points_tensor);\nvoid furthestsampling_cuda(int b, int n, int m, at::Tensor points_tensor, at::Tensor temp_tensor, at::Tensor idx_tensor);\n\n#ifdef __cplusplus\nextern \"C\" {\n#endif\n\nvoid gathering_forward_cuda_launcher(int b, int c, int n, int m, const float *points, const int *idx, float *out);\nvoid gathering_backward_cuda_launcher(int b, int c, int n, int m, const float *grad_out, const int *idx, float *grad_points);\nvoid furthestsampling_cuda_launcher(int b, int n, int m, const float *dataset, float *temp, int *idxs);\n\n#ifdef __cplusplus\n}\n#endif\n#endif\n"
  },
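furthestsampling_cuda_kernel runs iterative farthest point sampling: temp holds the running minimum squared distance from each point to the already-selected set, and the shared-memory reduction is a parallel argmax over that array. The NumPy sketch below mirrors that loop for one batch element; the helper name is illustrative, and the 1e10 initialization of temp is assumed to be done by the Python wrapper (as is conventional), not shown in these files.

```python
# NumPy sketch of the farthest point sampling loop in furthestsampling_cuda_kernel
# for one batch element (testing aid only).
import numpy as np

def furthest_point_sampling_reference(points, m):
    # points: (n, 3); returns m indices, starting from index 0 as the kernel does
    n = points.shape[0]
    temp = np.full(n, 1e10, dtype=np.float64)  # running min squared distance to the picked set
    idxs = np.zeros(m, dtype=np.int32)
    old = 0
    for j in range(1, m):
        d = ((points - points[old]) ** 2).sum(axis=1)
        temp = np.minimum(temp, d)   # update distance-to-set
        old = int(temp.argmax())     # farthest remaining point
        idxs[j] = old
    return idxs
```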
  {
    "path": "lib/sync_bn/__init__.py",
    "content": "# -*- coding: utf-8 -*-\n# File   : __init__.py\n# Author : Jiayuan Mao\n# Email  : maojiayuan@gmail.com\n# Date   : 27/01/2018\n# \n# This file is part of Synchronized-BatchNorm-PyTorch.\n# https://github.com/vacancy/Synchronized-BatchNorm-PyTorch\n# Distributed under MIT License.\n\nfrom .batchnorm import SynchronizedBatchNorm1d, SynchronizedBatchNorm2d, SynchronizedBatchNorm3d\nfrom .replicate import DataParallelWithCallback, patch_replication_callback\n"
  },
  {
    "path": "lib/sync_bn/batchnorm.py",
    "content": "# -*- coding: utf-8 -*-\n# File   : batchnorm.py\n# Author : Jiayuan Mao\n# Email  : maojiayuan@gmail.com\n# Date   : 27/01/2018\n# \n# This file is part of Synchronized-BatchNorm-PyTorch.\n# https://github.com/vacancy/Synchronized-BatchNorm-PyTorch\n# Distributed under MIT License.\n\nimport collections\n\nimport torch\nimport torch.nn.functional as F\n\nfrom torch.nn.modules.batchnorm import _BatchNorm\nfrom torch.nn.parallel._functions import ReduceAddCoalesced, Broadcast\n\nfrom .comm import SyncMaster\n\n__all__ = ['SynchronizedBatchNorm1d', 'SynchronizedBatchNorm2d', 'SynchronizedBatchNorm3d']\n\n\ndef _sum_ft(tensor):\n    \"\"\"sum over the first and last dimention\"\"\"\n    return tensor.sum(dim=0).sum(dim=-1)\n\n\ndef _unsqueeze_ft(tensor):\n    \"\"\"add new dementions at the front and the tail\"\"\"\n    return tensor.unsqueeze(0).unsqueeze(-1)\n\n\n_ChildMessage = collections.namedtuple('_ChildMessage', ['sum', 'ssum', 'sum_size'])\n_MasterMessage = collections.namedtuple('_MasterMessage', ['sum', 'inv_std'])\n\n\nclass _SynchronizedBatchNorm(_BatchNorm):\n    def __init__(self, num_features, eps=1e-5, momentum=0.1, affine=True):\n        super(_SynchronizedBatchNorm, self).__init__(num_features, eps=eps, momentum=momentum, affine=affine)\n\n        self._sync_master = SyncMaster(self._data_parallel_master)\n\n        self._is_parallel = False\n        self._parallel_id = None\n        self._slave_pipe = None\n\n    def forward(self, input):\n        # If it is not parallel computation or is in evaluation mode, use PyTorch's implementation.\n        if not (self._is_parallel and self.training):\n            return F.batch_norm(\n                input, self.running_mean, self.running_var, self.weight, self.bias,\n                self.training, self.momentum, self.eps)\n\n        # Resize the input to (B, C, -1).\n        input_shape = input.size()\n        input = input.view(input.size(0), self.num_features, -1)\n\n        # Compute the sum and square-sum.\n        sum_size = input.size(0) * input.size(2)\n        input_sum = _sum_ft(input)\n        input_ssum = _sum_ft(input ** 2)\n\n        # Reduce-and-broadcast the statistics.\n        if self._parallel_id == 0:\n            mean, inv_std = self._sync_master.run_master(_ChildMessage(input_sum, input_ssum, sum_size))\n        else:\n            mean, inv_std = self._slave_pipe.run_slave(_ChildMessage(input_sum, input_ssum, sum_size))\n\n        # Compute the output.\n        if self.affine:\n            # MJY:: Fuse the multiplication for speed.\n            output = (input - _unsqueeze_ft(mean)) * _unsqueeze_ft(inv_std * self.weight) + _unsqueeze_ft(self.bias)\n        else:\n            output = (input - _unsqueeze_ft(mean)) * _unsqueeze_ft(inv_std)\n\n        # Reshape it.\n        return output.view(input_shape)\n\n    def __data_parallel_replicate__(self, ctx, copy_id):\n        self._is_parallel = True\n        self._parallel_id = copy_id\n\n        # parallel_id == 0 means master device.\n        if self._parallel_id == 0:\n            ctx.sync_master = self._sync_master\n        else:\n            self._slave_pipe = ctx.sync_master.register_slave(copy_id)\n\n    def _data_parallel_master(self, intermediates):\n        \"\"\"Reduce the sum and square-sum, compute the statistics, and broadcast it.\"\"\"\n\n        # Always using same \"device order\" makes the ReduceAdd operation faster.\n        # Thanks to:: Tete Xiao (http://tetexiao.com/)\n        intermediates = sorted(intermediates, key=lambda i: 
i[1].sum.get_device())\n\n        to_reduce = [i[1][:2] for i in intermediates]\n        to_reduce = [j for i in to_reduce for j in i]  # flatten\n        target_gpus = [i[1].sum.get_device() for i in intermediates]\n\n        sum_size = sum([i[1].sum_size for i in intermediates])\n        sum_, ssum = ReduceAddCoalesced.apply(target_gpus[0], 2, *to_reduce)\n        mean, inv_std = self._compute_mean_std(sum_, ssum, sum_size)\n\n        broadcasted = Broadcast.apply(target_gpus, mean, inv_std)\n\n        outputs = []\n        for i, rec in enumerate(intermediates):\n            outputs.append((rec[0], _MasterMessage(*broadcasted[i*2:i*2+2])))\n\n        return outputs\n\n    def _compute_mean_std(self, sum_, ssum, size):\n        \"\"\"Compute the mean and standard-deviation with sum and square-sum. This method\n        also maintains the moving average on the master device.\"\"\"\n        assert size > 1, 'BatchNorm computes unbiased standard-deviation, which requires size > 1.'\n        mean = sum_ / size\n        sumvar = ssum - sum_ * mean\n        unbias_var = sumvar / (size - 1)\n        bias_var = sumvar / size\n\n        self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean.data\n        self.running_var = (1 - self.momentum) * self.running_var + self.momentum * unbias_var.data\n\n        return mean, bias_var.clamp(self.eps) ** -0.5\n\n\nclass SynchronizedBatchNorm1d(_SynchronizedBatchNorm):\n    r\"\"\"Applies Synchronized Batch Normalization over a 2d or 3d input that is seen as a\n    mini-batch.\n\n    .. math::\n\n        y = \\frac{x - mean[x]}{ \\sqrt{Var[x] + \\epsilon}} * gamma + beta\n\n    This module differs from the built-in PyTorch BatchNorm1d as the mean and\n    standard-deviation are reduced across all devices during training.\n\n    For example, when one uses `nn.DataParallel` to wrap the network during\n    training, PyTorch's implementation normalize the tensor on each device using\n    the statistics only on that device, which accelerated the computation and\n    is also easy to implement, but the statistics might be inaccurate.\n    Instead, in this synchronized version, the statistics will be computed\n    over all training samples distributed on multiple devices.\n    \n    Note that, for one-GPU or CPU-only case, this module behaves exactly same\n    as the built-in PyTorch implementation.\n\n    The mean and standard-deviation are calculated per-dimension over\n    the mini-batches and gamma and beta are learnable parameter vectors\n    of size C (where C is the input size).\n\n    During training, this layer keeps a running estimate of its computed mean\n    and variance. The running sum is kept with a default momentum of 0.1.\n\n    During evaluation, this running mean/variance is used for normalization.\n\n    Because the BatchNorm is done over the `C` dimension, computing statistics\n    on `(N, L)` slices, it's common terminology to call this Temporal BatchNorm\n\n    Args:\n        num_features: num_features from an expected input of size\n            `batch_size x num_features [x width]`\n        eps: a value added to the denominator for numerical stability.\n            Default: 1e-5\n        momentum: the value used for the running_mean and running_var\n            computation. Default: 0.1\n        affine: a boolean value that when set to ``True``, gives the layer learnable\n            affine parameters. 
Default: ``True``\n\n    Shape:\n        - Input: :math:`(N, C)` or :math:`(N, C, L)`\n        - Output: :math:`(N, C)` or :math:`(N, C, L)` (same shape as input)\n\n    Examples:\n        >>> # With Learnable Parameters\n        >>> m = SynchronizedBatchNorm1d(100)\n        >>> # Without Learnable Parameters\n        >>> m = SynchronizedBatchNorm1d(100, affine=False)\n        >>> input = torch.autograd.Variable(torch.randn(20, 100))\n        >>> output = m(input)\n    \"\"\"\n\n    def _check_input_dim(self, input):\n        if input.dim() != 2 and input.dim() != 3:\n            raise ValueError('expected 2D or 3D input (got {}D input)'\n                             .format(input.dim()))\n        super(SynchronizedBatchNorm1d, self)._check_input_dim(input)\n\n\nclass SynchronizedBatchNorm2d(_SynchronizedBatchNorm):\n    r\"\"\"Applies Batch Normalization over a 4d input that is seen as a mini-batch\n    of 3d inputs\n\n    .. math::\n\n        y = \\frac{x - mean[x]}{ \\sqrt{Var[x] + \\epsilon}} * gamma + beta\n\n    This module differs from the built-in PyTorch BatchNorm2d as the mean and\n    standard-deviation are reduced across all devices during training.\n\n    For example, when one uses `nn.DataParallel` to wrap the network during\n    training, PyTorch's implementation normalize the tensor on each device using\n    the statistics only on that device, which accelerated the computation and\n    is also easy to implement, but the statistics might be inaccurate.\n    Instead, in this synchronized version, the statistics will be computed\n    over all training samples distributed on multiple devices.\n    \n    Note that, for one-GPU or CPU-only case, this module behaves exactly same\n    as the built-in PyTorch implementation.\n\n    The mean and standard-deviation are calculated per-dimension over\n    the mini-batches and gamma and beta are learnable parameter vectors\n    of size C (where C is the input size).\n\n    During training, this layer keeps a running estimate of its computed mean\n    and variance. The running sum is kept with a default momentum of 0.1.\n\n    During evaluation, this running mean/variance is used for normalization.\n\n    Because the BatchNorm is done over the `C` dimension, computing statistics\n    on `(N, H, W)` slices, it's common terminology to call this Spatial BatchNorm\n\n    Args:\n        num_features: num_features from an expected input of\n            size batch_size x num_features x height x width\n        eps: a value added to the denominator for numerical stability.\n            Default: 1e-5\n        momentum: the value used for the running_mean and running_var\n            computation. Default: 0.1\n        affine: a boolean value that when set to ``True``, gives the layer learnable\n            affine parameters. 
Default: ``True``\n\n    Shape:\n        - Input: :math:`(N, C, H, W)`\n        - Output: :math:`(N, C, H, W)` (same shape as input)\n\n    Examples:\n        >>> # With Learnable Parameters\n        >>> m = SynchronizedBatchNorm2d(100)\n        >>> # Without Learnable Parameters\n        >>> m = SynchronizedBatchNorm2d(100, affine=False)\n        >>> input = torch.autograd.Variable(torch.randn(20, 100, 35, 45))\n        >>> output = m(input)\n    \"\"\"\n\n    def _check_input_dim(self, input):\n        if input.dim() != 4:\n            raise ValueError('expected 4D input (got {}D input)'\n                             .format(input.dim()))\n        super(SynchronizedBatchNorm2d, self)._check_input_dim(input)\n\n\nclass SynchronizedBatchNorm3d(_SynchronizedBatchNorm):\n    r\"\"\"Applies Batch Normalization over a 5d input that is seen as a mini-batch\n    of 4d inputs\n\n    .. math::\n\n        y = \\frac{x - mean[x]}{ \\sqrt{Var[x] + \\epsilon}} * gamma + beta\n\n    This module differs from the built-in PyTorch BatchNorm3d as the mean and\n    standard-deviation are reduced across all devices during training.\n\n    For example, when one uses `nn.DataParallel` to wrap the network during\n    training, PyTorch's implementation normalize the tensor on each device using\n    the statistics only on that device, which accelerated the computation and\n    is also easy to implement, but the statistics might be inaccurate.\n    Instead, in this synchronized version, the statistics will be computed\n    over all training samples distributed on multiple devices.\n    \n    Note that, for one-GPU or CPU-only case, this module behaves exactly same\n    as the built-in PyTorch implementation.\n\n    The mean and standard-deviation are calculated per-dimension over\n    the mini-batches and gamma and beta are learnable parameter vectors\n    of size C (where C is the input size).\n\n    During training, this layer keeps a running estimate of its computed mean\n    and variance. The running sum is kept with a default momentum of 0.1.\n\n    During evaluation, this running mean/variance is used for normalization.\n\n    Because the BatchNorm is done over the `C` dimension, computing statistics\n    on `(N, D, H, W)` slices, it's common terminology to call this Volumetric BatchNorm\n    or Spatio-temporal BatchNorm\n\n    Args:\n        num_features: num_features from an expected input of\n            size batch_size x num_features x depth x height x width\n        eps: a value added to the denominator for numerical stability.\n            Default: 1e-5\n        momentum: the value used for the running_mean and running_var\n            computation. Default: 0.1\n        affine: a boolean value that when set to ``True``, gives the layer learnable\n            affine parameters. Default: ``True``\n\n    Shape:\n        - Input: :math:`(N, C, D, H, W)`\n        - Output: :math:`(N, C, D, H, W)` (same shape as input)\n\n    Examples:\n        >>> # With Learnable Parameters\n        >>> m = SynchronizedBatchNorm3d(100)\n        >>> # Without Learnable Parameters\n        >>> m = SynchronizedBatchNorm3d(100, affine=False)\n        >>> input = torch.autograd.Variable(torch.randn(20, 100, 35, 45, 10))\n        >>> output = m(input)\n    \"\"\"\n\n    def _check_input_dim(self, input):\n        if input.dim() != 5:\n            raise ValueError('expected 5D input (got {}D input)'\n                             .format(input.dim()))\n        super(SynchronizedBatchNorm3d, self)._check_input_dim(input)\n"
  },
  {
    "path": "lib/sync_bn/comm.py",
    "content": "# -*- coding: utf-8 -*-\n# File   : comm.py\n# Author : Jiayuan Mao\n# Email  : maojiayuan@gmail.com\n# Date   : 27/01/2018\n# \n# This file is part of Synchronized-BatchNorm-PyTorch.\n# https://github.com/vacancy/Synchronized-BatchNorm-PyTorch\n# Distributed under MIT License.\n\nimport queue\nimport collections\nimport threading\n\n__all__ = ['FutureResult', 'SlavePipe', 'SyncMaster']\n\n\nclass FutureResult(object):\n    \"\"\"A thread-safe future implementation. Used only as one-to-one pipe.\"\"\"\n\n    def __init__(self):\n        self._result = None\n        self._lock = threading.Lock()\n        self._cond = threading.Condition(self._lock)\n\n    def put(self, result):\n        with self._lock:\n            assert self._result is None, 'Previous result has\\'t been fetched.'\n            self._result = result\n            self._cond.notify()\n\n    def get(self):\n        with self._lock:\n            if self._result is None:\n                self._cond.wait()\n\n            res = self._result\n            self._result = None\n            return res\n\n\n_MasterRegistry = collections.namedtuple('MasterRegistry', ['result'])\n_SlavePipeBase = collections.namedtuple('_SlavePipeBase', ['identifier', 'queue', 'result'])\n\n\nclass SlavePipe(_SlavePipeBase):\n    \"\"\"Pipe for master-slave communication.\"\"\"\n\n    def run_slave(self, msg):\n        self.queue.put((self.identifier, msg))\n        ret = self.result.get()\n        self.queue.put(True)\n        return ret\n\n\nclass SyncMaster(object):\n    \"\"\"An abstract `SyncMaster` object.\n\n    - During the replication, as the data parallel will trigger an callback of each module, all slave devices should\n    call `register(id)` and obtain an `SlavePipe` to communicate with the master.\n    - During the forward pass, master device invokes `run_master`, all messages from slave devices will be collected,\n    and passed to a registered callback.\n    - After receiving the messages, the master device should gather the information and determine to message passed\n    back to each slave devices.\n    \"\"\"\n\n    def __init__(self, master_callback):\n        \"\"\"\n\n        Args:\n            master_callback: a callback to be invoked after having collected messages from slave devices.\n        \"\"\"\n        self._master_callback = master_callback\n        self._queue = queue.Queue()\n        self._registry = collections.OrderedDict()\n        self._activated = False\n\n    def __getstate__(self):\n        return {'master_callback': self._master_callback}\n\n    def __setstate__(self, state):\n        self.__init__(state['master_callback'])\n\n    def register_slave(self, identifier):\n        \"\"\"\n        Register an slave device.\n\n        Args:\n            identifier: an identifier, usually is the device id.\n\n        Returns: a `SlavePipe` object which can be used to communicate with the master device.\n\n        \"\"\"\n        if self._activated:\n            assert self._queue.empty(), 'Queue is not clean before next initialization.'\n            self._activated = False\n            self._registry.clear()\n        future = FutureResult()\n        self._registry[identifier] = _MasterRegistry(future)\n        return SlavePipe(identifier, self._queue, future)\n\n    def run_master(self, master_msg):\n        \"\"\"\n        Main entry for the master device in each forward pass.\n        The messages were first collected from each devices (including the master device), and then\n        an callback will 
be invoked to compute the message to be sent back to each devices\n        (including the master device).\n\n        Args:\n            master_msg: the message that the master want to send to itself. This will be placed as the first\n            message when calling `master_callback`. For detailed usage, see `_SynchronizedBatchNorm` for an example.\n\n        Returns: the message to be sent back to the master device.\n\n        \"\"\"\n        self._activated = True\n\n        intermediates = [(0, master_msg)]\n        for i in range(self.nr_slaves):\n            intermediates.append(self._queue.get())\n\n        results = self._master_callback(intermediates)\n        assert results[0][0] == 0, 'The first result should belongs to the master.'\n\n        for i, res in results:\n            if i == 0:\n                continue\n            self._registry[i].result.put(res)\n\n        for i in range(self.nr_slaves):\n            assert self._queue.get() is True\n\n        return results[0][1]\n\n    @property\n    def nr_slaves(self):\n        return len(self._registry)\n"
  },
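SyncMaster and SlavePipe implement a barrier-style exchange: every replica submits one message, the master callback sees all of them in a single list, and one reply is routed back to each replica. Below is a minimal threaded sketch of that round trip with a single slave, assuming the repository root is on PYTHONPATH so lib.sync_bn.comm is importable; the callback and values are illustrative only.

```python
# Minimal sketch of the SyncMaster round trip with one slave thread
# (illustration of the protocol in comm.py, not library code).
import threading
from lib.sync_bn.comm import SyncMaster

def double_everything(intermediates):
    # intermediates: list of (copy_id, message), master first; return one reply per copy
    return [(copy_id, msg * 2) for copy_id, msg in intermediates]

master = SyncMaster(double_everything)
pipe = master.register_slave(1)

def slave():
    print('slave got', pipe.run_slave(10))   # -> 20

t = threading.Thread(target=slave)
t.start()
print('master got', master.run_master(5))    # -> 10
t.join()
```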
  {
    "path": "lib/sync_bn/replicate.py",
    "content": "# -*- coding: utf-8 -*-\n# File   : replicate.py\n# Author : Jiayuan Mao\n# Email  : maojiayuan@gmail.com\n# Date   : 27/01/2018\n# \n# This file is part of Synchronized-BatchNorm-PyTorch.\n# https://github.com/vacancy/Synchronized-BatchNorm-PyTorch\n# Distributed under MIT License.\n\nimport functools\n\nfrom torch.nn.parallel.data_parallel import DataParallel\n\n__all__ = [\n    'CallbackContext',\n    'execute_replication_callbacks',\n    'DataParallelWithCallback',\n    'patch_replication_callback'\n]\n\n\nclass CallbackContext(object):\n    pass\n\n\ndef execute_replication_callbacks(modules):\n    \"\"\"\n    Execute an replication callback `__data_parallel_replicate__` on each module created by original replication.\n\n    The callback will be invoked with arguments `__data_parallel_replicate__(ctx, copy_id)`\n\n    Note that, as all modules are isomorphism, we assign each sub-module with a context\n    (shared among multiple copies of this module on different devices).\n    Through this context, different copies can share some information.\n\n    We guarantee that the callback on the master copy (the first copy) will be called ahead of calling the callback\n    of any slave copies.\n    \"\"\"\n    master_copy = modules[0]\n    nr_modules = len(list(master_copy.modules()))\n    ctxs = [CallbackContext() for _ in range(nr_modules)]\n\n    for i, module in enumerate(modules):\n        for j, m in enumerate(module.modules()):\n            if hasattr(m, '__data_parallel_replicate__'):\n                m.__data_parallel_replicate__(ctxs[j], i)\n\n\nclass DataParallelWithCallback(DataParallel):\n    \"\"\"\n    Data Parallel with a replication callback.\n\n    An replication callback `__data_parallel_replicate__` of each module will be invoked after being created by\n    original `replicate` functions.\n    The callback will be invoked with arguments `__data_parallel_replicate__(ctx, copy_id)`\n\n    Examples:\n        > sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)\n        > sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1])\n        # sync_bn.__data_parallel_replicate__ will be invoked.\n    \"\"\"\n\n    def replicate(self, module, device_ids):\n        modules = super(DataParallelWithCallback, self).replicate(module, device_ids)\n        execute_replication_callbacks(modules)\n        return modules\n\n\ndef patch_replication_callback(data_parallel):\n    \"\"\"\n    Monkey-patch an existing `DataParallel` object. Add the replication callback.\n    Useful when you have customized `DataParallel` implementation.\n\n    Examples:\n        > sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)\n        > sync_bn = DataParallel(sync_bn, device_ids=[0, 1])\n        > patch_replication_callback(sync_bn)\n        # this is equivalent to\n        > sync_bn = SynchronizedBatchNorm1d(10, eps=1e-5, affine=False)\n        > sync_bn = DataParallelWithCallback(sync_bn, device_ids=[0, 1])\n    \"\"\"\n\n    assert isinstance(data_parallel, DataParallel)\n\n    old_replicate = data_parallel.replicate\n\n    @functools.wraps(old_replicate)\n    def new_replicate(module, device_ids):\n        modules = old_replicate(module, device_ids)\n        execute_replication_callbacks(modules)\n        return modules\n\n    data_parallel.replicate = new_replicate\n"
  },
  {
    "path": "lib/sync_bn/unittest.py",
    "content": "# -*- coding: utf-8 -*-\n# File   : unittest.py\n# Author : Jiayuan Mao\n# Email  : maojiayuan@gmail.com\n# Date   : 27/01/2018\n# \n# This file is part of Synchronized-BatchNorm-PyTorch.\n# https://github.com/vacancy/Synchronized-BatchNorm-PyTorch\n# Distributed under MIT License.\n\nimport unittest\n\nimport numpy as np\nfrom torch.autograd import Variable\n\n\ndef as_numpy(v):\n    if isinstance(v, Variable):\n        v = v.data\n    return v.cpu().numpy()\n\n\nclass TorchTestCase(unittest.TestCase):\n    def assertTensorClose(self, a, b, atol=1e-3, rtol=1e-3):\n        npa, npb = as_numpy(a), as_numpy(b)\n        self.assertTrue(\n                np.allclose(npa, npb, atol=atol),\n                'Tensor close check failed\\n{}\\n{}\\nadiff={}, rdiff={}'.format(a, b, np.abs(npa - npb).max(), np.abs((npa - npb) / np.fmax(npa, 1e-5)).max())\n        )\n"
  },
  {
    "path": "model/__init__.py",
    "content": ""
  },
  {
    "path": "model/pointnet/pointnet.py",
    "content": "import torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\n\nclass STN3D(nn.Module):\n    def __init__(self, c):\n        super(STN3D, self).__init__()\n        self.c = c\n        self.conv1 = nn.Conv1d(self.c, 64, 1)\n        self.conv2 = nn.Conv1d(64, 128, 1)\n        self.conv3 = nn.Conv1d(128, 1024, 1)\n        self.mp = nn.AdaptiveMaxPool1d(1)\n        self.fc1 = nn.Linear(1024, 512)\n        self.fc2 = nn.Linear(512, 256)\n        self.fc3 = nn.Linear(256, self.c*self.c)\n\n        self.bn1 = nn.BatchNorm1d(64)\n        self.bn2 = nn.BatchNorm1d(128)\n        self.bn3 = nn.BatchNorm1d(1024)\n        self.bn4 = nn.BatchNorm1d(512)\n        self.bn5 = nn.BatchNorm1d(256)\n\n    def forward(self, x):\n        batch_size = x.size()[0]\n        x = F.relu(self.bn1(self.conv1(x)))\n        x = F.relu(self.bn2(self.conv2(x)))\n        x = F.relu(self.bn3(self.conv3(x)))\n        x = self.mp(x)\n        x = x.view(-1, 1024)\n        x = F.relu(self.bn4(self.fc1(x)))\n        x = F.relu(self.bn5(self.fc2(x)))\n        x = self.fc3(x)\n\n        iden = torch.eye(self.c).view(1, -1).repeat(batch_size, 1)\n        if x.is_cuda:\n            iden = iden.cuda()\n        x = x + iden\n        x = x.view(-1, self.c, self.c)\n        return x\n\n\nclass PointNetFeat(nn.Module):\n    def __init__(self, c=3, global_feat=True):\n        super(PointNetFeat, self).__init__()\n        self.global_feat = global_feat\n        self.stn1 = STN3D(c)\n        self.conv1 = nn.Conv1d(c, 64, 1)\n        self.conv2 = nn.Conv1d(64, 64, 1)\n        self.stn2 = STN3D(64)\n        self.conv3 = nn.Conv1d(64, 64, 1)\n        self.conv4 = nn.Conv1d(64, 128, 1)\n        self.conv5 = nn.Conv1d(128, 1024, 1)\n        self.mp = nn.AdaptiveMaxPool1d(1)\n\n        self.bn1 = nn.BatchNorm1d(64)\n        self.bn2 = nn.BatchNorm1d(64)\n        self.bn3 = nn.BatchNorm1d(64)\n        self.bn4 = nn.BatchNorm1d(128)\n        self.bn5 = nn.BatchNorm1d(1024)\n\n    def forward(self, x):\n        stn1 = self.stn1(x)\n        x = torch.bmm(stn1, x)\n        x = F.relu(self.bn1(self.conv1(x)))\n        x = F.relu(self.bn2(self.conv2(x)))\n        stn2 = self.stn2(x)\n        x_tmp = torch.bmm(stn2, x)\n        x = F.relu(self.bn3(self.conv3(x_tmp)))\n        x = F.relu(self.bn4(self.conv4(x)))\n        x = F.relu(self.bn5(self.conv5(x)))\n        x = self.mp(x)\n        x = x.view(-1, 1024)\n\n        if not self.global_feat:\n            x = x.view(-1, 1024, 1).repeat(1, 1, x_tmp.size()[2])\n            x = torch.cat([x_tmp, x], 1)\n        return x\n\n\nclass PointNetCls(nn.Module):\n    def __init__(self, c=3, k=40, dropout=0.3, sync_bn=False):\n        super(PointNetCls, self).__init__()\n        self.feat = PointNetFeat(c, global_feat=True)\n        self.fc1 = nn.Linear(1024, 512)\n        self.fc2 = nn.Linear(512, 256)\n        self.fc3 = nn.Linear(256, k)\n\n        self.bn1 = nn.BatchNorm1d(512)\n        self.bn2 = nn.BatchNorm1d(256)\n        self.dropout = nn.Dropout(p=dropout)\n\n    def forward(self, x):\n        x = x.transpose(1, 2)\n        x = self.feat(x)\n        x = F.relu(self.bn1(self.fc1(x)))\n        x = F.relu(self.bn2(self.fc2(x)))\n        x = self.dropout(x)\n        x = self.fc3(x)\n        return x\n\n\n# Segmentation with 9 channels input XYZ, RGB and normalized location to the room (from 0 to 1), with STN3D on input and feature\nclass PointNetSeg(nn.Module):\n    def __init__(self, c=9, k=13, sync_bn=False):\n        super(PointNetSeg, self).__init__()\n        self.feat = 
PointNetFeat(c, global_feat=False)\n        self.conv1 = nn.Conv1d(1088, 512, 1)\n        self.conv2 = nn.Conv1d(512, 256, 1)\n        self.conv3 = nn.Conv1d(256, 128, 1)\n        self.conv4 = nn.Conv1d(128, 128, 1)\n        self.conv5 = nn.Conv1d(128, k, 1)\n\n        self.bn1 = nn.BatchNorm1d(512)\n        self.bn2 = nn.BatchNorm1d(256)\n        self.bn3 = nn.BatchNorm1d(128)\n        self.bn4 = nn.BatchNorm1d(128)\n\n    def forward(self, x):\n        x = x.transpose(1, 2)\n        x = self.feat(x)\n        x = F.relu(self.bn1(self.conv1(x)))\n        x = F.relu(self.bn2(self.conv2(x)))\n        x = F.relu(self.bn3(self.conv3(x)))\n        x = F.relu(self.bn4(self.conv4(x)))\n        x = self.conv5(x)\n        return x\n\n\nif __name__ == '__main__':\n    import os\n    os.environ[\"CUDA_VISIBLE_DEVICES\"] = '0'\n\n    sim_data = torch.rand(16, 2048, 3)\n\n    trans = STN3D(c=3)\n    out = trans(sim_data.transpose(1, 2))\n    print('stn', out.size())\n\n    point_feat = PointNetFeat(global_feat=True)\n    out = point_feat(sim_data.transpose(1, 2))\n    print('global feat', out.size())\n\n    point_feat = PointNetFeat(global_feat=False)\n    out = point_feat(sim_data.transpose(1, 2))\n    print('point feat', out.size())\n\n    cls = PointNetCls(c=3, k=40)\n    out = cls(sim_data)\n    print('class', out.size())\n\n    sim_data = torch.rand(16, 2048, 9)\n    seg = PointNetSeg(c=9, k=13)\n    out = seg(sim_data)\n    print('seg', out.size())\n"
  },
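The 1088-channel input to PointNetSeg's first convolution comes from PointNetFeat with `global_feat=False`: the 64-d per-point feature taken after the feature-space STN is concatenated with the 1024-d max-pooled global descriptor broadcast back to every point. A shape-only sketch of that fusion (standalone illustration, not part of the file above):

```python
import torch

B, N = 4, 2048
per_point = torch.randn(B, 64, N)      # x_tmp in PointNetFeat.forward (after the feature STN)
global_feat = torch.randn(B, 1024)     # max-pooled global descriptor
fused = torch.cat([per_point, global_feat.unsqueeze(-1).repeat(1, 1, N)], dim=1)
print(fused.shape)                     # torch.Size([4, 1088, 2048])
```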
  {
    "path": "model/pointnet2/pointnet2_modules.py",
    "content": "from typing import List\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom lib.pointops.functions import pointops\nfrom util import pt_util\n\n\nclass _PointNet2SAModuleBase(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.npoint = None\n        self.groupers = None\n        self.mlps = None\n\n    def forward(self, xyz: torch.Tensor, features: torch.Tensor = None) -> (torch.Tensor, torch.Tensor):\n        r\"\"\"\n        Parameters\n        ----------\n        xyz : torch.Tensor\n            (B, N, 3) tensor of the xyz coordinates of the features\n        features : torch.Tensor\n            (B, N, C) tensor of the descriptors of the the features\n        Returns\n        -------\n        new_xyz : torch.Tensor\n            (B, npoint, 3) tensor of the new features' xyz\n        new_features : torch.Tensor\n            (B, npoint, \\sum_k(mlps[k][-1])) tensor of the new_features descriptors\n        \"\"\"\n        new_features_list = []\n        xyz_trans = xyz.transpose(1, 2).contiguous()\n        new_xyz = pointops.gathering(\n            xyz_trans,\n            pointops.furthestsampling(xyz, self.npoint)\n        ).transpose(1, 2).contiguous() if self.npoint is not None else None\n        for i in range(len(self.groupers)):\n            new_features = self.groupers[i](xyz, new_xyz, features)  # (B, C, npoint, nsample)\n            new_features = self.mlps[i](new_features)  # (B, mlp[-1], npoint, nsample)\n            new_features = F.max_pool2d(new_features, kernel_size=[1, new_features.size(3)])  # (B, mlp[-1], npoint, 1)\n            new_features = new_features.squeeze(-1)  # (B, mlp[-1], npoint)\n            new_features_list.append(new_features)\n        return new_xyz, torch.cat(new_features_list, dim=1)\n\n\nclass PointNet2SAModuleMSG(_PointNet2SAModuleBase):\n    r\"\"\"Pointnet set abstrction layer with multiscale grouping\n    Parameters\n    ----------\n    npoint : int\n        Number of features\n    radii : list of float32\n        list of radii to group with\n    nsamples : list of int32\n        Number of samples in each ball query\n    mlps : list of list of int32\n        Spec of the pointnet_old before the global max_pool for each scale\n    bn : bool\n        Use batchnorm\n    \"\"\"\n    def __init__(self, *, npoint: int, radii: List[float], nsamples: List[int], mlps: List[List[int]], bn: bool = True, use_xyz: bool = True):\n        super().__init__()\n        assert len(radii) == len(nsamples) == len(mlps)\n        self.npoint = npoint\n        self.groupers = nn.ModuleList()\n        self.mlps = nn.ModuleList()\n        for i in range(len(radii)):\n            radius = radii[i]\n            nsample = nsamples[i]\n            self.groupers.append(\n                pointops.QueryAndGroup(radius, nsample, use_xyz=use_xyz)\n                if npoint is not None else pointops.GroupAll(use_xyz)\n            )\n            mlp_spec = mlps[i]\n            if use_xyz:\n                mlp_spec[0] += 3\n            self.mlps.append(pt_util.SharedMLP(mlp_spec, bn=bn))\n\n\nclass PointNet2SAModule(PointNet2SAModuleMSG):\n    r\"\"\"Pointnet set abstrction layer\n    Parameters\n    ----------\n    npoint : int\n        Number of features\n    radius : float\n        Radius of ball\n    nsample : int\n        Number of samples in the ball query\n    mlp : list\n        Spec of the pointnet_old before the global max_pool\n    bn : bool\n        Use batchnorm\n    \"\"\"\n    def __init__(self, *, mlp: 
List[int], npoint: int = None, radius: float = None, nsample: int = None, bn: bool = True, use_xyz: bool = True):\n        super().__init__(mlps=[mlp], npoint=npoint, radii=[radius], nsamples=[nsample], bn=bn, use_xyz=use_xyz)\n\n\nclass PointNet2FPModule(nn.Module):\n    r\"\"\"Propagates the features of one set to another\n    Parameters\n    ----------\n    mlp : list\n        Pointnet module parameters\n    bn : bool\n        Use batchnorm\n    \"\"\"\n    def __init__(self, *, mlp: List[int], bn: bool = True):\n        super().__init__()\n        self.mlp = pt_util.SharedMLP(mlp, bn=bn)\n\n    def forward(self, unknown: torch.Tensor, known: torch.Tensor, unknow_feats: torch.Tensor, known_feats: torch.Tensor) -> torch.Tensor:\n        r\"\"\"\n        Parameters\n        ----------\n        unknown : torch.Tensor\n            (B, n, 3) tensor of the xyz positions of the unknown features\n        known : torch.Tensor\n            (B, m, 3) tensor of the xyz positions of the known features\n        unknow_feats : torch.Tensor\n            (B, C1, n) tensor of the features to be propagated to\n        known_feats : torch.Tensor\n            (B, C2, m) tensor of features to be propagated\n        Returns\n        -------\n        new_features : torch.Tensor\n            (B, mlp[-1], n) tensor of the features of the unknown features\n        \"\"\"\n\n        if known is not None:\n            dist, idx = pointops.nearestneighbor(unknown, known)\n            dist_recip = 1.0 / (dist + 1e-8)\n            norm = torch.sum(dist_recip, dim=2, keepdim=True)\n            weight = dist_recip / norm\n            interpolated_feats = pointops.interpolation(known_feats, idx, weight)\n        else:\n            interpolated_feats = known_feats.expand(*known_feats.size()[0:2], unknown.size(1))\n\n        if unknow_feats is not None:\n            new_features = torch.cat([interpolated_feats, unknow_feats], dim=1)  # (B, C2 + C1, n)\n        else:\n            new_features = interpolated_feats\n        return self.mlp(new_features.unsqueeze(-1)).squeeze(-1)\n\n\nif __name__ == \"__main__\":\n    torch.manual_seed(1)\n    torch.cuda.manual_seed_all(1)\n    xyz = torch.randn(2, 9, 3, requires_grad=True).cuda()\n    xyz_feats = torch.randn(2, 9, 6, requires_grad=True).cuda()\n\n    test_module = PointNet2SAModuleMSG(npoint=2, radii=[5.0, 10.0], nsamples=[6, 3], mlps=[[9, 3], [9, 6]])\n    test_module.cuda()\n    print(test_module(xyz, xyz_feats))\n\n    # test_module = PointNet2FPModule(mlp=[6, 6])\n    # test_module.cuda()\n    # from torch.autograd import gradcheck\n    # inputs = (xyz, xyz, None, xyz_feats)\n    # test = gradcheck(test_module, inputs, eps=1e-6, atol=1e-4)\n    # print(test)\n\n    for _ in range(1):\n        _, new_features = test_module(xyz, xyz_feats)\n        new_features.backward(torch.cuda.FloatTensor(*new_features.size()).fill_(1))\n        print(new_features)\n        print(xyz.grad)\n"
  },
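PointNet2FPModule relies on `pointops.nearestneighbor` and `pointops.interpolation` (custom CUDA ops) to upsample coarse features: each fine point takes an inverse-distance-weighted average of the features of its nearest coarse points, with the `1e-8` term guarding against division by zero. Below is a minimal plain-PyTorch sketch of the same idea, assuming three neighbours as in the usual PointNet++ formulation; `interpolate_knn` is an illustrative helper, not part of the repository.

```python
import torch

def interpolate_knn(unknown, known, known_feats, k=3):
    """unknown: (B, n, 3), known: (B, m, 3), known_feats: (B, C, m) -> (B, C, n)."""
    dist = torch.cdist(unknown, known)                     # (B, n, m) pairwise distances
    dist, idx = torch.topk(dist, k, dim=2, largest=False)  # k nearest known points per unknown point
    dist_recip = 1.0 / (dist + 1e-8)                       # closer neighbours get larger weights
    weight = dist_recip / dist_recip.sum(dim=2, keepdim=True)  # (B, n, k), rows sum to 1
    B, C, m = known_feats.shape
    n = unknown.shape[1]
    idx_exp = idx.unsqueeze(1).expand(B, C, n, k)
    neigh = known_feats.unsqueeze(2).expand(B, C, n, m).gather(3, idx_exp)  # (B, C, n, k)
    return (neigh * weight.unsqueeze(1)).sum(dim=3)        # weighted sum over the k neighbours

if __name__ == "__main__":
    torch.manual_seed(0)
    up = interpolate_knn(torch.randn(2, 16, 3), torch.randn(2, 4, 3), torch.randn(2, 8, 4))
    print(up.shape)  # torch.Size([2, 8, 16])
```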
  {
    "path": "model/pointnet2/pointnet2_seg.py",
    "content": "from collections import namedtuple\n\nimport torch\nimport torch.nn as nn\n\nfrom model.pointnet2.pointnet2_modules import PointNet2SAModule, PointNet2SAModuleMSG, PointNet2FPModule\nfrom util import pt_util\n\n\nclass PointNet2SSGSeg(nn.Module):\n    r\"\"\"\n        PointNet2 with single-scale grouping\n        Semantic segmentation network that uses feature propogation layers\n        Parameters\n        ----------\n        k: int\n            Number of semantics classes to predict over -- size of softmax classifier that run for each point\n        c: int = 6\n            Number of input channels in the feature descriptor for each point.  If the point cloud is Nx9, this\n            value should be 6 as in an Nx9 point cloud, 3 of the channels are xyz, and 6 are feature descriptors\n        use_xyz: bool = True\n            Whether or not to use the xyz position of a point as a feature\n    \"\"\"\n\n    def __init__(self, c=3, k=13, use_xyz=True):\n        super().__init__()\n        self.SA_modules = nn.ModuleList()\n        self.SA_modules.append(PointNet2SAModule(npoint=1024, nsample=32, mlp=[c, 32, 32, 64], use_xyz=use_xyz))\n        self.SA_modules.append(PointNet2SAModule(npoint=256, nsample=32, mlp=[64, 64, 64, 128], use_xyz=use_xyz))\n        self.SA_modules.append(PointNet2SAModule(npoint=64, nsample=32, mlp=[128, 128, 128, 256], use_xyz=use_xyz))\n        self.SA_modules.append(PointNet2SAModule(npoint=16, nsample=32, mlp=[256, 256, 256, 512], use_xyz=use_xyz))\n        self.FP_modules = nn.ModuleList()\n        self.FP_modules.append(PointNet2FPModule(mlp=[128 + c, 128, 128, 128]))\n        self.FP_modules.append(PointNet2FPModule(mlp=[256 + 64, 256, 128]))\n        self.FP_modules.append(PointNet2FPModule(mlp=[256 + 128, 256, 256]))\n        self.FP_modules.append(PointNet2FPModule(mlp=[512 + 256, 256, 256]))\n        self.FC_layer = nn.Sequential(pt_util.Conv2d(128, 128, bn=True), nn.Dropout(), pt_util.Conv2d(128, k, activation=None))\n\n    def _break_up_pc(self, pc):\n        xyz = pc[..., 0:3].contiguous()\n        features = (pc[..., 3:].transpose(1, 2).contiguous() if pc.size(-1) > 3 else None)\n        return xyz, features\n\n    def forward(self, pointcloud: torch.cuda.FloatTensor):\n        r\"\"\"\n            Forward pass of the network\n            Parameters\n            ----------\n            pointcloud: Variable(torch.cuda.FloatTensor)\n                (B, N, 3 + input_channels) tensor\n                Point cloud to run predicts on\n                Each point in the point-cloud MUST\n                be formated as (x, y, z, features...)\n        \"\"\"\n        xyz, features = self._break_up_pc(pointcloud)\n        l_xyz, l_features = [xyz], [features]\n        for i in range(len(self.SA_modules)):\n            li_xyz, li_features = self.SA_modules[i](l_xyz[i], l_features[i])\n            l_xyz.append(li_xyz)\n            l_features.append(li_features)\n        for i in range(-1, -(len(self.FP_modules) + 1), -1):\n            l_features[i - 1] = self.FP_modules[i](l_xyz[i - 1], l_xyz[i], l_features[i - 1], l_features[i])\n        # return self.FC_layer(l_features[0])\n        return self.FC_layer(l_features[0].unsqueeze(-1)).squeeze(-1)\n\n\nclass PointNet2MSGSeg(PointNet2SSGSeg):\n    r\"\"\"\n        PointNet2 with multi-scale grouping\n        Semantic segmentation network that uses feature propogation layers\n        Parameters\n        ----------\n        k: int\n            Number of semantics classes to predict over -- size of softmax 
classifier that is run for each point\n        c: int = 6\n            Number of input channels in the feature descriptor for each point.  If the point cloud is Nx9, this\n            value should be 6 as in an Nx9 point cloud, 3 of the channels are xyz, and 6 are feature descriptors\n        use_xyz: bool = True\n            Whether or not to use the xyz position of a point as a feature\n    \"\"\"\n\n    def __init__(self, k, c=6, use_xyz=True):\n        super().__init__()\n        self.SA_modules = nn.ModuleList()\n        c_in = c\n        self.SA_modules.append(PointNet2SAModuleMSG(npoint=1024, radii=[0.05, 0.1], nsamples=[16, 32], mlps=[[c_in, 16, 16, 32], [c_in, 32, 32, 64]], use_xyz=use_xyz ))\n        c_out_0 = 32 + 64\n        c_in = c_out_0\n        self.SA_modules.append(PointNet2SAModuleMSG(npoint=256, radii=[0.1, 0.2], nsamples=[16, 32], mlps=[[c_in, 64, 64, 128], [c_in, 64, 96, 128]], use_xyz=use_xyz))\n        c_out_1 = 128 + 128\n        c_in = c_out_1\n        self.SA_modules.append(PointNet2SAModuleMSG(npoint=64, radii=[0.2, 0.4], nsamples=[16, 32], mlps=[[c_in, 128, 196, 256], [c_in, 128, 196, 256]], use_xyz=use_xyz))\n        c_out_2 = 256 + 256\n        c_in = c_out_2\n        self.SA_modules.append(PointNet2SAModuleMSG(npoint=16, radii=[0.4, 0.8], nsamples=[16, 32], mlps=[[c_in, 256, 256, 512], [c_in, 256, 384, 512]], use_xyz=use_xyz))\n        c_out_3 = 512 + 512\n        self.FP_modules = nn.ModuleList()\n        self.FP_modules.append(PointNet2FPModule(mlp=[256 + c, 128, 128]))\n        self.FP_modules.append(PointNet2FPModule(mlp=[512 + c_out_0, 256, 256]))\n        self.FP_modules.append(PointNet2FPModule(mlp=[512 + c_out_1, 512, 512]))\n        self.FP_modules.append(PointNet2FPModule(mlp=[c_out_3 + c_out_2, 512, 512]))\n        self.FC_layer = nn.Sequential(pt_util.Conv2d(128, 128, bn=True), nn.Dropout(), pt_util.Conv2d(128, k, activation=None))\n\n\ndef model_fn_decorator(criterion):\n    ModelReturn = namedtuple(\"ModelReturn\", ['preds', 'loss', 'acc'])\n\n    def model_fn(model, data, eval=False):\n        with torch.set_grad_enabled(not eval):\n            inputs, labels = data\n            inputs = inputs.cuda(non_blocking=True)\n            labels = labels.cuda(non_blocking=True)\n            preds = model(inputs)\n            loss = criterion(preds, labels)\n            _, classes = torch.max(preds, 1)\n            acc = (classes == labels).float().sum() / labels.numel()\n            return ModelReturn(preds, loss, {\"acc\": acc.item(), 'loss': loss.item()})\n    return model_fn\n\n\nif __name__ == \"__main__\":\n    import torch.optim as optim\n    B, N, C, K = 2, 4096, 3, 13\n    inputs = torch.randn(B, N, 6).cuda()\n    labels = torch.randint(0, 3, (B, N)).cuda()\n\n    model = PointNet2SSGSeg(c=C, k=K).cuda()\n    optimizer = optim.SGD(model.parameters(), lr=5e-2, momentum=0.9, weight_decay=1e-4)\n    print(\"Testing SSGSeg with xyz\")\n    model_fn = model_fn_decorator(nn.CrossEntropyLoss())\n    for _ in range(5):\n        optimizer.zero_grad()\n        _, loss, _ = model_fn(model, (inputs, labels))\n        loss.backward()\n        print(loss.item())\n        optimizer.step()\n\n    model = PointNet2SSGSeg(c=C, k=K, use_xyz=False).cuda()\n    optimizer = optim.SGD(model.parameters(), lr=5e-2, momentum=0.9, weight_decay=1e-4)\n    print(\"Testing SSGSeg without xyz\")\n    model_fn = model_fn_decorator(nn.CrossEntropyLoss())\n    for _ in range(5):\n        optimizer.zero_grad()\n        _, loss, _ = model_fn(model, (inputs, labels))\n        loss.backward()\n        print(loss.item())\n        optimizer.step()\n\n    model = PointNet2MSGSeg(c=C, k=K).cuda()\n    optimizer = optim.SGD(model.parameters(), lr=5e-2, momentum=0.9, weight_decay=1e-4)\n    print(\"Testing MSGSeg with xyz\")\n    model_fn = model_fn_decorator(nn.CrossEntropyLoss())\n    for _ in range(5):\n        optimizer.zero_grad()\n        _, loss, _ = model_fn(model, (inputs, labels))\n        loss.backward()\n        print(loss.item())\n        optimizer.step()\n\n    model = PointNet2MSGSeg(c=C, k=K, use_xyz=False).cuda()\n    optimizer = optim.SGD(model.parameters(), lr=5e-2, momentum=0.9, weight_decay=1e-4)\n    print(\"Testing MSGSeg without xyz\")\n    model_fn = model_fn_decorator(nn.CrossEntropyLoss())\n    for _ in range(5):\n        optimizer.zero_grad()\n        _, loss, _ = model_fn(model, (inputs, labels))\n        loss.backward()\n        print(loss.item())\n        optimizer.step()\n\n"
  },
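The decoder loop in `PointNet2SSGSeg.forward` uses negative indices to pair each FP module with two consecutive levels of the SA pyramid, propagating features from the coarsest set (16 points) back to the full input resolution. A tiny standalone sketch of which levels each iteration connects, assuming the 4 SA levels defined above:

```python
# l_xyz / l_features have n_sa + 1 entries: the raw input plus one entry per SA module.
n_sa = 4
levels = list(range(n_sa + 1))
for i in range(-1, -(n_sa + 1), -1):
    print(f"FP_modules[{i}] : level {levels[i]} -> level {levels[i - 1]}")
# FP_modules[-1] : level 4 -> level 3
# FP_modules[-2] : level 3 -> level 2
# FP_modules[-3] : level 2 -> level 1
# FP_modules[-4] : level 1 -> level 0  (back to the original N points)
```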
  {
    "path": "model/pointweb/pointweb_module.py",
    "content": "from typing import List\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\n\nfrom lib.pointops.functions import pointops\nfrom util import pt_util\n\n\nclass _AFAModule(nn.Module):\n    def __init__(self, mlp, use_softmax=False):\n        r\"\"\"\n        :param mlp: mlp for learning weight\n               mode: transformation or aggregation\n        \"\"\"\n        super().__init__()\n        self.mlp = mlp\n        self.use_softmax = use_softmax\n\n    def forward(self, feature: torch.Tensor) -> torch.Tensor:\n        r\"\"\"\n        Parameters\n        ----------\n        features : torch.Tensor\n            (B, C, N, M) or (B, C, N)\n        Returns\n        -------\n        new_features : torch.Tensor\n            transformation: (B, C, N, M) or (B, C, N)\n            aggregation: (B, C, N) or (B, C)\n        \"\"\"\n        B, C, N, M = feature.size()\n        feature = feature.transpose(1, 2).contiguous().view(B * N, C, M, 1).repeat(1, 1, 1, M)  # (BN, C, M, M)\n        feature = feature - feature.transpose(2, 3).contiguous() + torch.mul(feature, torch.eye(M).view(1, 1, M, M).cuda())  # (BN, C, M, M)\n        weight = self.mlp(feature)\n        if self.use_softmax:\n            weight = F.softmax(weight, -1)\n        feature = (feature * weight).sum(-1).view(B, N, C, M).transpose(1, 2).contiguous()  # (B, C, N, M)\n        return feature\n\n\nclass _PointWebSAModuleBase(nn.Module):\n    def __init__(self):\n        super().__init__()\n        self.npoint = None\n        self.grouper = None\n        self.mlp = None\n        self.afa = None\n\n    def forward(self, xyz: torch.Tensor, features: torch.Tensor = None) -> (torch.Tensor, torch.Tensor):\n        r\"\"\"\n        Parameters\n        ----------\n        xyz : torch.Tensor\n            (B, N, 3) tensor of the xyz coordinates of the features\n        features : torch.Tensor\n            (B, C, N) tensor of the descriptors of the the features\n        Returns\n        -------\n        new_xyz : torch.Tensor\n            (B, npoint, 3) tensor of the new features' xyz\n        new_features : torch.Tensor\n            (B, npoint, \\sum_k(mlps[k][-1])) tensor of the new_features descriptors\n        \"\"\"\n        new_features_list = []\n        xyz_trans = xyz.transpose(1, 2).contiguous()\n        new_xyz = pointops.gathering(\n            xyz_trans,\n            pointops.furthestsampling(xyz, self.npoint)\n        ).transpose(1, 2).contiguous() if self.npoint is not None else None\n        new_features = self.grouper(xyz, new_xyz, features)  # (B, C, npoint, nsample)\n        if new_features.shape[2] != 1:  # for npoint is none\n            new_features = new_features + self.afa(new_features)  # (B, C, npoint, nsample)\n        new_features = self.mlp(new_features)\n        new_features = F.max_pool2d(new_features, kernel_size=[1, new_features.size(3)]).squeeze(-1)  # (B, mlp[-1], npoint)\n        new_features_list.append(new_features)\n        return new_xyz, torch.cat(new_features_list, dim=1)\n\n\nclass PointWebSAModule(_PointWebSAModuleBase):\n    r\"\"\"Pointnet set abstrction layer with multiscale grouping\n    Parameters\n    ----------\n    npoint : int\n        Number of features\n    nsample : int32\n        Number of sample\n    mlps : list of int32\n        Spec of the MLP before the global max_pool\n    mlps2: list of list of int32\n        Spec of the MLP for AFA\n    bn : bool\n        Use batchnorm\n    \"\"\"\n    def __init__(self, *, npoint: int = None, nsample: int = None, 
mlp: List[int] = None, mlp2: List[int] = None, bn: bool = True, use_xyz: bool = True, use_bn = True):\n        super().__init__()\n        self.npoint = npoint\n        self.grouper = pointops.QueryAndGroup(nsample=nsample, use_xyz=use_xyz) if npoint is not None else pointops.GroupAll(use_xyz)\n        if use_xyz:\n            mlp[0] += 3\n        if npoint is not None:\n            mlp_tmp = pt_util.SharedMLP([mlp[0]] + mlp2, bn=use_bn)\n            mlp_tmp.add_module('weight', (pt_util.SharedMLP([mlp2[-1], mlp[0]], bn=False, activation=None)))\n            self.afa = _AFAModule(mlp=mlp_tmp)\n        self.mlp = pt_util.SharedMLP(mlp, bn=bn)\n\n\nif __name__ == \"__main__\":\n    torch.manual_seed(1)\n    torch.cuda.manual_seed_all(1)\n    c = 6\n    xyz = torch.randn(2, 8, 3, requires_grad=True).cuda()\n    xyz_feats = torch.randn(2, 8, c, requires_grad=True).cuda()\n\n    test_module = PointWebSAModule(npoint=2, nsample=6, mlp=[c, 32, 32], mlp2=[16, 16], use_bn=True)\n    test_module.cuda()\n    xyz_feats = xyz_feats.transpose(1, 2).contiguous()\n    print(test_module)\n    print(test_module(xyz, xyz_feats))\n\n    for _ in range(1):\n        _, new_features = test_module(xyz, xyz_feats)\n        new_features.backward(torch.cuda.FloatTensor(*new_features.size()).fill_(1))\n        print(new_features)\n        print(xyz.grad)\n"
  },
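Before its weight-learning MLP, `_AFAModule` turns every local region of M grouped points into an M x M interaction map: entry (i, j) holds the feature difference f_i - f_j for i != j, while the diagonal keeps f_i itself; the learned (optionally softmax-normalised) weights then aggregate each row back into an adjusted feature that is added residually in the SA module. A tiny CPU sketch of that map for a single region (illustrative only; the module performs the same algebra batched over all B x N regions):

```python
import torch

M, C = 3, 2                                       # 3 points in the local region, 2 channels
f = torch.arange(1.0, 1.0 + C * M).view(1, C, M)  # toy features, shape (B*N, C, M)

pair = f.unsqueeze(-1).repeat(1, 1, 1, M)         # (B*N, C, M, M); [., c, i, j] = f_c(i)
pair = pair - pair.transpose(2, 3) + pair * torch.eye(M)
# off-diagonal (i, j): f_c(i) - f_c(j); diagonal (i, i): f_c(i)
print(pair[0, 0])
# tensor([[ 1., -1., -2.],
#         [ 1.,  2., -1.],
#         [ 2.,  1.,  3.]])
```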
  {
    "path": "model/pointweb/pointweb_seg.py",
    "content": "from collections import namedtuple\n\nimport torch\nimport torch.nn as nn\n\nfrom model.pointweb.pointweb_module import PointWebSAModule\nfrom model.pointnet2.pointnet2_modules import PointNet2FPModule\nfrom util import pt_util\n\n\nclass PointWebSeg(nn.Module):\n    r\"\"\"\n        PointNet2 with single-scale grouping\n        Semantic segmentation network that uses feature propogation layers\n        Parameters\n        ----------\n        k: int\n            Number of semantics classes to predict over -- size of softmax classifier that run for each point\n        c: int = 6\n            Number of input channels in the feature descriptor for each point.  If the point cloud is Nx9, this\n            value should be 6 as in an Nx9 point cloud, 3 of the channels are xyz, and 6 are feature descriptors\n        use_xyz: bool = True\n            Whether or not to use the xyz position of a point as a feature\n    \"\"\"\n\n    def __init__(self, c=3, k=13, use_xyz=True):\n        super().__init__()\n        self.SA_modules = nn.ModuleList()\n        self.SA_modules.append(PointWebSAModule(npoint=1024, nsample=32, mlp=[c, 32, 32, 64], mlp2=[32, 32], use_xyz=use_xyz))\n        self.SA_modules.append(PointWebSAModule(npoint=256, nsample=32, mlp=[64, 64, 64, 128], mlp2=[32, 32], use_xyz=use_xyz))\n        self.SA_modules.append(PointWebSAModule(npoint=64, nsample=32, mlp=[128, 128, 128, 256], mlp2=[32, 32], use_xyz=use_xyz))\n        self.SA_modules.append(PointWebSAModule(npoint=16, nsample=32, mlp=[256, 256, 256, 512], mlp2=[32, 32], use_xyz=use_xyz))\n\n        self.FP_modules = nn.ModuleList()\n        self.FP_modules.append(PointNet2FPModule(mlp=[128 + c, 128, 128, 128]))\n        self.FP_modules.append(PointNet2FPModule(mlp=[256 + 64, 256, 128]))\n        self.FP_modules.append(PointNet2FPModule(mlp=[256 + 128, 256, 256]))\n        self.FP_modules.append(PointNet2FPModule(mlp=[512 + 256, 256, 256]))\n        self.FC_layer = nn.Sequential(pt_util.Conv2d(128, 128, bn=True), nn.Dropout(), pt_util.Conv2d(128, k, activation=None))\n\n    def _break_up_pc(self, pc):\n        xyz = pc[..., 0:3].contiguous()\n        features = (pc[..., 3:].transpose(1, 2).contiguous() if pc.size(-1) > 3 else None)\n        return xyz, features\n\n    def forward(self, pointcloud: torch.cuda.FloatTensor):\n        r\"\"\"\n            Forward pass of the network\n            Parameters\n            ----------\n            pointcloud: Variable(torch.cuda.FloatTensor)\n                (B, N, 3 + input_channels) tensor\n                Point cloud to run predicts on\n                Each point in the point-cloud MUST\n                be formated as (x, y, z, features...)\n        \"\"\"\n        xyz, features = self._break_up_pc(pointcloud)\n        l_xyz, l_features = [xyz], [features]\n        for i in range(len(self.SA_modules)):\n            li_xyz, li_features = self.SA_modules[i](l_xyz[i], l_features[i])\n            l_xyz.append(li_xyz)\n            l_features.append(li_features)\n        for i in range(-1, -(len(self.FP_modules) + 1), -1):\n            l_features[i - 1] = self.FP_modules[i](l_xyz[i - 1], l_xyz[i], l_features[i - 1], l_features[i])\n        return self.FC_layer(l_features[0].unsqueeze(-1)).squeeze(-1)\n\n\ndef model_fn_decorator(criterion):\n    ModelReturn = namedtuple(\"ModelReturn\", ['preds', 'loss', 'acc'])\n\n    def model_fn(model, data, epoch=0, eval=False):\n        with torch.set_grad_enabled(not eval):\n            inputs, labels = data\n            inputs = 
inputs.cuda(non_blocking=True)\n            labels = labels.cuda(non_blocking=True)\n            preds = model(inputs)\n            loss = criterion(preds, labels)\n            _, classes = torch.max(preds, 1)\n            acc = (classes == labels).float().sum() / labels.numel()\n            return ModelReturn(preds, loss, {\"acc\": acc.item(), 'loss': loss.item()})\n    return model_fn\n\n\nif __name__ == \"__main__\":\n    import torch.optim as optim\n    B, N, C, K = 2, 4096, 3, 13\n    inputs = torch.randn(B, N, 6).cuda()\n    labels = torch.randint(0, 3, (B, N)).cuda()\n\n    model = PointWebSeg(c=C, k=K).cuda()\n    print(model)\n    optimizer = optim.SGD(model.parameters(), lr=5e-2, momentum=0.9, weight_decay=1e-4)\n    print(\"Testing PointWebSeg with xyz\")\n    model_fn = model_fn_decorator(nn.CrossEntropyLoss())\n    for _ in range(5):\n        optimizer.zero_grad()\n        _, loss, _ = model_fn(model, (inputs, labels))\n        loss.backward()\n        print(loss.item())\n        optimizer.step()\n\n    model = PointWebSeg(c=C, k=K, use_xyz=False).cuda()\n    print(model)\n    optimizer = optim.SGD(model.parameters(), lr=5e-2, momentum=0.9, weight_decay=1e-4)\n    print(\"Testing PointWebSeg without xyz\")\n    model_fn = model_fn_decorator(nn.CrossEntropyLoss())\n    for _ in range(5):\n        optimizer.zero_grad()\n        _, loss, _ = model_fn(model, (inputs, labels))\n        loss.backward()\n        print(loss.item())\n        optimizer.step()\n"
  },
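`PointWebSeg.forward` returns per-point logits of shape (B, k, N), which `model_fn` feeds straight into the criterion: `nn.CrossEntropyLoss` accepts (B, C, d1, ...) logits with (B, d1, ...) integer targets, so no transpose or reshape is needed, and `torch.max(preds, 1)` likewise yields per-point class indices. A shape-only sketch (the `ignore_index` value here is illustrative; the real one comes from the config's `ignore_label`):

```python
import torch
import torch.nn as nn

B, k, N = 2, 13, 4096
preds = torch.randn(B, k, N)              # what PointWebSeg.forward returns
labels = torch.randint(0, k, (B, N))      # per-point class indices
loss = nn.CrossEntropyLoss(ignore_index=255)(preds, labels)
acc = (preds.argmax(dim=1) == labels).float().mean()
print(loss.item(), acc.item())
```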
  {
    "path": "tool/test.sh",
    "content": "#!/bin/sh\nexport PYTHONPATH=./\n\nPYTHON=python\ndataset=$1\nexp_name=$2\nexp_dir=exp/${dataset}/${exp_name}\nmodel_dir=${exp_dir}/model\nconfig=config/${dataset}/${dataset}_${exp_name}.yaml\n\nmkdir -p ${model_dir}\nnow=$(date +\"%Y%m%d_%H%M%S\")\n\nif [ ${dataset} = 's3dis' ]\nthen\n  cp tool/test.sh tool/test_s3dis.py ${config} ${exp_dir}\n  $PYTHON tool/test_s3dis.py --config=${config} 2>&1 | tee ${model_dir}/test-$now.log\nelif [ ${dataset} = 'scannet' ]\nthen\n  cp tool/test.sh tool/test_scannet.py ${config} ${exp_dir}\n  $PYTHON tool/test_scannet.py --config=${config} 2>&1 | tee ${model_dir}/test-$now.log\nfi\n"
  },
  {
    "path": "tool/test_s3dis.py",
    "content": "import os\nimport time\nimport random\nimport numpy as np\nimport logging\nimport pickle\nimport argparse\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.parallel\nimport torch.optim\nimport torch.utils.data\n\nfrom util import config\nfrom util.util import AverageMeter, intersectionAndUnion, check_makedirs\n\nrandom.seed(123)\nnp.random.seed(123)\n\n\ndef get_parser():\n    parser = argparse.ArgumentParser(description='PyTorch Point Cloud Classification / Semantic Segmentation')\n    parser.add_argument('--config', type=str, default='config/s3dis/s3dis_pointweb.yaml', help='config file')\n    parser.add_argument('opts', help='see config/s3dis/s3dis_pointweb.yaml for all options', default=None, nargs=argparse.REMAINDER)\n    args = parser.parse_args()\n    assert args.config is not None\n    cfg = config.load_cfg_from_cfg_file(args.config)\n    if args.opts is not None:\n        cfg = config.merge_cfg_from_list(cfg, args.opts)\n    return cfg\n\n\ndef get_logger():\n    logger_name = \"main-logger\"\n    logger = logging.getLogger(logger_name)\n    logger.setLevel(logging.INFO)\n    handler = logging.StreamHandler()\n    fmt = \"[%(asctime)s %(levelname)s %(filename)s line %(lineno)d %(process)d] %(message)s\"\n    handler.setFormatter(logging.Formatter(fmt))\n    logger.addHandler(handler)\n    return logger\n\n\ndef main():\n    global args, logger\n    args = get_parser()\n    logger = get_logger()\n    logger.info(args)\n    assert args.classes > 1\n    logger.info(\"=> creating model ...\")\n    logger.info(\"Classes: {}\".format(args.classes))\n\n    if args.arch == 'pointnet_seg':\n        from model.pointnet.pointnet import PointNetSeg as Model\n    elif args.arch == 'pointnet2_seg':\n        from model.pointnet2.pointnet2_seg import PointNet2SSGSeg as Model\n    elif args.arch == 'pointweb_seg':\n        from model.pointweb.pointweb_seg import PointWebSeg as Model\n    else:\n        raise Exception('architecture not supported yet'.format(args.arch))\n    model = Model(c=args.fea_dim, k=args.classes, use_xyz=args.use_xyz)\n    model = torch.nn.DataParallel(model.cuda())\n    logger.info(model)\n    criterion = nn.CrossEntropyLoss(ignore_index=args.ignore_label).cuda()\n    names = [line.rstrip('\\n') for line in open(args.names_path)]\n    if os.path.isfile(args.model_path):\n        logger.info(\"=> loading checkpoint '{}'\".format(args.model_path))\n        checkpoint = torch.load(args.model_path)\n        model.load_state_dict(checkpoint['state_dict'], strict=False)\n        logger.info(\"=> loaded checkpoint '{}'\".format(args.model_path))\n    else:\n        raise RuntimeError(\"=> no checkpoint found at '{}'\".format(args.model_path))\n    test(model, criterion, names)\n\n\ndef data_prepare(room_path):\n    room_data = np.load(room_path)\n    points, labels = room_data[:, 0:6], room_data[:, 6]  # xyzrgb, N*6; l, N\n    coord_min, coord_max = np.amin(points, axis=0)[:3], np.amax(points, axis=0)[:3]\n    stride = args.block_size * args.stride_rate\n    grid_x = int(np.ceil(float(coord_max[0] - coord_min[0] - args.block_size) / stride) + 1)\n    grid_y = int(np.ceil(float(coord_max[1] - coord_min[1] - args.block_size) / stride) + 1)\n    data_room, label_room, index_room = np.array([]), np.array([]), np.array([])\n    for index_y in range(0, grid_y):\n        for index_x in range(0, grid_x):\n            s_x = coord_min[0] + index_x * stride\n            e_x = min(s_x + args.block_size, coord_max[0])\n            s_x = e_x - args.block_size\n            
s_y = coord_min[1] + index_y * stride\n            e_y = min(s_y + args.block_size, coord_max[1])\n            s_y = e_y - args.block_size\n            point_idxs = np.where((points[:, 0] >= s_x - 1e-8) & (points[:, 0] <= e_x + 1e-8) & (points[:, 1] >= s_y - 1e-8) & (points[:, 1] <= e_y + 1e-8))[0]\n            if point_idxs.size == 0:\n                continue\n            num_batch = int(np.ceil(point_idxs.size / args.num_point))\n            point_size = int(num_batch * args.num_point)\n            replace = False if (point_size - point_idxs.size <= point_idxs.size) else True\n            point_idxs_repeat = np.random.choice(point_idxs, point_size - point_idxs.size, replace=replace)\n            point_idxs = np.concatenate((point_idxs, point_idxs_repeat))\n            np.random.shuffle(point_idxs)\n            data_batch = points[point_idxs, :]\n            normlized_xyz = np.zeros((point_size, 3))\n            normlized_xyz[:, 0] = data_batch[:, 0] / coord_max[0]\n            normlized_xyz[:, 1] = data_batch[:, 1] / coord_max[1]\n            normlized_xyz[:, 2] = data_batch[:, 2] / coord_max[2]\n            data_batch[:, 0] = data_batch[:, 0] - (s_x + args.block_size / 2.0)\n            data_batch[:, 1] = data_batch[:, 1] - (s_y + args.block_size / 2.0)\n            data_batch[:, 3:6] /= 255.0\n            data_batch = np.concatenate((data_batch, normlized_xyz), axis=1)\n            label_batch = labels[point_idxs]\n            data_room = np.vstack([data_room, data_batch]) if data_room.size else data_batch\n            label_room = np.hstack([label_room, label_batch]) if label_room.size else label_batch\n            index_room = np.hstack([index_room, point_idxs]) if index_room.size else point_idxs\n    assert np.unique(index_room).size == labels.size\n    return data_room, label_room, index_room, labels\n\n\ndef test(model, criterion, names):\n    logger.info('>>>>>>>>>>>>>>>> Start Evaluation >>>>>>>>>>>>>>>>')\n    batch_time = AverageMeter()\n    intersection_meter = AverageMeter()\n    union_meter = AverageMeter()\n    target_meter = AverageMeter()\n\n    model.eval()\n    rooms = sorted(os.listdir(args.train_full_folder))\n    rooms_split = [room for room in rooms if 'Area_{}'.format(args.test_area) in room]\n    gt_all, pred_all = np.array([]), np.array([])\n    check_makedirs(args.save_folder)\n    pred_save, gt_save = [], []\n    for idx, room_name in enumerate(rooms_split):\n        data_room, label_room, index_room, gt = data_prepare(os.path.join(args.train_full_folder, room_name))\n        batch_point = args.num_point * args.test_batch_size\n        batch_num = int(np.ceil(label_room.size / batch_point))\n        end = time.time()\n        output_room = np.array([])\n        for i in range(batch_num):\n            s_i, e_i = i * batch_point, min((i + 1) * batch_point, label_room.size)\n            input, target, index = data_room[s_i:e_i, :], label_room[s_i:e_i], index_room[s_i:e_i]\n            input = torch.from_numpy(input).float().view(-1, args.num_point, input.shape[1])\n            target = torch.from_numpy(target).long().view(-1, args.num_point)\n            with torch.no_grad():\n                output = model(input.cuda())\n            loss = criterion(output, target.cuda())  # for reference\n            output = output.transpose(1, 2).contiguous().view(-1, args.classes).data.cpu().numpy()\n            pred = np.argmax(output, axis=1)\n            intersection, union, target = intersectionAndUnion(pred, target.view(-1).data.cpu().numpy(), args.classes, 
args.ignore_label)\n            accuracy = sum(intersection) / (sum(target) + 1e-10)\n            output_room = np.vstack([output_room, output]) if output_room.size else output\n            batch_time.update(time.time() - end)\n            end = time.time()\n            if ((i + 1) % args.print_freq == 0) or (i + 1 == batch_num):\n                logger.info('Test: [{}/{}]-[{}/{}] '\n                            'Batch {batch_time.val:.3f} ({batch_time.avg:.3f}) '\n                            'Loss {loss:.4f} '\n                            'Accuracy {accuracy:.4f} '\n                            'Points {gt.size}.'.format(idx + 1, len(rooms_split),\n                                                       i + 1, batch_num,\n                                                       batch_time=batch_time,\n                                                       loss=loss,\n                                                       accuracy=accuracy,\n                                                       gt=gt))\n        '''\n        unq, unq_inv, unq_cnt = np.unique(index_room, return_inverse=True, return_counts=True)\n        index_array = np.split(np.argsort(unq_inv), np.cumsum(unq_cnt[:-1]))\n        output_room = np.vstack([output_room, np.zeros((1, args.classes))])\n        index_array_fill = np.array(list(itertools.zip_longest(*index_array, fillvalue=output_room.shape[0] - 1))).T\n        pred = output_room[index_array_fill].sum(1)\n        pred = np.argmax(pred, axis=1)\n        '''\n        pred = np.zeros((gt.size, args.classes))\n        for j in range(len(index_room)):\n            pred[index_room[j]] += output_room[j]\n        pred = np.argmax(pred, axis=1)\n\n        # calculation 1: add per room predictions\n        intersection, union, target = intersectionAndUnion(pred, gt, args.classes, args.ignore_label)\n        intersection_meter.update(intersection)\n        union_meter.update(union)\n        target_meter.update(target)\n        # calculation 2\n        pred_all = np.hstack([pred_all, pred]) if pred_all.size else pred\n        gt_all = np.hstack([gt_all, gt]) if gt_all.size else gt\n        pred_save.append(pred), gt_save.append(gt)\n\n    with open(os.path.join(args.save_folder, \"pred_{}.pickle\".format(args.test_area)), 'wb') as handle:\n        pickle.dump({'pred': pred_save}, handle, protocol=pickle.HIGHEST_PROTOCOL)\n    with open(os.path.join(args.save_folder, \"gt_{}.pickle\".format(args.test_area)), 'wb') as handle:\n        pickle.dump({'gt': gt_save}, handle, protocol=pickle.HIGHEST_PROTOCOL)\n\n    # calculation 1\n    iou_class = intersection_meter.sum / (union_meter.sum + 1e-10)\n    accuracy_class = intersection_meter.sum / (target_meter.sum + 1e-10)\n    mIoU1 = np.mean(iou_class)\n    mAcc1 = np.mean(accuracy_class)\n    allAcc1 = sum(intersection_meter.sum) / (sum(target_meter.sum) + 1e-10)\n\n    # calculation 2\n    intersection, union, target = intersectionAndUnion(pred_all, gt_all, args.classes, args.ignore_label)\n    iou_class = intersection / (union + 1e-10)\n    accuracy_class = intersection / (target + 1e-10)\n    mIoU = np.mean(iou_class)\n    mAcc = np.mean(accuracy_class)\n    allAcc = sum(intersection) / (sum(target) + 1e-10)\n    logger.info('Val result: mIoU/mAcc/allAcc {:.4f}/{:.4f}/{:.4f}.'.format(mIoU, mAcc, allAcc))\n    logger.info('Val1 result: mIoU/mAcc/allAcc {:.4f}/{:.4f}/{:.4f}.'.format(mIoU1, mAcc1, allAcc1))\n\n    for i in range(args.classes):\n        logger.info('Class_{} Result: iou/accuracy {:.4f}/{:.4f}, name: {}.'.format(i, 
iou_class[i], accuracy_class[i], names[i]))\n    logger.info('<<<<<<<<<<<<<<<<< End Evaluation <<<<<<<<<<<<<<<<<')\n    return mIoU, mAcc, allAcc, pred_all\n\n\nif __name__ == '__main__':\n    main()\n"
  },
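`data_prepare` covers each room with overlapping blocks: windows of `block_size` metres are placed every `block_size * stride_rate` metres along x and y, each window is shifted back so it never extends past the room's maximum coordinate, and per-block predictions are later summed back onto the original points through `index_room`. A 1-D sketch of the grid arithmetic with illustrative numbers (the real values come from the config's `block_size`, `stride_rate` and `num_point`):

```python
import numpy as np

coord_min, coord_max = 0.0, 5.3          # toy 1-D room extent in metres
block_size, stride_rate = 1.0, 0.5
stride = block_size * stride_rate
grid = int(np.ceil((coord_max - coord_min - block_size) / stride) + 1)
for index in range(grid):
    s = coord_min + index * stride
    e = min(s + block_size, coord_max)
    s = e - block_size                   # clamp so the final block stays inside the room
    print(f"block {index}: [{s:.2f}, {e:.2f}]")
```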
  {
    "path": "tool/test_scannet.py",
    "content": "import os\nimport time\nimport random\nimport numpy as np\nimport logging\nimport pickle\nimport argparse\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.parallel\nimport torch.optim\nimport torch.utils.data\n\nfrom util import config\nfrom util.util import AverageMeter, intersectionAndUnion, check_makedirs\n\nrandom.seed(123)\nnp.random.seed(123)\n\n\ndef get_parser():\n    parser = argparse.ArgumentParser(description='PyTorch Point Cloud Classification / Semantic Segmentation')\n    parser.add_argument('--config', type=str, default='config/scannet/scannet_pointweb.yaml', help='config file')\n    parser.add_argument('opts', help='see config/scannet/scannet_pointweb.yaml for all options', default=None, nargs=argparse.REMAINDER)\n    args = parser.parse_args()\n    assert args.config is not None\n    cfg = config.load_cfg_from_cfg_file(args.config)\n    if args.opts is not None:\n        cfg = config.merge_cfg_from_list(cfg, args.opts)\n    return cfg\n\n\ndef get_logger():\n    logger_name = \"main-logger\"\n    logger = logging.getLogger(logger_name)\n    logger.setLevel(logging.INFO)\n    handler = logging.StreamHandler()\n    fmt = \"[%(asctime)s %(levelname)s %(filename)s line %(lineno)d %(process)d] %(message)s\"\n    handler.setFormatter(logging.Formatter(fmt))\n    logger.addHandler(handler)\n    return logger\n\n\ndef main():\n    global args, logger\n    args = get_parser()\n    logger = get_logger()\n    logger.info(args)\n    assert args.classes > 1\n    logger.info(\"=> creating model ...\")\n    logger.info(\"Classes: {}\".format(args.classes))\n\n    if args.arch == 'pointnet_seg':\n        from model.pointnet.pointnet import PointNetSeg as Model\n    elif args.arch == 'pointnet2_seg':\n        from model.pointnet2.pointnet2_seg import PointNet2SSGSeg as Model\n    elif args.arch == 'pointweb_seg':\n        from model.pointweb.pointweb_seg import PointWebSeg as Model\n    else:\n        raise Exception('architecture not supported yet'.format(args.arch))\n    model = Model(c=args.fea_dim, k=args.classes, use_xyz=args.use_xyz)\n    model = torch.nn.DataParallel(model.cuda())\n    logger.info(model)\n    criterion = nn.CrossEntropyLoss(ignore_index=args.ignore_label).cuda()\n    names = [line.rstrip('\\n') for line in open(args.names_path)]\n    if os.path.isfile(args.model_path):\n        logger.info(\"=> loading checkpoint '{}'\".format(args.model_path))\n        checkpoint = torch.load(args.model_path)\n        model.load_state_dict(checkpoint['state_dict'], strict=False)\n        logger.info(\"=> loaded checkpoint '{}'\".format(args.model_path))\n    else:\n        raise RuntimeError(\"=> no checkpoint found at '{}'\".format(args.model_path))\n    test(model, criterion, names)\n\n\ndef data_prepare(points, labels):\n    coord_min, coord_max = np.amin(points, axis=0)[:3], np.amax(points, axis=0)[:3]\n    stride = args.block_size * args.stride_rate\n    grid_x = int(np.ceil(float(coord_max[0] - coord_min[0] - args.block_size) / stride) + 1)\n    grid_y = int(np.ceil(float(coord_max[1] - coord_min[1] - args.block_size) / stride) + 1)\n    data_room, label_room, index_room = np.array([]), np.array([]), np.array([])\n    for index_y in range(0, grid_y):\n        for index_x in range(0, grid_x):\n            s_x = coord_min[0] + index_x * stride\n            e_x = min(s_x + args.block_size, coord_max[0])\n            s_x = e_x - args.block_size\n            s_y = coord_min[1] + index_y * stride\n            e_y = min(s_y + args.block_size, coord_max[1])\n   
         s_y = e_y - args.block_size\n            point_idxs = np.where((points[:, 0] >= s_x - 1e-8) & (points[:, 0] <= e_x + 1e-8) & (points[:, 1] >= s_y - 1e-8) & (points[:, 1] <= e_y + 1e-8))[0]\n            if point_idxs.size == 0:\n                continue\n            num_batch = int(np.ceil(point_idxs.size / args.num_point))\n            point_size = int(num_batch * args.num_point)\n            replace = False if (point_size - point_idxs.size <= point_idxs.size) else True\n            point_idxs_repeat = np.random.choice(point_idxs, point_size - point_idxs.size, replace=replace)\n            point_idxs = np.concatenate((point_idxs, point_idxs_repeat))\n            np.random.shuffle(point_idxs)\n            data_batch = points[point_idxs, :]\n            normlized_xyz = np.zeros((point_size, 3))\n            normlized_xyz[:, 0] = data_batch[:, 0] / coord_max[0]\n            normlized_xyz[:, 1] = data_batch[:, 1] / coord_max[1]\n            normlized_xyz[:, 2] = data_batch[:, 2] / coord_max[2]\n            data_batch[:, 0] = data_batch[:, 0] - (s_x + args.block_size / 2.0)\n            data_batch[:, 1] = data_batch[:, 1] - (s_y + args.block_size / 2.0)\n            data_batch = np.concatenate((data_batch, normlized_xyz), axis=1)\n            label_batch = labels[point_idxs]\n            data_room = np.vstack([data_room, data_batch]) if data_room.size else data_batch\n            label_room = np.hstack([label_room, label_batch]) if label_room.size else label_batch\n            index_room = np.hstack([index_room, point_idxs]) if index_room.size else point_idxs\n    assert np.unique(index_room).size == labels.size\n    return data_room, label_room, index_room\n\n\ndef test(model, criterion, names):\n    logger.info('>>>>>>>>>>>>>>>> Start Evaluation >>>>>>>>>>>>>>>>')\n    batch_time = AverageMeter()\n    intersection_meter = AverageMeter()\n    union_meter = AverageMeter()\n    target_meter = AverageMeter()\n\n    model.eval()\n    data_file = os.path.join(args.data_root, 'scannet_{}.pickle'.format(args.split))\n    file_pickle = open(data_file, 'rb')\n    xyz_all = pickle.load(file_pickle, encoding='latin1')\n    label_all = pickle.load(file_pickle, encoding='latin1')\n    file_pickle.close()\n    gt_all, pred_all = np.array([]), np.array([])\n    vox_acc = []\n    check_makedirs(args.save_folder)\n    pred_save, gt_save = [], []\n    for idx in range(len(xyz_all)):\n        points, labels = xyz_all[idx], label_all[idx].astype(np.int32)\n        gt = labels - 1\n        gt[labels == 0] = 255\n        data_room, label_room, index_room = data_prepare(points, gt)\n        batch_point = args.num_point * args.test_batch_size\n        batch_num = int(np.ceil(label_room.size / batch_point))\n        end = time.time()\n        output_room = np.array([])\n        for i in range(batch_num):\n            s_i, e_i = i * batch_point, min((i + 1) * batch_point, label_room.size)\n            input, target, index = data_room[s_i:e_i, :], label_room[s_i:e_i], index_room[s_i:e_i]\n            input = torch.from_numpy(input).float().view(-1, args.num_point, input.shape[1])\n            target = torch.from_numpy(target).long().view(-1, args.num_point)\n            with torch.no_grad():\n                output = model(input.cuda())\n            loss = criterion(output, target.cuda())  # for reference\n            output = output.transpose(1, 2).contiguous().view(-1, args.classes).data.cpu().numpy()\n            pred = np.argmax(output, axis=1)\n            intersection, union, target = 
intersectionAndUnion(pred, target.view(-1).data.cpu().numpy(), args.classes,\n                                                               args.ignore_label)\n            accuracy = sum(intersection) / (sum(target) + 1e-10)\n            output_room = np.vstack([output_room, output]) if output_room.size else output\n            batch_time.update(time.time() - end)\n            end = time.time()\n            if ((i + 1) % args.print_freq == 0) or (i + 1 == batch_num):\n                logger.info('Test: [{}/{}]-[{}/{}] '\n                            'Batch {batch_time.val:.3f} ({batch_time.avg:.3f}) '\n                            'Loss {loss:.4f} '\n                            'Accuracy {accuracy:.4f} '\n                            'Points {gt.size}.'.format(idx + 1, len(xyz_all),\n                                                       i + 1, batch_num,\n                                                       batch_time=batch_time,\n                                                       loss=loss,\n                                                       accuracy=accuracy,\n                                                       gt=gt))\n\n        pred = np.zeros((gt.size, args.classes))\n        for j in range(len(index_room)):\n            pred[index_room[j]] += output_room[j]\n        pred = np.argmax(pred, axis=1)\n\n        # calculation 1: add per room predictions\n        intersection, union, target = intersectionAndUnion(pred, gt, args.classes, args.ignore_label)\n        intersection_meter.update(intersection)\n        union_meter.update(union)\n        target_meter.update(target)\n        # calculation 2\n        pred_all = np.hstack([pred_all, pred]) if pred_all.size else pred\n        gt_all = np.hstack([gt_all, gt]) if gt_all.size else gt\n        pred_save.append(pred), gt_save.append(gt)\n\n        # compute voxel accuracy (follow scannet, pointnet++ and pointcnn)\n        res = 0.0484\n        coord_min, coord_max = np.min(points, axis=0), np.max(points, axis=0)\n        nvox = np.ceil((coord_max - coord_min) / res)\n        vidx = np.ceil((points - coord_min) / res)\n        vidx = vidx[:, 0] + vidx[:, 1] * nvox[0] + vidx[:, 2] * nvox[0] * nvox[1]\n        uvidx, vpidx = np.unique(vidx, return_index=True)\n        # compute voxel label\n        uvlabel = np.array(gt)[vpidx]\n        uvpred = np.array(pred)[vpidx]\n        # compute voxel accuracy (ignore label 0 which is scannet unannotated)\n        c_accvox = np.sum(np.equal(uvpred, uvlabel))\n        c_ignore = np.sum(np.equal(uvlabel, 255))\n        vox_acc.append([c_accvox, len(uvlabel) - c_ignore])\n\n    with open(os.path.join(args.save_folder, \"pred_{}.pickle\".format(args.split)), 'wb') as handle:\n        pickle.dump({'pred': pred_save}, handle, protocol=pickle.HIGHEST_PROTOCOL)\n    with open(os.path.join(args.save_folder, \"gt_{}.pickle\".format(args.split)), 'wb') as handle:\n        pickle.dump({'gt': gt_save}, handle, protocol=pickle.HIGHEST_PROTOCOL)\n\n    # calculation 1\n    iou_class = intersection_meter.sum / (union_meter.sum + 1e-10)\n    accuracy_class = intersection_meter.sum / (target_meter.sum + 1e-10)\n    mIoU1 = np.mean(iou_class)\n    mAcc1 = np.mean(accuracy_class)\n    allAcc1 = sum(intersection_meter.sum) / (sum(target_meter.sum) + 1e-10)\n\n    # calculation 2\n    intersection, union, target = intersectionAndUnion(pred_all, gt_all, args.classes, args.ignore_label)\n    iou_class = intersection / (union + 1e-10)\n    accuracy_class = intersection / (target + 1e-10)\n    mIoU = 
np.mean(iou_class)\n    mAcc = np.mean(accuracy_class)\n    allAcc = sum(intersection) / (sum(target) + 1e-10)\n    # compute avg voxel acc\n    vox_acc = np.sum(vox_acc, 0)\n    voxAcc = vox_acc[0] * 1.0 / vox_acc[1]\n    logger.info('Val result: mIoU/mAcc/allAcc/voxAcc {:.4f}/{:.4f}/{:.4f}/{:.4f}.'.format(mIoU, mAcc, allAcc, voxAcc))\n    logger.info('Val1 result: mIoU/mAcc/allAcc/voxAcc {:.4f}/{:.4f}/{:.4f}/{:.4f}.'.format(mIoU1, mAcc1, allAcc1, voxAcc))\n\n    for i in range(args.classes):\n        logger.info('Class_{} Result: iou/accuracy {:.4f}/{:.4f}, name: {}.'.format(i, iou_class[i], accuracy_class[i],\n                                                                                    names[i]))\n    logger.info('<<<<<<<<<<<<<<<<< End Evaluation <<<<<<<<<<<<<<<<<')\n    return mIoU, mAcc, allAcc, pred_all\n\n\nif __name__ == '__main__':\n    main()\n"
  },
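The ScanNet voxel accuracy follows the PointNet++/PointCNN protocol used above: points are snapped to a 0.0484 m grid, one representative point per occupied voxel is kept via `np.unique(..., return_index=True)`, and accuracy is computed over those representatives while skipping the unannotated label (255). A small NumPy sketch with made-up points and labels:

```python
import numpy as np

res = 0.0484
points = np.random.rand(1000, 3)                       # toy coordinates
gt = np.random.randint(0, 20, 1000)                    # toy ground-truth labels
pred = gt.copy()
pred[:100] = (pred[:100] + 1) % 20                     # toy predictions, some points wrong

coord_min = points.min(axis=0)
nvox = np.ceil((points.max(axis=0) - coord_min) / res)
vidx = np.ceil((points - coord_min) / res)
vidx = vidx[:, 0] + vidx[:, 1] * nvox[0] + vidx[:, 2] * nvox[0] * nvox[1]  # flat voxel id
_, vpidx = np.unique(vidx, return_index=True)          # one representative point per voxel
valid = gt[vpidx] != 255                               # drop unannotated voxels
vox_acc = np.mean(pred[vpidx][valid] == gt[vpidx][valid])
print(vox_acc)
```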
  {
    "path": "tool/train.py",
    "content": "import os\nimport time\nimport random\nimport numpy as np\nimport logging\nimport argparse\n\nimport torch\nimport torch.backends.cudnn as cudnn\nimport torch.nn as nn\nimport torch.nn.parallel\nimport torch.optim\nimport torch.utils.data\nimport torch.optim.lr_scheduler as lr_scheduler\nfrom tensorboardX import SummaryWriter\n\nfrom util import dataset, transform, config\nfrom util.s3dis import S3DIS\nfrom util.scannet import ScanNet\nfrom util.util import AverageMeter, intersectionAndUnionGPU\n\n\ndef get_parser():\n    parser = argparse.ArgumentParser(description='PyTorch Point Cloud Semantic Segmentation')\n    parser.add_argument('--config', type=str, default='config/s3dis/s3dis_pointweb.yaml', help='config file')\n    parser.add_argument('opts', help='see config/s3dis/s3dis_pointweb.yaml for all options', default=None, nargs=argparse.REMAINDER)\n    args = parser.parse_args()\n    assert args.config is not None\n    cfg = config.load_cfg_from_cfg_file(args.config)\n    if args.opts is not None:\n        cfg = config.merge_cfg_from_list(cfg, args.opts)\n    return cfg\n\n\ndef get_logger():\n    logger_name = \"main-logger\"\n    logger = logging.getLogger(logger_name)\n    logger.setLevel(logging.INFO)\n    handler = logging.StreamHandler()\n    fmt = \"[%(asctime)s %(levelname)s %(filename)s line %(lineno)d %(process)d] %(message)s\"\n    handler.setFormatter(logging.Formatter(fmt))\n    logger.addHandler(handler)\n    return logger\n\n\ndef worker_init_fn(worker_id):\n    random.seed(args.manual_seed + worker_id)\n\n\ndef init():\n    global args, logger, writer\n    args = get_parser()\n    logger = get_logger()\n    writer = SummaryWriter(args.save_path)\n    os.environ[\"CUDA_VISIBLE_DEVICES\"] = ','.join(str(x) for x in args.train_gpu)\n    if args.manual_seed is not None:\n        cudnn.benchmark = False\n        cudnn.deterministic = True\n        torch.manual_seed(args.manual_seed)\n        np.random.seed(args.manual_seed)\n        torch.manual_seed(args.manual_seed)\n        torch.cuda.manual_seed_all(args.manual_seed)\n    if len(args.train_gpu) == 1:\n        args.sync_bn = False\n    logger.info(args)\n\n\ndef main():\n    init()\n    if args.arch == 'pointnet_seg':\n        from model.pointnet.pointnet import PointNetSeg as Model\n    elif args.arch == 'pointnet2_seg':\n        from model.pointnet2.pointnet2_seg import PointNet2SSGSeg as Model\n    elif args.arch == 'pointweb_seg':\n        from model.pointweb.pointweb_seg import PointWebSeg as Model\n    else:\n        raise Exception('architecture not supported yet'.format(args.arch))\n    model = Model(c=args.fea_dim, k=args.classes, use_xyz=args.use_xyz)\n    if args.sync_bn:\n        from util.util import convert_to_syncbn\n        convert_to_syncbn(model)\n    criterion = nn.CrossEntropyLoss(ignore_index=args.ignore_label).cuda()\n    optimizer = torch.optim.SGD(model.parameters(), lr=args.base_lr, momentum=args.momentum, weight_decay=args.weight_decay)\n    scheduler = lr_scheduler.StepLR(optimizer, step_size=args.step_epoch, gamma=args.multiplier)\n    logger.info(\"=> creating model ...\")\n    logger.info(\"Classes: {}\".format(args.classes))\n    logger.info(model)\n    model = torch.nn.DataParallel(model.cuda())\n    if args.sync_bn:\n        from lib.sync_bn import patch_replication_callback\n        patch_replication_callback(model)\n    if args.weight:\n        if os.path.isfile(args.weight):\n            logger.info(\"=> loading weight '{}'\".format(args.weight))\n            checkpoint = 
torch.load(args.weight)\n            model.load_state_dict(checkpoint['state_dict'])\n            logger.info(\"=> loaded weight '{}'\".format(args.weight))\n        else:\n            logger.info(\"=> no weight found at '{}'\".format(args.weight))\n\n    if args.resume:\n        if os.path.isfile(args.resume):\n            logger.info(\"=> loading checkpoint '{}'\".format(args.resume))\n            # checkpoint = torch.load(args.resume)\n            checkpoint = torch.load(args.resume, map_location=lambda storage, loc: storage.cuda())\n            args.start_epoch = checkpoint['epoch']\n            model.load_state_dict(checkpoint['state_dict'])\n            optimizer.load_state_dict(checkpoint['optimizer'])\n            scheduler.load_state_dict(checkpoint['scheduler'])\n            logger.info(\"=> loaded checkpoint '{}' (epoch {})\".format(args.resume, checkpoint['epoch']))\n        else:\n            logger.info(\"=> no checkpoint found at '{}'\".format(args.resume))\n\n    train_transform = transform.Compose([transform.ToTensor()])\n    if args.data_name == 's3dis':\n        train_data = S3DIS(split='train', data_root=args.train_full_folder, num_point=args.num_point, test_area=args.test_area, block_size=args.block_size, sample_rate=args.sample_rate, transform=train_transform)\n        # train_data = dataset.PointData(split='train', data_root=args.data_root, data_list=args.train_list, transform=train_transform)\n    elif args.data_name == 'scannet':\n        train_data = ScanNet(split='train', data_root=args.data_root, num_point=args.num_point, block_size=args.block_size, sample_rate=args.sample_rate, transform=train_transform)\n    elif args.data_name == 'modelnet40':\n        train_data = dataset.PointData(split='train', data_root=args.data_root, data_list=args.train_list, transform=train_transform, num_point=args.num_point, random_index=True)\n    train_loader = torch.utils.data.DataLoader(train_data, batch_size=args.train_batch_size, shuffle=True, num_workers=args.train_workers, pin_memory=True)\n\n    val_loader = None\n    if args.evaluate:\n        val_transform = transform.Compose([transform.ToTensor()])\n        val_data = dataset.PointData(split='val', data_root=args.data_root, data_list=args.val_list, transform=val_transform)\n        val_loader = torch.utils.data.DataLoader(val_data, batch_size=args.train_batch_size_val, shuffle=False, num_workers=args.train_workers, pin_memory=True)\n\n    for epoch in range(args.start_epoch, args.epochs):\n        scheduler.step()\n        loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch)\n        epoch_log = epoch + 1\n        writer.add_scalar('loss_train', loss_train, epoch_log)\n        writer.add_scalar('mIoU_train', mIoU_train, epoch_log)\n        writer.add_scalar('mAcc_train', mAcc_train, epoch_log)\n        writer.add_scalar('allAcc_train', allAcc_train, epoch_log)\n\n        if epoch_log % args.save_freq == 0:\n            filename = args.save_path + '/train_epoch_' + str(epoch_log) + '.pth'\n            logger.info('Saving checkpoint to: ' + filename)\n            torch.save({'epoch': epoch_log, 'state_dict': model.state_dict(), 'optimizer': optimizer.state_dict(), 'scheduler': scheduler.state_dict()}, filename)\n            if epoch_log / args.save_freq > 2:\n                deletename = args.save_path + '/train_epoch_' + str(epoch_log - args.save_freq * 2) + '.pth'\n                os.remove(deletename)\n        if args.evaluate:\n            loss_val, mIoU_val, 
mAcc_val, allAcc_val = validate(val_loader, model, criterion)\n            writer.add_scalar('loss_val', loss_val, epoch_log)\n            writer.add_scalar('mIoU_val', mIoU_val, epoch_log)\n            writer.add_scalar('mAcc_val', mAcc_val, epoch_log)\n            writer.add_scalar('allAcc_val', allAcc_val, epoch_log)\n\n\ndef train(train_loader, model, criterion, optimizer, epoch):\n    batch_time = AverageMeter()\n    data_time = AverageMeter()\n    loss_meter = AverageMeter()\n    intersection_meter = AverageMeter()\n    union_meter = AverageMeter()\n    target_meter = AverageMeter()\n\n    model.train()\n    end = time.time()\n    max_iter = args.epochs * len(train_loader)\n    for i, (input, target) in enumerate(train_loader):\n        data_time.update(time.time() - end)\n        input = input.cuda(non_blocking=True)\n        target = target.cuda(non_blocking=True)\n        output = model(input)\n        if target.shape[-1] == 1:\n            target = target[:, 0]  # for cls\n        loss = criterion(output, target)\n        optimizer.zero_grad()\n        loss.backward()\n        optimizer.step()\n\n        output = output.max(1)[1]\n        intersection, union, target = intersectionAndUnionGPU(output, target, args.classes, args.ignore_label)\n        intersection, union, target = intersection.cpu().numpy(), union.cpu().numpy(), target.cpu().numpy()\n        intersection_meter.update(intersection), union_meter.update(union), target_meter.update(target)\n\n        accuracy = sum(intersection_meter.val) / (sum(target_meter.val) + 1e-10)\n        loss_meter.update(loss.item(), input.size(0))\n        batch_time.update(time.time() - end)\n        end = time.time()\n\n        # calculate remain time\n        current_iter = epoch * len(train_loader) + i + 1\n        remain_iter = max_iter - current_iter\n        remain_time = remain_iter * batch_time.avg\n        t_m, t_s = divmod(remain_time, 60)\n        t_h, t_m = divmod(t_m, 60)\n        remain_time = '{:02d}:{:02d}:{:02d}'.format(int(t_h), int(t_m), int(t_s))\n\n        if (i + 1) % args.print_freq == 0:\n            logger.info('Epoch: [{}/{}][{}/{}] '\n                        'Data {data_time.val:.3f} ({data_time.avg:.3f}) '\n                        'Batch {batch_time.val:.3f} ({batch_time.avg:.3f}) '\n                        'Remain {remain_time} '\n                        'Loss {loss_meter.val:.4f} '\n                        'Accuracy {accuracy:.4f}.'.format(epoch+1, args.epochs, i + 1, len(train_loader),\n                                                          batch_time=batch_time, data_time=data_time,\n                                                          remain_time=remain_time,\n                                                          loss_meter=loss_meter,\n                                                          accuracy=accuracy))\n        writer.add_scalar('loss_train_batch', loss_meter.val, current_iter)\n        writer.add_scalar('mIoU_train_batch', np.mean(intersection / (union + 1e-10)), current_iter)\n        writer.add_scalar('mAcc_train_batch', np.mean(intersection / (target + 1e-10)), current_iter)\n        writer.add_scalar('allAcc_train_batch', accuracy, current_iter)\n\n    iou_class = intersection_meter.sum / (union_meter.sum + 1e-10)\n    accuracy_class = intersection_meter.sum / (target_meter.sum + 1e-10)\n    mIoU = np.mean(iou_class)\n    mAcc = np.mean(accuracy_class)\n    allAcc = sum(intersection_meter.sum) / (sum(target_meter.sum) + 1e-10)\n    logger.info('Train result at epoch [{}/{}]: 
mIoU/mAcc/allAcc {:.4f}/{:.4f}/{:.4f}.'.format(epoch+1, args.epochs, mIoU, mAcc, allAcc))\n    return loss_meter.avg, mIoU, mAcc, allAcc\n\n\ndef validate(val_loader, model, criterion):\n    logger.info('>>>>>>>>>>>>>>>> Start Evaluation >>>>>>>>>>>>>>>>')\n    batch_time = AverageMeter()\n    data_time = AverageMeter()\n    loss_meter = AverageMeter()\n    intersection_meter = AverageMeter()\n    union_meter = AverageMeter()\n    target_meter = AverageMeter()\n\n    model.eval()\n    end = time.time()\n    for i, (input, target) in enumerate(val_loader):\n        data_time.update(time.time() - end)\n        input = input.cuda(non_blocking=True)\n        target = target.cuda(non_blocking=True)\n        if target.shape[-1] == 1:\n            target = target[:, 0]  # for cls\n        with torch.no_grad():  # no gradients needed during evaluation\n            output = model(input)\n            loss = criterion(output, target)\n\n        output = output.max(1)[1]\n        intersection, union, target = intersectionAndUnionGPU(output, target, args.classes, args.ignore_label)\n        intersection, union, target = intersection.cpu().numpy(), union.cpu().numpy(), target.cpu().numpy()\n        intersection_meter.update(intersection), union_meter.update(union), target_meter.update(target)\n\n        accuracy = sum(intersection_meter.val) / (sum(target_meter.val) + 1e-10)\n        loss_meter.update(loss.item(), input.size(0))\n        batch_time.update(time.time() - end)\n        end = time.time()\n        if (i + 1) % args.print_freq == 0:\n            logger.info('Test: [{}/{}] '\n                        'Data {data_time.val:.3f} ({data_time.avg:.3f}) '\n                        'Batch {batch_time.val:.3f} ({batch_time.avg:.3f}) '\n                        'Loss {loss_meter.val:.4f} ({loss_meter.avg:.4f}) '\n                        'Accuracy {accuracy:.4f}.'.format(i + 1, len(val_loader),\n                                                          data_time=data_time,\n                                                          batch_time=batch_time,\n                                                          loss_meter=loss_meter,\n                                                          accuracy=accuracy))\n\n    iou_class = intersection_meter.sum / (union_meter.sum + 1e-10)\n    accuracy_class = intersection_meter.sum / (target_meter.sum + 1e-10)\n    mIoU = np.mean(iou_class)\n    mAcc = np.mean(accuracy_class)\n    allAcc = sum(intersection_meter.sum) / (sum(target_meter.sum) + 1e-10)\n\n    logger.info('Val result: mIoU/mAcc/allAcc {:.4f}/{:.4f}/{:.4f}.'.format(mIoU, mAcc, allAcc))\n    for i in range(args.classes):\n        logger.info('Class_{} Result: iou/accuracy {:.4f}/{:.4f}.'.format(i, iou_class[i], accuracy_class[i]))\n    logger.info('<<<<<<<<<<<<<<<<< End Evaluation <<<<<<<<<<<<<<<<<')\n    return loss_meter.avg, mIoU, mAcc, allAcc\n\n\nif __name__ == '__main__':\n    main()\n"
  },
  {
    "path": "tool/train.sh",
    "content": "#!/bin/sh\nexport PYTHONPATH=./\n\nPYTHON=python\ndataset=$1\nexp_name=$2\nexp_dir=exp/${dataset}/${exp_name}\nmodel_dir=${exp_dir}/model\nconfig=config/${dataset}/${dataset}_${exp_name}.yaml\n\nmkdir -p ${model_dir}\nnow=$(date +\"%Y%m%d_%H%M%S\")\ncp tool/train.sh tool/train.py ${config} ${exp_dir}\n\n$PYTHON tool/train.py --config=${config} 2>&1 | tee ${model_dir}/train-$now.log\n\nif [ ${dataset} = 's3dis' ]\nthen\n  $PYTHON tool/test_s3dis.py --config=${config} 2>&1 | tee ${model_dir}/test-$now.log\nelif [ ${dataset} = 'scannet' ]\nthen\n  $PYTHON tool/test_scannet.py --config=${config} 2>&1 | tee ${model_dir}/test-$now.log\nfi\n"
  },
  {
    "path": "util/config.py",
    "content": "# -----------------------------------------------------------------------------\n# Functions for parsing args\n# -----------------------------------------------------------------------------\nimport yaml\nimport os\nfrom ast import literal_eval\nimport copy\n\n\nclass CfgNode(dict):\n    \"\"\"\n    CfgNode represents an internal node in the configuration tree. It's a simple\n    dict-like container that allows for attribute-based access to keys.\n    \"\"\"\n\n    def __init__(self, init_dict=None, key_list=None, new_allowed=False):\n        # Recursively convert nested dictionaries in init_dict into CfgNodes\n        init_dict = {} if init_dict is None else init_dict\n        key_list = [] if key_list is None else key_list\n        for k, v in init_dict.items():\n            if type(v) is dict:\n                # Convert dict to CfgNode\n                init_dict[k] = CfgNode(v, key_list=key_list + [k])\n        super(CfgNode, self).__init__(init_dict)\n\n    def __getattr__(self, name):\n        if name in self:\n            return self[name]\n        else:\n            raise AttributeError(name)\n\n    def __setattr__(self, name, value):\n        self[name] = value\n\n    def __str__(self):\n        def _indent(s_, num_spaces):\n            s = s_.split(\"\\n\")\n            if len(s) == 1:\n                return s_\n            first = s.pop(0)\n            s = [(num_spaces * \" \") + line for line in s]\n            s = \"\\n\".join(s)\n            s = first + \"\\n\" + s\n            return s\n\n        r = \"\"\n        s = []\n        for k, v in sorted(self.items()):\n            seperator = \"\\n\" if isinstance(v, CfgNode) else \" \"\n            attr_str = \"{}:{}{}\".format(str(k), seperator, str(v))\n            attr_str = _indent(attr_str, 2)\n            s.append(attr_str)\n        r += \"\\n\".join(s)\n        return r\n\n    def __repr__(self):\n        return \"{}({})\".format(self.__class__.__name__, super(CfgNode, self).__repr__())\n\n\ndef load_cfg_from_cfg_file(file):\n    cfg = {}\n    assert os.path.isfile(file) and file.endswith('.yaml'), \\\n        '{} is not a yaml file'.format(file)\n\n    with open(file, 'r') as f:\n        cfg_from_file = yaml.safe_load(f)\n\n    for key in cfg_from_file:\n        for k, v in cfg_from_file[key].items():\n            cfg[k] = v\n\n    cfg = CfgNode(cfg)\n    return cfg\n\n\ndef merge_cfg_from_list(cfg, cfg_list):\n    new_cfg = copy.deepcopy(cfg)\n    assert len(cfg_list) % 2 == 0\n    for full_key, v in zip(cfg_list[0::2], cfg_list[1::2]):\n        subkey = full_key.split('.')[-1]\n        assert subkey in cfg, 'Non-existent key: {}'.format(full_key)\n        value = _decode_cfg_value(v)\n        value = _check_and_coerce_cfg_value_type(\n            value, cfg[subkey], subkey, full_key\n        )\n        setattr(new_cfg, subkey, value)\n\n    return new_cfg\n\n\ndef _decode_cfg_value(v):\n    \"\"\"Decodes a raw config value (e.g., from a yaml config files or command\n    line argument) into a Python object.\n    \"\"\"\n    # All remaining processing is only applied to strings\n    if not isinstance(v, str):\n        return v\n    # Try to interpret `v` as a:\n    #   string, number, tuple, list, dict, boolean, or None\n    try:\n        v = literal_eval(v)\n    # The following two excepts allow v to pass through when it represents a\n    # string.\n    #\n    # Longer explanation:\n    # The type of v is always a string (before calling literal_eval), but\n    # sometimes it *represents* a string and 
other times a data structure, like\n    # a list. In the case that v represents a string, what we got back from the\n    # yaml parser is 'foo' *without quotes* (so, not '\"foo\"'). literal_eval is\n    # ok with '\"foo\"', but will raise a ValueError if given 'foo'. In other\n    # cases, like paths (v = 'foo/bar' and not v = '\"foo/bar\"'), literal_eval\n    # will raise a SyntaxError.\n    except ValueError:\n        pass\n    except SyntaxError:\n        pass\n    return v\n\n\ndef _check_and_coerce_cfg_value_type(replacement, original, key, full_key):\n    \"\"\"Checks that `replacement`, which is intended to replace `original` is of\n    the right type. The type is correct if it matches exactly or is one of a few\n    cases in which the type can be easily coerced.\n    \"\"\"\n    original_type = type(original)\n    replacement_type = type(replacement)\n\n    # The types must match (with some exceptions)\n    if replacement_type == original_type:\n        return replacement\n\n    # Cast replacement from from_type to to_type if the replacement and original\n    # types match from_type and to_type\n    def conditional_cast(from_type, to_type):\n        if replacement_type == from_type and original_type == to_type:\n            return True, to_type(replacement)\n        else:\n            return False, None\n\n    # Conditionally casts\n    # list <-> tuple\n    casts = [(tuple, list), (list, tuple)]\n    # For py2: allow converting from str (bytes) to a unicode string\n    try:\n        casts.append((str, unicode))  # noqa: F821\n    except Exception:\n        pass\n\n    for (from_type, to_type) in casts:\n        converted, converted_value = conditional_cast(from_type, to_type)\n        if converted:\n            return converted_value\n\n    raise ValueError(\n        \"Type mismatch ({} vs. {}) with values ({} vs. {}) for config \"\n        \"key: {}\".format(\n            original_type, replacement_type, original, replacement, full_key\n        )\n    )\n\n\ndef _assert_with_logging(cond, msg):\n    if not cond:\n        import logging  # local import: this module defines no logger of its own\n        logging.getLogger(__name__).debug(msg)\n    assert cond, msg\n\n"
  },
  {
    "path": "util/dataset.py",
    "content": "import os\nimport h5py\nimport numpy as np\n\nfrom torch.utils.data import Dataset\n\n\ndef make_dataset(split='train', data_root=None, data_list=None):\n    if not os.path.isfile(data_list):\n        raise (RuntimeError(\"Point list file do not exist: \" + data_list + \"\\n\"))\n    point_list = []\n    list_read = open(data_list).readlines()\n    print(\"Totally {} samples in {} set.\".format(len(list_read), split))\n    for line in list_read:\n        point_list.append(os.path.join(data_root, line.strip()))\n    return point_list\n\n\nclass PointData(Dataset):\n    def __init__(self, split='train', data_root=None, data_list=None, transform=None, num_point=None, random_index=False):\n        assert split in ['train', 'val', 'test']\n        self.split = split\n        self.data_list = make_dataset(split, data_root, data_list)\n        self.transform = transform\n        self.num_point = num_point\n        self.random_index = random_index\n\n    def __len__(self):\n        return len(self.data_list)\n\n    def __getitem__(self, index):\n        data_path = self.data_list[index]\n        f = h5py.File(data_path, 'r')\n        data = f['data'][:]\n        if self.split is 'test':\n            label = 255  # place holder\n        else:\n            label = f['label'][:]\n        f.close()\n        if self.num_point is None:\n            self.num_point = data.shape[0]\n        idxs = np.arange(data.shape[0])\n        if self.random_index:\n            np.random.shuffle(idxs)\n        idxs = idxs[0:self.num_point]\n        data = data[idxs, :]\n        if label.size != 1:  # seg data\n            label = label[idxs]\n        if self.transform is not None:\n            data, label = self.transform(data, label)\n        return data, label\n\n\nif __name__ == '__main__':\n    data_root = '/mnt/sda1/hszhao/dataset/3d/s3dis'\n    data_list = '/mnt/sda1/hszhao/dataset/3d/s3dis/list/train12346.txt'\n    point_data = PointData('train', data_root, data_list)\n    print('point data size:', point_data.__len__())\n    print('point data 0 shape:', point_data.__getitem__(0)[0].shape)\n    print('point label 0 shape:', point_data.__getitem__(0)[1].shape)\n"
  },
  {
    "path": "util/pt_util.py",
    "content": "import shutil, os\nimport tqdm\nfrom itertools import repeat\nimport numpy as np\nfrom typing import List, Tuple\n# from scipy.stats import t as student_t\n# import statistics as stats\n\nimport torch\nimport torch.nn as nn\nfrom torch.autograd.function import InplaceFunction\n\nBN1d, BN2d, BN3d = nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d\n\nclass SharedMLP(nn.Sequential):\n\n    def __init__(\n            self,\n            args: List[int],\n            *,\n            bn: bool = False,\n            activation=nn.ReLU(inplace=True),\n            preact: bool = False,\n            first: bool = False,\n            name: str = \"\"\n    ):\n        super().__init__()\n\n        for i in range(len(args) - 1):\n            self.add_module(\n                name + 'layer{}'.format(i),\n                Conv2d(\n                    args[i],\n                    args[i + 1],\n                    bn=(not first or not preact or (i != 0)) and bn,\n                    activation=activation\n                    if (not first or not preact or (i != 0)) else None,\n                    preact=preact\n                )\n            )\n\n\nclass _BNBase(nn.Sequential):\n\n    def __init__(self, in_size, batch_norm=None, name=\"\"):\n        super().__init__()\n        self.add_module(name + \"bn\", batch_norm(in_size))\n\n        nn.init.constant_(self[0].weight, 1.0)\n        nn.init.constant_(self[0].bias, 0)\n\n\nclass BatchNorm1d(_BNBase):\n\n    def __init__(self, in_size: int, *, name: str = \"\"):\n        super().__init__(in_size, batch_norm=BN1d, name=name)\n\n\nclass BatchNorm2d(_BNBase):\n\n    def __init__(self, in_size: int, name: str = \"\"):\n        super().__init__(in_size, batch_norm=BN2d, name=name)\n\n\nclass BatchNorm3d(_BNBase):\n\n    def __init__(self, in_size: int, name: str = \"\"):\n        super().__init__(in_size, batch_norm=BN3d, name=name)\n\n\nclass _ConvBase(nn.Sequential):\n\n    def __init__(\n            self,\n            in_size,\n            out_size,\n            kernel_size,\n            stride,\n            padding,\n            activation,\n            bn,\n            init,\n            conv=None,\n            batch_norm=None,\n            bias=True,\n            preact=False,\n            name=\"\"\n    ):\n        super().__init__()\n\n        bias = bias and (not bn)\n        conv_unit = conv(\n            in_size,\n            out_size,\n            kernel_size=kernel_size,\n            stride=stride,\n            padding=padding,\n            bias=bias\n        )\n        init(conv_unit.weight)\n        if bias:\n            nn.init.constant_(conv_unit.bias, 0)\n\n        if bn:\n            if not preact:\n                bn_unit = batch_norm(out_size)\n            else:\n                bn_unit = batch_norm(in_size)\n\n        if preact:\n            if bn:\n                self.add_module(name + 'bn', bn_unit)\n\n            if activation is not None:\n                self.add_module(name + 'activation', activation)\n\n        self.add_module(name + 'conv', conv_unit)\n\n        if not preact:\n            if bn:\n                self.add_module(name + 'bn', bn_unit)\n\n            if activation is not None:\n                self.add_module(name + 'activation', activation)\n\n\nclass Conv1d(_ConvBase):\n\n    def __init__(\n            self,\n            in_size: int,\n            out_size: int,\n            *,\n            kernel_size: int = 1,\n            stride: int = 1,\n            padding: int = 0,\n            
activation=nn.ReLU(inplace=True),\n            bn: bool = False,\n            init=nn.init.kaiming_normal_,\n            bias: bool = True,\n            preact: bool = False,\n            name: str = \"\"\n    ):\n        super().__init__(\n            in_size,\n            out_size,\n            kernel_size,\n            stride,\n            padding,\n            activation,\n            bn,\n            init,\n            conv=nn.Conv1d,\n            batch_norm=BatchNorm1d,\n            bias=bias,\n            preact=preact,\n            name=name\n        )\n\n\nclass Conv2d(_ConvBase):\n\n    def __init__(\n            self,\n            in_size: int,\n            out_size: int,\n            *,\n            kernel_size: Tuple[int, int] = (1, 1),\n            stride: Tuple[int, int] = (1, 1),\n            padding: Tuple[int, int] = (0, 0),\n            activation=nn.ReLU(inplace=True),\n            bn: bool = False,\n            init=nn.init.kaiming_normal_,\n            bias: bool = True,\n            preact: bool = False,\n            name: str = \"\"\n    ):\n        super().__init__(\n            in_size,\n            out_size,\n            kernel_size,\n            stride,\n            padding,\n            activation,\n            bn,\n            init,\n            conv=nn.Conv2d,\n            batch_norm=BatchNorm2d,\n            bias=bias,\n            preact=preact,\n            name=name\n        )\n\n\nclass Conv3d(_ConvBase):\n\n    def __init__(\n            self,\n            in_size: int,\n            out_size: int,\n            *,\n            kernel_size: Tuple[int, int, int] = (1, 1, 1),\n            stride: Tuple[int, int, int] = (1, 1, 1),\n            padding: Tuple[int, int, int] = (0, 0, 0),\n            activation=nn.ReLU(inplace=True),\n            bn: bool = False,\n            init=nn.init.kaiming_normal_,\n            bias: bool = True,\n            preact: bool = False,\n            name: str = \"\"\n    ):\n        super().__init__(\n            in_size,\n            out_size,\n            kernel_size,\n            stride,\n            padding,\n            activation,\n            bn,\n            init,\n            conv=nn.Conv3d,\n            batch_norm=BatchNorm3d,\n            bias=bias,\n            preact=preact,\n            name=name\n        )\n\n\nclass FC(nn.Sequential):\n\n    def __init__(\n            self,\n            in_size: int,\n            out_size: int,\n            *,\n            activation=nn.ReLU(inplace=True),\n            bn: bool = False,\n            init=None,\n            preact: bool = False,\n            name: str = \"\"\n    ):\n        super().__init__()\n\n        fc = nn.Linear(in_size, out_size, bias=not bn)\n        if init is not None:\n            init(fc.weight)\n        if not bn:\n            nn.init.constant_(fc.bias, 0)\n\n        if preact:\n            if bn:\n                self.add_module(name + 'bn', BatchNorm1d(in_size))\n\n            if activation is not None:\n                self.add_module(name + 'activation', activation)\n\n        self.add_module(name + 'fc', fc)\n\n        if not preact:\n            if bn:\n                self.add_module(name + 'bn', BatchNorm1d(out_size))\n\n            if activation is not None:\n                self.add_module(name + 'activation', activation)\n\n\nclass _DropoutNoScaling(InplaceFunction):\n\n    @staticmethod\n    def _make_noise(input):\n        return input.new().resize_as_(input)\n\n    @staticmethod\n    def symbolic(g, input, p=0.5, train=False, 
inplace=False):\n        if inplace:\n            return None\n        n = g.appendNode(\n            g.create(\"Dropout\", [input]).f_(\"ratio\",\n                                            p).i_(\"is_test\", not train)\n        )\n        real = g.appendNode(g.createSelect(n, 0))\n        g.appendNode(g.createSelect(n, 1))\n        return real\n\n    @classmethod\n    def forward(cls, ctx, input, p=0.5, train=False, inplace=False):\n        if p < 0 or p > 1:\n            raise ValueError(\n                \"dropout probability has to be between 0 and 1, \"\n                \"but got {}\".format(p)\n            )\n        ctx.p = p\n        ctx.train = train\n        ctx.inplace = inplace\n\n        if ctx.inplace:\n            ctx.mark_dirty(input)\n            output = input\n        else:\n            output = input.clone()\n\n        if ctx.p > 0 and ctx.train:\n            ctx.noise = cls._make_noise(input)\n            if ctx.p == 1:\n                ctx.noise.fill_(0)\n            else:\n                ctx.noise.bernoulli_(1 - ctx.p)\n            ctx.noise = ctx.noise.expand_as(input)\n            output.mul_(ctx.noise)\n\n        return output\n\n    @staticmethod\n    def backward(ctx, grad_output):\n        if ctx.p > 0 and ctx.train:\n            return grad_output.mul(ctx.noise), None, None, None\n        else:\n            return grad_output, None, None, None\n\n\ndropout_no_scaling = _DropoutNoScaling.apply\n\n\nclass _FeatureDropoutNoScaling(_DropoutNoScaling):\n\n    @staticmethod\n    def symbolic(input, p=0.5, train=False, inplace=False):\n        return None\n\n    @staticmethod\n    def _make_noise(input):\n        return input.new().resize_(\n            input.size(0), input.size(1), *repeat(1,\n                                                  input.dim() - 2)\n        )\n\n\nfeature_dropout_no_scaling = _FeatureDropoutNoScaling.apply\n\n\ndef group_model_params(model: nn.Module, **kwargs):\n    decay_group = []\n    no_decay_group = []\n\n    for name, param in model.named_parameters():\n        if name.find(\"bn\") != -1 or name.find(\"bias\") != -1:\n            no_decay_group.append(param)\n        else:\n            decay_group.append(param)\n\n    assert len(list(model.parameters())) == len(decay_group) + len(no_decay_group)\n\n    return [\n        dict(params=decay_group, **kwargs),\n        dict(params=no_decay_group, weight_decay=0.0, **kwargs)\n    ]\n\n\ndef checkpoint_state(\n        model=None, optimizer=None, best_prec=None, epoch=None, it=None\n):\n    optim_state = optimizer.state_dict() if optimizer is not None else None\n    if model is not None:\n        if isinstance(model, torch.nn.DataParallel):\n            model_state = model.module.state_dict()\n        else:\n            model_state = model.state_dict()\n    else:\n        model_state = None\n\n    return {\n        'epoch': epoch,\n        'it': it,\n        'best_prec': best_prec,\n        'model_state': model_state,\n        'optimizer_state': optim_state\n    }\n\n\ndef save_checkpoint(\n        state, is_best, filename='checkpoint', bestname='model_best'\n):\n    filename = '{}.pth.tar'.format(filename)\n    torch.save(state, filename)\n    if is_best:\n        shutil.copyfile(filename, '{}.pth.tar'.format(bestname))\n\n\ndef load_checkpoint(model=None, optimizer=None, filename='checkpoint'):\n    filename = \"{}.pth.tar\".format(filename)\n    if os.path.isfile(filename):\n        print(\"==> Loading from checkpoint '{}'\".format(filename))\n        checkpoint = 
torch.load(filename)\n        epoch = checkpoint['epoch']\n        it = checkpoint.get('it', 0.0)\n        best_prec = checkpoint['best_prec']\n        if model is not None and checkpoint['model_state'] is not None:\n            model.load_state_dict(checkpoint['model_state'])\n        if optimizer is not None and checkpoint['optimizer_state'] is not None:\n            optimizer.load_state_dict(checkpoint['optimizer_state'])\n        print(\"==> Done\")\n    else:\n        print(\"==> Checkpoint '{}' not found\".format(filename))\n\n    return it, epoch, best_prec\n\n\ndef variable_size_collate(pad_val=0, use_shared_memory=True):\n    import collections\n    _numpy_type_map = {\n        'float64': torch.DoubleTensor,\n        'float32': torch.FloatTensor,\n        'float16': torch.HalfTensor,\n        'int64': torch.LongTensor,\n        'int32': torch.IntTensor,\n        'int16': torch.ShortTensor,\n        'int8': torch.CharTensor,\n        'uint8': torch.ByteTensor,\n    }\n\n    def wrapped(batch):\n        \"Puts each data field into a tensor with outer dimension batch size\"\n\n        error_msg = \"batch must contain tensors, numbers, dicts or lists; found {}\"\n        elem_type = type(batch[0])\n        if torch.is_tensor(batch[0]):\n            max_len = 0\n            for b in batch:\n                max_len = max(max_len, b.size(0))\n\n            numel = sum([int(b.numel() / b.size(0) * max_len) for b in batch])\n            if use_shared_memory:\n                # If we're in a background process, concatenate directly into a\n                # shared memory tensor to avoid an extra copy\n                storage = batch[0].storage()._new_shared(numel)\n                out = batch[0].new(storage)\n            else:\n                out = batch[0].new(numel)\n\n            out = out.view(\n                len(batch), max_len,\n                *[batch[0].size(i) for i in range(1, batch[0].dim())]\n            )\n            out.fill_(pad_val)\n            for i in range(len(batch)):\n                out[i, 0:batch[i].size(0)] = batch[i]\n\n            return out\n        elif elem_type.__module__ == 'numpy' and elem_type.__name__ != 'str_' \\\n                and elem_type.__name__ != 'string_':\n            elem = batch[0]\n            if elem_type.__name__ == 'ndarray':\n                # array of string classes and object\n                if re.search('[SaUO]', elem.dtype.str) is not None:\n                    raise TypeError(error_msg.format(elem.dtype))\n\n                return wrapped([torch.from_numpy(b) for b in batch])\n            if elem.shape == ():  # scalars\n                py_type = float if elem.dtype.name.startswith('float') else int\n                return _numpy_type_map[elem.dtype.name](\n                    list(map(py_type, batch))\n                )\n        elif isinstance(batch[0], int):\n            return torch.LongTensor(batch)\n        elif isinstance(batch[0], float):\n            return torch.DoubleTensor(batch)\n        elif isinstance(batch[0], collections.Mapping):\n            return {key: wrapped([d[key] for d in batch]) for key in batch[0]}\n        elif isinstance(batch[0], collections.Sequence):\n            transposed = zip(*batch)\n            return [wrapped(samples) for samples in transposed]\n\n        raise TypeError((error_msg.format(type(batch[0]))))\n\n    return wrapped\n\n\nclass TrainValSplitter():\n    r\"\"\"\n        Creates a training and validation split to be used as the sampler in a pytorch DataLoader\n    Parameters\n    
---------\n        numel : int\n            Number of elements in the entire training dataset\n        percent_train : float\n            Percentage of data in the training split\n        shuffled : bool\n            Whether or not shuffle which data goes to which split\n    \"\"\"\n\n    def __init__(\n            self, *, numel: int, percent_train: float, shuffled: bool = False\n    ):\n        indicies = np.array([i for i in range(numel)])\n        if shuffled:\n            np.random.shuffle(indicies)\n\n        self.train = torch.utils.data.sampler.SubsetRandomSampler(\n            indicies[0:int(percent_train * numel)]\n        )\n        self.val = torch.utils.data.sampler.SubsetRandomSampler(\n            indicies[int(percent_train * numel):-1]\n        )\n\n\n'''\nclass CrossValSplitter():\n    r\"\"\"\n        Class that creates cross validation splits.  The train and val splits can be used in pytorch DataLoaders.  The splits can be updated\n        by calling next(self) or using a loop:\n            for _ in self:\n                ....\n    Parameters\n    ---------\n        numel : int\n            Number of elements in the training set\n        k_folds : int\n            Number of folds\n        shuffled : bool\n            Whether or not to shuffle which data goes in which fold\n    \"\"\"\n\n    def __init__(self, *, numel: int, k_folds: int, shuffled: bool = False):\n        inidicies = np.array([i for i in range(numel)])\n        if shuffled:\n            np.random.shuffle(inidicies)\n\n        self.folds = np.array(np.array_split(inidicies, k_folds), dtype=object)\n        self.current_v_ind = -1\n\n        self.val = torch.utils.data.sampler.SubsetRandomSampler(self.folds[0])\n        self.train = torch.utils.data.sampler.SubsetRandomSampler(\n            np.concatenate(self.folds[1:], axis=0)\n        )\n\n        self.metrics = {}\n\n    def __iter__(self):\n        self.current_v_ind = -1\n        return self\n\n    def __len__(self):\n        return len(self.folds)\n\n    def __getitem__(self, idx):\n        assert idx >= 0 and idx < len(self)\n        self.val.inidicies = self.folds[idx]\n        self.train.inidicies = np.concatenate(\n            self.folds[np.arange(len(self)) != idx], axis=0\n        )\n\n    def __next__(self):\n        self.current_v_ind += 1\n        if self.current_v_ind >= len(self):\n            raise StopIteration\n\n        self[self.current_v_ind]\n\n    def update_metrics(self, to_post: dict):\n        for k, v in to_post.items():\n            if k in self.metrics:\n                self.metrics[k].append(v)\n            else:\n                self.metrics[k] = [v]\n\n    def print_metrics(self):\n        for name, samples in self.metrics.items():\n            xbar = stats.mean(samples)\n            sx = stats.stdev(samples, xbar)\n            tstar = student_t.ppf(1.0 - 0.025, len(samples) - 1)\n            margin_of_error = tstar * sx / sqrt(len(samples))\n            print(\"{}: {} +/- {}\".format(name, xbar, margin_of_error))\n'''\n\n\ndef set_bn_momentum_default(bn_momentum):\n\n    def fn(m):\n        if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):\n            m.momentum = bn_momentum\n\n    return fn\n\n\nclass BNMomentumScheduler(object):\n\n    def __init__(\n            self, model, bn_lambda, last_epoch=-1,\n            setter=set_bn_momentum_default\n    ):\n        if not isinstance(model, nn.Module):\n            raise RuntimeError(\n                \"Class '{}' is not a PyTorch nn Module\".format(\n     
               type(model).__name__\n                )\n            )\n\n        self.model = model\n        self.setter = setter\n        self.lmbd = bn_lambda\n\n        self.step(last_epoch + 1)\n        self.last_epoch = last_epoch\n\n    def step(self, epoch=None):\n        if epoch is None:\n            epoch = self.last_epoch + 1\n\n        self.last_epoch = epoch\n        self.model.apply(self.setter(self.lmbd(epoch)))\n\n\nclass Trainer(object):\n    r\"\"\"\n        Reasonably generic trainer for pytorch models\n\n    Parameters\n    ----------\n    model : pytorch model\n        Model to be trained\n    model_fn : function (model, inputs, labels) -> preds, loss, accuracy\n    optimizer : torch.optim\n        Optimizer for model\n    checkpoint_name : str\n        Name of file to save checkpoints to\n    best_name : str\n        Name of file to save best model to\n    lr_scheduler : torch.optim.lr_scheduler\n        Learning rate scheduler.  .step() will be called at the start of every epoch\n    bnm_scheduler : BNMomentumScheduler\n        Batchnorm momentum scheduler.  .step() will be called at the start of every epoch\n    eval_frequency : int\n        How often to run an eval\n    log_name : str\n        Name of file to output tensorboard_logger to\n    \"\"\"\n\n    def __init__(\n            self,\n            model,\n            model_fn,\n            optimizer,\n            checkpoint_name=\"ckpt\",\n            best_name=\"best\",\n            lr_scheduler=None,\n            bnm_scheduler=None,\n            eval_frequency=-1,\n            viz=None\n    ):\n        self.model, self.model_fn, self.optimizer, self.lr_scheduler, self.bnm_scheduler = (\n            model, model_fn, optimizer, lr_scheduler, bnm_scheduler\n        )\n\n        self.checkpoint_name, self.best_name = checkpoint_name, best_name\n        self.eval_frequency = eval_frequency\n\n        self.training_best, self.eval_best = {}, {}\n        self.viz = viz\n\n    @staticmethod\n    def _decode_value(v):\n        if isinstance(v[0], float):\n            return np.mean(v)\n        elif isinstance(v[0], tuple):\n            if len(v[0]) == 3:\n                num = [l[0] for l in v]\n                denom = [l[1] for l in v]\n                w = v[0][2]\n            else:\n                num = [l[0] for l in v]\n                denom = [l[1] for l in v]\n                w = None\n\n            return np.average(\n                np.sum(num, axis=0) / (np.sum(denom, axis=0) + 1e-6), weights=w\n            )\n        else:\n            raise AssertionError(\"Unknown type: {}\".format(type(v)))\n\n    def _train_it(self, it, batch):\n        self.model.train()\n\n        if self.lr_scheduler is not None:\n            self.lr_scheduler.step(it)\n\n        if self.bnm_scheduler is not None:\n            self.bnm_scheduler.step(it)\n\n        self.optimizer.zero_grad()\n        _, loss, eval_res = self.model_fn(self.model, batch)\n\n        loss.backward()\n        self.optimizer.step()\n\n        return eval_res\n\n    def eval_epoch(self, d_loader):\n        self.model.eval()\n\n        eval_dict = {}\n        total_loss = 0.0\n        count = 1.0\n        for i, data in tqdm.tqdm(enumerate(d_loader, 0), total=len(d_loader),\n                                 leave=False, desc='val'):\n            self.optimizer.zero_grad()\n\n            _, loss, eval_res = self.model_fn(self.model, data, eval=True)\n\n            total_loss += loss.item()\n            count += 1\n            for k, v in eval_res.items():\n   
             if v is not None:\n                    eval_dict[k] = eval_dict.get(k, []) + [v]\n\n        return total_loss / count, eval_dict\n\n    def train(\n            self,\n            start_it,\n            start_epoch,\n            n_epochs,\n            train_loader,\n            test_loader=None,\n            best_loss=0.0\n    ):\n        r\"\"\"\n           Call to begin training the model\n\n        Parameters\n        ----------\n        start_epoch : int\n            Epoch to start at\n        n_epochs : int\n            Number of epochs to train for\n        test_loader : torch.utils.data.DataLoader\n            DataLoader of the test_data\n        train_loader : torch.utils.data.DataLoader\n            DataLoader of training data\n        best_loss : float\n            Testing loss of the best model\n        \"\"\"\n\n        eval_frequency = (\n            self.eval_frequency\n            if self.eval_frequency > 0 else len(train_loader)\n        )\n\n        it = start_it\n        with tqdm.trange(start_epoch, n_epochs + 1, desc='epochs') as tbar, \\\n                tqdm.tqdm(total=eval_frequency, leave=False, desc='train') as pbar:\n\n            for epoch in tbar:\n                for batch in train_loader:\n                    res = self._train_it(it, batch)\n                    it += 1\n\n                    pbar.update()\n                    pbar.set_postfix(dict(total_it=it))\n                    tbar.refresh()\n\n                    if self.viz is not None:\n                        self.viz.update('train', it, res)\n\n                    if (it % eval_frequency) == 0:\n                        pbar.close()\n\n                        if test_loader is not None:\n                            val_loss, res = self.eval_epoch(test_loader)\n\n                            if self.viz is not None:\n                                self.viz.update('val', it, res)\n\n                            is_best = val_loss < best_loss\n                            best_loss = min(best_loss, val_loss)\n                            save_checkpoint(\n                                checkpoint_state(\n                                    self.model, self.optimizer, val_loss, epoch,\n                                    it\n                                ),\n                                is_best,\n                                filename=self.checkpoint_name,\n                                bestname=self.best_name\n                            )\n\n                        pbar = tqdm.tqdm(\n                            total=eval_frequency, leave=False, desc='train'\n                        )\n                        pbar.set_postfix(dict(total_it=it))\n\n        return best_loss\n"
  },
  {
    "path": "util/s3dis.py",
    "content": "import os\nimport numpy as np\n\nfrom torch.utils.data import Dataset\n\n\nclass S3DIS(Dataset):\n    def __init__(self, split='train', data_root='trainval_fullarea', num_point=4096, test_area=5, block_size=1.0, sample_rate=1.0, transform=None):\n        super().__init__()\n        self.num_point = num_point\n        self.block_size = block_size\n        self.transform = transform\n        rooms = sorted(os.listdir(data_root))\n        rooms = [room for room in rooms if 'Area_' in room]\n        if split == 'train':\n            rooms_split = [room for room in rooms if not 'Area_{}'.format(test_area) in room]\n        else:\n            rooms_split = [room for room in rooms if 'Area_{}'.format(test_area) in room]\n        self.room_points, self.room_labels = [], []\n        self.room_coord_min, self.room_coord_max = [], []\n        num_point_all = []\n        for room_name in rooms_split:\n            room_path = os.path.join(data_root, room_name)\n            room_data = np.load(room_path)  # xyzrgbl, N*7\n            points, labels = room_data[:, 0:6], room_data[:, 6]  # xyzrgb, N*6; l, N\n            coord_min, coord_max = np.amin(points, axis=0)[:3], np.amax(points, axis=0)[:3]\n            self.room_points.append(points), self.room_labels.append(labels)\n            self.room_coord_min.append(coord_min), self.room_coord_max.append(coord_max)\n            num_point_all.append(labels.size)\n        sample_prob = num_point_all / np.sum(num_point_all)\n        num_iter = int(np.sum(num_point_all) * sample_rate / num_point)\n        room_idxs = []\n        for index in range(len(rooms_split)):\n            room_idxs.extend([index] * int(round(sample_prob[index] * num_iter)))\n        self.room_idxs = np.array(room_idxs)\n        print(\"Totally {} samples in {} set.\".format(len(self.room_idxs), split))\n\n    def __getitem__(self, idx):\n        room_idx = self.room_idxs[idx]\n        points = self.room_points[room_idx]   # N * 6\n        labels = self.room_labels[room_idx]   # N\n        N_points = points.shape[0]\n\n        while (True):\n            center = points[np.random.choice(N_points)][:3]\n            block_min = center - [self.block_size / 2.0, self.block_size / 2.0, 0]\n            block_max = center + [self.block_size / 2.0, self.block_size / 2.0, 0]\n            point_idxs = np.where((points[:, 0] >= block_min[0]) & (points[:, 0] <= block_max[0]) & (points[:, 1] >= block_min[1]) & (points[:, 1] <= block_max[1]))[0]\n            if point_idxs.size > 1024:\n                break\n\n        if point_idxs.size >= self.num_point:\n            selected_point_idxs = np.random.choice(point_idxs, self.num_point, replace=False)\n        else:\n            selected_point_idxs = np.random.choice(point_idxs, self.num_point, replace=True)\n\n        # normalize\n        selected_points = points[selected_point_idxs, :]  # num_point * 6\n        current_points = np.zeros((self.num_point, 9))  # num_point * 9\n        current_points[:, 6] = selected_points[:, 0] / self.room_coord_max[room_idx][0]\n        current_points[:, 7] = selected_points[:, 1] / self.room_coord_max[room_idx][1]\n        current_points[:, 8] = selected_points[:, 2] / self.room_coord_max[room_idx][2]\n        selected_points[:, 0] = selected_points[:, 0] - center[0]\n        selected_points[:, 1] = selected_points[:, 1] - center[1]\n        selected_points[:, 3:6] /= 255.0\n        current_points[:, 0:6] = selected_points\n        current_labels = labels[selected_point_idxs]\n        if self.transform is 
not None:\n            current_points, current_labels = self.transform(current_points, current_labels)\n        return current_points, current_labels\n\n    def __len__(self):\n        return len(self.room_idxs)\n\n\nif __name__ == '__main__':\n    data_root = '/mnt/lustre/zhaohengshuang/dataset/s3dis/trainval_fullarea'\n    num_point, test_area, block_size, sample_rate = 4096, 5, 1.0, 0.01\n\n    point_data = S3DIS(split='train', data_root=data_root, num_point=num_point, test_area=test_area, block_size=block_size, sample_rate=sample_rate, transform=None)\n    print('point data size:', point_data.__len__())\n    print('point data 0 shape:', point_data.__getitem__(0)[0].shape)\n    print('point label 0 shape:', point_data.__getitem__(0)[1].shape)\n    import torch, time, random\n    manual_seed = 123\n    random.seed(manual_seed)\n    np.random.seed(manual_seed)\n    torch.manual_seed(manual_seed)\n    torch.cuda.manual_seed_all(manual_seed)\n    def worker_init_fn(worker_id):\n        random.seed(manual_seed + worker_id)\n    train_loader = torch.utils.data.DataLoader(point_data, batch_size=16, shuffle=True, num_workers=16, pin_memory=True, worker_init_fn=worker_init_fn)\n    for idx in range(4):\n        end = time.time()\n        for i, (input, target) in enumerate(train_loader):\n            print('time: {}/{}--{}'.format(i+1, len(train_loader), time.time() - end))\n            end = time.time()\n"
  },
  {
    "path": "util/scannet.py",
    "content": "import pickle\nimport os\nimport numpy as np\n\nfrom torch.utils.data import Dataset\n\n\nclass ScanNet(Dataset):\n    def __init__(self, split='train', data_root='scannet', num_point=8192, classes=20, block_size=1.5, sample_rate=1.0, transform=None):\n        self.split = split\n        self.num_point = num_point\n        self.block_size = block_size\n        self.transform = transform\n        data_file = os.path.join(data_root, 'scannet_{}.pickle'.format(split))\n        file_pickle = open(data_file, 'rb')\n        xyz_all = pickle.load(file_pickle, encoding='latin1')\n        label_all = pickle.load(file_pickle, encoding='latin1')\n        file_pickle.close()\n\n        self.label_all = []  # for change 0-20 to 0-19 + 255\n        self.room_coord_min, self.room_coord_max = [], []\n        num_point_all = []\n        label_weight = np.zeros(classes+1)\n        for index in range(len(xyz_all)):\n            xyz, label = xyz_all[index], label_all[index]  # xyzrgb, N*6; l, N\n            coord_min, coord_max = np.amin(xyz, axis=0)[:3], np.amax(xyz, axis=0)[:3]\n            self.room_coord_min.append(coord_min), self.room_coord_max.append(coord_max)\n            num_point_all.append(label.size)\n            tmp, _ = np.histogram(label, range(classes + 2))\n            label_weight += tmp\n            label_new = label - 1\n            label_new[label == 0] = 255\n            self.label_all.append(label_new.astype(np.uint8))\n        label_weight = label_weight[1:].astype(np.float32)\n        label_weight = label_weight / label_weight.sum()\n        label_weight = 1 / np.log(1.2 + label_weight)\n        sample_prob = num_point_all / np.sum(num_point_all)\n        num_iter = int(np.sum(num_point_all) * sample_rate / num_point)\n        room_idxs = []\n        for index in range(len(xyz_all)):\n            room_idxs.extend([index] * int(round(sample_prob[index] * num_iter)))\n        self.room_idxs = np.array(room_idxs)\n        self.xyz_all = xyz_all\n        self.label_weight = label_weight\n        print(\"Totally {} samples in {} set.\".format(len(self.room_idxs), split))\n\n    def __getitem__(self, idx):\n        room_idx = self.room_idxs[idx]\n        points = self.xyz_all[room_idx]  # N * 3\n        labels = self.label_all[room_idx]  # N\n        N_points = points.shape[0]\n\n        for i in range(10):\n            center = points[np.random.choice(N_points)][:3]\n            block_min = center - [self.block_size / 2.0, self.block_size / 2.0, 0]\n            block_max = center + [self.block_size / 2.0, self.block_size / 2.0, 0]\n            block_min[2], block_max[2] = self.room_coord_min[room_idx][2], self.room_coord_max[room_idx][2]\n            point_idxs = np.where((points[:, 0] >= block_min[0]) & (points[:, 0] <= block_max[0]) & (points[:, 1] >= block_min[1]) & (points[:, 1] <= block_max[1]))[0]\n            if point_idxs.size == 0:\n                continue\n            vidx = np.ceil((points[point_idxs, :] - block_min) / (block_max - block_min) * [31.0, 31.0, 62.0])\n            vidx = np.unique(vidx[:, 0] * 31.0 * 62.0 + vidx[:, 1] * 62.0 + vidx[:, 2])\n            if ((labels[point_idxs] != 255).sum() / point_idxs.size >= 0.7) and (vidx.size/31.0/31.0/62.0 >= 0.02):\n                break\n\n        if point_idxs.size >= self.num_point:\n            selected_point_idxs = np.random.choice(point_idxs, self.num_point, replace=False)\n        else:\n            selected_point_idxs = np.random.choice(point_idxs, self.num_point, replace=True)\n        # normalize\n   
     selected_points = points[selected_point_idxs, :]  # num_point * 3\n        current_points = np.zeros((self.num_point, 6))  # num_point * 6\n        current_points[:, 3] = selected_points[:, 0] / self.room_coord_max[room_idx][0]\n        current_points[:, 4] = selected_points[:, 1] / self.room_coord_max[room_idx][1]\n        current_points[:, 5] = selected_points[:, 2] / self.room_coord_max[room_idx][2]\n        selected_points[:, 0] = selected_points[:, 0] - center[0]\n        selected_points[:, 1] = selected_points[:, 1] - center[1]\n        current_points[:, 0:3] = selected_points\n        current_labels = labels[selected_point_idxs]\n        if self.transform is not None:\n            current_points, current_labels = self.transform(current_points, current_labels)\n        return current_points, current_labels\n\n    def __len__(self):\n        return len(self.room_idxs)\n\n\nif __name__ == '__main__':\n    data_root = '/mnt/sda1/hszhao/dataset/scannet'\n    point_data = ScanNet(split='train', data_root=data_root, num_point=8192, transform=None)\n    print('point data size:', point_data.__len__())\n    print('point data 0 shape:', point_data.__getitem__(0)[0].shape)\n    print('point label 0 shape:', point_data.__getitem__(0)[1].shape)\n    import torch, time, random\n    manual_seed = 123\n    def worker_init_fn(worker_id):\n        random.seed(manual_seed + worker_id)\n    random.seed(manual_seed)\n    np.random.seed(manual_seed)\n    torch.manual_seed(manual_seed)\n    torch.cuda.manual_seed_all(manual_seed)\n    train_loader = torch.utils.data.DataLoader(point_data, batch_size=16, shuffle=True, num_workers=1, pin_memory=True, worker_init_fn=worker_init_fn)\n    for idx in range(2):\n        end = time.time()\n        for i, (input, target) in enumerate(train_loader):\n            print('time: {}/{}--{}'.format(i+1, len(train_loader), time.time() - end))\n            end = time.time()\n"
  },
  {
    "path": "util/transform.py",
    "content": "import numpy as np\n\nimport torch\n\n\nclass Compose(object):\n    def __init__(self, transforms):\n        self.transforms = transforms\n\n    def __call__(self, data, label):\n        for t in self.transforms:\n            data, label = t(data, label)\n        return data, label\n\n\nclass ToTensor(object):\n    def __call__(self, data, label):\n        data = torch.from_numpy(data)\n        if not isinstance(data, torch.FloatTensor):\n            data = data.float()\n        label = torch.from_numpy(label)\n        if not isinstance(label, torch.LongTensor):\n            label = label.long()\n        return data, label\n\n\nclass RandomRotate(object):\n    def __init__(self, rotate_angle=None, along_z=False):\n        self.rotate_angle = rotate_angle\n        self.along_z = along_z\n\n    def __call__(self, data, label):\n        if self.rotate_angle is None:\n            rotate_angle = np.random.uniform() * 2 * np.pi\n        else:\n            rotate_angle = self.rotate_angle\n        cosval, sinval = np.cos(rotate_angle), np.sin(rotate_angle)\n        if self.along_z:\n            rotation_matrix = np.array([[cosval, sinval, 0], [-sinval, cosval, 0], [0, 0, 1]])\n        else:\n            rotation_matrix = np.array([[cosval, 0, sinval], [0, 1, 0], [-sinval, 0, cosval]])\n        data[:, 0:3] = np.dot(data[:, 0:3], rotation_matrix)\n        if data.shape[1] > 3:  # use normal\n            data[:, 3:6] = np.dot(data[:, 3:6], rotation_matrix)\n        return data, label\n\n\nclass RandomRotatePerturbation(object):\n    def __init__(self, angle_sigma=0.06, angle_clip=0.18):\n        self.angle_sigma = angle_sigma\n        self.angle_clip = angle_clip\n\n    def __call__(self, data, label):\n        angles = np.clip(self.angle_sigma*np.random.randn(3), -self.angle_clip, self.angle_clip)\n        Rx = np.array([[1, 0, 0],\n                       [0, np.cos(angles[0]), -np.sin(angles[0])],\n                       [0, np.sin(angles[0]), np.cos(angles[0])]])\n        Ry = np.array([[np.cos(angles[1]), 0, np.sin(angles[1])],\n                       [0, 1, 0],\n                       [-np.sin(angles[1]), 0, np.cos(angles[1])]])\n        Rz = np.array([[np.cos(angles[2]), -np.sin(angles[2]), 0],\n                       [np.sin(angles[2]), np.cos(angles[2]), 0],\n                       [0, 0, 1]])\n        R = np.dot(Rz, np.dot(Ry, Rx))\n        data[:, 0:3] = np.dot(data[:, 0:3], R)\n        if data.shape[1] > 3:  # use normal\n            data[:, 3:6] = np.dot(data[:, 3:6], R)\n        return data, label\n\n\nclass RandomScale(object):\n    def __init__(self, scale_low=0.8, scale_high=1.25):\n        self.scale_low = scale_low\n        self.scale_high = scale_high\n\n    def __call__(self, data, label):\n        scale = np.random.uniform(self.scale_low, self.scale_high)\n        data[:, 0:3] *= scale\n        return data, label\n\n\nclass RandomShift(object):\n    def __init__(self, shift_range=0.1):\n        self.shift_range = shift_range\n\n    def __call__(self, data, label):\n        shift = np.random.uniform(-self.shift_range, self.shift_range, 3)\n        data[:, 0:3] += shift\n        return data, label\n\n\nclass RandomJitter(object):\n    def __init__(self, sigma=0.01, clip=0.05):\n        self.sigma = sigma\n        self.clip = clip\n\n    def __call__(self, data, label):\n        assert (self.clip > 0)\n        jitter = np.clip(self.sigma * np.random.randn(data.shape[0], 3), -1 * self.clip, self.clip)\n        data[:, 0:3] += jitter\n        return data, label\n"
  },
  {
    "path": "util/util.py",
    "content": "import os\nimport numpy as np\nfrom PIL import Image\n\nimport torch\nfrom torch import nn\nfrom torch.nn.modules.conv import _ConvNd\nfrom torch.nn.modules.batchnorm import _BatchNorm\nimport torch.nn.init as initer\n\n\nclass AverageMeter(object):\n    \"\"\"Computes and stores the average and current value\"\"\"\n    def __init__(self):\n        self.reset()\n\n    def reset(self):\n        self.val = 0\n        self.avg = 0\n        self.sum = 0\n        self.count = 0\n\n    def update(self, val, n=1):\n        self.val = val\n        self.sum += val * n\n        self.count += n\n        self.avg = self.sum / self.count\n\n\ndef step_learning_rate(optimizer, base_lr, epoch, step_epoch, multiplier=0.1, clip=1e-6):\n    \"\"\"Sets the learning rate to the base LR decayed by 10 every step epochs\"\"\"\n    lr = max(base_lr * (multiplier ** (epoch // step_epoch)), clip)\n    for param_group in optimizer.param_groups:\n        param_group['lr'] = lr\n\n\ndef poly_learning_rate(optimizer, base_lr, curr_iter, max_iter, power=0.9):\n    \"\"\"poly learning rate policy\"\"\"\n    lr = base_lr * (1 - float(curr_iter) / max_iter) ** power\n    for param_group in optimizer.param_groups:\n        param_group['lr'] = lr\n\n\ndef intersectionAndUnion(output, target, K, ignore_index=255):\n    # 'K' classes, output and target sizes are N or N * L or N * H * W, each value in range 0 to K - 1.\n    assert (output.ndim in [1, 2, 3])\n    assert output.shape == target.shape\n    output = output.reshape(output.size).copy()\n    target = target.reshape(target.size)\n    output[np.where(target == ignore_index)[0]] = 255\n    intersection = output[np.where(output == target)[0]]\n    area_intersection, _ = np.histogram(intersection, bins=np.arange(K+1))\n    area_output, _ = np.histogram(output, bins=np.arange(K+1))\n    area_target, _ = np.histogram(target, bins=np.arange(K+1))\n    area_union = area_output + area_target - area_intersection\n    return area_intersection, area_union, area_target\n\n\ndef intersectionAndUnionGPU(output, target, K, ignore_index=255):\n    # 'K' classes, output and target sizes are N or N * L or N * H * W, each value in range 0 to K - 1.\n    assert (output.dim() in [1, 2, 3])\n    assert output.shape == target.shape\n    output = output.view(-1)\n    target = target.view(-1)\n    output[target == ignore_index] = ignore_index\n    intersection = output[output == target]\n    # https://github.com/pytorch/pytorch/issues/1382\n    area_intersection = torch.histc(intersection.float().cpu(), bins=K, min=0, max=K-1)\n    area_output = torch.histc(output.float().cpu(), bins=K, min=0, max=K-1)\n    area_target = torch.histc(target.float().cpu(), bins=K, min=0, max=K-1)\n    area_union = area_output + area_target - area_intersection\n    return area_intersection.cuda(), area_union.cuda(), area_target.cuda()\n\n\ndef check_mkdir(dir_name):\n    if not os.path.exists(dir_name):\n        os.mkdir(dir_name)\n\n\ndef check_makedirs(dir_name):\n    if not os.path.exists(dir_name):\n        os.makedirs(dir_name)\n\n\ndef init_weights(model, conv='kaiming', batchnorm='normal', linear='kaiming', lstm='kaiming'):\n    \"\"\"\n    :param model: Pytorch Model which is nn.Module\n    :param conv:  'kaiming' or 'xavier'\n    :param batchnorm: 'normal' or 'constant'\n    :param linear: 'kaiming' or 'xavier'\n    :param lstm: 'kaiming' or 'xavier'\n    \"\"\"\n    for m in model.modules():\n        if isinstance(m, (_ConvNd)):\n            if conv == 'kaiming':\n                
initer.kaiming_normal_(m.weight)\n            elif conv == 'xavier':\n                initer.xavier_normal_(m.weight)\n            else:\n                raise ValueError(\"init type of conv error.\\n\")\n            if m.bias is not None:\n                initer.constant_(m.bias, 0)\n\n        elif isinstance(m, _BatchNorm):\n            if batchnorm == 'normal':\n                initer.normal_(m.weight, 1.0, 0.02)\n            elif batchnorm == 'constant':\n                initer.constant_(m.weight, 1.0)\n            else:\n                raise ValueError(\"init type of batchnorm error.\\n\")\n            initer.constant_(m.bias, 0.0)\n\n        elif isinstance(m, nn.Linear):\n            if linear == 'kaiming':\n                initer.kaiming_normal_(m.weight)\n            elif linear == 'xavier':\n                initer.xavier_normal_(m.weight)\n            else:\n                raise ValueError(\"init type of linear error.\\n\")\n            if m.bias is not None:\n                initer.constant_(m.bias, 0)\n\n        elif isinstance(m, nn.LSTM):\n            for name, param in m.named_parameters():\n                if 'weight' in name:\n                    if lstm == 'kaiming':\n                        initer.kaiming_normal_(param)\n                    elif lstm == 'xavier':\n                        initer.xavier_normal_(param)\n                    else:\n                        raise ValueError(\"init type of lstm error.\\n\")\n                elif 'bias' in name:\n                    initer.constant_(param, 0)\n\n\ndef convert_to_syncbn(model):\n    def recursive_set(cur_module, name, module):\n        if len(name.split('.')) > 1:\n            recursive_set(getattr(cur_module, name[:name.find('.')]), name[name.find('.')+1:], module)\n        else:\n            setattr(cur_module, name, module)\n    from lib.sync_bn import SynchronizedBatchNorm1d, SynchronizedBatchNorm2d, SynchronizedBatchNorm3d\n    for name, m in model.named_modules():\n        if isinstance(m, nn.BatchNorm1d):\n            recursive_set(model, name, SynchronizedBatchNorm1d(m.num_features, m.eps, m.momentum, m.affine))\n        elif isinstance(m, nn.BatchNorm2d):\n            recursive_set(model, name, SynchronizedBatchNorm2d(m.num_features, m.eps, m.momentum, m.affine))\n        elif isinstance(m, nn.BatchNorm3d):\n            recursive_set(model, name, SynchronizedBatchNorm3d(m.num_features, m.eps, m.momentum, m.affine))\n\n\ndef colorize(gray, palette):\n    # gray: numpy array of the label and 1*3N size list palette\n    color = Image.fromarray(gray.astype(np.uint8)).convert('P')\n    color.putpalette(palette)\n    return color\n"
  }
]