[
  {
    "path": ".gitignore",
    "content": ".spyproject\n__pycache__\nlogs\ncache\n"
  },
  {
    "path": "LICENSE",
    "content": "MIT License\n\nCopyright (c) 2022 Gabriele Berton, Carlo Masone, Barbara Caputo\n\nPermission is hereby granted, free of charge, to any person obtaining a copy\nof this software and associated documentation files (the \"Software\"), to deal\nin the Software without restriction, including without limitation the rights\nto use, copy, modify, merge, publish, distribute, sublicense, and/or sell\ncopies of the Software, and to permit persons to whom the Software is\nfurnished to do so, subject to the following conditions:\n\nThe above copyright notice and this permission notice shall be included in all\ncopies or substantial portions of the Software.\n\nTHE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\nIMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\nAUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\nLIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\nOUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\nSOFTWARE."
  },
  {
    "path": "README.md",
    "content": "\n# Rethinking Visual Geo-localization for Large-Scale Applications\n\nThis is the official PyTorch implementation of the CVPR 2022 paper \"Rethinking Visual Geo-localization for Large-Scale Applications\".\nThe paper presents a new dataset called San Francisco eXtra Large (SF-XL, go [_here_](https://forms.gle/wpyDzhDyoWLQygAT9) to download it), and a highly scalable training method (called CosPlace), which allows reaching SOTA results with compact descriptors.\n\n\n[[CVPR OpenAccess](https://openaccess.thecvf.com/content/CVPR2022/html/Berton_Rethinking_Visual_Geo-Localization_for_Large-Scale_Applications_CVPR_2022_paper.html)] [[ArXiv](https://arxiv.org/abs/2204.02287)] [[Video](https://www.youtube.com/watch?v=oDyL6oVNN3I)] [[BibTex](https://github.com/gmberton/CosPlace?tab=readme-ov-file#cite)]\n\nNote that CosPlace is quite old. **🚀 Looking for SOTA Visual Place Recognition (VPR)? Check out [MegaLoc](https://github.com/gmberton/MegaLoc)**\n\nThe images below represent respectively:\n1) the map of San Francisco eXtra Large\n2) a visualization of how CosPlace Groups (read datasets) are formed\n3) results with CosPlace vs other methods on Pitts250k (CosPlace trained on SF-XL, others on Pitts30k)\n<p float=\"left\">\n  <img src=\"https://github.com/gmberton/gmberton.github.io/blob/main/images/SF-XL%20map.jpg\" height=\"150\" />\n  <img src=\"https://github.com/gmberton/gmberton.github.io/blob/main/images/map_groups.png\" height=\"150\" /> \n  <img src=\"https://github.com/gmberton/gmberton.github.io/blob/main/images/backbones_pitts250k_main.png\" height=\"150\" />\n</p>\n\n\n\n## Train\nAfter downloading the SF-XL dataset, simply run \n\n`$ python3 train.py --train_set_folder path/to/sf_xl/raw/train/database --val_set_folder path/to/sf_xl/processed/val --test_set_folder path/to/sf_xl/processed/test`\n\nThe script automatically splits SF-XL into CosPlace Groups, and saves the resulting object in the folder `cache`.\nBy default, training is performed with 
a ResNet-18 with descriptor dimensionality 512, which fits in less than 4 GB of VRAM.\n\nTo change the backbone or the output descriptor dimensionality, simply run \n\n`$ python3 train.py --backbone ResNet50 --fc_output_dim 128`\n\nYou can also speed up your training with Automatic Mixed Precision (note that none of the results/statistics from the paper used AMP)\n\n`$ python3 train.py --use_amp16`\n\nRun `$ python3 train.py -h` to see all the hyperparameters that you can change; you will find all the hyperparameters mentioned in the paper.\n\n#### Dataset size and lightweight version\n\nThe SF-XL dataset is about 1 TB.\nOnly a subset of the images is used for training, and this subset is only 360 GB.\nIf this is still too heavy for you (e.g. if you're using Colab), but you would like to run CosPlace, we also created a small version of SF-XL, which is only 5 GB.\nObviously, using the small version will lead to lower results, and it should be used only for debugging / exploration purposes.\nMore information on the dataset and its lightweight version is in the README that you can find on the dataset download page (go [_here_](https://forms.gle/wpyDzhDyoWLQygAT9) to find it).\n\n#### Reproducibility\nResults from the paper are fully reproducible, and we followed deep learning's best practices (average over multiple runs for the main results, validation/early stopping and hyperparameter search on the val set).\nIf you are a researcher comparing your work against ours, please make sure to follow these best practices and avoid picking the best model on the test set.\n\n\n## Test\nYou can test a trained model as follows\n\n`$ python3 eval.py --backbone ResNet50 --fc_output_dim 128 --resume_model path/to/best_model.pth`\n\nYou can download plenty of trained models below.\n\n\n### Visualize predictions\n\nPredictions can be easily visualized through the `num_preds_to_save` parameter. 
For example, running this\n\n```\npython3 eval.py --backbone ResNet50 --fc_output_dim 512 --resume_model path/to/best_model.pth \\\n    --num_preds_to_save=3 --exp_name=cosplace_on_stlucia\n```\nwill generate, under the path `./logs/cosplace_on_stlucia/*/preds`, images such as\n\n<p float=\"left\">\n  <img src=\"https://raw.githubusercontent.com/gmberton/VPR-methods-evaluation/master/images/pred.jpg\"  height=\"200\"/>\n</p>\n\nGiven that saving predictions for each query might take a while, you can also pass the parameter `--save_only_wrong_preds`, which saves predictions only for wrongly predicted queries (i.e. where the first prediction is wrong); these are usually the most interesting failure cases.\n\n\n## Trained Models\n\nWe now have all our trained models on [PyTorch Hub](https://pytorch.org/docs/stable/hub.html), so that you can use them in any codebase, without cloning this repository, simply like this\n```\nimport torch\nmodel = torch.hub.load(\"gmberton/cosplace\", \"get_trained_model\", backbone=\"ResNet50\", fc_output_dim=2048)\n```\n\nAs an alternative, you can download the trained models from the table below, which provides links to models with different backbones and descriptor dimensionalities, trained on SF-XL.\n\n<table>\n  <tr>\n    <th rowspan=2>Model</th>\n    <th colspan=7>Dimension of Descriptors</th>\n  </tr>\n  <tr>\n    <td>32</td>\n    <td>64</td>\n    <td>128</td>\n    <td>256</td>\n    <td>512</td>\n    <td>1024</td>\n    <td>2048</td>\n  </tr>\n  <tr>\n    <td>ResNet-18</td>\n    <td><a href=\"https://drive.google.com/file/d/1tfT8r2fBeMVAEHg2bVfCql5pV9YzK620/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1-d_Yi3ly3bY6hUW1F9w144FFKsZtYBL4/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1HaQjGY5x--Ok0RcspVVjZ0bwrAVmBvrZ/view?usp=sharing\">link</a></td>\n    <td><a 
href=\"https://drive.google.com/file/d/1hjkogugTsHTQ6GTuW3MHqx-t4cXqx0uo/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1rQAC2ZddDjzwB2OVqAcNgCFEf3gLNa9U/view?usp=sharing\">link</a></td>\n    <td>-</td>\n    <td>-</td>\n  </tr>\n  <tr>\n    <td>ResNet-50</td>\n    <td><a href=\"https://drive.google.com/file/d/18AxbLO66CO0kG05-1YrRb1YwqN7Wgp6Z/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1F2WMt7vMUqXBjsZDIwSga3N0l0r9NP2s/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/14U3jsoNEWC-QsINoVCWZaHFUGE20fIgZ/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1Q2sZPEJfHAe19JaZkdgeFotUYwKbV_x2/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1LgDaxCjbQqQWuk5qrPogfg7oN8Ksl1jh/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1VBLUiQJfmnZ4kVQIrXBW-AE1dZ3EnMv2/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1yNzxsMg34KO04UJ49ncANdCIWlB3aUGA/view?usp=sharing\">link</a></td>\n  </tr>\n  <tr>\n    <td>ResNet-101</td>\n    <td><a href=\"https://drive.google.com/file/d/1a5FqhujOn0Pr6duKrRknoOgz8L8ckDSE/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/17C8jBQluxsbI9d8Bzf67b5OsauOJAIuX/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1w37AztnIyGVklBMtm-lwkajb0DWbYhhc/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1G5_I4vX4s4_oiAC3EWbrCyXrCOkV8Bbs/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1uBKpNfMBt6sLIjCGfH6Orx9eQdQgN-8Z/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/12BU8BgfqFYzGLXXNaKLpaAzTHuN5I9gQ/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1PF7lsSw1sFMh-Bl_xwO74fM1InyYy1t8/view?usp=sharing\">link</a></td>\n  
</tr>\n  <tr>\n    <td>ResNet-152</td>\n    <td><a href=\"https://drive.google.com/file/d/12pI1FToqKKt8I6-802CHWXDP-JmHEFSW/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1rTjlv_pNtXgxY8VELiGYvLcgXiRa2zqB/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1q5-szPBn4zL8evWmYT04wFaKjen66mrk/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1sCQMA_rsIjmD-f381I0f2yDf0At4TnSx/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1ggNYQfGSfE-dciKCS_6SKeQT76O0OXPX/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/15vBWuHVqEMxkAWWrc7IrkGsQroC65tPc/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1AlF7xPSswDLA1TdhZ9yTVBkfRnJm0Hn8/view?usp=sharing\">link</a></td>\n  </tr>\n  <tr>\n    <td>VGG-16</td>\n    <td>-</td>\n    <td><a href=\"https://drive.google.com/file/d/1YJTBwagC0v50oPydpKtsTnGZnaYOV0z-/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1vgw509lGBfJR46cGDJGkFcdBTGhIeyAH/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1-4JtACE47rkXXSAlRBFIbydimfKemdo7/view?usp=sharing\">link</a></td>\n    <td><a href=\"https://drive.google.com/file/d/1F6CT-rnAGTTexdpLoQYncn-ooqzJe6wf/view?usp=sharing\">link</a></td>\n    <td>-</td>\n    <td>-</td>\n  </tr>\n</table>\n\nOr you can download all models at once at [this link](https://drive.google.com/drive/folders/1WzSLnv05FLm-XqP5DxR5nXaaixH23uvV?usp=sharing)\n\n## Issues\nIf you have any questions regarding our code or dataset, feel free to open an issue or send an email to berton.gabri@gmail.com\n\n## Acknowledgements\nParts of this repo are inspired by the following repositories:\n- [CosFace implementation in PyTorch](https://github.com/MuggleWang/CosFace_pytorch/blob/master/layer.py)\n- [CNN Image Retrieval in 
PyTorch](https://github.com/filipradenovic/cnnimageretrieval-pytorch) (for the GeM layer)\n- [Visual Geo-localization benchmark](https://github.com/gmberton/deep-visual-geo-localization-benchmark) (for the evaluation / test code)\n\n## Cite\nHere is the bibtex to cite our paper\n```bibtex\n@inproceedings{Berton_CVPR_2022_CosPlace,\n    author    = {Berton, Gabriele and Masone, Carlo and Caputo, Barbara},\n    title     = {Rethinking Visual Geo-Localization for Large-Scale Applications},\n    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},\n    month     = {June},\n    year      = {2022},\n    pages     = {4878--4888}\n}\n```\n"
  },
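The evaluation described in the Test section retrieves, for each query, the nearest database descriptors and checks them against the ground-truth positives (database images within 25 meters, see `datasets/test_dataset.py`). As a minimal pure-Python sketch of how a Recall@N metric can be computed from such predictions (illustrative only, not the repo's `eval.py`; `recall_at_n` and the toy data are made up):

```python
# Illustrative sketch: Recall@N from ranked retrieval results.
# predictions[q] = database indices ranked by descriptor distance;
# positives_per_query[q] = database indices within the distance threshold.
def recall_at_n(predictions, positives_per_query, ns=(1, 5, 10)):
    recalls = {}
    for n in ns:
        # A query counts as correct if any of its top-n predictions is a positive
        hits = sum(
            any(p in positives for p in preds[:n])
            for preds, positives in zip(predictions, positives_per_query)
        )
        recalls[n] = 100 * hits / len(predictions)
    return recalls

# Toy example: 2 queries over a tiny database
preds = [[3, 7, 1], [5, 2, 8]]
positives = [{7}, {5}]
print(recall_at_n(preds, positives, ns=(1, 2)))  # {1: 50.0, 2: 100.0}
```

Query 0's top-1 prediction (index 3) is not a positive, so only Recall@2 credits it.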
  {
    "path": "augmentations.py",
    "content": "\nimport torch\nfrom typing import Tuple, Union\nimport torchvision.transforms as T\n\n\nclass DeviceAgnosticColorJitter(T.ColorJitter):\n    def __init__(self, brightness: float = 0., contrast: float = 0., saturation: float = 0., hue: float = 0.):\n        \"\"\"This is the same as T.ColorJitter but it only accepts batches of images and works on GPU\"\"\"\n        super().__init__(brightness=brightness, contrast=contrast, saturation=saturation, hue=hue)\n    \n    def forward(self, images: torch.Tensor) -> torch.Tensor:\n        assert len(images.shape) == 4, f\"images should be a batch of images, but it has shape {images.shape}\"\n        B, C, H, W = images.shape\n        # Applies a different color jitter to each image\n        color_jitter = super(DeviceAgnosticColorJitter, self).forward\n        augmented_images = [color_jitter(img).unsqueeze(0) for img in images]\n        augmented_images = torch.cat(augmented_images)\n        assert augmented_images.shape == torch.Size([B, C, H, W])\n        return augmented_images\n\n\nclass DeviceAgnosticRandomResizedCrop(T.RandomResizedCrop):\n    def __init__(self, size: Union[int, Tuple[int, int]], scale: Tuple[float, float]):\n        \"\"\"This is the same as T.RandomResizedCrop but it only accepts batches of images and works on GPU\"\"\"\n        super().__init__(size=size, scale=scale, antialias=True)\n    \n    def forward(self, images: torch.Tensor) -> torch.Tensor:\n        assert len(images.shape) == 4, f\"images should be a batch of images, but it has shape {images.shape}\"\n        B, C, H, W = images.shape\n        # Applies a different RandomResizedCrop to each image\n        random_resized_crop = super(DeviceAgnosticRandomResizedCrop, self).forward\n        augmented_images = [random_resized_crop(img).unsqueeze(0) for img in images]\n        augmented_images = torch.cat(augmented_images)\n        return augmented_images\n\n\nif __name__ == \"__main__\":\n    \"\"\"\n    You can run this script to 
visualize the transformations, and verify that\n    the augmentations are applied individually on each image of the batch.\n    \"\"\"\n    from PIL import Image\n    # Import skimage in here, so it is not necessary to install it unless you run this script\n    from skimage import data\n    \n    # Initialize DeviceAgnosticRandomResizedCrop\n    random_crop = DeviceAgnosticRandomResizedCrop(size=[256, 256], scale=[0.5, 1])\n    # Create a batch with 2 astronaut images\n    pil_image = Image.fromarray(data.astronaut())\n    tensor_image = T.functional.to_tensor(pil_image).unsqueeze(0)\n    images_batch = torch.cat([tensor_image, tensor_image])\n    # Apply augmentation (individually on each of the 2 images)\n    augmented_batch = random_crop(images_batch)\n    # Convert to PIL images\n    augmented_image_0 = T.functional.to_pil_image(augmented_batch[0])\n    augmented_image_1 = T.functional.to_pil_image(augmented_batch[1])\n    # Visualize the original image, as well as the two augmented ones\n    pil_image.show()\n    augmented_image_0.show()\n    augmented_image_1.show()\n"
  },
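The classes in `augmentations.py` draw a fresh random augmentation for every image in the batch, instead of one shared augmentation for the whole batch. The idea can be sketched in plain Python, with a per-image brightness factor standing in for `ColorJitter` (the function and values below are illustrative, not part of the repo):

```python
import random

# Sketch of per-image augmentation: each "image" (a flat list of pixel
# values here) gets its own independently sampled brightness factor,
# mirroring how DeviceAgnosticColorJitter jitters each batch element.
def brightness_jitter_each(batch, max_delta=0.2, seed=0):
    rng = random.Random(seed)
    out = []
    for img in batch:
        factor = 1.0 + rng.uniform(-max_delta, max_delta)  # one draw per image
        out.append([min(v * factor, 1.0) for v in img])
    return out

batch = [[0.5, 0.5], [0.5, 0.5]]  # two identical "images"
jittered = brightness_jitter_each(batch)
print(jittered[0] != jittered[1])  # different factors -> different outputs
```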
  {
    "path": "commons.py",
    "content": "\nimport os\nimport sys\nimport torch\nimport random\nimport logging\nimport traceback\nimport numpy as np\n\n\nclass InfiniteDataLoader(torch.utils.data.DataLoader):\n    def __init__(self, *args, **kwargs):\n        super().__init__(*args, **kwargs)\n        self.dataset_iterator = super().__iter__()\n    \n    def __iter__(self):\n        return self\n    \n    def __next__(self):\n        try:\n            batch = next(self.dataset_iterator)\n        except StopIteration:\n            self.dataset_iterator = super().__iter__()\n            batch = next(self.dataset_iterator)\n        return batch\n\n\ndef make_deterministic(seed: int = 0):\n    \"\"\"Make results deterministic. If seed == -1, do not make deterministic.\n        Running your script in a deterministic way might slow it down.\n        Note that for some packages (eg: sklearn's PCA) this function is not enough.\n    \"\"\"\n    seed = int(seed)\n    if seed == -1:\n        return\n    random.seed(seed)\n    np.random.seed(seed)\n    torch.manual_seed(seed)\n    torch.cuda.manual_seed_all(seed)\n    torch.backends.cudnn.deterministic = True\n    torch.backends.cudnn.benchmark = False\n\n\ndef setup_logging(output_folder: str, exist_ok: bool = False, console: str = \"debug\",\n                  info_filename: str = \"info.log\", debug_filename: str = \"debug.log\"):\n    \"\"\"Set up logging files and console output.\n    Creates one file for INFO logs and one for DEBUG logs.\n    Args:\n        output_folder (str): the folder where the log files are saved (it will be created).\n        exist_ok (boolean): if False, throw a FileExistsError if output_folder already exists\n        console (str):\n            if == \"debug\" prints on console debug messages and higher\n            if == \"info\"  prints on console info messages and higher\n            if == None does not use console (useful when a logger has already been set)\n        info_filename (str): the name of the info file. 
if None, don't create info file\n        debug_filename (str): the name of the debug file. if None, don't create debug file\n    \"\"\"\n    if not exist_ok and os.path.exists(output_folder):\n        raise FileExistsError(f\"{output_folder} already exists!\")\n    os.makedirs(output_folder, exist_ok=True)\n    base_formatter = logging.Formatter('%(asctime)s   %(message)s', \"%Y-%m-%d %H:%M:%S\")\n    logger = logging.getLogger('')\n    logger.setLevel(logging.DEBUG)\n    \n    if info_filename is not None:\n        info_file_handler = logging.FileHandler(f'{output_folder}/{info_filename}')\n        info_file_handler.setLevel(logging.INFO)\n        info_file_handler.setFormatter(base_formatter)\n        logger.addHandler(info_file_handler)\n    \n    if debug_filename is not None:\n        debug_file_handler = logging.FileHandler(f'{output_folder}/{debug_filename}')\n        debug_file_handler.setLevel(logging.DEBUG)\n        debug_file_handler.setFormatter(base_formatter)\n        logger.addHandler(debug_file_handler)\n    \n    if console is not None:\n        console_handler = logging.StreamHandler()\n        if console == \"debug\":\n            console_handler.setLevel(logging.DEBUG)\n        if console == \"info\":\n            console_handler.setLevel(logging.INFO)\n        console_handler.setFormatter(base_formatter)\n        logger.addHandler(console_handler)\n    \n    def my_handler(type_, value, tb):\n        logger.info(\"\\n\" + \"\".join(traceback.format_exception(type_, value, tb)))\n        logging.info(\"Experiment finished (with some errors)\")\n    sys.excepthook = my_handler\n"
  },
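`InfiniteDataLoader` above restarts its underlying iterator whenever it is exhausted, so the training loop can keep calling `next()` without ever handling epoch boundaries. A minimal pure-Python analogue of that behavior (illustrative, using a plain list instead of a `DataLoader`):

```python
# Minimal analogue of InfiniteDataLoader: when the underlying iterator
# runs out, transparently create a fresh one and continue.
class InfiniteIterator:
    def __init__(self, iterable):
        self.iterable = iterable
        self.iterator = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        try:
            return next(self.iterator)
        except StopIteration:
            self.iterator = iter(self.iterable)  # restart from the beginning
            return next(self.iterator)

batches = InfiniteIterator([1, 2, 3])
print([next(batches) for _ in range(7)])  # cycles past the end: [1, 2, 3, 1, 2, 3, 1]
```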
  {
    "path": "cosface_loss.py",
    "content": "\n# Based on https://github.com/MuggleWang/CosFace_pytorch/blob/master/layer.py\n\nimport torch\nimport torch.nn as nn\nfrom torch.nn import Parameter\n\n\ndef cosine_sim(x1: torch.Tensor, x2: torch.Tensor, dim: int = 1, eps: float = 1e-8) -> torch.Tensor:\n    ip = torch.mm(x1, x2.t())\n    w1 = torch.norm(x1, 2, dim)\n    w2 = torch.norm(x2, 2, dim)\n    return ip / torch.ger(w1, w2).clamp(min=eps)\n\n\nclass MarginCosineProduct(nn.Module):\n    \"\"\"Implementation of the large margin cosine distance (CosFace).\n    Args:\n        in_features: size of each input sample\n        out_features: size of each output sample\n        s: norm of input feature\n        m: margin\n    \"\"\"\n    def __init__(self, in_features: int, out_features: int, s: float = 30.0, m: float = 0.40):\n        super().__init__()\n        self.in_features = in_features\n        self.out_features = out_features\n        self.s = s\n        self.m = m\n        self.weight = Parameter(torch.Tensor(out_features, in_features))\n        nn.init.xavier_uniform_(self.weight)\n    \n    def forward(self, inputs: torch.Tensor, label: torch.Tensor) -> torch.Tensor:\n        cosine = cosine_sim(inputs, self.weight)\n        one_hot = torch.zeros_like(cosine)\n        one_hot.scatter_(1, label.view(-1, 1), 1.0)\n        output = self.s * (cosine - one_hot * self.m)\n        return output\n    \n    def __repr__(self):\n        return self.__class__.__name__ + '(' \\\n               + 'in_features=' + str(self.in_features) \\\n               + ', out_features=' + str(self.out_features) \\\n               + ', s=' + str(self.s) \\\n               + ', m=' + str(self.m) + ')'\n"
  },
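`MarginCosineProduct.forward` computes `s * (cosine - one_hot * m)`: the margin `m` is subtracted only from the true class's cosine similarity before scaling by `s`. A toy numeric sketch with plain floats (the cosine values are made up):

```python
# Toy sketch of the CosFace logits produced by MarginCosineProduct:
# the margin m is subtracted only from the true-class cosine, then
# everything is scaled by s. Values are illustrative.
def cosface_logits(cosines, label, s=30.0, m=0.40):
    return [s * (c - m) if j == label else s * c for j, c in enumerate(cosines)]

cosines = [0.8, 0.5, 0.1]  # cosine similarity to each class weight
print(cosface_logits(cosines, label=0))  # true-class logit 30*(0.8-0.4), others 30*c
```

Note that even though class 0 has the highest cosine (0.8), its margin-penalized logit (≈ 12) falls below class 1's (15); during training this pressures the model to keep the true-class cosine at least `m` above the others.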
  {
    "path": "cosplace_model/__init__.py",
    "content": ""
  },
  {
    "path": "cosplace_model/cosplace_network.py",
    "content": "\nimport torch\nimport logging\nimport torchvision\nfrom torch import nn\nfrom typing import Tuple\n\nfrom cosplace_model.layers import Flatten, L2Norm, GeM\n\n# The number of channels in the last convolutional layer, the one before average pooling\nCHANNELS_NUM_IN_LAST_CONV = {\n    \"ResNet18\": 512,\n    \"ResNet50\": 2048,\n    \"ResNet101\": 2048,\n    \"ResNet152\": 2048,\n    \"VGG16\": 512,\n    \"EfficientNet_B0\": 1280,\n    \"EfficientNet_B1\": 1280,\n    \"EfficientNet_B2\": 1408,\n    \"EfficientNet_B3\": 1536,\n    \"EfficientNet_B4\": 1792,\n    \"EfficientNet_B5\": 2048,\n    \"EfficientNet_B6\": 2304,\n    \"EfficientNet_B7\": 2560,\n}\n\n\nclass GeoLocalizationNet(nn.Module):\n    def __init__(self, backbone : str, fc_output_dim : int, train_all_layers : bool = False):\n        \"\"\"Return a model for GeoLocalization.\n        \n        Args:\n            backbone (str): which torchvision backbone to use. Must be VGG16, a ResNet or an EfficientNet.\n            fc_output_dim (int): the output dimension of the last fc layer, equivalent to the descriptors dimension.\n            train_all_layers (bool): whether to freeze the first layers of the backbone during training or not.\n        \"\"\"\n        super().__init__()\n        assert backbone in CHANNELS_NUM_IN_LAST_CONV, f\"backbone must be one of {list(CHANNELS_NUM_IN_LAST_CONV.keys())}\"\n        self.backbone, features_dim = get_backbone(backbone, train_all_layers)\n        self.aggregation = nn.Sequential(\n            L2Norm(),\n            GeM(),\n            Flatten(),\n            nn.Linear(features_dim, fc_output_dim),\n            L2Norm()\n        )\n    \n    def forward(self, x):\n        x = self.backbone(x)\n        x = self.aggregation(x)\n        return x\n\n\ndef get_pretrained_torchvision_model(backbone_name : str) -> torch.nn.Module:\n    \"\"\"This function takes the name of a backbone and returns the corresponding pretrained\n    model from torchvision. 
Examples of backbone_name are 'VGG16' or 'ResNet18'\n    \"\"\"\n    try:  # Newer versions of pytorch require to pass weights=weights_module.DEFAULT\n        weights_module = getattr(__import__('torchvision.models', fromlist=[f\"{backbone_name}_Weights\"]), f\"{backbone_name}_Weights\")\n        model = getattr(torchvision.models, backbone_name.lower())(weights=weights_module.DEFAULT)\n    except (ImportError, AttributeError):  # Older versions of pytorch require to pass pretrained=True\n        model = getattr(torchvision.models, backbone_name.lower())(pretrained=True)\n    return model\n\n\ndef get_backbone(backbone_name : str, train_all_layers : bool) -> Tuple[torch.nn.Module, int]:\n    backbone = get_pretrained_torchvision_model(backbone_name)\n    if backbone_name.startswith(\"ResNet\"):\n        if train_all_layers:\n            logging.debug(f\"Train all layers of the {backbone_name}\")\n        else:\n            for name, child in backbone.named_children():\n                if name == \"layer3\":  # Freeze the layers before layer3\n                    break\n                for params in child.parameters():\n                    params.requires_grad = False\n            logging.debug(f\"Train only layer3 and layer4 of the {backbone_name}, freeze the previous ones\")\n\n        layers = list(backbone.children())[:-2]  # Remove avg pooling and FC layer\n    \n    elif backbone_name == \"VGG16\":\n        layers = list(backbone.features.children())[:-2]  # Remove the last ReLU and maxpool\n        if train_all_layers:\n            logging.debug(\"Train all layers of the VGG-16\")\n        else:\n            for layer in layers[:-5]:\n                for p in layer.parameters():\n                    p.requires_grad = False\n            logging.debug(\"Train last layers of the VGG-16, freeze the previous ones\")\n\n    elif backbone_name.startswith(\"EfficientNet\"):\n        if train_all_layers:\n            logging.debug(f\"Train all layers of the 
{backbone_name}\")\n        else:\n            for name, child in backbone.features.named_children():\n                if name == \"5\": # Freeze layers before block 5\n                    break\n                for params in child.parameters():\n                    params.requires_grad = False\n            logging.debug(f\"Train only the last three blocks of the {backbone_name}, freeze the previous ones\")\n        layers = list(backbone.children())[:-2] # Remove avg pooling and FC layer\n    \n    backbone = torch.nn.Sequential(*layers)\n    features_dim = CHANNELS_NUM_IN_LAST_CONV[backbone_name]\n    \n    return backbone, features_dim\n"
  },
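`get_backbone()` freezes parameters by walking the backbone's children in order and stopping at a breakpoint (`"layer3"` for ResNets, block `"5"` for EfficientNets). The control flow can be sketched without torch; the child names below follow torchvision's ResNet layout and are listed here only for illustration:

```python
# Sketch of the freezing loop in get_backbone(): walk the children in
# order and mark everything before the breakpoint as frozen.
def frozen_children(children, breakpoint_name):
    frozen = []
    for name in children:
        if name == breakpoint_name:  # stop freezing from here onwards
            break
        frozen.append(name)
    return frozen

resnet_children = ["conv1", "bn1", "relu", "maxpool",
                   "layer1", "layer2", "layer3", "layer4"]
print(frozen_children(resnet_children, "layer3"))
# everything up to layer2 stays frozen; layer3 and layer4 remain trainable
```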
  {
    "path": "cosplace_model/layers.py",
    "content": "\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.nn.parameter import Parameter\n\n\ndef gem(x, p=torch.ones(1)*3, eps: float = 1e-6):\n    return F.avg_pool2d(x.clamp(min=eps).pow(p), (x.size(-2), x.size(-1))).pow(1./p)\n\n\nclass GeM(nn.Module):\n    def __init__(self, p=3, eps=1e-6):\n        super().__init__()\n        self.p = Parameter(torch.ones(1)*p)\n        self.eps = eps\n    \n    def forward(self, x):\n        return gem(x, p=self.p, eps=self.eps)\n    \n    def __repr__(self):\n        return f\"{self.__class__.__name__}(p={self.p.data.tolist()[0]:.4f}, eps={self.eps})\"\n\n\nclass Flatten(torch.nn.Module):\n    def __init__(self):\n        super().__init__()\n    \n    def forward(self, x):\n        assert x.shape[2] == x.shape[3] == 1, f\"{x.shape[2]} != {x.shape[3]} != 1\"\n        return x[:, :, 0, 0]\n\n\nclass L2Norm(nn.Module):\n    def __init__(self, dim=1):\n        super().__init__()\n        self.dim = dim\n    \n    def forward(self, x):\n        return F.normalize(x, p=2.0, dim=self.dim)\n"
  },
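The `GeM` layer above computes a generalized mean, `gem(x) = (mean(x_i ** p)) ** (1/p)`, with a learnable `p`: `p = 1` recovers average pooling, and large `p` approaches max pooling. A pure-Python sketch over a single channel's activations (the numbers are illustrative):

```python
# Pure-Python sketch of Generalized Mean (GeM) pooling over one channel's
# activations. The clamp mirrors the eps clamp in gem() above.
def gem_pool(values, p=3.0, eps=1e-6):
    clamped = [max(v, eps) for v in values]
    return (sum(v ** p for v in clamped) / len(clamped)) ** (1.0 / p)

acts = [0.1, 0.2, 0.9, 0.4]
print(gem_pool(acts, p=1.0))    # p=1: plain average pooling
print(gem_pool(acts, p=100.0))  # large p: approaches max(acts)
```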
  {
    "path": "datasets/__init__.py",
    "content": ""
  },
  {
    "path": "datasets/dataset_utils.py",
    "content": "\nimport os\nimport logging\nfrom glob import glob\nfrom PIL import ImageFile\n\nImageFile.LOAD_TRUNCATED_IMAGES = True\n\n\ndef read_images_paths(dataset_folder, get_abs_path=False):\n    \"\"\"Find images within 'dataset_folder' and return their relative paths as a list.\n    If there is a file 'dataset_folder'_images_paths.txt, read paths from such file.\n    Otherwise, use glob(). Keeping the paths in the file speeds up computation,\n    because using glob over large folders can be slow.\n    \n    Parameters\n    ----------\n    dataset_folder : str, folder containing JPEG images\n    get_abs_path : bool, if True return absolute paths, otherwise remove\n        dataset_folder from each path\n    \n    Returns\n    -------\n    images_paths : list[str], paths of JPEG images within dataset_folder\n    \"\"\"\n    \n    if not os.path.exists(dataset_folder):\n        raise FileNotFoundError(f\"Folder {dataset_folder} does not exist\")\n    \n    file_with_paths = dataset_folder + \"_images_paths.txt\"\n    if os.path.exists(file_with_paths):\n        logging.debug(f\"Reading paths of images within {dataset_folder} from {file_with_paths}\")\n        with open(file_with_paths, \"r\") as file:\n            images_paths = file.read().splitlines()\n        images_paths = [os.path.join(dataset_folder, path) for path in images_paths]\n        # Sanity check that paths within the file exist\n        if not os.path.exists(images_paths[0]):\n            raise FileNotFoundError(f\"Image with path {images_paths[0]} \"\n                                    f\"does not exist within {dataset_folder}. 
It is likely \"\n                                    f\"that the content of {file_with_paths} is wrong.\")\n    else:\n        logging.debug(f\"Searching images in {dataset_folder} with glob()\")\n        images_paths = sorted(glob(f\"{dataset_folder}/**/*.jpg\", recursive=True))\n        if len(images_paths) == 0:\n            raise FileNotFoundError(f\"Directory {dataset_folder} does not contain any JPEG images\")\n    \n    if not get_abs_path:  # Remove dataset_folder from the path\n        images_paths = [p[len(dataset_folder) + 1:] for p in images_paths]\n    \n    return images_paths\n\n"
  },
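`read_images_paths()` prefers a pre-computed `<folder>_images_paths.txt` listing over a recursive glob, since globbing a very large folder tree is slow. A self-contained sketch of that fallback logic using a temporary directory (`list_images` is a simplified stand-in, not the repo's function):

```python
import os
import tempfile
from glob import glob

# Simplified stand-in for read_images_paths(): use the cached
# "<folder>_images_paths.txt" listing when present, otherwise glob.
def list_images(folder):
    paths_file = folder + "_images_paths.txt"
    if os.path.exists(paths_file):
        with open(paths_file) as f:
            return [os.path.join(folder, p) for p in f.read().splitlines()]
    return sorted(glob(f"{folder}/**/*.jpg", recursive=True))

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, "database")
    os.makedirs(folder)
    open(os.path.join(folder, "a.jpg"), "w").close()
    from_glob = [os.path.basename(p) for p in list_images(folder)]  # no txt yet
    with open(folder + "_images_paths.txt", "w") as f:
        f.write("b.jpg\n")
    from_file = [os.path.basename(p) for p in list_images(folder)]  # cached listing wins
print(from_glob, from_file)  # ['a.jpg'] ['b.jpg']
```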
  {
    "path": "datasets/test_dataset.py",
    "content": "\nimport os\nimport numpy as np\nfrom PIL import Image\nimport torch.utils.data as data\nimport torchvision.transforms as transforms\nfrom sklearn.neighbors import NearestNeighbors\n\nimport datasets.dataset_utils as dataset_utils\n\n\nclass TestDataset(data.Dataset):\n    def __init__(self, dataset_folder, database_folder=\"database\",\n                 queries_folder=\"queries\", positive_dist_threshold=25,\n                 image_size=512, resize_test_imgs=False):\n        self.database_folder = dataset_folder + \"/\" + database_folder\n        self.queries_folder = dataset_folder + \"/\" + queries_folder\n        self.database_paths = dataset_utils.read_images_paths(self.database_folder, get_abs_path=True)\n        self.queries_paths = dataset_utils.read_images_paths(self.queries_folder, get_abs_path=True)\n        \n        self.dataset_name = os.path.basename(dataset_folder)\n        \n        #### Read paths and UTM coordinates for all images.\n        # The format must be path/to/file/@utm_easting@utm_northing@...@.jpg\n        self.database_utms = np.array([(path.split(\"@\")[1], path.split(\"@\")[2]) for path in self.database_paths]).astype(float)\n        self.queries_utms = np.array([(path.split(\"@\")[1], path.split(\"@\")[2]) for path in self.queries_paths]).astype(float)\n        \n        # Find positives_per_query, which are within positive_dist_threshold (default 25 meters)\n        knn = NearestNeighbors(n_jobs=-1)\n        knn.fit(self.database_utms)\n        self.positives_per_query = knn.radius_neighbors(\n            self.queries_utms, radius=positive_dist_threshold, return_distance=False\n        )\n        \n        self.images_paths = self.database_paths + self.queries_paths\n        \n        self.database_num = len(self.database_paths)\n        self.queries_num = len(self.queries_paths)\n\n        transforms_list = []\n        if resize_test_imgs:\n            # Resize to image_size along the shorter side while 
maintaining aspect ratio\n            transforms_list += [transforms.Resize(image_size, antialias=True)]\n        transforms_list += [\n                transforms.ToTensor(),\n                transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n            ]\n        self.base_transform = transforms.Compose(transforms_list)\n    \n    @staticmethod\n    def open_image(path):\n        return Image.open(path).convert(\"RGB\")\n    \n    def __getitem__(self, index):\n        image_path = self.images_paths[index]\n        pil_img = TestDataset.open_image(image_path)\n        normalized_img = self.base_transform(pil_img)\n        return normalized_img, index\n    \n    def __len__(self):\n        return len(self.images_paths)\n    \n    def __repr__(self):\n        return f\"< {self.dataset_name} - #q: {self.queries_num}; #db: {self.database_num} >\"\n    \n    def get_positives(self):\n        return self.positives_per_query\n"
  },
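`TestDataset` derives its ground truth purely from filenames, which must follow the `@utm_easting@utm_northing@...@.jpg` convention: fields are split on `@`, with UTM easting in field 1 and northing in field 2. A sketch of that parsing (the path below is invented for illustration):

```python
# Sketch of the filename convention TestDataset relies on: metadata fields
# are "@"-separated, with UTM east in field 1 and UTM north in field 2.
def parse_utm(path):
    fields = path.split("@")
    return float(fields[1]), float(fields[2])

path = "@0543256.78@4178234.51@whatever@.jpg"  # made-up coordinates
east, north = parse_utm(path)
print(east, north)  # 543256.78 4178234.51
```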
  {
    "path": "datasets/train_dataset.py",
    "content": "\nimport os\nimport torch\nimport random\nimport logging\nimport numpy as np\nfrom PIL import Image\nfrom PIL import ImageFile\nimport torchvision.transforms as T\nfrom collections import defaultdict\n\nimport datasets.dataset_utils as dataset_utils\n\n\nImageFile.LOAD_TRUNCATED_IMAGES = True\n\n\nclass TrainDataset(torch.utils.data.Dataset):\n    def __init__(self, args, dataset_folder, M=10, alpha=30, N=5, L=2,\n                 current_group=0, min_images_per_class=10):\n        \"\"\"\n        Parameters (please check our paper for a clearer explanation of the parameters).\n        ----------\n        args : args for data augmentation\n        dataset_folder : str, the path of the folder with the train images.\n        M : int, the length of the side of each cell in meters.\n        alpha : int, size of each class in degrees.\n        N : int, distance (M-wise) between two classes of the same group.\n        L : int, distance (alpha-wise) between two classes of the same group.\n        current_group : int, which one of the groups to consider.\n        min_images_per_class : int, minimum number of images in a class.\n        \"\"\"\n        super().__init__()\n        self.M = M\n        self.alpha = alpha\n        self.N = N\n        self.L = L\n        self.current_group = current_group\n        self.dataset_folder = dataset_folder\n        self.augmentation_device = args.augmentation_device\n        \n        # dataset_name should be either \"processed\", \"small\" or \"raw\", if you're using SF-XL\n        dataset_name = os.path.basename(dataset_folder)\n        filename = f\"cache/{dataset_name}_M{M}_N{N}_alpha{alpha}_L{L}_mipc{min_images_per_class}.torch\"\n        if not os.path.exists(filename):\n            os.makedirs(\"cache\", exist_ok=True)\n            logging.info(f\"Cached dataset {filename} does not exist, I'll create it now.\")\n            self.initialize(dataset_folder, M, N, alpha, L, min_images_per_class, filename)\n        
elif current_group == 0:\n            logging.info(f\"Using cached dataset {filename}\")\n        \n        classes_per_group, self.images_per_class = torch.load(filename)\n        if current_group >= len(classes_per_group):\n            raise ValueError(f\"With this configuration there are only {len(classes_per_group)} \" +\n                             f\"groups, therefore I can't create the {current_group}th group. \" +\n                             \"You should reduce the number of groups by setting for example \" +\n                             f\"'--groups_num {current_group}'\")\n        self.classes_ids = classes_per_group[current_group]\n        \n        if self.augmentation_device == \"cpu\":\n            self.transform = T.Compose([\n                    T.ColorJitter(brightness=args.brightness,\n                                  contrast=args.contrast,\n                                  saturation=args.saturation,\n                                  hue=args.hue),\n                    T.RandomResizedCrop([args.image_size, args.image_size], scale=[1-args.random_resized_crop, 1], antialias=True),\n                    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n                ])\n    \n    @staticmethod\n    def open_image(path):\n        return Image.open(path).convert(\"RGB\")\n    \n    def __getitem__(self, class_num):\n        # This function takes as input the class_num instead of the index of\n        # the image. 
This way each class is equally represented during training.\n        \n        class_id = self.classes_ids[class_num]\n        # Pick a random image among those in this class.\n        image_path = os.path.join(self.dataset_folder, random.choice(self.images_per_class[class_id]))\n        \n        try:\n            pil_image = TrainDataset.open_image(image_path)\n        except Exception as e:\n            logging.info(f\"ERROR image {image_path} couldn't be opened, it might be corrupted.\")\n            raise e\n        \n        tensor_image = T.functional.to_tensor(pil_image)\n        assert tensor_image.shape == torch.Size([3, 512, 512]), \\\n            f\"Image {image_path} should have shape [3, 512, 512] but has {tensor_image.shape}.\"\n        \n        if self.augmentation_device == \"cpu\":\n            tensor_image = self.transform(tensor_image)\n        \n        return tensor_image, class_num, image_path\n    \n    def get_images_num(self):\n        \"\"\"Return the number of images within this group.\"\"\"\n        return sum([len(self.images_per_class[c]) for c in self.classes_ids])\n    \n    def __len__(self):\n        \"\"\"Return the number of classes within this group.\"\"\"\n        return len(self.classes_ids)\n    \n    @staticmethod\n    def initialize(dataset_folder, M, N, alpha, L, min_images_per_class, filename):\n        logging.debug(f\"Searching training images in {dataset_folder}\")\n        \n        images_paths = dataset_utils.read_images_paths(dataset_folder)\n        logging.debug(f\"Found {len(images_paths)} images\")\n        \n        logging.debug(\"For each image, get its UTM east, UTM north and heading from its path\")\n        images_metadatas = [p.split(\"@\") for p in images_paths]\n        # field 1 is UTM east, field 2 is UTM north, field 9 is heading\n        utmeast_utmnorth_heading = [(m[1], m[2], m[9]) for m in images_metadatas]\n        utmeast_utmnorth_heading = 
np.array(utmeast_utmnorth_heading).astype(np.float64)\n        \n        logging.debug(\"For each image, get class and group to which it belongs\")\n        class_id__group_id = [TrainDataset.get__class_id__group_id(*m, M, alpha, N, L)\n                              for m in utmeast_utmnorth_heading]\n        \n        logging.debug(\"Group together images belonging to the same class\")\n        images_per_class = defaultdict(list)\n        for image_path, (class_id, _) in zip(images_paths, class_id__group_id):\n            images_per_class[class_id].append(image_path)\n        \n        # Images_per_class is a dict where the key is class_id, and the value\n        # is a list with the paths of images within that class.\n        images_per_class = {k: v for k, v in images_per_class.items() if len(v) >= min_images_per_class}\n        \n        logging.debug(\"Group together classes belonging to the same group\")\n        # Classes_per_group is a dict where the key is group_id, and the value\n        # is a list with the class_ids belonging to that group.\n        classes_per_group = defaultdict(set)\n        for class_id, group_id in class_id__group_id:\n            if class_id not in images_per_class:\n                continue  # Skip classes with too few images\n            classes_per_group[group_id].add(class_id)\n        \n        # Convert classes_per_group to a list of lists.\n        # Each sublist represents the classes within a group.\n        classes_per_group = [list(c) for c in classes_per_group.values()]\n        \n        torch.save((classes_per_group, images_per_class), filename)\n    \n    @staticmethod\n    def get__class_id__group_id(utm_east, utm_north, heading, M, alpha, N, L):\n        \"\"\"Return class_id and group_id for a given point.\n            The class_id is a triplet (tuple) of UTM_east, UTM_north and\n            heading (e.g. 
(396520, 4983800, 120)).\n            The group_id represents the group to which the class belongs\n            (e.g. (0, 1, 0)), and it is between (0, 0, 0) and (N-1, N-1, L-1).\n        \"\"\"\n        rounded_utm_east = int(utm_east // M * M)  # Rounded to nearest lower multiple of M\n        rounded_utm_north = int(utm_north // M * M)\n        rounded_heading = int(heading // alpha * alpha)\n        \n        class_id = (rounded_utm_east, rounded_utm_north, rounded_heading)\n        # group_id goes from (0, 0, 0) to (N-1, N-1, L-1)\n        group_id = (rounded_utm_east % (M * N) // M,\n                    rounded_utm_north % (M * N) // M,\n                    rounded_heading % (alpha * L) // alpha)\n        return class_id, group_id\n"
  },
  {
    "path": "eval.py",
    "content": "\nimport sys\nimport torch\nimport logging\nimport multiprocessing\nfrom datetime import datetime\n\nimport test\nimport parser\nimport commons\nfrom cosplace_model import cosplace_network\nfrom datasets.test_dataset import TestDataset\n\ntorch.backends.cudnn.benchmark = True  # Provides a speedup\n\nargs = parser.parse_arguments(is_training=False)\nstart_time = datetime.now()\nargs.output_folder = f\"logs/{args.save_dir}/{start_time.strftime('%Y-%m-%d_%H-%M-%S')}\"\ncommons.make_deterministic(args.seed)\ncommons.setup_logging(args.output_folder, console=\"info\")\nlogging.info(\" \".join(sys.argv))\nlogging.info(f\"Arguments: {args}\")\nlogging.info(f\"The outputs are being saved in {args.output_folder}\")\n\n#### Model\nmodel = cosplace_network.GeoLocalizationNet(args.backbone, args.fc_output_dim)\n\nlogging.info(f\"There are {torch.cuda.device_count()} GPUs and {multiprocessing.cpu_count()} CPUs.\")\n\nif args.resume_model is not None:\n    logging.info(f\"Loading model from {args.resume_model}\")\n    model_state_dict = torch.load(args.resume_model)\n    model.load_state_dict(model_state_dict)\nelse:\n    logging.info(\"WARNING: You didn't provide a path to resume the model (--resume_model parameter). \" +\n                 \"Evaluation will be computed using randomly initialized weights.\")\n\nmodel = model.to(args.device)\n\ntest_ds = TestDataset(args.test_set_folder, queries_folder=\"queries_v1\",\n                      positive_dist_threshold=args.positive_dist_threshold)\n\nrecalls, recalls_str = test.test(args, test_ds, model, args.num_preds_to_save)\nlogging.info(f\"{test_ds}: {recalls_str}\")\n"
  },
  {
    "path": "hubconf.py",
    "content": "\ndependencies = ['torch', 'torchvision']\n\nimport torch\nfrom cosplace_model import cosplace_network\n\n\nAVAILABLE_TRAINED_MODELS = {\n    # backbone : list of available fc_output_dim, which is equivalent to descriptors dimensionality\n    \"VGG16\":     [    64, 128, 256, 512],\n    \"ResNet18\":  [32, 64, 128, 256, 512],\n    \"ResNet50\":  [32, 64, 128, 256, 512, 1024, 2048],\n    \"ResNet101\": [32, 64, 128, 256, 512, 1024, 2048],\n    \"ResNet152\": [32, 64, 128, 256, 512, 1024, 2048],\n}\n\n\ndef get_trained_model(backbone : str = \"ResNet50\", fc_output_dim : int = 2048) -> torch.nn.Module:\n    \"\"\"Return a model trained with CosPlace on San Francisco eXtra Large.\n    \n    Args:\n        backbone (str): which torchvision backbone to use. Must be VGG16 or a ResNet.\n        fc_output_dim (int): the output dimension of the last fc layer, equivalent to\n            the descriptors dimension. Must be between 32 and 2048, depending on model's availability.\n    \n    Return:\n        model (torch.nn.Module): a trained model.\n    \"\"\"\n    print(f\"Returning CosPlace model with backbone: {backbone} with features dimension {fc_output_dim}\")\n    if backbone not in AVAILABLE_TRAINED_MODELS:\n        raise ValueError(f\"Parameter `backbone` is set to {backbone} but it must be one of {list(AVAILABLE_TRAINED_MODELS.keys())}\")\n    try:\n        fc_output_dim = int(fc_output_dim)\n    except:\n        raise ValueError(f\"Parameter `fc_output_dim` must be an integer, but it is set to {fc_output_dim}\")\n    if fc_output_dim not in AVAILABLE_TRAINED_MODELS[backbone]:\n        raise ValueError(f\"Parameter `fc_output_dim` is set to {fc_output_dim}, but for backbone {backbone} \"\n                         f\"it must be one of {list(AVAILABLE_TRAINED_MODELS[backbone])}\")\n    model = cosplace_network.GeoLocalizationNet(backbone, fc_output_dim)\n    model.load_state_dict(\n        torch.hub.load_state_dict_from_url(\n            
f'https://github.com/gmberton/CosPlace/releases/download/v1.0/{backbone}_{fc_output_dim}_cosplace.pth',\n        map_location=torch.device('cpu'))\n    )\n    return model\n"
  },
  {
    "path": "parser.py",
    "content": "\nimport argparse\n\n\ndef parse_arguments(is_training: bool = True):\n    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)\n    # CosPlace Groups parameters\n    parser.add_argument(\"--M\", type=int, default=10, help=\"_\")\n    parser.add_argument(\"--alpha\", type=int, default=30, help=\"_\")\n    parser.add_argument(\"--N\", type=int, default=5, help=\"_\")\n    parser.add_argument(\"--L\", type=int, default=2, help=\"_\")\n    parser.add_argument(\"--groups_num\", type=int, default=8, help=\"_\")\n    parser.add_argument(\"--min_images_per_class\", type=int, default=10, help=\"_\")\n    # Model parameters\n    parser.add_argument(\"--backbone\", type=str, default=\"ResNet18\",\n                        choices=[\"VGG16\",\n                                 \"ResNet18\", \"ResNet50\", \"ResNet101\", \"ResNet152\",\n                                 \"EfficientNet_B0\", \"EfficientNet_B1\", \"EfficientNet_B2\",\n                                 \"EfficientNet_B3\", \"EfficientNet_B4\", \"EfficientNet_B5\", \n                                 \"EfficientNet_B6\", \"EfficientNet_B7\"], help=\"_\")\n    parser.add_argument(\"--fc_output_dim\", type=int, default=512,\n                        help=\"Output dimension of final fully connected layer\")\n    parser.add_argument(\"--train_all_layers\", default=False, action=\"store_true\",\n                        help=\"If true, train all layers of the backbone\")\n    # Training parameters\n    parser.add_argument(\"--use_amp16\", action=\"store_true\",\n                        help=\"use Automatic Mixed Precision\")\n    parser.add_argument(\"--augmentation_device\", type=str, default=\"cuda\",\n                        choices=[\"cuda\", \"cpu\"],\n                        help=\"on which device to run data augmentation\")\n    parser.add_argument(\"--batch_size\", type=int, default=32, help=\"_\")\n    parser.add_argument(\"--epochs_num\", type=int, default=50, 
help=\"_\")\n    parser.add_argument(\"--iterations_per_epoch\", type=int, default=10000, help=\"_\")\n    parser.add_argument(\"--lr\", type=float, default=0.00001, help=\"_\")\n    parser.add_argument(\"--classifiers_lr\", type=float, default=0.01, help=\"_\")\n    parser.add_argument(\"--image_size\", type=int, default=512,\n                        help=\"Width and height of training images (1:1 aspect ratio))\")\n    parser.add_argument(\"--resize_test_imgs\", default=False, action=\"store_true\",\n                        help=\"If the test images should be resized to image_size along\"\n                          \"the shorter side while maintaining aspect ratio\")\n    # Data augmentation\n    parser.add_argument(\"--brightness\", type=float, default=0.7, help=\"_\")\n    parser.add_argument(\"--contrast\", type=float, default=0.7, help=\"_\")\n    parser.add_argument(\"--hue\", type=float, default=0.5, help=\"_\")\n    parser.add_argument(\"--saturation\", type=float, default=0.7, help=\"_\")\n    parser.add_argument(\"--random_resized_crop\", type=float, default=0.5, help=\"_\")\n    # Validation / test parameters\n    parser.add_argument(\"--infer_batch_size\", type=int, default=16,\n                        help=\"Batch size for inference (validating and testing)\")\n    parser.add_argument(\"--positive_dist_threshold\", type=int, default=25,\n                        help=\"distance in meters for a prediction to be considered a positive\")\n    # Resume parameters\n    parser.add_argument(\"--resume_train\", type=str, default=None,\n                        help=\"path to checkpoint to resume, e.g. logs/.../last_checkpoint.pth\")\n    parser.add_argument(\"--resume_model\", type=str, default=None,\n                        help=\"path to model to resume, e.g. 
logs/.../best_model.pth\")\n    # Other parameters\n    parser.add_argument(\"--device\", type=str, default=\"cuda\",\n                        choices=[\"cuda\", \"cpu\"], help=\"_\")\n    parser.add_argument(\"--seed\", type=int, default=0, help=\"_\")\n    parser.add_argument(\"--num_workers\", type=int, default=8, help=\"_\")\n    parser.add_argument(\"--num_preds_to_save\", type=int, default=0,\n                        help=\"At the end of training, save N preds for each query. \"\n                        \"Try with a small number like 3\")\n    parser.add_argument(\"--save_only_wrong_preds\", action=\"store_true\",\n                        help=\"When saving preds (if num_preds_to_save != 0) save only \"\n                        \"preds for difficult queries, i.e. with uncorrect first prediction\")\n    # Paths parameters\n    if is_training:  # train and val sets are needed only for training\n        parser.add_argument(\"--train_set_folder\", type=str, required=True,\n                            help=\"path of the folder with training images\")\n        parser.add_argument(\"--val_set_folder\", type=str, required=True,\n                            help=\"path of the folder with val images (split in database/queries)\")\n    parser.add_argument(\"--test_set_folder\", type=str, required=True,\n                        help=\"path of the folder with test images (split in database/queries)\")\n    parser.add_argument(\"--save_dir\", type=str, default=\"default\",\n                        help=\"name of directory on which to save the logs, under logs/save_dir\")\n    \n    args = parser.parse_args()\n    \n    return args\n\n"
  },
  {
    "path": "requirements.txt",
    "content": "faiss_cpu>=1.7.1\nnumpy>=1.21.2\nPillow>=9.0.1\nscikit_learn>=1.0.2\ntorch>=1.8.2\ntorchvision>=0.9.2\ntqdm>=4.62.3\nutm>=0.7.0\n"
  },
  {
    "path": "test.py",
    "content": "\nimport faiss\nimport torch\nimport logging\nimport numpy as np\nfrom tqdm import tqdm\nfrom typing import Tuple\nfrom argparse import Namespace\nfrom torch.utils.data.dataset import Subset\nfrom torch.utils.data import DataLoader, Dataset\n\nimport visualizations\n\n\n# Compute R@1, R@5, R@10, R@20\nRECALL_VALUES = [1, 5, 10, 20]\n\n\ndef test(args: Namespace, eval_ds: Dataset, model: torch.nn.Module,\n         num_preds_to_save: int = 0) -> Tuple[np.ndarray, str]:\n    \"\"\"Compute descriptors of the given dataset and compute the recalls.\"\"\"\n    \n    model = model.eval()\n    with torch.no_grad():\n        logging.debug(\"Extracting database descriptors for evaluation/testing\")\n        database_subset_ds = Subset(eval_ds, list(range(eval_ds.database_num)))\n        database_dataloader = DataLoader(dataset=database_subset_ds, num_workers=args.num_workers,\n                                         batch_size=args.infer_batch_size, pin_memory=(args.device == \"cuda\"))\n        all_descriptors = np.empty((len(eval_ds), args.fc_output_dim), dtype=\"float32\")\n        for images, indices in tqdm(database_dataloader, ncols=100):\n            descriptors = model(images.to(args.device))\n            descriptors = descriptors.cpu().numpy()\n            all_descriptors[indices.numpy(), :] = descriptors\n        \n        logging.debug(\"Extracting queries descriptors for evaluation/testing using batch size 1\")\n        queries_infer_batch_size = 1\n        queries_subset_ds = Subset(eval_ds, list(range(eval_ds.database_num, eval_ds.database_num+eval_ds.queries_num)))\n        queries_dataloader = DataLoader(dataset=queries_subset_ds, num_workers=args.num_workers,\n                                        batch_size=queries_infer_batch_size, pin_memory=(args.device == \"cuda\"))\n        for images, indices in tqdm(queries_dataloader, ncols=100):\n            descriptors = model(images.to(args.device))\n            descriptors = 
descriptors.cpu().numpy()\n            all_descriptors[indices.numpy(), :] = descriptors\n    \n    queries_descriptors = all_descriptors[eval_ds.database_num:]\n    database_descriptors = all_descriptors[:eval_ds.database_num]\n    \n    # Use a kNN to find predictions\n    faiss_index = faiss.IndexFlatL2(args.fc_output_dim)\n    faiss_index.add(database_descriptors)\n    del database_descriptors, all_descriptors\n    \n    logging.debug(\"Calculating recalls\")\n    _, predictions = faiss_index.search(queries_descriptors, max(RECALL_VALUES))\n    \n    #### For each query, check if the predictions are correct\n    positives_per_query = eval_ds.get_positives()\n    recalls = np.zeros(len(RECALL_VALUES))\n    for query_index, preds in enumerate(predictions):\n        for i, n in enumerate(RECALL_VALUES):\n            if np.any(np.in1d(preds[:n], positives_per_query[query_index])):\n                recalls[i:] += 1\n                break\n    \n    # Divide by queries_num and multiply by 100, so the recalls are in percentages\n    recalls = recalls / eval_ds.queries_num * 100\n    recalls_str = \", \".join([f\"R@{val}: {rec:.1f}\" for val, rec in zip(RECALL_VALUES, recalls)])\n    \n    # Save visualizations of predictions\n    if num_preds_to_save != 0:\n        # For each query save num_preds_to_save predictions\n        visualizations.save_preds(predictions[:, :num_preds_to_save], eval_ds, args.output_folder, args.save_only_wrong_preds)\n    \n    return recalls, recalls_str\n"
  },
  {
    "path": "train.py",
    "content": "\nimport sys\nimport torch\nimport logging\nimport numpy as np\nfrom tqdm import tqdm\nimport multiprocessing\nfrom datetime import datetime\nimport torchvision.transforms as T\n\nimport test\nimport util\nimport parser\nimport commons\nimport cosface_loss\nimport augmentations\nfrom cosplace_model import cosplace_network\nfrom datasets.test_dataset import TestDataset\nfrom datasets.train_dataset import TrainDataset\n\ntorch.backends.cudnn.benchmark = True  # Provides a speedup\n\nargs = parser.parse_arguments()\nstart_time = datetime.now()\nargs.output_folder = f\"logs/{args.save_dir}/{start_time.strftime('%Y-%m-%d_%H-%M-%S')}\"\ncommons.make_deterministic(args.seed)\ncommons.setup_logging(args.output_folder, console=\"debug\")\nlogging.info(\" \".join(sys.argv))\nlogging.info(f\"Arguments: {args}\")\nlogging.info(f\"The outputs are being saved in {args.output_folder}\")\n\n#### Model\nmodel = cosplace_network.GeoLocalizationNet(args.backbone, args.fc_output_dim, args.train_all_layers)\n\nlogging.info(f\"There are {torch.cuda.device_count()} GPUs and {multiprocessing.cpu_count()} CPUs.\")\n\nif args.resume_model is not None:\n    logging.debug(f\"Loading model from {args.resume_model}\")\n    model_state_dict = torch.load(args.resume_model)\n    model.load_state_dict(model_state_dict)\n\nmodel = model.to(args.device).train()\n\n#### Optimizer\ncriterion = torch.nn.CrossEntropyLoss()\nmodel_optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)\n\n#### Datasets\ngroups = [TrainDataset(args, args.train_set_folder, M=args.M, alpha=args.alpha, N=args.N, L=args.L,\n                       current_group=n, min_images_per_class=args.min_images_per_class) for n in range(args.groups_num)]\n# Each group has its own classifier, which depends on the number of classes in the group\nclassifiers = [cosface_loss.MarginCosineProduct(args.fc_output_dim, len(group)) for group in groups]\nclassifiers_optimizers = [torch.optim.Adam(classifier.parameters(), 
lr=args.classifiers_lr) for classifier in classifiers]\n\nlogging.info(f\"Using {len(groups)} groups\")\nlogging.info(f\"The {len(groups)} groups have respectively the following number of classes {[len(g) for g in groups]}\")\nlogging.info(f\"The {len(groups)} groups have respectively the following number of images {[g.get_images_num() for g in groups]}\")\n\nval_ds = TestDataset(args.val_set_folder, positive_dist_threshold=args.positive_dist_threshold,\n                     image_size=args.image_size, resize_test_imgs=args.resize_test_imgs)\ntest_ds = TestDataset(args.test_set_folder, queries_folder=\"queries_v1\",\n                      positive_dist_threshold=args.positive_dist_threshold,\n                      image_size=args.image_size, resize_test_imgs=args.resize_test_imgs)\nlogging.info(f\"Validation set: {val_ds}\")\nlogging.info(f\"Test set: {test_ds}\")\n\n#### Resume\nif args.resume_train:\n    model, model_optimizer, classifiers, classifiers_optimizers, best_val_recall1, start_epoch_num = \\\n        util.resume_train(args, args.output_folder, model, model_optimizer, classifiers, classifiers_optimizers)\n    model = model.to(args.device)\n    epoch_num = start_epoch_num - 1\n    logging.info(f\"Resuming from epoch {start_epoch_num} with best R@1 {best_val_recall1:.1f} from checkpoint {args.resume_train}\")\nelse:\n    best_val_recall1 = start_epoch_num = 0\n\n#### Train / evaluation loop\nlogging.info(\"Start training ...\")\nlogging.info(f\"There are {len(groups[0])} classes for the first group, \" +\n             f\"each epoch has {args.iterations_per_epoch} iterations \" +\n             f\"with batch_size {args.batch_size}, therefore the model sees each class (on average) \" +\n             f\"{args.iterations_per_epoch * args.batch_size / len(groups[0]):.1f} times per epoch\")\n\n\nif args.augmentation_device == \"cuda\":\n    gpu_augmentation = T.Compose([\n            augmentations.DeviceAgnosticColorJitter(brightness=args.brightness,\n           
                                         contrast=args.contrast,\n                                                    saturation=args.saturation,\n                                                    hue=args.hue),\n            augmentations.DeviceAgnosticRandomResizedCrop([args.image_size, args.image_size],\n                                                          scale=[1-args.random_resized_crop, 1]),\n            T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n        ])\n\nif args.use_amp16:\n    scaler = torch.cuda.amp.GradScaler()\n\nfor epoch_num in range(start_epoch_num, args.epochs_num):\n    \n    #### Train\n    epoch_start_time = datetime.now()\n    # Select classifier and dataloader according to epoch\n    current_group_num = epoch_num % args.groups_num\n    classifiers[current_group_num] = classifiers[current_group_num].to(args.device)\n    util.move_to_device(classifiers_optimizers[current_group_num], args.device)\n    \n    dataloader = commons.InfiniteDataLoader(groups[current_group_num], num_workers=args.num_workers,\n                                            batch_size=args.batch_size, shuffle=True,\n                                            pin_memory=(args.device == \"cuda\"), drop_last=True)\n    \n    dataloader_iterator = iter(dataloader)\n    model = model.train()\n    \n    epoch_losses = np.zeros((0, 1), dtype=np.float32)\n    for iteration in tqdm(range(args.iterations_per_epoch), ncols=100):\n        images, targets, _ = next(dataloader_iterator)\n        images, targets = images.to(args.device), targets.to(args.device)\n        \n        if args.augmentation_device == \"cuda\":\n            images = gpu_augmentation(images)\n        \n        model_optimizer.zero_grad()\n        classifiers_optimizers[current_group_num].zero_grad()\n        \n        if not args.use_amp16:\n            descriptors = model(images)\n            output = classifiers[current_group_num](descriptors, targets)\n            loss = 
criterion(output, targets)\n            loss.backward()\n            epoch_losses = np.append(epoch_losses, loss.item())\n            del loss, output, images\n            model_optimizer.step()\n            classifiers_optimizers[current_group_num].step()\n        else:  # Use AMP 16\n            with torch.cuda.amp.autocast():\n                descriptors = model(images)\n                output = classifiers[current_group_num](descriptors, targets)\n                loss = criterion(output, targets)\n            scaler.scale(loss).backward()\n            epoch_losses = np.append(epoch_losses, loss.item())\n            del loss, output, images\n            scaler.step(model_optimizer)\n            scaler.step(classifiers_optimizers[current_group_num])\n            scaler.update()\n    \n    classifiers[current_group_num] = classifiers[current_group_num].cpu()\n    util.move_to_device(classifiers_optimizers[current_group_num], \"cpu\")\n    \n    logging.debug(f\"Epoch {epoch_num:02d} in {str(datetime.now() - epoch_start_time)[:-7]}, \"\n                  f\"loss = {epoch_losses.mean():.4f}\")\n    \n    #### Evaluation\n    recalls, recalls_str = test.test(args, val_ds, model)\n    logging.info(f\"Epoch {epoch_num:02d} in {str(datetime.now() - epoch_start_time)[:-7]}, {val_ds}: {recalls_str[:20]}\")\n    is_best = recalls[0] > best_val_recall1\n    best_val_recall1 = max(recalls[0], best_val_recall1)\n    # Save checkpoint, which contains all training parameters\n    util.save_checkpoint({\n        \"epoch_num\": epoch_num + 1,\n        \"model_state_dict\": model.state_dict(),\n        \"optimizer_state_dict\": model_optimizer.state_dict(),\n        \"classifiers_state_dict\": [c.state_dict() for c in classifiers],\n        \"optimizers_state_dict\": [c.state_dict() for c in classifiers_optimizers],\n        \"best_val_recall1\": best_val_recall1\n    }, is_best, args.output_folder)\n\n\nlogging.info(f\"Trained for {epoch_num+1:02d} epochs, in total in 
{str(datetime.now() - start_time)[:-7]}\")\n\n#### Test best model on test set v1\nbest_model_state_dict = torch.load(f\"{args.output_folder}/best_model.pth\")\nmodel.load_state_dict(best_model_state_dict)\n\nlogging.info(f\"Now testing on the test set: {test_ds}\")\nrecalls, recalls_str = test.test(args, test_ds, model, args.num_preds_to_save)\nlogging.info(f\"{test_ds}: {recalls_str}\")\n\nlogging.info(\"Experiment finished (without any errors)\")\n"
  },
  {
    "path": "util.py",
    "content": "\nimport torch\nimport shutil\nimport logging\nfrom typing import Type, List\nfrom argparse import Namespace\nfrom cosface_loss import MarginCosineProduct\n\n\ndef move_to_device(optimizer: Type[torch.optim.Optimizer], device: str):\n    for state in optimizer.state.values():\n        for k, v in state.items():\n            if torch.is_tensor(v):\n                state[k] = v.to(device)\n\n\ndef save_checkpoint(state: dict, is_best: bool, output_folder: str,\n                    ckpt_filename: str = \"last_checkpoint.pth\"):\n    # TODO it would be better to move weights to cpu before saving\n    checkpoint_path = f\"{output_folder}/{ckpt_filename}\"\n    torch.save(state, checkpoint_path)\n    if is_best:\n        torch.save(state[\"model_state_dict\"], f\"{output_folder}/best_model.pth\")\n\n\ndef resume_train(args: Namespace, output_folder: str, model: torch.nn.Module,\n                 model_optimizer: Type[torch.optim.Optimizer], classifiers: List[MarginCosineProduct],\n                 classifiers_optimizers: List[Type[torch.optim.Optimizer]]):\n    \"\"\"Load model, optimizer, and other training parameters\"\"\"\n    logging.info(f\"Loading checkpoint: {args.resume_train}\")\n    checkpoint = torch.load(args.resume_train)\n    start_epoch_num = checkpoint[\"epoch_num\"]\n    \n    model_state_dict = checkpoint[\"model_state_dict\"]\n    model.load_state_dict(model_state_dict)\n    \n    model = model.to(args.device)\n    model_optimizer.load_state_dict(checkpoint[\"optimizer_state_dict\"])\n    \n    assert args.groups_num == len(classifiers) == len(classifiers_optimizers) == \\\n        len(checkpoint[\"classifiers_state_dict\"]) == len(checkpoint[\"optimizers_state_dict\"]), \\\n        (f\"{args.groups_num}, {len(classifiers)}, {len(classifiers_optimizers)}, \"\n         f\"{len(checkpoint['classifiers_state_dict'])}, {len(checkpoint['optimizers_state_dict'])}\")\n    \n    for c, sd in zip(classifiers, 
checkpoint[\"classifiers_state_dict\"]):\n        # Move classifiers to GPU before loading their optimizers\n        c = c.to(args.device)\n        c.load_state_dict(sd)\n    for c, sd in zip(classifiers_optimizers, checkpoint[\"optimizers_state_dict\"]):\n        c.load_state_dict(sd)\n    for c in classifiers:\n        # Move classifiers back to CPU to save some GPU memory\n        c = c.cpu()\n    \n    best_val_recall1 = checkpoint[\"best_val_recall1\"]\n    \n    # Copy best model to current output_folder\n    shutil.copy(args.resume_train.replace(\"last_checkpoint.pth\", \"best_model.pth\"), output_folder)\n    \n    return model, model_optimizer, classifiers, classifiers_optimizers, best_val_recall1, start_epoch_num\n"
  },
  {
    "path": "visualizations.py",
    "content": "\nimport os\nimport cv2\nimport numpy as np\nfrom tqdm import tqdm\nfrom skimage.transform import rescale\nfrom PIL import Image, ImageDraw, ImageFont\n\n\n# Height and width of a single image\nH = 512\nW = 512\nTEXT_H = 175\nFONTSIZE = 80\nSPACE = 50  # Space between two images\n\n\ndef write_labels_to_image(labels=[\"text1\", \"text2\"]):\n    \"\"\"Creates an image with vertical text, spaced along rows.\"\"\"\n    font = ImageFont.truetype(\"/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf\", FONTSIZE)\n    img = Image.new('RGB', ((W * len(labels)) + 50 * (len(labels)-1), TEXT_H), (1, 1, 1))\n    d = ImageDraw.Draw(img)\n    for i, text in enumerate(labels):\n        _, _, w, h = d.textbbox((0,0), text, font=font)\n        d.text(((W+SPACE)*i + W//2 - w//2, 1), text, fill=(0, 0, 0), font=font)\n    return np.array(img)\n\n\ndef draw(img, c=(0, 255, 0), thickness=20):\n    \"\"\"Draw a colored (usually red or green) box around an image.\"\"\"\n    p = np.array([[0, 0], [0, img.shape[0]], [img.shape[1], img.shape[0]], [img.shape[1], 0]])\n    for i in range(3):\n        cv2.line(img, (p[i, 0], p[i, 1]), (p[i+1, 0], p[i+1, 1]), c, thickness=thickness*2)\n    return cv2.line(img, (p[3, 0], p[3, 1]), (p[0, 0], p[0, 1]), c, thickness=thickness*2)\n\n\ndef build_prediction_image(images_paths, preds_correct=None):\n    \"\"\"Build a row of images, where the first is the query and the rest are predictions.\n    For each image, if is_correct then draw a green/red box.\n    \"\"\"\n    assert len(images_paths) == len(preds_correct)\n    labels = [\"Query\"] + [f\"Pr{i} - {is_correct}\" for i, is_correct in enumerate(preds_correct[1:])]\n    num_images = len(images_paths)\n    images = [np.array(Image.open(path)) for path in images_paths]\n    for img, correct in zip(images, preds_correct):\n        if correct is None:\n            continue\n        color = (0, 255, 0) if correct else (255, 0, 0)\n        draw(img, color)\n    concat_image = np.ones([H, 
(num_images*W)+((num_images-1)*SPACE), 3])\n    rescaleds = []\n    for img in images:\n        scale = min(H/img.shape[0], W/img.shape[1])\n        rescaleds.append(rescale(img, [scale, scale, 1]))\n    for i, image in enumerate(rescaleds):\n        pad_width = (W - image.shape[1] + 1) // 2\n        pad_height = (H - image.shape[0] + 1) // 2\n        image = np.pad(image, [[pad_height, pad_height], [pad_width, pad_width], [0, 0]], constant_values=1)[:H, :W]\n        concat_image[:, i*(W+SPACE) : i*(W+SPACE)+W] = image\n    try:\n        labels_image = write_labels_to_image(labels)\n        final_image = np.concatenate([labels_image, concat_image])\n    except OSError:  # Handle error in case of missing PIL ImageFont\n        final_image = concat_image\n    final_image = Image.fromarray((final_image*255).astype(np.uint8))\n    return final_image\n\n\ndef save_file_with_paths(query_path, preds_paths, positives_paths, output_path):\n    file_content = []\n    file_content.append(\"Query path:\")\n    file_content.append(query_path + \"\\n\")\n    file_content.append(\"Predictions paths:\")\n    file_content.append(\"\\n\".join(preds_paths) + \"\\n\")\n    file_content.append(\"Positives paths:\")\n    file_content.append(\"\\n\".join(positives_paths) + \"\\n\")\n    with open(output_path, \"w\") as file:\n        _ = file.write(\"\\n\".join(file_content))\n\n\ndef save_preds(predictions, eval_ds, output_folder, save_only_wrong_preds=False):\n    \"\"\"For each query, save an image containing the query and its predictions,\n    and a file with the paths of the query, its predictions and its positives.\n\n    Parameters\n    ----------\n    predictions : np.ndarray of shape [num_queries x num_preds_to_viz], with the preds\n        for each query\n    eval_ds : TestDataset\n    output_folder : str / Path with the path to save the predictions\n    save_only_wrong_preds : bool, if True save only the wrongly predicted queries,\n        i.e. 
the ones where the first pred is incorrect (further than 25 m)\n    \"\"\"\n    positives_per_query = eval_ds.get_positives()\n    os.makedirs(f\"{output_folder}/preds\", exist_ok=True)\n    for query_index, preds in enumerate(tqdm(predictions, ncols=80, desc=f\"Saving preds in {output_folder}\")):\n        query_path = eval_ds.queries_paths[query_index]\n        list_of_images_paths = [query_path]\n        # List of None (query), True (correct preds) or False (wrong preds)\n        preds_correct = [None]\n        for pred in preds:\n            pred_path = eval_ds.database_paths[pred]\n            list_of_images_paths.append(pred_path)\n            is_correct = pred in positives_per_query[query_index]\n            preds_correct.append(is_correct)\n\n        if save_only_wrong_preds and preds_correct[1]:\n            continue\n\n        prediction_image = build_prediction_image(list_of_images_paths, preds_correct)\n        pred_image_path = f\"{output_folder}/preds/{query_index:03d}.jpg\"\n        prediction_image.save(pred_image_path)\n\n        positives_paths = [eval_ds.database_paths[idx] for idx in positives_per_query[query_index]]\n        save_file_with_paths(\n            query_path=list_of_images_paths[0],\n            preds_paths=list_of_images_paths[1:],\n            positives_paths=positives_paths,\n            output_path=f\"{output_folder}/preds/{query_index:03d}.txt\"\n        )\n"
  }
]