Repository: gmberton/CosPlace Branch: main Commit: 52b56e95ea62 Files: 21 Total size: 64.2 KB Directory structure: gitextract_ij6gs5cz/ ├── .gitignore ├── LICENSE ├── README.md ├── augmentations.py ├── commons.py ├── cosface_loss.py ├── cosplace_model/ │ ├── __init__.py │ ├── cosplace_network.py │ └── layers.py ├── datasets/ │ ├── __init__.py │ ├── dataset_utils.py │ ├── test_dataset.py │ └── train_dataset.py ├── eval.py ├── hubconf.py ├── parser.py ├── requirements.txt ├── test.py ├── train.py ├── util.py └── visualizations.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ .spyproject __pycache__ logs cache ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2022 Gabriele Berton, Carlo Masone, Barbara Caputo Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


================================================
FILE: README.md
================================================
# Rethinking Visual Geo-localization for Large-Scale Applications

This is the official PyTorch implementation of the CVPR 2022 paper "Rethinking Visual Geo-localization for Large-Scale Applications".
The paper presents a new dataset called San Francisco eXtra Large (SF-XL, go [_here_](https://forms.gle/wpyDzhDyoWLQygAT9) to download it), and a highly scalable training method (called CosPlace), which allows reaching SOTA results with compact descriptors.

[[CVPR OpenAccess](https://openaccess.thecvf.com/content/CVPR2022/html/Berton_Rethinking_Visual_Geo-Localization_for_Large-Scale_Applications_CVPR_2022_paper.html)] [[ArXiv](https://arxiv.org/abs/2204.02287)] [[Video](https://www.youtube.com/watch?v=oDyL6oVNN3I)] [[BibTex](https://github.com/gmberton/CosPlace?tab=readme-ov-file#cite)]

Note that CosPlace is quite old. **🚀 Looking for SOTA Visual Place Recognition (VPR)? Check out [MegaLoc](https://github.com/gmberton/MegaLoc)**

The images below represent respectively:
1) the map of San Francisco eXtra Large
2) a visualization of how CosPlace Groups (read datasets) are formed
3) results with CosPlace vs other methods on Pitts250k (CosPlace trained on SF-XL, others on Pitts30k)

## Train
After downloading the SF-XL dataset, simply run

`$ python3 train.py --train_set_folder path/to/sf_xl/raw/train/database --val_set_folder path/to/sf_xl/processed/val --test_set_folder path/to/sf_xl/processed/test`

The script automatically splits SF-XL into CosPlace Groups, and saves the resulting object in the folder `cache`.
By default, training is performed with a ResNet-18 with descriptor dimensionality 512, which fits in less than 4 GB of VRAM.

To change the backbone or the output descriptor dimensionality, simply run

`$ python3 train.py --backbone ResNet50 --fc_output_dim 128`

You can also speed up your training with Automatic Mixed Precision (note that the results/statistics from the paper were obtained without AMP)

`$ python3 train.py --use_amp16`

Run `$ python3 train.py -h` to have a look at all the hyperparameters that you can change. You will find all the hyperparameters mentioned in the paper.

#### Dataset size and lightweight version
The SF-XL dataset is about 1 TB. Only a subset of the images is used for training, and this subset is only 360 GB.
If this is still too heavy for you (e.g. if you're using Colab), but you would like to run CosPlace, we also created a small version of SF-XL, which is only 5 GB.
Obviously, using the small version will lead to lower results, and it should be used only for debugging / exploration purposes.
More information on the dataset and the lightweight version is in the README that you can find on the dataset download page (go [_here_](https://forms.gle/wpyDzhDyoWLQygAT9) to find it).

#### Reproducibility
Results from the paper are fully reproducible, and we followed deep learning's best practices (averaging over multiple runs for the main results, validation / early stopping and hyperparameter search on the val set).
If you are a researcher comparing your work against ours, please make sure to follow these best practices and avoid picking the best model on the test set.
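As a rough illustration of how the split into CosPlace Groups works, the sketch below mirrors the discretization arithmetic of `get__class_id__group_id` in `datasets/train_dataset.py` (the function name and the example coordinates here are made up for illustration): each image's UTM east, UTM north and heading are binned into an M x M x alpha cell (its class), and each class is assigned to one of the N * N * L interleaved groups.

```python
def get_class_id_and_group_id(utm_east, utm_north, heading, M=10, alpha=30, N=5, L=2):
    """Discretize a (UTM east, UTM north, heading) triplet into a class and a group.

    class_id is the cell's lower-left corner and heading bin; group_id selects
    which of the N*N*L interleaved groups the class belongs to, going from
    (0, 0, 0) to (N-1, N-1, L-1).
    """
    rounded_utm_east = int(utm_east // M * M)    # nearest lower multiple of M
    rounded_utm_north = int(utm_north // M * M)
    rounded_heading = int(heading // alpha * alpha)
    class_id = (rounded_utm_east, rounded_utm_north, rounded_heading)
    group_id = (rounded_utm_east % (M * N) // M,
                rounded_utm_north % (M * N) // M,
                rounded_heading % (alpha * L) // alpha)
    return class_id, group_id


# Example: an image at UTM (396523.7, 4983807.2) facing 125 degrees
print(get_class_id_and_group_id(396523.7, 4983807.2, 125.0))
# → ((396520, 4983800, 120), (2, 0, 0))
```

Classes that fall into the same group are guaranteed to be at least N cells (M-wise) and L bins (alpha-wise) apart, which is what makes them usable as CosFace classes within one group.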
## Test
You can test a trained model as follows

`$ python3 eval.py --backbone ResNet50 --fc_output_dim 128 --resume_model path/to/best_model.pth`

You can download plenty of trained models below.

### Visualize predictions
Predictions can be easily visualized through the `num_preds_to_save` parameter. For example, running this

```
python3 eval.py --backbone ResNet50 --fc_output_dim 512 --resume_model path/to/best_model.pth \
    --num_preds_to_save=3 --exp_name=cosplace_on_stlucia
```

will generate, under the path `./logs/cosplace_on_stlucia/*/preds`, images such as

Given that saving predictions for each query might take a long time, you can also pass the parameter `--save_only_wrong_preds`, which saves predictions only for wrongly predicted queries (i.e. where the first prediction is wrong); these should be the most interesting failure cases.

## Trained Models
We now have all our trained models on [PyTorch Hub](https://pytorch.org/docs/stable/hub.html), so that you can use them in any codebase, without cloning this repository, simply like this

```
import torch
model = torch.hub.load("gmberton/cosplace", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)
```

As an alternative, you can download the trained models from the table below, which provides links to models with different backbones and descriptor dimensionalities, trained on SF-XL.
**Dimension of Descriptors**

| Model      | 32   | 64   | 128  | 256  | 512  | 1024 | 2048 |
|------------|------|------|------|------|------|------|------|
| ResNet-18  | link | link | link | link | link | -    | -    |
| ResNet-50  | link | link | link | link | link | link | link |
| ResNet-101 | link | link | link | link | link | link | link |
| ResNet-152 | link | link | link | link | link | link | link |
| VGG-16     | -    | link | link | link | link | -    | -    |
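Once descriptors have been extracted with any of these models, place recognition reduces to a nearest-neighbor search over the database descriptors. Since CosPlace descriptors are L2-normalized, the inner product equals the cosine similarity. The following is a minimal, hypothetical sketch (the function name and toy 3-D vectors are made up; real descriptors have e.g. 512 dimensions, and in practice you would use a vector-search library rather than plain Python):

```python
def rank_database(query_desc, database_descs):
    """Rank database descriptors by inner-product similarity to the query.

    For L2-normalized descriptors the inner product equals the cosine
    similarity. Returns database indices, most similar first.
    """
    sims = [sum(q * d for q, d in zip(query_desc, db)) for db in database_descs]
    return sorted(range(len(database_descs)), key=lambda i: -sims[i])


# Toy example: three orthogonal "database descriptors" and one query
database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.1, 0.9, 0.0]
print(rank_database(query, database))  # → [1, 0, 2]
```

The first index returned corresponds to the predicted place; a prediction counts as correct in the evaluation if that database image lies within `positive_dist_threshold` (default 25 m) of the query.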
Or you can download all models at once at [this link](https://drive.google.com/drive/folders/1WzSLnv05FLm-XqP5DxR5nXaaixH23uvV?usp=sharing) ## Issues If you have any questions regarding our code or dataset, feel free to open an issue or send an email to berton.gabri@gmail.com ## Acknowledgements Parts of this repo are inspired by the following repositories: - [CosFace implementation in PyTorch](https://github.com/MuggleWang/CosFace_pytorch/blob/master/layer.py) - [CNN Image Retrieval in PyTorch](https://github.com/filipradenovic/cnnimageretrieval-pytorch) (for the GeM layer) - [Visual Geo-localization benchmark](https://github.com/gmberton/deep-visual-geo-localization-benchmark) (for the evaluation / test code) ## Cite Here is the bibtex to cite our paper ```bibtex @inproceedings{Berton_CVPR_2022_CosPlace, author = {Berton, Gabriele and Masone, Carlo and Caputo, Barbara}, title = {Rethinking Visual Geo-Localization for Large-Scale Applications}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition}, month = {June}, year = {2022}, pages = {4878--4888} } ``` ================================================ FILE: augmentations.py ================================================ import torch from typing import Tuple, Union import torchvision.transforms as T class DeviceAgnosticColorJitter(T.ColorJitter): def __init__(self, brightness: float = 0., contrast: float = 0., saturation: float = 0., hue: float = 0.): """This is the same as T.ColorJitter but it only accepts batches of images and works on GPU""" super().__init__(brightness=brightness, contrast=contrast, saturation=saturation, hue=hue) def forward(self, images: torch.Tensor) -> torch.Tensor: assert len(images.shape) == 4, f"images should be a batch of images, but it has shape {images.shape}" B, C, H, W = images.shape # Applies a different color jitter to each image color_jitter = super(DeviceAgnosticColorJitter, self).forward augmented_images = [color_jitter(img).unsqueeze(0) for img in 
images]
        augmented_images = torch.cat(augmented_images)
        assert augmented_images.shape == torch.Size([B, C, H, W])
        return augmented_images


class DeviceAgnosticRandomResizedCrop(T.RandomResizedCrop):
    def __init__(self, size: Union[int, Tuple[int, int]], scale: Tuple[float, float]):
        """This is the same as T.RandomResizedCrop but it only accepts batches of images and works on GPU"""
        super().__init__(size=size, scale=scale, antialias=True)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        assert len(images.shape) == 4, f"images should be a batch of images, but it has shape {images.shape}"
        B, C, H, W = images.shape
        # Applies a different RandomResizedCrop to each image
        random_resized_crop = super(DeviceAgnosticRandomResizedCrop, self).forward
        augmented_images = [random_resized_crop(img).unsqueeze(0) for img in images]
        augmented_images = torch.cat(augmented_images)
        return augmented_images


if __name__ == "__main__":
    """
    You can run this script to visualize the transformations, and verify that
    the augmentations are applied individually on each image of the batch.
""" from PIL import Image # Import skimage in here, so it is not necessary to install it unless you run this script from skimage import data # Initialize DeviceAgnosticRandomResizedCrop random_crop = DeviceAgnosticRandomResizedCrop(size=[256, 256], scale=[0.5, 1]) # Create a batch with 2 astronaut images pil_image = Image.fromarray(data.astronaut()) tensor_image = T.functional.to_tensor(pil_image).unsqueeze(0) images_batch = torch.cat([tensor_image, tensor_image]) # Apply augmentation (individually on each of the 2 images) augmented_batch = random_crop(images_batch) # Convert to PIL images augmented_image_0 = T.functional.to_pil_image(augmented_batch[0]) augmented_image_1 = T.functional.to_pil_image(augmented_batch[1]) # Visualize the original image, as well as the two augmented ones pil_image.show() augmented_image_0.show() augmented_image_1.show() ================================================ FILE: commons.py ================================================ import os import sys import torch import random import logging import traceback import numpy as np class InfiniteDataLoader(torch.utils.data.DataLoader): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self.dataset_iterator = super().__iter__() def __iter__(self): return self def __next__(self): try: batch = next(self.dataset_iterator) except StopIteration: self.dataset_iterator = super().__iter__() batch = next(self.dataset_iterator) return batch def make_deterministic(seed: int = 0): """Make results deterministic. If seed == -1, do not make deterministic. Running your script in a deterministic way might slow it down. Note that for some packages (eg: sklearn's PCA) this function is not enough. 
""" seed = int(seed) if seed == -1: return random.seed(seed) np.random.seed(seed) torch.manual_seed(seed) torch.cuda.manual_seed_all(seed) torch.backends.cudnn.deterministic = True torch.backends.cudnn.benchmark = False def setup_logging(output_folder: str, exist_ok: bool = False, console: str = "debug", info_filename: str = "info.log", debug_filename: str = "debug.log"): """Set up logging files and console output. Creates one file for INFO logs and one for DEBUG logs. Args: output_folder (str): creates the folder where to save the files. exist_ok (boolean): if False throw a FileExistsError if output_folder already exists debug (str): if == "debug" prints on console debug messages and higher if == "info" prints on console info messages and higher if == None does not use console (useful when a logger has already been set) info_filename (str): the name of the info file. if None, don't create info file debug_filename (str): the name of the debug file. if None, don't create debug file """ if not exist_ok and os.path.exists(output_folder): raise FileExistsError(f"{output_folder} already exists!") os.makedirs(output_folder, exist_ok=True) base_formatter = logging.Formatter('%(asctime)s %(message)s', "%Y-%m-%d %H:%M:%S") logger = logging.getLogger('') logger.setLevel(logging.DEBUG) if info_filename is not None: info_file_handler = logging.FileHandler(f'{output_folder}/{info_filename}') info_file_handler.setLevel(logging.INFO) info_file_handler.setFormatter(base_formatter) logger.addHandler(info_file_handler) if debug_filename is not None: debug_file_handler = logging.FileHandler(f'{output_folder}/{debug_filename}') debug_file_handler.setLevel(logging.DEBUG) debug_file_handler.setFormatter(base_formatter) logger.addHandler(debug_file_handler) if console is not None: console_handler = logging.StreamHandler() if console == "debug": console_handler.setLevel(logging.DEBUG) if console == "info": console_handler.setLevel(logging.INFO) console_handler.setFormatter(base_formatter) 
        logger.addHandler(console_handler)

    def my_handler(type_, value, tb):
        logger.info("\n" + "".join(traceback.format_exception(type_, value, tb)))
        logging.info("Experiment finished (with some errors)")
    sys.excepthook = my_handler


================================================
FILE: cosface_loss.py
================================================
# Based on https://github.com/MuggleWang/CosFace_pytorch/blob/master/layer.py
import torch
import torch.nn as nn
from torch.nn import Parameter


def cosine_sim(x1: torch.Tensor, x2: torch.Tensor, dim: int = 1, eps: float = 1e-8) -> torch.Tensor:
    ip = torch.mm(x1, x2.t())
    w1 = torch.norm(x1, 2, dim)
    w2 = torch.norm(x2, 2, dim)
    return ip / torch.ger(w1, w2).clamp(min=eps)


class MarginCosineProduct(nn.Module):
    """Implementation of the large margin cosine distance.

    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        s: norm of input feature
        m: margin
    """

    def __init__(self, in_features: int, out_features: int, s: float = 30.0, m: float = 0.40):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.s = s
        self.m = m
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, inputs: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
        cosine = cosine_sim(inputs, self.weight)
        one_hot = torch.zeros_like(cosine)
        one_hot.scatter_(1, label.view(-1, 1), 1.0)
        output = self.s * (cosine - one_hot * self.m)
        return output

    def __repr__(self):
        return self.__class__.__name__ + '(' \
               + 'in_features=' + str(self.in_features) \
               + ', out_features=' + str(self.out_features) \
               + ', s=' + str(self.s) \
               + ', m=' + str(self.m) + ')'


================================================
FILE: cosplace_model/__init__.py
================================================


================================================
FILE: cosplace_model/cosplace_network.py
================================================
import torch
import logging
import
torchvision from torch import nn from typing import Tuple from cosplace_model.layers import Flatten, L2Norm, GeM # The number of channels in the last convolutional layer, the one before average pooling CHANNELS_NUM_IN_LAST_CONV = { "ResNet18": 512, "ResNet50": 2048, "ResNet101": 2048, "ResNet152": 2048, "VGG16": 512, "EfficientNet_B0": 1280, "EfficientNet_B1": 1280, "EfficientNet_B2": 1408, "EfficientNet_B3": 1536, "EfficientNet_B4": 1792, "EfficientNet_B5": 2048, "EfficientNet_B6": 2304, "EfficientNet_B7": 2560, } class GeoLocalizationNet(nn.Module): def __init__(self, backbone : str, fc_output_dim : int, train_all_layers : bool = False): """Return a model for GeoLocalization. Args: backbone (str): which torchvision backbone to use. Must be VGG16 or a ResNet. fc_output_dim (int): the output dimension of the last fc layer, equivalent to the descriptors dimension. train_all_layers (bool): whether to freeze the first layers of the backbone during training or not. """ super().__init__() assert backbone in CHANNELS_NUM_IN_LAST_CONV, f"backbone must be one of {list(CHANNELS_NUM_IN_LAST_CONV.keys())}" self.backbone, features_dim = get_backbone(backbone, train_all_layers) self.aggregation = nn.Sequential( L2Norm(), GeM(), Flatten(), nn.Linear(features_dim, fc_output_dim), L2Norm() ) def forward(self, x): x = self.backbone(x) x = self.aggregation(x) return x def get_pretrained_torchvision_model(backbone_name : str) -> torch.nn.Module: """This function takes the name of a backbone and returns the corresponding pretrained model from torchvision. 
Examples of backbone_name are 'VGG16' or 'ResNet18' """ try: # Newer versions of pytorch require to pass weights=weights_module.DEFAULT weights_module = getattr(__import__('torchvision.models', fromlist=[f"{backbone_name}_Weights"]), f"{backbone_name}_Weights") model = getattr(torchvision.models, backbone_name.lower())(weights=weights_module.DEFAULT) except (ImportError, AttributeError): # Older versions of pytorch require to pass pretrained=True model = getattr(torchvision.models, backbone_name.lower())(pretrained=True) return model def get_backbone(backbone_name : str, train_all_layers : bool) -> Tuple[torch.nn.Module, int]: backbone = get_pretrained_torchvision_model(backbone_name) if backbone_name.startswith("ResNet"): if train_all_layers: logging.debug(f"Train all layers of the {backbone_name}") else: for name, child in backbone.named_children(): if name == "layer3": # Freeze layers before conv_3 break for params in child.parameters(): params.requires_grad = False logging.debug(f"Train only layer3 and layer4 of the {backbone_name}, freeze the previous ones") layers = list(backbone.children())[:-2] # Remove avg pooling and FC layer elif backbone_name == "VGG16": layers = list(backbone.features.children())[:-2] # Remove avg pooling and FC layer if train_all_layers: logging.debug("Train all layers of the VGG-16") else: for layer in layers[:-5]: for p in layer.parameters(): p.requires_grad = False logging.debug("Train last layers of the VGG-16, freeze the previous ones") elif backbone_name.startswith("EfficientNet"): if train_all_layers: logging.debug(f"Train all layers of the {backbone_name}") else: for name, child in backbone.features.named_children(): if name == "5": # Freeze layers before block 5 break for params in child.parameters(): params.requires_grad = False logging.debug(f"Train only the last three blocks of the {backbone_name}, freeze the previous ones") layers = list(backbone.children())[:-2] # Remove avg pooling and FC layer backbone = 
torch.nn.Sequential(*layers) features_dim = CHANNELS_NUM_IN_LAST_CONV[backbone_name] return backbone, features_dim ================================================ FILE: cosplace_model/layers.py ================================================ import torch import torch.nn as nn import torch.nn.functional as F from torch.nn.parameter import Parameter def gem(x, p=torch.ones(1)*3, eps: float = 1e-6): return F.avg_pool2d(x.clamp(min=eps).pow(p), (x.size(-2), x.size(-1))).pow(1./p) class GeM(nn.Module): def __init__(self, p=3, eps=1e-6): super().__init__() self.p = Parameter(torch.ones(1)*p) self.eps = eps def forward(self, x): return gem(x, p=self.p, eps=self.eps) def __repr__(self): return f"{self.__class__.__name__}(p={self.p.data.tolist()[0]:.4f}, eps={self.eps})" class Flatten(torch.nn.Module): def __init__(self): super().__init__() def forward(self, x): assert x.shape[2] == x.shape[3] == 1, f"{x.shape[2]} != {x.shape[3]} != 1" return x[:, :, 0, 0] class L2Norm(nn.Module): def __init__(self, dim=1): super().__init__() self.dim = dim def forward(self, x): return F.normalize(x, p=2.0, dim=self.dim) ================================================ FILE: datasets/__init__.py ================================================ ================================================ FILE: datasets/dataset_utils.py ================================================ import os import logging from glob import glob from PIL import ImageFile ImageFile.LOAD_TRUNCATED_IMAGES = True def read_images_paths(dataset_folder, get_abs_path=False): """Find images within 'dataset_folder' and return their relative paths as a list. If there is a file 'dataset_folder'_images_paths.txt, read paths from such file. Otherwise, use glob(). Keeping the paths in the file speeds up computation, because using glob over large folders can be slow. 
Parameters ---------- dataset_folder : str, folder containing JPEG images get_abs_path : bool, if True return absolute paths, otherwise remove dataset_folder from each path Returns ------- images_paths : list[str], paths of JPEG images within dataset_folder """ if not os.path.exists(dataset_folder): raise FileNotFoundError(f"Folder {dataset_folder} does not exist") file_with_paths = dataset_folder + "_images_paths.txt" if os.path.exists(file_with_paths): logging.debug(f"Reading paths of images within {dataset_folder} from {file_with_paths}") with open(file_with_paths, "r") as file: images_paths = file.read().splitlines() images_paths = [os.path.join(dataset_folder, path) for path in images_paths] # Sanity check that paths within the file exist if not os.path.exists(images_paths[0]): raise FileNotFoundError(f"Image with path {images_paths[0]} " f"does not exist within {dataset_folder}. It is likely " f"that the content of {file_with_paths} is wrong.") else: logging.debug(f"Searching images in {dataset_folder} with glob()") images_paths = sorted(glob(f"{dataset_folder}/**/*.jpg", recursive=True)) if len(images_paths) == 0: raise FileNotFoundError(f"Directory {dataset_folder} does not contain any JPEG images") if not get_abs_path: # Remove dataset_folder from the path images_paths = [p[len(dataset_folder) + 1:] for p in images_paths] return images_paths ================================================ FILE: datasets/test_dataset.py ================================================ import os import numpy as np from PIL import Image import torch.utils.data as data import torchvision.transforms as transforms from sklearn.neighbors import NearestNeighbors import datasets.dataset_utils as dataset_utils class TestDataset(data.Dataset): def __init__(self, dataset_folder, database_folder="database", queries_folder="queries", positive_dist_threshold=25, image_size=512, resize_test_imgs=False): self.database_folder = dataset_folder + "/" + database_folder self.queries_folder = 
dataset_folder + "/" + queries_folder self.database_paths = dataset_utils.read_images_paths(self.database_folder, get_abs_path=True) self.queries_paths = dataset_utils.read_images_paths(self.queries_folder, get_abs_path=True) self.dataset_name = os.path.basename(dataset_folder) #### Read paths and UTM coordinates for all images. # The format must be path/to/file/@utm_easting@utm_northing@...@.jpg self.database_utms = np.array([(path.split("@")[1], path.split("@")[2]) for path in self.database_paths]).astype(float) self.queries_utms = np.array([(path.split("@")[1], path.split("@")[2]) for path in self.queries_paths]).astype(float) # Find positives_per_query, which are within positive_dist_threshold (default 25 meters) knn = NearestNeighbors(n_jobs=-1) knn.fit(self.database_utms) self.positives_per_query = knn.radius_neighbors( self.queries_utms, radius=positive_dist_threshold, return_distance=False ) self.images_paths = self.database_paths + self.queries_paths self.database_num = len(self.database_paths) self.queries_num = len(self.queries_paths) transforms_list = [] if resize_test_imgs: # Resize to image_size along the shorter side while maintaining aspect ratio transforms_list += [transforms.Resize(image_size, antialias=True)] transforms_list += [ transforms.ToTensor(), transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]), ] self.base_transform = transforms.Compose(transforms_list) @staticmethod def open_image(path): return Image.open(path).convert("RGB") def __getitem__(self, index): image_path = self.images_paths[index] pil_img = TestDataset.open_image(image_path) normalized_img = self.base_transform(pil_img) return normalized_img, index def __len__(self): return len(self.images_paths) def __repr__(self): return f"< {self.dataset_name} - #q: {self.queries_num}; #db: {self.database_num} >" def get_positives(self): return self.positives_per_query ================================================ FILE: datasets/train_dataset.py 
================================================
import os
import torch
import random
import logging
import numpy as np
from PIL import Image
from PIL import ImageFile
import torchvision.transforms as T
from collections import defaultdict

import datasets.dataset_utils as dataset_utils

ImageFile.LOAD_TRUNCATED_IMAGES = True


class TrainDataset(torch.utils.data.Dataset):
    def __init__(self, args, dataset_folder, M=10, alpha=30, N=5, L=2,
                 current_group=0, min_images_per_class=10):
        """
        Parameters (please check our paper for a clearer explanation of the parameters).
        ----------
        args : args for data augmentation
        dataset_folder : str, the path of the folder with the train images.
        M : int, the length of the side of each cell in meters.
        alpha : int, size of each class in degrees.
        N : int, distance (M-wise) between two classes of the same group.
        L : int, distance (alpha-wise) between two classes of the same group.
        current_group : int, which one of the groups to consider.
        min_images_per_class : int, minimum number of images in a class.
        """
        super().__init__()
        self.M = M
        self.alpha = alpha
        self.N = N
        self.L = L
        self.current_group = current_group
        self.dataset_folder = dataset_folder
        self.augmentation_device = args.augmentation_device

        # dataset_name should be either "processed", "small" or "raw", if you're using SF-XL
        dataset_name = os.path.basename(dataset_folder)
        filename = f"cache/{dataset_name}_M{M}_N{N}_alpha{alpha}_L{L}_mipc{min_images_per_class}.torch"
        if not os.path.exists(filename):
            os.makedirs("cache", exist_ok=True)
            logging.info(f"Cached dataset {filename} does not exist, I'll create it now.")
            self.initialize(dataset_folder, M, N, alpha, L, min_images_per_class, filename)
        elif current_group == 0:
            logging.info(f"Using cached dataset {filename}")

        classes_per_group, self.images_per_class = torch.load(filename)
        if current_group >= len(classes_per_group):
            raise ValueError(f"With this configuration there are only {len(classes_per_group)} " +
                             f"groups, therefore I can't create the {current_group}th group. " +
                             "You should reduce the number of groups by setting for example " +
                             f"'--groups_num {current_group}'")
        self.classes_ids = classes_per_group[current_group]

        if self.augmentation_device == "cpu":
            self.transform = T.Compose([
                T.ColorJitter(brightness=args.brightness,
                              contrast=args.contrast,
                              saturation=args.saturation,
                              hue=args.hue),
                T.RandomResizedCrop([args.image_size, args.image_size],
                                    scale=[1-args.random_resized_crop, 1], antialias=True),
                T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ])

    @staticmethod
    def open_image(path):
        return Image.open(path).convert("RGB")

    def __getitem__(self, class_num):
        # This function takes as input the class_num instead of the index of
        # the image. This way each class is equally represented during training.
        class_id = self.classes_ids[class_num]
        # Pick a random image among those in this class.
image_path = os.path.join(self.dataset_folder, random.choice(self.images_per_class[class_id])) try: pil_image = TrainDataset.open_image(image_path) except Exception as e: logging.info(f"ERROR image {image_path} couldn't be opened, it might be corrupted.") raise e tensor_image = T.functional.to_tensor(pil_image) assert tensor_image.shape == torch.Size([3, 512, 512]), \ f"Image {image_path} should have shape [3, 512, 512] but has {tensor_image.shape}." if self.augmentation_device == "cpu": tensor_image = self.transform(tensor_image) return tensor_image, class_num, image_path def get_images_num(self): """Return the number of images within this group.""" return sum([len(self.images_per_class[c]) for c in self.classes_ids]) def __len__(self): """Return the number of classes within this group.""" return len(self.classes_ids) @staticmethod def initialize(dataset_folder, M, N, alpha, L, min_images_per_class, filename): logging.debug(f"Searching training images in {dataset_folder}") images_paths = dataset_utils.read_images_paths(dataset_folder) logging.debug(f"Found {len(images_paths)} images") logging.debug("For each image, get its UTM east, UTM north and heading from its path") images_metadatas = [p.split("@") for p in images_paths] # field 1 is UTM east, field 2 is UTM north, field 9 is heading utmeast_utmnorth_heading = [(m[1], m[2], m[9]) for m in images_metadatas] utmeast_utmnorth_heading = np.array(utmeast_utmnorth_heading).astype(np.float64) logging.debug("For each image, get class and group to which it belongs") class_id__group_id = [TrainDataset.get__class_id__group_id(*m, M, alpha, N, L) for m in utmeast_utmnorth_heading] logging.debug("Group together images belonging to the same class") images_per_class = defaultdict(list) for image_path, (class_id, _) in zip(images_paths, class_id__group_id): images_per_class[class_id].append(image_path) # Images_per_class is a dict where the key is class_id, and the value # is a list with the paths of images within that class. 
        images_per_class = {k: v for k, v in images_per_class.items()
                            if len(v) >= min_images_per_class}

        logging.debug("Group together classes belonging to the same group")
        # Classes_per_group is a dict where the key is group_id, and the value
        # is a list with the class_ids belonging to that group.
        classes_per_group = defaultdict(set)
        for class_id, group_id in class_id__group_id:
            if class_id not in images_per_class:
                continue  # Skip classes with too few images
            classes_per_group[group_id].add(class_id)

        # Convert classes_per_group to a list of lists.
        # Each sublist represents the classes within a group.
        classes_per_group = [list(c) for c in classes_per_group.values()]

        torch.save((classes_per_group, images_per_class), filename)

    @staticmethod
    def get__class_id__group_id(utm_east, utm_north, heading, M, alpha, N, L):
        """Return class_id and group_id for a given point.
        The class_id is a triplet (tuple) of UTM_east, UTM_north and heading
        (e.g. (396520, 4983800, 120)).
        The group_id represents the group to which the class belongs
        (e.g. (0, 1, 0)), and it is between (0, 0, 0) and (N, N, L).
        """
        rounded_utm_east = int(utm_east // M * M)  # Rounded to nearest lower multiple of M
        rounded_utm_north = int(utm_north // M * M)
        rounded_heading = int(heading // alpha * alpha)

        class_id = (rounded_utm_east, rounded_utm_north, rounded_heading)
        # group_id goes from (0, 0, 0) to (N, N, L)
        group_id = (rounded_utm_east % (M * N) // M,
                    rounded_utm_north % (M * N) // M,
                    rounded_heading % (alpha * L) // alpha)
        return class_id, group_id


================================================
FILE: eval.py
================================================
import sys
import torch
import logging
import multiprocessing
from datetime import datetime

import test
import parser
import commons
from cosplace_model import cosplace_network
from datasets.test_dataset import TestDataset

torch.backends.cudnn.benchmark = True  # Provides a speedup

args = parser.parse_arguments(is_training=False)
start_time = datetime.now()
args.output_folder = f"logs/{args.save_dir}/{start_time.strftime('%Y-%m-%d_%H-%M-%S')}"
commons.make_deterministic(args.seed)
commons.setup_logging(args.output_folder, console="info")
logging.info(" ".join(sys.argv))
logging.info(f"Arguments: {args}")
logging.info(f"The outputs are being saved in {args.output_folder}")

#### Model
model = cosplace_network.GeoLocalizationNet(args.backbone, args.fc_output_dim)

logging.info(f"There are {torch.cuda.device_count()} GPUs and {multiprocessing.cpu_count()} CPUs.")

if args.resume_model is not None:
    logging.info(f"Loading model from {args.resume_model}")
    model_state_dict = torch.load(args.resume_model)
    model.load_state_dict(model_state_dict)
else:
    logging.info("WARNING: You didn't provide a path to resume the model (--resume_model parameter). "
                 + "Evaluation will be computed using randomly initialized weights.")

model = model.to(args.device)

test_ds = TestDataset(args.test_set_folder, queries_folder="queries_v1",
                      positive_dist_threshold=args.positive_dist_threshold)

recalls, recalls_str = test.test(args, test_ds, model, args.num_preds_to_save)
logging.info(f"{test_ds}: {recalls_str}")


================================================
FILE: hubconf.py
================================================
dependencies = ['torch', 'torchvision']

import torch

from cosplace_model import cosplace_network

AVAILABLE_TRAINED_MODELS = {
    # backbone : list of available fc_output_dim, which is equivalent to descriptors dimensionality
    "VGG16":     [    64, 128, 256, 512],
    "ResNet18":  [32, 64, 128, 256, 512],
    "ResNet50":  [32, 64, 128, 256, 512, 1024, 2048],
    "ResNet101": [32, 64, 128, 256, 512, 1024, 2048],
    "ResNet152": [32, 64, 128, 256, 512, 1024, 2048],
}


def get_trained_model(backbone: str = "ResNet50", fc_output_dim: int = 2048) -> torch.nn.Module:
    """Return a model trained with CosPlace on San Francisco eXtra Large.

    Args:
        backbone (str): which torchvision backbone to use. Must be VGG16 or a ResNet.
        fc_output_dim (int): the output dimension of the last fc layer, equivalent to
            the descriptors dimension. Must be between 32 and 2048, depending on
            model's availability.

    Return:
        model (torch.nn.Module): a trained model.
    """
    print(f"Returning CosPlace model with backbone: {backbone} with features dimension {fc_output_dim}")
    if backbone not in AVAILABLE_TRAINED_MODELS:
        raise ValueError(f"Parameter `backbone` is set to {backbone} but it must be one of {list(AVAILABLE_TRAINED_MODELS.keys())}")
    try:
        fc_output_dim = int(fc_output_dim)
    except (ValueError, TypeError):  # Catch only conversion errors, not a bare except
        raise ValueError(f"Parameter `fc_output_dim` must be an integer, but it is set to {fc_output_dim}")
    if fc_output_dim not in AVAILABLE_TRAINED_MODELS[backbone]:
        raise ValueError(f"Parameter `fc_output_dim` is set to {fc_output_dim}, but for backbone {backbone} "
                         f"it must be one of {list(AVAILABLE_TRAINED_MODELS[backbone])}")
    model = cosplace_network.GeoLocalizationNet(backbone, fc_output_dim)
    model.load_state_dict(
        torch.hub.load_state_dict_from_url(
            f'https://github.com/gmberton/CosPlace/releases/download/v1.0/{backbone}_{fc_output_dim}_cosplace.pth',
            map_location=torch.device('cpu'))
    )
    return model


================================================
FILE: parser.py
================================================
import argparse


def parse_arguments(is_training: bool = True):
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    # CosPlace Groups parameters
    parser.add_argument("--M", type=int, default=10, help="_")
    parser.add_argument("--alpha", type=int, default=30, help="_")
    parser.add_argument("--N", type=int, default=5, help="_")
    parser.add_argument("--L", type=int, default=2, help="_")
    parser.add_argument("--groups_num", type=int, default=8, help="_")
    parser.add_argument("--min_images_per_class", type=int, default=10, help="_")
    # Model parameters
    parser.add_argument("--backbone", type=str, default="ResNet18",
                        choices=["VGG16", "ResNet18", "ResNet50", "ResNet101", "ResNet152",
                                 "EfficientNet_B0", "EfficientNet_B1", "EfficientNet_B2",
                                 "EfficientNet_B3", "EfficientNet_B4", "EfficientNet_B5",
                                 "EfficientNet_B6", "EfficientNet_B7"], help="_")
    parser.add_argument("--fc_output_dim", type=int, default=512,
                        help="Output dimension of final fully connected layer")
    parser.add_argument("--train_all_layers", default=False, action="store_true",
                        help="If true, train all layers of the backbone")
    # Training parameters
    parser.add_argument("--use_amp16", action="store_true",
                        help="use Automatic Mixed Precision")
    parser.add_argument("--augmentation_device", type=str, default="cuda",
                        choices=["cuda", "cpu"],
                        help="on which device to run data augmentation")
    parser.add_argument("--batch_size", type=int, default=32, help="_")
    parser.add_argument("--epochs_num", type=int, default=50, help="_")
    parser.add_argument("--iterations_per_epoch", type=int, default=10000, help="_")
    parser.add_argument("--lr", type=float, default=0.00001, help="_")
    parser.add_argument("--classifiers_lr", type=float, default=0.01, help="_")
    parser.add_argument("--image_size", type=int, default=512,
                        help="Width and height of training images (1:1 aspect ratio)")
    parser.add_argument("--resize_test_imgs", default=False, action="store_true",
                        help="If the test images should be resized to image_size along "
                             "the shorter side while maintaining aspect ratio")
    # Data augmentation
    parser.add_argument("--brightness", type=float, default=0.7, help="_")
    parser.add_argument("--contrast", type=float, default=0.7, help="_")
    parser.add_argument("--hue", type=float, default=0.5, help="_")
    parser.add_argument("--saturation", type=float, default=0.7, help="_")
    parser.add_argument("--random_resized_crop", type=float, default=0.5, help="_")
    # Validation / test parameters
    parser.add_argument("--infer_batch_size", type=int, default=16,
                        help="Batch size for inference (validating and testing)")
    parser.add_argument("--positive_dist_threshold", type=int, default=25,
                        help="distance in meters for a prediction to be considered a positive")
    # Resume parameters
    parser.add_argument("--resume_train", type=str, default=None,
                        help="path to checkpoint to resume, e.g. logs/.../last_checkpoint.pth")
    parser.add_argument("--resume_model", type=str, default=None,
                        help="path to model to resume, e.g. logs/.../best_model.pth")
    # Other parameters
    parser.add_argument("--device", type=str, default="cuda", choices=["cuda", "cpu"], help="_")
    parser.add_argument("--seed", type=int, default=0, help="_")
    parser.add_argument("--num_workers", type=int, default=8, help="_")
    parser.add_argument("--num_preds_to_save", type=int, default=0,
                        help="At the end of training, save N preds for each query. "
                             "Try with a small number like 3")
    parser.add_argument("--save_only_wrong_preds", action="store_true",
                        help="When saving preds (if num_preds_to_save != 0) save only "
                             "preds for difficult queries, i.e. with incorrect first prediction")
    # Paths parameters
    if is_training:  # train and val sets are needed only for training
        parser.add_argument("--train_set_folder", type=str, required=True,
                            help="path of the folder with training images")
        parser.add_argument("--val_set_folder", type=str, required=True,
                            help="path of the folder with val images (split in database/queries)")
    parser.add_argument("--test_set_folder", type=str, required=True,
                        help="path of the folder with test images (split in database/queries)")
    parser.add_argument("--save_dir", type=str, default="default",
                        help="name of directory in which to save the logs, under logs/save_dir")

    args = parser.parse_args()
    return args


================================================
FILE: requirements.txt
================================================
faiss_cpu>=1.7.1
numpy>=1.21.2
Pillow>=9.0.1
scikit_learn>=1.0.2
torch>=1.8.2
torchvision>=0.9.2
tqdm>=4.62.3
utm>=0.7.0


================================================
FILE: test.py
================================================
import faiss
import torch
import logging
import numpy as np
from tqdm import tqdm
from typing import Tuple
from argparse import Namespace
from torch.utils.data.dataset import Subset
from torch.utils.data import DataLoader, Dataset

import visualizations

# Compute R@1, R@5, R@10, R@20
RECALL_VALUES = [1, 5, 10, 20]


def test(args: Namespace, eval_ds: Dataset, model: torch.nn.Module, num_preds_to_save: int = 0) -> Tuple[np.ndarray, str]:
    """Compute descriptors of the given dataset and compute the recalls."""

    model = model.eval()
    with torch.no_grad():
        logging.debug("Extracting database descriptors for evaluation/testing")
        database_subset_ds = Subset(eval_ds, list(range(eval_ds.database_num)))
        database_dataloader = DataLoader(dataset=database_subset_ds, num_workers=args.num_workers,
                                         batch_size=args.infer_batch_size, pin_memory=(args.device == "cuda"))
        all_descriptors = np.empty((len(eval_ds), args.fc_output_dim), dtype="float32")
        for images, indices in tqdm(database_dataloader, ncols=100):
            descriptors = model(images.to(args.device))
            descriptors = descriptors.cpu().numpy()
            all_descriptors[indices.numpy(), :] = descriptors

        logging.debug("Extracting queries descriptors for evaluation/testing using batch size 1")
        queries_infer_batch_size = 1
        queries_subset_ds = Subset(eval_ds,
                                   list(range(eval_ds.database_num, eval_ds.database_num + eval_ds.queries_num)))
        queries_dataloader = DataLoader(dataset=queries_subset_ds, num_workers=args.num_workers,
                                        batch_size=queries_infer_batch_size, pin_memory=(args.device == "cuda"))
        for images, indices in tqdm(queries_dataloader, ncols=100):
            descriptors = model(images.to(args.device))
            descriptors = descriptors.cpu().numpy()
            all_descriptors[indices.numpy(), :] = descriptors

    queries_descriptors = all_descriptors[eval_ds.database_num:]
    database_descriptors = all_descriptors[:eval_ds.database_num]

    # Use a kNN to find predictions
    faiss_index = faiss.IndexFlatL2(args.fc_output_dim)
    faiss_index.add(database_descriptors)
    del database_descriptors, all_descriptors

    logging.debug("Calculating recalls")
    _, predictions = faiss_index.search(queries_descriptors, max(RECALL_VALUES))

    #### For each query, check if the predictions are correct
    positives_per_query = eval_ds.get_positives()
    recalls = np.zeros(len(RECALL_VALUES))
    for query_index, preds in enumerate(predictions):
        for i, n in enumerate(RECALL_VALUES):
            if np.any(np.in1d(preds[:n], positives_per_query[query_index])):
                recalls[i:] += 1
                break
    # Divide by queries_num and multiply by 100, so the recalls are in percentages
    recalls = recalls / eval_ds.queries_num * 100
    recalls_str = ", ".join([f"R@{val}: {rec:.1f}" for val, rec in zip(RECALL_VALUES, recalls)])

    # Save visualizations of predictions
    if num_preds_to_save != 0:
        # For each query save num_preds_to_save predictions
        visualizations.save_preds(predictions[:, :num_preds_to_save], eval_ds,
                                  args.output_folder, args.save_only_wrong_preds)

    return recalls, recalls_str


================================================
FILE: train.py
================================================
import sys
import torch
import logging
import numpy as np
from tqdm import tqdm
import multiprocessing
from datetime import datetime
import torchvision.transforms as T

import test
import util
import parser
import commons
import cosface_loss
import augmentations
from cosplace_model import cosplace_network
from datasets.test_dataset import TestDataset
from datasets.train_dataset import TrainDataset

torch.backends.cudnn.benchmark = True  # Provides a speedup

args = parser.parse_arguments()
start_time = datetime.now()
args.output_folder = f"logs/{args.save_dir}/{start_time.strftime('%Y-%m-%d_%H-%M-%S')}"
commons.make_deterministic(args.seed)
commons.setup_logging(args.output_folder, console="debug")
logging.info(" ".join(sys.argv))
logging.info(f"Arguments: {args}")
logging.info(f"The outputs are being saved in {args.output_folder}")

#### Model
model = cosplace_network.GeoLocalizationNet(args.backbone, args.fc_output_dim, args.train_all_layers)

logging.info(f"There are {torch.cuda.device_count()} GPUs and {multiprocessing.cpu_count()} CPUs.")

if args.resume_model is not None:
    logging.debug(f"Loading model from {args.resume_model}")
    model_state_dict = torch.load(args.resume_model)
    model.load_state_dict(model_state_dict)

model = model.to(args.device).train()

#### Optimizer
criterion = torch.nn.CrossEntropyLoss()
model_optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)

#### Datasets
groups = [TrainDataset(args, args.train_set_folder, M=args.M, alpha=args.alpha, N=args.N, L=args.L,
                       current_group=n, min_images_per_class=args.min_images_per_class)
          for n in range(args.groups_num)]
# Each group has its own classifier, which depends on the number of classes in the group
classifiers = [cosface_loss.MarginCosineProduct(args.fc_output_dim, len(group)) for group in groups]
classifiers_optimizers = [torch.optim.Adam(classifier.parameters(), lr=args.classifiers_lr) for classifier in classifiers]

logging.info(f"Using {len(groups)} groups")
logging.info(f"The {len(groups)} groups have respectively the following number of classes {[len(g) for g in groups]}")
logging.info(f"The {len(groups)} groups have respectively the following number of images {[g.get_images_num() for g in groups]}")

val_ds = TestDataset(args.val_set_folder, positive_dist_threshold=args.positive_dist_threshold,
                     image_size=args.image_size, resize_test_imgs=args.resize_test_imgs)
test_ds = TestDataset(args.test_set_folder, queries_folder="queries_v1",
                      positive_dist_threshold=args.positive_dist_threshold,
                      image_size=args.image_size, resize_test_imgs=args.resize_test_imgs)
logging.info(f"Validation set: {val_ds}")
logging.info(f"Test set: {test_ds}")

#### Resume
if args.resume_train:
    model, model_optimizer, classifiers, classifiers_optimizers, best_val_recall1, start_epoch_num = \
        util.resume_train(args, args.output_folder, model, model_optimizer, classifiers, classifiers_optimizers)
    model = model.to(args.device)
    epoch_num = start_epoch_num - 1
    logging.info(f"Resuming from epoch {start_epoch_num} with best R@1 {best_val_recall1:.1f} "
                 f"from checkpoint {args.resume_train}")
else:
    best_val_recall1 = start_epoch_num = 0

#### Train / evaluation loop
logging.info("Start training ...")
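The per-epoch class coverage that train.py logs just below follows from simple arithmetic. A standalone sketch of that computation (the 35,000-class group size is an illustrative assumption, not a value from the repo; real group sizes depend on the dataset and the M/alpha/N/L parameters):

```python
# Worked example of the "model sees each class (on average) N times per epoch"
# figure logged by train.py, using the default hyperparameters.
iterations_per_epoch = 10_000  # default --iterations_per_epoch
batch_size = 32                # default --batch_size
classes_in_group = 35_000      # hypothetical number of classes in one group

avg_views_per_class = iterations_per_epoch * batch_size / classes_in_group
print(f"{avg_views_per_class:.1f}")  # ~9.1 views of each class per epoch
```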
logging.info(f"There are {len(groups[0])} classes for the first group, " +
             f"each epoch has {args.iterations_per_epoch} iterations " +
             f"with batch_size {args.batch_size}, therefore the model sees each class (on average) " +
             f"{args.iterations_per_epoch * args.batch_size / len(groups[0]):.1f} times per epoch")

if args.augmentation_device == "cuda":
    gpu_augmentation = T.Compose([
        augmentations.DeviceAgnosticColorJitter(brightness=args.brightness,
                                                contrast=args.contrast,
                                                saturation=args.saturation,
                                                hue=args.hue),
        augmentations.DeviceAgnosticRandomResizedCrop([args.image_size, args.image_size],
                                                      scale=[1 - args.random_resized_crop, 1]),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

if args.use_amp16:
    scaler = torch.cuda.amp.GradScaler()

for epoch_num in range(start_epoch_num, args.epochs_num):

    #### Train
    epoch_start_time = datetime.now()
    # Select classifier and dataloader according to epoch
    current_group_num = epoch_num % args.groups_num
    classifiers[current_group_num] = classifiers[current_group_num].to(args.device)
    util.move_to_device(classifiers_optimizers[current_group_num], args.device)

    dataloader = commons.InfiniteDataLoader(groups[current_group_num], num_workers=args.num_workers,
                                            batch_size=args.batch_size, shuffle=True,
                                            pin_memory=(args.device == "cuda"), drop_last=True)
    dataloader_iterator = iter(dataloader)
    model = model.train()

    epoch_losses = np.zeros((0, 1), dtype=np.float32)
    for iteration in tqdm(range(args.iterations_per_epoch), ncols=100):
        images, targets, _ = next(dataloader_iterator)
        images, targets = images.to(args.device), targets.to(args.device)

        if args.augmentation_device == "cuda":
            images = gpu_augmentation(images)

        model_optimizer.zero_grad()
        classifiers_optimizers[current_group_num].zero_grad()

        if not args.use_amp16:
            descriptors = model(images)
            output = classifiers[current_group_num](descriptors, targets)
            loss = criterion(output, targets)
            loss.backward()
            epoch_losses = np.append(epoch_losses, loss.item())
            del loss, output, images
            model_optimizer.step()
            classifiers_optimizers[current_group_num].step()
        else:  # Use AMP 16
            with torch.cuda.amp.autocast():
                descriptors = model(images)
                output = classifiers[current_group_num](descriptors, targets)
                loss = criterion(output, targets)
            scaler.scale(loss).backward()
            epoch_losses = np.append(epoch_losses, loss.item())
            del loss, output, images
            scaler.step(model_optimizer)
            scaler.step(classifiers_optimizers[current_group_num])
            scaler.update()

    classifiers[current_group_num] = classifiers[current_group_num].cpu()
    util.move_to_device(classifiers_optimizers[current_group_num], "cpu")

    logging.debug(f"Epoch {epoch_num:02d} in {str(datetime.now() - epoch_start_time)[:-7]}, "
                  f"loss = {epoch_losses.mean():.4f}")

    #### Evaluation
    recalls, recalls_str = test.test(args, val_ds, model)
    logging.info(f"Epoch {epoch_num:02d} in {str(datetime.now() - epoch_start_time)[:-7]}, {val_ds}: {recalls_str[:20]}")
    is_best = recalls[0] > best_val_recall1
    best_val_recall1 = max(recalls[0], best_val_recall1)

    # Save checkpoint, which contains all training parameters
    util.save_checkpoint({
        "epoch_num": epoch_num + 1,
        "model_state_dict": model.state_dict(),
        "optimizer_state_dict": model_optimizer.state_dict(),
        "classifiers_state_dict": [c.state_dict() for c in classifiers],
        "optimizers_state_dict": [c.state_dict() for c in classifiers_optimizers],
        "best_val_recall1": best_val_recall1
    }, is_best, args.output_folder)

logging.info(f"Trained for {epoch_num+1:02d} epochs, in total in {str(datetime.now() - start_time)[:-7]}")

#### Test best model on test set v1
best_model_state_dict = torch.load(f"{args.output_folder}/best_model.pth")
model.load_state_dict(best_model_state_dict)

logging.info(f"Now testing on the test set: {test_ds}")
recalls, recalls_str = test.test(args, test_ds, model, args.num_preds_to_save)
logging.info(f"{test_ds}: {recalls_str}")

logging.info("Experiment finished (without any errors)")


================================================
FILE: util.py
================================================
import torch
import shutil
import logging
from typing import Type, List
from argparse import Namespace

from cosface_loss import MarginCosineProduct


def move_to_device(optimizer: Type[torch.optim.Optimizer], device: str):
    for state in optimizer.state.values():
        for k, v in state.items():
            if torch.is_tensor(v):
                state[k] = v.to(device)


def save_checkpoint(state: dict, is_best: bool, output_folder: str,
                    ckpt_filename: str = "last_checkpoint.pth"):
    # TODO it would be better to move weights to cpu before saving
    checkpoint_path = f"{output_folder}/{ckpt_filename}"
    torch.save(state, checkpoint_path)
    if is_best:
        torch.save(state["model_state_dict"], f"{output_folder}/best_model.pth")


def resume_train(args: Namespace, output_folder: str, model: torch.nn.Module,
                 model_optimizer: Type[torch.optim.Optimizer], classifiers: List[MarginCosineProduct],
                 classifiers_optimizers: List[Type[torch.optim.Optimizer]]):
    """Load model, optimizer, and other training parameters"""
    logging.info(f"Loading checkpoint: {args.resume_train}")
    checkpoint = torch.load(args.resume_train)
    start_epoch_num = checkpoint["epoch_num"]

    model_state_dict = checkpoint["model_state_dict"]
    model.load_state_dict(model_state_dict)
    model = model.to(args.device)
    model_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])

    assert args.groups_num == len(classifiers) == len(classifiers_optimizers) == \
        len(checkpoint["classifiers_state_dict"]) == len(checkpoint["optimizers_state_dict"]), \
        (f"{args.groups_num}, {len(classifiers)}, {len(classifiers_optimizers)}, "
         f"{len(checkpoint['classifiers_state_dict'])}, {len(checkpoint['optimizers_state_dict'])}")

    for c, sd in zip(classifiers, checkpoint["classifiers_state_dict"]):
        # Move classifiers to GPU before loading their optimizers
        c = c.to(args.device)
        c.load_state_dict(sd)
    for c, sd in zip(classifiers_optimizers, checkpoint["optimizers_state_dict"]):
        c.load_state_dict(sd)
    for c in classifiers:
        # Move classifiers back to CPU to save some GPU memory
        c = c.cpu()
    best_val_recall1 = checkpoint["best_val_recall1"]

    # Copy best model to current output_folder
    shutil.copy(args.resume_train.replace("last_checkpoint.pth", "best_model.pth"), output_folder)

    return model, model_optimizer, classifiers, classifiers_optimizers, best_val_recall1, start_epoch_num


================================================
FILE: visualizations.py
================================================
import os
import cv2
import numpy as np
from tqdm import tqdm
from skimage.transform import rescale
from PIL import Image, ImageDraw, ImageFont

# Height and width of a single image
H = 512
W = 512
TEXT_H = 175
FONTSIZE = 80
SPACE = 50  # Space between two images


def write_labels_to_image(labels=["text1", "text2"]):
    """Creates an image with vertical text, spaced along rows."""
    font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", FONTSIZE)
    img = Image.new('RGB', ((W * len(labels)) + SPACE * (len(labels) - 1), TEXT_H), (1, 1, 1))
    d = ImageDraw.Draw(img)
    for i, text in enumerate(labels):
        _, _, w, h = d.textbbox((0, 0), text, font=font)
        d.text(((W + SPACE) * i + W // 2 - w // 2, 1), text, fill=(0, 0, 0), font=font)
    return np.array(img)


def draw(img, c=(0, 255, 0), thickness=20):
    """Draw a colored (usually red or green) box around an image."""
    p = np.array([[0, 0], [0, img.shape[0]], [img.shape[1], img.shape[0]], [img.shape[1], 0]])
    for i in range(3):
        cv2.line(img, (p[i, 0], p[i, 1]), (p[i + 1, 0], p[i + 1, 1]), c, thickness=thickness * 2)
    return cv2.line(img, (p[3, 0], p[3, 1]), (p[0, 0], p[0, 1]), c, thickness=thickness * 2)


def build_prediction_image(images_paths, preds_correct=None):
    """Build a row of images, where the first is the query and the rest are predictions.
    For each image, if is_correct then draw a green/red box.
    """
    assert len(images_paths) == len(preds_correct)
    labels = ["Query"] + [f"Pr{i} - {is_correct}" for i, is_correct in enumerate(preds_correct[1:])]
    num_images = len(images_paths)
    images = [np.array(Image.open(path)) for path in images_paths]
    for img, correct in zip(images, preds_correct):
        if correct is None:
            continue
        color = (0, 255, 0) if correct else (255, 0, 0)
        draw(img, color)
    concat_image = np.ones([H, (num_images * W) + ((num_images - 1) * SPACE), 3])
    rescaleds = [rescale(i, [min(H / i.shape[0], W / i.shape[1]),
                             min(H / i.shape[0], W / i.shape[1]), 1]) for i in images]
    for i, image in enumerate(rescaleds):
        pad_width = (W - image.shape[1] + 1) // 2
        pad_height = (H - image.shape[0] + 1) // 2
        image = np.pad(image, [[pad_height, pad_height], [pad_width, pad_width], [0, 0]],
                       constant_values=1)[:H, :W]
        concat_image[:, i * (W + SPACE): i * (W + SPACE) + W] = image
    try:
        labels_image = write_labels_to_image(labels)
        final_image = np.concatenate([labels_image, concat_image])
    except OSError:  # Handle error in case of missing PIL ImageFont
        final_image = concat_image
    final_image = Image.fromarray((final_image * 255).astype(np.uint8))
    return final_image


def save_file_with_paths(query_path, preds_paths, positives_paths, output_path):
    file_content = []
    file_content.append("Query path:")
    file_content.append(query_path + "\n")
    file_content.append("Predictions paths:")
    file_content.append("\n".join(preds_paths) + "\n")
    file_content.append("Positives paths:")
    file_content.append("\n".join(positives_paths) + "\n")
    with open(output_path, "w") as file:
        _ = file.write("\n".join(file_content))


def save_preds(predictions, eval_ds, output_folder, save_only_wrong_preds=None):
    """For each query, save an image containing the query and its predictions,
    and a file with the paths of the query, its predictions and its positives.

    Parameters
    ----------
    predictions : np.array of shape [num_queries x num_preds_to_viz], with the preds for each query
    eval_ds : TestDataset
    output_folder : str / Path with the path to save the predictions
    save_only_wrong_preds : bool, if True save only the wrongly predicted queries,
        i.e. the ones where the first pred is incorrect (further than 25 m)
    """
    positives_per_query = eval_ds.get_positives()
    os.makedirs(f"{output_folder}/preds", exist_ok=True)
    for query_index, preds in enumerate(tqdm(predictions, ncols=80, desc=f"Saving preds in {output_folder}")):
        query_path = eval_ds.queries_paths[query_index]
        list_of_images_paths = [query_path]
        # List of None (query), True (correct preds) or False (wrong preds)
        preds_correct = [None]
        for pred_index, pred in enumerate(preds):
            pred_path = eval_ds.database_paths[pred]
            list_of_images_paths.append(pred_path)
            is_correct = pred in positives_per_query[query_index]
            preds_correct.append(is_correct)
        if save_only_wrong_preds and preds_correct[1]:
            continue

        prediction_image = build_prediction_image(list_of_images_paths, preds_correct)
        pred_image_path = f"{output_folder}/preds/{query_index:03d}.jpg"
        prediction_image.save(pred_image_path)

        positives_paths = [eval_ds.database_paths[idx] for idx in positives_per_query[query_index]]
        save_file_with_paths(
            query_path=list_of_images_paths[0],
            preds_paths=list_of_images_paths[1:],
            positives_paths=positives_paths,
            output_path=f"{output_folder}/preds/{query_index:03d}.txt",
        )
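To make the UTM-based partitioning in `datasets/train_dataset.py` concrete, here is a self-contained sketch of the same rounding arithmetic as `get__class_id__group_id`, using the default hyperparameters (M=10, alpha=30, N=5, L=2); the coordinates are illustrative, not taken from SF-XL:

```python
def class_and_group_id(utm_east, utm_north, heading, M=10, alpha=30, N=5, L=2):
    # Round each coordinate down to the nearest lower multiple of M (and the
    # heading to a multiple of alpha) to get the class; the group is the offset
    # of that class within an (M*N, M*N, alpha*L)-sized tile of classes.
    rounded_utm_east = int(utm_east // M * M)
    rounded_utm_north = int(utm_north // M * M)
    rounded_heading = int(heading // alpha * alpha)
    class_id = (rounded_utm_east, rounded_utm_north, rounded_heading)
    group_id = (rounded_utm_east % (M * N) // M,
                rounded_utm_north % (M * N) // M,
                rounded_heading % (alpha * L) // alpha)
    return class_id, group_id


print(class_and_group_id(396524.3, 4983807.0, 125.0))
# ((396520, 4983800, 120), (2, 0, 0))
```

Only images falling in the same group are trained on together within an epoch, which is what keeps nearby-but-distinct classes out of the same CosFace classification problem.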