Repository: gmberton/CosPlace
Branch: main
Commit: 52b56e95ea62
Files: 21
Total size: 64.2 KB
Directory structure:
gitextract_ij6gs5cz/
├── .gitignore
├── LICENSE
├── README.md
├── augmentations.py
├── commons.py
├── cosface_loss.py
├── cosplace_model/
│ ├── __init__.py
│ ├── cosplace_network.py
│ └── layers.py
├── datasets/
│ ├── __init__.py
│ ├── dataset_utils.py
│ ├── test_dataset.py
│ └── train_dataset.py
├── eval.py
├── hubconf.py
├── parser.py
├── requirements.txt
├── test.py
├── train.py
├── util.py
└── visualizations.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
.spyproject
__pycache__
logs
cache
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2022 Gabriele Berton, Carlo Masone, Barbara Caputo
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# Rethinking Visual Geo-localization for Large-Scale Applications
This is the official PyTorch implementation of the CVPR 2022 paper "Rethinking Visual Geo-localization for Large-Scale Applications".
The paper presents a new dataset called San Francisco eXtra Large (SF-XL, go [_here_](https://forms.gle/wpyDzhDyoWLQygAT9) to download it), and a highly scalable training method (called CosPlace), which reaches state-of-the-art (SOTA) results with compact descriptors.
[[CVPR OpenAccess](https://openaccess.thecvf.com/content/CVPR2022/html/Berton_Rethinking_Visual_Geo-Localization_for_Large-Scale_Applications_CVPR_2022_paper.html)] [[ArXiv](https://arxiv.org/abs/2204.02287)] [[Video](https://www.youtube.com/watch?v=oDyL6oVNN3I)] [[BibTex](https://github.com/gmberton/CosPlace?tab=readme-ov-file#cite)]
Note that CosPlace is quite old. **🚀 Looking for SOTA Visual Place Recognition (VPR)? Check out [MegaLoc](https://github.com/gmberton/MegaLoc)**
The images below represent respectively:
1) the map of San Francisco eXtra Large
2) a visualization of how CosPlace Groups (read datasets) are formed
3) results with CosPlace vs other methods on Pitts250k (CosPlace trained on SF-XL, others on Pitts30k)
<p float="left">
<img src="https://github.com/gmberton/gmberton.github.io/blob/main/images/SF-XL%20map.jpg" height="150" />
<img src="https://github.com/gmberton/gmberton.github.io/blob/main/images/map_groups.png" height="150" />
<img src="https://github.com/gmberton/gmberton.github.io/blob/main/images/backbones_pitts250k_main.png" height="150" />
</p>
## Train
After downloading the SF-XL dataset, simply run
`$ python3 train.py --train_set_folder path/to/sf_xl/raw/train/database --val_set_folder path/to/sf_xl/processed/val --test_set_folder path/to/sf_xl/processed/test`
The script automatically splits SF-XL into CosPlace Groups and saves the resulting object in the `cache` folder.
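As a rough illustration of how an image could be assigned to a class and to a group, here is a minimal sketch based on the `M`, `alpha`, `N` and `L` parameters documented in `datasets/train_dataset.py` (cell side in meters, heading bin in degrees, and the spacing between classes of the same group). The function name `assign_class_and_group` and the exact arithmetic are assumptions made for illustration; refer to `TrainDataset.get__class_id__group_id` for the actual logic.

```python
def assign_class_and_group(utm_east, utm_north, heading, M=10, alpha=30, N=5, L=2):
    """Hypothetical helper: map a position and heading to (class_id, group_id)."""
    # Snap the position to the corner of its M x M meter cell,
    # and the heading to the start of its alpha-degree bin.
    cell_east = int(utm_east // M * M)
    cell_north = int(utm_north // M * M)
    heading_bin = int(heading // alpha * alpha)
    class_id = (cell_east, cell_north, heading_bin)
    # Two classes share a group only if they are a multiple of N cells apart
    # along each axis and a multiple of L heading bins apart.
    group_id = (
        cell_east % (M * N) // M,
        cell_north % (M * N) // M,
        heading_bin % (alpha * L) // alpha,
    )
    return class_id, group_id


# Made-up coordinates, only to show the shape of the output
class_id, group_id = assign_class_and_group(548123.4, 4178456.7, 95.0)
print(class_id, group_id)
```

Under these assumptions the default values give `N * N * L = 50` possible groups, and adjacent cells always land in different groups.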
By default, training is performed with a ResNet-18 and a descriptor dimensionality of 512, which fits in less than 4GB of VRAM.
To change the backbone or the dimensionality of the output descriptors, simply run
`$ python3 train.py --backbone ResNet50 --fc_output_dim 128`
You can also speed up your training with Automatic Mixed Precision (note that none of the results/statistics in the paper used AMP)
`$ python3 train.py --use_amp16`
Run `$ python3 train.py -h` to have a look at all the hyperparameters that you can change. You will find all hyperparameters mentioned in the paper.
#### Dataset size and lightweight version
The SF-XL dataset is about 1 TB.
Training uses only a subset of the images, so you can work with just this subset, which is only 360 GB.
If this is still too heavy for you (e.g. if you're using Colab), but you would like to run CosPlace, we also created a small version of SF-XL, which is only 5 GB.
Obviously, using the small version will lead to lower results, and it should be used only for debugging / exploration purposes.
More information on the dataset and the lightweight version is available in the README that you can find on the dataset download page (go [_here_](https://forms.gle/wpyDzhDyoWLQygAT9) to find it).
#### Reproducibility
Results from the paper are fully reproducible, and we followed deep learning's best practices (average over multiple runs for the main results, validation/early stopping and hyperparameter search on the val set).
If you are a researcher comparing your work against ours, please make sure to follow these best practices and avoid picking the best model on the test set.
## Test
You can test a trained model as follows:
`$ python3 eval.py --backbone ResNet50 --fc_output_dim 128 --resume_model path/to/best_model.pth`
You can download plenty of trained models below.
### Visualize predictions
Predictions can be easily visualized through the `num_preds_to_save` parameter. For example running this
```
python3 eval.py --backbone ResNet50 --fc_output_dim 512 --resume_model path/to/best_model.pth \
--num_preds_to_save=3 --exp_name=cosplace_on_stlucia
```
will generate under the path `./logs/cosplace_on_stlucia/*/preds` images such as
<p float="left">
<img src="https://raw.githubusercontent.com/gmberton/VPR-methods-evaluation/master/images/pred.jpg" height="200"/>
</p>
Since saving predictions for every query can take a long time, you can also pass the parameter `--save_only_wrong_preds`, which saves predictions only for wrongly predicted queries (i.e. where the first prediction is wrong). These are usually the most interesting failure cases.
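To make "wrongly predicted" concrete, the sketch below flags the queries whose top-ranked database image is not among their positives (database images within the positive distance threshold, 25 meters by default in `datasets/test_dataset.py`). The function name `find_wrong_queries` and the toy data are hypothetical; this only illustrates the criterion and is not the code used by `eval.py`.

```python
def find_wrong_queries(predictions, positives_per_query):
    """Return the indices of queries whose first prediction is not a positive.

    predictions[q] is a ranked list of database indices for query q;
    positives_per_query[q] holds the database indices counted as correct.
    """
    wrong = []
    for query_index, (ranked, positives) in enumerate(zip(predictions, positives_per_query)):
        if ranked[0] not in set(positives):
            wrong.append(query_index)
    return wrong


# Toy example with three queries
predictions = [[4, 1, 7], [2, 0, 5], [9, 3, 8]]
positives_per_query = [[4, 6], [0, 5], [3]]
wrong_queries = find_wrong_queries(predictions, positives_per_query)
# Fraction of queries whose first prediction is correct
recall_at_1 = 1 - len(wrong_queries) / len(predictions)
print(wrong_queries, recall_at_1)
```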
## Trained Models
We now have all our trained models on [PyTorch Hub](https://pytorch.org/docs/stable/hub.html), so that you can use them in any codebase without cloning this repository simply like this
```
import torch
model = torch.hub.load("gmberton/cosplace", "get_trained_model", backbone="ResNet50", fc_output_dim=2048)
```
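Assuming the hub model is the `GeoLocalizationNet` defined in `cosplace_model/cosplace_network.py`, its last layer is an L2 normalization, so descriptors can be compared with a plain dot product. The sketch below uses small hand-written vectors in place of real model outputs; `l2_normalize` and `rank_database` are hypothetical helpers, not part of this repository.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]


def rank_database(query, database):
    """Return database indices sorted from most to least similar to the query."""
    # For unit-norm vectors the dot product equals the cosine similarity
    similarities = [sum(q * d for q, d in zip(query, desc)) for desc in database]
    return sorted(range(len(database)), key=lambda i: similarities[i], reverse=True)


# Stand-ins for descriptors of three database images and one query
database = [l2_normalize(v) for v in ([1.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 1.0])]
query = l2_normalize([0.5, 0.9, 0.1])
ranking = rank_database(query, database)
print(ranking)
```

With real descriptors the same ranking is typically computed with a matrix product or a nearest-neighbor library rather than Python loops.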
As an alternative, you can download the trained models from the table below, which provides links to models with different backbones and dimensionality of descriptors, trained on SF-XL.
<table>
<tr>
<th rowspan=2>Model</th>
<th colspan=7>Dimension of Descriptors</th>
</tr>
<tr>
<td>32</td>
<td>64</td>
<td>128</td>
<td>256</td>
<td>512</td>
<td>1024</td>
<td>2048</td>
</tr>
<tr>
<td>ResNet-18</td>
<td><a href="https://drive.google.com/file/d/1tfT8r2fBeMVAEHg2bVfCql5pV9YzK620/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1-d_Yi3ly3bY6hUW1F9w144FFKsZtYBL4/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1HaQjGY5x--Ok0RcspVVjZ0bwrAVmBvrZ/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1hjkogugTsHTQ6GTuW3MHqx-t4cXqx0uo/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1rQAC2ZddDjzwB2OVqAcNgCFEf3gLNa9U/view?usp=sharing">link</a></td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>ResNet-50</td>
<td><a href="https://drive.google.com/file/d/18AxbLO66CO0kG05-1YrRb1YwqN7Wgp6Z/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1F2WMt7vMUqXBjsZDIwSga3N0l0r9NP2s/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/14U3jsoNEWC-QsINoVCWZaHFUGE20fIgZ/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1Q2sZPEJfHAe19JaZkdgeFotUYwKbV_x2/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1LgDaxCjbQqQWuk5qrPogfg7oN8Ksl1jh/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1VBLUiQJfmnZ4kVQIrXBW-AE1dZ3EnMv2/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1yNzxsMg34KO04UJ49ncANdCIWlB3aUGA/view?usp=sharing">link</a></td>
</tr>
<tr>
<td>ResNet-101</td>
<td><a href="https://drive.google.com/file/d/1a5FqhujOn0Pr6duKrRknoOgz8L8ckDSE/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/17C8jBQluxsbI9d8Bzf67b5OsauOJAIuX/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1w37AztnIyGVklBMtm-lwkajb0DWbYhhc/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1G5_I4vX4s4_oiAC3EWbrCyXrCOkV8Bbs/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1uBKpNfMBt6sLIjCGfH6Orx9eQdQgN-8Z/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/12BU8BgfqFYzGLXXNaKLpaAzTHuN5I9gQ/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1PF7lsSw1sFMh-Bl_xwO74fM1InyYy1t8/view?usp=sharing">link</a></td>
</tr>
<tr>
<td>ResNet-152</td>
<td><a href="https://drive.google.com/file/d/12pI1FToqKKt8I6-802CHWXDP-JmHEFSW/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1rTjlv_pNtXgxY8VELiGYvLcgXiRa2zqB/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1q5-szPBn4zL8evWmYT04wFaKjen66mrk/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1sCQMA_rsIjmD-f381I0f2yDf0At4TnSx/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1ggNYQfGSfE-dciKCS_6SKeQT76O0OXPX/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/15vBWuHVqEMxkAWWrc7IrkGsQroC65tPc/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1AlF7xPSswDLA1TdhZ9yTVBkfRnJm0Hn8/view?usp=sharing">link</a></td>
</tr>
<tr>
<td>VGG-16</td>
<td>-</td>
<td><a href="https://drive.google.com/file/d/1YJTBwagC0v50oPydpKtsTnGZnaYOV0z-/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1vgw509lGBfJR46cGDJGkFcdBTGhIeyAH/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1-4JtACE47rkXXSAlRBFIbydimfKemdo7/view?usp=sharing">link</a></td>
<td><a href="https://drive.google.com/file/d/1F6CT-rnAGTTexdpLoQYncn-ooqzJe6wf/view?usp=sharing">link</a></td>
<td>-</td>
<td>-</td>
</tr>
</table>
Or you can download all models at once at [this link](https://drive.google.com/drive/folders/1WzSLnv05FLm-XqP5DxR5nXaaixH23uvV?usp=sharing)
## Issues
If you have any questions regarding our code or dataset, feel free to open an issue or send an email to berton.gabri@gmail.com
## Acknowledgements
Parts of this repo are inspired by the following repositories:
- [CosFace implementation in PyTorch](https://github.com/MuggleWang/CosFace_pytorch/blob/master/layer.py)
- [CNN Image Retrieval in PyTorch](https://github.com/filipradenovic/cnnimageretrieval-pytorch) (for the GeM layer)
- [Visual Geo-localization benchmark](https://github.com/gmberton/deep-visual-geo-localization-benchmark) (for the evaluation / test code)
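For readers unfamiliar with the first two borrowed components, here is a minimal pure-Python sketch of generalized-mean (GeM) pooling and of the CosFace margin applied to cosine logits, following the formulas in `cosplace_model/layers.py` and `cosface_loss.py`. The function names are hypothetical, and the sketch works on plain lists rather than tensors.

```python
def gem_pool(values, p=3.0, eps=1e-6):
    """Generalized-mean pooling of one feature map given as a flat list."""
    # Clamp to eps so that the power is well defined for non-positive activations
    clamped = [max(v, eps) for v in values]
    return (sum(v ** p for v in clamped) / len(clamped)) ** (1.0 / p)


def cosface_logits(cosines, label, s=30.0, m=0.40):
    """Subtract the margin m from the target-class cosine, then scale by s."""
    return [s * (c - m if i == label else c) for i, c in enumerate(cosines)]


activations = [1.0, 2.0, 3.0, 4.0]
pooled_avg = gem_pool(activations, p=1.0)  # p=1 reduces to average pooling
pooled_gem = gem_pool(activations, p=3.0)  # larger p moves toward max pooling
logits = cosface_logits([0.9, 0.2, -0.1], label=0)
print(pooled_avg, pooled_gem, logits)
```

The margin lowers only the logit of the correct class, so the network has to separate classes by at least `m` in cosine space to reach the same loss.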
## Cite
Here is the bibtex to cite our paper
```bibtex
@inproceedings{Berton_CVPR_2022_CosPlace,
author = {Berton, Gabriele and Masone, Carlo and Caputo, Barbara},
title = {Rethinking Visual Geo-Localization for Large-Scale Applications},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
month = {June},
year = {2022},
pages = {4878--4888}
}
```
================================================
FILE: augmentations.py
================================================
import torch
from typing import Tuple, Union
import torchvision.transforms as T
class DeviceAgnosticColorJitter(T.ColorJitter):
    def __init__(self, brightness: float = 0., contrast: float = 0., saturation: float = 0., hue: float = 0.):
        """This is the same as T.ColorJitter but it only accepts batches of images and works on GPU"""
        super().__init__(brightness=brightness, contrast=contrast, saturation=saturation, hue=hue)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        assert len(images.shape) == 4, f"images should be a batch of images, but it has shape {images.shape}"
        B, C, H, W = images.shape
        # Applies a different color jitter to each image
        color_jitter = super(DeviceAgnosticColorJitter, self).forward
        augmented_images = [color_jitter(img).unsqueeze(0) for img in images]
        augmented_images = torch.cat(augmented_images)
        assert augmented_images.shape == torch.Size([B, C, H, W])
        return augmented_images
class DeviceAgnosticRandomResizedCrop(T.RandomResizedCrop):
    def __init__(self, size: Union[int, Tuple[int, int]], scale: Tuple[float, float]):
        """This is the same as T.RandomResizedCrop but it only accepts batches of images and works on GPU"""
        super().__init__(size=size, scale=scale, antialias=True)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        assert len(images.shape) == 4, f"images should be a batch of images, but it has shape {images.shape}"
        B, C, H, W = images.shape
        # Applies a different RandomResizedCrop to each image
        random_resized_crop = super(DeviceAgnosticRandomResizedCrop, self).forward
        augmented_images = [random_resized_crop(img).unsqueeze(0) for img in images]
        augmented_images = torch.cat(augmented_images)
        return augmented_images
if __name__ == "__main__":
    """
    You can run this script to visualize the transformations, and verify that
    the augmentations are applied individually on each image of the batch.
    """
    from PIL import Image
    # Import skimage in here, so it is not necessary to install it unless you run this script
    from skimage import data

    # Initialize DeviceAgnosticRandomResizedCrop
    random_crop = DeviceAgnosticRandomResizedCrop(size=[256, 256], scale=[0.5, 1])
    # Create a batch with 2 astronaut images
    pil_image = Image.fromarray(data.astronaut())
    tensor_image = T.functional.to_tensor(pil_image).unsqueeze(0)
    images_batch = torch.cat([tensor_image, tensor_image])
    # Apply augmentation (individually on each of the 2 images)
    augmented_batch = random_crop(images_batch)
    # Convert to PIL images
    augmented_image_0 = T.functional.to_pil_image(augmented_batch[0])
    augmented_image_1 = T.functional.to_pil_image(augmented_batch[1])
    # Visualize the original image, as well as the two augmented ones
    pil_image.show()
    augmented_image_0.show()
    augmented_image_1.show()
================================================
FILE: commons.py
================================================
import os
import sys
import torch
import random
import logging
import traceback
import numpy as np
class InfiniteDataLoader(torch.utils.data.DataLoader):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.dataset_iterator = super().__iter__()

    def __iter__(self):
        return self

    def __next__(self):
        try:
            batch = next(self.dataset_iterator)
        except StopIteration:
            self.dataset_iterator = super().__iter__()
            batch = next(self.dataset_iterator)
        return batch
def make_deterministic(seed: int = 0):
    """Make results deterministic. If seed == -1, do not make deterministic.
    Running your script in a deterministic way might slow it down.
    Note that for some packages (eg: sklearn's PCA) this function is not enough.
    """
    seed = int(seed)
    if seed == -1:
        return
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
def setup_logging(output_folder: str, exist_ok: bool = False, console: str = "debug",
                  info_filename: str = "info.log", debug_filename: str = "debug.log"):
    """Set up logging files and console output.
    Creates one file for INFO logs and one for DEBUG logs.
    Args:
        output_folder (str): creates the folder where to save the files.
        exist_ok (boolean): if False throw a FileExistsError if output_folder already exists
        console (str):
            if == "debug" prints on console debug messages and higher
            if == "info" prints on console info messages and higher
            if == None does not use console (useful when a logger has already been set)
        info_filename (str): the name of the info file. if None, don't create info file
        debug_filename (str): the name of the debug file. if None, don't create debug file
    """
    if not exist_ok and os.path.exists(output_folder):
        raise FileExistsError(f"{output_folder} already exists!")
    os.makedirs(output_folder, exist_ok=True)
    base_formatter = logging.Formatter('%(asctime)s %(message)s', "%Y-%m-%d %H:%M:%S")
    logger = logging.getLogger('')
    logger.setLevel(logging.DEBUG)

    if info_filename is not None:
        info_file_handler = logging.FileHandler(f'{output_folder}/{info_filename}')
        info_file_handler.setLevel(logging.INFO)
        info_file_handler.setFormatter(base_formatter)
        logger.addHandler(info_file_handler)

    if debug_filename is not None:
        debug_file_handler = logging.FileHandler(f'{output_folder}/{debug_filename}')
        debug_file_handler.setLevel(logging.DEBUG)
        debug_file_handler.setFormatter(base_formatter)
        logger.addHandler(debug_file_handler)

    if console is not None:
        console_handler = logging.StreamHandler()
        if console == "debug":
            console_handler.setLevel(logging.DEBUG)
        if console == "info":
            console_handler.setLevel(logging.INFO)
        console_handler.setFormatter(base_formatter)
        logger.addHandler(console_handler)

    def my_handler(type_, value, tb):
        # Log uncaught exceptions with their full traceback before exiting
        logger.info("\n" + "".join(traceback.format_exception(type_, value, tb)))
        logging.info("Experiment finished (with some errors)")

    sys.excepthook = my_handler
================================================
FILE: cosface_loss.py
================================================
# Based on https://github.com/MuggleWang/CosFace_pytorch/blob/master/layer.py
import torch
import torch.nn as nn
from torch.nn import Parameter
def cosine_sim(x1: torch.Tensor, x2: torch.Tensor, dim: int = 1, eps: float = 1e-8) -> torch.Tensor:
    # Pairwise cosine similarity between the rows of x1 and the rows of x2
    ip = torch.mm(x1, x2.t())
    w1 = torch.norm(x1, 2, dim)
    w2 = torch.norm(x2, 2, dim)
    return ip / torch.ger(w1, w2).clamp(min=eps)


class MarginCosineProduct(nn.Module):
    """Implementation of the large margin cosine distance.
    Args:
        in_features: size of each input sample
        out_features: size of each output sample
        s: norm of input feature
        m: margin
    """
    def __init__(self, in_features: int, out_features: int, s: float = 30.0, m: float = 0.40):
        super().__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.s = s
        self.m = m
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, inputs: torch.Tensor, label: torch.Tensor) -> torch.Tensor:
        cosine = cosine_sim(inputs, self.weight)
        one_hot = torch.zeros_like(cosine)
        one_hot.scatter_(1, label.view(-1, 1), 1.0)
        # Subtract the margin only from the cosine of the ground-truth class
        output = self.s * (cosine - one_hot * self.m)
        return output

    def __repr__(self):
        return self.__class__.__name__ + '(' \
               + 'in_features=' + str(self.in_features) \
               + ', out_features=' + str(self.out_features) \
               + ', s=' + str(self.s) \
               + ', m=' + str(self.m) + ')'
================================================
FILE: cosplace_model/__init__.py
================================================
================================================
FILE: cosplace_model/cosplace_network.py
================================================
import torch
import logging
import torchvision
from torch import nn
from typing import Tuple
from cosplace_model.layers import Flatten, L2Norm, GeM
# The number of channels in the last convolutional layer, the one before average pooling
CHANNELS_NUM_IN_LAST_CONV = {
    "ResNet18": 512,
    "ResNet50": 2048,
    "ResNet101": 2048,
    "ResNet152": 2048,
    "VGG16": 512,
    "EfficientNet_B0": 1280,
    "EfficientNet_B1": 1280,
    "EfficientNet_B2": 1408,
    "EfficientNet_B3": 1536,
    "EfficientNet_B4": 1792,
    "EfficientNet_B5": 2048,
    "EfficientNet_B6": 2304,
    "EfficientNet_B7": 2560,
}
class GeoLocalizationNet(nn.Module):
    def __init__(self, backbone : str, fc_output_dim : int, train_all_layers : bool = False):
        """Return a model for GeoLocalization.
        Args:
            backbone (str): which torchvision backbone to use. Must be one of the keys of
                CHANNELS_NUM_IN_LAST_CONV (VGG16, a ResNet or an EfficientNet).
            fc_output_dim (int): the output dimension of the last fc layer, equivalent to the descriptors dimension.
            train_all_layers (bool): if True train every layer of the backbone, otherwise freeze its first layers.
        """
        super().__init__()
        assert backbone in CHANNELS_NUM_IN_LAST_CONV, f"backbone must be one of {list(CHANNELS_NUM_IN_LAST_CONV.keys())}"
        self.backbone, features_dim = get_backbone(backbone, train_all_layers)
        self.aggregation = nn.Sequential(
            L2Norm(),
            GeM(),
            Flatten(),
            nn.Linear(features_dim, fc_output_dim),
            L2Norm()
        )

    def forward(self, x):
        x = self.backbone(x)
        x = self.aggregation(x)
        return x
def get_pretrained_torchvision_model(backbone_name : str) -> torch.nn.Module:
    """This function takes the name of a backbone and returns the corresponding pretrained
    model from torchvision. Examples of backbone_name are 'VGG16' or 'ResNet18'
    """
    try:  # Newer versions of pytorch require to pass weights=weights_module.DEFAULT
        weights_module = getattr(__import__('torchvision.models', fromlist=[f"{backbone_name}_Weights"]), f"{backbone_name}_Weights")
        model = getattr(torchvision.models, backbone_name.lower())(weights=weights_module.DEFAULT)
    except (ImportError, AttributeError):  # Older versions of pytorch require to pass pretrained=True
        model = getattr(torchvision.models, backbone_name.lower())(pretrained=True)
    return model
def get_backbone(backbone_name : str, train_all_layers : bool) -> Tuple[torch.nn.Module, int]:
    backbone = get_pretrained_torchvision_model(backbone_name)
    if backbone_name.startswith("ResNet"):
        if train_all_layers:
            logging.debug(f"Train all layers of the {backbone_name}")
        else:
            for name, child in backbone.named_children():
                if name == "layer3":  # Freeze layers before layer3
                    break
                for params in child.parameters():
                    params.requires_grad = False
            logging.debug(f"Train only layer3 and layer4 of the {backbone_name}, freeze the previous ones")
        layers = list(backbone.children())[:-2]  # Remove avg pooling and FC layer

    elif backbone_name == "VGG16":
        layers = list(backbone.features.children())[:-2]  # Remove the last two layers of the feature extractor
        if train_all_layers:
            logging.debug("Train all layers of the VGG-16")
        else:
            for layer in layers[:-5]:
                for p in layer.parameters():
                    p.requires_grad = False
            logging.debug("Train last layers of the VGG-16, freeze the previous ones")

    elif backbone_name.startswith("EfficientNet"):
        if train_all_layers:
            logging.debug(f"Train all layers of the {backbone_name}")
        else:
            for name, child in backbone.features.named_children():
                if name == "5":  # Freeze layers before block 5
                    break
                for params in child.parameters():
                    params.requires_grad = False
            logging.debug(f"Train only the last three blocks of the {backbone_name}, freeze the previous ones")
        layers = list(backbone.children())[:-2]  # Remove avg pooling and FC layer

    backbone = torch.nn.Sequential(*layers)
    features_dim = CHANNELS_NUM_IN_LAST_CONV[backbone_name]
    return backbone, features_dim
================================================
FILE: cosplace_model/layers.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parameter import Parameter
def gem(x, p=torch.ones(1)*3, eps: float = 1e-6):
    return F.avg_pool2d(x.clamp(min=eps).pow(p), (x.size(-2), x.size(-1))).pow(1./p)


class GeM(nn.Module):
    def __init__(self, p=3, eps=1e-6):
        super().__init__()
        self.p = Parameter(torch.ones(1)*p)
        self.eps = eps

    def forward(self, x):
        return gem(x, p=self.p, eps=self.eps)

    def __repr__(self):
        return f"{self.__class__.__name__}(p={self.p.data.tolist()[0]:.4f}, eps={self.eps})"


class Flatten(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        assert x.shape[2] == x.shape[3] == 1, f"{x.shape[2]} != {x.shape[3]} != 1"
        return x[:, :, 0, 0]


class L2Norm(nn.Module):
    def __init__(self, dim=1):
        super().__init__()
        self.dim = dim

    def forward(self, x):
        return F.normalize(x, p=2.0, dim=self.dim)
================================================
FILE: datasets/__init__.py
================================================
================================================
FILE: datasets/dataset_utils.py
================================================
import os
import logging
from glob import glob
from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True
def read_images_paths(dataset_folder, get_abs_path=False):
    """Find images within 'dataset_folder' and return their relative paths as a list.
    If there is a file 'dataset_folder'_images_paths.txt, read paths from such file.
    Otherwise, use glob(). Keeping the paths in the file speeds up computation,
    because using glob over large folders can be slow.

    Parameters
    ----------
    dataset_folder : str, folder containing JPEG images
    get_abs_path : bool, if True return absolute paths, otherwise remove
        dataset_folder from each path

    Returns
    -------
    images_paths : list[str], paths of JPEG images within dataset_folder
    """
    if not os.path.exists(dataset_folder):
        raise FileNotFoundError(f"Folder {dataset_folder} does not exist")

    file_with_paths = dataset_folder + "_images_paths.txt"
    if os.path.exists(file_with_paths):
        logging.debug(f"Reading paths of images within {dataset_folder} from {file_with_paths}")
        with open(file_with_paths, "r") as file:
            images_paths = file.read().splitlines()
        images_paths = [os.path.join(dataset_folder, path) for path in images_paths]
        # Sanity check that paths within the file exist
        if not os.path.exists(images_paths[0]):
            raise FileNotFoundError(f"Image with path {images_paths[0]} "
                                    f"does not exist within {dataset_folder}. It is likely "
                                    f"that the content of {file_with_paths} is wrong.")
    else:
        logging.debug(f"Searching images in {dataset_folder} with glob()")
        images_paths = sorted(glob(f"{dataset_folder}/**/*.jpg", recursive=True))
        if len(images_paths) == 0:
            raise FileNotFoundError(f"Directory {dataset_folder} does not contain any JPEG images")

    if not get_abs_path:  # Remove dataset_folder from the path
        images_paths = [p[len(dataset_folder) + 1:] for p in images_paths]

    return images_paths
================================================
FILE: datasets/test_dataset.py
================================================
import os
import numpy as np
from PIL import Image
import torch.utils.data as data
import torchvision.transforms as transforms
from sklearn.neighbors import NearestNeighbors
import datasets.dataset_utils as dataset_utils
class TestDataset(data.Dataset):
    def __init__(self, dataset_folder, database_folder="database",
                 queries_folder="queries", positive_dist_threshold=25,
                 image_size=512, resize_test_imgs=False):
        self.database_folder = dataset_folder + "/" + database_folder
        self.queries_folder = dataset_folder + "/" + queries_folder
        self.database_paths = dataset_utils.read_images_paths(self.database_folder, get_abs_path=True)
        self.queries_paths = dataset_utils.read_images_paths(self.queries_folder, get_abs_path=True)

        self.dataset_name = os.path.basename(dataset_folder)

        #### Read paths and UTM coordinates for all images.
        # The format must be path/to/file/@utm_easting@utm_northing@...@.jpg
        self.database_utms = np.array([(path.split("@")[1], path.split("@")[2]) for path in self.database_paths]).astype(float)
        self.queries_utms = np.array([(path.split("@")[1], path.split("@")[2]) for path in self.queries_paths]).astype(float)

        # Find positives_per_query, which are within positive_dist_threshold (default 25 meters)
        knn = NearestNeighbors(n_jobs=-1)
        knn.fit(self.database_utms)
        self.positives_per_query = knn.radius_neighbors(
            self.queries_utms, radius=positive_dist_threshold, return_distance=False
        )

        self.images_paths = self.database_paths + self.queries_paths

        self.database_num = len(self.database_paths)
        self.queries_num = len(self.queries_paths)

        transforms_list = []
        if resize_test_imgs:
            # Resize to image_size along the shorter side while maintaining aspect ratio
            transforms_list += [transforms.Resize(image_size, antialias=True)]
        transforms_list += [
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
        ]
        self.base_transform = transforms.Compose(transforms_list)

    @staticmethod
    def open_image(path):
        return Image.open(path).convert("RGB")

    def __getitem__(self, index):
        image_path = self.images_paths[index]
        pil_img = TestDataset.open_image(image_path)
        normalized_img = self.base_transform(pil_img)
        return normalized_img, index

    def __len__(self):
        return len(self.images_paths)

    def __repr__(self):
        return f"< {self.dataset_name} - #q: {self.queries_num}; #db: {self.database_num} >"

    def get_positives(self):
        return self.positives_per_query
================================================
FILE: datasets/train_dataset.py
================================================
import os
import torch
import random
import logging
import numpy as np
from PIL import Image
from PIL import ImageFile
import torchvision.transforms as T
from collections import defaultdict
import datasets.dataset_utils as dataset_utils
ImageFile.LOAD_TRUNCATED_IMAGES = True
class TrainDataset(torch.utils.data.Dataset):
    def __init__(self, args, dataset_folder, M=10, alpha=30, N=5, L=2,
                 current_group=0, min_images_per_class=10):
        """
        Parameters (please check our paper for a clearer explanation of the parameters).
        ----------
        args : args for data augmentation
        dataset_folder : str, the path of the folder with the train images.
        M : int, the length of the side of each cell in meters.
        alpha : int, size of each class in degrees.
        N : int, distance (M-wise) between two classes of the same group.
        L : int, distance (alpha-wise) between two classes of the same group.
        current_group : int, which one of the groups to consider.
        min_images_per_class : int, minimum number of image in a class.
        """
        super().__init__()
        self.M = M
        self.alpha = alpha
        self.N = N
        self.L = L
        self.current_group = current_group
        self.dataset_folder = dataset_folder
        self.augmentation_device = args.augmentation_device

        # dataset_name should be either "processed", "small" or "raw", if you're using SF-XL
        dataset_name = os.path.basename(dataset_folder)
        filename = f"cache/{dataset_name}_M{M}_N{N}_alpha{alpha}_L{L}_mipc{min_images_per_class}.torch"
        if not os.path.exists(filename):
            os.makedirs("cache", exist_ok=True)
            logging.info(f"Cached dataset {filename} does not exist, I'll create it now.")
            self.initialize(dataset_folder, M, N, alpha, L, min_images_per_class, filename)
        elif current_group == 0:
            logging.info(f"Using cached dataset {filename}")

        classes_per_group, self.images_per_class = torch.load(filename)
        if current_group >= len(classes_per_group):
            raise ValueError(f"With this configuration there are only {len(classes_per_group)} " +
                             f"groups, therefore I can't create the {current_group}th group. " +
                             "You should reduce the number of groups by setting for example " +
                             f"'--groups_num {current_group}'")
        self.classes_ids = classes_per_group[current_group]

        if self.augmentation_device == "cpu":
            self.transform = T.Compose([
                T.ColorJitter(brightness=args.brightness,
                              contrast=args.contrast,
                              saturation=args.saturation,
                              hue=args.hue),
                T.RandomResizedCrop([args.image_size, args.image_size], scale=[1-args.random_resized_crop, 1], antialias=True),
                T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
            ])

    @staticmethod
    def open_image(path):
        return Image.open(path).convert("RGB")

    def __getitem__(self, class_num):
        # This function takes as input the class_num instead of the index of
        # the image. This way each class is equally represented during training.
        class_id = self.classes_ids[class_num]
        # Pick a random image among those in this class.
        image_path = os.path.join(self.dataset_folder, random.choice(self.images_per_class[class_id]))

        try:
            pil_image = TrainDataset.open_image(image_path)
        except Exception as e:
            logging.info(f"ERROR image {image_path} couldn't be opened, it might be corrupted.")
            raise e

        tensor_image = T.functional.to_tensor(pil_image)
        assert tensor_image.shape == torch.Size([3, 512, 512]), \
            f"Image {image_path} should have shape [3, 512, 512] but has {tensor_image.shape}."

        if self.augmentation_device == "cpu":
            tensor_image = self.transform(tensor_image)

        return tensor_image, class_num, image_path

    def get_images_num(self):
        """Return the number of images within this group."""
        return sum([len(self.images_per_class[c]) for c in self.classes_ids])

    def __len__(self):
        """Return the number of classes within this group."""
        return len(self.classes_ids)
@staticmethod
def initialize(dataset_folder, M, N, alpha, L, min_images_per_class, filename):
logging.debug(f"Searching training images in {dataset_folder}")
images_paths = dataset_utils.read_images_paths(dataset_folder)
logging.debug(f"Found {len(images_paths)} images")
logging.debug("For each image, get its UTM east, UTM north and heading from its path")
images_metadatas = [p.split("@") for p in images_paths]
# field 1 is UTM east, field 2 is UTM north, field 9 is heading
utmeast_utmnorth_heading = [(m[1], m[2], m[9]) for m in images_metadatas]
utmeast_utmnorth_heading = np.array(utmeast_utmnorth_heading).astype(np.float64)
logging.debug("For each image, get class and group to which it belongs")
class_id__group_id = [TrainDataset.get__class_id__group_id(*m, M, alpha, N, L)
for m in utmeast_utmnorth_heading]
logging.debug("Group together images belonging to the same class")
images_per_class = defaultdict(list)
for image_path, (class_id, _) in zip(images_paths, class_id__group_id):
images_per_class[class_id].append(image_path)
# Images_per_class is a dict where the key is class_id, and the value
# is a list with the paths of images within that class.
images_per_class = {k: v for k, v in images_per_class.items() if len(v) >= min_images_per_class}
logging.debug("Group together classes belonging to the same group")
# Classes_per_group is a dict where the key is group_id, and the value
# is a list with the class_ids belonging to that group.
classes_per_group = defaultdict(set)
for class_id, group_id in class_id__group_id:
if class_id not in images_per_class:
continue # Skip classes with too few images
classes_per_group[group_id].add(class_id)
# Convert classes_per_group to a list of lists.
# Each sublist represents the classes within a group.
classes_per_group = [list(c) for c in classes_per_group.values()]
torch.save((classes_per_group, images_per_class), filename)
@staticmethod
def get__class_id__group_id(utm_east, utm_north, heading, M, alpha, N, L):
"""Return class_id and group_id for a given point.
The class_id is a triplet (tuple) of UTM_east, UTM_north and
heading (e.g. (396520, 4983800, 120)).
The group_id represents the group to which the class belongs
(e.g. (0, 1, 0)), and it is between (0, 0, 0) and (N-1, N-1, L-1).
"""
rounded_utm_east = int(utm_east // M * M) # Rounded to nearest lower multiple of M
rounded_utm_north = int(utm_north // M * M)
rounded_heading = int(heading // alpha * alpha)
class_id = (rounded_utm_east, rounded_utm_north, rounded_heading)
# group_id goes from (0, 0, 0) to (N-1, N-1, L-1)
group_id = (rounded_utm_east % (M * N) // M,
rounded_utm_north % (M * N) // M,
rounded_heading % (alpha * L) // alpha)
return class_id, group_id
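A worked example may make the class and group assignment above easier to follow. The block below is a minimal standalone sketch, not part of the repository: `class_and_group` is a hypothetical name that mirrors the arithmetic of `get__class_id__group_id`, and the coordinates are made-up values chosen only for illustration. It assumes the default `M=10`, `alpha=30`, `N=5`, `L=2`.

```python
# Standalone sketch mirroring get__class_id__group_id (hypothetical helper name).
def class_and_group(utm_east, utm_north, heading, M=10, alpha=30, N=5, L=2):
    # Snap each value down to the nearest lower multiple of its cell size.
    east = int(utm_east // M * M)
    north = int(utm_north // M * M)
    head = int(heading // alpha * alpha)
    class_id = (east, north, head)
    # Position of the cell inside an (N x N x L) tile of cells.
    group_id = (east % (M * N) // M,
                north % (M * N) // M,
                head % (alpha * L) // alpha)
    return class_id, group_id


# Made-up coordinates, for illustration only.
class_id, group_id = class_and_group(396525.3, 4983807.9, 125.0)
print(class_id)   # (396520, 4983800, 120)
print(group_id)   # (2, 0, 0)

# A point 10 m further east falls in the neighbouring cell, hence another group.
_, neighbour_group = class_and_group(396535.3, 4983807.9, 125.0)
print(neighbour_group)  # (3, 0, 0)

# Each group_id component takes N, N and L distinct values respectively.
max_groups = 5 * 5 * 2
print(max_groups)  # 50
```

Because adjacent cells always land in different groups, two classes inside the same group are never spatial neighbours, which appears to be the reason for the modulo construction.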
================================================
FILE: eval.py
================================================
import sys
import torch
import logging
import multiprocessing
from datetime import datetime
import test
import parser
import commons
from cosplace_model import cosplace_network
from datasets.test_dataset import TestDataset
torch.backends.cudnn.benchmark = True # Provides a speedup
args = parser.parse_arguments(is_training=False)
start_time = datetime.now()
args.output_folder = f"logs/{args.save_dir}/{start_time.strftime('%Y-%m-%d_%H-%M-%S')}"
commons.make_deterministic(args.seed)
commons.setup_logging(args.output_folder, console="info")
logging.info(" ".join(sys.argv))
logging.info(f"Arguments: {args}")
logging.info(f"The outputs are being saved in {args.output_folder}")
#### Model
model = cosplace_network.GeoLocalizationNet(args.backbone, args.fc_output_dim)
logging.info(f"There are {torch.cuda.device_count()} GPUs and {multiprocessing.cpu_count()} CPUs.")
if args.resume_model is not None:
logging.info(f"Loading model from {args.resume_model}")
model_state_dict = torch.load(args.resume_model)
model.load_state_dict(model_state_dict)
else:
logging.info("WARNING: You didn't provide a path to resume the model (--resume_model parameter). " +
"Evaluation will be computed using randomly initialized weights.")
model = model.to(args.device)
test_ds = TestDataset(args.test_set_folder, queries_folder="queries_v1",
positive_dist_threshold=args.positive_dist_threshold)
recalls, recalls_str = test.test(args, test_ds, model, args.num_preds_to_save)
logging.info(f"{test_ds}: {recalls_str}")
================================================
FILE: hubconf.py
================================================
dependencies = ['torch', 'torchvision']
import torch
from cosplace_model import cosplace_network
AVAILABLE_TRAINED_MODELS = {
# backbone : list of available fc_output_dim, which is equivalent to descriptors dimensionality
"VGG16": [ 64, 128, 256, 512],
"ResNet18": [32, 64, 128, 256, 512],
"ResNet50": [32, 64, 128, 256, 512, 1024, 2048],
"ResNet101": [32, 64, 128, 256, 512, 1024, 2048],
"ResNet152": [32, 64, 128, 256, 512, 1024, 2048],
}
def get_trained_model(backbone : str = "ResNet50", fc_output_dim : int = 2048) -> torch.nn.Module:
"""Return a model trained with CosPlace on San Francisco eXtra Large.
Args:
backbone (str): which torchvision backbone to use. Must be VGG16 or a ResNet.
fc_output_dim (int): the output dimension of the last fc layer, equivalent to
the descriptors dimension. Must be between 32 and 2048, depending on model's availability.
Return:
model (torch.nn.Module): a trained model.
"""
print(f"Returning CosPlace model with backbone: {backbone} with features dimension {fc_output_dim}")
if backbone not in AVAILABLE_TRAINED_MODELS:
raise ValueError(f"Parameter `backbone` is set to {backbone} but it must be one of {list(AVAILABLE_TRAINED_MODELS.keys())}")
try:
fc_output_dim = int(fc_output_dim)
except (TypeError, ValueError):
raise ValueError(f"Parameter `fc_output_dim` must be an integer, but it is set to {fc_output_dim}")
if fc_output_dim not in AVAILABLE_TRAINED_MODELS[backbone]:
raise ValueError(f"Parameter `fc_output_dim` is set to {fc_output_dim}, but for backbone {backbone} "
f"it must be one of {list(AVAILABLE_TRAINED_MODELS[backbone])}")
model = cosplace_network.GeoLocalizationNet(backbone, fc_output_dim)
model.load_state_dict(
torch.hub.load_state_dict_from_url(
f'https://github.com/gmberton/CosPlace/releases/download/v1.0/{backbone}_{fc_output_dim}_cosplace.pth',
map_location=torch.device('cpu'))
)
return model
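The checks performed by `get_trained_model` can be tried without downloading any weights. The block below is a sketch under that assumption: `weights_url` is a hypothetical helper (it does not exist in the repository) that repeats the availability checks and builds the same release URL, using a copy of the `AVAILABLE_TRAINED_MODELS` table.

```python
# Copy of the availability table from hubconf.py, kept local so the sketch is self-contained.
AVAILABLE_TRAINED_MODELS = {
    "VGG16": [64, 128, 256, 512],
    "ResNet18": [32, 64, 128, 256, 512],
    "ResNet50": [32, 64, 128, 256, 512, 1024, 2048],
    "ResNet101": [32, 64, 128, 256, 512, 1024, 2048],
    "ResNet152": [32, 64, 128, 256, 512, 1024, 2048],
}
RELEASE_URL = "https://github.com/gmberton/CosPlace/releases/download/v1.0"


def weights_url(backbone="ResNet50", fc_output_dim=2048):
    """Hypothetical helper: validate the pair and return the checkpoint URL."""
    if backbone not in AVAILABLE_TRAINED_MODELS:
        raise ValueError(f"Unknown backbone {backbone}")
    if fc_output_dim not in AVAILABLE_TRAINED_MODELS[backbone]:
        raise ValueError(f"No {fc_output_dim}-D model for backbone {backbone}")
    return f"{RELEASE_URL}/{backbone}_{fc_output_dim}_cosplace.pth"


print(weights_url("ResNet18", 512))
try:
    weights_url("VGG16", 32)  # VGG16 has no 32-D model in the table
except ValueError as error:
    print(error)
```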
================================================
FILE: parser.py
================================================
import argparse
def parse_arguments(is_training: bool = True):
parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
# CosPlace Groups parameters
parser.add_argument("--M", type=int, default=10, help="_")
parser.add_argument("--alpha", type=int, default=30, help="_")
parser.add_argument("--N", type=int, default=5, help="_")
parser.add_argument("--L", type=int, default=2, help="_")
parser.add_argument("--groups_num", type=int, default=8, help="_")
parser.add_argument("--min_images_per_class", type=int, default=10, help="_")
# Model parameters
parser.add_argument("--backbone", type=str, default="ResNet18",
choices=["VGG16",
"ResNet18", "ResNet50", "ResNet101", "ResNet152",
"EfficientNet_B0", "EfficientNet_B1", "EfficientNet_B2",
"EfficientNet_B3", "EfficientNet_B4", "EfficientNet_B5",
"EfficientNet_B6", "EfficientNet_B7"], help="_")
parser.add_argument("--fc_output_dim", type=int, default=512,
help="Output dimension of final fully connected layer")
parser.add_argument("--train_all_layers", default=False, action="store_true",
help="If true, train all layers of the backbone")
# Training parameters
parser.add_argument("--use_amp16", action="store_true",
help="use Automatic Mixed Precision")
parser.add_argument("--augmentation_device", type=str, default="cuda",
choices=["cuda", "cpu"],
help="on which device to run data augmentation")
parser.add_argument("--batch_size", type=int, default=32, help="_")
parser.add_argument("--epochs_num", type=int, default=50, help="_")
parser.add_argument("--iterations_per_epoch", type=int, default=10000, help="_")
parser.add_argument("--lr", type=float, default=0.00001, help="_")
parser.add_argument("--classifiers_lr", type=float, default=0.01, help="_")
parser.add_argument("--image_size", type=int, default=512,
help="Width and height of training images (1:1 aspect ratio)")
parser.add_argument("--resize_test_imgs", default=False, action="store_true",
help="If the test images should be resized to image_size along "
"the shorter side while maintaining aspect ratio")
# Data augmentation
parser.add_argument("--brightness", type=float, default=0.7, help="_")
parser.add_argument("--contrast", type=float, default=0.7, help="_")
parser.add_argument("--hue", type=float, default=0.5, help="_")
parser.add_argument("--saturation", type=float, default=0.7, help="_")
parser.add_argument("--random_resized_crop", type=float, default=0.5, help="_")
# Validation / test parameters
parser.add_argument("--infer_batch_size", type=int, default=16,
help="Batch size for inference (validating and testing)")
parser.add_argument("--positive_dist_threshold", type=int, default=25,
help="distance in meters for a prediction to be considered a positive")
# Resume parameters
parser.add_argument("--resume_train", type=str, default=None,
help="path to checkpoint to resume, e.g. logs/.../last_checkpoint.pth")
parser.add_argument("--resume_model", type=str, default=None,
help="path to model to resume, e.g. logs/.../best_model.pth")
# Other parameters
parser.add_argument("--device", type=str, default="cuda",
choices=["cuda", "cpu"], help="_")
parser.add_argument("--seed", type=int, default=0, help="_")
parser.add_argument("--num_workers", type=int, default=8, help="_")
parser.add_argument("--num_preds_to_save", type=int, default=0,
help="At the end of training, save N preds for each query. "
"Try with a small number like 3")
parser.add_argument("--save_only_wrong_preds", action="store_true",
help="When saving preds (if num_preds_to_save != 0) save only "
"preds for difficult queries, i.e. with incorrect first prediction")
# Paths parameters
if is_training: # train and val sets are needed only for training
parser.add_argument("--train_set_folder", type=str, required=True,
help="path of the folder with training images")
parser.add_argument("--val_set_folder", type=str, required=True,
help="path of the folder with val images (split in database/queries)")
parser.add_argument("--test_set_folder", type=str, required=True,
help="path of the folder with test images (split in database/queries)")
parser.add_argument("--save_dir", type=str, default="default",
help="name of directory on which to save the logs, under logs/save_dir")
args = parser.parse_args()
return args
================================================
FILE: requirements.txt
================================================
faiss_cpu>=1.7.1
numpy>=1.21.2
Pillow>=9.0.1
scikit_learn>=1.0.2
torch>=1.8.2
torchvision>=0.9.2
tqdm>=4.62.3
utm>=0.7.0
================================================
FILE: test.py
================================================
import faiss
import torch
import logging
import numpy as np
from tqdm import tqdm
from typing import Tuple
from argparse import Namespace
from torch.utils.data.dataset import Subset
from torch.utils.data import DataLoader, Dataset
import visualizations
# Compute R@1, R@5, R@10, R@20
RECALL_VALUES = [1, 5, 10, 20]
def test(args: Namespace, eval_ds: Dataset, model: torch.nn.Module,
num_preds_to_save: int = 0) -> Tuple[np.ndarray, str]:
"""Compute descriptors of the given dataset and compute the recalls."""
model = model.eval()
with torch.no_grad():
logging.debug("Extracting database descriptors for evaluation/testing")
database_subset_ds = Subset(eval_ds, list(range(eval_ds.database_num)))
database_dataloader = DataLoader(dataset=database_subset_ds, num_workers=args.num_workers,
batch_size=args.infer_batch_size, pin_memory=(args.device == "cuda"))
all_descriptors = np.empty((len(eval_ds), args.fc_output_dim), dtype="float32")
for images, indices in tqdm(database_dataloader, ncols=100):
descriptors = model(images.to(args.device))
descriptors = descriptors.cpu().numpy()
all_descriptors[indices.numpy(), :] = descriptors
logging.debug("Extracting queries descriptors for evaluation/testing using batch size 1")
queries_infer_batch_size = 1
queries_subset_ds = Subset(eval_ds, list(range(eval_ds.database_num, eval_ds.database_num+eval_ds.queries_num)))
queries_dataloader = DataLoader(dataset=queries_subset_ds, num_workers=args.num_workers,
batch_size=queries_infer_batch_size, pin_memory=(args.device == "cuda"))
for images, indices in tqdm(queries_dataloader, ncols=100):
descriptors = model(images.to(args.device))
descriptors = descriptors.cpu().numpy()
all_descriptors[indices.numpy(), :] = descriptors
queries_descriptors = all_descriptors[eval_ds.database_num:]
database_descriptors = all_descriptors[:eval_ds.database_num]
# Use a kNN to find predictions
faiss_index = faiss.IndexFlatL2(args.fc_output_dim)
faiss_index.add(database_descriptors)
del database_descriptors, all_descriptors
logging.debug("Calculating recalls")
_, predictions = faiss_index.search(queries_descriptors, max(RECALL_VALUES))
#### For each query, check if the predictions are correct
positives_per_query = eval_ds.get_positives()
recalls = np.zeros(len(RECALL_VALUES))
for query_index, preds in enumerate(predictions):
for i, n in enumerate(RECALL_VALUES):
if np.any(np.isin(preds[:n], positives_per_query[query_index])):
recalls[i:] += 1
break
# Divide by queries_num and multiply by 100, so the recalls are in percentages
recalls = recalls / eval_ds.queries_num * 100
recalls_str = ", ".join([f"R@{val}: {rec:.1f}" for val, rec in zip(RECALL_VALUES, recalls)])
# Save visualizations of predictions
if num_preds_to_save != 0:
# For each query save num_preds_to_save predictions
visualizations.save_preds(predictions[:, :num_preds_to_save], eval_ds, args.output_folder, args.save_only_wrong_preds)
return recalls, recalls_str
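The recall bookkeeping in `test` can be checked by hand on a tiny example. The block below is a sketch that isolates only that loop: `compute_recalls` is a hypothetical name, the predictions and positives are made-up toy data, and `np.isin` is used for the membership test. No model or faiss index is involved.

```python
import numpy as np


def compute_recalls(predictions, positives_per_query, recall_values=(1, 5, 10, 20)):
    """Hypothetical helper: percentage of queries with a positive in the top-n."""
    recalls = np.zeros(len(recall_values))
    for query_index, preds in enumerate(predictions):
        for i, n in enumerate(recall_values):
            if np.any(np.isin(preds[:n], positives_per_query[query_index])):
                # A hit within the top-n is also a hit for every larger n.
                recalls[i:] += 1
                break
    return recalls / len(predictions) * 100


# Toy data: 3 queries, 3 ranked database indices each.
predictions = np.array([[7, 2, 9],   # positive found at rank 1
                        [4, 5, 1],   # positive found at rank 3
                        [0, 3, 8]])  # no positive retrieved
positives = [[7], [1, 6], [11]]

recalls = compute_recalls(predictions, positives, recall_values=(1, 3))
print(", ".join(f"R@{n}: {r:.1f}" for n, r in zip((1, 3), recalls)))
```

One query out of three succeeds at rank 1 and two out of three within the top 3, so the sketch reports roughly 33.3 and 66.7.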
================================================
FILE: train.py
================================================
import sys
import torch
import logging
import numpy as np
from tqdm import tqdm
import multiprocessing
from datetime import datetime
import torchvision.transforms as T
import test
import util
import parser
import commons
import cosface_loss
import augmentations
from cosplace_model import cosplace_network
from datasets.test_dataset import TestDataset
from datasets.train_dataset import TrainDataset
torch.backends.cudnn.benchmark = True # Provides a speedup
args = parser.parse_arguments()
start_time = datetime.now()
args.output_folder = f"logs/{args.save_dir}/{start_time.strftime('%Y-%m-%d_%H-%M-%S')}"
commons.make_deterministic(args.seed)
commons.setup_logging(args.output_folder, console="debug")
logging.info(" ".join(sys.argv))
logging.info(f"Arguments: {args}")
logging.info(f"The outputs are being saved in {args.output_folder}")
#### Model
model = cosplace_network.GeoLocalizationNet(args.backbone, args.fc_output_dim, args.train_all_layers)
logging.info(f"There are {torch.cuda.device_count()} GPUs and {multiprocessing.cpu_count()} CPUs.")
if args.resume_model is not None:
logging.debug(f"Loading model from {args.resume_model}")
model_state_dict = torch.load(args.resume_model)
model.load_state_dict(model_state_dict)
model = model.to(args.device).train()
#### Optimizer
criterion = torch.nn.CrossEntropyLoss()
model_optimizer = torch.optim.Adam(model.parameters(), lr=args.lr)
#### Datasets
groups = [TrainDataset(args, args.train_set_folder, M=args.M, alpha=args.alpha, N=args.N, L=args.L,
current_group=n, min_images_per_class=args.min_images_per_class) for n in range(args.groups_num)]
# Each group has its own classifier, which depends on the number of classes in the group
classifiers = [cosface_loss.MarginCosineProduct(args.fc_output_dim, len(group)) for group in groups]
classifiers_optimizers = [torch.optim.Adam(classifier.parameters(), lr=args.classifiers_lr) for classifier in classifiers]
logging.info(f"Using {len(groups)} groups")
logging.info(f"The {len(groups)} groups have respectively the following number of classes {[len(g) for g in groups]}")
logging.info(f"The {len(groups)} groups have respectively the following number of images {[g.get_images_num() for g in groups]}")
val_ds = TestDataset(args.val_set_folder, positive_dist_threshold=args.positive_dist_threshold,
image_size=args.image_size, resize_test_imgs=args.resize_test_imgs)
test_ds = TestDataset(args.test_set_folder, queries_folder="queries_v1",
positive_dist_threshold=args.positive_dist_threshold,
image_size=args.image_size, resize_test_imgs=args.resize_test_imgs)
logging.info(f"Validation set: {val_ds}")
logging.info(f"Test set: {test_ds}")
#### Resume
if args.resume_train:
model, model_optimizer, classifiers, classifiers_optimizers, best_val_recall1, start_epoch_num = \
util.resume_train(args, args.output_folder, model, model_optimizer, classifiers, classifiers_optimizers)
model = model.to(args.device)
epoch_num = start_epoch_num - 1
logging.info(f"Resuming from epoch {start_epoch_num} with best R@1 {best_val_recall1:.1f} from checkpoint {args.resume_train}")
else:
best_val_recall1 = start_epoch_num = 0
#### Train / evaluation loop
logging.info("Start training ...")
logging.info(f"There are {len(groups[0])} classes for the first group, " +
f"each epoch has {args.iterations_per_epoch} iterations " +
f"with batch_size {args.batch_size}, therefore the model sees each class (on average) " +
f"{args.iterations_per_epoch * args.batch_size / len(groups[0]):.1f} times per epoch")
if args.augmentation_device == "cuda":
gpu_augmentation = T.Compose([
augmentations.DeviceAgnosticColorJitter(brightness=args.brightness,
contrast=args.contrast,
saturation=args.saturation,
hue=args.hue),
augmentations.DeviceAgnosticRandomResizedCrop([args.image_size, args.image_size],
scale=[1-args.random_resized_crop, 1]),
T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
if args.use_amp16:
scaler = torch.cuda.amp.GradScaler()
for epoch_num in range(start_epoch_num, args.epochs_num):
#### Train
epoch_start_time = datetime.now()
# Select classifier and dataloader according to epoch
current_group_num = epoch_num % args.groups_num
classifiers[current_group_num] = classifiers[current_group_num].to(args.device)
util.move_to_device(classifiers_optimizers[current_group_num], args.device)
dataloader = commons.InfiniteDataLoader(groups[current_group_num], num_workers=args.num_workers,
batch_size=args.batch_size, shuffle=True,
pin_memory=(args.device == "cuda"), drop_last=True)
dataloader_iterator = iter(dataloader)
model = model.train()
epoch_losses = np.zeros((0, 1), dtype=np.float32)
for iteration in tqdm(range(args.iterations_per_epoch), ncols=100):
images, targets, _ = next(dataloader_iterator)
images, targets = images.to(args.device), targets.to(args.device)
if args.augmentation_device == "cuda":
images = gpu_augmentation(images)
model_optimizer.zero_grad()
classifiers_optimizers[current_group_num].zero_grad()
if not args.use_amp16:
descriptors = model(images)
output = classifiers[current_group_num](descriptors, targets)
loss = criterion(output, targets)
loss.backward()
epoch_losses = np.append(epoch_losses, loss.item())
del loss, output, images
model_optimizer.step()
classifiers_optimizers[current_group_num].step()
else: # Use AMP 16
with torch.cuda.amp.autocast():
descriptors = model(images)
output = classifiers[current_group_num](descriptors, targets)
loss = criterion(output, targets)
scaler.scale(loss).backward()
epoch_losses = np.append(epoch_losses, loss.item())
del loss, output, images
scaler.step(model_optimizer)
scaler.step(classifiers_optimizers[current_group_num])
scaler.update()
classifiers[current_group_num] = classifiers[current_group_num].cpu()
util.move_to_device(classifiers_optimizers[current_group_num], "cpu")
logging.debug(f"Epoch {epoch_num:02d} in {str(datetime.now() - epoch_start_time)[:-7]}, "
f"loss = {epoch_losses.mean():.4f}")
#### Evaluation
recalls, recalls_str = test.test(args, val_ds, model)
logging.info(f"Epoch {epoch_num:02d} in {str(datetime.now() - epoch_start_time)[:-7]}, {val_ds}: {recalls_str[:20]}")
is_best = recalls[0] > best_val_recall1
best_val_recall1 = max(recalls[0], best_val_recall1)
# Save checkpoint, which contains all training parameters
util.save_checkpoint({
"epoch_num": epoch_num + 1,
"model_state_dict": model.state_dict(),
"optimizer_state_dict": model_optimizer.state_dict(),
"classifiers_state_dict": [c.state_dict() for c in classifiers],
"optimizers_state_dict": [c.state_dict() for c in classifiers_optimizers],
"best_val_recall1": best_val_recall1
}, is_best, args.output_folder)
logging.info(f"Trained for {epoch_num+1:02d} epochs, in total in {str(datetime.now() - start_time)[:-7]}")
#### Test best model on test set v1
best_model_state_dict = torch.load(f"{args.output_folder}/best_model.pth")
model.load_state_dict(best_model_state_dict)
logging.info(f"Now testing on the test set: {test_ds}")
recalls, recalls_str = test.test(args, test_ds, model, args.num_preds_to_save)
logging.info(f"{test_ds}: {recalls_str}")
logging.info("Experiment finished (without any errors)")
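The training loop selects one group per epoch with `epoch_num % args.groups_num`, and logs how often each class is seen per epoch. The block below is a small sketch of both calculations under the parser defaults (`epochs_num=50`, `groups_num=8`, `iterations_per_epoch=10000`, `batch_size=32`). The helper names are hypothetical, and the class count of 40000 is an invented figure used only to make the arithmetic concrete.

```python
from collections import Counter


def group_schedule(epochs_num=50, groups_num=8, start_epoch_num=0):
    """Hypothetical helper: group index trained at each epoch."""
    return [epoch_num % groups_num for epoch_num in range(start_epoch_num, epochs_num)]


def views_per_class(iterations_per_epoch, batch_size, classes_in_group):
    """Average number of times each class is sampled in one epoch."""
    return iterations_per_epoch * batch_size / classes_in_group


schedule = group_schedule()
epochs_per_group = Counter(schedule)
print(schedule[:10])           # groups cycle in order, then wrap around
print(dict(epochs_per_group))  # 50 epochs over 8 groups is not an even split

# 40000 classes is an invented number, not a statistic of any real dataset.
print(views_per_class(10000, 32, 40000))
```

With these defaults the first two groups receive one more epoch than the others, since 50 is not a multiple of 8.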
================================================
FILE: util.py
================================================
import torch
import shutil
import logging
from typing import Type, List
from argparse import Namespace
from cosface_loss import MarginCosineProduct
def move_to_device(optimizer: Type[torch.optim.Optimizer], device: str):
for state in optimizer.state.values():
for k, v in state.items():
if torch.is_tensor(v):
state[k] = v.to(device)
def save_checkpoint(state: dict, is_best: bool, output_folder: str,
ckpt_filename: str = "last_checkpoint.pth"):
# TODO it would be better to move weights to cpu before saving
checkpoint_path = f"{output_folder}/{ckpt_filename}"
torch.save(state, checkpoint_path)
if is_best:
torch.save(state["model_state_dict"], f"{output_folder}/best_model.pth")
def resume_train(args: Namespace, output_folder: str, model: torch.nn.Module,
model_optimizer: Type[torch.optim.Optimizer], classifiers: List[MarginCosineProduct],
classifiers_optimizers: List[Type[torch.optim.Optimizer]]):
"""Load model, optimizer, and other training parameters"""
logging.info(f"Loading checkpoint: {args.resume_train}")
checkpoint = torch.load(args.resume_train)
start_epoch_num = checkpoint["epoch_num"]
model_state_dict = checkpoint["model_state_dict"]
model.load_state_dict(model_state_dict)
model = model.to(args.device)
model_optimizer.load_state_dict(checkpoint["optimizer_state_dict"])
assert args.groups_num == len(classifiers) == len(classifiers_optimizers) == \
len(checkpoint["classifiers_state_dict"]) == len(checkpoint["optimizers_state_dict"]), \
(f"{args.groups_num}, {len(classifiers)}, {len(classifiers_optimizers)}, "
f"{len(checkpoint['classifiers_state_dict'])}, {len(checkpoint['optimizers_state_dict'])}")
for c, sd in zip(classifiers, checkpoint["classifiers_state_dict"]):
# Move classifiers to GPU before loading their optimizers
c = c.to(args.device)
c.load_state_dict(sd)
for c, sd in zip(classifiers_optimizers, checkpoint["optimizers_state_dict"]):
c.load_state_dict(sd)
for c in classifiers:
# Move classifiers back to CPU to save some GPU memory
c = c.cpu()
best_val_recall1 = checkpoint["best_val_recall1"]
# Copy best model to current output_folder
shutil.copy(args.resume_train.replace("last_checkpoint.pth", "best_model.pth"), output_folder)
return model, model_optimizer, classifiers, classifiers_optimizers, best_val_recall1, start_epoch_num
================================================
FILE: visualizations.py
================================================
import os
import cv2
import numpy as np
from tqdm import tqdm
from skimage.transform import rescale
from PIL import Image, ImageDraw, ImageFont
# Height and width of a single image
H = 512
W = 512
TEXT_H = 175
FONTSIZE = 80
SPACE = 50 # Space between two images
def write_labels_to_image(labels=["text1", "text2"]):
"""Creates an image with vertical text, spaced along rows."""
font = ImageFont.truetype("/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf", FONTSIZE)
img = Image.new('RGB', ((W * len(labels)) + SPACE * (len(labels)-1), TEXT_H), (1, 1, 1))
d = ImageDraw.Draw(img)
for i, text in enumerate(labels):
_, _, w, h = d.textbbox((0,0), text, font=font)
d.text(((W+SPACE)*i + W//2 - w//2, 1), text, fill=(0, 0, 0), font=font)
return np.array(img)
def draw(img, c=(0, 255, 0), thickness=20):
"""Draw a colored (usually red or green) box around an image."""
p = np.array([[0, 0], [0, img.shape[0]], [img.shape[1], img.shape[0]], [img.shape[1], 0]])
for i in range(3):
cv2.line(img, (p[i, 0], p[i, 1]), (p[i+1, 0], p[i+1, 1]), c, thickness=thickness*2)
return cv2.line(img, (p[3, 0], p[3, 1]), (p[0, 0], p[0, 1]), c, thickness=thickness*2)
def build_prediction_image(images_paths, preds_correct=None):
"""Build a row of images, where the first is the query and the rest are predictions.
For each prediction, draw a green box if it is correct or a red box if it is wrong; the query (None) gets no box.
"""
assert len(images_paths) == len(preds_correct)
labels = ["Query"] + [f"Pr{i} - {is_correct}" for i, is_correct in enumerate(preds_correct[1:])]
num_images = len(images_paths)
images = [np.array(Image.open(path)) for path in images_paths]
for img, correct in zip(images, preds_correct):
if correct is None:
continue
color = (0, 255, 0) if correct else (255, 0, 0)
draw(img, color)
concat_image = np.ones([H, (num_images*W)+((num_images-1)*SPACE), 3])
rescaleds = [rescale(i, [min(H/i.shape[0], W/i.shape[1]), min(H/i.shape[0], W/i.shape[1]), 1]) for i in images]
for i, image in enumerate(rescaleds):
pad_width = (W - image.shape[1] + 1) // 2
pad_height = (H - image.shape[0] + 1) // 2
image = np.pad(image, [[pad_height, pad_height], [pad_width, pad_width], [0, 0]], constant_values=1)[:H, :W]
concat_image[: , i*(W+SPACE) : i*(W+SPACE)+W] = image
try:
labels_image = write_labels_to_image(labels)
final_image = np.concatenate([labels_image, concat_image])
except OSError: # Handle error in case of missing PIL ImageFont
final_image = concat_image
final_image = Image.fromarray((final_image*255).astype(np.uint8))
return final_image
def save_file_with_paths(query_path, preds_paths, positives_paths, output_path):
file_content = []
file_content.append("Query path:")
file_content.append(query_path + "\n")
file_content.append("Predictions paths:")
file_content.append("\n".join(preds_paths) + "\n")
file_content.append("Positives paths:")
file_content.append("\n".join(positives_paths) + "\n")
with open(output_path, "w") as file:
_ = file.write("\n".join(file_content))
def save_preds(predictions, eval_ds, output_folder, save_only_wrong_preds=None):
"""For each query, save an image containing the query and its predictions,
and a file with the paths of the query, its predictions and its positives.
Parameters
----------
predictions : np.array of shape [num_queries x num_preds_to_viz], with the preds
for each query
eval_ds : TestDataset
output_folder : str / Path with the path to save the predictions
save_only_wrong_preds : bool, if True save only the wrongly predicted queries,
i.e. the ones where the first pred is incorrect (further than 25 m)
"""
positives_per_query = eval_ds.get_positives()
os.makedirs(f"{output_folder}/preds", exist_ok=True)
for query_index, preds in enumerate(tqdm(predictions, ncols=80, desc=f"Saving preds in {output_folder}")):
query_path = eval_ds.queries_paths[query_index]
list_of_images_paths = [query_path]
# List of None (query), True (correct preds) or False (wrong preds)
preds_correct = [None]
for pred_index, pred in enumerate(preds):
pred_path = eval_ds.database_paths[pred]
list_of_images_paths.append(pred_path)
is_correct = pred in positives_per_query[query_index]
preds_correct.append(is_correct)
if save_only_wrong_preds and preds_correct[1]:
continue
prediction_image = build_prediction_image(list_of_images_paths, preds_correct)
pred_image_path = f"{output_folder}/preds/{query_index:03d}.jpg"
prediction_image.save(pred_image_path)
positives_paths = [eval_ds.database_paths[idx] for idx in positives_per_query[query_index]]
save_file_with_paths(
query_path=list_of_images_paths[0],
preds_paths=list_of_images_paths[1:],
positives_paths=positives_paths,
output_path=f"{output_folder}/preds/{query_index:03d}.txt"
)
SYMBOL INDEX (60 symbols across 13 files)
FILE: augmentations.py
class DeviceAgnosticColorJitter (line 7) | class DeviceAgnosticColorJitter(T.ColorJitter):
method __init__ (line 8) | def __init__(self, brightness: float = 0., contrast: float = 0., satur...
method forward (line 12) | def forward(self, images: torch.Tensor) -> torch.Tensor:
class DeviceAgnosticRandomResizedCrop (line 23) | class DeviceAgnosticRandomResizedCrop(T.RandomResizedCrop):
method __init__ (line 24) | def __init__(self, size: Union[int, Tuple[int, int]], scale: float):
method forward (line 28) | def forward(self, images: torch.Tensor) -> torch.Tensor:
FILE: commons.py
class InfiniteDataLoader (line 11) | class InfiniteDataLoader(torch.utils.data.DataLoader):
method __init__ (line 12) | def __init__(self, *args, **kwargs):
method __iter__ (line 16) | def __iter__(self):
method __next__ (line 19) | def __next__(self):
function make_deterministic (line 28) | def make_deterministic(seed: int = 0):
function setup_logging (line 44) | def setup_logging(output_folder: str, exist_ok: bool = False, console: s...
FILE: cosface_loss.py
function cosine_sim (line 9) | def cosine_sim(x1: torch.Tensor, x2: torch.Tensor, dim: int = 1, eps: fl...
class MarginCosineProduct (line 16) | class MarginCosineProduct(nn.Module):
method __init__ (line 24) | def __init__(self, in_features: int, out_features: int, s: float = 30....
method forward (line 33) | def forward(self, inputs: torch.Tensor, label: torch.Tensor) -> torch....
method __repr__ (line 40) | def __repr__(self):
FILE: cosplace_model/cosplace_network.py
class GeoLocalizationNet (line 28) | class GeoLocalizationNet(nn.Module):
method __init__ (line 29) | def __init__(self, backbone : str, fc_output_dim : int, train_all_laye...
method forward (line 48) | def forward(self, x):
function get_pretrained_torchvision_model (line 54) | def get_pretrained_torchvision_model(backbone_name : str) -> torch.nn.Mo...
function get_backbone (line 66) | def get_backbone(backbone_name : str, train_all_layers : bool) -> Tuple[...
FILE: cosplace_model/layers.py
function gem (line 8) | def gem(x, p=torch.ones(1)*3, eps: float = 1e-6):
class GeM (line 12) | class GeM(nn.Module):
method __init__ (line 13) | def __init__(self, p=3, eps=1e-6):
method forward (line 18) | def forward(self, x):
method __repr__ (line 21) | def __repr__(self):
class Flatten (line 25) | class Flatten(torch.nn.Module):
method __init__ (line 26) | def __init__(self):
method forward (line 29) | def forward(self, x):
class L2Norm (line 34) | class L2Norm(nn.Module):
method __init__ (line 35) | def __init__(self, dim=1):
method forward (line 39) | def forward(self, x):
FILE: datasets/dataset_utils.py
function read_images_paths (line 10) | def read_images_paths(dataset_folder, get_abs_path=False):
FILE: datasets/test_dataset.py
class TestDataset (line 12) | class TestDataset(data.Dataset):
method __init__ (line 13) | def __init__(self, dataset_folder, database_folder="database",
method open_image (line 51) | def open_image(path):
method __getitem__ (line 54) | def __getitem__(self, index):
method __len__ (line 60) | def __len__(self):
method __repr__ (line 63) | def __repr__(self):
method get_positives (line 66) | def get_positives(self):
FILE: datasets/train_dataset.py
class TrainDataset (line 18) | class TrainDataset(torch.utils.data.Dataset):
method __init__ (line 19) | def __init__(self, args, dataset_folder, M=10, alpha=30, N=5, L=2,
method open_image (line 71) | def open_image(path):
method __getitem__ (line 74) | def __getitem__(self, class_num):
method get_images_num (line 97) | def get_images_num(self):
method __len__ (line 101) | def __len__(self):
method initialize (line 106) | def initialize(dataset_folder, M, N, alpha, L, min_images_per_class, f...
method get__class_id__group_id (line 147) | def get__class_id__group_id(utm_east, utm_north, heading, M, alpha, N,...
FILE: hubconf.py
function get_trained_model (line 18) | def get_trained_model(backbone : str = "ResNet50", fc_output_dim : int =...
FILE: parser.py
function parse_arguments (line 5) | def parse_arguments(is_training: bool = True):
FILE: test.py
function test (line 19) | def test(args: Namespace, eval_ds: Dataset, model: torch.nn.Module,
FILE: util.py
function move_to_device (line 10) | def move_to_device(optimizer: Type[torch.optim.Optimizer], device: str):
function save_checkpoint (line 17) | def save_checkpoint(state: dict, is_best: bool, output_folder: str,
function resume_train (line 26) | def resume_train(args: Namespace, output_folder: str, model: torch.nn.Mo...
FILE: visualizations.py
function write_labels_to_image (line 18) | def write_labels_to_image(labels=["text1", "text2"]):
function draw (line 29) | def draw(img, c=(0, 255, 0), thickness=20):
function build_prediction_image (line 37) | def build_prediction_image(images_paths, preds_correct=None):
function save_file_with_paths (line 66) | def save_file_with_paths(query_path, preds_paths, positives_paths, outpu...
function save_preds (line 78) | def save_preds(predictions, eval_ds, output_folder, save_only_wrong_pred...
Condensed preview: 21 files, each showing path, character count, and a content snippet.
[
{
"path": ".gitignore",
"chars": 35,
"preview": ".spyproject\n__pycache__\nlogs\ncache\n"
},
{
"path": "LICENSE",
"chars": 1101,
"preview": "MIT License\n\nCopyright (c) 2022 Gabriele Berton, Carlo Masone, Barbara Caputo\n\nPermission is hereby granted, free of cha"
},
{
"path": "README.md",
"chars": 10155,
"preview": "\n# Rethinking Visual Geo-localization for Large-Scale Applications\n\nThis is the official pyTorch implementation of the C"
},
{
"path": "augmentations.py",
"chars": 2981,
"preview": "\nimport torch\nfrom typing import Tuple, Union\nimport torchvision.transforms as T\n\n\nclass DeviceAgnosticColorJitter(T.Col"
},
{
"path": "commons.py",
"chars": 3454,
"preview": "\nimport os\nimport sys\nimport torch\nimport random\nimport logging\nimport traceback\nimport numpy as np\n\n\nclass InfiniteData"
},
{
"path": "cosface_loss.py",
"chars": 1592,
"preview": "\n# Based on https://github.com/MuggleWang/CosFace_pytorch/blob/master/layer.py\n\nimport torch\nimport torch.nn as nn\nfrom "
},
{
"path": "cosplace_model/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "cosplace_model/cosplace_network.py",
"chars": 4422,
"preview": "\nimport torch\nimport logging\nimport torchvision\nfrom torch import nn\nfrom typing import Tuple\n\nfrom cosplace_model.layer"
},
{
"path": "cosplace_model/layers.py",
"chars": 1020,
"preview": "\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nfrom torch.nn.parameter import Parameter\n\n\ndef gem(x"
},
{
"path": "datasets/__init__.py",
"chars": 0,
"preview": ""
},
{
"path": "datasets/dataset_utils.py",
"chars": 2138,
"preview": "\nimport os\nimport logging\nfrom glob import glob\nfrom PIL import ImageFile\n\nImageFile.LOAD_TRUNCATED_IMAGES = True\n\n\ndef "
},
{
"path": "datasets/test_dataset.py",
"chars": 2848,
"preview": "\nimport os\nimport numpy as np\nfrom PIL import Image\nimport torch.utils.data as data\nimport torchvision.transforms as tra"
},
{
"path": "datasets/train_dataset.py",
"chars": 7801,
"preview": "\nimport os\nimport torch\nimport random\nimport logging\nimport numpy as np\nfrom PIL import Image\nfrom PIL import ImageFile\n"
},
{
"path": "eval.py",
"chars": 1567,
"preview": "\nimport sys\nimport torch\nimport logging\nimport multiprocessing\nfrom datetime import datetime\n\nimport test\nimport parser\n"
},
{
"path": "hubconf.py",
"chars": 2066,
"preview": "\ndependencies = ['torch', 'torchvision']\n\nimport torch\nfrom cosplace_model import cosplace_network\n\n\nAVAILABLE_TRAINED_M"
},
{
"path": "parser.py",
"chars": 5146,
"preview": "\nimport argparse\n\n\ndef parse_arguments(is_training: bool = True):\n parser = argparse.ArgumentParser(formatter_class=a"
},
{
"path": "requirements.txt",
"chars": 121,
"preview": "faiss_cpu>=1.7.1\nnumpy>=1.21.2\nPillow>=9.0.1\nscikit_learn>=1.0.2\ntorch>=1.8.2\ntorchvision>=0.9.2\ntqdm>=4.62.3\nutm>=0.7.0"
},
{
"path": "test.py",
"chars": 3371,
"preview": "\nimport faiss\nimport torch\nimport logging\nimport numpy as np\nfrom tqdm import tqdm\nfrom typing import Tuple\nfrom argpars"
},
{
"path": "train.py",
"chars": 8161,
"preview": "\nimport sys\nimport torch\nimport logging\nimport numpy as np\nfrom tqdm import tqdm\nimport multiprocessing\nfrom datetime im"
},
{
"path": "util.py",
"chars": 2579,
"preview": "\nimport torch\nimport shutil\nimport logging\nfrom typing import Type, List\nfrom argparse import Namespace\nfrom cosface_los"
},
{
"path": "visualizations.py",
"chars": 5186,
"preview": "\nimport os\nimport cv2\nimport numpy as np\nfrom tqdm import tqdm\nfrom skimage.transform import rescale\nfrom PIL import Ima"
}
]
About this extraction
This page contains the full source code of the gmberton/CosPlace GitHub repository, extracted and formatted as plain text. The extraction includes 21 files (64.2 KB), approximately 16.6k tokens, and a symbol index with 60 extracted functions, classes, methods, constants, and types.