Repository: eminorhan/baby-vision Branch: master Commit: f0340acd873d Files: 32 Total size: 130.0 KB Directory structure: gitextract_uh48hnz9/ ├── .gitignore ├── LICENSE ├── README.md ├── feature_animation.py ├── feature_animation_class.py ├── highly_activating_imgs.py ├── hog_baseline.py ├── imagenet_finetuning.py ├── linear_combination_maps.py ├── linear_decoding.py ├── moco/ │ ├── __init__.py │ ├── builder.py │ └── loader.py ├── moco_img.py ├── moco_temp.py ├── moco_utils.py ├── read_saycam.py ├── scripts/ │ ├── feature_animation.sh │ ├── feature_animation_class.sh │ ├── highly_activating_imgs.sh │ ├── hog_baseline.sh │ ├── imagenet_finetuning.sh │ ├── linear_combination_maps.sh │ ├── linear_decoding.sh │ ├── moco_img.sh │ ├── moco_temp.sh │ ├── read_saycam.sh │ ├── selectivities.sh │ └── temporal_classification.sh ├── selectivities.py ├── temporal_classification.py └── utils.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ # File extensions *.out *.mp4 # Directories /__pychache__ /moco/__pychache__ ================================================ FILE: LICENSE ================================================ MIT License Copyright (c) 2021 Emin Orhan Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

================================================
FILE: README.md
================================================
# Self-supervised learning through the eyes of a child

This repository contains code for reproducing the results reported in the following paper:

Orhan AE, Gupta VV, Lake BM (2020) [Self-supervised learning through the eyes of a child.](https://arxiv.org/abs/2007.16189) *Advances in Neural Information Processing Systems 33 (NeurIPS 2020)*.

## Requirements

* pytorch == 1.5.1
* torchvision == 0.6.1

Slightly older or newer versions will probably work fine as well.

## Datasets

This project uses the SAYCam dataset described in the following paper:

Sullivan J, Mei M, Perfors A, Wojcik EH, Frank MC (2020) [SAYCam: A large, longitudinal audiovisual dataset recorded from the infant’s perspective.](https://psyarxiv.com/fy8zx/) PsyArXiv.

The dataset is hosted on the [Databrary](https://nyu.databrary.org/) repository for behavioral science. Unfortunately, we are unable to publicly share the SAYCam dataset here due to the terms of use. However, interested researchers can apply for access to the dataset with approval from their institution's IRB.

This project also uses the Toybox dataset for evaluation purposes. The Toybox dataset is publicly available at [this address](https://aivaslab.github.io/toybox/).

## Code description

* [`temporal_classification.py`](https://github.com/eminorhan/baby-vision/blob/master/temporal_classification.py): trains temporal classification models as described in the paper.
This file uses code recycled from the PyTorch ImageNet training [example](https://github.com/pytorch/examples/tree/master/imagenet).
* [`read_saycam.py`](https://github.com/eminorhan/baby-vision/blob/master/read_saycam.py): SAYCam video-to-image reader.
* [`moco`](https://github.com/eminorhan/baby-vision/tree/master/moco) directory contains helper files for training static and temporal MoCo models. The code here was modified from [Facebook's MoCo repository](https://github.com/facebookresearch/moco).
* [`moco_img.py`](https://github.com/eminorhan/baby-vision/blob/master/moco_img.py): trains an image-based MoCo model as described in the paper. This code was modified from [Facebook's MoCo repository](https://github.com/facebookresearch/moco).
* [`moco_temp.py`](https://github.com/eminorhan/baby-vision/blob/master/moco_temp.py): trains a temporal MoCo model as described in the paper. This code was also modified from [Facebook's MoCo repository](https://github.com/facebookresearch/moco).
* [`moco_utils.py`](https://github.com/eminorhan/baby-vision/blob/master/moco_utils.py): some utility functions for MoCo training.
* [`linear_decoding.py`](https://github.com/eminorhan/baby-vision/blob/master/linear_decoding.py): evaluates self-supervised models on downstream linear classification tasks.
* [`linear_combination_maps.py`](https://github.com/eminorhan/baby-vision/blob/master/linear_combination_maps.py): plots spatial attention maps as in Figure 4b and Figure 6 in the paper.
* [`highly_activating_imgs.py`](https://github.com/eminorhan/baby-vision/blob/master/highly_activating_imgs.py): finds highly activating images for a given feature as in Figure 7b in the paper.
* [`selectivities.py`](https://github.com/eminorhan/baby-vision/blob/master/selectivities.py): measures the class selectivity indices of all features in a given layer as in Figure 7a in the paper.
* [`hog_baseline.py`](https://github.com/eminorhan/baby-vision/blob/master/hog_baseline.py): runs the HOG baseline model as described in the paper.
* [`imagenet_finetuning.py`](https://github.com/eminorhan/baby-vision/blob/master/imagenet_finetuning.py): ImageNet evaluations.
* [`feature_animation.py`](https://github.com/eminorhan/baby-vision/blob/master/feature_animation.py) and [`feature_animation_class.py`](https://github.com/eminorhan/baby-vision/blob/master/feature_animation_class.py): some tools for visualizing the learned features.

For specific usage examples, please see the slurm scripts provided in the [`scripts`](https://github.com/eminorhan/baby-vision/tree/master/scripts) directory.

## Pre-trained models

### ResNeXt

Since the publication of the paper, we have found that training larger capacity models for longer with the temporal classification objective significantly improves the evaluation results. Hence, we provide below pre-trained `resnext50_32x4d` type models that are currently our best models trained with the SAYCam data. We encourage people to use these new models instead of the `mobilenet_v2` type models reported in the paper (the pre-trained `mobilenet_v2` models reported in the paper are also provided below for the record).

Four pre-trained `resnext50_32x4d` models are provided here: temporal classification models trained on data from the individual children in the SAYCam dataset (`TC-S-resnext`, `TC-A-resnext`, `TC-Y-resnext`) and a temporal classification model trained on data from all three children (`TC-SAY-resnext`).
These models were all trained for 16 epochs (with batch size 256) with the following data augmentation pipeline:

```python
import torchvision.transforms as tr

# NOTE: GaussianBlur is not imported in this snippet. It is assumed here to be a
# MoCo-style blur transform defined elsewhere (e.g. among the helpers in the moco directory).
tr.Compose([
    tr.RandomResizedCrop(224, scale=(0.2, 1.)),
    tr.RandomApply([tr.ColorJitter(0.9, 0.9, 0.9, 0.5)], p=0.9),
    tr.RandomGrayscale(p=0.2),
    tr.RandomApply([GaussianBlur([.1, 2.])], p=0.5),
    tr.RandomHorizontalFlip(),
    tr.ToTensor(),
    tr.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
```

This data augmentation pipeline is similar to that used in [the SimCLR paper](https://arxiv.org/abs/2002.05709) with slightly larger random crops and slightly stronger color augmentation.

Here are some evaluation results for these `resnext50_32x4d` models (to download the models, click on the model names):

| Model | Toybox (*iid*) | Toybox (*exemplar*) | ImageNet (*linear*) | ImageNet (*1% ft + linear*) |
| ----- |:--------------:|:-------------------:|:-------------------:|:---------------------------:|
| [`TC-SAY-resnext`](https://drive.google.com/file/d/1I-HvIeuupsE88yS6eff_nE6pHpEmTVPG/view?usp=sharing) | **90.0** | **57.5** | **36.0** | **45.6** |
| [`TC-S-resnext`](https://drive.google.com/file/d/14tZeOtK1Jd64ioxPwzwf2jblriN7Jgue/view?usp=sharing) | 88.5 | 54.9 | -- | -- |
| [`TC-A-resnext`](https://drive.google.com/file/d/1aQuWfb4O0xL0PALRJpYIUHk0tsyrujDF/view?usp=sharing) | 86.8 | 50.4 | -- | -- |
| [`TC-Y-resnext`](https://drive.google.com/file/d/1sB12pdnVEZsgVKiVdZyS0l4x24_T5zCj/view?usp=sharing) | 87.0 | 53.0 | -- | -- |

Here, **ImageNet (*linear*)** refers to the top-1 validation accuracy on ImageNet with only a linear classifier trained on top of the frozen features, and **ImageNet (*1% ft + linear*)** is similar but with the entire model first fine-tuned on 1% of the ImageNet training data (~12800 images). Note that these are results from a single run, so you may observe slightly different numbers.

These models come with the temporal classification heads attached.
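
As a rough illustration of the 1% fine-tuning subset mentioned above, the sketch below draws a random fraction of training indices. It mirrors the shuffle-and-slice logic behind the `--frac-retained` option in `imagenet_finetuning.py`, but uses only the standard library. The function name `sample_subset_indices`, the `seed` argument, and the constant `IMAGENET_TRAIN_SIZE` are hypothetical names introduced for this example, not part of the repository.

```python
import random

# Number of training images in ILSVRC-2012 (ImageNet-1k).
IMAGENET_TRAIN_SIZE = 1_281_167


def sample_subset_indices(num_items, frac_retained, seed=0):
    """Return int(frac_retained * num_items) distinct random indices.

    Hypothetical helper: shuffles all indices, then keeps the leading slice.
    """
    if not 0.0 < frac_retained <= 1.0:
        raise ValueError("frac_retained must be in (0, 1]")
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)  # seeded only to make the sketch reproducible
    return indices[:int(frac_retained * num_items)]


subset = sample_subset_indices(IMAGENET_TRAIN_SIZE, 0.01)
print(len(subset))  # about 12800 images, matching the figure quoted above
```

In the fine-tuning script, indices selected this way are passed to `torch.utils.data.sampler.SubsetRandomSampler`, so that each epoch iterates only over the retained images.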
To load these models, please do something along the lines of:

```python
import torch
import torchvision.models as models

n_out = 6269  # output dim of the temporal classification head (value for TC-SAY-resnext, see below)

model = models.resnext50_32x4d(pretrained=False)
model.fc = torch.nn.Linear(in_features=2048, out_features=n_out, bias=True)
model = torch.nn.DataParallel(model).cuda()

checkpoint = torch.load('TC-SAY-resnext.tar')
model.load_state_dict(checkpoint['model_state_dict'])
```

where `n_out` should be 6269 for `TC-SAY-resnext`, 2765 for `TC-S-resnext`, 1786 for `TC-A-resnext`, and 1718 for `TC-Y-resnext`. The output dimensions differ because the datasets have different lengths.

In addition, please find below the best performing ImageNet models reported above: a model with a linear ImageNet classifier trained on top of the frozen features of `TC-SAY-resnext` (`TC-SAY-resnext-IN-linear`) and a model that was first fine-tuned with 1% of the ImageNet training data (`TC-SAY-resnext-IN-1pt-linear`):

* [`TC-SAY-resnext-IN-linear`](https://drive.google.com/file/d/1Qo0_1RwgOsr-JM3lP4ILWRY0WflnS7On/view?usp=sharing)
* [`TC-SAY-resnext-IN-1pt-linear`](https://drive.google.com/file/d/1lvCG3L1_-gdqWDMD41yTIbuNBpzpUOQq/view?usp=sharing)

You can load these models in the same way as described above. Since these are ImageNet models, `n_out` should be set to 1000.

### MobileNet

The following are the pre-trained `mobilenet_v2` type models reported in the paper:

* [TC-S-mobilenet](https://drive.google.com/file/d/1DVJjpaGhoBPNmlO7jXpwEX3lSCk2ZUCa/view?usp=sharing) (69.4 MB)
* [TC-A-mobilenet](https://drive.google.com/file/d/1uQvJBbuy6P0uCW0HYs1wNgawRU8sGLhC/view?usp=sharing) (54.4 MB)
* [TC-Y-mobilenet](https://drive.google.com/file/d/1TTndiiiqSiCMdZjwYZPKQySZot4ipCrG/view?usp=sharing) (53.3 MB)
* [TC-SAY-mobilenet](https://drive.google.com/file/d/1zeidpBaXqqWCeeYj-fMI7V7x9EiAGH6Q/view?usp=sharing) (123.3 MB)

## Acknowledgments

We are very grateful to the volunteers who contributed recordings to the SAYCam dataset.
We thank Jessica Sullivan for her generous assistance with the dataset. We also thank the team behind the Toybox dataset, as well as the developers of PyTorch and torchvision for making this work possible. This project was partly funded by the NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science. ================================================ FILE: feature_animation.py ================================================ '''Animating features on short clips''' import os import argparse import numpy as np import torch import torchvision.transforms as transforms import torchvision.datasets as datasets import torchvision.models as models from torchvision.utils import make_grid import matplotlib as mp import matplotlib.pyplot as plt import matplotlib.animation as animation import matplotlib.cm as cm # TODO: combine the map extraction functions into a single function # TODO: combine model loading functions into a single function def extract_map_layer_7x7_res(res_model): layer_list = list(res_model.module.children())[:-2] new_model = torch.nn.Sequential(*layer_list) return new_model def extract_map_layer_7x7(mobilenetV2_model): layer_list = list(mobilenetV2_model.module.features.children()) new_model = torch.nn.Sequential(*layer_list) return new_model def extract_map_layer_14x14(mobilenetV2_model): layer_list = list(mobilenetV2_model.module.features.children()) new_layer_list = layer_list[:-5] new_layer_list.append(layer_list[-5].conv[0]) new_model = torch.nn.Sequential(*new_layer_list) return new_model def load_model_res(args): model = models.resnext50_32x4d(pretrained=False) model.fc = torch.nn.Linear(in_features=2048, out_features=args.n_out, bias=True) model = torch.nn.DataParallel(model).cuda() if args.model_path: if os.path.isfile(args.model_path): checkpoint = torch.load(args.model_path) model.load_state_dict(checkpoint['model_state_dict']) else: print("=> no checkpoint found at '{}'".format(args.model_path)) return model 
def load_model(args): model = models.mobilenet_v2(pretrained=True) model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True) model = torch.nn.DataParallel(model).cuda() if args.model_path: if os.path.isfile(args.model_path): checkpoint = torch.load(args.model_path) model.load_state_dict(checkpoint['model_state_dict']) else: print("=> no checkpoint found at '{}'".format(args.model_path)) return model def load_data(data_dir, args): normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) train_dataset = datasets.ImageFolder( data_dir, transforms.Compose([transforms.Resize(224), transforms.ToTensor(), normalize]) ) train_loader = torch.utils.data.DataLoader( train_dataset, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True, sampler=None ) return train_loader def predict(data_loader, model, batch_size, feature_idx): # switch to evaluate mode model.eval() preds_list = [] imgs_list = [] with torch.no_grad(): for i, (images, target) in enumerate(data_loader): images = images.cuda() # compute predictions preds = model(images) preds = preds[:, feature_idx, :, :] preds_list.append(preds) imgs_list.append(images) preds = torch.cat(preds_list, 0) images = torch.cat(imgs_list, 0) print('Images shape:', images.size()) print('Preds shape:', preds.size()) # Copy activation map to all channels and upsample to image size x = torch.zeros(preds.size()[0], 3, 7, 7) x[:, 0, :, :] = preds x[:, 1, :, :] = preds x[:, 2, :, :] = preds m = torch.nn.Upsample(scale_factor=32, mode='bicubic') upsampled_maps = m(x).cuda() # upsampled_maps = torch.sigmoid(10. 
* upsampled_maps / torch.std(upsampled_maps)) upsampled_maps = upsampled_maps.cpu().numpy() images = images.cpu().numpy() return upsampled_maps, images def show_img(ax, img, save_name): '''Save maps''' npimg = img.cpu().numpy() print(npimg.shape) ax.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest') ax.spines["bottom"].set_visible(False) ax.spines["left"].set_visible(False) ax.spines["right"].set_visible(False) ax.spines["top"].set_visible(False) mp.rcParams['axes.linewidth'] = 0.75 mp.rcParams['patch.linewidth'] = 0.75 mp.rcParams['patch.linewidth'] = 1.15 mp.rcParams['font.sans-serif'] = ['FreeSans'] mp.rcParams['mathtext.fontset'] = 'cm' plt.savefig(save_name, bbox_inches='tight') if __name__ == '__main__': parser = argparse.ArgumentParser(description='Plot spatial attention maps') parser.add_argument('data', metavar='DIR', help='path to dataset') parser.add_argument('--workers', default=32, type=int, help='number of data loading workers (default: 4)') parser.add_argument('--batch-size', default=900, type=int, help='mini-batch size, this is the total ' 'batch size of all GPUs on the current node when ' 'using Data Parallel or Distributed Data Parallel') parser.add_argument('--model-path', default='', type=str, help='path to model checkpoint (default: ' 'ImageNet-pretrained)') parser.add_argument('--n_out', default=2765, type=int, help='output dim of pre-trained model') parser.add_argument('--feature-idx', default=1, type=int, help='feature index for which the maps will be computed') args = parser.parse_args() model = load_model(args) map_layer = extract_map_layer_7x7(model) data_loader = load_data(args.data, args) preds, images = predict(data_loader, map_layer, args.batch_size, args.feature_idx) preds = preds - preds.min() preds = preds / preds.max() preds = np.uint8(255 * preds) images = images - images.min() images = images / images.max() # images = np.uint8(255 * images) fig, ax = plt.subplots() ax.set_axis_off() ax.set_title('Feature: ' + 
str(args.feature_idx)) jet = cm.get_cmap("jet") jet_colors = jet(np.arange(256))[:, :3] preds = jet_colors[preds[:, 0, :, :]] masked_imgs = 1.0 * preds + np.transpose(images, (0, 2, 3, 1)) masked_imgs = np.uint8(255 * masked_imgs / masked_imgs.max()) imgs = [] for i in range(900): im = ax.imshow(masked_imgs[i]) if i == 0: im = ax.imshow(masked_imgs[i]) imgs.append([im]) ani = animation.ArtistAnimation(fig, imgs, interval=200, blit=True, repeat_delay=1000) # To save the animation, use e.g. ani.save('intphys_feature_animation_' + str(args.feature_idx) + '.mp4') ================================================ FILE: feature_animation_class.py ================================================ '''Animating features on short clips''' import os import argparse import numpy as np import torch import torchvision.transforms as transforms import torchvision.datasets as datasets import torchvision.models as models from torchvision.utils import make_grid import matplotlib as mp import matplotlib.pyplot as plt import matplotlib.animation as animation import matplotlib.cm as cm # TODO: combine the map extraction functions into a single function # TODO: combine model loading functions into a single function def extract_map_layer_7x7_res(res_model): layer_list = list(res_model.module.children())[:-2] new_model = torch.nn.Sequential(*layer_list) return new_model def extract_map_layer_7x7(mobilenetV2_model): layer_list = list(mobilenetV2_model.module.features.children()) new_model = torch.nn.Sequential(*layer_list) return new_model def extract_map_layer_14x14(mobilenetV2_model): layer_list = list(mobilenetV2_model.module.features.children()) new_layer_list = layer_list[:-5] new_layer_list.append(layer_list[-5].conv[0]) new_model = torch.nn.Sequential(*new_layer_list) return new_model def load_model_res(args): model = models.resnext50_32x4d(pretrained=False) model.fc = torch.nn.Linear(in_features=2048, out_features=args.n_out, bias=True) model = torch.nn.DataParallel(model).cuda() if 
args.model_path: if os.path.isfile(args.model_path): checkpoint = torch.load(args.model_path) model.load_state_dict(checkpoint['model_state_dict']) else: print("=> no checkpoint found at '{}'".format(args.model_path)) return model def load_model(args): model = models.mobilenet_v2(pretrained=True) model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True) model = torch.nn.DataParallel(model).cuda() if args.model_path: if os.path.isfile(args.model_path): checkpoint = torch.load(args.model_path) model.load_state_dict(checkpoint['model_state_dict']) else: print("=> no checkpoint found at '{}'".format(args.model_path)) return model def load_data(data_dir, args): normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) train_dataset = datasets.ImageFolder( data_dir, transforms.Compose([transforms.ToTensor(), normalize]) ) train_loader = torch.utils.data.DataLoader( train_dataset, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True, sampler=None ) return train_loader def predict(data_loader, model, batch_size, weights): # switch to evaluate mode model.eval() preds_list = [] imgs_list = [] with torch.no_grad(): for i, (images, target) in enumerate(data_loader): images = images.cuda() # compute predictions preds = model(images) preds_list.append(preds) imgs_list.append(images) preds = torch.cat(preds_list, 0) images = torch.cat(imgs_list, 0) print('Images shape:', images.size()) print('Preds shape:', preds.size()) linear_combination_map = torch.einsum('ijkl,j->ikl', preds, weights) # Copy activation map to all channels and upsample to image size x = torch.zeros(preds.size()[0], 3, 7, 7) x[:, 0, :, :] = linear_combination_map x[:, 1, :, :] = linear_combination_map x[:, 2, :, :] = linear_combination_map m = torch.nn.Upsample(scale_factor=32, mode='bicubic') upsampled_maps = m(x).cuda() # upsampled_maps = torch.sigmoid(10. 
* upsampled_maps / torch.std(upsampled_maps)) upsampled_maps = upsampled_maps.cpu().numpy() images = images.cpu().numpy() return upsampled_maps, images def show_img(ax, img, save_name): '''Save maps''' npimg = img.cpu().numpy() print(npimg.shape) ax.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest') ax.spines["bottom"].set_visible(False) ax.spines["left"].set_visible(False) ax.spines["right"].set_visible(False) ax.spines["top"].set_visible(False) mp.rcParams['axes.linewidth'] = 0.75 mp.rcParams['patch.linewidth'] = 0.75 mp.rcParams['patch.linewidth'] = 1.15 mp.rcParams['font.sans-serif'] = ['FreeSans'] mp.rcParams['mathtext.fontset'] = 'cm' plt.savefig(save_name, bbox_inches='tight') if __name__ == '__main__': parser = argparse.ArgumentParser(description='Plot spatial attention maps') parser.add_argument('data', metavar='DIR', help='path to dataset') parser.add_argument('--workers', default=32, type=int, help='number of data loading workers (default: 4)') parser.add_argument('--batch-size', default=500, type=int, help='mini-batch size, this is the total ' 'batch size of all GPUs on the current node when ' 'using Data Parallel or Distributed Data Parallel') parser.add_argument('--model-path', default='', type=str, help='path to model checkpoint (default: ' 'ImageNet-pretrained)') parser.add_argument('--n_out', default=26, type=int, help='output dim of pre-trained model') parser.add_argument('--class-idx', default=1, type=int, help='class index for which the maps will be computed') args = parser.parse_args() model = load_model(args) map_layer = extract_map_layer_7x7(model) weights = model.module.classifier.weight.data[args.class_idx, :].cuda() data_loader = load_data(args.data, args) preds, images = predict(data_loader, map_layer, args.batch_size, weights) preds = preds - preds.min() preds = preds / preds.max() preds = np.uint8(255 * preds) images = images - images.min() images = images / images.max() # images = np.uint8(255 * images) fig, ax = 
plt.subplots() ax.set_axis_off() ax.set_title('Class: ' + str(args.class_idx)) jet = cm.get_cmap("jet") jet_colors = jet(np.arange(256))[:, :3] preds = jet_colors[preds[:, 0, :, :]] masked_imgs = 1.0 * preds + np.transpose(images, (0, 2, 3, 1)) masked_imgs = np.uint8(255 * masked_imgs / masked_imgs.max()) imgs = [] for i in range(200): im = ax.imshow(masked_imgs[i]) if i == 0: im = ax.imshow(masked_imgs[i]) imgs.append([im]) ani = animation.ArtistAnimation(fig, imgs, interval=200, blit=True, repeat_delay=1000) # To save the animation, use e.g. ani.save('computers_feature_animation_' + str(args.class_idx) + '.mp4') ================================================ FILE: highly_activating_imgs.py ================================================ '''Plots highly activating images''' import os import argparse import numpy as np import torch import torchvision.transforms as transforms import torchvision.datasets as datasets import torchvision.models as models from torchvision.utils import make_grid import matplotlib as mp import matplotlib.pyplot as plt def extract_map_layer_7x7(mobilenetV2_model): layer_list = list(mobilenetV2_model.module.features.children()) new_model = torch.nn.Sequential(*layer_list) return new_model def extract_map_layer_14x14(mobilenetV2_model, layer): layer_list = list(mobilenetV2_model.module.features.children()) new_layer_list = layer_list[:-layer] new_layer_list.append(layer_list[-layer].conv[0]) new_model = torch.nn.Sequential(*new_layer_list) return new_model def load_model(args): model = models.mobilenet_v2(pretrained=True) model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True) model = torch.nn.DataParallel(model).cuda() if args.model_path: if os.path.isfile(args.model_path): checkpoint = torch.load(args.model_path) model.load_state_dict(checkpoint['model_state_dict']) else: print("=> no checkpoint found at '{}'".format(args.model_path)) return model def load_data(data_dir, args): normalize = 
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) train_dataset = datasets.ImageFolder( data_dir, transforms.Compose([transforms.ToTensor(), normalize]) ) train_loader = torch.utils.data.DataLoader( train_dataset, batch_size=args.batch_size, shuffle=True, num_workers=args.workers, pin_memory=True, sampler=None ) return train_loader def predict(data_loader, model, neuron_idx): # switch to evaluate mode model.eval() with torch.no_grad(): for i, (images, target) in enumerate(data_loader): images = images.cuda() # compute predictions pred = model(images) pred_mean = torch.mean(pred, dim=(2, 3)) pred_mean = pred_mean[:, neuron_idx] if i == 0: break _, indices = torch.sort(pred_mean, descending=True) images = images[indices, :, :, :] return images def show_img(ax, img, save_name): '''Save maps''' npimg = img.cpu().numpy() print(npimg.shape) ax.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest') ax.spines["bottom"].set_visible(False) ax.spines["left"].set_visible(False) ax.spines["right"].set_visible(False) ax.spines["top"].set_visible(False) mp.rcParams['axes.linewidth'] = 0.75 mp.rcParams['patch.linewidth'] = 0.75 mp.rcParams['patch.linewidth'] = 1.15 mp.rcParams['font.sans-serif'] = ['FreeSans'] mp.rcParams['mathtext.fontset'] = 'cm' plt.savefig(save_name, bbox_inches='tight') if __name__ == '__main__': parser = argparse.ArgumentParser(description='Plot highly activating images for a given feature') parser.add_argument('data', metavar='DIR', help='path to dataset') parser.add_argument('--workers', default=4, type=int, help='number of data loading workers (default: 4)') parser.add_argument('--batch-size', default=1024, type=int, help='mini-batch size, this is the total ' 'batch size of all GPUs on the current node when ' 'using Data Parallel or Distributed Data Parallel') parser.add_argument('--model-path', default='', type=str, help='path to latest checkpoint (default: none)') parser.add_argument('--n_out', default=1000, type=int, 
                        help='output dim')
    parser.add_argument('--neuron_idx', default=276, type=int, help='neuron index')

    args = parser.parse_args()

    model = load_model(args)
    map_layer = extract_map_layer_7x7(model)
    data_loader = load_data(args.data, args)

    imgs = predict(data_loader, map_layer, neuron_idx=args.neuron_idx)
    print('Imgs shape', imgs.shape)

    print('Plotting the top 10 images')
    fig_img = plt.figure(figsize=(16, 16), dpi=300)
    # Integer subplot spec; string specs such as '111' are rejected by newer Matplotlib versions
    ax_img = fig_img.add_subplot(111)
    grid_img = make_grid(imgs[:10, :, :, :], nrow=10, padding=2, normalize=True, scale_each=False)
    show_img(ax_img, grid_img, 'highly_activating_imgs_neuron_' + str(args.neuron_idx) + '.pdf')

================================================
FILE: hog_baseline.py
================================================
'''HoG baseline'''
import os
import argparse
import numpy as np
from skimage.feature import hog
from skimage.io import imread
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

parser = argparse.ArgumentParser(description='Linear decoding with HoG model')
parser.add_argument('data', metavar='DIR', help='path to dataset')
parser.add_argument('--subsample', default=False, action='store_true', help='subsample data?')


if __name__ == '__main__':

    args = parser.parse_args()

    c_list = os.listdir(args.data)
    c_list.sort()
    print('Class list:', c_list)

    imgs = []
    labels = []
    label_counter = 0
    file_counter = 0

    # One sub-directory per class; the class label is its index in the sorted listing
    for c in c_list:
        curr_dir = os.path.join(args.data, c)
        f_list = os.listdir(curr_dir)
        f_list.sort()
        print('Reading class:', c)

        for f in f_list:
            f_path = os.path.join(curr_dir, f)
            img = imread(f_path)
            feats = hog(img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(3, 3), block_norm='L2',
                        visualize=False, transform_sqrt=False, feature_vector=True, multichannel=True)

            if args.subsample:
                # Keep every 10th file only
                if file_counter % 10 == 0:
                    imgs.append(feats)
                    labels.append(label_counter)
            else:
                imgs.append(feats)
                labels.append(label_counter)

            file_counter += 1

        label_counter += 1

    imgs = np.vstack(imgs)
    labels = np.array(labels)

    print('Imgs shape:', imgs.shape)
    print('Labels shape:', labels.shape)

    print('Splitting dataset')
    X_train, X_test, y_train, y_test = train_test_split(imgs, labels, test_size=0.5)

    print('Fitting training data')
    clf = SGDClassifier(loss="hinge", penalty="l2", alpha=0.0001, max_iter=250)
    clf.fit(X_train, y_train)

    print('Computing predictions')
    pred_test = clf.predict(X_test)
    test_acc = np.mean(y_test == pred_test)

    pred_train = clf.predict(X_train)
    train_acc = np.mean(y_train == pred_train)

    print('Test accuracy', test_acc)
    print('Train accuracy', train_acc)

================================================
FILE: imagenet_finetuning.py
================================================
import argparse
import os
import time
import warnings
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models

parser = argparse.ArgumentParser(description='ImageNet fine-tuning or linear classification')
parser.add_argument('-j', '--workers', default=32, type=int, metavar='N',
                    help='number of data loading workers (default: 32)')
parser.add_argument('--epochs', default=25, type=int, metavar='N', help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=256, type=int, metavar='N',
                    help='mini-batch size (default: 256), this is the total batch size of all GPUs on the current node '
                         'when using Data Parallel or Distributed Data Parallel')
parser.add_argument('--lr', '--learning-rate', default=0.0005, type=float, metavar='LR', help='initial learning rate',
                    dest='lr')
parser.add_argument('--wd', '--weight-decay', default=0.0,
                    type=float, metavar='W', help='weight decay (default: 0)', dest='weight_decay')
parser.add_argument('-p', '--print-freq', default=5000, type=int, metavar='N', help='print frequency (default: 5000)')
parser.add_argument('--schedule', default=[23, 24], nargs='*', type=int,
                    help='learning rate schedule (when to drop lr by a ratio)')
parser.add_argument('--resume', default='', type=str, metavar='PATH', help='path to latest checkpoint (default: none)')
parser.add_argument('--world-size', default=-1, type=int, help='number of nodes for distributed training')
parser.add_argument('--rank', default=-1, type=int, help='node rank for distributed training')
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
                    help='url used to set up distributed training')
parser.add_argument('--dist-backend', default='nccl', type=str, help='distributed backend')
parser.add_argument('--gpu', default=None, type=int, help='GPU id to use.')
parser.add_argument('--multiprocessing-distributed', action='store_true',
                    help='Use multi-processing distributed training to launch '
                         'N processes per node, which has N GPUs. This is the '
                         'fastest way to use PyTorch for either single node or '
                         'multi node data parallel training')
parser.add_argument('--n_out', default=20, type=int, help='output dim')
parser.add_argument('--freeze-trunk', default=False, action='store_true', help='freeze trunk?')
parser.add_argument('--frac-retained', default=1.0, type=float, help='fraction of training data retained')


def set_parameter_requires_grad(model, feature_extracting=True):
    '''Helper function for setting body to non-trainable'''
    if feature_extracting:
        # Freeze every parameter, then re-enable gradients for the final fc layer only
        for param in model.parameters():
            param.requires_grad = False
        for param in model.module.fc.parameters():
            print(param.shape)
            param.requires_grad = True


def main():
    args = parser.parse_args()

    if args.gpu is not None:
        warnings.warn('You have chosen a specific GPU.
This will completely disable data parallelism.') if args.dist_url == "env://" and args.world_size == -1: args.world_size = int(os.environ["WORLD_SIZE"]) args.distributed = args.world_size > 1 or args.multiprocessing_distributed ngpus_per_node = torch.cuda.device_count() if args.multiprocessing_distributed: # Since we have ngpus_per_node processes per node, the total world_size needs to be adjusted accordingly args.world_size = ngpus_per_node * args.world_size # Use torch.multiprocessing.spawn to launch distributed processes: the main_worker process function mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args)) else: # Simply call main_worker function main_worker(args.gpu, ngpus_per_node, args) def main_worker(gpu, ngpus_per_node, args): args.gpu = gpu if args.gpu is not None: print("Use GPU: {} for training".format(args.gpu)) if args.distributed: if args.dist_url == "env://" and args.rank == -1: args.rank = int(os.environ["RANK"]) if args.multiprocessing_distributed: # For multiprocessing distributed training, rank needs to be the # global rank among all the processes args.rank = args.rank * ngpus_per_node + gpu dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url, world_size=args.world_size, rank=args.rank) model = models.resnext50_32x4d(pretrained=False) model.fc = torch.nn.Linear(in_features=2048, out_features=args.n_out, bias=True) # DataParallel will divide and allocate batch_size to all available GPUs model = torch.nn.DataParallel(model).cuda() # if resume from a pretrained model if args.resume: if os.path.isfile(args.resume): print("=> loading model '{}'".format(args.resume)) checkpoint = torch.load(args.resume) model.load_state_dict(checkpoint['model_state_dict']) if args.freeze_trunk: print('Freezing trunk.') set_parameter_requires_grad(model) # freeze the trunk model.module.fc = torch.nn.Linear(in_features=2048, out_features=1000, bias=True).cuda() else: print("=> no checkpoint found at 
'{}'".format(args.resume)) else: if args.freeze_trunk: print('Freezing trunk.') set_parameter_requires_grad(model) # freeze the trunk model.module.fc = torch.nn.Linear(in_features=2048, out_features=1000, bias=True).cuda() print(model) # define loss function (criterion) and optimizer criterion = nn.CrossEntropyLoss().cuda(args.gpu) optimizer = torch.optim.Adam(model.parameters(), args.lr, weight_decay=args.weight_decay) cudnn.benchmark = True # Save file name if args.resume: sv_name = args.resume savefile_name = 'ft_IN_' + sv_name # str(args.freeze_trunk) + 'fz_IN_' + sv_name[26:] else: savefile_name = str(args.freeze_trunk) + 'fz_IN_MobileNetV2_scratch.tar' # Data loaders basedir = '/misc/vlgscratch4/LakeGroup/emin/robust_vision/imagenet/' traindir = os.path.join(basedir, 'train') valdir = os.path.join(basedir, 'val') normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) train_dataset = datasets.ImageFolder( traindir, transforms.Compose([ transforms.RandomResizedCrop(224), transforms.RandomHorizontalFlip(), transforms.ToTensor(), normalize ]) ) val_dataset = datasets.ImageFolder(valdir, transforms.Compose([ transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(), normalize ])) if args.frac_retained < 1.0: print('Fraction of train data retained:', args.frac_retained) import numpy as np num_train = len(train_dataset) indices = list(range(num_train)) np.random.shuffle(indices) train_idx = indices[:int(args.frac_retained * num_train)] train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idx) train_loader = torch.utils.data.DataLoader( train_dataset, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True, sampler=train_sampler) else: print('Using all of train data') train_loader = torch.utils.data.DataLoader( train_dataset, batch_size=args.batch_size, shuffle=True, num_workers=args.workers, pin_memory=True, sampler=None) val_loader = torch.utils.data.DataLoader( val_dataset, 
batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True) acc1_list = [] val_acc1_list = [] for epoch in range(args.start_epoch, args.epochs): adjust_learning_rate(optimizer, epoch, args) # train for one epoch acc1 = train(train_loader, model, criterion, optimizer, epoch, args) acc1_list.append(acc1) # ... then validate val_acc1 = validate(val_loader, model, args) val_acc1_list.append(val_acc1) torch.save({'acc1_list': acc1_list, 'val_acc1_list': val_acc1_list, 'model_state_dict': model.state_dict(), 'optimizer_state_dict': optimizer.state_dict()}, savefile_name) def train(train_loader, model, criterion, optimizer, epoch, args): batch_time = AverageMeter('Time', ':6.3f') data_time = AverageMeter('Data', ':6.3f') losses = AverageMeter('Loss', ':.4e') top1 = AverageMeter('Acc@1', ':6.2f') top5 = AverageMeter('Acc@5', ':6.2f') progress = ProgressMeter( len(train_loader), [batch_time, data_time, losses, top1, top5], prefix="Epoch: [{}]".format(epoch)) # switch to train mode model.train() end = time.time() for i, (images, target) in enumerate(train_loader): # measure data loading time data_time.update(time.time() - end) if args.gpu is not None: images = images.cuda(args.gpu, non_blocking=True) target = target.cuda(args.gpu, non_blocking=True) # compute output output = model(images) loss = criterion(output, target) # measure accuracy and record loss acc1, acc5 = accuracy(output, target, topk=(1, 5)) losses.update(loss.item(), images.size(0)) top1.update(acc1[0], images.size(0)) top5.update(acc5[0], images.size(0)) # compute gradient and do SGD step optimizer.zero_grad() loss.backward() optimizer.step() # measure elapsed time batch_time.update(time.time() - end) end = time.time() if i % args.print_freq == 0: progress.display(i) return top1.avg.cpu().numpy() def validate(val_loader, model, args): top1 = AverageMeter('Acc@1', ':6.2f') top5 = AverageMeter('Acc@5', ':6.2f') # switch to eval mode model.eval() with torch.no_grad(): for i, (images, 
target) in enumerate(val_loader): if args.gpu is not None: images = images.cuda(args.gpu, non_blocking=True) target = target.cuda(args.gpu, non_blocking=True) # compute output output = model(images) # measure accuracy and record loss acc1, acc5 = accuracy(output, target, topk=(1, 5)) top1.update(acc1[0], images.size(0)) top5.update(acc5[0], images.size(0)) print('End of epoch validation: * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5)) return top1.avg.cpu().numpy() class AverageMeter(object): """Computes and stores the average and current value""" def __init__(self, name, fmt=':f'): self.name = name self.fmt = fmt self.reset() def reset(self): self.val = 0 self.avg = 0 self.sum = 0 self.count = 0 def update(self, val, n=1): self.val = val self.sum += val * n self.count += n self.avg = self.sum / self.count def __str__(self): fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' return fmtstr.format(**self.__dict__) class ProgressMeter(object): def __init__(self, num_batches, meters, prefix=""): self.batch_fmtstr = self._get_batch_fmtstr(num_batches) self.meters = meters self.prefix = prefix def display(self, batch): entries = [self.prefix + self.batch_fmtstr.format(batch)] entries += [str(meter) for meter in self.meters] print('\t'.join(entries)) def _get_batch_fmtstr(self, num_batches): num_digits = len(str(num_batches // 1)) fmt = '{:' + str(num_digits) + 'd}' return '[' + fmt + '/' + fmt.format(num_batches) + ']' def adjust_learning_rate(optimizer, epoch, args): """Decay the learning rate based on schedule""" lr = args.lr for milestone in args.schedule: lr *= 0.2 if epoch >= milestone else 1. 
for param_group in optimizer.param_groups: param_group['lr'] = lr def accuracy(output, target, topk=(1,)): """Computes the accuracy over the k top predictions for the specified values of k""" with torch.no_grad(): maxk = max(topk) batch_size = target.size(0) _, pred = output.topk(maxk, 1, True, True) pred = pred.t() correct = pred.eq(target.view(1, -1).expand_as(pred)) res = [] for k in topk: correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) res.append(correct_k.mul_(100.0 / batch_size)) return res if __name__ == '__main__': main() ================================================ FILE: linear_combination_maps.py ================================================ '''Plots spatial attention maps''' import os import argparse import numpy as np import torch import torchvision.transforms as transforms import torchvision.datasets as datasets import torchvision.models as models from torchvision.utils import make_grid import matplotlib as mp import matplotlib.pyplot as plt def extract_map_layer_7x7(mobilenetV2_model): layer_list = list(mobilenetV2_model.module.features.children()) new_model = torch.nn.Sequential(*layer_list) return new_model def extract_map_layer_14x14(mobilenetV2_model): layer_list = list(mobilenetV2_model.module.features.children()) new_layer_list = layer_list[:-5] new_layer_list.append(layer_list[-5].conv[0]) new_model = torch.nn.Sequential(*new_layer_list) return new_model def load_model(args): model = models.mobilenet_v2(pretrained=True) model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True) model = torch.nn.DataParallel(model).cuda() if args.model_path: if os.path.isfile(args.model_path): checkpoint = torch.load(args.model_path) model.load_state_dict(checkpoint['model_state_dict']) else: print("=> no checkpoint found at '{}'".format(args.model_path)) return model def load_data(data_dir, args): normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) train_dataset = 
datasets.ImageFolder( data_dir, transforms.Compose([transforms.ToTensor(), normalize]) ) train_loader = torch.utils.data.DataLoader( train_dataset, batch_size=args.batch_size, shuffle=True, num_workers=args.workers, pin_memory=True, sampler=None ) return train_loader def predict(data_loader, model, weights, batch_size): # switch to evaluate mode model.eval() with torch.no_grad(): for i, (images, target) in enumerate(data_loader): images = images.cuda() print(images.size()) # compute predictions pred = model(images) if i == 0: break linear_combination_map = torch.einsum('ijkl,j->ikl', pred, weights) x = torch.zeros(batch_size, 3, 7, 7) x[:, 0, :, :] = linear_combination_map x[:, 1, :, :] = linear_combination_map x[:, 2, :, :] = linear_combination_map m = torch.nn.Upsample(scale_factor=32, mode='bicubic') mm = m(x).cuda() mm = torch.sigmoid(10. * mm / torch.std(mm)) return mm * images def show_img(ax, img, save_name): '''Save maps''' npimg = img.cpu().numpy() print(npimg.shape) ax.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest') ax.spines["bottom"].set_visible(False) ax.spines["left"].set_visible(False) ax.spines["right"].set_visible(False) ax.spines["top"].set_visible(False) mp.rcParams['axes.linewidth'] = 0.75 mp.rcParams['patch.linewidth'] = 0.75 mp.rcParams['patch.linewidth'] = 1.15 mp.rcParams['font.sans-serif'] = ['FreeSans'] mp.rcParams['mathtext.fontset'] = 'cm' plt.savefig(save_name, bbox_inches='tight') if __name__ == '__main__': parser = argparse.ArgumentParser(description='Plot spatial attention maps') parser.add_argument('data', metavar='DIR', help='path to dataset') parser.add_argument('--workers', default=4, type=int, help='number of data loading workers (default: 4)') parser.add_argument('--batch-size', default=36, type=int, help='mini-batch size, this is the total ' 'batch size of all GPUs on the current node when ' 'using Data Parallel or Distributed Data Parallel') parser.add_argument('--model-path', default='', type=str, help='path 
to model checkpoint (default: ' 'ImageNet-pretrained)') parser.add_argument('--n_out', default=1000, type=int, help='output dim of pre-trained model') parser.add_argument('--class-idx', default=6, type=int, help='class index for which the maps will be computed') args = parser.parse_args() model = load_model(args) map_layer = extract_map_layer_7x7(model) weights = model.module.classifier.weight.data[args.class_idx, :].cuda() data_loader = load_data(args.data, args) preds = predict(data_loader, map_layer, weights, args.batch_size) print('Preds shape:', preds.shape) fig_pred = plt.figure(figsize=(16, 16), dpi=300) ax_pred = fig_pred.add_subplot(111) grid_pred = make_grid(preds, nrow=12, padding=1, normalize=True, scale_each=False) show_img(ax_pred, grid_pred, 'linear_combination_maps_class_' + str(args.class_idx) + '.pdf') ================================================ FILE: linear_decoding.py ================================================ import argparse import os import random import shutil import time import warnings import numpy as np import torch import torch.nn as nn import torch.nn.parallel import torch.backends.cudnn as cudnn import torch.distributed as dist import torch.optim import torch.multiprocessing as mp import torch.utils.data import torch.utils.data.distributed import torchvision.transforms as transforms import torchvision.datasets as datasets import torchvision.models as models parser = argparse.ArgumentParser(description='Linear decoding with headcam data') parser.add_argument('data', metavar='DIR', help='path to dataset') parser.add_argument('-j', '--workers', default=32, type=int, metavar='N', help='number of data loading workers (default: 32)') parser.add_argument('--epochs', default=100, type=int, metavar='N', help='number of total epochs to run') parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') parser.add_argument('-b', '--batch-size', default=256, type=int,
metavar='N', help='mini-batch size (default: 256), this is the total batch size of all GPUs on the current node ' 'when using Data Parallel or Distributed Data Parallel') parser.add_argument('--lr', '--learning-rate', default=0.0005, type=float, metavar='LR', help='initial learning rate', dest='lr') parser.add_argument('--wd', '--weight-decay', default=0.0, type=float, metavar='W', help='weight decay (default: 0)', dest='weight_decay') parser.add_argument('-p', '--print-freq', default=100, type=int, metavar='N', help='print frequency (default: 100)') parser.add_argument('--world-size', default=-1, type=int, help='number of nodes for distributed training') parser.add_argument('--rank', default=-1, type=int, help='node rank for distributed training') parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str, help='url used to set up distributed training') parser.add_argument('--dist-backend', default='nccl', type=str, help='distributed backend') parser.add_argument('--gpu', default=None, type=int, help='GPU id to use.') parser.add_argument('--multiprocessing-distributed', action='store_true', help='Use multi-processing distributed training to launch ' 'N processes per node, which has N GPUs.
This is the ' 'fastest way to use PyTorch for either single node or ' 'multi node data parallel training') parser.add_argument('--model-name', type=str, default='random', choices=['random', 'imagenet', 'TC-S', 'TC-A', 'TC-Y', 'TC-SAY', 'moco_img_0011', 'moco_temp_0011'], help='evaluated model') parser.add_argument('--num-outs', default=16127, type=int, help='number of outputs in pretrained model') parser.add_argument('--num-classes', default=26, type=int, help='number of classes in downstream classification task') parser.add_argument('--subsample', default=False, action='store_true', help='subsample data?') def set_parameter_requires_grad(model, feature_extracting=True): '''Helper function for setting body to non-trainable''' if feature_extracting: for param in model.parameters(): param.requires_grad = False def load_split_train_test(datadir, args, train_frac=0.5): import numpy as np normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) train_data = datasets.ImageFolder(datadir, transform=transforms.Compose([transforms.ToTensor(), normalize])) test_data = datasets.ImageFolder(datadir, transform=transforms.Compose([transforms.ToTensor(), normalize])) num_train = len(train_data) print('Total data size is', num_train) indices = list(range(num_train)) split = int(np.floor(train_frac * num_train)) np.random.shuffle(indices) if args.subsample: num_data = int(0.1 * num_train) train_idx, test_idx = indices[:(num_data // 2)], indices[(num_data // 2):num_data] else: train_idx, test_idx = indices[:split], indices[split:] print('Training data size is', len(train_idx)) print('Test data size is', len(test_idx)) train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idx) test_sampler = torch.utils.data.sampler.SubsetRandomSampler(test_idx) trainloader = torch.utils.data.DataLoader(train_data, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True, sampler=train_sampler) testloader = 
torch.utils.data.DataLoader(test_data, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True, sampler=test_sampler) return trainloader, testloader def main(): args = parser.parse_args() if args.gpu is not None: warnings.warn('You have chosen a specific GPU. This will completely disable data parallelism.') if args.dist_url == "env://" and args.world_size == -1: args.world_size = int(os.environ["WORLD_SIZE"]) args.distributed = args.world_size > 1 or args.multiprocessing_distributed ngpus_per_node = torch.cuda.device_count() if args.multiprocessing_distributed: # Since we have ngpus_per_node processes per node, the total world_size needs to be adjusted accordingly args.world_size = ngpus_per_node * args.world_size # Use torch.multiprocessing.spawn to launch distributed processes: the main_worker process function mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args)) else: # Simply call main_worker function main_worker(args.gpu, ngpus_per_node, args) def main_worker(gpu, ngpus_per_node, args): args.gpu = gpu if args.gpu is not None: print("Use GPU: {} for training".format(args.gpu)) # model definition num_classes = args.num_classes if args.model_name == 'random': model = models.mobilenet_v2(pretrained=False) set_parameter_requires_grad(model) model.classifier = torch.nn.Linear(in_features=1280, out_features=num_classes, bias=True) model = torch.nn.DataParallel(model).cuda() elif args.model_name == 'imagenet': model = models.mobilenet_v2(pretrained=True) set_parameter_requires_grad(model) model.classifier = torch.nn.Linear(in_features=1280, out_features=num_classes, bias=True) model = torch.nn.DataParallel(model).cuda() elif args.model_name.startswith('moco'): model = models.mobilenet_v2(pretrained=False) model.classifier = torch.nn.Linear(in_features=1280, out_features=args.num_outs, bias=True) checkpoint = torch.load('../self_supervised_models/' + args.model_name + '.pth.tar') # rename moco pre-trained keys state_dict 
= checkpoint['state_dict'] for k in list(state_dict.keys()): # retain only encoder_q up to before the embedding layer if k.startswith('module.encoder_q') and not k.startswith('module.encoder_q.classifier'): # remove prefix state_dict[k[len("module.encoder_q."):]] = state_dict[k] # delete renamed or unused k del state_dict[k] msg = model.load_state_dict(state_dict, strict=False) assert set(msg.missing_keys) == {"classifier.weight", "classifier.bias"} print("=> loaded pre-trained model '{}'".format(args.model_name)) set_parameter_requires_grad(model) # freeze the trunk model.classifier = torch.nn.Linear(in_features=1280, out_features=num_classes, bias=True) model = torch.nn.DataParallel(model).cuda() else: model = models.resnext50_32x4d(pretrained=False) model.fc = torch.nn.Linear(in_features=2048, out_features=args.num_outs, bias=True) model = torch.nn.DataParallel(model).cuda() checkpoint = torch.load(args.model_name + '.tar') model.load_state_dict(checkpoint['model_state_dict']) set_parameter_requires_grad(model) # freeze the trunk model.module.fc = torch.nn.Linear(in_features=2048, out_features=num_classes, bias=True).cuda() # define loss function (criterion) and optimizer criterion = nn.CrossEntropyLoss().cuda(args.gpu) optimizer = torch.optim.Adam(model.parameters(), args.lr, weight_decay=args.weight_decay) cudnn.benchmark = True # Data loading code savefile_name = args.model_name + '_labeledS_5_iid.tar' train_loader, test_loader = load_split_train_test(args.data, args) acc1_list = [] val_acc1_list = [] for epoch in range(args.start_epoch, args.epochs): # train for one epoch acc1 = train(train_loader, model, criterion, optimizer, epoch, args) acc1_list.append(acc1) # validate at end of epoch val_acc1, preds, target, images = validate(test_loader, model, args) val_acc1_list.append(val_acc1) torch.save({'acc1_list': acc1_list, 'val_acc1_list': val_acc1_list, 'model_state_dict': model.state_dict(), 'optimizer_state_dict': optimizer.state_dict(), 'preds': preds, 
'target': target, 'images': images }, savefile_name) def train(train_loader, model, criterion, optimizer, epoch, args): batch_time = AverageMeter('Time', ':6.3f') data_time = AverageMeter('Data', ':6.3f') losses = AverageMeter('Loss', ':.4e') top1 = AverageMeter('Acc@1', ':6.2f') top5 = AverageMeter('Acc@5', ':6.2f') progress = ProgressMeter( len(train_loader), [batch_time, data_time, losses, top1, top5], prefix="Epoch: [{}]".format(epoch)) # switch to train mode model.train() end = time.time() for i, (images, target) in enumerate(train_loader): # measure data loading time data_time.update(time.time() - end) if args.gpu is not None: images = images.cuda(args.gpu, non_blocking=True) target = target.cuda(args.gpu, non_blocking=True) # compute output output = model(images) loss = criterion(output, target) # measure accuracy and record loss acc1, acc5 = accuracy(output, target, topk=(1, 2)) losses.update(loss.item(), images.size(0)) top1.update(acc1[0], images.size(0)) top5.update(acc5[0], images.size(0)) # compute gradient and do SGD step optimizer.zero_grad() loss.backward() optimizer.step() # for param in model.parameters(): # print(param.requires_grad) # measure elapsed time batch_time.update(time.time() - end) end = time.time() if i % args.print_freq == 0: progress.display(i) return top1.avg.cpu().numpy() def validate(val_loader, model, args): batch_time = AverageMeter('Time', ':6.3f') top1 = AverageMeter('Acc@1', ':6.2f') # switch to evaluate mode model.eval() with torch.no_grad(): end = time.time() for i, (images, target) in enumerate(val_loader): if args.gpu is not None: images = images.cuda(args.gpu, non_blocking=True) target = target.cuda(args.gpu, non_blocking=True) # compute output output = model(images) preds = np.argmax(output.cpu().numpy(), axis=1) # measure accuracy and record loss acc1 = accuracy(output, target, topk=(1, )) top1.update(acc1[0].cpu().numpy()[0], images.size(0)) # measure elapsed time batch_time.update(time.time() - end) end = 
time.time() print('* Acc@1 {top1.avg:.3f} '.format(top1=top1)) return top1.avg, preds, target.cpu().numpy(), images.cpu().numpy() class AverageMeter(object): """Computes and stores the average and current value""" def __init__(self, name, fmt=':f'): self.name = name self.fmt = fmt self.reset() def reset(self): self.val = 0 self.avg = 0 self.sum = 0 self.count = 0 def update(self, val, n=1): self.val = val self.sum += val * n self.count += n self.avg = self.sum / self.count def __str__(self): fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})' return fmtstr.format(**self.__dict__) class ProgressMeter(object): def __init__(self, num_batches, meters, prefix=""): self.batch_fmtstr = self._get_batch_fmtstr(num_batches) self.meters = meters self.prefix = prefix def display(self, batch): entries = [self.prefix + self.batch_fmtstr.format(batch)] entries += [str(meter) for meter in self.meters] print('\t'.join(entries)) def _get_batch_fmtstr(self, num_batches): num_digits = len(str(num_batches // 1)) fmt = '{:' + str(num_digits) + 'd}' return '[' + fmt + '/' + fmt.format(num_batches) + ']' def accuracy(output, target, topk=(1,)): """Computes the accuracy over the k top predictions for the specified values of k""" with torch.no_grad(): maxk = max(topk) batch_size = target.size(0) _, pred = output.topk(maxk, 1, True, True) pred = pred.t() correct = pred.eq(target.view(1, -1).expand_as(pred)) res = [] for k in topk: correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) res.append(correct_k.mul_(100.0 / batch_size)) return res if __name__ == '__main__': main() ================================================ FILE: moco/__init__.py ================================================ # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved ================================================ FILE: moco/builder.py ================================================ # Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved import torch import torch.nn as nn class MoCo(nn.Module): """ Build a MoCo model with: a query encoder, a key encoder, and a queue https://arxiv.org/abs/1911.05722 """ def __init__(self, base_encoder, dim=128, K=65536, m=0.999, T=0.07, mlp=False): """ dim: feature dimension (default: 128) K: queue size; number of negative keys (default: 65536) m: moco momentum of updating key encoder (default: 0.999) T: softmax temperature (default: 0.07) """ super(MoCo, self).__init__() self.K = K self.m = m self.T = T # create the encoders # num_classes is the output fc dimension self.encoder_q = base_encoder(num_classes=dim) self.encoder_k = base_encoder(num_classes=dim) # self.encoder_q.classifier = self.encoder_q.classifier[-1] # remove dropout (only for mobilenet_v2) # self.encoder_k.classifier = self.encoder_k.classifier[-1] # remove dropout (only for mobilenet_v2) if mlp: # hack: brute-force replacement dim_mlp = self.encoder_q.fc.weight.shape[1] self.encoder_q.fc = nn.Sequential(nn.Linear(dim_mlp, dim_mlp), nn.ReLU(), self.encoder_q.fc) self.encoder_k.fc = nn.Sequential(nn.Linear(dim_mlp, dim_mlp), nn.ReLU(), self.encoder_k.fc) for param_q, param_k in zip(self.encoder_q.parameters(), self.encoder_k.parameters()): param_k.data.copy_(param_q.data) # initialize param_k.requires_grad = False # not update by gradient # create the queue self.register_buffer("queue", torch.randn(dim, K)) self.queue = nn.functional.normalize(self.queue, dim=0) self.register_buffer("queue_ptr", torch.zeros(1, dtype=torch.long)) @torch.no_grad() def _momentum_update_key_encoder(self): """ Momentum update of the key encoder """ for param_q, param_k in zip(self.encoder_q.parameters(), self.encoder_k.parameters()): param_k.data = param_k.data * self.m + param_q.data * (1. 
- self.m) @torch.no_grad() def _dequeue_and_enqueue(self, keys): # gather keys before updating queue keys = concat_all_gather(keys) batch_size = keys.shape[0] ptr = int(self.queue_ptr) assert self.K % batch_size == 0 # for simplicity # replace the keys at ptr (dequeue and enqueue) self.queue[:, ptr:ptr + batch_size] = keys.T ptr = (ptr + batch_size) % self.K # move pointer self.queue_ptr[0] = ptr @torch.no_grad() def _batch_shuffle_ddp(self, x): """ Batch shuffle, for making use of BatchNorm. *** Only support DistributedDataParallel (DDP) model. *** """ # gather from all gpus batch_size_this = x.shape[0] x_gather = concat_all_gather(x) batch_size_all = x_gather.shape[0] num_gpus = batch_size_all // batch_size_this # random shuffle index idx_shuffle = torch.randperm(batch_size_all).cuda() # broadcast to all gpus torch.distributed.broadcast(idx_shuffle, src=0) # index for restoring idx_unshuffle = torch.argsort(idx_shuffle) # shuffled index for this gpu gpu_idx = torch.distributed.get_rank() idx_this = idx_shuffle.view(num_gpus, -1)[gpu_idx] return x_gather[idx_this], idx_unshuffle @torch.no_grad() def _batch_unshuffle_ddp(self, x, idx_unshuffle): """ Undo batch shuffle. *** Only support DistributedDataParallel (DDP) model. 
*** """ # gather from all gpus batch_size_this = x.shape[0] x_gather = concat_all_gather(x) batch_size_all = x_gather.shape[0] num_gpus = batch_size_all // batch_size_this # restored index for this gpu gpu_idx = torch.distributed.get_rank() idx_this = idx_unshuffle.view(num_gpus, -1)[gpu_idx] return x_gather[idx_this] def forward(self, im_q, im_k): """ Input: im_q: a batch of query images im_k: a batch of key images Output: logits, targets """ # compute query features q = self.encoder_q(im_q) # queries: NxC q = nn.functional.normalize(q, dim=1) # compute key features with torch.no_grad(): # no gradient to keys self._momentum_update_key_encoder() # update the key encoder # shuffle for making use of BN im_k, idx_unshuffle = self._batch_shuffle_ddp(im_k) k = self.encoder_k(im_k) # keys: NxC k = nn.functional.normalize(k, dim=1) # undo shuffle k = self._batch_unshuffle_ddp(k, idx_unshuffle) # compute logits # Einstein sum is more intuitive # positive logits: Nx1 l_pos = torch.einsum('nc,nc->n', [q, k]).unsqueeze(-1) # negative logits: NxK l_neg = torch.einsum('nc,ck->nk', [q, self.queue.clone().detach()]) # logits: Nx(1+K) logits = torch.cat([l_pos, l_neg], dim=1) # apply temperature logits /= self.T # labels: positive key indicators labels = torch.zeros(logits.shape[0], dtype=torch.long).cuda() # dequeue and enqueue self._dequeue_and_enqueue(k) return logits, labels # utils @torch.no_grad() def concat_all_gather(tensor): """ Performs all_gather operation on the provided tensors. *** Warning ***: torch.distributed.all_gather has no gradient. """ tensors_gather = [torch.ones_like(tensor) for _ in range(torch.distributed.get_world_size())] torch.distributed.all_gather(tensors_gather, tensor, async_op=False) output = torch.cat(tensors_gather, dim=0) return output ================================================ FILE: moco/loader.py ================================================ # Copyright (c) Facebook, Inc. and its affiliates. 
All Rights Reserved from PIL import ImageFilter import random class TwoCropsTransform: """Take two random crops of one image as the query and key.""" def __init__(self, base_transform): self.base_transform = base_transform def __call__(self, x): q = self.base_transform(x) k = self.base_transform(x) return [q, k] class GaussianBlur(object): """Gaussian blur augmentation in SimCLR https://arxiv.org/abs/2002.05709""" def __init__(self, sigma=[.1, 2.]): self.sigma = sigma def __call__(self, x): sigma = random.uniform(self.sigma[0], self.sigma[1]) x = x.filter(ImageFilter.GaussianBlur(radius=sigma)) return x ================================================ FILE: moco_img.py ================================================ #!/usr/bin/env python # Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved import argparse import builtins import math import os import random import shutil import time import warnings import torch import torch.nn as nn import torch.nn.parallel import torch.backends.cudnn as cudnn import torch.distributed as dist import torch.optim import torch.multiprocessing as mp import torch.utils.data import torch.utils.data.distributed import torchvision.transforms as transforms import torchvision.datasets as datasets import torchvision.models as models import moco.loader import moco.builder model_names = sorted(name for name in models.__dict__ if name.islower() and not name.startswith("__") and callable(models.__dict__[name])) parser = argparse.ArgumentParser(description='PyTorch ImageNet Training') parser.add_argument('data', metavar='DIR', help='path to dataset') parser.add_argument('-a', '--arch', metavar='ARCH', default='resnet50', choices=model_names, help='model architecture: ' + ' | '.join(model_names) + ' (default: resnet50)') parser.add_argument('-j', '--workers', default=32, type=int, metavar='N', help='number of data loading workers (default: 32)') parser.add_argument('--epochs', default=12, type=int, metavar='N', help='number of 
total epochs to run') parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)') parser.add_argument('-b', '--batch-size', default=256, type=int, metavar='N', help='mini-batch size (default: 256), this is the total ' 'batch size of all GPUs on the current node when ' 'using Data Parallel or Distributed Data Parallel') parser.add_argument('--lr', '--learning-rate', default=0.03, type=float, metavar='LR', help='initial learning rate', dest='lr') parser.add_argument('--schedule', default=[11, 20], nargs='*', type=int, help='learning rate schedule (when to drop lr by 10x)') parser.add_argument('--momentum', default=0.9, type=float, metavar='M', help='momentum of SGD solver') parser.add_argument('--wd', '--weight-decay', default=0, type=float, metavar='W', help='weight decay (default: 0)', dest='weight_decay') parser.add_argument('-p', '--print-freq', default=1000, type=int, metavar='N', help='print frequency (default: 1000)') parser.add_argument('--resume', default='', type=str, metavar='PATH', help='path to latest checkpoint (default: none)') parser.add_argument('--world-size', default=-1, type=int, help='number of nodes for distributed training') parser.add_argument('--rank', default=-1, type=int, help='node rank for distributed training') parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str, help='url used to set up distributed training') parser.add_argument('--dist-backend', default='nccl', type=str, help='distributed backend') parser.add_argument('--seed', default=None, type=int, help='seed for initializing training. ') parser.add_argument('--gpu', default=None, type=int, help='GPU id to use.') parser.add_argument('--multiprocessing-distributed', action='store_true', help='Use multi-processing distributed training to launch ' 'N processes per node, which has N GPUs.
This is the ' 'fastest way to use PyTorch for either single node or ' 'multi node data parallel training') # moco specific configs: parser.add_argument('--moco-dim', default=128, type=int, help='feature dimension (default: 128)') parser.add_argument('--moco-k', default=65536, type=int, help='queue size; number of negative keys (default: 65536)') parser.add_argument('--moco-m', default=0.999, type=float, help='moco momentum of updating key encoder (default: 0.999)') parser.add_argument('--moco-t', default=0.07, type=float, help='softmax temperature (default: 0.07)') # options for moco v2 parser.add_argument('--mlp', action='store_true', help='use mlp head') parser.add_argument('--aug-plus', action='store_true', help='use moco v2 data augmentation') parser.add_argument('--cos', action='store_true', help='use cosine lr schedule') def main(): args = parser.parse_args() if args.seed is not None: random.seed(args.seed) torch.manual_seed(args.seed) cudnn.deterministic = True warnings.warn('You have chosen to seed training. ' 'This will turn on the CUDNN deterministic setting, ' 'which can slow down your training considerably! ' 'You may see unexpected behavior when restarting ' 'from checkpoints.') if args.gpu is not None: warnings.warn('You have chosen a specific GPU. 
This will completely ' 'disable data parallelism.') if args.dist_url == "env://" and args.world_size == -1: args.world_size = int(os.environ["WORLD_SIZE"]) args.distributed = args.world_size > 1 or args.multiprocessing_distributed ngpus_per_node = torch.cuda.device_count() if args.multiprocessing_distributed: # Since we have ngpus_per_node processes per node, the total world_size # needs to be adjusted accordingly args.world_size = ngpus_per_node * args.world_size # Use torch.multiprocessing.spawn to launch distributed processes: the # main_worker process function mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args)) else: # Simply call main_worker function main_worker(args.gpu, ngpus_per_node, args) def main_worker(gpu, ngpus_per_node, args): args.gpu = gpu print(args) # suppress printing if not master if args.multiprocessing_distributed and args.gpu != 0: def print_pass(*args): pass builtins.print = print_pass if args.gpu is not None: print("Use GPU: {} for training".format(args.gpu)) if args.distributed: if args.dist_url == "env://" and args.rank == -1: args.rank = int(os.environ["RANK"]) if args.multiprocessing_distributed: # For multiprocessing distributed training, rank needs to be the # global rank among all the processes args.rank = args.rank * ngpus_per_node + gpu dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url, world_size=args.world_size, rank=args.rank) # create model print("=> creating model '{}'".format(args.arch)) model = moco.builder.MoCo( models.__dict__[args.arch], args.moco_dim, args.moco_k, args.moco_m, args.moco_t, args.mlp) print(model) if args.distributed: # For multiprocessing distributed, DistributedDataParallel constructor # should always set the single device scope, otherwise, # DistributedDataParallel will use all available devices. 
        if args.gpu is not None:
            torch.cuda.set_device(args.gpu)
            model.cuda(args.gpu)
            # When using a single GPU per process and per
            # DistributedDataParallel, we need to divide the batch size
            # ourselves based on the total number of GPUs we have
            args.batch_size = int(args.batch_size / ngpus_per_node)
            args.workers = int((args.workers + ngpus_per_node - 1) / ngpus_per_node)
            model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
        else:
            model.cuda()
            # DistributedDataParallel will divide and allocate batch_size to all
            # available GPUs if device_ids are not set
            model = torch.nn.parallel.DistributedDataParallel(model)
    elif args.gpu is not None:
        torch.cuda.set_device(args.gpu)
        model = model.cuda(args.gpu)
        # comment out the following line for debugging
        raise NotImplementedError("Only DistributedDataParallel is supported.")
    else:
        # AllGather implementation (batch shuffle, queue update, etc.) in
        # this code only supports DistributedDataParallel.
        raise NotImplementedError("Only DistributedDataParallel is supported.")

    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss().cuda(args.gpu)

    optimizer = torch.optim.SGD(model.parameters(), args.lr,
                                momentum=args.momentum,
                                weight_decay=args.weight_decay)

    # optionally resume from a checkpoint
    if args.resume:
        if os.path.isfile(args.resume):
            print("=> loading checkpoint '{}'".format(args.resume))
            if args.gpu is None:
                checkpoint = torch.load(args.resume)
            else:
                # Map model to be loaded to specified single gpu.
                loc = 'cuda:{}'.format(args.gpu)
                checkpoint = torch.load(args.resume, map_location=loc)
            args.start_epoch = checkpoint['epoch']
            model.load_state_dict(checkpoint['state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))

    cudnn.benchmark = True

    # Data loading code
    traindir = args.data
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

    if args.aug_plus:
        # MoCo v2's aug: similar to SimCLR https://arxiv.org/abs/2002.05709
        augmentation = [
            transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
            transforms.RandomApply([
                transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)  # not strengthened
            ], p=0.8),
            transforms.RandomGrayscale(p=0.2),
            transforms.RandomApply([moco.loader.GaussianBlur([.1, 2.])], p=0.5),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize
        ]
    else:
        # MoCo v1's aug: the same as InstDisc https://arxiv.org/abs/1805.01978
        augmentation = [
            transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
            transforms.RandomGrayscale(p=0.2),
            transforms.ColorJitter(0.4, 0.4, 0.4, 0.4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize
        ]

    train_dataset = datasets.ImageFolder(
        traindir,
        moco.loader.TwoCropsTransform(transforms.Compose(augmentation)))

    print('Dataset size:', len(train_dataset))

    if args.distributed:
        train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
    else:
        train_sampler = None

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
        num_workers=args.workers, pin_memory=True, sampler=train_sampler, drop_last=True)

    print('Starting training ...')

    for epoch in range(args.start_epoch, args.epochs):
        if args.distributed:
            train_sampler.set_epoch(epoch)
        adjust_learning_rate(optimizer, epoch, args)

        print('Start of epoch ', epoch)

        # train for one epoch
        train(train_loader, model, criterion, optimizer, epoch, args)

        if not args.multiprocessing_distributed or (args.multiprocessing_distributed and args.rank % ngpus_per_node == 0):
            save_checkpoint({
                'epoch': epoch + 1,
                'arch': args.arch,
                'state_dict': model.state_dict(),
                'optimizer' : optimizer.state_dict(),
            }, is_best=False, filename='moco_img_checkpoint_{:04d}.pth.tar'.format(epoch))


def train(train_loader, model, criterion, optimizer, epoch, args):
    batch_time = AverageMeter('Time', ':6.3f')
    data_time = AverageMeter('Data', ':6.3f')
    losses = AverageMeter('Loss', ':.4e')
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    progress = ProgressMeter(
        len(train_loader),
        [batch_time, data_time, losses, top1, top5],
        prefix="Epoch: [{}]".format(epoch))

    # switch to train mode
    model.train()

    end = time.time()
    for i, (images, _) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        if args.gpu is not None:
            images[0] = images[0].cuda(args.gpu, non_blocking=True)
            images[1] = images[1].cuda(args.gpu, non_blocking=True)

        # compute output
        output, target = model(im_q=images[0], im_k=images[1])
        loss = criterion(output, target)

        # acc1/acc5 are (K+1)-way contrast classifier accuracy
        # measure accuracy and record loss
        acc1, acc5 = accuracy(output, target, topk=(1, 5))
        losses.update(loss.item(), images[0].size(0))
        top1.update(acc1[0], images[0].size(0))
        top5.update(acc5[0], images[0].size(0))

        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % args.print_freq == 0:
            progress.display(i)


def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, filename)
    if is_best:
        shutil.copyfile(filename, 'model_best.pth.tar')


class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)


class ProgressMeter(object):
    def __init__(self, num_batches, meters, prefix=""):
        self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
        self.meters = meters
        self.prefix = prefix

    def display(self, batch):
        entries = [self.prefix + self.batch_fmtstr.format(batch)]
        entries += [str(meter) for meter in self.meters]
        print('\t'.join(entries))

    def _get_batch_fmtstr(self, num_batches):
        num_digits = len(str(num_batches // 1))
        fmt = '{:' + str(num_digits) + 'd}'
        return '[' + fmt + '/' + fmt.format(num_batches) + ']'


def adjust_learning_rate(optimizer, epoch, args):
    """Decay the learning rate based on schedule"""
    lr = args.lr
    if args.cos:  # cosine lr schedule
        lr *= 0.5 * (1. + math.cos(math.pi * epoch / args.epochs))
    else:  # stepwise lr schedule
        for milestone in args.schedule:
            lr *= 0.1 if epoch >= milestone else 1.
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr


def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res


if __name__ == '__main__':
    main()


================================================
FILE: moco_temp.py
================================================
#!/usr/bin/env python
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import argparse
import builtins
import math
import os
import random
import shutil
import time
import warnings

import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models

import moco.loader
import moco.builder
from moco_utils import DistributedProxySampler, ContrastiveBatchSampler

model_names = sorted(name for name in models.__dict__
    if name.islower() and not name.startswith("__")
    and callable(models.__dict__[name]))

parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')
parser.add_argument('data', metavar='DIR', help='path to dataset')
parser.add_argument('-a', '--arch', metavar='ARCH', default='resnet50',
                    choices=model_names,
                    help='model architecture: ' +
                        ' | '.join(model_names) +
                        ' (default: resnet50)')
parser.add_argument('-j', '--workers', default=32, type=int, metavar='N',
                    help='number of data loading workers (default: 32)')
parser.add_argument('--epochs', default=12, type=int, metavar='N',
                    help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
                    help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=256, type=int,
                    metavar='N',
                    help='mini-batch size (default: 256), this is the total '
                         'batch size of all GPUs on the current node when '
                         'using Data Parallel or Distributed Data Parallel')
parser.add_argument('--lr', '--learning-rate', default=0.03, type=float,
                    metavar='LR', help='initial learning rate', dest='lr')
parser.add_argument('--schedule', default=[11, 20], nargs='*', type=int,
                    help='learning rate schedule (when to drop lr by 10x)')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
                    help='momentum of SGD solver')
parser.add_argument('--wd', '--weight-decay', default=0, type=float,
                    metavar='W', help='weight decay (default: 0)',
                    dest='weight_decay')
parser.add_argument('-p', '--print-freq', default=1000, type=int,
                    metavar='N', help='print frequency (default: 1000)')
parser.add_argument('--resume', default='', type=str, metavar='PATH',
                    help='path to latest checkpoint (default: none)')
parser.add_argument('--world-size', default=-1, type=int,
                    help='number of nodes for distributed training')
parser.add_argument('--rank', default=-1, type=int,
                    help='node rank for distributed training')
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
                    help='url used to set up distributed training')
parser.add_argument('--dist-backend', default='nccl', type=str,
                    help='distributed backend')
parser.add_argument('--seed', default=None, type=int,
                    help='seed for initializing training. ')
parser.add_argument('--gpu', default=None, type=int,
                    help='GPU id to use.')
parser.add_argument('--multiprocessing-distributed', action='store_true',
                    help='Use multi-processing distributed training to launch '
                         'N processes per node, which has N GPUs. This is the '
                         'fastest way to use PyTorch for either single node or '
                         'multi node data parallel training')

# moco specific configs:
parser.add_argument('--moco-dim', default=128, type=int,
                    help='feature dimension (default: 128)')
parser.add_argument('--moco-k', default=65536, type=int,
                    help='queue size; number of negative keys (default: 65536)')
parser.add_argument('--moco-m', default=0.999, type=float,
                    help='moco momentum of updating key encoder (default: 0.999)')
parser.add_argument('--moco-t', default=0.07, type=float,
                    help='softmax temperature (default: 0.07)')

# options for moco v2
parser.add_argument('--mlp', action='store_true',
                    help='use mlp head')
parser.add_argument('--aug-plus', action='store_true',
                    help='use moco v2 data augmentation')
parser.add_argument('--cos', action='store_true',
                    help='use cosine lr schedule')


def main():
    args = parser.parse_args()

    if args.seed is not None:
        random.seed(args.seed)
        torch.manual_seed(args.seed)
        cudnn.deterministic = True
        warnings.warn('You have chosen to seed training. '
                      'This will turn on the CUDNN deterministic setting, '
                      'which can slow down your training considerably! '
                      'You may see unexpected behavior when restarting '
                      'from checkpoints.')

    if args.gpu is not None:
        warnings.warn('You have chosen a specific GPU. This will completely '
                      'disable data parallelism.')

    if args.dist_url == "env://" and args.world_size == -1:
        args.world_size = int(os.environ["WORLD_SIZE"])

    args.distributed = args.world_size > 1 or args.multiprocessing_distributed

    ngpus_per_node = torch.cuda.device_count()
    if args.multiprocessing_distributed:
        # Since we have ngpus_per_node processes per node, the total world_size
        # needs to be adjusted accordingly
        args.world_size = ngpus_per_node * args.world_size
        # Use torch.multiprocessing.spawn to launch distributed processes: the
        # main_worker process function
        mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
    else:
        # Simply call main_worker function
        main_worker(args.gpu, ngpus_per_node, args)


def main_worker(gpu, ngpus_per_node, args):
    args.gpu = gpu

    # suppress printing if not master
    if args.multiprocessing_distributed and args.gpu != 0:
        def print_pass(*args):
            pass
        builtins.print = print_pass

    if args.gpu is not None:
        print("Use GPU: {} for training".format(args.gpu))

    if args.distributed:
        if args.dist_url == "env://" and args.rank == -1:
            args.rank = int(os.environ["RANK"])
        if args.multiprocessing_distributed:
            # For multiprocessing distributed training, rank needs to be the
            # global rank among all the processes
            args.rank = args.rank * ngpus_per_node + gpu
        dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
                                world_size=args.world_size, rank=args.rank)
    # create model
    print("=> creating model '{}'".format(args.arch))
    model = moco.builder.MoCo(
        models.__dict__[args.arch],
        args.moco_dim, args.moco_k, args.moco_m, args.moco_t, args.mlp)
    print(model)

    if args.distributed:
        # For multiprocessing distributed, DistributedDataParallel constructor
        # should always set the single device scope, otherwise,
        # DistributedDataParallel will use all available devices.
        if args.gpu is not None:
            torch.cuda.set_device(args.gpu)
            model.cuda(args.gpu)
            # When using a single GPU per process and per
            # DistributedDataParallel, we need to divide the batch size
            # ourselves based on the total number of GPUs we have
            args.batch_size = int(args.batch_size / ngpus_per_node)
            args.workers = int((args.workers + ngpus_per_node - 1) / ngpus_per_node)
            model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
        else:
            model.cuda()
            # DistributedDataParallel will divide and allocate batch_size to all
            # available GPUs if device_ids are not set
            model = torch.nn.parallel.DistributedDataParallel(model)
    elif args.gpu is not None:
        torch.cuda.set_device(args.gpu)
        model = model.cuda(args.gpu)
        # comment out the following line for debugging
        raise NotImplementedError("Only DistributedDataParallel is supported.")
    else:
        # AllGather implementation (batch shuffle, queue update, etc.) in
        # this code only supports DistributedDataParallel.
        raise NotImplementedError("Only DistributedDataParallel is supported.")

    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss().cuda(args.gpu)

    optimizer = torch.optim.SGD(model.parameters(), args.lr,
                                momentum=args.momentum,
                                weight_decay=args.weight_decay)

    # optionally resume from a checkpoint
    if args.resume:
        if os.path.isfile(args.resume):
            print("=> loading checkpoint '{}'".format(args.resume))
            if args.gpu is None:
                checkpoint = torch.load(args.resume)
            else:
                # Map model to be loaded to specified single gpu.
                loc = 'cuda:{}'.format(args.gpu)
                checkpoint = torch.load(args.resume, map_location=loc)
            args.start_epoch = checkpoint['epoch']
            model.load_state_dict(checkpoint['state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer'])
            print("=> loaded checkpoint '{}' (epoch {})"
                  .format(args.resume, checkpoint['epoch']))
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))

    cudnn.benchmark = True

    # Data loading code
    traindir = args.data
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

    if args.aug_plus:
        # MoCo v2's aug: similar to SimCLR https://arxiv.org/abs/2002.05709
        augmentation = [
            transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
            transforms.RandomApply([
                transforms.ColorJitter(0.4, 0.4, 0.4, 0.1)  # not strengthened
            ], p=0.8),
            transforms.RandomGrayscale(p=0.2),
            transforms.RandomApply([moco.loader.GaussianBlur([.1, 2.])], p=0.5),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize
        ]
    else:
        # MoCo v1's aug: the same as InstDisc https://arxiv.org/abs/1805.01978
        augmentation = [
            transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
            transforms.RandomGrayscale(p=0.2),
            transforms.ColorJitter(0.4, 0.4, 0.4, 0.4),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            normalize
        ]

    train_dataset = datasets.ImageFolder(
        traindir,
        transforms.Compose(augmentation))

    print('Dataset size:', len(train_dataset))

    if args.distributed:
        train_sampler = DistributedProxySampler(ContrastiveBatchSampler(
            train_dataset, args.batch_size, 1, False))
    else:
        train_sampler = None

    train_loader = torch.utils.data.DataLoader(train_dataset, shuffle=(train_sampler is None),
                                               num_workers=args.workers, pin_memory=True,
                                               batch_sampler=train_sampler)

    print('Starting training ...')

    for epoch in range(args.start_epoch, args.epochs):
        if args.distributed:
            train_sampler.set_epoch(epoch)
        adjust_learning_rate(optimizer, epoch, args)

        print('Start of epoch ', epoch)

        # train for one epoch
        train(train_loader, model, criterion, optimizer, epoch, args)

        if not args.multiprocessing_distributed or (args.multiprocessing_distributed and args.rank % ngpus_per_node == 0):
            save_checkpoint({
                'epoch': epoch + 1,
                'arch': args.arch,
                'state_dict': model.state_dict(),
                'optimizer' : optimizer.state_dict(),
            }, is_best=False, filename='moco_temp_checkpoint_{:04d}.pth.tar'.format(epoch))


def train(train_loader, model, criterion, optimizer, epoch, args):
    batch_time = AverageMeter('Time', ':6.3f')
    data_time = AverageMeter('Data', ':6.3f')
    losses = AverageMeter('Loss', ':.4e')
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    progress = ProgressMeter(
        len(train_loader),
        [batch_time, data_time, losses, top1, top5],
        prefix="Epoch: [{}]".format(epoch))

    # switch to train mode
    model.train()

    end = time.time()
    for i, (images, _) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        if args.gpu is not None:
            images_0 = images[:images.size(0)//2].cuda(args.gpu, non_blocking=True)
            images_1 = images[images.size(0)//2:].cuda(args.gpu, non_blocking=True)

        # compute output
        output, target = model(im_q=images_0, im_k=images_1)
        loss = criterion(output, target)

        # acc1/acc5 are (K+1)-way contrast classifier accuracy
        # measure accuracy and record loss
        acc1, acc5 = accuracy(output, target, topk=(1, 5))
        losses.update(loss.item(), images_0.size(0))
        top1.update(acc1[0], images_0.size(0))
        top5.update(acc5[0], images_0.size(0))

        # compute gradient and do SGD step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % args.print_freq == 0:
            progress.display(i)


def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
    torch.save(state, filename)
    if is_best:
        shutil.copyfile(filename, 'model_best.pth.tar')


class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)


class ProgressMeter(object):
    def __init__(self, num_batches, meters, prefix=""):
        self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
        self.meters = meters
        self.prefix = prefix

    def display(self, batch):
        entries = [self.prefix + self.batch_fmtstr.format(batch)]
        entries += [str(meter) for meter in self.meters]
        print('\t'.join(entries))

    def _get_batch_fmtstr(self, num_batches):
        num_digits = len(str(num_batches // 1))
        fmt = '{:' + str(num_digits) + 'd}'
        return '[' + fmt + '/' + fmt.format(num_batches) + ']'


def adjust_learning_rate(optimizer, epoch, args):
    """Decay the learning rate based on schedule"""
    lr = args.lr
    if args.cos:  # cosine lr schedule
        lr *= 0.5 * (1. + math.cos(math.pi * epoch / args.epochs))
    else:  # stepwise lr schedule
        for milestone in args.schedule:
            lr *= 0.1 if epoch >= milestone else 1.
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr


def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res


if __name__ == '__main__':
    main()


================================================
FILE: moco_utils.py
================================================
# Defines some util functions
import torch
from torch.utils.data import Sampler
from torch.utils.data.distributed import DistributedSampler


class DistributedProxySampler(DistributedSampler):
    """Sampler that restricts data loading to a subset of input sampler indices.

    It is especially useful in conjunction with
    :class:`torch.nn.parallel.DistributedDataParallel`. In such case, each
    process can pass a DistributedSampler instance as a DataLoader sampler,
    and load a subset of the original dataset that is exclusive to it.

    .. note::
        Input sampler is assumed to be of constant size.

    Arguments:
        sampler: Input data sampler.
        num_replicas (optional): Number of processes participating in
            distributed training.
        rank (optional): Rank of the current process within num_replicas.
    """

    def __init__(self, sampler, num_replicas=None, rank=None):
        super(DistributedProxySampler, self).__init__(sampler, num_replicas=num_replicas, rank=rank, shuffle=False)
        self.sampler = sampler

    def __iter__(self):
        # deterministically shuffle based on epoch
        torch.manual_seed(self.epoch)
        indices = list(self.sampler)

        # add extra samples to make it evenly divisible
        indices += indices[:(self.total_size - len(indices))]
        if len(indices) != self.total_size:
            raise RuntimeError("{} vs {}".format(len(indices), self.total_size))

        # subsample
        indices = indices[self.rank:self.total_size:self.num_replicas]
        if len(indices) != self.num_samples:
            raise RuntimeError("{} vs {}".format(len(indices), self.num_samples))

        return iter(indices)

    def set_epoch(self, epoch):
        self.epoch = epoch


class ContrastiveBatchSampler(Sampler):
    """Batch sampler for temporal contrastive learning.

    Each yielded batch is a list of dataset indices: the first half are random
    anchor frames and the second half are temporally nearby frames. The
    training loop in moco_temp.py splits each batch into these two halves and
    uses them as queries and keys, respectively.
    """

    def __init__(self, data_source, batch_size, pos_window, drop_last):
        self.data_source = data_source
        self.batch_size = batch_size
        self.pos_window = pos_window
        self.drop_last = drop_last
        self.n = len(self.data_source)

    def __iter__(self):
        for i in range(self.n // self.batch_size):
            # anchor indices; torch.randint excludes `high`, so they lie in [0, n-2]
            x = torch.randint(low=0, high=self.n-1, size=(self.batch_size//2,), dtype=torch.int64)
            # positives: anchors shifted by an offset in [-pos_window, pos_window-1]
            y = x + torch.randint(low=-self.pos_window, high=self.pos_window, size=(self.batch_size//2,), dtype=torch.int64)
            # keep shifted indices within the valid range
            y = torch.clamp(y, 0, self.n-1)
            z = x.tolist() + y.tolist()
            yield z

    def __len__(self):
        if self.drop_last:
            return self.n // self.batch_size
        else:
            return (self.n + self.batch_size - 1) // self.batch_size


================================================
FILE: read_saycam.py
================================================
import os
import sys
import argparse
import cv2
import numpy as np

parser = argparse.ArgumentParser(description='Read SAYCam videos')
parser.add_argument('data', metavar='DIR', help='path to SAYCam videos')
parser.add_argument('--save-dir', default='', type=str, help='save directory')
parser.add_argument('--fps', default=5, type=int, help='sampling rate (frames per second)')
parser.add_argument('--seg-len', default=288, type=int, help='segment length (seconds)') if __name__ == '__main__': args = parser.parse_args() file_list = os.listdir(args.data) file_list.sort() class_counter = 0 img_counter = 0 file_counter = 0 final_size = 224 resized_minor_length = 256 edge_filter = False n_imgs_per_class = args.seg_len * args.fps curr_dir_name = os.path.join(args.save_dir, 'class_{:04d}'.format(class_counter)) os.mkdir(curr_dir_name) for file_indx in file_list: file_name = os.path.join(args.data, file_indx) cap = cv2.VideoCapture(file_name) frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)) frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)) frame_rate = int(cap.get(cv2.CAP_PROP_FPS)) # take every sample_rate frames (30: 1fps, 15: 2fps, 10: 3fps, 6: 5fps, 5: 6fps, 3: 10fps, 2: 15fps, 1: 30fps) sample_rate = frame_rate // args.fps + 1 print('Total frame count: ', frame_count) print('Native frame rate: ', frame_rate) fc = 0 ret = True # Resize new_height = frame_height * resized_minor_length // min(frame_height, frame_width) new_width = frame_width * resized_minor_length // min(frame_height, frame_width) while (fc < frame_count): ret, frame = cap.read() if fc % sample_rate == 0 and ret: # Resize resized_frame = cv2.resize(frame, (new_width, new_height), interpolation=cv2.INTER_CUBIC) # Crop height, width, _ = resized_frame.shape startx = width // 2 - (final_size // 2) starty = height // 2 - (final_size // 2) - 16 cropped_frame = resized_frame[starty:starty + final_size, startx:startx + final_size] assert cropped_frame.shape[0] == final_size and cropped_frame.shape[1] == final_size, \ (cropped_frame.shape, height, width) if edge_filter: cropped_frame = cv2.Laplacian(cropped_frame, cv2.CV_64F, ksize=5) img_min = cropped_frame.min() img_max = cropped_frame.max() cropped_frame = np.uint8(255 * (cropped_frame - img_min) / (img_max - img_min)) cv2.imwrite(os.path.join(curr_dir_name, 
'img_{:04d}.jpeg'.format(img_counter)), cropped_frame[::-1, ::-1, :]) img_counter += 1 if img_counter == n_imgs_per_class: img_counter = 0 class_counter += 1 curr_dir_name = os.path.join(args.save_dir, 'class_{:04d}'.format(class_counter)) os.mkdir(curr_dir_name) fc += 1 cap.release() file_counter += 1 print('Completed video {:4d} of {:4d}'.format(file_counter, len(file_list))) ================================================ FILE: scripts/feature_animation.sh ================================================ #!/bin/bash #SBATCH --nodes=1 #SBATCH --ntasks=8 #SBATCH --gres=gpu:1080ti:2 #SBATCH --mem=150GB #SBATCH --time=1:00:00 #SBATCH --array=0 #SBATCH --job-name=feature_animation #SBATCH --output=feature_animation_%A_%a.out module purge module load cuda-10.1 python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/feature_animation.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/feature_animation_imgs_intphys/' --model-path '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/self_supervised_models/TC-SAY.tar' --batch-size 900 --n_out 6269 --feature-idx 600 echo "Done" ================================================ FILE: scripts/feature_animation_class.sh ================================================ #!/bin/bash #SBATCH --nodes=1 #SBATCH --exclude=hpc1,hpc2,hpc3,hpc4,hpc5,hpc6,hpc7,hpc8,hpc9,vine3,vine4,vine6,vine11,vine12,lion17,rose7,rose8,rose9 #SBATCH --ntasks=1 #SBATCH --gres=gpu:4 #SBATCH --mem=100GB #SBATCH --time=1:00:00 #SBATCH --array=0 #SBATCH --job-name=feature_animation #SBATCH --output=feature_animation_%A_%a.out module purge module load cuda-10.1 python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/feature_animation_class.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/feature_animation_computers/' --model-path '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/self_supervised_models/TC-S_labeledS_5_iid.tar' --batch-size 200 --n_out 26 --class-idx 6 echo "Done" ================================================ FILE: 
scripts/highly_activating_imgs.sh
================================================
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=1:00:00
#SBATCH --array=0
#SBATCH --job-name=activating_imgs
#SBATCH --output=activating_imgs_%A_%a.out

module purge
module load cuda-10.1

python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/highly_activating_imgs.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --n_out 26 --model-path 'mobilenetV2_S_5fps_2000cls_coloraug_labeled.tar'

echo "Done"

================================================
FILE: scripts/hog_baseline.sh
================================================
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=100GB
#SBATCH --time=6:00:00
#SBATCH --array=0
#SBATCH --job-name=hog
#SBATCH --output=hog_%A_%a.out

module purge

python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/hog_baseline.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/toybox_1fps/'

echo "Done"

================================================
FILE: scripts/imagenet_finetuning.sh
================================================
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --gres=gpu:titanrtx:4
#SBATCH --mem=150GB
#SBATCH --time=48:00:00
#SBATCH --array=0
#SBATCH --job-name=finetune_imgnet
#SBATCH --output=finetune_imgnet_%A_%a.out

module purge
module load cuda-10.1

#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --freeze-trunk --n_out 6269 --resume 'resnext50_32x4d_augmentstrong_batch256_True_SAY_5_288_epoch_15.tar'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --freeze-trunk --n_out 2765 --resume 'resnext50_32x4d_augmentstrong_batch256_True_S_5_288_epoch_10.tar'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --freeze-trunk --n_out 1786 --resume 'resnext50_32x4d_augmentstrong_batch256_True_A_5_288_epoch_10.tar'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --freeze-trunk --n_out 1718 --resume 'resnext50_32x4d_augmentstrong_batch256_True_Y_5_288_epoch_10.tar'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --frac-retained 0.01 --n_out 6269 --resume 'resnext50_32x4d_augmentstrong_batch256_True_SAY_5_288_epoch_15.tar'
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --freeze-trunk --n_out 1000 --resume 'ft_IN_resnext50_32x4d_augmentstrong_batch256_True_SAY_5_288_epoch_15.tar'

echo "Done"

================================================
FILE: scripts/linear_combination_maps.sh
================================================
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=1:00:00
#SBATCH --array=0
#SBATCH --job-name=linear_maps
#SBATCH --output=linear_maps_%A_%a.out

module purge
module load cuda-10.1

python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_combination_maps.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --n_out 26 --model-path 'mobilenetV2_S_5fps_2000cls_coloraug_labeled.tar'

echo "Done"

================================================
FILE: scripts/linear_decoding.sh
================================================
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gres=gpu:titanrtx:2
#SBATCH --mem=150GB
#SBATCH --time=12:00:00
#SBATCH --array=0
#SBATCH --job-name=linear_decoding
#SBATCH --output=linear_decoding_%A_%a.out

module purge
module load cuda-10.1

#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --model-name 'random' --num-classes 26 --subsample
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --model-name 'moco_img_0005' --num-classes 26 --subsample
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --model-name 'moco_temp_0005' --num-classes 26 --subsample
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_5/' --model-name 'TC-S' --num-outs 2765
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --model-name 'mobilenetV2_A_5fps_2000cls_coloraug' --num-outs 1786
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --model-name 'mobilenetV2_Y_5fps_2000cls_coloraug' --num-outs 1718
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_5/' --model-name 'TC-SAY' --num-outs 6269
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_5/' --model-name 'TC-S' --num-outs 2765

echo "Done"

================================================
FILE: scripts/moco_img.sh
================================================
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --gres=gpu:titanrtx:4
#SBATCH --mem=150GB
#SBATCH --time=48:00:00
#SBATCH --array=0
#SBATCH --job-name=moco_img
#SBATCH --output=moco_img_%A_%a.out

module purge
module load cuda-10.1

python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/moco_img.py \
    -a resnext50_32x4d \
    --lr 0.015 \
    --batch-size 256 \
    --mlp \
    --moco-t 0.2 \
    --aug-plus --cos \
    --dist-url 'tcp://localhost:10001' \
    --multiprocessing-distributed \
    --world-size 1 --rank 0 \
    --start-epoch 0 \
    --resume '' \
    '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_data_5fps_2000cls_pytorch/'

echo "Done"

================================================
FILE: scripts/moco_temp.sh
================================================
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --gres=gpu:v100:4
#SBATCH --mem=150GB
#SBATCH --time=48:00:00
#SBATCH --array=0
#SBATCH --job-name=moco_temp
#SBATCH --output=moco_temp_%A_%a.out

module purge
module load cuda-10.1

python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/moco_temp.py \
    -a resnext50_32x4d \
    --lr 0.015 \
    --batch-size 256 \
    --mlp \
    --moco-t 0.2 \
    --aug-plus --cos \
    --dist-url 'tcp://localhost:10001' \
    --multiprocessing-distributed \
    --world-size 1 --rank 0 \
    --start-epoch 0 \
    --resume '' \
    '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_data_5fps_2000cls_pytorch/'

echo "Done"

================================================
FILE: scripts/read_saycam.sh
================================================
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=48:00:00
#SBATCH --array=0
#SBATCH --job-name=read_saycam
#SBATCH --output=read_saycam_%A_%a.out

module purge
module load cuda-10.1

python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/read_saycam.py \
    --save-dir '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_15fps_288s' \
    --fps 15 \
    --seg-len 288 \
    '/misc/vlgscratch4/LakeGroup/emin/headcam/data_2/S'

echo "Done"

================================================
FILE: scripts/selectivities.sh
================================================
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2
#SBATCH --mem=64GB
#SBATCH --time=1:00:00
#SBATCH --array=0
#SBATCH --job-name=selectivity
#SBATCH --output=selectivity_%A_%a.out

module purge
module load cuda-10.1

python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/selectivities.py \
    --n_out 1000 \
    --model-path '' \
    --layer 18 \
    '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/'

echo "Done"

================================================
FILE: scripts/temporal_classification.sh
================================================
#!/bin/bash

#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --gres=gpu:v100:4
#SBATCH --mem=150GB
#SBATCH --time=48:00:00
#SBATCH --array=0
#SBATCH --job-name=tempclas
#SBATCH --output=tempclas_%A_%a.out

module purge
module load cuda-10.1

#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/temporal_classification.py --model 'resnext50_32x4d' --n_out 6269 --resume 'resnext50_32x4d_augmentstrong_batch256_True_SAY_5_288_epoch_15.tar' --start-epoch 16 '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/SAY_data_5fps_2000cls_pytorch/'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/temporal_classification.py --model 'resnext50_32x4d' --n_out 2765 '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_data_5fps_2000cls_pytorch/'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/temporal_classification.py --model 'resnext50_32x4d' --n_out 1786 '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/A_data_5fps_2000cls_pytorch/'
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/temporal_classification.py --model 'resnext50_32x4d' --n_out 1718 '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/Y_data_5fps_2000cls_pytorch/'

echo "Done"

================================================
FILE: selectivities.py
================================================
'''Measure single feature class selectivities'''
import os
import argparse
import numpy as np
import torch
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from torchvision.utils import make_grid


def extract_map_layer_7x7(mobilenetV2_model):
    layer_list = list(mobilenetV2_model.module.features.children())
    new_model = torch.nn.Sequential(*layer_list)

    return new_model


def extract_map_layer_14x14(mobilenetV2_model, layer):
    layer_list = list(mobilenetV2_model.module.features.children())
    new_layer_list = layer_list[:-layer]
    new_layer_list.append(layer_list[-layer].conv[0])
    new_model = torch.nn.Sequential(*new_layer_list)

    return new_model


def load_model(args):
    model = models.mobilenet_v2(pretrained=True)
    model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True)
    model = torch.nn.DataParallel(model).cuda()

    if args.model_path:
        if os.path.isfile(args.model_path):
            checkpoint = torch.load(args.model_path)
            model.load_state_dict(checkpoint['model_state_dict'])
        else:
            print("=> no checkpoint found at '{}'".format(args.model_path))

    return model


def load_data(data_dir, args):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

    train_dataset = datasets.ImageFolder(
        data_dir,
        transforms.Compose([transforms.ToTensor(), normalize])
    )

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=False,
        num_workers=args.workers, pin_memory=True, sampler=None
    )

    return train_loader


def predict(data_loader, model):
    targets = []
    preds = []

    # switch to evaluate mode
    model.eval()

    with torch.no_grad():
        for i, (images, target) in enumerate(data_loader):
            images = images.cuda()

            # compute predictions
            pred = model(images)
            pred = torch.mean(pred, dim=(2, 3))

            targets.append(target.cpu().numpy())
            preds.append(pred.cpu().numpy())

            print('Iter:', i)

    targets = np.concatenate(targets, axis=0)
    preds = np.concatenate(preds, axis=0)

    print('Targets size:', targets.shape)
    print('Preds size:', preds.shape)

    return targets, preds


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Measure single feature class selectivities')
    parser.add_argument('data', metavar='DIR', help='path to dataset')
    parser.add_argument('--workers', default=4, type=int, help='number of data loading workers (default: 4)')
    parser.add_argument('--batch-size', default=580, type=int,
                        help='mini-batch size, this is the total '
                             'batch size of all GPUs on the current node when '
                             'using Data Parallel or Distributed Data Parallel')
    parser.add_argument('--model-path', default='', type=str,
                        help='path to model checkpoint '
                             '(default: ImageNet-pretrained)')
    parser.add_argument('--n_out', default=1000, type=int, help='output dim of pre-trained model')
    parser.add_argument('--layer', default=1, type=int, choices=[1, 2, 6, 10, 14, 18], help='which layer?')

    args = parser.parse_args()

    model = load_model(args)

    if args.layer == 1:
        map_layer = extract_map_layer_7x7(model)
    else:
        map_layer = extract_map_layer_14x14(model, args.layer)

    data_loader = load_data(args.data, args)
    targets, preds = predict(data_loader, map_layer)

    n_classes = 26
    n_neurons = preds.shape[1]

    class_matrix_mean = np.zeros((n_neurons, n_classes))
    class_matrix_std = np.zeros((n_neurons, n_classes))

    for i in range(n_neurons):
        for j in range(n_classes):
            aux_vec = preds[targets == j, i]
            class_matrix_mean[i, j] = np.mean(aux_vec)
            class_matrix_std[i, j] = np.std(aux_vec)

    sorted_mean = np.sort(class_matrix_mean, axis=1)
    selectivity = (sorted_mean[:, -1] - np.mean(sorted_mean[:, :-1], axis=1)) / \
                  (sorted_mean[:, -1] + np.mean(sorted_mean[:, :-1], axis=1))

    print('Most selective 10 features:', np.argsort(selectivity)[-10:])
    print('Highest 10 selectivities:', np.sort(selectivity)[-10:])
    print('Selectivity shape:', selectivity.shape)

    np.save('selectivity_' + str(args.layer) + '.npy', selectivity)

================================================
FILE: temporal_classification.py
================================================
import argparse
import os
import random
import shutil
import time
import warnings
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from utils import GaussianBlur

parser = argparse.ArgumentParser(description='Temporal classification with headcam data')
parser.add_argument('data', metavar='DIR', help='path to dataset')
parser.add_argument('--model', default='resnet50',
                    choices=['resnet50', 'resnext101_32x8d', 'resnext50_32x4d', 'mobilenet_v2'], help='model')
parser.add_argument('-j', '--workers', default=32, type=int, metavar='N',
                    help='number of data loading workers (default: 32)')
parser.add_argument('--epochs', default=16, type=int, metavar='N', help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=256, type=int, metavar='N',
                    help='mini-batch size (default: 256), this is the total batch size of all GPUs on the current node '
                         'when using Data Parallel or Distributed Data Parallel')
parser.add_argument('--lr', '--learning-rate', default=0.0005, type=float, metavar='LR',
                    help='initial learning rate', dest='lr')
parser.add_argument('--wd', '--weight-decay', default=0.0, type=float, metavar='W',
                    help='weight decay (default: 0)', dest='weight_decay')
parser.add_argument('-p', '--print-freq', default=10000, type=int, metavar='N',
                    help='print frequency (default: 10000)')
parser.add_argument('--resume', default='', type=str, metavar='PATH', help='path to latest checkpoint (default: none)')
parser.add_argument('--world-size', default=-1, type=int, help='number of nodes for distributed training')
parser.add_argument('--rank', default=-1, type=int, help='node rank for distributed training')
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
                    help='url used to set up distributed training')
parser.add_argument('--dist-backend', default='nccl', type=str, help='distributed backend')
parser.add_argument('--gpu', default=None, type=int, help='GPU id to use.')
parser.add_argument('--multiprocessing-distributed', action='store_true',
                    help='Use multi-processing distributed training to launch '
                         'N processes per node, which has N GPUs. This is the '
                         'fastest way to use PyTorch for either single node or '
                         'multi node data parallel training')
parser.add_argument('--n_out', default=1000, type=int, help='output dim')
parser.add_argument('--augmentation', default=True, action='store_false', help='whether to use data augmentation?')


def main():
    args = parser.parse_args()
    print(args)

    if args.gpu is not None:
        warnings.warn('You have chosen a specific GPU. This will completely disable data parallelism.')

    if args.dist_url == "env://" and args.world_size == -1:
        args.world_size = int(os.environ["WORLD_SIZE"])

    args.distributed = args.world_size > 1 or args.multiprocessing_distributed

    ngpus_per_node = torch.cuda.device_count()
    if args.multiprocessing_distributed:
        # Since we have ngpus_per_node processes per node, the total world_size needs to be adjusted accordingly
        args.world_size = ngpus_per_node * args.world_size
        # Use torch.multiprocessing.spawn to launch distributed processes: the main_worker process function
        mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
    else:
        # Simply call main_worker function
        main_worker(args.gpu, ngpus_per_node, args)


def main_worker(gpu, ngpus_per_node, args):
    args.gpu = gpu

    if args.gpu is not None:
        print("Use GPU: {} for training".format(args.gpu))

    print('Model:', args.model)
    model = models.__dict__[args.model](pretrained=False)
    if args.model.startswith('res'):
        model.fc = torch.nn.Linear(in_features=2048, out_features=args.n_out, bias=True)
    else:
        model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True)

    # DataParallel will divide and allocate batch_size to all available GPUs
    model = torch.nn.DataParallel(model).cuda()

    # define loss function (criterion) and optimizer
    criterion = nn.CrossEntropyLoss().cuda(args.gpu)
    optimizer = torch.optim.Adam(model.parameters(), args.lr, weight_decay=args.weight_decay)

    cudnn.benchmark = True

    if args.resume:
        if os.path.isfile(args.resume):
            print(args.resume)
            checkpoint = torch.load(args.resume)
            model.load_state_dict(checkpoint['model_state_dict'])
            optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
        else:
            print("=> no checkpoint found at '{}'".format(args.resume))

    savefile_name = args.model + '_augmentstrong_batch256_' + str(args.augmentation) + '_Y_5_288'

    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

    if args.augmentation:
        train_dataset = datasets.ImageFolder(
            args.data,
            transforms.Compose([
                transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
                transforms.RandomApply([transforms.ColorJitter(0.9, 0.9, 0.9, 0.5)], p=0.9),
                transforms.RandomGrayscale(p=0.2),
                transforms.RandomApply([GaussianBlur([.1, 2.])], p=0.5),
                transforms.RandomHorizontalFlip(),
                transforms.ToTensor(),
                normalize
            ])
        )
    else:
        train_dataset = datasets.ImageFolder(
            args.data,
            transforms.Compose([
                transforms.ToTensor(),
                normalize
            ])
        )

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=True,
        num_workers=args.workers, pin_memory=True, sampler=None
    )

    acc1_list = []

    for epoch in range(args.start_epoch, args.epochs):
        # train for one epoch
        acc1 = train(train_loader, model, criterion, optimizer, epoch, args)
        acc1_list.append(acc1)

        torch.save({'acc1_list': acc1_list,
                    'model_state_dict': model.state_dict(),
                    'optimizer_state_dict': optimizer.state_dict()}, savefile_name + '_epoch_' + str(epoch) + '.tar')


def train(train_loader, model, criterion, optimizer, epoch, args):
    batch_time = AverageMeter('Time', ':6.3f')
    data_time = AverageMeter('Data', ':6.3f')
    losses = AverageMeter('Loss', ':.4e')
    top1 = AverageMeter('Acc@1', ':6.2f')
    top5 = AverageMeter('Acc@5', ':6.2f')
    progress = ProgressMeter(
        len(train_loader),
        [batch_time, data_time, losses, top1, top5],
        prefix="Epoch: [{}]".format(epoch))

    # switch to train mode
    model.train()

    end = time.time()
    for i, (images, target) in enumerate(train_loader):
        # measure data loading time
        data_time.update(time.time() - end)

        if args.gpu is not None:
            images = images.cuda(args.gpu, non_blocking=True)
        target = target.cuda(args.gpu, non_blocking=True)

        # compute output
        output = model(images)
        loss = criterion(output, target)

        # measure accuracy and record loss
        acc1, acc5 = accuracy(output, target, topk=(1, 5))
        losses.update(loss.item(), images.size(0))
        top1.update(acc1[0], images.size(0))
        top5.update(acc5[0], images.size(0))

        # compute gradient and do optimizer (Adam) step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # measure elapsed time
        batch_time.update(time.time() - end)
        end = time.time()

        if i % args.print_freq == 0:
            progress.display(i)

    return top1.avg.cpu().numpy()


class AverageMeter(object):
    """Computes and stores the average and current value"""
    def __init__(self, name, fmt=':f'):
        self.name = name
        self.fmt = fmt
        self.reset()

    def reset(self):
        self.val = 0
        self.avg = 0
        self.sum = 0
        self.count = 0

    def update(self, val, n=1):
        self.val = val
        self.sum += val * n
        self.count += n
        self.avg = self.sum / self.count

    def __str__(self):
        fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
        return fmtstr.format(**self.__dict__)


class ProgressMeter(object):
    def __init__(self, num_batches, meters, prefix=""):
        self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
        self.meters = meters
        self.prefix = prefix

    def display(self, batch):
        entries = [self.prefix + self.batch_fmtstr.format(batch)]
        entries += [str(meter) for meter in self.meters]
        print('\t'.join(entries))

    def _get_batch_fmtstr(self, num_batches):
        num_digits = len(str(num_batches // 1))
        fmt = '{:' + str(num_digits) + 'd}'
        return '[' + fmt + '/' + fmt.format(num_batches) + ']'


def accuracy(output, target, topk=(1,)):
    """Computes the accuracy over the k top predictions for the specified values of k"""
    with torch.no_grad():
        maxk = max(topk)
        batch_size = target.size(0)

        _, pred = output.topk(maxk, 1, True, True)
        pred = pred.t()
        correct = pred.eq(target.view(1, -1).expand_as(pred))

        res = []
        for k in topk:
            # reshape (rather than view) also works when `correct` is non-contiguous
            correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
            res.append(correct_k.mul_(100.0 / batch_size))
        return res


if __name__ == '__main__':
    main()

================================================
FILE: utils.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from PIL import ImageFilter
import random


class GaussianBlur(object):
    """Gaussian blur augmentation in SimCLR https://arxiv.org/abs/2002.05709"""

    def __init__(self, sigma=[.1, 2.]):
        self.sigma = sigma

    def __call__(self, x):
        sigma = random.uniform(self.sigma[0], self.sigma[1])
        x = x.filter(ImageFilter.GaussianBlur(radius=sigma))
        return x
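
# Illustrative sketch (not part of the repository): the per-feature selectivity index that
# selectivities.py computes from the class-mean activation matrix, i.e.
# (max class mean - mean of the other class means) / (max class mean + mean of the other class means).
# The function name `selectivity_index` and the toy numbers are hypothetical. Plain Python lists stand in
# for the NumPy arrays used in selectivities.py, and the sketch assumes non-negative activations whose
# sum is non-zero, so the denominator does not vanish.

```python
def selectivity_index(class_means):
    """Selectivity of one feature given its mean activation for each class."""
    if len(class_means) < 2:
        raise ValueError("need mean activations for at least two classes")
    ordered = sorted(class_means)
    best = ordered[-1]                             # mean activation for the preferred class
    rest = sum(ordered[:-1]) / len(ordered[:-1])   # average over all remaining classes
    return (best - rest) / (best + rest)


# Made-up class means for two hypothetical features over four classes.
uniform_feature = [1.0, 1.0, 1.0, 1.0]     # responds equally to every class
selective_feature = [3.0, 1.0, 1.0, 1.0]   # responds most strongly to one class

print(selectivity_index(uniform_feature))    # 0.0
print(selectivity_index(selective_feature))  # 0.5
```

# Under the non-negativity assumption the index lies in [0, 1]: 0 means the feature does not prefer
# any class, and values near 1 mean it responds almost exclusively to a single class.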