Repository: eminorhan/baby-vision
Branch: master
Commit: f0340acd873d
Files: 32
Total size: 130.0 KB
Directory structure:
gitextract_uh48hnz9/
├── .gitignore
├── LICENSE
├── README.md
├── feature_animation.py
├── feature_animation_class.py
├── highly_activating_imgs.py
├── hog_baseline.py
├── imagenet_finetuning.py
├── linear_combination_maps.py
├── linear_decoding.py
├── moco/
│   ├── __init__.py
│   ├── builder.py
│   └── loader.py
├── moco_img.py
├── moco_temp.py
├── moco_utils.py
├── read_saycam.py
├── scripts/
│   ├── feature_animation.sh
│   ├── feature_animation_class.sh
│   ├── highly_activating_imgs.sh
│   ├── hog_baseline.sh
│   ├── imagenet_finetuning.sh
│   ├── linear_combination_maps.sh
│   ├── linear_decoding.sh
│   ├── moco_img.sh
│   ├── moco_temp.sh
│   ├── read_saycam.sh
│   ├── selectivities.sh
│   └── temporal_classification.sh
├── selectivities.py
├── temporal_classification.py
└── utils.py
================================================
FILE CONTENTS
================================================
================================================
FILE: .gitignore
================================================
# File extensions
*.out
*.mp4
# Directories
/__pycache__
/moco/__pycache__
================================================
FILE: LICENSE
================================================
MIT License
Copyright (c) 2021 Emin Orhan
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
================================================
FILE: README.md
================================================
# Self-supervised learning through the eyes of a child
This repository contains code for reproducing the results reported in the following paper:
Orhan AE, Gupta VV, Lake BM (2020) [Self-supervised learning through the eyes of a child.](https://arxiv.org/abs/2007.16189) *Advances in Neural Information Processing Systems 33 (NeurIPS 2020)*.
## Requirements
* pytorch == 1.5.1
* torchvision == 0.6.1
Slightly older or newer versions will probably work fine as well.
## Datasets
This project uses the SAYCam dataset described in the following paper:
Sullivan J, Mei M, Perfors A, Wojcik EH, Frank MC (2020) [SAYCam: A large, longitudinal audiovisual dataset recorded from the infant’s perspective.](https://psyarxiv.com/fy8zx/) PsyArXiv.
The dataset is hosted on the [Databrary](https://nyu.databrary.org/) repository for behavioral science. Unfortunately, we are unable to publicly share the SAYCam dataset here due to the terms of use. However, interested researchers can apply for access to the dataset with approval from their institution's IRB.
In addition, this project also uses the Toybox dataset for evaluation purposes. The Toybox dataset is publicly available at [this address](https://aivaslab.github.io/toybox/).
## Code description
* [`temporal_classification.py`](https://github.com/eminorhan/baby-vision/blob/master/temporal_classification.py): trains temporal classification models as described in the paper. This file uses code recycled from the PyTorch ImageNet training [example](https://github.com/pytorch/examples/tree/master/imagenet).
* [`read_saycam.py`](https://github.com/eminorhan/baby-vision/blob/master/read_saycam.py): SAYCam video-to-image reader.
* [`moco`](https://github.com/eminorhan/baby-vision/tree/master/moco) directory contains helper files for training static and temporal MoCo models. The code here was modified from [Facebook's MoCo repository](https://github.com/facebookresearch/moco).
* [`moco_img.py`](https://github.com/eminorhan/baby-vision/blob/master/moco_img.py): trains an image-based MoCo model as described in the paper. This code was modified from [Facebook's MoCo repository](https://github.com/facebookresearch/moco).
* [`moco_temp.py`](https://github.com/eminorhan/baby-vision/blob/master/moco_temp.py): trains a temporal MoCo model as described in the paper. This code was also modified from [Facebook's MoCo repository](https://github.com/facebookresearch/moco).
* [`moco_utils.py`](https://github.com/eminorhan/baby-vision/blob/master/moco_utils.py): some utility functions for MoCo training.
* [`linear_decoding.py`](https://github.com/eminorhan/baby-vision/blob/master/linear_decoding.py): evaluates self-supervised models on downstream linear classification tasks.
* [`linear_combination_maps.py`](https://github.com/eminorhan/baby-vision/blob/master/linear_combination_maps.py): plots spatial attention maps as in Figure 4b and Figure 6 in the paper.
* [`highly_activating_imgs.py`](https://github.com/eminorhan/baby-vision/blob/master/highly_activating_imgs.py): finds highly activating images for a given feature as in Figure 7b in the paper.
* [`selectivities.py`](https://github.com/eminorhan/baby-vision/blob/master/selectivities.py): measures the class selectivity indices of all features in a given layer as in Figure 7a in the paper.
* [`hog_baseline.py`](https://github.com/eminorhan/baby-vision/blob/master/hog_baseline.py): runs the HOG baseline model as described in the paper.
* [`imagenet_finetuning.py`](https://github.com/eminorhan/baby-vision/blob/master/imagenet_finetuning.py): ImageNet evaluations.
* [`feature_animation.py`](https://github.com/eminorhan/baby-vision/blob/master/feature_animation.py) and [`feature_animation_class.py`](https://github.com/eminorhan/baby-vision/blob/master/feature_animation_class.py): Some tools for visualizing the learned features.
For specific usage examples, please see the slurm scripts provided in the [`scripts`](https://github.com/eminorhan/baby-vision/tree/master/scripts) directory.
## Pre-trained models
### ResNeXt
Since the publication of the paper, we have found that training larger capacity models for longer with the temporal classification objective significantly improves the evaluation results. Hence, we provide below pre-trained `resnext50_32x4d` type models that are currently our best models trained with the SAYCam data. We encourage people to use these new models instead of the `mobilenet_v2` type models reported in the paper (the pre-trained `mobilenet_v2` models reported in the paper are also provided below for the record).
Four pre-trained `resnext50_32x4d` models are provided here: temporal classification models trained on data from the individual children in the SAYCam dataset (`TC-S-resnext`, `TC-A-resnext`, `TC-Y-resnext`) and a temporal classification model trained on data from all three children (`TC-SAY-resnext`). These models were all trained for 16 epochs (with batch size 256) with the following data augmentation pipeline:
```python
import torchvision.transforms as tr
# Assumption: GaussianBlur is the class defined in this repository's moco/loader.py
from moco.loader import GaussianBlur

tr.Compose([
    tr.RandomResizedCrop(224, scale=(0.2, 1.)),
    tr.RandomApply([tr.ColorJitter(0.9, 0.9, 0.9, 0.5)], p=0.9),
    tr.RandomGrayscale(p=0.2),
    tr.RandomApply([GaussianBlur([.1, 2.])], p=0.5),
    tr.RandomHorizontalFlip(),
    tr.ToTensor(),
    tr.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
```
This data augmentation pipeline is similar to that used in [the SimCLR paper](https://arxiv.org/abs/2002.05709) with slightly larger random crops and slightly stronger color augmentation. Here are some evaluation results for these `resnext50_32x4d` models (to download the models, click on the links over the model names):
| Model | Toybox (*iid*) | Toybox (*exemplar*) | ImageNet (*linear*) | ImageNet (*1% ft + linear*) |
| ----- |:--------------:|:-------------------:|:-------------------:|:---------------------------:|
| [`TC-SAY-resnext`](https://drive.google.com/file/d/1I-HvIeuupsE88yS6eff_nE6pHpEmTVPG/view?usp=sharing) | **90.0** | **57.5** | **36.0** | **45.6** |
| [`TC-S-resnext`](https://drive.google.com/file/d/14tZeOtK1Jd64ioxPwzwf2jblriN7Jgue/view?usp=sharing) | 88.5 | 54.9 | -- | -- |
| [`TC-A-resnext`](https://drive.google.com/file/d/1aQuWfb4O0xL0PALRJpYIUHk0tsyrujDF/view?usp=sharing) | 86.8 | 50.4 | -- | -- |
| [`TC-Y-resnext`](https://drive.google.com/file/d/1sB12pdnVEZsgVKiVdZyS0l4x24_T5zCj/view?usp=sharing) | 87.0 | 53.0 | -- | -- |
Here, **ImageNet (*linear*)** refers to the top-1 validation accuracy on ImageNet with only a linear classifier trained on top of the frozen features, and **ImageNet (*1% ft + linear*)** is similar but with the entire model first fine-tuned on 1% of the ImageNet training data (~12800 images). Note that these are results from a single run, so you may observe slightly different numbers.
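As a rough illustration of the 1% setting, the sketch below draws a reproducible random subset of training indices. It is not the code used for the reported results: the helper name `subsample_indices`, the fixed seed, and the round figure of 1,280,000 training images are assumptions made only for this example.

```python
import random


def subsample_indices(n_total, frac_retained, seed=0):
    """Return a sorted random subset holding about `frac_retained` of range(n_total)."""
    if not 0.0 < frac_retained <= 1.0:
        raise ValueError("frac_retained must be in (0, 1]")
    n_keep = max(1, round(n_total * frac_retained))
    rng = random.Random(seed)  # fixed seed so the same subset is drawn every time
    return sorted(rng.sample(range(n_total), n_keep))


# Hypothetical usage: keep 1% of a training set assumed to hold 1,280,000 images.
subset = subsample_indices(1_280_000, 0.01)
print(len(subset))
```

The resulting index list could then be wrapped with `torch.utils.data.Subset` around the training dataset before building the data loader.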
These models come with the temporal classification heads attached. To load these models, please do something along the lines of:
```python
import torch
import torchvision.models as models
model = models.resnext50_32x4d(pretrained=False)
model.fc = torch.nn.Linear(in_features=2048, out_features=n_out, bias=True)
model = torch.nn.DataParallel(model).cuda()
checkpoint = torch.load('TC-SAY-resnext.tar')
model.load_state_dict(checkpoint['model_state_dict'])
```
where `n_out` should be 6269 for `TC-SAY-resnext`, 2765 for `TC-S-resnext`, 1786 for `TC-A-resnext`, and 1718 for `TC-Y-resnext`. The differences here are due to the different lengths of the datasets.
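Because the snippet above loads the checkpoint into a `DataParallel`-wrapped model, the saved parameter names carry a `module.` prefix. The following is a minimal sketch, under that assumption, of loading the same checkpoints on a machine without a GPU. The names `N_OUT`, `strip_module_prefix`, and `load_on_cpu` are hypothetical helpers introduced here and are not part of the repository.

```python
from collections import OrderedDict

# Output dimensions quoted in the text above.
N_OUT = {
    "TC-SAY-resnext": 6269,
    "TC-S-resnext": 2765,
    "TC-A-resnext": 1786,
    "TC-Y-resnext": 1718,
}


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of `state_dict` with a leading `prefix` removed from each key."""
    return OrderedDict(
        (key[len(prefix):] if key.startswith(prefix) else key, value)
        for key, value in state_dict.items()
    )


def load_on_cpu(checkpoint_path, model_name):
    """Build a plain (non-DataParallel) ResNeXt and load a checkpoint on the CPU."""
    # Imported lazily so the helpers above can be used without PyTorch installed.
    import torch
    import torchvision.models as models

    model = models.resnext50_32x4d(pretrained=False)
    model.fc = torch.nn.Linear(in_features=2048, out_features=N_OUT[model_name], bias=True)
    checkpoint = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(strip_module_prefix(checkpoint["model_state_dict"]))
    return model.eval()
```

For example, `load_on_cpu('TC-SAY-resnext.tar', 'TC-SAY-resnext')` would return the model in evaluation mode, assuming the checkpoint file sits in the working directory.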
In addition, please find below the best performing ImageNet models reported above: a model with a linear ImageNet classifier trained on top of the frozen features of `TC-SAY-resnext` (`TC-SAY-resnext-IN-linear`) and a model that was first fine-tuned with 1% of the ImageNet training data (`TC-SAY-resnext-IN-1pt-linear`):
* [`TC-SAY-resnext-IN-linear`](https://drive.google.com/file/d/1Qo0_1RwgOsr-JM3lP4ILWRY0WflnS7On/view?usp=sharing)
* [`TC-SAY-resnext-IN-1pt-linear`](https://drive.google.com/file/d/1lvCG3L1_-gdqWDMD41yTIbuNBpzpUOQq/view?usp=sharing)
You can load these models in the same way as described above. Since these are ImageNet models, `n_out` should be set to 1000.
### MobileNet
The following are the pre-trained `mobilenet_v2` type models reported in the paper:
* [TC-S-mobilenet](https://drive.google.com/file/d/1DVJjpaGhoBPNmlO7jXpwEX3lSCk2ZUCa/view?usp=sharing) (69.4 MB)
* [TC-A-mobilenet](https://drive.google.com/file/d/1uQvJBbuy6P0uCW0HYs1wNgawRU8sGLhC/view?usp=sharing) (54.4 MB)
* [TC-Y-mobilenet](https://drive.google.com/file/d/1TTndiiiqSiCMdZjwYZPKQySZot4ipCrG/view?usp=sharing) (53.3 MB)
* [TC-SAY-mobilenet](https://drive.google.com/file/d/1zeidpBaXqqWCeeYj-fMI7V7x9EiAGH6Q/view?usp=sharing) (123.3 MB)
## Acknowledgments
We are very grateful to the volunteers who contributed recordings to the SAYCam dataset. We thank Jessica Sullivan for her generous assistance with the dataset. We also thank the team behind the Toybox dataset, as well as the developers of PyTorch and torchvision for making this work possible. This project was partly funded by the NSF Award 1922658 NRT-HDR: FUTURE Foundations, Translation, and Responsibility for Data Science.
================================================
FILE: feature_animation.py
================================================
'''Animating features on short clips'''
import os
import argparse
import numpy as np
import torch
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from torchvision.utils import make_grid
import matplotlib as mp
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import matplotlib.cm as cm
# TODO: combine the map extraction functions into a single function
# TODO: combine model loading functions into a single function
def extract_map_layer_7x7_res(res_model):
    layer_list = list(res_model.module.children())[:-2]
    new_model = torch.nn.Sequential(*layer_list)
    return new_model


def extract_map_layer_7x7(mobilenetV2_model):
    layer_list = list(mobilenetV2_model.module.features.children())
    new_model = torch.nn.Sequential(*layer_list)
    return new_model


def extract_map_layer_14x14(mobilenetV2_model):
    layer_list = list(mobilenetV2_model.module.features.children())
    new_layer_list = layer_list[:-5]
    new_layer_list.append(layer_list[-5].conv[0])
    new_model = torch.nn.Sequential(*new_layer_list)
    return new_model


def load_model_res(args):
    model = models.resnext50_32x4d(pretrained=False)
    model.fc = torch.nn.Linear(in_features=2048, out_features=args.n_out, bias=True)
    model = torch.nn.DataParallel(model).cuda()
    if args.model_path:
        if os.path.isfile(args.model_path):
            checkpoint = torch.load(args.model_path)
            model.load_state_dict(checkpoint['model_state_dict'])
        else:
            print("=> no checkpoint found at '{}'".format(args.model_path))
    return model


def load_model(args):
    model = models.mobilenet_v2(pretrained=True)
    model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True)
    model = torch.nn.DataParallel(model).cuda()
    if args.model_path:
        if os.path.isfile(args.model_path):
            checkpoint = torch.load(args.model_path)
            model.load_state_dict(checkpoint['model_state_dict'])
        else:
            print("=> no checkpoint found at '{}'".format(args.model_path))
    return model


def load_data(data_dir, args):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    train_dataset = datasets.ImageFolder(
        data_dir,
        transforms.Compose([transforms.Resize(224), transforms.ToTensor(), normalize])
    )
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=False,
        num_workers=args.workers, pin_memory=True, sampler=None
    )
    return train_loader
def predict(data_loader, model, batch_size, feature_idx):
    # switch to evaluate mode
    model.eval()
    preds_list = []
    imgs_list = []
    with torch.no_grad():
        for i, (images, target) in enumerate(data_loader):
            images = images.cuda()
            # compute predictions
            preds = model(images)
            preds = preds[:, feature_idx, :, :]
            preds_list.append(preds)
            imgs_list.append(images)
    preds = torch.cat(preds_list, 0)
    images = torch.cat(imgs_list, 0)
    print('Images shape:', images.size())
    print('Preds shape:', preds.size())
    # Copy activation map to all channels and upsample to image size
    x = torch.zeros(preds.size()[0], 3, 7, 7)
    x[:, 0, :, :] = preds
    x[:, 1, :, :] = preds
    x[:, 2, :, :] = preds
    m = torch.nn.Upsample(scale_factor=32, mode='bicubic')
    upsampled_maps = m(x).cuda()
    # upsampled_maps = torch.sigmoid(10. * upsampled_maps / torch.std(upsampled_maps))
    upsampled_maps = upsampled_maps.cpu().numpy()
    images = images.cpu().numpy()
    return upsampled_maps, images


def show_img(ax, img, save_name):
    '''Save maps'''
    npimg = img.cpu().numpy()
    print(npimg.shape)
    ax.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest')
    ax.spines["bottom"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)
    mp.rcParams['axes.linewidth'] = 0.75
    mp.rcParams['patch.linewidth'] = 1.15
    mp.rcParams['font.sans-serif'] = ['FreeSans']
    mp.rcParams['mathtext.fontset'] = 'cm'
    plt.savefig(save_name, bbox_inches='tight')
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Plot spatial attention maps')
    parser.add_argument('data', metavar='DIR', help='path to dataset')
    parser.add_argument('--workers', default=32, type=int, help='number of data loading workers (default: 32)')
    parser.add_argument('--batch-size', default=900, type=int,
                        help='mini-batch size, this is the total '
                             'batch size of all GPUs on the current node when '
                             'using Data Parallel or Distributed Data Parallel')
    parser.add_argument('--model-path', default='', type=str,
                        help='path to model checkpoint (default: ImageNet-pretrained)')
    parser.add_argument('--n_out', default=2765, type=int, help='output dim of pre-trained model')
    parser.add_argument('--feature-idx', default=1, type=int, help='feature index for which the maps will be computed')
    args = parser.parse_args()
    model = load_model(args)
    map_layer = extract_map_layer_7x7(model)
    data_loader = load_data(args.data, args)
    preds, images = predict(data_loader, map_layer, args.batch_size, args.feature_idx)
    # rescale the activation maps to [0, 255] so they can index the colormap
    preds = preds - preds.min()
    preds = preds / preds.max()
    preds = np.uint8(255 * preds)
    images = images - images.min()
    images = images / images.max()
    # images = np.uint8(255 * images)
    fig, ax = plt.subplots()
    ax.set_axis_off()
    ax.set_title('Feature: ' + str(args.feature_idx))
    jet = cm.get_cmap("jet")
    jet_colors = jet(np.arange(256))[:, :3]
    preds = jet_colors[preds[:, 0, :, :]]
    # overlay the colored activation maps on the images
    masked_imgs = 1.0 * preds + np.transpose(images, (0, 2, 3, 1))
    masked_imgs = np.uint8(255 * masked_imgs / masked_imgs.max())
    imgs = []
    # animate at most 900 frames; the cap avoids an IndexError on shorter clips
    for i in range(min(900, len(masked_imgs))):
        im = ax.imshow(masked_imgs[i])
        if i == 0:
            im = ax.imshow(masked_imgs[i])
        imgs.append([im])
    ani = animation.ArtistAnimation(fig, imgs, interval=200, blit=True, repeat_delay=1000)
    # To save the animation, use e.g.
    ani.save('intphys_feature_animation_' + str(args.feature_idx) + '.mp4')
================================================
FILE: feature_animation_class.py
================================================
'''Animating features on short clips'''
import os
import argparse
import numpy as np
import torch
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from torchvision.utils import make_grid
import matplotlib as mp
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import matplotlib.cm as cm
# TODO: combine the map extraction functions into a single function
# TODO: combine model loading functions into a single function
def extract_map_layer_7x7_res(res_model):
    layer_list = list(res_model.module.children())[:-2]
    new_model = torch.nn.Sequential(*layer_list)
    return new_model


def extract_map_layer_7x7(mobilenetV2_model):
    layer_list = list(mobilenetV2_model.module.features.children())
    new_model = torch.nn.Sequential(*layer_list)
    return new_model


def extract_map_layer_14x14(mobilenetV2_model):
    layer_list = list(mobilenetV2_model.module.features.children())
    new_layer_list = layer_list[:-5]
    new_layer_list.append(layer_list[-5].conv[0])
    new_model = torch.nn.Sequential(*new_layer_list)
    return new_model


def load_model_res(args):
    model = models.resnext50_32x4d(pretrained=False)
    model.fc = torch.nn.Linear(in_features=2048, out_features=args.n_out, bias=True)
    model = torch.nn.DataParallel(model).cuda()
    if args.model_path:
        if os.path.isfile(args.model_path):
            checkpoint = torch.load(args.model_path)
            model.load_state_dict(checkpoint['model_state_dict'])
        else:
            print("=> no checkpoint found at '{}'".format(args.model_path))
    return model


def load_model(args):
    model = models.mobilenet_v2(pretrained=True)
    model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True)
    model = torch.nn.DataParallel(model).cuda()
    if args.model_path:
        if os.path.isfile(args.model_path):
            checkpoint = torch.load(args.model_path)
            model.load_state_dict(checkpoint['model_state_dict'])
        else:
            print("=> no checkpoint found at '{}'".format(args.model_path))
    return model


def load_data(data_dir, args):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    train_dataset = datasets.ImageFolder(
        data_dir,
        transforms.Compose([transforms.ToTensor(), normalize])
    )
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=False,
        num_workers=args.workers, pin_memory=True, sampler=None
    )
    return train_loader
def predict(data_loader, model, batch_size, weights):
    # switch to evaluate mode
    model.eval()
    preds_list = []
    imgs_list = []
    with torch.no_grad():
        for i, (images, target) in enumerate(data_loader):
            images = images.cuda()
            # compute predictions
            preds = model(images)
            preds_list.append(preds)
            imgs_list.append(images)
    preds = torch.cat(preds_list, 0)
    images = torch.cat(imgs_list, 0)
    print('Images shape:', images.size())
    print('Preds shape:', preds.size())
    # weight each feature map by the classifier weight of the chosen class and sum over features
    linear_combination_map = torch.einsum('ijkl,j->ikl', preds, weights)
    # Copy activation map to all channels and upsample to image size
    x = torch.zeros(preds.size()[0], 3, 7, 7)
    x[:, 0, :, :] = linear_combination_map
    x[:, 1, :, :] = linear_combination_map
    x[:, 2, :, :] = linear_combination_map
    m = torch.nn.Upsample(scale_factor=32, mode='bicubic')
    upsampled_maps = m(x).cuda()
    # upsampled_maps = torch.sigmoid(10. * upsampled_maps / torch.std(upsampled_maps))
    upsampled_maps = upsampled_maps.cpu().numpy()
    images = images.cpu().numpy()
    return upsampled_maps, images


def show_img(ax, img, save_name):
    '''Save maps'''
    npimg = img.cpu().numpy()
    print(npimg.shape)
    ax.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest')
    ax.spines["bottom"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)
    mp.rcParams['axes.linewidth'] = 0.75
    mp.rcParams['patch.linewidth'] = 1.15
    mp.rcParams['font.sans-serif'] = ['FreeSans']
    mp.rcParams['mathtext.fontset'] = 'cm'
    plt.savefig(save_name, bbox_inches='tight')
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Plot spatial attention maps')
    parser.add_argument('data', metavar='DIR', help='path to dataset')
    parser.add_argument('--workers', default=32, type=int, help='number of data loading workers (default: 32)')
    parser.add_argument('--batch-size', default=500, type=int,
                        help='mini-batch size, this is the total '
                             'batch size of all GPUs on the current node when '
                             'using Data Parallel or Distributed Data Parallel')
    parser.add_argument('--model-path', default='', type=str,
                        help='path to model checkpoint (default: ImageNet-pretrained)')
    parser.add_argument('--n_out', default=26, type=int, help='output dim of pre-trained model')
    parser.add_argument('--class-idx', default=1, type=int, help='class index for which the maps will be computed')
    args = parser.parse_args()
    model = load_model(args)
    map_layer = extract_map_layer_7x7(model)
    # classifier weights of the chosen class, one weight per feature map
    weights = model.module.classifier.weight.data[args.class_idx, :].cuda()
    data_loader = load_data(args.data, args)
    preds, images = predict(data_loader, map_layer, args.batch_size, weights)
    # rescale the maps to [0, 255] so they can index the colormap
    preds = preds - preds.min()
    preds = preds / preds.max()
    preds = np.uint8(255 * preds)
    images = images - images.min()
    images = images / images.max()
    # images = np.uint8(255 * images)
    fig, ax = plt.subplots()
    ax.set_axis_off()
    ax.set_title('Class: ' + str(args.class_idx))
    jet = cm.get_cmap("jet")
    jet_colors = jet(np.arange(256))[:, :3]
    preds = jet_colors[preds[:, 0, :, :]]
    # overlay the colored maps on the images
    masked_imgs = 1.0 * preds + np.transpose(images, (0, 2, 3, 1))
    masked_imgs = np.uint8(255 * masked_imgs / masked_imgs.max())
    imgs = []
    # animate at most 200 frames; the cap avoids an IndexError on shorter clips
    for i in range(min(200, len(masked_imgs))):
        im = ax.imshow(masked_imgs[i])
        if i == 0:
            im = ax.imshow(masked_imgs[i])
        imgs.append([im])
    ani = animation.ArtistAnimation(fig, imgs, interval=200, blit=True, repeat_delay=1000)
    # To save the animation, use e.g.
    ani.save('computers_feature_animation_' + str(args.class_idx) + '.mp4')
================================================
FILE: highly_activating_imgs.py
================================================
'''Plots highly activating images'''
import os
import argparse
import numpy as np
import torch
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from torchvision.utils import make_grid
import matplotlib as mp
import matplotlib.pyplot as plt
def extract_map_layer_7x7(mobilenetV2_model):
    layer_list = list(mobilenetV2_model.module.features.children())
    new_model = torch.nn.Sequential(*layer_list)
    return new_model


def extract_map_layer_14x14(mobilenetV2_model, layer):
    layer_list = list(mobilenetV2_model.module.features.children())
    new_layer_list = layer_list[:-layer]
    new_layer_list.append(layer_list[-layer].conv[0])
    new_model = torch.nn.Sequential(*new_layer_list)
    return new_model


def load_model(args):
    model = models.mobilenet_v2(pretrained=True)
    model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True)
    model = torch.nn.DataParallel(model).cuda()
    if args.model_path:
        if os.path.isfile(args.model_path):
            checkpoint = torch.load(args.model_path)
            model.load_state_dict(checkpoint['model_state_dict'])
        else:
            print("=> no checkpoint found at '{}'".format(args.model_path))
    return model


def load_data(data_dir, args):
    normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    train_dataset = datasets.ImageFolder(
        data_dir,
        transforms.Compose([transforms.ToTensor(), normalize])
    )
    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=args.batch_size, shuffle=True,
        num_workers=args.workers, pin_memory=True, sampler=None
    )
    return train_loader
def predict(data_loader, model, neuron_idx):
    # switch to evaluate mode
    model.eval()
    with torch.no_grad():
        for i, (images, target) in enumerate(data_loader):
            images = images.cuda()
            # compute predictions
            pred = model(images)
            # spatially averaged response of the chosen feature for each image
            pred_mean = torch.mean(pred, dim=(2, 3))
            pred_mean = pred_mean[:, neuron_idx]
            # only the first (shuffled) batch is used
            if i == 0:
                break
    # sort the images in that batch by decreasing mean activation
    _, indices = torch.sort(pred_mean, descending=True)
    images = images[indices, :, :, :]
    return images


def show_img(ax, img, save_name):
    '''Save maps'''
    npimg = img.cpu().numpy()
    print(npimg.shape)
    ax.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest')
    ax.spines["bottom"].set_visible(False)
    ax.spines["left"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.spines["top"].set_visible(False)
    mp.rcParams['axes.linewidth'] = 0.75
    mp.rcParams['patch.linewidth'] = 1.15
    mp.rcParams['font.sans-serif'] = ['FreeSans']
    mp.rcParams['mathtext.fontset'] = 'cm'
    plt.savefig(save_name, bbox_inches='tight')
if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Plot highly activating images for a given feature')
    parser.add_argument('data', metavar='DIR', help='path to dataset')
    parser.add_argument('--workers', default=4, type=int, help='number of data loading workers (default: 4)')
    parser.add_argument('--batch-size', default=1024, type=int,
                        help='mini-batch size, this is the total '
                             'batch size of all GPUs on the current node when '
                             'using Data Parallel or Distributed Data Parallel')
    parser.add_argument('--model-path', default='', type=str, help='path to latest checkpoint (default: none)')
    parser.add_argument('--n_out', default=1000, type=int, help='output dim')
    parser.add_argument('--neuron_idx', default=276, type=int, help='neuron index')
    args = parser.parse_args()
    model = load_model(args)
    map_layer = extract_map_layer_7x7(model)
    data_loader = load_data(args.data, args)
    imgs = predict(data_loader, map_layer, neuron_idx=args.neuron_idx)
    print('Imgs shape', imgs.shape)
    print('Plotting the top 10 images')
    fig_img = plt.figure(figsize=(16, 16), dpi=300)
    # integer subplot spec; string specs such as '111' are rejected by newer matplotlib releases
    ax_img = fig_img.add_subplot(111)
    grid_img = make_grid(imgs[:10, :, :, :], nrow=10, padding=2, normalize=True, scale_each=False)
    show_img(ax_img, grid_img, 'highly_activating_imgs_neuron_' + str(args.neuron_idx) + '.pdf')
================================================
FILE: hog_baseline.py
================================================
'''HoG baseline'''
import os
import argparse
import numpy as np
from skimage.feature import hog
from skimage.io import imread
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
parser = argparse.ArgumentParser(description='Linear decoding with HoG model')
parser.add_argument('data', metavar='DIR', help='path to dataset')
parser.add_argument('--subsample', default=False, action='store_true', help='subsample data?')
if __name__ == '__main__':
    args = parser.parse_args()
    c_list = os.listdir(args.data)
    c_list.sort()
    print('Class list:', c_list)
    imgs = []
    labels = []
    label_counter = 0
    file_counter = 0
    for c in c_list:
        curr_dir = os.path.join(args.data, c)
        f_list = os.listdir(curr_dir)
        f_list.sort()
        print('Reading class:', c)
        for f in f_list:
            f_path = os.path.join(curr_dir, f)
            img = imread(f_path)
            feats = hog(img, orientations=9, pixels_per_cell=(16, 16), cells_per_block=(3, 3), block_norm='L2',
                        visualize=False, transform_sqrt=False, feature_vector=True, multichannel=True)
            if args.subsample:
                if file_counter % 10 == 0:
                    imgs.append(feats)
                    labels.append(label_counter)
            else:
                imgs.append(feats)
                labels.append(label_counter)
            file_counter += 1
        label_counter += 1
    imgs = np.vstack(imgs)
    labels = np.array(labels)
    print('Imgs shape:', imgs.shape)
    print('Labels shape:', labels.shape)
    print('Splitting dataset')
    X_train, X_test, y_train, y_test = train_test_split(imgs, labels, test_size=0.5)
    print('Fitting training data')
    clf = SGDClassifier(loss="hinge", penalty="l2", alpha=0.0001, max_iter=250)
    clf.fit(X_train, y_train)
    print('Computing predictions')
    pred_test = clf.predict(X_test)
    test_acc = np.mean(y_test == pred_test)
    pred_train = clf.predict(X_train)
    train_acc = np.mean(y_train == pred_train)
    print('Test accuracy', test_acc)
    print('Train accuracy', train_acc)
================================================
FILE: imagenet_finetuning.py
================================================
import argparse
import os
import time
import warnings
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
parser = argparse.ArgumentParser(description='ImageNet fine-tuning or linear classification')
parser.add_argument('-j', '--workers', default=32, type=int, metavar='N',
                    help='number of data loading workers (default: 32)')
parser.add_argument('--epochs', default=25, type=int, metavar='N', help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=256, type=int, metavar='N',
                    help='mini-batch size (default: 256), this is the total batch size of all GPUs on the current node '
                         'when using Data Parallel or Distributed Data Parallel')
parser.add_argument('--lr', '--learning-rate', default=0.0005, type=float, metavar='LR', help='initial learning rate',
                    dest='lr')
parser.add_argument('--wd', '--weight-decay', default=0.0, type=float, metavar='W', help='weight decay (default: 0)',
                    dest='weight_decay')
parser.add_argument('-p', '--print-freq', default=5000, type=int, metavar='N', help='print frequency (default: 5000)')
parser.add_argument('--schedule', default=[23, 24], nargs='*', type=int,
                    help='learning rate schedule (when to drop lr by a ratio)')
parser.add_argument('--resume', default='', type=str, metavar='PATH', help='path to latest checkpoint (default: none)')
parser.add_argument('--world-size', default=-1, type=int, help='number of nodes for distributed training')
parser.add_argument('--rank', default=-1, type=int, help='node rank for distributed training')
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
                    help='url used to set up distributed training')
parser.add_argument('--dist-backend', default='nccl', type=str, help='distributed backend')
parser.add_argument('--gpu', default=None, type=int, help='GPU id to use.')
parser.add_argument('--multiprocessing-distributed', action='store_true',
                    help='Use multi-processing distributed training to launch '
                         'N processes per node, which has N GPUs. This is the '
                         'fastest way to use PyTorch for either single node or '
                         'multi node data parallel training')
parser.add_argument('--n_out', default=20, type=int, help='output dim')
parser.add_argument('--freeze-trunk', default=False, action='store_true', help='freeze trunk?')
parser.add_argument('--frac-retained', default=1.0, type=float, help='fraction of tr data retained')
def set_parameter_requires_grad(model, feature_extracting=True):
'''Helper function for setting body to non-trainable'''
if feature_extracting:
for param in model.parameters():
param.requires_grad = False
for param in model.module.fc.parameters():
print(param.shape)
param.requires_grad = True
def main():
args = parser.parse_args()
if args.gpu is not None:
warnings.warn('You have chosen a specific GPU. This will completely disable data parallelism.')
if args.dist_url == "env://" and args.world_size == -1:
args.world_size = int(os.environ["WORLD_SIZE"])
args.distributed = args.world_size > 1 or args.multiprocessing_distributed
ngpus_per_node = torch.cuda.device_count()
if args.multiprocessing_distributed:
# Since we have ngpus_per_node processes per node, the total world_size needs to be adjusted accordingly
args.world_size = ngpus_per_node * args.world_size
# Use torch.multiprocessing.spawn to launch distributed processes: the main_worker process function
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
else:
# Simply call main_worker function
main_worker(args.gpu, ngpus_per_node, args)
def main_worker(gpu, ngpus_per_node, args):
args.gpu = gpu
if args.gpu is not None:
print("Use GPU: {} for training".format(args.gpu))
if args.distributed:
if args.dist_url == "env://" and args.rank == -1:
args.rank = int(os.environ["RANK"])
if args.multiprocessing_distributed:
# For multiprocessing distributed training, rank needs to be the
# global rank among all the processes
args.rank = args.rank * ngpus_per_node + gpu
dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
world_size=args.world_size, rank=args.rank)
model = models.resnext50_32x4d(pretrained=False)
model.fc = torch.nn.Linear(in_features=2048, out_features=args.n_out, bias=True)
# DataParallel will divide and allocate batch_size to all available GPUs
model = torch.nn.DataParallel(model).cuda()
# if resume from a pretrained model
if args.resume:
if os.path.isfile(args.resume):
print("=> loading model '{}'".format(args.resume))
checkpoint = torch.load(args.resume)
model.load_state_dict(checkpoint['model_state_dict'])
if args.freeze_trunk:
print('Freezing trunk.')
set_parameter_requires_grad(model) # freeze the trunk
model.module.fc = torch.nn.Linear(in_features=2048, out_features=1000, bias=True).cuda()
else:
print("=> no checkpoint found at '{}'".format(args.resume))
else:
if args.freeze_trunk:
print('Freezing trunk.')
set_parameter_requires_grad(model) # freeze the trunk
model.module.fc = torch.nn.Linear(in_features=2048, out_features=1000, bias=True).cuda()
print(model)
# define loss function (criterion) and optimizer
criterion = nn.CrossEntropyLoss().cuda(args.gpu)
optimizer = torch.optim.Adam(model.parameters(), args.lr, weight_decay=args.weight_decay)
cudnn.benchmark = True
# Save file name
if args.resume:
sv_name = args.resume
savefile_name = 'ft_IN_' + sv_name # str(args.freeze_trunk) + 'fz_IN_' + sv_name[26:]
else:
# NOTE: the file name says MobileNetV2, but the model built above is resnext50_32x4d
savefile_name = str(args.freeze_trunk) + 'fz_IN_MobileNetV2_scratch.tar'
# Data loaders
basedir = '/misc/vlgscratch4/LakeGroup/emin/robust_vision/imagenet/'
traindir = os.path.join(basedir, 'train')
valdir = os.path.join(basedir, 'val')
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
train_dataset = datasets.ImageFolder(
traindir,
transforms.Compose([
transforms.RandomResizedCrop(224),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize
])
)
val_dataset = datasets.ImageFolder(valdir, transforms.Compose([
transforms.Resize(256),
transforms.CenterCrop(224),
transforms.ToTensor(),
normalize
]))
if args.frac_retained < 1.0:
print('Fraction of train data retained:', args.frac_retained)
import numpy as np
num_train = len(train_dataset)
indices = list(range(num_train))
np.random.shuffle(indices)
train_idx = indices[:int(args.frac_retained * num_train)]
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idx)
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=args.batch_size, shuffle=False,
num_workers=args.workers, pin_memory=True, sampler=train_sampler)
else:
print('Using all of train data')
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=args.batch_size, shuffle=True,
num_workers=args.workers, pin_memory=True, sampler=None)
val_loader = torch.utils.data.DataLoader(
val_dataset, batch_size=args.batch_size, shuffle=False,
num_workers=args.workers, pin_memory=True)
acc1_list = []
val_acc1_list = []
for epoch in range(args.start_epoch, args.epochs):
adjust_learning_rate(optimizer, epoch, args)
# train for one epoch
acc1 = train(train_loader, model, criterion, optimizer, epoch, args)
acc1_list.append(acc1)
# ... then validate
val_acc1 = validate(val_loader, model, args)
val_acc1_list.append(val_acc1)
torch.save({'acc1_list': acc1_list,
'val_acc1_list': val_acc1_list,
'model_state_dict': model.state_dict(),
'optimizer_state_dict': optimizer.state_dict()}, savefile_name)
def train(train_loader, model, criterion, optimizer, epoch, args):
batch_time = AverageMeter('Time', ':6.3f')
data_time = AverageMeter('Data', ':6.3f')
losses = AverageMeter('Loss', ':.4e')
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
progress = ProgressMeter(
len(train_loader),
[batch_time, data_time, losses, top1, top5],
prefix="Epoch: [{}]".format(epoch))
# switch to train mode
model.train()
end = time.time()
for i, (images, target) in enumerate(train_loader):
# measure data loading time
data_time.update(time.time() - end)
if args.gpu is not None:
images = images.cuda(args.gpu, non_blocking=True)
target = target.cuda(args.gpu, non_blocking=True)
# compute output
output = model(images)
loss = criterion(output, target)
# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
losses.update(loss.item(), images.size(0))
top1.update(acc1[0], images.size(0))
top5.update(acc5[0], images.size(0))
# compute gradient and do an optimizer step (the optimizer is Adam, see main_worker)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i % args.print_freq == 0:
progress.display(i)
return top1.avg.cpu().numpy()
def validate(val_loader, model, args):
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
# switch to eval mode
model.eval()
with torch.no_grad():
for i, (images, target) in enumerate(val_loader):
if args.gpu is not None:
images = images.cuda(args.gpu, non_blocking=True)
target = target.cuda(args.gpu, non_blocking=True)
# compute output
output = model(images)
# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
top1.update(acc1[0], images.size(0))
top5.update(acc5[0], images.size(0))
print('End of epoch validation: * Acc@1 {top1.avg:.3f} Acc@5 {top5.avg:.3f}'.format(top1=top1, top5=top5))
return top1.avg.cpu().numpy()
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self, name, fmt=':f'):
self.name = name
self.fmt = fmt
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
def __str__(self):
fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
return fmtstr.format(**self.__dict__)
class ProgressMeter(object):
def __init__(self, num_batches, meters, prefix=""):
self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
self.meters = meters
self.prefix = prefix
def display(self, batch):
entries = [self.prefix + self.batch_fmtstr.format(batch)]
entries += [str(meter) for meter in self.meters]
print('\t'.join(entries))
def _get_batch_fmtstr(self, num_batches):
num_digits = len(str(num_batches // 1))
fmt = '{:' + str(num_digits) + 'd}'
return '[' + fmt + '/' + fmt.format(num_batches) + ']'
def adjust_learning_rate(optimizer, epoch, args):
"""Decay the learning rate based on schedule"""
lr = args.lr
for milestone in args.schedule:
lr *= 0.2 if epoch >= milestone else 1.
for param_group in optimizer.param_groups:
param_group['lr'] = lr
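# Illustrative sketch (not part of the original script): the schedule applied by
# adjust_learning_rate above, written as a pure function so the decay can be checked
# without an optimizer. The helper name `_example_step_lr` and the `ratio` parameter
# are hypothetical; 0.2 mirrors the factor hard-coded above.
def _example_step_lr(base_lr, schedule, epoch, ratio=0.2):
    lr = base_lr
    for milestone in schedule:
        # every milestone that has been reached multiplies the rate by `ratio` once
        if epoch >= milestone:
            lr *= ratio
    return lr
# With the defaults (--lr 0.0005, --schedule 23 24) the rate stays at 5e-4 through
# epoch 22, drops to about 1e-4 at epoch 23, and to about 2e-5 at epoch 24.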
def accuracy(output, target, topk=(1,)):
"""Computes the accuracy over the k top predictions for the specified values of k"""
with torch.no_grad():
maxk = max(topk)
batch_size = target.size(0)
_, pred = output.topk(maxk, 1, True, True)
pred = pred.t()
correct = pred.eq(target.view(1, -1).expand_as(pred))
res = []
for k in topk:
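# reshape (not view): `correct` can be non-contiguous after pred.t(), which makes view(-1) fail on recent PyTorch
correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)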
res.append(correct_k.mul_(100.0 / batch_size))
return res
if __name__ == '__main__':
main()
================================================
FILE: linear_combination_maps.py
================================================
'''Plots spatial attention maps'''
import os
import argparse
import numpy as np
import torch
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from torchvision.utils import make_grid
import matplotlib as mp
import matplotlib.pyplot as plt
def extract_map_layer_7x7(mobilenetV2_model):
layer_list = list(mobilenetV2_model.module.features.children())
new_model = torch.nn.Sequential(*layer_list)
return new_model
def extract_map_layer_14x14(mobilenetV2_model):
layer_list = list(mobilenetV2_model.module.features.children())
new_layer_list = layer_list[:-5]
new_layer_list.append(layer_list[-5].conv[0])
new_model = torch.nn.Sequential(*new_layer_list)
return new_model
def load_model(args):
model = models.mobilenet_v2(pretrained=True)
model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True)
model = torch.nn.DataParallel(model).cuda()
if args.model_path:
if os.path.isfile(args.model_path):
checkpoint = torch.load(args.model_path)
model.load_state_dict(checkpoint['model_state_dict'])
else:
print("=> no checkpoint found at '{}'".format(args.model_path))
return model
def load_data(data_dir, args):
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_dataset = datasets.ImageFolder(
data_dir,
transforms.Compose([transforms.ToTensor(), normalize])
)
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=args.batch_size, shuffle=True,
num_workers=args.workers, pin_memory=True, sampler=None
)
return train_loader
def predict(data_loader, model, weights, batch_size):
# switch to evaluate mode
model.eval()
with torch.no_grad():
for i, (images, target) in enumerate(data_loader):
images = images.cuda()
print(images.size())
# compute predictions
pred = model(images)
if i == 0:
break
linear_combination_map = torch.einsum('ijkl,j->ikl', pred, weights)
x = torch.zeros(batch_size, 3, 7, 7)
x[:, 0, :, :] = linear_combination_map
x[:, 1, :, :] = linear_combination_map
x[:, 2, :, :] = linear_combination_map
m = torch.nn.Upsample(scale_factor=32, mode='bicubic')
mm = m(x).cuda()
mm = torch.sigmoid(10. * mm / torch.std(mm))
return mm * images
def show_img(ax, img, save_name):
'''Save maps'''
npimg = img.cpu().numpy()
print(npimg.shape)
ax.imshow(np.transpose(npimg, (1, 2, 0)), interpolation='nearest')
ax.spines["bottom"].set_visible(False)
ax.spines["left"].set_visible(False)
ax.spines["right"].set_visible(False)
ax.spines["top"].set_visible(False)
mp.rcParams['axes.linewidth'] = 0.75
mp.rcParams['patch.linewidth'] = 1.15
mp.rcParams['font.sans-serif'] = ['FreeSans']
mp.rcParams['mathtext.fontset'] = 'cm'
plt.savefig(save_name, bbox_inches='tight')
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Plot spatial attention maps')
parser.add_argument('data', metavar='DIR', help='path to dataset')
parser.add_argument('--workers', default=4, type=int, help='number of data loading workers (default: 4)')
parser.add_argument('--batch-size', default=36, type=int, help='mini-batch size, this is the total '
'batch size of all GPUs on the current node when '
'using Data Parallel or Distributed Data Parallel')
parser.add_argument('--model-path', default='', type=str, help='path to model checkpoint (default: '
'ImageNet-pretrained)')
parser.add_argument('--n_out', default=1000, type=int, help='output dim of pre-trained model')
parser.add_argument('--class-idx', default=6, type=int, help='class index for which the maps will be computed')
args = parser.parse_args()
model = load_model(args)
map_layer = extract_map_layer_7x7(model)
weights = model.module.classifier.weight.data[args.class_idx, :].cuda()
data_loader = load_data(args.data, args)
preds = predict(data_loader, map_layer, weights, args.batch_size)
print('Preds shape:', preds.shape)
fig_pred = plt.figure(figsize=(16, 16), dpi=300)
ax_pred = fig_pred.add_subplot(111)
grid_pred = make_grid(preds, nrow=12, padding=1, normalize=True, scale_each=False)
show_img(ax_pred, grid_pred, 'linear_combination_maps_class_' + str(args.class_idx) + '.pdf')
================================================
FILE: linear_decoding.py
================================================
import argparse
import os
import random
import shutil
import time
import warnings
import numpy as np
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
parser = argparse.ArgumentParser(description='Linear decoding with headcam data')
parser.add_argument('data', metavar='DIR', help='path to dataset')
parser.add_argument('-j', '--workers', default=32, type=int, metavar='N', help='number of data loading workers (default: 32)')
parser.add_argument('--epochs', default=100, type=int, metavar='N', help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=256, type=int, metavar='N',
help='mini-batch size (default: 256), this is the total batch size of all GPUs on the current node '
'when using Data Parallel or Distributed Data Parallel')
parser.add_argument('--lr', '--learning-rate', default=0.0005, type=float, metavar='LR', help='initial learning rate', dest='lr')
parser.add_argument('--wd', '--weight-decay', default=0.0, type=float, metavar='W', help='weight decay (default: 0)', dest='weight_decay')
parser.add_argument('-p', '--print-freq', default=100, type=int, metavar='N', help='print frequency (default: 100)')
parser.add_argument('--world-size', default=-1, type=int, help='number of nodes for distributed training')
parser.add_argument('--rank', default=-1, type=int, help='node rank for distributed training')
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str, help='url used to set up distributed training')
parser.add_argument('--dist-backend', default='nccl', type=str, help='distributed backend')
parser.add_argument('--gpu', default=None, type=int, help='GPU id to use.')
parser.add_argument('--multiprocessing-distributed', action='store_true',
help='Use multi-processing distributed training to launch '
'N processes per node, which has N GPUs. This is the '
'fastest way to use PyTorch for either single node or '
'multi node data parallel training')
parser.add_argument('--model-name', type=str, default='random',
choices=['random', 'imagenet', 'TC-S', 'TC-A', 'TC-Y', 'TC-SAY', 'moco_img_0011', 'moco_temp_0011'],
help='evaluated model')
parser.add_argument('--num-outs', default=16127, type=int, help='number of outputs in pretrained model')
parser.add_argument('--num-classes', default=26, type=int, help='number of classes in downstream classification task')
parser.add_argument('--subsample', default=False, action='store_true', help='subsample data?')
def set_parameter_requires_grad(model, feature_extracting=True):
'''Helper function for setting body to non-trainable'''
if feature_extracting:
for param in model.parameters():
param.requires_grad = False
def load_split_train_test(datadir, args, train_frac=0.5):
import numpy as np
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
train_data = datasets.ImageFolder(datadir, transform=transforms.Compose([transforms.ToTensor(), normalize]))
test_data = datasets.ImageFolder(datadir, transform=transforms.Compose([transforms.ToTensor(), normalize]))
num_train = len(train_data)
print('Total data size is', num_train)
indices = list(range(num_train))
split = int(np.floor(train_frac * num_train))
np.random.shuffle(indices)
if args.subsample:
num_data = int(0.1 * num_train)
train_idx, test_idx = indices[:(num_data // 2)], indices[(num_data // 2):num_data]
else:
train_idx, test_idx = indices[:split], indices[split:]
print('Training data size is', len(train_idx))
print('Test data size is', len(test_idx))
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idx)
test_sampler = torch.utils.data.sampler.SubsetRandomSampler(test_idx)
trainloader = torch.utils.data.DataLoader(train_data, batch_size=args.batch_size, shuffle=False,
num_workers=args.workers, pin_memory=True, sampler=train_sampler)
testloader = torch.utils.data.DataLoader(test_data, batch_size=args.batch_size, shuffle=False,
num_workers=args.workers, pin_memory=True, sampler=test_sampler)
return trainloader, testloader
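# Illustrative sketch (not part of the original script): the index bookkeeping done by
# load_split_train_test above, without any image loading. The helper name
# `_example_split_indices` and the `seed` parameter are hypothetical; the original
# shuffles with np.random and does not fix a seed.
def _example_split_indices(num_items, train_frac=0.5, subsample=False, seed=0):
    import random
    indices = list(range(num_items))
    random.Random(seed).shuffle(indices)
    if subsample:
        # keep 10% of the data and divide it evenly between train and test
        num_data = int(0.1 * num_items)
        return indices[:num_data // 2], indices[num_data // 2:num_data]
    split = int(train_frac * num_items)
    return indices[:split], indices[split:]
# In both modes the train and test index lists are disjoint, so the two samplers
# built above never draw the same image.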
def main():
args = parser.parse_args()
if args.gpu is not None:
warnings.warn('You have chosen a specific GPU. This will completely disable data parallelism.')
if args.dist_url == "env://" and args.world_size == -1:
args.world_size = int(os.environ["WORLD_SIZE"])
args.distributed = args.world_size > 1 or args.multiprocessing_distributed
ngpus_per_node = torch.cuda.device_count()
if args.multiprocessing_distributed:
# Since we have ngpus_per_node processes per node, the total world_size needs to be adjusted accordingly
args.world_size = ngpus_per_node * args.world_size
# Use torch.multiprocessing.spawn to launch distributed processes: the main_worker process function
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
else:
# Simply call main_worker function
main_worker(args.gpu, ngpus_per_node, args)
def main_worker(gpu, ngpus_per_node, args):
args.gpu = gpu
if args.gpu is not None:
print("Use GPU: {} for training".format(args.gpu))
# model definition
num_classes = args.num_classes
if args.model_name == 'random':
model = models.mobilenet_v2(pretrained=False)
set_parameter_requires_grad(model)
model.classifier = torch.nn.Linear(in_features=1280, out_features=num_classes, bias=True)
model = torch.nn.DataParallel(model).cuda()
elif args.model_name == 'imagenet':
model = models.mobilenet_v2(pretrained=True)
set_parameter_requires_grad(model)
model.classifier = torch.nn.Linear(in_features=1280, out_features=num_classes, bias=True)
model = torch.nn.DataParallel(model).cuda()
elif args.model_name.startswith('moco'):
model = models.mobilenet_v2(pretrained=False)
model.classifier = torch.nn.Linear(in_features=1280, out_features=args.num_outs, bias=True)
checkpoint = torch.load('../self_supervised_models/' + args.model_name + '.pth.tar')
# rename moco pre-trained keys
state_dict = checkpoint['state_dict']
for k in list(state_dict.keys()):
# retain only encoder_q up to before the embedding layer
if k.startswith('module.encoder_q') and not k.startswith('module.encoder_q.classifier'):
# remove prefix
state_dict[k[len("module.encoder_q."):]] = state_dict[k]
# delete renamed or unused k
del state_dict[k]
msg = model.load_state_dict(state_dict, strict=False)
assert set(msg.missing_keys) == {"classifier.weight", "classifier.bias"}
print("=> loaded pre-trained model '{}'".format(args.model_name))
set_parameter_requires_grad(model) # freeze the trunk
model.classifier = torch.nn.Linear(in_features=1280, out_features=num_classes, bias=True)
model = torch.nn.DataParallel(model).cuda()
else:
model = models.resnext50_32x4d(pretrained=False)
model.fc = torch.nn.Linear(in_features=2048, out_features=args.num_outs, bias=True)
model = torch.nn.DataParallel(model).cuda()
checkpoint = torch.load(args.model_name + '.tar')
model.load_state_dict(checkpoint['model_state_dict'])
set_parameter_requires_grad(model) # freeze the trunk
model.module.fc = torch.nn.Linear(in_features=2048, out_features=num_classes, bias=True).cuda()
# define loss function (criterion) and optimizer
criterion = nn.CrossEntropyLoss().cuda(args.gpu)
optimizer = torch.optim.Adam(model.parameters(), args.lr, weight_decay=args.weight_decay)
cudnn.benchmark = True
# Data loading code
savefile_name = args.model_name + '_labeledS_5_iid.tar'
train_loader, test_loader = load_split_train_test(args.data, args)
acc1_list = []
val_acc1_list = []
for epoch in range(args.start_epoch, args.epochs):
# train for one epoch
acc1 = train(train_loader, model, criterion, optimizer, epoch, args)
acc1_list.append(acc1)
# validate at end of epoch
val_acc1, preds, target, images = validate(test_loader, model, args)
val_acc1_list.append(val_acc1)
torch.save({'acc1_list': acc1_list,
'val_acc1_list': val_acc1_list,
'model_state_dict': model.state_dict(),
'optimizer_state_dict': optimizer.state_dict(),
'preds': preds,
'target': target,
'images': images
}, savefile_name)
def train(train_loader, model, criterion, optimizer, epoch, args):
batch_time = AverageMeter('Time', ':6.3f')
data_time = AverageMeter('Data', ':6.3f')
losses = AverageMeter('Loss', ':.4e')
top1 = AverageMeter('Acc@1', ':6.2f')
# this meter tracks top-2 accuracy: accuracy() is called with topk=(1, 2) below
top5 = AverageMeter('Acc@2', ':6.2f')
progress = ProgressMeter(
len(train_loader),
[batch_time, data_time, losses, top1, top5],
prefix="Epoch: [{}]".format(epoch))
# switch to train mode
model.train()
end = time.time()
for i, (images, target) in enumerate(train_loader):
# measure data loading time
data_time.update(time.time() - end)
if args.gpu is not None:
images = images.cuda(args.gpu, non_blocking=True)
target = target.cuda(args.gpu, non_blocking=True)
# compute output
output = model(images)
loss = criterion(output, target)
# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 2))
losses.update(loss.item(), images.size(0))
top1.update(acc1[0], images.size(0))
top5.update(acc5[0], images.size(0))
# compute gradient and do an optimizer step (the optimizer is Adam, see main_worker)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# for param in model.parameters():
# print(param.requires_grad)
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i % args.print_freq == 0:
progress.display(i)
return top1.avg.cpu().numpy()
def validate(val_loader, model, args):
batch_time = AverageMeter('Time', ':6.3f')
top1 = AverageMeter('Acc@1', ':6.2f')
# switch to evaluate mode
model.eval()
with torch.no_grad():
end = time.time()
for i, (images, target) in enumerate(val_loader):
if args.gpu is not None:
images = images.cuda(args.gpu, non_blocking=True)
target = target.cuda(args.gpu, non_blocking=True)
# compute output
output = model(images)
preds = np.argmax(output.cpu().numpy(), axis=1)
# measure accuracy and record loss
acc1 = accuracy(output, target, topk=(1, ))
top1.update(acc1[0].cpu().numpy()[0], images.size(0))
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
print('* Acc@1 {top1.avg:.3f} '.format(top1=top1))
# NOTE: preds, target and images come from the last validation batch only
return top1.avg, preds, target.cpu().numpy(), images.cpu().numpy()
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self, name, fmt=':f'):
self.name = name
self.fmt = fmt
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
def __str__(self):
fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
return fmtstr.format(**self.__dict__)
class ProgressMeter(object):
def __init__(self, num_batches, meters, prefix=""):
self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
self.meters = meters
self.prefix = prefix
def display(self, batch):
entries = [self.prefix + self.batch_fmtstr.format(batch)]
entries += [str(meter) for meter in self.meters]
print('\t'.join(entries))
def _get_batch_fmtstr(self, num_batches):
num_digits = len(str(num_batches // 1))
fmt = '{:' + str(num_digits) + 'd}'
return '[' + fmt + '/' + fmt.format(num_batches) + ']'
def accuracy(output, target, topk=(1,)):
"""Computes the accuracy over the k top predictions for the specified values of k"""
with torch.no_grad():
maxk = max(topk)
batch_size = target.size(0)
_, pred = output.topk(maxk, 1, True, True)
pred = pred.t()
correct = pred.eq(target.view(1, -1).expand_as(pred))
res = []
for k in topk:
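# reshape (not view): `correct` can be non-contiguous after pred.t(), which makes view(-1) fail on recent PyTorch
correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)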
res.append(correct_k.mul_(100.0 / batch_size))
return res
if __name__ == '__main__':
main()
================================================
FILE: moco/__init__.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
================================================
FILE: moco/builder.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import torch
import torch.nn as nn
class MoCo(nn.Module):
"""
Build a MoCo model with: a query encoder, a key encoder, and a queue
https://arxiv.org/abs/1911.05722
"""
def __init__(self, base_encoder, dim=128, K=65536, m=0.999, T=0.07, mlp=False):
"""
dim: feature dimension (default: 128)
K: queue size; number of negative keys (default: 65536)
m: moco momentum of updating key encoder (default: 0.999)
T: softmax temperature (default: 0.07)
"""
super(MoCo, self).__init__()
self.K = K
self.m = m
self.T = T
# create the encoders
# num_classes is the output fc dimension
self.encoder_q = base_encoder(num_classes=dim)
self.encoder_k = base_encoder(num_classes=dim)
# self.encoder_q.classifier = self.encoder_q.classifier[-1] # remove dropout (only for mobilenet_v2)
# self.encoder_k.classifier = self.encoder_k.classifier[-1] # remove dropout (only for mobilenet_v2)
if mlp: # hack: brute-force replacement
dim_mlp = self.encoder_q.fc.weight.shape[1]
self.encoder_q.fc = nn.Sequential(nn.Linear(dim_mlp, dim_mlp), nn.ReLU(), self.encoder_q.fc)
self.encoder_k.fc = nn.Sequential(nn.Linear(dim_mlp, dim_mlp), nn.ReLU(), self.encoder_k.fc)
for param_q, param_k in zip(self.encoder_q.parameters(), self.encoder_k.parameters()):
param_k.data.copy_(param_q.data) # initialize
param_k.requires_grad = False # not update by gradient
# create the queue
self.register_buffer("queue", torch.randn(dim, K))
self.queue = nn.functional.normalize(self.queue, dim=0)
self.register_buffer("queue_ptr", torch.zeros(1, dtype=torch.long))
@torch.no_grad()
def _momentum_update_key_encoder(self):
"""
Momentum update of the key encoder
"""
for param_q, param_k in zip(self.encoder_q.parameters(), self.encoder_k.parameters()):
param_k.data = param_k.data * self.m + param_q.data * (1. - self.m)
@torch.no_grad()
def _dequeue_and_enqueue(self, keys):
# gather keys before updating queue
keys = concat_all_gather(keys)
batch_size = keys.shape[0]
ptr = int(self.queue_ptr)
assert self.K % batch_size == 0 # for simplicity
# replace the keys at ptr (dequeue and enqueue)
self.queue[:, ptr:ptr + batch_size] = keys.T
ptr = (ptr + batch_size) % self.K # move pointer
self.queue_ptr[0] = ptr
@torch.no_grad()
def _batch_shuffle_ddp(self, x):
"""
Batch shuffle, for making use of BatchNorm.
*** Only support DistributedDataParallel (DDP) model. ***
"""
# gather from all gpus
batch_size_this = x.shape[0]
x_gather = concat_all_gather(x)
batch_size_all = x_gather.shape[0]
num_gpus = batch_size_all // batch_size_this
# random shuffle index
idx_shuffle = torch.randperm(batch_size_all).cuda()
# broadcast to all gpus
torch.distributed.broadcast(idx_shuffle, src=0)
# index for restoring
idx_unshuffle = torch.argsort(idx_shuffle)
# shuffled index for this gpu
gpu_idx = torch.distributed.get_rank()
idx_this = idx_shuffle.view(num_gpus, -1)[gpu_idx]
return x_gather[idx_this], idx_unshuffle
@torch.no_grad()
def _batch_unshuffle_ddp(self, x, idx_unshuffle):
"""
Undo batch shuffle.
*** Only support DistributedDataParallel (DDP) model. ***
"""
# gather from all gpus
batch_size_this = x.shape[0]
x_gather = concat_all_gather(x)
batch_size_all = x_gather.shape[0]
num_gpus = batch_size_all // batch_size_this
# restored index for this gpu
gpu_idx = torch.distributed.get_rank()
idx_this = idx_unshuffle.view(num_gpus, -1)[gpu_idx]
return x_gather[idx_this]
def forward(self, im_q, im_k):
"""
Input:
im_q: a batch of query images
im_k: a batch of key images
Output:
logits, targets
"""
# compute query features
q = self.encoder_q(im_q) # queries: NxC
q = nn.functional.normalize(q, dim=1)
# compute key features
with torch.no_grad(): # no gradient to keys
self._momentum_update_key_encoder() # update the key encoder
# shuffle for making use of BN
im_k, idx_unshuffle = self._batch_shuffle_ddp(im_k)
k = self.encoder_k(im_k) # keys: NxC
k = nn.functional.normalize(k, dim=1)
# undo shuffle
k = self._batch_unshuffle_ddp(k, idx_unshuffle)
# compute logits
# Einstein sum is more intuitive
# positive logits: Nx1
l_pos = torch.einsum('nc,nc->n', [q, k]).unsqueeze(-1)
# negative logits: NxK
l_neg = torch.einsum('nc,ck->nk', [q, self.queue.clone().detach()])
# logits: Nx(1+K)
logits = torch.cat([l_pos, l_neg], dim=1)
# apply temperature
logits /= self.T
# labels: positive key indicators
labels = torch.zeros(logits.shape[0], dtype=torch.long).cuda()
# dequeue and enqueue
self._dequeue_and_enqueue(k)
return logits, labels
# utils
@torch.no_grad()
def concat_all_gather(tensor):
"""
Performs all_gather operation on the provided tensors.
*** Warning ***: torch.distributed.all_gather has no gradient.
"""
tensors_gather = [torch.ones_like(tensor)
for _ in range(torch.distributed.get_world_size())]
torch.distributed.all_gather(tensors_gather, tensor, async_op=False)
output = torch.cat(tensors_gather, dim=0)
return output
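# Illustrative sketch (not part of the original MoCo code): the two bookkeeping rules
# used by the MoCo class above, shown on plain Python values instead of tensors.
# Both helper names are hypothetical.
def _example_momentum_update(param_k, param_q, m=0.999):
    # the key parameter moves a fraction (1 - m) of the way toward the query parameter
    return param_k * m + param_q * (1. - m)


def _example_enqueue(queue, ptr, keys):
    # overwrite the oldest len(keys) slots, then advance the pointer with wraparound
    K = len(queue)
    assert K % len(keys) == 0  # same simplifying assumption as _dequeue_and_enqueue
    queue[ptr:ptr + len(keys)] = keys
    return (ptr + len(keys)) % K
# Because K is a multiple of the batch size, the write never straddles the end of
# the queue and the pointer returns to 0 after K / batch_size updates.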
================================================
FILE: moco/loader.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from PIL import ImageFilter
import random
class TwoCropsTransform:
"""Take two random crops of one image as the query and key."""
def __init__(self, base_transform):
self.base_transform = base_transform
def __call__(self, x):
q = self.base_transform(x)
k = self.base_transform(x)
return [q, k]
class GaussianBlur(object):
"""Gaussian blur augmentation in SimCLR https://arxiv.org/abs/2002.05709"""
def __init__(self, sigma=[.1, 2.]):
self.sigma = sigma
def __call__(self, x):
sigma = random.uniform(self.sigma[0], self.sigma[1])
x = x.filter(ImageFilter.GaussianBlur(radius=sigma))
return x
================================================
FILE: moco_img.py
================================================
#!/usr/bin/env python
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import argparse
import builtins
import math
import os
import random
import shutil
import time
import warnings
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
import moco.loader
import moco.builder
model_names = sorted(name for name in models.__dict__
if name.islower() and not name.startswith("__")
and callable(models.__dict__[name]))
parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')
parser.add_argument('data', metavar='DIR',
help='path to dataset')
parser.add_argument('-a', '--arch', metavar='ARCH', default='resnet50',
choices=model_names,
help='model architecture: ' +
' | '.join(model_names) +
' (default: resnet50)')
parser.add_argument('-j', '--workers', default=32, type=int, metavar='N',
help='number of data loading workers (default: 32)')
parser.add_argument('--epochs', default=12, type=int, metavar='N',
help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=256, type=int,
metavar='N',
help='mini-batch size (default: 256), this is the total '
'batch size of all GPUs on the current node when '
'using Data Parallel or Distributed Data Parallel')
parser.add_argument('--lr', '--learning-rate', default=0.03, type=float,
metavar='LR', help='initial learning rate', dest='lr')
parser.add_argument('--schedule', default=[11, 20], nargs='*', type=int,
help='learning rate schedule (when to drop lr by 10x)')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
help='momentum of SGD solver')
parser.add_argument('--wd', '--weight-decay', default=0, type=float,
metavar='W', help='weight decay (default: 0)',
dest='weight_decay')
parser.add_argument('-p', '--print-freq', default=1000, type=int,
metavar='N', help='print frequency (default: 1000)')
parser.add_argument('--resume', default='', type=str, metavar='PATH',
help='path to latest checkpoint (default: none)')
parser.add_argument('--world-size', default=-1, type=int,
help='number of nodes for distributed training')
parser.add_argument('--rank', default=-1, type=int,
help='node rank for distributed training')
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
help='url used to set up distributed training')
parser.add_argument('--dist-backend', default='nccl', type=str,
help='distributed backend')
parser.add_argument('--seed', default=None, type=int,
help='seed for initializing training. ')
parser.add_argument('--gpu', default=None, type=int,
help='GPU id to use.')
parser.add_argument('--multiprocessing-distributed', action='store_true',
help='Use multi-processing distributed training to launch '
'N processes per node, which has N GPUs. This is the '
'fastest way to use PyTorch for either single node or '
'multi node data parallel training')
# moco specific configs:
parser.add_argument('--moco-dim', default=128, type=int,
help='feature dimension (default: 128)')
parser.add_argument('--moco-k', default=65536, type=int,
help='queue size; number of negative keys (default: 65536)')
parser.add_argument('--moco-m', default=0.999, type=float,
help='moco momentum of updating key encoder (default: 0.999)')
parser.add_argument('--moco-t', default=0.07, type=float,
help='softmax temperature (default: 0.07)')
# options for moco v2
parser.add_argument('--mlp', action='store_true',
help='use mlp head')
parser.add_argument('--aug-plus', action='store_true',
help='use moco v2 data augmentation')
parser.add_argument('--cos', action='store_true',
help='use cosine lr schedule')
def main():
args = parser.parse_args()
if args.seed is not None:
random.seed(args.seed)
torch.manual_seed(args.seed)
cudnn.deterministic = True
warnings.warn('You have chosen to seed training. '
'This will turn on the CUDNN deterministic setting, '
'which can slow down your training considerably! '
'You may see unexpected behavior when restarting '
'from checkpoints.')
if args.gpu is not None:
warnings.warn('You have chosen a specific GPU. This will completely '
'disable data parallelism.')
if args.dist_url == "env://" and args.world_size == -1:
args.world_size = int(os.environ["WORLD_SIZE"])
args.distributed = args.world_size > 1 or args.multiprocessing_distributed
ngpus_per_node = torch.cuda.device_count()
if args.multiprocessing_distributed:
# Since we have ngpus_per_node processes per node, the total world_size
# needs to be adjusted accordingly
args.world_size = ngpus_per_node * args.world_size
# Use torch.multiprocessing.spawn to launch distributed processes: the
# main_worker process function
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
else:
# Simply call main_worker function
main_worker(args.gpu, ngpus_per_node, args)
def main_worker(gpu, ngpus_per_node, args):
args.gpu = gpu
print(args)
# suppress printing if not master
if args.multiprocessing_distributed and args.gpu != 0:
def print_pass(*args):
pass
builtins.print = print_pass
if args.gpu is not None:
print("Use GPU: {} for training".format(args.gpu))
if args.distributed:
if args.dist_url == "env://" and args.rank == -1:
args.rank = int(os.environ["RANK"])
if args.multiprocessing_distributed:
# For multiprocessing distributed training, rank needs to be the
# global rank among all the processes
args.rank = args.rank * ngpus_per_node + gpu
dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
world_size=args.world_size, rank=args.rank)
# create model
print("=> creating model '{}'".format(args.arch))
model = moco.builder.MoCo(
models.__dict__[args.arch],
args.moco_dim, args.moco_k, args.moco_m, args.moco_t, args.mlp)
print(model)
if args.distributed:
# For multiprocessing distributed, DistributedDataParallel constructor
# should always set the single device scope, otherwise,
# DistributedDataParallel will use all available devices.
if args.gpu is not None:
torch.cuda.set_device(args.gpu)
model.cuda(args.gpu)
# When using a single GPU per process and per
# DistributedDataParallel, we need to divide the batch size
# ourselves based on the total number of GPUs we have
args.batch_size = int(args.batch_size / ngpus_per_node)
args.workers = int((args.workers + ngpus_per_node - 1) / ngpus_per_node)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
else:
model.cuda()
# DistributedDataParallel will divide and allocate batch_size to all
# available GPUs if device_ids are not set
model = torch.nn.parallel.DistributedDataParallel(model)
elif args.gpu is not None:
torch.cuda.set_device(args.gpu)
model = model.cuda(args.gpu)
# comment out the following line for debugging
raise NotImplementedError("Only DistributedDataParallel is supported.")
else:
# AllGather implementation (batch shuffle, queue update, etc.) in
# this code only supports DistributedDataParallel.
raise NotImplementedError("Only DistributedDataParallel is supported.")
# define loss function (criterion) and optimizer
criterion = nn.CrossEntropyLoss().cuda(args.gpu)
optimizer = torch.optim.SGD(model.parameters(), args.lr,
momentum=args.momentum,
weight_decay=args.weight_decay)
# optionally resume from a checkpoint
if args.resume:
if os.path.isfile(args.resume):
print("=> loading checkpoint '{}'".format(args.resume))
if args.gpu is None:
checkpoint = torch.load(args.resume)
else:
# Map model to be loaded to specified single gpu.
loc = 'cuda:{}'.format(args.gpu)
checkpoint = torch.load(args.resume, map_location=loc)
args.start_epoch = checkpoint['epoch']
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
print("=> loaded checkpoint '{}' (epoch {})"
.format(args.resume, checkpoint['epoch']))
else:
print("=> no checkpoint found at '{}'".format(args.resume))
cudnn.benchmark = True
# Data loading code
traindir = args.data
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
if args.aug_plus:
# MoCo v2's aug: similar to SimCLR https://arxiv.org/abs/2002.05709
augmentation = [
transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
transforms.RandomApply([
transforms.ColorJitter(0.4, 0.4, 0.4, 0.1) # not strengthened
], p=0.8),
transforms.RandomGrayscale(p=0.2),
transforms.RandomApply([moco.loader.GaussianBlur([.1, 2.])], p=0.5),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize
]
else:
# MoCo v1's aug: the same as InstDisc https://arxiv.org/abs/1805.01978
augmentation = [
transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
transforms.RandomGrayscale(p=0.2),
transforms.ColorJitter(0.4, 0.4, 0.4, 0.4),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize
]
train_dataset = datasets.ImageFolder(
traindir,
moco.loader.TwoCropsTransform(transforms.Compose(augmentation)))
print('Dataset size:', len(train_dataset))
if args.distributed:
train_sampler = torch.utils.data.distributed.DistributedSampler(train_dataset)
else:
train_sampler = None
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=args.batch_size, shuffle=(train_sampler is None),
num_workers=args.workers, pin_memory=True, sampler=train_sampler, drop_last=True)
print('Starting training ...')
for epoch in range(args.start_epoch, args.epochs):
if args.distributed:
train_sampler.set_epoch(epoch)
adjust_learning_rate(optimizer, epoch, args)
print('Start of epoch ', epoch)
# train for one epoch
train(train_loader, model, criterion, optimizer, epoch, args)
if not args.multiprocessing_distributed or (args.multiprocessing_distributed
and args.rank % ngpus_per_node == 0):
save_checkpoint({
'epoch': epoch + 1,
'arch': args.arch,
'state_dict': model.state_dict(),
'optimizer' : optimizer.state_dict(),
}, is_best=False, filename='moco_img_checkpoint_{:04d}.pth.tar'.format(epoch))
def train(train_loader, model, criterion, optimizer, epoch, args):
batch_time = AverageMeter('Time', ':6.3f')
data_time = AverageMeter('Data', ':6.3f')
losses = AverageMeter('Loss', ':.4e')
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
progress = ProgressMeter(
len(train_loader),
[batch_time, data_time, losses, top1, top5],
prefix="Epoch: [{}]".format(epoch))
# switch to train mode
model.train()
end = time.time()
for i, (images, _) in enumerate(train_loader):
# measure data loading time
data_time.update(time.time() - end)
if args.gpu is not None:
images[0] = images[0].cuda(args.gpu, non_blocking=True)
images[1] = images[1].cuda(args.gpu, non_blocking=True)
# compute output
output, target = model(im_q=images[0], im_k=images[1])
loss = criterion(output, target)
# acc1/acc5 are (K+1)-way contrast classifier accuracy
# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
losses.update(loss.item(), images[0].size(0))
top1.update(acc1[0], images[0].size(0))
top5.update(acc5[0], images[0].size(0))
# compute gradient and do SGD step
optimizer.zero_grad()
loss.backward()
optimizer.step()
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i % args.print_freq == 0:
progress.display(i)
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
torch.save(state, filename)
if is_best:
shutil.copyfile(filename, 'model_best.pth.tar')
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self, name, fmt=':f'):
self.name = name
self.fmt = fmt
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
def __str__(self):
fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
return fmtstr.format(**self.__dict__)
class ProgressMeter(object):
def __init__(self, num_batches, meters, prefix=""):
self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
self.meters = meters
self.prefix = prefix
def display(self, batch):
entries = [self.prefix + self.batch_fmtstr.format(batch)]
entries += [str(meter) for meter in self.meters]
print('\t'.join(entries))
def _get_batch_fmtstr(self, num_batches):
num_digits = len(str(num_batches // 1))
fmt = '{:' + str(num_digits) + 'd}'
return '[' + fmt + '/' + fmt.format(num_batches) + ']'
def adjust_learning_rate(optimizer, epoch, args):
"""Decay the learning rate based on schedule"""
lr = args.lr
if args.cos: # cosine lr schedule
lr *= 0.5 * (1. + math.cos(math.pi * epoch / args.epochs))
else: # stepwise lr schedule
for milestone in args.schedule:
lr *= 0.1 if epoch >= milestone else 1.
for param_group in optimizer.param_groups:
param_group['lr'] = lr
def accuracy(output, target, topk=(1,)):
"""Computes the accuracy over the k top predictions for the specified values of k"""
with torch.no_grad():
maxk = max(topk)
batch_size = target.size(0)
_, pred = output.topk(maxk, 1, True, True)
pred = pred.t()
correct = pred.eq(target.view(1, -1).expand_as(pred))
res = []
for k in topk:
# reshape instead of view: correct[:k] can be non-contiguous because pred was transposed
correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
res.append(correct_k.mul_(100.0 / batch_size))
return res
if __name__ == '__main__':
main()
================================================
FILE: moco_temp.py
================================================
#!/usr/bin/env python
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
import argparse
import builtins
import math
import os
import random
import shutil
import time
import warnings
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
import moco.loader
import moco.builder
from moco_utils import DistributedProxySampler, ContrastiveBatchSampler
model_names = sorted(name for name in models.__dict__
if name.islower() and not name.startswith("__")
and callable(models.__dict__[name]))
parser = argparse.ArgumentParser(description='PyTorch ImageNet Training')
parser.add_argument('data', metavar='DIR',
help='path to dataset')
parser.add_argument('-a', '--arch', metavar='ARCH', default='resnet50',
choices=model_names,
help='model architecture: ' +
' | '.join(model_names) +
' (default: resnet50)')
parser.add_argument('-j', '--workers', default=32, type=int, metavar='N',
help='number of data loading workers (default: 32)')
parser.add_argument('--epochs', default=12, type=int, metavar='N',
help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N',
help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=256, type=int,
metavar='N',
help='mini-batch size (default: 256), this is the total '
'batch size of all GPUs on the current node when '
'using Data Parallel or Distributed Data Parallel')
parser.add_argument('--lr', '--learning-rate', default=0.03, type=float,
metavar='LR', help='initial learning rate', dest='lr')
parser.add_argument('--schedule', default=[11, 20], nargs='*', type=int,
help='learning rate schedule (when to drop lr by 10x)')
parser.add_argument('--momentum', default=0.9, type=float, metavar='M',
help='momentum of SGD solver')
parser.add_argument('--wd', '--weight-decay', default=0, type=float,
metavar='W', help='weight decay (default: 0)',
dest='weight_decay')
parser.add_argument('-p', '--print-freq', default=1000, type=int,
metavar='N', help='print frequency (default: 1000)')
parser.add_argument('--resume', default='', type=str, metavar='PATH',
help='path to latest checkpoint (default: none)')
parser.add_argument('--world-size', default=-1, type=int,
help='number of nodes for distributed training')
parser.add_argument('--rank', default=-1, type=int,
help='node rank for distributed training')
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str,
help='url used to set up distributed training')
parser.add_argument('--dist-backend', default='nccl', type=str,
help='distributed backend')
parser.add_argument('--seed', default=None, type=int,
help='seed for initializing training. ')
parser.add_argument('--gpu', default=None, type=int,
help='GPU id to use.')
parser.add_argument('--multiprocessing-distributed', action='store_true',
help='Use multi-processing distributed training to launch '
'N processes per node, which has N GPUs. This is the '
'fastest way to use PyTorch for either single node or '
'multi node data parallel training')
# moco specific configs:
parser.add_argument('--moco-dim', default=128, type=int,
help='feature dimension (default: 128)')
parser.add_argument('--moco-k', default=65536, type=int,
help='queue size; number of negative keys (default: 65536)')
parser.add_argument('--moco-m', default=0.999, type=float,
help='moco momentum of updating key encoder (default: 0.999)')
parser.add_argument('--moco-t', default=0.07, type=float,
help='softmax temperature (default: 0.07)')
# options for moco v2
parser.add_argument('--mlp', action='store_true',
help='use mlp head')
parser.add_argument('--aug-plus', action='store_true',
help='use moco v2 data augmentation')
parser.add_argument('--cos', action='store_true',
help='use cosine lr schedule')
def main():
args = parser.parse_args()
if args.seed is not None:
random.seed(args.seed)
torch.manual_seed(args.seed)
cudnn.deterministic = True
warnings.warn('You have chosen to seed training. '
'This will turn on the CUDNN deterministic setting, '
'which can slow down your training considerably! '
'You may see unexpected behavior when restarting '
'from checkpoints.')
if args.gpu is not None:
warnings.warn('You have chosen a specific GPU. This will completely '
'disable data parallelism.')
if args.dist_url == "env://" and args.world_size == -1:
args.world_size = int(os.environ["WORLD_SIZE"])
args.distributed = args.world_size > 1 or args.multiprocessing_distributed
ngpus_per_node = torch.cuda.device_count()
if args.multiprocessing_distributed:
# Since we have ngpus_per_node processes per node, the total world_size
# needs to be adjusted accordingly
args.world_size = ngpus_per_node * args.world_size
# Use torch.multiprocessing.spawn to launch distributed processes: the
# main_worker process function
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
else:
# Simply call main_worker function
main_worker(args.gpu, ngpus_per_node, args)
def main_worker(gpu, ngpus_per_node, args):
args.gpu = gpu
# suppress printing if not master
if args.multiprocessing_distributed and args.gpu != 0:
def print_pass(*args):
pass
builtins.print = print_pass
if args.gpu is not None:
print("Use GPU: {} for training".format(args.gpu))
if args.distributed:
if args.dist_url == "env://" and args.rank == -1:
args.rank = int(os.environ["RANK"])
if args.multiprocessing_distributed:
# For multiprocessing distributed training, rank needs to be the
# global rank among all the processes
args.rank = args.rank * ngpus_per_node + gpu
dist.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
world_size=args.world_size, rank=args.rank)
# create model
print("=> creating model '{}'".format(args.arch))
model = moco.builder.MoCo(
models.__dict__[args.arch],
args.moco_dim, args.moco_k, args.moco_m, args.moco_t, args.mlp)
print(model)
if args.distributed:
# For multiprocessing distributed, DistributedDataParallel constructor
# should always set the single device scope, otherwise,
# DistributedDataParallel will use all available devices.
if args.gpu is not None:
torch.cuda.set_device(args.gpu)
model.cuda(args.gpu)
# When using a single GPU per process and per
# DistributedDataParallel, we need to divide the batch size
# ourselves based on the total number of GPUs we have
args.batch_size = int(args.batch_size / ngpus_per_node)
args.workers = int((args.workers + ngpus_per_node - 1) / ngpus_per_node)
model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.gpu])
else:
model.cuda()
# DistributedDataParallel will divide and allocate batch_size to all
# available GPUs if device_ids are not set
model = torch.nn.parallel.DistributedDataParallel(model)
elif args.gpu is not None:
torch.cuda.set_device(args.gpu)
model = model.cuda(args.gpu)
# comment out the following line for debugging
raise NotImplementedError("Only DistributedDataParallel is supported.")
else:
# AllGather implementation (batch shuffle, queue update, etc.) in
# this code only supports DistributedDataParallel.
raise NotImplementedError("Only DistributedDataParallel is supported.")
# define loss function (criterion) and optimizer
criterion = nn.CrossEntropyLoss().cuda(args.gpu)
optimizer = torch.optim.SGD(model.parameters(), args.lr,
momentum=args.momentum,
weight_decay=args.weight_decay)
# optionally resume from a checkpoint
if args.resume:
if os.path.isfile(args.resume):
print("=> loading checkpoint '{}'".format(args.resume))
if args.gpu is None:
checkpoint = torch.load(args.resume)
else:
# Map model to be loaded to specified single gpu.
loc = 'cuda:{}'.format(args.gpu)
checkpoint = torch.load(args.resume, map_location=loc)
args.start_epoch = checkpoint['epoch']
model.load_state_dict(checkpoint['state_dict'])
optimizer.load_state_dict(checkpoint['optimizer'])
print("=> loaded checkpoint '{}' (epoch {})"
.format(args.resume, checkpoint['epoch']))
else:
print("=> no checkpoint found at '{}'".format(args.resume))
cudnn.benchmark = True
# Data loading code
traindir = args.data
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
if args.aug_plus:
# MoCo v2's aug: similar to SimCLR https://arxiv.org/abs/2002.05709
augmentation = [
transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
transforms.RandomApply([
transforms.ColorJitter(0.4, 0.4, 0.4, 0.1) # not strengthened
], p=0.8),
transforms.RandomGrayscale(p=0.2),
transforms.RandomApply([moco.loader.GaussianBlur([.1, 2.])], p=0.5),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize
]
else:
# MoCo v1's aug: the same as InstDisc https://arxiv.org/abs/1805.01978
augmentation = [
transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
transforms.RandomGrayscale(p=0.2),
transforms.ColorJitter(0.4, 0.4, 0.4, 0.4),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize
]
train_dataset = datasets.ImageFolder(
traindir, transforms.Compose(augmentation))
print('Dataset size:', len(train_dataset))
if args.distributed:
train_sampler = DistributedProxySampler(ContrastiveBatchSampler(
train_dataset, args.batch_size, 1, False))
else:
train_sampler = None
train_loader = torch.utils.data.DataLoader(train_dataset, shuffle=(train_sampler is None),
num_workers=args.workers, pin_memory=True,
batch_sampler=train_sampler)
print('Starting training ...')
for epoch in range(args.start_epoch, args.epochs):
if args.distributed:
train_sampler.set_epoch(epoch)
adjust_learning_rate(optimizer, epoch, args)
print('Start of epoch ', epoch)
# train for one epoch
train(train_loader, model, criterion, optimizer, epoch, args)
if not args.multiprocessing_distributed or (args.multiprocessing_distributed
and args.rank % ngpus_per_node == 0):
save_checkpoint({
'epoch': epoch + 1,
'arch': args.arch,
'state_dict': model.state_dict(),
'optimizer' : optimizer.state_dict(),
}, is_best=False, filename='moco_temp_checkpoint_{:04d}.pth.tar'.format(epoch))
def train(train_loader, model, criterion, optimizer, epoch, args):
batch_time = AverageMeter('Time', ':6.3f')
data_time = AverageMeter('Data', ':6.3f')
losses = AverageMeter('Loss', ':.4e')
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
progress = ProgressMeter(
len(train_loader),
[batch_time, data_time, losses, top1, top5],
prefix="Epoch: [{}]".format(epoch))
# switch to train mode
model.train()
end = time.time()
for i, (images, _) in enumerate(train_loader):
# measure data loading time
data_time.update(time.time() - end)
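# split the batch into queries (first half) and keys (second half)
images_0 = images[:images.size(0)//2]
images_1 = images[images.size(0)//2:]
if args.gpu is not None:
images_0 = images_0.cuda(args.gpu, non_blocking=True)
images_1 = images_1.cuda(args.gpu, non_blocking=True)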
# compute output
output, target = model(im_q=images_0, im_k=images_1)
loss = criterion(output, target)
# acc1/acc5 are (K+1)-way contrast classifier accuracy
# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
losses.update(loss.item(), images_0.size(0))
top1.update(acc1[0], images_0.size(0))
top5.update(acc5[0], images_0.size(0))
# compute gradient and do SGD step
optimizer.zero_grad()
loss.backward()
optimizer.step()
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i % args.print_freq == 0:
progress.display(i)
def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
torch.save(state, filename)
if is_best:
shutil.copyfile(filename, 'model_best.pth.tar')
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self, name, fmt=':f'):
self.name = name
self.fmt = fmt
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
def __str__(self):
fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
return fmtstr.format(**self.__dict__)
class ProgressMeter(object):
def __init__(self, num_batches, meters, prefix=""):
self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
self.meters = meters
self.prefix = prefix
def display(self, batch):
entries = [self.prefix + self.batch_fmtstr.format(batch)]
entries += [str(meter) for meter in self.meters]
print('\t'.join(entries))
def _get_batch_fmtstr(self, num_batches):
num_digits = len(str(num_batches // 1))
fmt = '{:' + str(num_digits) + 'd}'
return '[' + fmt + '/' + fmt.format(num_batches) + ']'
def adjust_learning_rate(optimizer, epoch, args):
"""Decay the learning rate based on schedule"""
lr = args.lr
if args.cos: # cosine lr schedule
lr *= 0.5 * (1. + math.cos(math.pi * epoch / args.epochs))
else: # stepwise lr schedule
for milestone in args.schedule:
lr *= 0.1 if epoch >= milestone else 1.
for param_group in optimizer.param_groups:
param_group['lr'] = lr
def accuracy(output, target, topk=(1,)):
"""Computes the accuracy over the k top predictions for the specified values of k"""
with torch.no_grad():
maxk = max(topk)
batch_size = target.size(0)
_, pred = output.topk(maxk, 1, True, True)
pred = pred.t()
correct = pred.eq(target.view(1, -1).expand_as(pred))
res = []
for k in topk:
# reshape instead of view: correct[:k] can be non-contiguous because pred was transposed
correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True)
res.append(correct_k.mul_(100.0 / batch_size))
return res
if __name__ == '__main__':
main()
================================================
FILE: moco_utils.py
================================================
# Defines some util functions
import torch
from torch.utils.data import Sampler
from torch.utils.data.distributed import DistributedSampler
class DistributedProxySampler(DistributedSampler):
"""Sampler that restricts data loading to a subset of input sampler indices.
It is especially useful in conjunction with
:class:`torch.nn.parallel.DistributedDataParallel`. In such case, each
process can pass a DistributedSampler instance as a DataLoader sampler,
and load a subset of the original dataset that is exclusive to it.
.. note::
Input sampler is assumed to be of constant size.
Arguments:
sampler: Input data sampler.
num_replicas (optional): Number of processes participating in
distributed training.
rank (optional): Rank of the current process within num_replicas.
"""
def __init__(self, sampler, num_replicas=None, rank=None):
super(DistributedProxySampler, self).__init__(sampler, num_replicas=num_replicas, rank=rank, shuffle=False)
self.sampler = sampler
def __iter__(self):
# deterministically shuffle based on epoch
torch.manual_seed(self.epoch)
indices = list(self.sampler)
# add extra samples to make it evenly divisible
indices += indices[:(self.total_size - len(indices))]
if len(indices) != self.total_size:
raise RuntimeError("{} vs {}".format(len(indices), self.total_size))
# subsample
indices = indices[self.rank:self.total_size:self.num_replicas]
if len(indices) != self.num_samples:
raise RuntimeError("{} vs {}".format(len(indices), self.num_samples))
return iter(indices)
def set_epoch(self, epoch):
self.epoch = epoch
class ContrastiveBatchSampler(Sampler):
def __init__(self, data_source, batch_size, pos_window, drop_last):
self.data_source = data_source
self.batch_size = batch_size
self.pos_window = pos_window
self.drop_last = drop_last
self.n = len(self.data_source)
def __iter__(self):
for i in range(self.n // self.batch_size):
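# x: random anchor indices; y: temporally nearby indices used as positives.
# NOTE: torch.randint excludes `high`, so anchors fall in [0, n-2] and
# offsets fall in [-pos_window, pos_window-1].
x = torch.randint(low=0, high=self.n-1, size=(self.batch_size//2,),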
dtype=torch.int64)
y = x + torch.randint(low=-self.pos_window, high=self.pos_window, size=(self.batch_size//2,),
dtype=torch.int64)
y = torch.clamp(y, 0, self.n-1)
z = x.tolist() + y.tolist()
yield z
def __len__(self):
if self.drop_last:
return self.n // self.batch_size
else:
return (self.n + self.batch_size - 1) // self.batch_size
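# ----------------------------------------------------------------------
# Illustrative sketch only; not part of the original file and not used by
# moco_temp.py. It mimics, in pure Python, how ContrastiveBatchSampler above
# lays out one batch: the first half holds random anchor indices and the
# second half holds temporally nearby indices, which moco_temp.py then
# treats as queries and keys. The name `sketch_contrastive_batch` and the
# use of `random` in place of torch.randint are assumptions made here.
# ----------------------------------------------------------------------
import random


def sketch_contrastive_batch(n, batch_size, pos_window, rng=None):
    """Return (anchors, positives) index lists for one illustrative batch."""
    rng = rng or random.Random()
    half = batch_size // 2
    # randrange, like torch.randint, excludes the upper bound
    anchors = [rng.randrange(0, n - 1) for _ in range(half)]
    offsets = [rng.randrange(-pos_window, pos_window) for _ in range(half)]
    # keep neighbours inside the valid index range [0, n - 1]
    positives = [min(max(a + o, 0), n - 1) for a, o in zip(anchors, offsets)]
    return anchors, positives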
================================================
FILE: read_saycam.py
================================================
import os
import sys
import argparse
import cv2
import numpy as np
parser = argparse.ArgumentParser(description='Read SAYCam videos')
parser.add_argument('data', metavar='DIR', help='path to SAYCam videos')
parser.add_argument('--save-dir', default='', type=str, help='save directory')
parser.add_argument('--fps', default=5, type=int, help='sampling rate (frames per second)')
parser.add_argument('--seg-len', default=288, type=int, help='segment length (seconds)')
if __name__ == '__main__':
args = parser.parse_args()
file_list = os.listdir(args.data)
file_list.sort()
class_counter = 0
img_counter = 0
file_counter = 0
final_size = 224
resized_minor_length = 256
edge_filter = False
n_imgs_per_class = args.seg_len * args.fps
curr_dir_name = os.path.join(args.save_dir, 'class_{:04d}'.format(class_counter))
os.mkdir(curr_dir_name)
for file_indx in file_list:
file_name = os.path.join(args.data, file_indx)
cap = cv2.VideoCapture(file_name)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_rate = int(cap.get(cv2.CAP_PROP_FPS))
# keep one frame out of every sample_rate frames (for 30 fps video, 30: 1fps, 15: 2fps, 10: 3fps, 6: 5fps, 5: 6fps, 3: 10fps, 2: 15fps, 1: 30fps)
sample_rate = frame_rate // args.fps + 1
print('Total frame count: ', frame_count)
print('Native frame rate: ', frame_rate)
fc = 0
ret = True
# Resize
new_height = frame_height * resized_minor_length // min(frame_height, frame_width)
new_width = frame_width * resized_minor_length // min(frame_height, frame_width)
while (fc < frame_count):
ret, frame = cap.read()
if fc % sample_rate == 0 and ret:
# Resize
resized_frame = cv2.resize(frame, (new_width, new_height), interpolation=cv2.INTER_CUBIC)
# Crop
height, width, _ = resized_frame.shape
startx = width // 2 - (final_size // 2)
starty = height // 2 - (final_size // 2) - 16
cropped_frame = resized_frame[starty:starty + final_size, startx:startx + final_size]
assert cropped_frame.shape[0] == final_size and cropped_frame.shape[1] == final_size, \
(cropped_frame.shape, height, width)
if edge_filter:
cropped_frame = cv2.Laplacian(cropped_frame, cv2.CV_64F, ksize=5)
img_min = cropped_frame.min()
img_max = cropped_frame.max()
cropped_frame = np.uint8(255 * (cropped_frame - img_min) / (img_max - img_min))
# [::-1, ::-1, :] reverses both spatial axes, i.e. saves the frame rotated by 180 degrees
cv2.imwrite(os.path.join(curr_dir_name, 'img_{:04d}.jpeg'.format(img_counter)), cropped_frame[::-1, ::-1, :])
img_counter += 1
if img_counter == n_imgs_per_class:
img_counter = 0
class_counter += 1
curr_dir_name = os.path.join(args.save_dir, 'class_{:04d}'.format(class_counter))
os.mkdir(curr_dir_name)
fc += 1
cap.release()
file_counter += 1
print('Completed video {:4d} of {:4d}'.format(file_counter, len(file_list)))
================================================
FILE: scripts/feature_animation.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gres=gpu:1080ti:2
#SBATCH --mem=150GB
#SBATCH --time=1:00:00
#SBATCH --array=0
#SBATCH --job-name=feature_animation
#SBATCH --output=feature_animation_%A_%a.out
module purge
module load cuda-10.1
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/feature_animation.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/feature_animation_imgs_intphys/' --model-path '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/self_supervised_models/TC-SAY.tar' --batch-size 900 --n_out 6269 --feature-idx 600
echo "Done"
================================================
FILE: scripts/feature_animation_class.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --exclude=hpc1,hpc2,hpc3,hpc4,hpc5,hpc6,hpc7,hpc8,hpc9,vine3,vine4,vine6,vine11,vine12,lion17,rose7,rose8,rose9
#SBATCH --ntasks=1
#SBATCH --gres=gpu:4
#SBATCH --mem=100GB
#SBATCH --time=1:00:00
#SBATCH --array=0
#SBATCH --job-name=feature_animation
#SBATCH --output=feature_animation_%A_%a.out
module purge
module load cuda-10.1
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/feature_animation_class.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/feature_animation_computers/' --model-path '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/self_supervised_models/TC-S_labeledS_5_iid.tar' --batch-size 200 --n_out 26 --class-idx 6
echo "Done"
================================================
FILE: scripts/highly_activating_imgs.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=1:00:00
#SBATCH --array=0
#SBATCH --job-name=activating_imgs
#SBATCH --output=activating_imgs_%A_%a.out
module purge
module load cuda-10.1
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/highly_activating_imgs.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --n_out 26 --model-path 'mobilenetV2_S_5fps_2000cls_coloraug_labeled.tar'
echo "Done"
================================================
FILE: scripts/hog_baseline.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=100GB
#SBATCH --time=6:00:00
#SBATCH --array=0
#SBATCH --job-name=hog
#SBATCH --output=hog_%A_%a.out
module purge
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/hog_baseline.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/toybox_1fps/'
echo "Done"
================================================
FILE: scripts/imagenet_finetuning.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --gres=gpu:titanrtx:4
#SBATCH --mem=150GB
#SBATCH --time=48:00:00
#SBATCH --array=0
#SBATCH --job-name=finetune_imgnet
#SBATCH --output=finetune_imgnet_%A_%a.out
module purge
module load cuda-10.1
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --freeze-trunk --n_out 6269 --resume 'resnext50_32x4d_augmentstrong_batch256_True_SAY_5_288_epoch_15.tar'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --freeze-trunk --n_out 2765 --resume 'resnext50_32x4d_augmentstrong_batch256_True_S_5_288_epoch_10.tar'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --freeze-trunk --n_out 1786 --resume 'resnext50_32x4d_augmentstrong_batch256_True_A_5_288_epoch_10.tar'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --freeze-trunk --n_out 1718 --resume 'resnext50_32x4d_augmentstrong_batch256_True_Y_5_288_epoch_10.tar'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --frac-retained 0.01 --n_out 6269 --resume 'resnext50_32x4d_augmentstrong_batch256_True_SAY_5_288_epoch_15.tar'
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/imagenet_finetuning.py --freeze-trunk --n_out 1000 --resume 'ft_IN_resnext50_32x4d_augmentstrong_batch256_True_SAY_5_288_epoch_15.tar'
echo "Done"
================================================
FILE: scripts/linear_combination_maps.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=1:00:00
#SBATCH --array=0
#SBATCH --job-name=linear_maps
#SBATCH --output=linear_maps_%A_%a.out
module purge
module load cuda-10.1
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_combination_maps.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --n_out 26 --model-path 'mobilenetV2_S_5fps_2000cls_coloraug_labeled.tar'
echo "Done"
================================================
FILE: scripts/linear_decoding.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=8
#SBATCH --gres=gpu:titanrtx:2
#SBATCH --mem=150GB
#SBATCH --time=12:00:00
#SBATCH --array=0
#SBATCH --job-name=linear_decoding
#SBATCH --output=linear_decoding_%A_%a.out
module purge
module load cuda-10.1
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --model-name 'random' --num-classes 26 --subsample
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --model-name 'moco_img_0005' --num-classes 26 --subsample
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --model-name 'moco_temp_0005' --num-classes 26 --subsample
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_5/' --model-name 'TC-S' --num-outs 2765
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --model-name 'mobilenetV2_A_5fps_2000cls_coloraug' --num-outs 1786
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/' --model-name 'mobilenetV2_Y_5fps_2000cls_coloraug' --num-outs 1718
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_5/' --model-name 'TC-SAY' --num-outs 6269
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/linear_decoding.py '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_5/' --model-name 'TC-S' --num-outs 2765
echo "Done"
================================================
FILE: scripts/moco_img.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --gres=gpu:titanrtx:4
#SBATCH --mem=150GB
#SBATCH --time=48:00:00
#SBATCH --array=0
#SBATCH --job-name=moco_img
#SBATCH --output=moco_img_%A_%a.out
module purge
module load cuda-10.1
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/moco_img.py \
-a resnext50_32x4d \
--lr 0.015 \
--batch-size 256 \
--mlp \
--moco-t 0.2 \
--aug-plus --cos \
--dist-url 'tcp://localhost:10001' \
--multiprocessing-distributed \
--world-size 1 --rank 0 \
--start-epoch 0 \
--resume '' \
'/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_data_5fps_2000cls_pytorch/'
echo "Done"
================================================
FILE: scripts/moco_temp.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --gres=gpu:v100:4
#SBATCH --mem=150GB
#SBATCH --time=48:00:00
#SBATCH --array=0
#SBATCH --job-name=moco_temp
#SBATCH --output=moco_temp_%A_%a.out
module purge
module load cuda-10.1
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/moco_temp.py \
-a resnext50_32x4d \
--lr 0.015 \
--batch-size 256 \
--mlp \
--moco-t 0.2 \
--aug-plus --cos \
--dist-url 'tcp://localhost:10001' \
--multiprocessing-distributed \
--world-size 1 --rank 0 \
--start-epoch 0 \
--resume '' \
'/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_data_5fps_2000cls_pytorch/'
echo "Done"
================================================
FILE: scripts/read_saycam.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=16GB
#SBATCH --time=48:00:00
#SBATCH --array=0
#SBATCH --job-name=read_saycam
#SBATCH --output=read_saycam_%A_%a.out
module purge
module load cuda-10.1
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/read_saycam.py \
--save-dir '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_15fps_288s' \
--fps 15 \
--seg-len 288 \
'/misc/vlgscratch4/LakeGroup/emin/headcam/data_2/S'
echo "Done"
================================================
FILE: scripts/selectivities.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --gres=gpu:2
#SBATCH --mem=64GB
#SBATCH --time=1:00:00
#SBATCH --array=0
#SBATCH --job-name=selectivity
#SBATCH --output=selectivity_%A_%a.out
module purge
module load cuda-10.1
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/selectivities.py \
--n_out 1000 \
--model-path '' \
--layer 18 \
'/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_clean_labeled_data_1fps_4/'
echo "Done"
================================================
FILE: scripts/temporal_classification.sh
================================================
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --gres=gpu:v100:4
#SBATCH --mem=150GB
#SBATCH --time=48:00:00
#SBATCH --array=0
#SBATCH --job-name=tempclas
#SBATCH --output=tempclas_%A_%a.out
module purge
module load cuda-10.1
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/temporal_classification.py --model 'resnext50_32x4d' --n_out 6269 --resume 'resnext50_32x4d_augmentstrong_batch256_True_SAY_5_288_epoch_15.tar' --start-epoch 16 '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/SAY_data_5fps_2000cls_pytorch/'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/temporal_classification.py --model 'resnext50_32x4d' --n_out 2765 '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/S_data_5fps_2000cls_pytorch/'
#python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/temporal_classification.py --model 'resnext50_32x4d' --n_out 1786 '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/A_data_5fps_2000cls_pytorch/'
python -u /misc/vlgscratch4/LakeGroup/emin/baby-vision/temporal_classification.py --model 'resnext50_32x4d' --n_out 1718 '/misc/vlgscratch4/LakeGroup/emin/headcam/preprocessing/Y_data_5fps_2000cls_pytorch/'
echo "Done"
================================================
FILE: selectivities.py
================================================
'''Measure single feature class selectivities'''
import os
import argparse
import numpy as np
import torch
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
def extract_map_layer_7x7(mobilenetV2_model):
layer_list = list(mobilenetV2_model.module.features.children())
new_model = torch.nn.Sequential(*layer_list)
return new_model
def extract_map_layer_14x14(mobilenetV2_model, layer):
layer_list = list(mobilenetV2_model.module.features.children())
new_layer_list = layer_list[:-layer]
new_layer_list.append(layer_list[-layer].conv[0])
new_model = torch.nn.Sequential(*new_layer_list)
return new_model
def load_model(args):
model = models.mobilenet_v2(pretrained=True)
model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True)
model = torch.nn.DataParallel(model).cuda()
if args.model_path:
if os.path.isfile(args.model_path):
checkpoint = torch.load(args.model_path)
model.load_state_dict(checkpoint['model_state_dict'])
else:
print("=> no checkpoint found at '{}'".format(args.model_path))
return model
def load_data(data_dir, args):
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
train_dataset = datasets.ImageFolder(
data_dir,
transforms.Compose([transforms.ToTensor(), normalize])
)
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=args.batch_size, shuffle=False,
num_workers=args.workers, pin_memory=True, sampler=None
)
return train_loader
def predict(data_loader, model):
targets = []
preds = []
# switch to evaluate mode
model.eval()
with torch.no_grad():
for i, (images, target) in enumerate(data_loader):
images = images.cuda()
# compute predictions
pred = model(images)
pred = torch.mean(pred, dim=(2, 3))
targets.append(target.cpu().numpy())
preds.append(pred.cpu().numpy())
print('Iter:', i)
targets = np.concatenate(targets, axis=0)
preds = np.concatenate(preds, axis=0)
print('Targets size:', targets.shape)
print('Preds size:', preds.shape)
return targets, preds
if __name__ == '__main__':
parser = argparse.ArgumentParser(description='Measure single feature class selectivities')
parser.add_argument('data', metavar='DIR', help='path to dataset')
parser.add_argument('--workers', default=4, type=int, help='number of data loading workers (default: 4)')
parser.add_argument('--batch-size', default=580, type=int, help='mini-batch size, this is the total '
'batch size of all GPUs on the current node when '
'using Data Parallel or Distributed Data Parallel')
parser.add_argument('--model-path', default='', type=str, help='path to model checkpoint '
'(default: ImageNet-pretrained)')
parser.add_argument('--n_out', default=1000, type=int, help='output dim of pre-trained model')
parser.add_argument('--layer', default=1, type=int, choices=[1, 2, 6, 10, 14, 18], help='layer index counted from the end of model.features (1: output of the full feature stack)')
args = parser.parse_args()
model = load_model(args)
if args.layer == 1:
map_layer = extract_map_layer_7x7(model)
else:
map_layer = extract_map_layer_14x14(model, args.layer)
data_loader = load_data(args.data, args)
targets, preds = predict(data_loader, map_layer)
n_classes = 26
n_neurons = preds.shape[1]
class_matrix_mean = np.zeros((n_neurons, n_classes))
class_matrix_std = np.zeros((n_neurons, n_classes))
for i in range(n_neurons):
for j in range(n_classes):
aux_vec = preds[targets==j, i]
class_matrix_mean[i, j] = np.mean(aux_vec)
class_matrix_std[i, j] = np.std(aux_vec)
sorted_mean = np.sort(class_matrix_mean, axis=1)
selectivity = (sorted_mean[:, -1] - np.mean(sorted_mean[:, :-1], axis=1)) / \
(sorted_mean[:, -1] + np.mean(sorted_mean[:, :-1], axis=1))
print('Most selective 10 features:', np.argsort(selectivity)[-10:])
print('Highest 10 selectivities:', np.sort(selectivity)[-10:])
print('Selectivity shape:', selectivity.shape)
np.save('selectivity_' + str(args.layer) + '.npy', selectivity)
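
The selectivity index computed at the end of this script compares a feature's strongest class-mean response with the average of its remaining class means: (best - rest) / (best + rest). The following is a minimal sketch of that formula for a single feature in plain Python; the function name `selectivity_index` and the explicit zero-denominator guard are assumptions added for illustration, not part of the script.

```python
def selectivity_index(class_means):
    """Selectivity of one feature, given its mean response to each class."""
    if len(class_means) < 2:
        raise ValueError('need mean responses for at least two classes')
    ordered = sorted(class_means)
    best = ordered[-1]                               # response to the preferred class
    rest = sum(ordered[:-1]) / (len(ordered) - 1)    # mean response to all other classes
    denom = best + rest
    if denom == 0:
        # Assumption: report NaN for a feature that is silent on every class.
        return float('nan')
    return (best - rest) / denom


print(selectivity_index([0.0, 0.0, 0.0, 4.0]))  # 1.0: responds to a single class only
print(selectivity_index([2.0, 2.0, 2.0, 2.0]))  # 0.0: responds equally to all classes
print(selectivity_index([1.0, 3.0]))            # 0.5
```

The index stays within [0, 1] only when the mean responses are non-negative, which is worth checking for the layer being analysed before comparing values across layers.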
================================================
FILE: temporal_classification.py
================================================
import argparse
import os
import random
import shutil
import time
import warnings
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.distributed as dist
import torch.optim
import torch.multiprocessing as mp
import torch.utils.data
import torch.utils.data.distributed
import torchvision.transforms as transforms
import torchvision.datasets as datasets
import torchvision.models as models
from utils import GaussianBlur
parser = argparse.ArgumentParser(description='Temporal classification with headcam data')
parser.add_argument('data', metavar='DIR', help='path to dataset')
parser.add_argument('--model', default='resnet50', choices=['resnet50', 'resnext101_32x8d', 'resnext50_32x4d',
'mobilenet_v2'], help='model')
parser.add_argument('-j', '--workers', default=32, type=int, metavar='N', help='number of data loading workers (default'
': 32)')
parser.add_argument('--epochs', default=16, type=int, metavar='N', help='number of total epochs to run')
parser.add_argument('--start-epoch', default=0, type=int, metavar='N', help='manual epoch number (useful on restarts)')
parser.add_argument('-b', '--batch-size', default=256, type=int, metavar='N',
help='mini-batch size (default: 256), this is the total batch size of all GPUs on the current node '
'when using Data Parallel or Distributed Data Parallel')
parser.add_argument('--lr', '--learning-rate', default=0.0005, type=float, metavar='LR', help='initial learning rate',
dest='lr')
parser.add_argument('--wd', '--weight-decay', default=0.0, type=float, metavar='W', help='weight decay (default: 0)',
dest='weight_decay')
parser.add_argument('-p', '--print-freq', default=10000, type=int, metavar='N', help='print frequency (default: 10000)')
parser.add_argument('--resume', default='', type=str, metavar='PATH', help='path to latest checkpoint (default: none)')
parser.add_argument('--world-size', default=-1, type=int, help='number of nodes for distributed training')
parser.add_argument('--rank', default=-1, type=int, help='node rank for distributed training')
parser.add_argument('--dist-url', default='tcp://224.66.41.62:23456', type=str, help='url used to set up distributed '
'training')
parser.add_argument('--dist-backend', default='nccl', type=str, help='distributed backend')
parser.add_argument('--gpu', default=None, type=int, help='GPU id to use.')
parser.add_argument('--multiprocessing-distributed', action='store_true',
help='Use multi-processing distributed training to launch '
'N processes per node, which has N GPUs. This is the '
'fastest way to use PyTorch for either single node or '
'multi node data parallel training')
parser.add_argument('--n_out', default=1000, type=int, help='output dim')
parser.add_argument('--augmentation', default=True, action='store_false', help='pass this flag to disable data augmentation (enabled by default)')
def main():
args = parser.parse_args()
print(args)
if args.gpu is not None:
warnings.warn('You have chosen a specific GPU. This will completely disable data parallelism.')
if args.dist_url == "env://" and args.world_size == -1:
args.world_size = int(os.environ["WORLD_SIZE"])
args.distributed = args.world_size > 1 or args.multiprocessing_distributed
ngpus_per_node = torch.cuda.device_count()
if args.multiprocessing_distributed:
# Since we have ngpus_per_node processes per node, the total world_size needs to be adjusted accordingly
args.world_size = ngpus_per_node * args.world_size
# Use torch.multiprocessing.spawn to launch distributed processes: the main_worker process function
mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
else:
# Simply call main_worker function
main_worker(args.gpu, ngpus_per_node, args)
def main_worker(gpu, ngpus_per_node, args):
args.gpu = gpu
if args.gpu is not None:
print("Use GPU: {} for training".format(args.gpu))
print('Model:', args.model)
model = models.__dict__[args.model](pretrained=False)
if args.model.startswith('res'):
model.fc = torch.nn.Linear(in_features=2048, out_features=args.n_out, bias=True)
else:
model.classifier = torch.nn.Linear(in_features=1280, out_features=args.n_out, bias=True)
# DataParallel will divide and allocate batch_size to all available GPUs
model = torch.nn.DataParallel(model).cuda()
# define loss function (criterion) and optimizer
criterion = nn.CrossEntropyLoss().cuda(args.gpu)
optimizer = torch.optim.Adam(model.parameters(), args.lr, weight_decay=args.weight_decay)
cudnn.benchmark = True
if args.resume:
if os.path.isfile(args.resume):
print(args.resume)
checkpoint = torch.load(args.resume)
model.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
else:
print("=> no checkpoint found at '{}'".format(args.resume))
savefile_name = args.model + '_augmentstrong_batch256_' + str(args.augmentation) + '_Y_5_288'
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
if args.augmentation:
train_dataset = datasets.ImageFolder(
args.data,
transforms.Compose([
transforms.RandomResizedCrop(224, scale=(0.2, 1.)),
transforms.RandomApply([transforms.ColorJitter(0.9, 0.9, 0.9, 0.5)], p=0.9),
transforms.RandomGrayscale(p=0.2),
transforms.RandomApply([GaussianBlur([.1, 2.])], p=0.5),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
normalize
])
)
else:
train_dataset = datasets.ImageFolder(
args.data,
transforms.Compose([
transforms.ToTensor(),
normalize
])
)
train_loader = torch.utils.data.DataLoader(
train_dataset, batch_size=args.batch_size, shuffle=True,
num_workers=args.workers, pin_memory=True, sampler=None
)
acc1_list = []
for epoch in range(args.start_epoch, args.epochs):
# train for one epoch
acc1 = train(train_loader, model, criterion, optimizer, epoch, args)
acc1_list.append(acc1)
torch.save({'acc1_list': acc1_list,
'model_state_dict': model.state_dict(),
'optimizer_state_dict': optimizer.state_dict()}, savefile_name + '_epoch_' + str(epoch) + '.tar')
def train(train_loader, model, criterion, optimizer, epoch, args):
batch_time = AverageMeter('Time', ':6.3f')
data_time = AverageMeter('Data', ':6.3f')
losses = AverageMeter('Loss', ':.4e')
top1 = AverageMeter('Acc@1', ':6.2f')
top5 = AverageMeter('Acc@5', ':6.2f')
progress = ProgressMeter(
len(train_loader),
[batch_time, data_time, losses, top1, top5],
prefix="Epoch: [{}]".format(epoch))
# switch to train mode
model.train()
end = time.time()
for i, (images, target) in enumerate(train_loader):
# measure data loading time
data_time.update(time.time() - end)
if args.gpu is not None:
images = images.cuda(args.gpu, non_blocking=True)
target = target.cuda(args.gpu, non_blocking=True)
# compute output
output = model(images)
loss = criterion(output, target)
# measure accuracy and record loss
acc1, acc5 = accuracy(output, target, topk=(1, 5))
losses.update(loss.item(), images.size(0))
top1.update(acc1[0], images.size(0))
top5.update(acc5[0], images.size(0))
# compute gradient and take an optimizer step (the optimizer here is Adam, not SGD)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# measure elapsed time
batch_time.update(time.time() - end)
end = time.time()
if i % args.print_freq == 0:
progress.display(i)
return top1.avg.cpu().numpy()
class AverageMeter(object):
"""Computes and stores the average and current value"""
def __init__(self, name, fmt=':f'):
self.name = name
self.fmt = fmt
self.reset()
def reset(self):
self.val = 0
self.avg = 0
self.sum = 0
self.count = 0
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
def __str__(self):
fmtstr = '{name} {val' + self.fmt + '} ({avg' + self.fmt + '})'
return fmtstr.format(**self.__dict__)
class ProgressMeter(object):
def __init__(self, num_batches, meters, prefix=""):
self.batch_fmtstr = self._get_batch_fmtstr(num_batches)
self.meters = meters
self.prefix = prefix
def display(self, batch):
entries = [self.prefix + self.batch_fmtstr.format(batch)]
entries += [str(meter) for meter in self.meters]
print('\t'.join(entries))
def _get_batch_fmtstr(self, num_batches):
num_digits = len(str(num_batches // 1))
fmt = '{:' + str(num_digits) + 'd}'
return '[' + fmt + '/' + fmt.format(num_batches) + ']'
def accuracy(output, target, topk=(1,)):
"""Computes the accuracy over the k top predictions for the specified values of k"""
with torch.no_grad():
maxk = max(topk)
batch_size = target.size(0)
_, pred = output.topk(maxk, 1, True, True)
pred = pred.t()
correct = pred.eq(target.view(1, -1).expand_as(pred))
res = []
for k in topk:
correct_k = correct[:k].view(-1).float().sum(0, keepdim=True)
res.append(correct_k.mul_(100.0 / batch_size))
return res
if __name__ == '__main__':
main()
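
The `accuracy` helper above reports, for each k, the percentage of samples whose target is among the k highest-scoring classes. The following is a dependency-free sketch of the same quantity in plain Python. The name `topk_accuracy` and the toy scores are assumptions for illustration, and tie-breaking between equal scores may differ from `torch.topk`.

```python
def topk_accuracy(scores, targets, topk=(1,)):
    """Percentage of samples whose target is among the k highest-scoring classes."""
    results = []
    for k in topk:
        hits = 0
        for row, target in zip(scores, targets):
            # Class indices ordered from highest to lowest score.
            ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
            if target in ranked[:k]:
                hits += 1
        results.append(100.0 * hits / len(targets))
    return results


# Toy batch: 3 samples, 3 classes.
scores = [[0.1, 0.7, 0.2],
          [0.5, 0.3, 0.2],
          [0.2, 0.3, 0.5]]
targets = [1, 1, 0]
print(topk_accuracy(scores, targets, topk=(1, 2)))
```

In this toy batch only the first sample is correct at k=1, and the second becomes correct at k=2, so the two values are one third and two thirds of 100 respectively.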
================================================
FILE: utils.py
================================================
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved
from PIL import ImageFilter
import random
class GaussianBlur(object):
"""Gaussian blur augmentation in SimCLR https://arxiv.org/abs/2002.05709"""
def __init__(self, sigma=[.1, 2.]):
self.sigma = sigma
def __call__(self, x):
sigma = random.uniform(self.sigma[0], self.sigma[1])
x = x.filter(ImageFilter.GaussianBlur(radius=sigma))
return x
SYMBOL INDEX (133 symbols across 14 files)
FILE: feature_animation.py
function extract_map_layer_7x7_res (line 18) | def extract_map_layer_7x7_res(res_model):
function extract_map_layer_7x7 (line 23) | def extract_map_layer_7x7(mobilenetV2_model):
function extract_map_layer_14x14 (line 28) | def extract_map_layer_14x14(mobilenetV2_model):
function load_model_res (line 35) | def load_model_res(args):
function load_model (line 49) | def load_model(args):
function load_data (line 63) | def load_data(data_dir, args):
function predict (line 78) | def predict(data_loader, model, batch_size, feature_idx):
function show_img (line 118) | def show_img(ax, img, save_name):
FILE: feature_animation_class.py
function extract_map_layer_7x7_res (line 18) | def extract_map_layer_7x7_res(res_model):
function extract_map_layer_7x7 (line 23) | def extract_map_layer_7x7(mobilenetV2_model):
function extract_map_layer_14x14 (line 28) | def extract_map_layer_14x14(mobilenetV2_model):
function load_model_res (line 35) | def load_model_res(args):
function load_model (line 49) | def load_model(args):
function load_data (line 63) | def load_data(data_dir, args):
function predict (line 78) | def predict(data_loader, model, batch_size, weights):
function show_img (line 119) | def show_img(ax, img, save_name):
FILE: highly_activating_imgs.py
function extract_map_layer_7x7 (line 14) | def extract_map_layer_7x7(mobilenetV2_model):
function extract_map_layer_14x14 (line 19) | def extract_map_layer_14x14(mobilenetV2_model, layer):
function load_model (line 26) | def load_model(args):
function load_data (line 40) | def load_data(data_dir, args):
function predict (line 55) | def predict(data_loader, model, neuron_idx):
function show_img (line 77) | def show_img(ax, img, save_name):
FILE: imagenet_finetuning.py
function set_parameter_requires_grad (line 52) | def set_parameter_requires_grad(model, feature_extracting=True):
function main (line 62) | def main():
function main_worker (line 84) | def main_worker(gpu, ngpus_per_node, args):
function train (line 209) | def train(train_loader, model, criterion, optimizer, epoch, args):
function validate (line 257) | def validate(val_loader, model, args):
class AverageMeter (line 284) | class AverageMeter(object):
method __init__ (line 286) | def __init__(self, name, fmt=':f'):
method reset (line 291) | def reset(self):
method update (line 297) | def update(self, val, n=1):
method __str__ (line 303) | def __str__(self):
class ProgressMeter (line 308) | class ProgressMeter(object):
method __init__ (line 309) | def __init__(self, num_batches, meters, prefix=""):
method display (line 314) | def display(self, batch):
method _get_batch_fmtstr (line 319) | def _get_batch_fmtstr(self, num_batches):
function adjust_learning_rate (line 325) | def adjust_learning_rate(optimizer, epoch, args):
function accuracy (line 333) | def accuracy(output, target, topk=(1,)):
FILE: linear_combination_maps.py
function extract_map_layer_7x7 (line 13) | def extract_map_layer_7x7(mobilenetV2_model):
function extract_map_layer_14x14 (line 18) | def extract_map_layer_14x14(mobilenetV2_model):
function load_model (line 25) | def load_model(args):
function load_data (line 39) | def load_data(data_dir, args):
function predict (line 54) | def predict(data_loader, model, weights, batch_size):
function show_img (line 84) | def show_img(ax, img, save_name):
FILE: linear_decoding.py
function set_parameter_requires_grad (line 52) | def set_parameter_requires_grad(model, feature_extracting=True):
function load_split_train_test (line 59) | def load_split_train_test(datadir, args, train_frac=0.5):
function main (line 96) | def main():
function main_worker (line 118) | def main_worker(gpu, ngpus_per_node, args):
function train (line 201) | def train(train_loader, model, criterion, optimizer, epoch, args):
function validate (line 252) | def validate(val_loader, model, args):
class AverageMeter (line 284) | class AverageMeter(object):
method __init__ (line 286) | def __init__(self, name, fmt=':f'):
method reset (line 291) | def reset(self):
method update (line 297) | def update(self, val, n=1):
method __str__ (line 303) | def __str__(self):
class ProgressMeter (line 308) | class ProgressMeter(object):
method __init__ (line 309) | def __init__(self, num_batches, meters, prefix=""):
method display (line 314) | def display(self, batch):
method _get_batch_fmtstr (line 319) | def _get_batch_fmtstr(self, num_batches):
function accuracy (line 325) | def accuracy(output, target, topk=(1,)):
FILE: moco/builder.py
class MoCo (line 6) | class MoCo(nn.Module):
method __init__ (line 11) | def __init__(self, base_encoder, dim=128, K=65536, m=0.999, T=0.07, ml...
method _momentum_update_key_encoder (line 48) | def _momentum_update_key_encoder(self):
method _dequeue_and_enqueue (line 56) | def _dequeue_and_enqueue(self, keys):
method _batch_shuffle_ddp (line 72) | def _batch_shuffle_ddp(self, x):
method _batch_unshuffle_ddp (line 100) | def _batch_unshuffle_ddp(self, x, idx_unshuffle):
method forward (line 118) | def forward(self, im_q, im_k):
function concat_all_gather (line 168) | def concat_all_gather(tensor):
FILE: moco/loader.py
class TwoCropsTransform (line 6) | class TwoCropsTransform:
method __init__ (line 9) | def __init__(self, base_transform):
method __call__ (line 12) | def __call__(self, x):
class GaussianBlur (line 18) | class GaussianBlur(object):
method __init__ (line 21) | def __init__(self, sigma=[.1, 2.]):
method __call__ (line 24) | def __call__(self, x):
FILE: moco_img.py
function main (line 101) | def main():
function main_worker (line 136) | def main_worker(gpu, ngpus_per_node, args):
function train (line 286) | def train(train_loader, model, criterion, optimizer, epoch, args):
function save_checkpoint (line 334) | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
class AverageMeter (line 340) | class AverageMeter(object):
method __init__ (line 342) | def __init__(self, name, fmt=':f'):
method reset (line 347) | def reset(self):
method update (line 353) | def update(self, val, n=1):
method __str__ (line 359) | def __str__(self):
class ProgressMeter (line 364) | class ProgressMeter(object):
method __init__ (line 365) | def __init__(self, num_batches, meters, prefix=""):
method display (line 370) | def display(self, batch):
method _get_batch_fmtstr (line 375) | def _get_batch_fmtstr(self, num_batches):
function adjust_learning_rate (line 381) | def adjust_learning_rate(optimizer, epoch, args):
function accuracy (line 393) | def accuracy(output, target, topk=(1,)):
FILE: moco_temp.py
function main (line 102) | def main():
function main_worker (line 137) | def main_worker(gpu, ngpus_per_node, args):
function train (line 285) | def train(train_loader, model, criterion, optimizer, epoch, args):
function save_checkpoint (line 333) | def save_checkpoint(state, is_best, filename='checkpoint.pth.tar'):
class AverageMeter (line 339) | class AverageMeter(object):
method __init__ (line 341) | def __init__(self, name, fmt=':f'):
method reset (line 346) | def reset(self):
method update (line 352) | def update(self, val, n=1):
method __str__ (line 358) | def __str__(self):
class ProgressMeter (line 363) | class ProgressMeter(object):
method __init__ (line 364) | def __init__(self, num_batches, meters, prefix=""):
method display (line 369) | def display(self, batch):
method _get_batch_fmtstr (line 374) | def _get_batch_fmtstr(self, num_batches):
function adjust_learning_rate (line 380) | def adjust_learning_rate(optimizer, epoch, args):
function accuracy (line 392) | def accuracy(output, target, topk=(1,)):
FILE: moco_utils.py
class DistributedProxySampler (line 7) | class DistributedProxySampler(DistributedSampler):
method __init__ (line 25) | def __init__(self, sampler, num_replicas=None, rank=None):
method __iter__ (line 29) | def __iter__(self):
method set_epoch (line 46) | def set_epoch(self, epoch):
class ContrastiveBatchSampler (line 49) | class ContrastiveBatchSampler(Sampler):
method __init__ (line 50) | def __init__(self, data_source, batch_size, pos_window, drop_last):
method __iter__ (line 57) | def __iter__(self):
method __len__ (line 68) | def __len__(self):
FILE: selectivities.py
function extract_map_layer_7x7 (line 12) | def extract_map_layer_7x7(mobilenetV2_model):
function extract_map_layer_14x14 (line 17) | def extract_map_layer_14x14(mobilenetV2_model, layer):
function load_model (line 24) | def load_model(args):
function load_data (line 38) | def load_data(data_dir, args):
function predict (line 53) | def predict(data_loader, model):
FILE: temporal_classification.py
function main (line 55) | def main():
function main_worker (line 79) | def main_worker(gpu, ngpus_per_node, args):
function train (line 153) | def train(train_loader, model, criterion, optimizer, epoch, args):
class AverageMeter (line 201) | class AverageMeter(object):
method __init__ (line 203) | def __init__(self, name, fmt=':f'):
method reset (line 208) | def reset(self):
method update (line 214) | def update(self, val, n=1):
method __str__ (line 220) | def __str__(self):
class ProgressMeter (line 225) | class ProgressMeter(object):
method __init__ (line 226) | def __init__(self, num_batches, meters, prefix=""):
method display (line 231) | def display(self, batch):
method _get_batch_fmtstr (line 236) | def _get_batch_fmtstr(self, num_batches):
function accuracy (line 242) | def accuracy(output, target, topk=(1,)):
FILE: utils.py
class GaussianBlur (line 6) | class GaussianBlur(object):
method __init__ (line 9) | def __init__(self, sigma=[.1, 2.]):
method __call__ (line 12) | def __call__(self, x):
Condensed preview: 32 files, each showing path, character count, and a content snippet (the full structured content is 138K chars).
[
{
"path": ".gitignore",
"chars": 79,
"preview": "# File extensions\n*.out\n*.mp4\n\n# Directories\n/__pychache__\n/moco/__pychache__\n\n"
},
{
"path": "LICENSE",
"chars": 1067,
"preview": "MIT License\n\nCopyright (c) 2021 Emin Orhan\n\nPermission is hereby granted, free of charge, to any person obtaining a copy"
},
{
"path": "README.md",
"chars": 9246,
"preview": "# Self-supervised learning through the eyes of a child\n\nThis repository contains code for reproducing the results report"
},
{
"path": "feature_animation.py",
"chars": 6614,
"preview": "'''Animating features on short clips'''\nimport os\nimport argparse\nimport numpy as np\nimport torch\nimport torchvision.tra"
},
{
"path": "feature_animation_class.py",
"chars": 6721,
"preview": "'''Animating features on short clips'''\nimport os\nimport argparse\nimport numpy as np\nimport torch\nimport torchvision.tra"
},
{
"path": "highly_activating_imgs.py",
"chars": 4445,
"preview": "'''Plots highly activating images'''\nimport os\nimport argparse\nimport numpy as np\nimport torch\nimport torchvision.transf"
},
{
"path": "hog_baseline.py",
"chars": 2314,
"preview": "'''HoG baseline'''\nimport os\nimport argparse\nimport numpy as np\nfrom skimage.feature import hog\nfrom skimage.io import i"
},
{
"path": "imagenet_finetuning.py",
"chars": 13554,
"preview": "import argparse\nimport os\nimport time\nimport warnings\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.parallel\nimpor"
},
{
"path": "linear_combination_maps.py",
"chars": 4797,
"preview": "'''Plots spatial attention maps'''\nimport os\nimport argparse\nimport numpy as np\nimport torch\nimport torchvision.transfor"
},
{
"path": "linear_decoding.py",
"chars": 13650,
"preview": "import argparse\nimport os\nimport random\nimport shutil\nimport time\nimport warnings\nimport numpy as np\n\nimport torch\nimpor"
},
{
"path": "moco/__init__.py",
"chars": 71,
"preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n"
},
{
"path": "moco/builder.py",
"chars": 5933,
"preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport torch\nimport torch.nn as nn\n\n\nclass MoCo(n"
},
{
"path": "moco/loader.py",
"chars": 758,
"preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nfrom PIL import ImageFilter\nimport random\n\n\nclass"
},
{
"path": "moco_img.py",
"chars": 16391,
"preview": "#!/usr/bin/env python\n# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport argparse\nimport buil"
},
{
"path": "moco_temp.py",
"chars": 16515,
"preview": "#!/usr/bin/env python\n# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nimport argparse\nimport buil"
},
{
"path": "moco_utils.py",
"chars": 2718,
"preview": "# Defines some util functions\nimport torch\nfrom torch.utils.data import Sampler\nfrom torch.utils.data.distributed import"
},
{
"path": "read_saycam.py",
"chars": 3382,
"preview": "import os\nimport sys\nimport argparse\nimport cv2\nimport numpy as np\n\n\nparser = argparse.ArgumentParser(description='Read "
},
{
"path": "scripts/feature_animation.sh",
"chars": 588,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=8\n#SBATCH --gres=gpu:1080ti:2\n#SBATCH --mem=150GB\n#SBATCH --time=1:00:00"
},
{
"path": "scripts/feature_animation_class.sh",
"chars": 711,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --exclude=hpc1,hpc2,hpc3,hpc4,hpc5,hpc6,hpc7,hpc8,hpc9,vine3,vine4,vine6,vine11,v"
},
{
"path": "scripts/highly_activating_imgs.sh",
"chars": 527,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=1\n#SBATCH --gres=gpu:1\n#SBATCH --cpus-per-task=1\n#SBATCH --mem=16GB\n#SBA"
},
{
"path": "scripts/hog_baseline.sh",
"chars": 360,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=1\n#SBATCH --mem=100GB\n#SBATCH --time=6:00:00\n#"
},
{
"path": "scripts/imagenet_finetuning.sh",
"chars": 1387,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=16\n#SBATCH --gres=gpu:titanrtx:4\n#SBATCH --mem=150GB\n#SBATCH --time=48:0"
},
{
"path": "scripts/linear_combination_maps.sh",
"chars": 520,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=1\n#SBATCH --gres=gpu:1\n#SBATCH --cpus-per-task=1\n#SBATCH --mem=16GB\n#SBA"
},
{
"path": "scripts/linear_decoding.sh",
"chars": 1977,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=8\n#SBATCH --gres=gpu:titanrtx:2\n#SBATCH --mem=150GB\n#SBATCH --time=12:00"
},
{
"path": "scripts/moco_img.sh",
"chars": 658,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=16\n#SBATCH --gres=gpu:titanrtx:4\n#SBATCH --mem=150GB\n#SBATCH --time=48:0"
},
{
"path": "scripts/moco_temp.sh",
"chars": 657,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=16\n#SBATCH --gres=gpu:v100:4\n#SBATCH --mem=150GB\n#SBATCH --time=48:00:00"
},
{
"path": "scripts/read_saycam.sh",
"chars": 499,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=1\n#SBATCH --mem=16GB\n#SBATCH --time=48:00:00\n#"
},
{
"path": "scripts/selectivities.sh",
"chars": 457,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=1\n#SBATCH --gres=gpu:2\n#SBATCH --mem=64GB\n#SBATCH --time=1:00:00\n#SBATCH"
},
{
"path": "scripts/temporal_classification.sh",
"chars": 1184,
"preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=16\n#SBATCH --gres=gpu:v100:4\n#SBATCH --mem=150GB\n#SBATCH --time=48:00:00"
},
{
"path": "selectivities.py",
"chars": 4610,
"preview": "'''Measure single feature class selectivities'''\nimport os\nimport argparse\nimport numpy as np\nimport torch\nimport torchv"
},
{
"path": "temporal_classification.py",
"chars": 10301,
"preview": "import argparse\nimport os\nimport random\nimport shutil\nimport time\nimport warnings\n\nimport torch\nimport torch.nn as nn\nim"
},
{
"path": "utils.py",
"chars": 457,
"preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nfrom PIL import ImageFilter\nimport random\n\n\nclass"
}
]
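Each entry in the preview array above carries a `path`, a `chars` count, and a `preview` snippet, so the array can be summarized with the standard library alone. The following is a minimal sketch, not part of the repository: the names `SAMPLE` and `summarize` are hypothetical, and `SAMPLE` holds only four entries copied from the array above rather than the full listing.

```python
import json
from collections import defaultdict
from pathlib import PurePosixPath

# Four entries copied from the preview array above (raw string so that the
# JSON escape sequences such as \n reach json.loads unchanged).
SAMPLE = r'''
[
  {"path": ".gitignore", "chars": 79,
   "preview": "# File extensions\n*.out\n*.mp4\n\n# Directories\n/__pychache__\n/moco/__pychache__\n\n"},
  {"path": "moco/__init__.py", "chars": 71,
   "preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\n"},
  {"path": "utils.py", "chars": 457,
   "preview": "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved\nfrom PIL import ImageFilter\nimport random\n\n\nclass"},
  {"path": "scripts/moco_img.sh", "chars": 658,
   "preview": "#!/bin/bash\n\n#SBATCH --nodes=1\n#SBATCH --ntasks=16\n#SBATCH --gres=gpu:titanrtx:4\n#SBATCH --mem=150GB\n#SBATCH --time=48:0"}
]
'''


def summarize(preview_json: str) -> dict:
    """Sum the reported character counts per file extension."""
    entries = json.loads(preview_json)
    chars_by_suffix = defaultdict(int)
    for entry in entries:
        # Dotfiles and extensionless files (e.g. LICENSE) have an empty suffix.
        suffix = PurePosixPath(entry["path"]).suffix or "(none)"
        chars_by_suffix[suffix] += entry["chars"]
    return dict(chars_by_suffix)


result = summarize(SAMPLE)
print(result)
```

To summarize the whole repository, the same function could be pointed at the complete array instead of the four-entry sample.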
About this extraction
This page contains the full source code of the eminorhan/baby-vision GitHub repository, extracted and formatted as plain text. The extraction includes 32 files (130.0 KB), approximately 33.0k tokens, and a symbol index with 133 extracted functions, classes, methods, constants, and types.