Repository: hsinyilin19/ResNetVAE
Branch: master
Commit: 0befd92878da
Files: 7
Total size: 39.1 KB
Directory structure:
gitextract_r_nthkwy/
├── README.md
├── ResNetVAE_FACE.py
├── ResNetVAE_MNIST.py
├── ResNetVAE_cifar10.py
├── ResNetVAE_reconstruction.ipynb
├── modules.py
└── plot_latent.ipynb
================================================
FILE CONTENTS
================================================
================================================
FILE: README.md
================================================
# Variational Autoencoder (VAE) + Transfer learning (ResNet + VAE)
This repository implements a VAE in PyTorch, using a pretrained ResNet as the encoder and a transposed convolutional network as the decoder.
## Datasets
### 1. MNIST
The [MNIST](http://yann.lecun.com/exdb/mnist/) database contains 60,000 training images and 10,000 testing images. Each image is saved as a 28x28 matrix.
### 2. CIFAR10
The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class.
### 3. Olivetti faces dataset
The [Olivetti](https://scikit-learn.org/0.19/datasets/olivetti_faces.html) faces dataset consists of 400 64x64 grayscale images: 10 images for each of 40 distinct subjects.
## Model
A [VAE](https://arxiv.org/pdf/1312.6114.pdf) model contains a pair of encoder and decoder. The encoder
compresses a 2D image *x* into a vector *z* in a lower-dimensional space, normally called the latent space, while the decoder
receives vectors in the latent space and outputs objects in the same space as the inputs of the encoder. The training goal is to make the composition of encoder and decoder "as close to the identity as possible". Precisely, the loss function is

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z|x)}\big[-\log p_\theta(x|z)\big] + D_{\mathrm{KL}}\big(q_\phi(z|x) \,\|\, p(z)\big),$$

where $D_{\mathrm{KL}}$ is the Kullback-Leibler divergence and $p(z) = \mathcal{N}(0, I)$ is the standard normal distribution. The first term measures how good the reconstruction is, and the second term measures how close $q_\phi(z|x)$ is to the standard normal distribution. After training, the model provides two applications. First, the encoder can perform dimension reduction. Second, the decoder can reproduce input images, or even generate new ones. We show the results of our experiments at the end.
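The loss above maps directly onto PyTorch. The sketch below mirrors the `loss_function` defined in the training scripts (which use a summed binary cross-entropy as the reconstruction term); the toy tensors are illustrative only.

```python
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar):
    # Reconstruction term: summed binary cross-entropy (inputs must lie in [0, 1]).
    bce = F.binary_cross_entropy(recon_x, x, reduction='sum')
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian q.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return bce + kld

x = torch.rand(2, 3, 8, 8)         # toy "images" with values in [0, 1)
mu = torch.zeros(2, 4)             # posterior mean
logvar = torch.zeros(2, 4)         # posterior log-variance
loss = vae_loss(x, x, mu, logvar)  # KL term vanishes when mu = 0, logvar = 0
```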
- For our **encoder**, we fine-tune [ResNet-152](https://arxiv.org/abs/1512.03385), a transfer-learning technique. ResNet-152 is a [CNN](https://en.wikipedia.org/wiki/Convolutional_neural_network) pretrained on the ImageNet [ILSVRC-2012-CLS](http://www.image-net.org/challenges/LSVRC/2012/) dataset. Our **decoder** uses a transposed convolutional network.
## Training
- The input images are resized to **(channels, x-dim, y-dim) = (3, 224, 224)**, as required by the ResNet-152 model.
- We use the Adam optimizer.
## Usage
### Prerequisites
- [Python 3.6](https://www.python.org/)
- [PyTorch 1.0.0](https://pytorch.org/)
- [NumPy 1.15.0](http://www.numpy.org/)
- [scikit-learn 0.19.2](https://scikit-learn.org/stable/)
- [Matplotlib](https://matplotlib.org/)
### Model outputs
We save the labels (*y* coordinates), the resulting latent vectors (*z* coordinates), the model weights, and the optimizer state.
- Run plot_latent.ipynb to see the clustering results.
- Run ResNetVAE_reconstruction.ipynb to reproduce or generate images.
- The saved optimizer state makes re-training convenient.
## Results
### Clustering
Since the encoder compresses high-dimensional inputs into a low-dimensional latent space, we can use it to visualize how the data points cluster.
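plot_latent.ipynb visualizes the saved latent vectors either by a direct 2-D projection or with t-SNE. A minimal sketch of the t-SNE path, using random vectors in place of the saved `z_train` array:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
z = rng.normal(size=(100, 256)).astype(np.float32)  # stand-in for z_train
# Reduce the 256-dim latent codes to 2 dimensions for plotting.
z_2d = TSNE(n_components=2, init='random', perplexity=30).fit_transform(z)
```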
### Reproduce and generate images
The decoder reproduces the input images from their latent representations. Moreover, it can generate new images that are not in the original datasets.
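New images come from decoding latent points that the encoder never produced. ResNetVAE_reconstruction.ipynb blends two latent codes with a convex combination before decoding; the blending step itself is just:

```python
import numpy as np

def interpolate_latent(z1, z2, w):
    # Convex combination of two latent codes; decoding the result yields a fused image.
    return w * z1 + (1.0 - w) * z2

z1 = np.zeros(256, dtype=np.float32)  # latent code of image 1 (stand-in)
z2 = np.ones(256, dtype=np.float32)   # latent code of image 2 (stand-in)
z_new = interpolate_latent(z1, z2, 0.4)
```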
================================================
FILE: ResNetVAE_FACE.py
================================================
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
import torch.utils.data as data
import torchvision
from torch.autograd import Variable
import matplotlib.pyplot as plt
from modules import *
from sklearn.datasets import fetch_olivetti_faces
from torch.utils.data import Dataset, DataLoader, TensorDataset
from skimage.transform import resize
from sklearn.model_selection import train_test_split
import pickle
# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# EncoderCNN architecture
CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024
CNN_embed_dim = 256 # latent dim extracted by 2D CNN
res_size = 224 # ResNet image size
dropout_p = 0.2 # dropout probability
# training parameters
epochs = 100 # training epochs
batch_size = 50
learning_rate = 1e-3
log_interval = 10 # interval for displaying training info
# save model
save_model_path = './results_Olivetti_face'
def check_mkdir(dir_name):
if not os.path.exists(dir_name):
os.mkdir(dir_name)
def loss_function(recon_x, x, mu, logvar):
    # MSE = F.mse_loss(recon_x, x, reduction='sum')
    BCE = F.binary_cross_entropy(recon_x, x, reduction='sum')  # reconstruction term
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL divergence term
    return BCE + KLD
def train(log_interval, model, device, train_loader, optimizer, epoch):
# set model as training mode
model.train()
losses = []
all_X, all_y, all_z, all_mu, all_logvar = [], [], [], [], []
N_count = 0 # counting total trained sample in one epoch
for batch_idx, (X, y) in enumerate(train_loader):
# distribute data to device
X, y = X.to(device), y.to(device).view(-1, )
N_count += X.size(0)
optimizer.zero_grad()
X_reconst, z, mu, logvar = model(X) # VAE
loss = loss_function(X_reconst, X, mu, logvar)
losses.append(loss.item())
loss.backward()
optimizer.step()
all_X.extend(X.data.cpu().numpy())
all_y.extend(y.data.cpu().numpy())
all_z.extend(z.data.cpu().numpy())
all_mu.extend(mu.data.cpu().numpy())
all_logvar.extend(logvar.data.cpu().numpy())
# show information
if (batch_idx + 1) % log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch + 1, N_count, len(train_loader.dataset), 100. * (batch_idx + 1) / len(train_loader), loss.item()))
all_X = np.stack(all_X, axis=0)
all_y = np.stack(all_y, axis=0)
all_z = np.stack(all_z, axis=0)
all_mu = np.stack(all_mu, axis=0)
all_logvar = np.stack(all_logvar, axis=0)
# save Pytorch models of best record
    torch.save(model.state_dict(), os.path.join(save_model_path, 'model_epoch{}.pth'.format(epoch + 1)))  # save model
torch.save(optimizer.state_dict(), os.path.join(save_model_path, 'optimizer_epoch{}.pth'.format(epoch + 1))) # save optimizer
print("Epoch {} model saved!".format(epoch + 1))
return all_X, all_y, all_z, all_mu, all_logvar, losses
def validation(model, device, optimizer, test_loader):
# set model as testing mode
model.eval()
test_loss = 0
all_X, all_y, all_z, all_mu, all_logvar = [], [], [], [], []
with torch.no_grad():
for X, y in test_loader:
# distribute data to device
X, y = X.to(device), y.to(device).view(-1, )
X_reconst, z, mu, logvar = model(X)
loss = loss_function(X_reconst, X, mu, logvar)
test_loss += loss.item() # sum up batch loss
all_X.extend(X.data.cpu().numpy())
all_y.extend(y.data.cpu().numpy())
all_z.extend(z.data.cpu().numpy())
all_mu.extend(mu.data.cpu().numpy())
all_logvar.extend(logvar.data.cpu().numpy())
test_loss /= len(test_loader.dataset)
all_X = np.stack(all_X, axis=0)
all_y = np.stack(all_y, axis=0)
all_z = np.stack(all_z, axis=0)
all_mu = np.stack(all_mu, axis=0)
all_logvar = np.stack(all_logvar, axis=0)
# show information
print('\nTest set ({:d} samples): Average loss: {:.4f}\n'.format(len(test_loader.dataset), test_loss))
return all_X, all_y, all_z, all_mu, all_logvar, test_loss
# Detect devices
use_cuda = torch.cuda.is_available() # check if GPU exists
device = torch.device("cuda" if use_cuda else "cpu") # use CPU or GPU
# Data loading parameters
params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': 2, 'pin_memory': True} if use_cuda else {}
# Load the faces datasets
faces = fetch_olivetti_faces()  # renamed to avoid shadowing torch.utils.data imported as `data`
face_img = faces.images  # (400, 64, 64) grayscale face images
face_img_resized = [np.tile(np.expand_dims(resize(face_img[i, :, :], (res_size, res_size), anti_aliasing=True), axis=0), (3, 1, 1)) for i in range(face_img.shape[0])]
face_img_resized = np.stack(face_img_resized, axis=0)
face_img_resized = torch.from_numpy(face_img_resized).float()
labels = torch.from_numpy(faces.target)
olivetti_data = TensorDataset(face_img_resized, labels)
# Data loader (input pipeline)
# NOTE: the full Olivetti dataset is used for both loaders; there is no held-out validation split
train_loader = torch.utils.data.DataLoader(dataset=olivetti_data, **params)
valid_loader = torch.utils.data.DataLoader(dataset=olivetti_data, **params)
# Create model
resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)
print("Using", torch.cuda.device_count(), "GPU!")
model_params = list(resnet_vae.parameters())
optimizer = torch.optim.Adam(model_params, lr=learning_rate)
# record training process
epoch_train_losses = []
epoch_test_losses = []
check_mkdir(save_model_path)
# start training
for epoch in range(epochs):
# train, test model
X_train, y_train, z_train, mu_train, logvar_train, train_losses = train(log_interval, resnet_vae, device, train_loader, optimizer, epoch)
X_test, y_test, z_test, mu_test, logvar_test, epoch_test_loss = validation(resnet_vae, device, optimizer, valid_loader)
# save results
epoch_train_losses.append(train_losses)
epoch_test_losses.append(epoch_test_loss)
# save all train test results
A = np.array(epoch_train_losses)
C = np.array(epoch_test_losses)
np.save(os.path.join(save_model_path, 'ResNet_VAE_training_loss.npy'), A)
np.save(os.path.join(save_model_path, 'X_Olivetti_train_epoch{}.npy'.format(epoch + 1)), X_train)
np.save(os.path.join(save_model_path, 'y_Olivetti_train_epoch{}.npy'.format(epoch + 1)), y_train)
np.save(os.path.join(save_model_path, 'z_Olivetti_train_epoch{}.npy'.format(epoch + 1)), z_train)
================================================
FILE: ResNetVAE_MNIST.py
================================================
import os
import glob
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
import torch.utils.data as data
import torchvision
from torch.autograd import Variable
import matplotlib.pyplot as plt
from modules import *
from sklearn.model_selection import train_test_split
import pickle
# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# EncoderCNN architecture
CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024
CNN_embed_dim = 256 # latent dim extracted by 2D CNN
res_size = 224 # ResNet image size
dropout_p = 0.2 # dropout probability
# training parameters
epochs = 20 # training epochs
batch_size = 50
learning_rate = 1e-3
log_interval = 10 # interval for displaying training info
# save model
save_model_path = './results_MNIST'
def check_mkdir(dir_name):
if not os.path.exists(dir_name):
os.mkdir(dir_name)
def loss_function(recon_x, x, mu, logvar):
    # MSE = F.mse_loss(recon_x, x, reduction='sum')
    BCE = F.binary_cross_entropy(recon_x, x, reduction='sum')  # reconstruction term
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL divergence term
    return BCE + KLD
def train(log_interval, model, device, train_loader, optimizer, epoch):
# set model as training mode
model.train()
losses = []
all_y, all_z, all_mu, all_logvar = [], [], [], []
N_count = 0 # counting total trained sample in one epoch
for batch_idx, (X, y) in enumerate(train_loader):
# distribute data to device
X, y = X.to(device), y.to(device).view(-1, )
N_count += X.size(0)
optimizer.zero_grad()
X_reconst, z, mu, logvar = model(X) # VAE
loss = loss_function(X_reconst, X, mu, logvar)
losses.append(loss.item())
loss.backward()
optimizer.step()
all_y.extend(y.data.cpu().numpy())
all_z.extend(z.data.cpu().numpy())
all_mu.extend(mu.data.cpu().numpy())
all_logvar.extend(logvar.data.cpu().numpy())
# show information
if (batch_idx + 1) % log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch + 1, N_count, len(train_loader.dataset), 100. * (batch_idx + 1) / len(train_loader), loss.item()))
all_y = np.stack(all_y, axis=0)
all_z = np.stack(all_z, axis=0)
all_mu = np.stack(all_mu, axis=0)
all_logvar = np.stack(all_logvar, axis=0)
# save Pytorch models of best record
    torch.save(model.state_dict(), os.path.join(save_model_path, 'model_epoch{}.pth'.format(epoch + 1)))  # save model
torch.save(optimizer.state_dict(), os.path.join(save_model_path, 'optimizer_epoch{}.pth'.format(epoch + 1))) # save optimizer
print("Epoch {} model saved!".format(epoch + 1))
return X.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, losses
def validation(model, device, optimizer, test_loader):
# set model as testing mode
model.eval()
test_loss = 0
all_y, all_z, all_mu, all_logvar = [], [], [], []
with torch.no_grad():
for X, y in test_loader:
# distribute data to device
X, y = X.to(device), y.to(device).view(-1, )
X_reconst, z, mu, logvar = model(X)
loss = loss_function(X_reconst, X, mu, logvar)
test_loss += loss.item() # sum up batch loss
all_y.extend(y.data.cpu().numpy())
all_z.extend(z.data.cpu().numpy())
all_mu.extend(mu.data.cpu().numpy())
all_logvar.extend(logvar.data.cpu().numpy())
test_loss /= len(test_loader.dataset)
all_y = np.stack(all_y, axis=0)
all_z = np.stack(all_z, axis=0)
all_mu = np.stack(all_mu, axis=0)
all_logvar = np.stack(all_logvar, axis=0)
# show information
print('\nTest set ({:d} samples): Average loss: {:.4f}\n'.format(len(test_loader.dataset), test_loss))
return X.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, test_loss
# Detect devices
use_cuda = torch.cuda.is_available() # check if GPU exists
device = torch.device("cuda" if use_cuda else "cpu") # use CPU or GPU
# Data loading parameters
params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': 4, 'pin_memory': True} if use_cuda else {}
transform = transforms.Compose([transforms.Resize([res_size, res_size]),
transforms.ToTensor(),
                                transforms.Lambda(lambda x: x.repeat(3, 1, 1)),  # grayscale -> RGB 3 channels
transforms.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0])]) # for grayscale images
# MNIST dataset (images and labels)
MNIST_train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
MNIST_test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform)
# Data loader (input pipeline)
train_loader = torch.utils.data.DataLoader(dataset=MNIST_train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = torch.utils.data.DataLoader(dataset=MNIST_test_dataset, batch_size=batch_size, shuffle=False)
# Create model
resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)
print("Using", torch.cuda.device_count(), "GPU!")
model_params = list(resnet_vae.parameters())
optimizer = torch.optim.Adam(model_params, lr=learning_rate)
# record training process
epoch_train_losses = []
epoch_test_losses = []
check_mkdir(save_model_path)
# start training
for epoch in range(epochs):
# train, test model
X_train, y_train, z_train, mu_train, logvar_train, train_losses = train(log_interval, resnet_vae, device, train_loader, optimizer, epoch)
X_test, y_test, z_test, mu_test, logvar_test, epoch_test_loss = validation(resnet_vae, device, optimizer, valid_loader)
# save results
epoch_train_losses.append(train_losses)
epoch_test_losses.append(epoch_test_loss)
# save all train test results
A = np.array(epoch_train_losses)
C = np.array(epoch_test_losses)
np.save(os.path.join(save_model_path, 'ResNet_VAE_training_loss.npy'), A)
np.save(os.path.join(save_model_path, 'X_MNIST_train_epoch{}.npy'.format(epoch + 1)), X_train) #save last batch
np.save(os.path.join(save_model_path, 'y_MNIST_train_epoch{}.npy'.format(epoch + 1)), y_train)
np.save(os.path.join(save_model_path, 'z_MNIST_train_epoch{}.npy'.format(epoch + 1)), z_train)
================================================
FILE: ResNetVAE_cifar10.py
================================================
import os
import glob
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
import torch.utils.data as data
import torchvision
from torch.autograd import Variable
import matplotlib.pyplot as plt
from modules import *
from sklearn.model_selection import train_test_split
import pickle
# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"
# EncoderCNN architecture
CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024
CNN_embed_dim = 256 # latent dim extracted by 2D CNN
res_size = 224 # ResNet image size
dropout_p = 0.2 # dropout probability
# training parameters
epochs = 20 # training epochs
batch_size = 50
learning_rate = 1e-3
log_interval = 10 # interval for displaying training info
# save model
save_model_path = './results_cifar10'
def check_mkdir(dir_name):
if not os.path.exists(dir_name):
os.mkdir(dir_name)
def loss_function(recon_x, x, mu, logvar):
    # MSE = F.mse_loss(recon_x, x, reduction='sum')
    BCE = F.binary_cross_entropy(recon_x, x, reduction='sum')  # reconstruction term
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())  # KL divergence term
    return BCE + KLD
def train(log_interval, model, device, train_loader, optimizer, epoch):
# set model as training mode
model.train()
losses = []
all_y, all_z, all_mu, all_logvar = [], [], [], []
N_count = 0 # counting total trained sample in one epoch
for batch_idx, (X, y) in enumerate(train_loader):
# distribute data to device
X, y = X.to(device), y.to(device).view(-1, )
N_count += X.size(0)
optimizer.zero_grad()
X_reconst, z, mu, logvar = model(X) # VAE
loss = loss_function(X_reconst, X, mu, logvar)
losses.append(loss.item())
loss.backward()
optimizer.step()
all_y.extend(y.data.cpu().numpy())
all_z.extend(z.data.cpu().numpy())
all_mu.extend(mu.data.cpu().numpy())
all_logvar.extend(logvar.data.cpu().numpy())
# show information
if (batch_idx + 1) % log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch + 1, N_count, len(train_loader.dataset), 100. * (batch_idx + 1) / len(train_loader), loss.item()))
all_y = np.stack(all_y, axis=0)
all_z = np.stack(all_z, axis=0)
all_mu = np.stack(all_mu, axis=0)
all_logvar = np.stack(all_logvar, axis=0)
# save Pytorch models of best record
    torch.save(model.state_dict(), os.path.join(save_model_path, 'model_epoch{}.pth'.format(epoch + 1)))  # save model
torch.save(optimizer.state_dict(), os.path.join(save_model_path, 'optimizer_epoch{}.pth'.format(epoch + 1))) # save optimizer
print("Epoch {} model saved!".format(epoch + 1))
return X_reconst.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, losses
def validation(model, device, optimizer, test_loader):
# set model as testing mode
model.eval()
test_loss = 0
all_y, all_z, all_mu, all_logvar = [], [], [], []
with torch.no_grad():
for X, y in test_loader:
# distribute data to device
X, y = X.to(device), y.to(device).view(-1, )
X_reconst, z, mu, logvar = model(X)
loss = loss_function(X_reconst, X, mu, logvar)
test_loss += loss.item() # sum up batch loss
all_y.extend(y.data.cpu().numpy())
all_z.extend(z.data.cpu().numpy())
all_mu.extend(mu.data.cpu().numpy())
all_logvar.extend(logvar.data.cpu().numpy())
test_loss /= len(test_loader.dataset)
all_y = np.stack(all_y, axis=0)
all_z = np.stack(all_z, axis=0)
all_mu = np.stack(all_mu, axis=0)
all_logvar = np.stack(all_logvar, axis=0)
# show information
print('\nTest set ({:d} samples): Average loss: {:.4f}\n'.format(len(test_loader.dataset), test_loss))
return X_reconst.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, test_loss
# Detect devices
use_cuda = torch.cuda.is_available() # check if GPU exists
device = torch.device("cuda" if use_cuda else "cpu") # use CPU or GPU
# Data loading parameters
params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': 2, 'pin_memory': True} if use_cuda else {}
# transform = transforms.Compose([transforms.Resize([res_size, res_size]),
# transforms.ToTensor(),
# transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])
transform = transforms.Compose([transforms.Resize([res_size, res_size]),
transforms.ToTensor(),
transforms.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0])])
# cifar10 dataset (images and labels)
cifar10_train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
cifar10_test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
# classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# Data loader (input pipeline)
train_loader = torch.utils.data.DataLoader(dataset=cifar10_train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = torch.utils.data.DataLoader(dataset=cifar10_test_dataset, batch_size=batch_size, shuffle=False)
# Create model
resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)
print("Using", torch.cuda.device_count(), "GPU!")
model_params = list(resnet_vae.parameters())
optimizer = torch.optim.Adam(model_params, lr=learning_rate)
# record training process
epoch_train_losses = []
epoch_test_losses = []
check_mkdir(save_model_path)
# start training
for epoch in range(epochs):
# train, test model
X_reconst_train, y_train, z_train, mu_train, logvar_train, train_losses = train(log_interval, resnet_vae, device, train_loader, optimizer, epoch)
X_reconst_test, y_test, z_test, mu_test, logvar_test, epoch_test_loss = validation(resnet_vae, device, optimizer, valid_loader)
# save results
epoch_train_losses.append(train_losses)
epoch_test_losses.append(epoch_test_loss)
# save all train test results
A = np.array(epoch_train_losses)
C = np.array(epoch_test_losses)
np.save(os.path.join(save_model_path, 'ResNet_VAE_training_loss.npy'), A)
np.save(os.path.join(save_model_path, 'y_cifar10_train_epoch{}.npy'.format(epoch + 1)), y_train)
np.save(os.path.join(save_model_path, 'z_cifar10_train_epoch{}.npy'.format(epoch + 1)), z_train)
================================================
FILE: ResNetVAE_reconstruction.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib\n",
"import matplotlib.pyplot as plt\n",
"import os\n",
"import glob\n",
"import numpy as np\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"import torchvision.models as models\n",
"import torchvision.transforms as transforms\n",
"import torch.utils.data as data\n",
"import torchvision\n",
"from torch.autograd import Variable\n",
"import matplotlib.pyplot as plt\n",
"from modules import *\n",
"from sklearn.model_selection import train_test_split\n",
"import pickle\n",
"from sklearn.datasets import fetch_olivetti_faces\n",
"from torch.utils.data import Dataset, DataLoader, TensorDataset\n",
"from skimage.transform import resize\n",
"\n",
"\n",
"def decoder(model, device, z):\n",
" model.eval()\n",
" z = Variable(torch.FloatTensor(z)).to(device)\n",
" new_images = model.decode(z).squeeze_().data.cpu().numpy().transpose((1, 2, 0))\n",
" return new_images\n",
"\n",
"saved_model_path = './results_Olivetti_face'\n",
"# saved_model_path = './results_MNIST'\n",
"\n",
"exp = 'Olivetti'\n",
"# exp = 'MNIST'\n",
"\n",
"# use same ResNet Encoder saved earlier!\n",
"CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024\n",
"CNN_embed_dim = 256\n",
"res_size = 224 # ResNet image size\n",
"dropout_p = 0.2 # dropout probability\n",
"\n",
"epoch = 20\n",
"\n",
"\n",
"use_cuda = torch.cuda.is_available() # check if GPU exists\n",
"device = torch.device(\"cuda\" if use_cuda else \"cpu\") # use CPU or GPU\n",
"\n",
"# reload ResNetVAE model\n",
"resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)\n",
"resnet_vae.load_state_dict(torch.load(os.path.join(saved_model_path, 'model_epoch{}.pth'.format(epoch))))\n",
"\n",
"print('ResNetVAE epoch {} model reloaded!'.format(epoch))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reconstruction "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z_train = np.load(os.path.join(saved_model_path, 'z_{}_train_epoch{}.npy').format(exp, epoch))\n",
"X_train = np.load(os.path.join(saved_model_path, 'X_{}_train_epoch{}.npy').format(exp, epoch))\n",
"\n",
"ind = 1\n",
"zz = torch.from_numpy(z_train[ind]).view(1, -1)\n",
"X = np.transpose(X_train[ind], (1, 2, 0))\n",
"\n",
"new_imgs = decoder(resnet_vae, device, zz)\n",
"\n",
"fig = plt.figure(figsize=(10, 10))\n",
"\n",
"plt.subplot(1, 2, 1)\n",
"plt.imshow(X)\n",
"plt.title('original')\n",
"plt.axis('off')\n",
"\n",
"plt.subplot(1, 2, 2)\n",
"plt.imshow(new_imgs)\n",
"plt.title('reconstructed')\n",
"plt.axis('off')\n",
"plt.savefig(\"./reconstruction_{}.png\".format(exp), bbox_inches='tight', dpi=600)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Generate new images from latent points"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"# choose two original images\n",
"sample1, sample2 = 0, 1\n",
"w = 0.4 # weight for fusing two images\n",
"\n",
"X1 = np.transpose(X_train[-sample1], (1, 2, 0))\n",
"X2 = np.transpose(X_train[-sample2], (1, 2, 0))\n",
"\n",
"# generate image using decoder\n",
"z_train = np.load(os.path.join(saved_model_path, 'z_{}_train_epoch{}.npy').format(exp, epoch))\n",
"z = z_train[-sample1] * w + z_train[-sample2] * (1 - w)\n",
"new_imgs = decoder(resnet_vae, device, torch.from_numpy(z).view(1, -1))\n",
"\n",
"fig = plt.figure(figsize=(15, 15))\n",
"\n",
"plt.subplot(1, 3, 1)\n",
"plt.imshow(X1)\n",
"plt.title('original 1')\n",
"plt.axis('off')\n",
"\n",
"plt.subplot(1, 3, 2)\n",
"plt.imshow(X2)\n",
"plt.title('original 2')\n",
"plt.axis('off')\n",
"\n",
"\n",
"plt.subplot(1, 3, 3)\n",
"plt.imshow(new_imgs)\n",
"plt.title('new image')\n",
"plt.axis('off')\n",
"plt.savefig(\"./generated_{}.png\".format(exp), bbox_inches='tight', dpi=600)\n",
"plt.show()\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
================================================
FILE: modules.py
================================================
import os
import numpy as np
from PIL import Image
from torch.utils import data
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from torch.autograd import Variable
import torchvision.transforms as transforms
class Dataset(data.Dataset):
"Characterizes a dataset for PyTorch"
def __init__(self, filenames, labels, transform=None):
"Initialization"
self.filenames = filenames
self.labels = labels
self.transform = transform
def __len__(self):
"Denotes the total number of samples"
return len(self.filenames)
def __getitem__(self, index):
"Generates one sample of data"
# Select sample
filename = self.filenames[index]
X = Image.open(filename)
if self.transform:
X = self.transform(X) # transform
y = torch.LongTensor([self.labels[index]])
return X, y
## ---------------------- end of Dataloaders ---------------------- ##
def conv2D_output_size(img_size, padding, kernel_size, stride):
# compute output shape of conv2D
outshape = (np.floor((img_size[0] + 2 * padding[0] - (kernel_size[0] - 1) - 1) / stride[0] + 1).astype(int),
np.floor((img_size[1] + 2 * padding[1] - (kernel_size[1] - 1) - 1) / stride[1] + 1).astype(int))
return outshape
def convtrans2D_output_size(img_size, padding, kernel_size, stride):
    # compute output shape of ConvTranspose2d
outshape = ((img_size[0] - 1) * stride[0] - 2 * padding[0] + kernel_size[0],
(img_size[1] - 1) * stride[1] - 2 * padding[1] + kernel_size[1])
return outshape
## ---------------------- ResNet VAE ---------------------- ##
class ResNet_VAE(nn.Module):
def __init__(self, fc_hidden1=1024, fc_hidden2=768, drop_p=0.3, CNN_embed_dim=256):
super(ResNet_VAE, self).__init__()
self.fc_hidden1, self.fc_hidden2, self.CNN_embed_dim = fc_hidden1, fc_hidden2, CNN_embed_dim
# CNN architechtures
self.ch1, self.ch2, self.ch3, self.ch4 = 16, 32, 64, 128
self.k1, self.k2, self.k3, self.k4 = (5, 5), (3, 3), (3, 3), (3, 3) # 2d kernal size
self.s1, self.s2, self.s3, self.s4 = (2, 2), (2, 2), (2, 2), (2, 2) # 2d strides
self.pd1, self.pd2, self.pd3, self.pd4 = (0, 0), (0, 0), (0, 0), (0, 0) # 2d padding
# encoding components
resnet = models.resnet152(pretrained=True)
modules = list(resnet.children())[:-1] # delete the last fc layer.
self.resnet = nn.Sequential(*modules)
self.fc1 = nn.Linear(resnet.fc.in_features, self.fc_hidden1)
self.bn1 = nn.BatchNorm1d(self.fc_hidden1, momentum=0.01)
self.fc2 = nn.Linear(self.fc_hidden1, self.fc_hidden2)
self.bn2 = nn.BatchNorm1d(self.fc_hidden2, momentum=0.01)
# Latent vectors mu and sigma
self.fc3_mu = nn.Linear(self.fc_hidden2, self.CNN_embed_dim) # output = CNN embedding latent variables
self.fc3_logvar = nn.Linear(self.fc_hidden2, self.CNN_embed_dim) # output = CNN embedding latent variables
# Sampling vector
self.fc4 = nn.Linear(self.CNN_embed_dim, self.fc_hidden2)
self.fc_bn4 = nn.BatchNorm1d(self.fc_hidden2)
self.fc5 = nn.Linear(self.fc_hidden2, 64 * 4 * 4)
self.fc_bn5 = nn.BatchNorm1d(64 * 4 * 4)
self.relu = nn.ReLU(inplace=True)
# Decoder
self.convTrans6 = nn.Sequential(
nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=self.k4, stride=self.s4,
padding=self.pd4),
nn.BatchNorm2d(32, momentum=0.01),
nn.ReLU(inplace=True),
)
self.convTrans7 = nn.Sequential(
nn.ConvTranspose2d(in_channels=32, out_channels=8, kernel_size=self.k3, stride=self.s3,
padding=self.pd3),
nn.BatchNorm2d(8, momentum=0.01),
nn.ReLU(inplace=True),
)
self.convTrans8 = nn.Sequential(
nn.ConvTranspose2d(in_channels=8, out_channels=3, kernel_size=self.k2, stride=self.s2,
padding=self.pd2),
nn.BatchNorm2d(3, momentum=0.01),
nn.Sigmoid() # y = (y1, y2, y3) \in [0 ,1]^3
)
def encode(self, x):
x = self.resnet(x) # ResNet
x = x.view(x.size(0), -1) # flatten output of conv
# FC layers
x = self.bn1(self.fc1(x))
x = self.relu(x)
x = self.bn2(self.fc2(x))
x = self.relu(x)
# x = F.dropout(x, p=self.drop_p, training=self.training)
mu, logvar = self.fc3_mu(x), self.fc3_logvar(x)
return mu, logvar
def reparameterize(self, mu, logvar):
if self.training:
std = logvar.mul(0.5).exp_()
eps = Variable(std.data.new(std.size()).normal_())
return eps.mul(std).add_(mu)
else:
return mu
def decode(self, z):
x = self.relu(self.fc_bn4(self.fc4(z)))
x = self.relu(self.fc_bn5(self.fc5(x))).view(-1, 64, 4, 4)
x = self.convTrans6(x)
x = self.convTrans7(x)
x = self.convTrans8(x)
        x = F.interpolate(x, size=(224, 224), mode='bilinear', align_corners=False)
return x
def forward(self, x):
mu, logvar = self.encode(x)
z = self.reparameterize(mu, logvar)
x_reconst = self.decode(z)
return x_reconst, z, mu, logvar
================================================
FILE: plot_latent.ipynb
================================================
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import matplotlib.pyplot as plt\n",
"from mpl_toolkits.mplot3d import Axes3D\n",
"import matplotlib.cm as cm\n",
"from sklearn.manifold import TSNE\n",
"import numpy as np\n",
"import pickle\n",
"\n",
" \n",
"epoch = 20\n",
"exp = 'cifar10'\n",
"# exp = 'MNIST'\n",
"\n",
"N = 6000 # image number\n",
"\n",
"y_train = np.load('./results_{}/y_{}_train_epoch{}.npy'.format(exp, exp, epoch))\n",
"z_train = np.load('./results_{}/z_{}_train_epoch{}.npy'.format(exp, exp, epoch))\n",
"classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'] # cifar10\n",
"# classes = np.arange(10) #MNIST\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Direct projection of latent space"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"y_train = y_train[:N]\n",
"z_train = z_train[:N]\n",
"\n",
"fig = plt.figure(figsize=(12, 10))\n",
"plots = []\n",
"markers = ['o', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']\n",
"for i, c in enumerate(classes):\n",
"    ind = (y_train == i).tolist()\n",
" color = cm.jet([i / len(classes)] * sum(ind))\n",
" plots.append(plt.scatter(z_train[ind, 1], z_train[ind, 2], marker=markers[i], c=color, s=8, label=i))\n",
"\n",
"plt.axis('off')\n",
"plt.legend(plots, classes, fontsize=14, loc='upper right')\n",
"plt.title('{} (direct projection: {}-dim -> 2-dim)'.format(exp, z_train.shape[1]), fontsize=14)\n",
"plt.savefig(\"./ResNetVAE_{}_direct_plot.png\".format(exp), bbox_inches='tight', dpi=600)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Use t-SNE for dimension reduction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### compressed to 2-dimension"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z_embed = TSNE(n_components=2, n_iter=12000).fit_transform(z_train[:N])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 10))\n",
"plots = []\n",
"markers = ['o', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd'] # select different markers\n",
"for i, c in enumerate(classes):\n",
" ind = (y_train[:N] == i).tolist()\n",
" color = cm.jet([i / len(classes)] * sum(ind))\n",
" # plot each category one at a time \n",
" plots.append(plt.scatter(z_embed[ind, 0], z_embed[ind, 1], c=color, marker=markers[i], s=8, label=i))\n",
"\n",
"plt.axis('off')\n",
"plt.xlim(-150, 150)\n",
"plt.ylim(-150, 150)\n",
"plt.legend(plots, classes, fontsize=14, loc='upper right')\n",
"plt.title('{} (t-SNE: {}-dim -> 2-dim)'.format(exp, z_train.shape[1]), fontsize=14)\n",
"plt.savefig(\"./ResNetVAE_{}_embedded_plot.png\".format(exp), bbox_inches='tight', dpi=600)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### compressed to 3-dimension"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"z_embed3D = TSNE(n_components=3, n_iter=12000).fit_transform(z_train[:N])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"fig = plt.figure(figsize=(12, 10))\n",
"ax = fig.add_subplot(111, projection='3d')\n",
"\n",
"plots = []\n",
"markers = ['o', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd'] # select different markers\n",
"for i, c in enumerate(classes):\n",
" ind = (y_train[:N] == i).tolist()\n",
" color = cm.jet([i / len(classes)] * sum(ind))\n",
" # plot each category one at a time \n",
"    plots.append(ax.scatter(z_embed3D[ind, 0], z_embed3D[ind, 1], z_embed3D[ind, 2], c=color, marker=markers[i], s=8, label=i))\n",
"\n",
"ax.axis('on')\n",
"\n",
"r_max = 20\n",
"r_min = -r_max\n",
"\n",
"ax.set_xlim(r_min, r_max)\n",
"ax.set_ylim(r_min, r_max)\n",
"ax.set_zlim(r_min, r_max)\n",
"ax.set_xlabel('z-dim 1')\n",
"ax.set_ylabel('z-dim 2')\n",
"ax.set_zlabel('z-dim 3')\n",
"ax.set_title('{} (t-SNE: {}-dim -> 3-dim)'.format(exp, z_train.shape[1]), fontsize=14)\n",
"ax.legend(plots, classes, fontsize=14, loc='upper right')\n",
"plt.savefig(\"./ResNetVAE_{}_embedded_3Dplot.png\".format(exp), bbox_inches='tight', dpi=600)\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}