Repository: hsinyilin19/ResNetVAE
Branch: master
Commit: 0befd92878da
Files: 7
Total size: 39.1 KB

Directory structure:
gitextract_r_nthkwy/
├── README.md
├── ResNetVAE_FACE.py
├── ResNetVAE_MNIST.py
├── ResNetVAE_cifar10.py
├── ResNetVAE_reconstruction.ipynb
├── modules.py
└── plot_latent.ipynb

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# Variational Autoencoder (VAE) + Transfer learning (ResNet + VAE)

This repository implements a VAE in PyTorch, using a pretrained ResNet model as its encoder and a transposed convolutional network as its decoder.

## Datasets

### 1. MNIST
The [MNIST](http://yann.lecun.com/exdb/mnist/) database contains 60,000 training images and 10,000 testing images. Each image is saved as a 28x28 matrix.

### 2. CIFAR10
The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset consists of 60,000 32x32 colour images in 10 classes, with 6,000 images per class.

### 3. Olivetti faces dataset
The [Olivetti](https://scikit-learn.org/0.19/datasets/olivetti_faces.html) faces dataset consists of ten 64x64 images for each of 40 distinct subjects.

## Model

A [VAE](https://arxiv.org/pdf/1312.6114.pdf) model consists of an encoder and a decoder. The encoder compresses a 2D image *x* into a vector *z* in a lower-dimensional space, normally called the latent space, while the decoder takes vectors in the latent space and outputs objects in the same space as the encoder's inputs. The training goal is to make the composition of encoder and decoder "as close to identity as possible". Precisely, the loss function is:

$$\mathcal{L}(x) = -\mathbb{E}_{z \sim q(z|x)}\big[\log p(x|z)\big] + D_{KL}\big(q(z|x) \,\|\, \mathcal{N}(0, I)\big),$$

where $D_{KL}$ is the Kullback-Leibler divergence, and $\mathcal{N}(0, I)$ is the standard normal distribution. The first term measures how good the reconstruction is, and the second term measures how close *q* is to the standard normal distribution.
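
The two loss terms described above can be sketched for a single sample in plain Python. This is a minimal illustration, assuming a binary cross-entropy reconstruction term (as in the repository's `loss_function`) and a diagonal Gaussian *q*; the helper name `vae_loss`, the `eps` clamp, and the toy numbers are hypothetical and not part of the repository.

```python
import math


def vae_loss(recon_x, x, mu, logvar, eps=1e-12):
    """Reconstruction + KL loss for one sample (illustrative helper only).

    recon_x, x : sequences of pixel values in [0, 1]
    mu, logvar : mean and log-variance of the diagonal Gaussian q(z|x)
    """
    # Reconstruction term: binary cross-entropy summed over pixels.
    # The eps clamp is an assumption made here to avoid log(0).
    bce = -sum(
        t * math.log(max(p, eps)) + (1.0 - t) * math.log(max(1.0 - p, eps))
        for p, t in zip(recon_x, x)
    )
    # Closed-form KL(q(z|x) || N(0, I)) for a diagonal Gaussian.
    kld = -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))
    return bce + kld, bce, kld


# Toy example: two pixels, two latent dimensions, q equal to the prior.
total, bce, kld = vae_loss([0.9, 0.2], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0])
print(total, bce, kld)
```

When *q* coincides with the standard normal (zero mean, zero log-variance), the KL term vanishes and only the reconstruction term remains; moving `mu` away from zero makes the KL term positive.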

After training, the model supports two applications. First, the encoder can perform dimensionality reduction. Second, the decoder can reconstruct input images, or even generate new ones. The results of our experiments are shown at the end.

- For our **encoder**, we fine-tune (a transfer learning technique) [ResNet-152](https://arxiv.org/abs/1512.03385). ResNet-152 is a [CNN](https://en.wikipedia.org/wiki/Convolutional_neural_network) pretrained on ImageNet [ILSVRC-2012-CLS](http://www.image-net.org/challenges/LSVRC/2012/). Our **decoder** is a transposed convolutional network.

## Training

- The input images are resized to **(channels, x-dim, y-dim) = (3, 224, 224)**, which is required by the ResNet-152 model.
- We use the Adam optimizer.

## Usage

### Prerequisites
- [Python 3.6](https://www.python.org/)
- [PyTorch 1.0.0](https://pytorch.org/)
- [Numpy 1.15.0](http://www.numpy.org/)
- [Sklearn 0.19.2](https://scikit-learn.org/stable/)
- [Matplotlib](https://matplotlib.org/)

### Model outputs
The training scripts save the labels (y coordinates), the resulting latent vectors (z coordinates), the model weights, and the optimizer states.

- Run plot_latent.ipynb to see the clustering results.
- Run ResNetVAE_reconstruction.ipynb to reconstruct or generate images.
- The saved optimizer states make it convenient to resume training.

## Results

### Clustering
Because the encoder compresses high-dimensional inputs into a low-dimensional latent space, we can use it to visualize how the data points cluster.

### Reproduce and generate images
The decoder reconstructs the input images from the latent space. Beyond that, it can generate new images that are not in the original datasets.
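
One way to obtain a latent point that does not belong to any training image is to blend two existing latent vectors, which is what the reconstruction notebook does with `z = z_train[-sample1] * w + z_train[-sample2] * (1 - w)`. The sketch below shows only that blending step in plain Python; the helper name `blend_latents` and the 3-dimensional toy vectors are hypothetical (the scripts use 256-dimensional latents).

```python
def blend_latents(z1, z2, w):
    """Return the weighted blend w * z1 + (1 - w) * z2 of two latent vectors."""
    if len(z1) != len(z2):
        raise ValueError("latent vectors must have the same length")
    return [w * a + (1.0 - w) * b for a, b in zip(z1, z2)]


# Toy latent vectors standing in for two rows of a saved z_train array.
z_a = [0.0, 2.0, -1.0]
z_b = [1.0, 0.0, 1.0]
z_new = blend_latents(z_a, z_b, w=0.4)
print(z_new)
```

In the notebook, the blended vector is then reshaped to a batch of one and passed to the model's `decode` method to render the new image.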

================================================
FILE: ResNetVAE_FACE.py
================================================
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
import torch.utils.data as data
import torchvision
from torch.autograd import Variable
import matplotlib.pyplot as plt
from modules import *
from sklearn.datasets import fetch_olivetti_faces
from torch.utils.data import Dataset, DataLoader, TensorDataset
from skimage.transform import resize
from sklearn.model_selection import train_test_split
import pickle

# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# EncoderCNN architecture
CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024
CNN_embed_dim = 256     # latent dim extracted by 2D CNN
res_size = 224          # ResNet image size
dropout_p = 0.2         # dropout probability

# training parameters
epochs = 100            # training epochs
batch_size = 50
learning_rate = 1e-3
log_interval = 10       # interval for displaying training info

# save model
save_model_path = './results_Olivetti_face'


def check_mkdir(dir_name):
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)


def loss_function(recon_x, x, mu, logvar):
    # Reconstruction term: binary cross-entropy summed over all pixels
    # (alternative: F.mse_loss(recon_x, x, reduction='sum'))
    BCE = F.binary_cross_entropy(recon_x, x, reduction='sum')
    # KL divergence between q(z|x) and the standard normal prior
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD


def train(log_interval, model, device, train_loader, optimizer, epoch):
    # set model as training mode
    model.train()

    losses = []
    all_X, all_y, all_z, all_mu, all_logvar = [], [], [], [], []
    N_count = 0   # counting total trained sample in one epoch
    for batch_idx, (X, y) in enumerate(train_loader):
        # distribute data to device
        X, y = X.to(device), y.to(device).view(-1, )
        N_count += X.size(0)

        optimizer.zero_grad()
        X_reconst, z, mu, logvar = model(X)  # VAE
        loss = loss_function(X_reconst, X, mu, logvar)
        losses.append(loss.item())

        loss.backward()
        optimizer.step()
        all_X.extend(X.data.cpu().numpy())
        all_y.extend(y.data.cpu().numpy())
        all_z.extend(z.data.cpu().numpy())
        all_mu.extend(mu.data.cpu().numpy())
        all_logvar.extend(logvar.data.cpu().numpy())

        # show information
        if (batch_idx + 1) % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch + 1, N_count, len(train_loader.dataset), 100. * (batch_idx + 1) / len(train_loader), loss.item()))

    all_X = np.stack(all_X, axis=0)
    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # save model and optimizer state after every epoch
    torch.save(model.state_dict(), os.path.join(save_model_path, 'model_epoch{}.pth'.format(epoch + 1)))          # save model weights
    torch.save(optimizer.state_dict(), os.path.join(save_model_path, 'optimizer_epoch{}.pth'.format(epoch + 1)))  # save optimizer
    print("Epoch {} model saved!".format(epoch + 1))

    return all_X, all_y, all_z, all_mu, all_logvar, losses


def validation(model, device, optimizer, test_loader):
    # set model as testing mode
    model.eval()

    test_loss = 0
    all_X, all_y, all_z, all_mu, all_logvar = [], [], [], [], []
    with torch.no_grad():
        for X, y in test_loader:
            # distribute data to device
            X, y = X.to(device), y.to(device).view(-1, )
            X_reconst, z, mu, logvar = model(X)

            loss = loss_function(X_reconst, X, mu, logvar)
            test_loss += loss.item()  # sum up batch loss

            all_X.extend(X.data.cpu().numpy())
            all_y.extend(y.data.cpu().numpy())
            all_z.extend(z.data.cpu().numpy())
            all_mu.extend(mu.data.cpu().numpy())
            all_logvar.extend(logvar.data.cpu().numpy())

    test_loss /= len(test_loader.dataset)
    all_X = np.stack(all_X, axis=0)
    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # show information
    print('\nTest set ({:d} samples): Average loss: {:.4f}\n'.format(len(test_loader.dataset), test_loss))
    return all_X, all_y, all_z, all_mu, all_logvar, test_loss


# Detect devices
use_cuda = torch.cuda.is_available()                   # check if GPU exists
device = torch.device("cuda" if use_cuda else "cpu")   # use CPU or GPU

# Data loading parameters
# Batch size and shuffling are set on CPU as well: with an empty dict the
# DataLoader falls back to batch_size=1, which BatchNorm1d rejects in training mode.
params = {'batch_size': batch_size, 'shuffle': True}
if use_cuda:
    params.update({'num_workers': 2, 'pin_memory': True})

# Load the faces datasets
data = fetch_olivetti_faces()
face_img = data.images.reshape((data.images.shape[0], data.images.shape[1], data.images.shape[2]))
# resize each grayscale face to (res_size, res_size) and repeat it over 3 channels
face_img_resized = [np.tile(np.expand_dims(resize(face_img[i, :, :], (res_size, res_size), anti_aliasing=True), axis=0), (3, 1, 1))
                    for i in range(face_img.shape[0])]
face_img_resized = np.stack(face_img_resized, axis=0)
face_img_resized = torch.from_numpy(face_img_resized).float()
labels = torch.from_numpy(data.target)
olivetti_data = TensorDataset(face_img_resized, labels)

# Data loader (input pipeline)
# Note: both loaders iterate over the full dataset; no held-out split is made.
train_loader = torch.utils.data.DataLoader(dataset=olivetti_data, **params)
valid_loader = torch.utils.data.DataLoader(dataset=olivetti_data, **params)

# Create model
resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)

print("Using", torch.cuda.device_count(), "GPU!")
model_params = list(resnet_vae.parameters())
optimizer = torch.optim.Adam(model_params, lr=learning_rate)

# record training process
epoch_train_losses = []
epoch_test_losses = []
check_mkdir(save_model_path)

# start training
for epoch in range(epochs):
    # train, test model
    X_train, y_train, z_train, mu_train, logvar_train, train_losses = train(log_interval, resnet_vae, device, train_loader, optimizer, epoch)
    X_test, y_test, z_test, mu_test, logvar_test, epoch_test_loss = validation(resnet_vae, device, optimizer, valid_loader)

    # save results
    epoch_train_losses.append(train_losses)
    epoch_test_losses.append(epoch_test_loss)

    # save all train test results
    A = np.array(epoch_train_losses)
    C = np.array(epoch_test_losses)   # collected but not written to disk
    np.save(os.path.join(save_model_path, 'ResNet_VAE_training_loss.npy'), A)
    np.save(os.path.join(save_model_path, 'X_Olivetti_train_epoch{}.npy'.format(epoch + 1)), X_train)
    np.save(os.path.join(save_model_path, 'y_Olivetti_train_epoch{}.npy'.format(epoch + 1)), y_train)
    np.save(os.path.join(save_model_path, 'z_Olivetti_train_epoch{}.npy'.format(epoch + 1)), z_train)


================================================
FILE: ResNetVAE_MNIST.py
================================================
import os
import glob
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
import torch.utils.data as data
import torchvision
from torch.autograd import Variable
import matplotlib.pyplot as plt
from modules import *
from sklearn.model_selection import train_test_split
import pickle

# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# EncoderCNN architecture
CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024
CNN_embed_dim = 256     # latent dim extracted by 2D CNN
res_size = 224          # ResNet image size
dropout_p = 0.2         # dropout probability

# training parameters
epochs = 20             # training epochs
batch_size = 50
learning_rate = 1e-3
log_interval = 10       # interval for displaying training info

# save model
save_model_path = './results_MNIST'


def check_mkdir(dir_name):
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)


def loss_function(recon_x, x, mu, logvar):
    # MSE = F.mse_loss(recon_x, x, reduction='sum')
    BCE = F.binary_cross_entropy(recon_x, x, reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD


def train(log_interval, model, device, train_loader, optimizer, epoch):
    # set model as training mode
    model.train()

    losses = []
    all_y, all_z, all_mu, all_logvar = [], [], [], []
    N_count = 0   # counting total trained sample in one epoch
    for batch_idx, (X, y) in enumerate(train_loader):
        # distribute data to device
        X, y = X.to(device), y.to(device).view(-1, )
        N_count += X.size(0)
        optimizer.zero_grad()
        X_reconst, z, mu, logvar = model(X)  # VAE
        loss = loss_function(X_reconst, X, mu, logvar)
        losses.append(loss.item())

        loss.backward()
        optimizer.step()

        all_y.extend(y.data.cpu().numpy())
        all_z.extend(z.data.cpu().numpy())
        all_mu.extend(mu.data.cpu().numpy())
        all_logvar.extend(logvar.data.cpu().numpy())

        # show information
        if (batch_idx + 1) % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch + 1, N_count, len(train_loader.dataset), 100. * (batch_idx + 1) / len(train_loader), loss.item()))

    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # save model and optimizer state after every epoch
    torch.save(model.state_dict(), os.path.join(save_model_path, 'model_epoch{}.pth'.format(epoch + 1)))          # save model weights
    torch.save(optimizer.state_dict(), os.path.join(save_model_path, 'optimizer_epoch{}.pth'.format(epoch + 1)))  # save optimizer
    print("Epoch {} model saved!".format(epoch + 1))

    # X holds only the last batch of the epoch
    return X.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, losses


def validation(model, device, optimizer, test_loader):
    # set model as testing mode
    model.eval()

    test_loss = 0
    all_y, all_z, all_mu, all_logvar = [], [], [], []
    with torch.no_grad():
        for X, y in test_loader:
            # distribute data to device
            X, y = X.to(device), y.to(device).view(-1, )
            X_reconst, z, mu, logvar = model(X)

            loss = loss_function(X_reconst, X, mu, logvar)
            test_loss += loss.item()  # sum up batch loss

            all_y.extend(y.data.cpu().numpy())
            all_z.extend(z.data.cpu().numpy())
            all_mu.extend(mu.data.cpu().numpy())
            all_logvar.extend(logvar.data.cpu().numpy())

    test_loss /= len(test_loader.dataset)
    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # show information
    print('\nTest set ({:d} samples): Average loss: {:.4f}\n'.format(len(test_loader.dataset), test_loss))
    return X.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, test_loss


# Detect devices
use_cuda = torch.cuda.is_available()                   # check if GPU exists
device = torch.device("cuda" if use_cuda else "cpu")   # use CPU or GPU

# Data loading parameters (not used below; the loaders set batch_size and shuffle explicitly)
params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': 4, 'pin_memory': True} if use_cuda else {}
transform = transforms.Compose([transforms.Resize([res_size, res_size]),
                                transforms.ToTensor(),
                                transforms.Lambda(lambda x: x.repeat(3, 1, 1)),  # gray -> RGB 3 channel (lambda function)
                                transforms.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0])])  # identity normalization, for grayscale images

# MNIST dataset (images and labels)
MNIST_train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
MNIST_test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform)

# Data loader (input pipeline)
train_loader = torch.utils.data.DataLoader(dataset=MNIST_train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = torch.utils.data.DataLoader(dataset=MNIST_test_dataset, batch_size=batch_size, shuffle=False)

# Create model
resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)

print("Using", torch.cuda.device_count(), "GPU!")
model_params = list(resnet_vae.parameters())
optimizer = torch.optim.Adam(model_params, lr=learning_rate)

# record training process
epoch_train_losses = []
epoch_test_losses = []
check_mkdir(save_model_path)

# start training
for epoch in range(epochs):
    # train, test model
    X_train, y_train, z_train, mu_train, logvar_train, train_losses = train(log_interval, resnet_vae, device, train_loader, optimizer, epoch)
    X_test, y_test, z_test, mu_test, logvar_test, epoch_test_loss = validation(resnet_vae, device, optimizer, valid_loader)

    # save results
    epoch_train_losses.append(train_losses)
    epoch_test_losses.append(epoch_test_loss)

    # save all train test results
    A = np.array(epoch_train_losses)
    C = np.array(epoch_test_losses)   # collected but not written to disk
    np.save(os.path.join(save_model_path, 'ResNet_VAE_training_loss.npy'), A)
    np.save(os.path.join(save_model_path, 'X_MNIST_train_epoch{}.npy'.format(epoch + 1)), X_train)  # save last batch
    np.save(os.path.join(save_model_path, 'y_MNIST_train_epoch{}.npy'.format(epoch + 1)), y_train)
    np.save(os.path.join(save_model_path, 'z_MNIST_train_epoch{}.npy'.format(epoch + 1)), z_train)


================================================
FILE: ResNetVAE_cifar10.py
================================================
import os
import glob
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
import torch.utils.data as data
import torchvision
from torch.autograd import Variable
import matplotlib.pyplot as plt
from modules import *
from sklearn.model_selection import train_test_split
import pickle

# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# EncoderCNN architecture
CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024
CNN_embed_dim = 256     # latent dim extracted by 2D CNN
res_size = 224          # ResNet image size
dropout_p = 0.2         # dropout probability

# training parameters
epochs = 20             # training epochs
batch_size = 50
learning_rate = 1e-3
log_interval = 10       # interval for displaying training info

# save model
save_model_path = './results_cifar10'


def check_mkdir(dir_name):
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)


def loss_function(recon_x, x, mu, logvar):
    # MSE = F.mse_loss(recon_x, x, reduction='sum')
    BCE = F.binary_cross_entropy(recon_x, x, reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return BCE + KLD


def train(log_interval, model, device, train_loader, optimizer, epoch):
    # set model as training mode
    model.train()

    losses = []
    all_y, all_z, all_mu, all_logvar = [], [], [], []
    N_count = 0   # counting total trained sample in one epoch
    for batch_idx, (X, y) in enumerate(train_loader):
        # distribute data to device
        X, y = X.to(device), y.to(device).view(-1, )
        N_count += X.size(0)

        optimizer.zero_grad()
        X_reconst, z, mu, logvar = model(X)  # VAE
        loss = loss_function(X_reconst, X, mu, logvar)
        losses.append(loss.item())

        loss.backward()
        optimizer.step()

        all_y.extend(y.data.cpu().numpy())
        all_z.extend(z.data.cpu().numpy())
        all_mu.extend(mu.data.cpu().numpy())
        all_logvar.extend(logvar.data.cpu().numpy())

        # show information
        if (batch_idx + 1) % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch + 1, N_count, len(train_loader.dataset), 100. * (batch_idx + 1) / len(train_loader), loss.item()))

    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # save model and optimizer state after every epoch
    torch.save(model.state_dict(), os.path.join(save_model_path, 'model_epoch{}.pth'.format(epoch + 1)))          # save model weights
    torch.save(optimizer.state_dict(), os.path.join(save_model_path, 'optimizer_epoch{}.pth'.format(epoch + 1)))  # save optimizer
    print("Epoch {} model saved!".format(epoch + 1))

    # X_reconst holds only the reconstruction of the last batch
    return X_reconst.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, losses


def validation(model, device, optimizer, test_loader):
    # set model as testing mode
    model.eval()

    test_loss = 0
    all_y, all_z, all_mu, all_logvar = [], [], [], []
    with torch.no_grad():
        for X, y in test_loader:
            # distribute data to device
            X, y = X.to(device), y.to(device).view(-1, )
            X_reconst, z, mu, logvar = model(X)

            loss = loss_function(X_reconst, X, mu, logvar)
            test_loss += loss.item()  # sum up batch loss

            all_y.extend(y.data.cpu().numpy())
            all_z.extend(z.data.cpu().numpy())
            all_mu.extend(mu.data.cpu().numpy())
            all_logvar.extend(logvar.data.cpu().numpy())

    test_loss /= len(test_loader.dataset)
    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # show information
    print('\nTest set ({:d} samples): Average loss: {:.4f}\n'.format(len(test_loader.dataset), test_loss))
    return X_reconst.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, test_loss


# Detect devices
use_cuda = torch.cuda.is_available()                   # check if GPU exists
device = torch.device("cuda" if use_cuda else "cpu")   # use CPU or GPU

# Data loading parameters (not used below; the loaders set batch_size and shuffle explicitly)
params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': 2, 'pin_memory': True} if use_cuda else {}

# transform = transforms.Compose([transforms.Resize([res_size, res_size]),
#                                 transforms.ToTensor(),
#                                 transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

transform = transforms.Compose([transforms.Resize([res_size, res_size]),
                                transforms.ToTensor(),
                                transforms.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0])])

# cifar10 dataset (images and labels)
cifar10_train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
cifar10_test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

# classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Data loader (input pipeline)
train_loader = torch.utils.data.DataLoader(dataset=cifar10_train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = torch.utils.data.DataLoader(dataset=cifar10_test_dataset, batch_size=batch_size, shuffle=False)

# Create model
resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)

print("Using", torch.cuda.device_count(), "GPU!")
model_params = list(resnet_vae.parameters())
optimizer = torch.optim.Adam(model_params, lr=learning_rate)

# record training process
epoch_train_losses = []
epoch_test_losses = []
check_mkdir(save_model_path)

# start training
for epoch in range(epochs):
    # train, test model
    X_reconst_train, y_train, z_train, mu_train, logvar_train, train_losses = train(log_interval, resnet_vae, device, train_loader, optimizer, epoch)
    X_reconst_test, y_test, z_test, mu_test, logvar_test, epoch_test_loss = validation(resnet_vae, device, optimizer, valid_loader)

    # save results
    epoch_train_losses.append(train_losses)
    epoch_test_losses.append(epoch_test_loss)

    # save all train test results
    A = np.array(epoch_train_losses)
    C = np.array(epoch_test_losses)   # collected but not written to disk
    np.save(os.path.join(save_model_path, 'ResNet_VAE_training_loss.npy'), A)
    np.save(os.path.join(save_model_path, 'y_cifar10_train_epoch{}.npy'.format(epoch + 1)), y_train)
    np.save(os.path.join(save_model_path, 'z_cifar10_train_epoch{}.npy'.format(epoch + 1)), z_train)


================================================
FILE: ResNetVAE_reconstruction.ipynb
================================================
{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "import os\n", "import glob\n", "import numpy as np\n", "import torch\n", "import torch.nn as nn\n", "import torch.nn.functional as F\n", "import torchvision.models as models\n", "import torchvision.transforms as transforms\n", "import torch.utils.data as data\n", "import torchvision\n", "from torch.autograd import Variable\n", "import matplotlib.pyplot as plt\n", "from modules import *\n", "from sklearn.model_selection import train_test_split\n", "import pickle\n", "from sklearn.datasets import fetch_olivetti_faces\n", "from torch.utils.data import Dataset, DataLoader, TensorDataset\n", "from skimage.transform import resize\n", "\n", "\n", "def decoder(model, device, z):\n", "    model.eval()\n", "    z = Variable(torch.FloatTensor(z)).to(device)\n", "    new_images = model.decode(z).squeeze_().data.cpu().numpy().transpose((1, 2, 0))\n", "    return new_images\n", "\n", "saved_model_path = './results_Olivetti_face'\n", "# saved_model_path = './results_MNIST'\n", "\n", "exp = 'Olivetti'\n", "# exp = 'MNIST'\n", "\n", "# use same ResNet Encoder saved 
earlier!\n", "CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024\n", "CNN_embed_dim = 256\n", "res_size = 224 # ResNet image size\n", "dropout_p = 0.2 # dropout probability\n", "\n", "epoch = 20\n", "\n", "\n", "use_cuda = torch.cuda.is_available() # check if GPU exists\n", "device = torch.device(\"cuda\" if use_cuda else \"cpu\") # use CPU or GPU\n", "\n", "# reload ResNetVAE model\n", "resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)\n", "resnet_vae.load_state_dict(torch.load(os.path.join(saved_model_path, 'model_epoch{}.pth'.format(epoch))))\n", "\n", "print('ResNetVAE epoch {} model reloaded!'.format(epoch))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reconstruction " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z_train = np.load(os.path.join(saved_model_path, 'z_{}_train_epoch{}.npy').format(exp, epoch))\n", "X_train = np.load(os.path.join(saved_model_path, 'X_{}_train_epoch{}.npy').format(exp, epoch))\n", "\n", "ind = 1\n", "zz = torch.from_numpy(z_train[ind]).view(1, -1)\n", "X = np.transpose(X_train[ind], (1, 2, 0))\n", "\n", "new_imgs = decoder(resnet_vae, device, zz)\n", "\n", "fig = plt.figure(figsize=(10, 10))\n", "\n", "plt.subplot(1, 2, 1)\n", "plt.imshow(X)\n", "plt.title('original')\n", "plt.axis('off')\n", "\n", "plt.subplot(1, 2, 2)\n", "plt.imshow(new_imgs)\n", "plt.title('reconstructed')\n", "plt.axis('off')\n", "plt.savefig(\"./reconstruction_{}.png\".format(exp), bbox_inches='tight', dpi=600)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generate new images from latent points" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "# choose two original images\n", "sample1, sample2 = 0, 1\n", "w = 0.4 # weight for fusing two images\n", "\n", "X1 = np.transpose(X_train[-sample1], (1, 2, 0))\n", "X2 = 
np.transpose(X_train[-sample2], (1, 2, 0))\n", "\n", "# generate image using decoder\n", "z_train = np.load(os.path.join(saved_model_path, 'z_{}_train_epoch{}.npy').format(exp, epoch))\n", "z = z_train[-sample1] * w + z_train[-sample2] * (1 - w)\n", "new_imgs = decoder(resnet_vae, device, torch.from_numpy(z).view(1, -1))\n", "\n", "fig = plt.figure(figsize=(15, 15))\n", "\n", "plt.subplot(1, 3, 1)\n", "plt.imshow(X1)\n", "plt.title('original 1')\n", "plt.axis('off')\n", "\n", "plt.subplot(1, 3, 2)\n", "plt.imshow(X2)\n", "plt.title('original 2')\n", "plt.axis('off')\n", "\n", "\n", "plt.subplot(1, 3, 3)\n", "plt.imshow(new_imgs)\n", "plt.title('new image')\n", "plt.axis('off')\n", "plt.savefig(\"./generated_{}.png\".format(exp), bbox_inches='tight', dpi=600)\n", "plt.show()\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }


================================================
FILE: modules.py
================================================
import os
import numpy as np
from PIL import Image
from torch.utils import data
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from torch.autograd import Variable
import torchvision.transforms as transforms


class Dataset(data.Dataset):
    "Characterizes a dataset for PyTorch"
    def __init__(self, filenames, labels, transform=None):
        "Initialization"
        self.filenames = filenames
        self.labels = labels
        self.transform = transform

    def __len__(self):
        "Denotes the total number of samples"
        return len(self.filenames)

    def __getitem__(self, index):
        "Generates one sample of data"
        # Select sample
        filename = self.filenames[index]
        X = Image.open(filename)

        if self.transform:
            X = self.transform(X)     # transform

        y = torch.LongTensor([self.labels[index]])
        return X, y

## ---------------------- end of Dataloaders ---------------------- ##


def conv2D_output_size(img_size, padding, kernel_size, stride):
    # compute output shape of Conv2d
    outshape = (np.floor((img_size[0] + 2 * padding[0] - (kernel_size[0] - 1) - 1) / stride[0] + 1).astype(int),
                np.floor((img_size[1] + 2 * padding[1] - (kernel_size[1] - 1) - 1) / stride[1] + 1).astype(int))
    return outshape


def convtrans2D_output_size(img_size, padding, kernel_size, stride):
    # compute output shape of ConvTranspose2d
    outshape = ((img_size[0] - 1) * stride[0] - 2 * padding[0] + kernel_size[0],
                (img_size[1] - 1) * stride[1] - 2 * padding[1] + kernel_size[1])
    return outshape

## ---------------------- ResNet VAE ---------------------- ##


class ResNet_VAE(nn.Module):
    def __init__(self, fc_hidden1=1024, fc_hidden2=768, drop_p=0.3, CNN_embed_dim=256):
        super(ResNet_VAE, self).__init__()

        self.fc_hidden1, self.fc_hidden2, self.CNN_embed_dim = fc_hidden1, fc_hidden2, CNN_embed_dim
        self.drop_p = drop_p   # stored so the optional dropout in encode() can use it

        # CNN architectures
        self.ch1, self.ch2, self.ch3, self.ch4 = 16, 32, 64, 128
        self.k1, self.k2, self.k3, self.k4 = (5, 5), (3, 3), (3, 3), (3, 3)      # 2d kernel size
        self.s1, self.s2, self.s3, self.s4 = (2, 2), (2, 2), (2, 2), (2, 2)      # 2d strides
        self.pd1, self.pd2, self.pd3, self.pd4 = (0, 0), (0, 0), (0, 0), (0, 0)  # 2d padding

        # encoding components
        resnet = models.resnet152(pretrained=True)
        modules = list(resnet.children())[:-1]      # delete the last fc layer.
        self.resnet = nn.Sequential(*modules)
        self.fc1 = nn.Linear(resnet.fc.in_features, self.fc_hidden1)
        self.bn1 = nn.BatchNorm1d(self.fc_hidden1, momentum=0.01)
        self.fc2 = nn.Linear(self.fc_hidden1, self.fc_hidden2)
        self.bn2 = nn.BatchNorm1d(self.fc_hidden2, momentum=0.01)
        # Latent vectors mu and sigma
        self.fc3_mu = nn.Linear(self.fc_hidden2, self.CNN_embed_dim)      # output = CNN embedding latent variables
        self.fc3_logvar = nn.Linear(self.fc_hidden2, self.CNN_embed_dim)  # output = CNN embedding latent variables

        # Sampling vector
        self.fc4 = nn.Linear(self.CNN_embed_dim, self.fc_hidden2)
        self.fc_bn4 = nn.BatchNorm1d(self.fc_hidden2)
        self.fc5 = nn.Linear(self.fc_hidden2, 64 * 4 * 4)
        self.fc_bn5 = nn.BatchNorm1d(64 * 4 * 4)
        self.relu = nn.ReLU(inplace=True)

        # Decoder
        self.convTrans6 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=self.k4, stride=self.s4,
                               padding=self.pd4),
            nn.BatchNorm2d(32, momentum=0.01),
            nn.ReLU(inplace=True),
        )
        self.convTrans7 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=8, kernel_size=self.k3, stride=self.s3,
                               padding=self.pd3),
            nn.BatchNorm2d(8, momentum=0.01),
            nn.ReLU(inplace=True),
        )
        self.convTrans8 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=8, out_channels=3, kernel_size=self.k2, stride=self.s2,
                               padding=self.pd2),
            nn.BatchNorm2d(3, momentum=0.01),
            nn.Sigmoid()    # y = (y1, y2, y3) \in [0 ,1]^3
        )

    def encode(self, x):
        x = self.resnet(x)  # ResNet
        x = x.view(x.size(0), -1)  # flatten output of conv

        # FC layers
        x = self.bn1(self.fc1(x))
        x = self.relu(x)
        x = self.bn2(self.fc2(x))
        x = self.relu(x)
        # x = F.dropout(x, p=self.drop_p, training=self.training)
        mu, logvar = self.fc3_mu(x), self.fc3_logvar(x)
        return mu, logvar

    def reparameterize(self, mu, logvar):
        if self.training:
            # sample z = mu + eps * std with eps ~ N(0, I)
            std = torch.exp(0.5 * logvar)
            eps = torch.randn_like(std)
            return mu + eps * std
        else:
            # at evaluation time use the mean of q(z|x)
            return mu

    def decode(self, z):
        x = self.relu(self.fc_bn4(self.fc4(z)))
        x = self.relu(self.fc_bn5(self.fc5(x))).view(-1, 64, 4, 4)
        x = self.convTrans6(x)   # (64, 4, 4)   -> (32, 9, 9)
        x = self.convTrans7(x)   # (32, 9, 9)   -> (8, 19, 19)
        x = self.convTrans8(x)   # (8, 19, 19)  -> (3, 39, 39)
        # upsample to the ResNet input resolution
        x = F.interpolate(x, size=(224, 224), mode='bilinear', align_corners=False)
        return x

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_reconst = self.decode(z)

        return x_reconst, z, mu, logvar


================================================
FILE: plot_latent.ipynb
================================================
{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import matplotlib.pyplot as plt\n", "from mpl_toolkits.mplot3d import Axes3D\n", "import matplotlib.cm as cm\n", "from sklearn.manifold import TSNE\n", "import numpy as np\n", "import pickle\n", "\n", " \n", "epoch = 20\n", "exp = 'cifar10'\n", "# exp = 'MNIST'\n", "\n", "N = 6000 # image number\n", "\n", "y_train = np.load('./results_{}/y_{}_train_epoch{}.npy'.format(exp, exp, epoch))\n", "z_train = np.load('./results_{}/z_{}_train_epoch{}.npy'.format(exp, exp, epoch))\n", "classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'] # cifar10\n", "# classes = np.arange(10) #MNIST\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Direct projection of latent space" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_train = y_train[:N]\n", "z_train = z_train[:N]\n", "\n", "fig = plt.figure(figsize=(12, 10))\n", "plots = []\n", "markers = ['o', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']\n", "for i, c in enumerate(classes):\n", "    ind = (y_train == i).tolist() or ([j < N // len(classes) for j in range(len(y_train))])\n", "    color = cm.jet([i / len(classes)] * sum(ind))\n", "    plots.append(plt.scatter(z_train[ind, 1], z_train[ind, 2], marker=markers[i], c=color, s=8, label=i))\n", "\n", "plt.axis('off')\n", "plt.legend(plots, classes, fontsize=14, loc='upper right')\n", "plt.title('{} (direct projection: {}-dim -> 
2-dim)'.format(exp, z_train.shape[1]), fontsize=14)\n", "plt.savefig(\"./ResNetVAE_{}_direct_plot.png\".format(exp), bbox_inches='tight', dpi=600)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Use t-SNE for dimension reduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### compressed to 2-dimension" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z_embed = TSNE(n_components=2, n_iter=12000).fit_transform(z_train[:N])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = plt.figure(figsize=(12, 10))\n", "plots = []\n", "markers = ['o', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd'] # select different markers\n", "for i, c in enumerate(classes):\n", " ind = (y_train[:N] == i).tolist()\n", " color = cm.jet([i / len(classes)] * sum(ind))\n", " # plot each category one at a time \n", " plots.append(plt.scatter(z_embed[ind, 0], z_embed[ind, 1], c=color, marker=markers[i], s=8, label=i))\n", "\n", "plt.axis('off')\n", "plt.xlim(-150, 150)\n", "plt.ylim(-150, 150)\n", "plt.legend(plots, classes, fontsize=14, loc='upper right')\n", "plt.title('{} (t-SNE: {}-dim -> 2-dim)'.format(exp, z_train.shape[1]), fontsize=14)\n", "plt.savefig(\"./ResNetVAE_{}_embedded_plot.png\".format(exp), bbox_inches='tight', dpi=600)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### compressed to 3-dimension" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "z_embed3D = TSNE(n_components=3, n_iter=12000).fit_transform(z_train[:N])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig = plt.figure(figsize=(12, 10))\n", "ax = fig.add_subplot(111, projection='3d')\n", "\n", "plots = []\n", "markers = ['o', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd'] # select different markers\n", "for i, c in enumerate(classes):\n", " ind = 
(y_train[:N] == i).tolist()\n", " color = cm.jet([i / len(classes)] * sum(ind))\n", " # plot each category one at a time, passing all three t-SNE coordinates and keeping the handle for the legend\n", " plots.append(ax.scatter(z_embed3D[ind, 0], z_embed3D[ind, 1], z_embed3D[ind, 2], c=color, marker=markers[i], s=8, label=i))\n", "\n", "ax.axis('on')\n", "\n", "r_max = 20\n", "r_min = -r_max\n", "\n", "ax.set_xlim(r_min, r_max)\n", "ax.set_ylim(r_min, r_max)\n", "ax.set_zlim(r_min, r_max)\n", "ax.set_xlabel('z-dim 1')\n", "ax.set_ylabel('z-dim 2')\n", "ax.set_zlabel('z-dim 3')\n", "ax.set_title('{} (t-SNE: {}-dim -> 3-dim)'.format(exp, z_train.shape[1]), fontsize=14)\n", "ax.legend(plots, classes, fontsize=14, loc='upper right')\n", "plt.savefig(\"./ResNetVAE_{}_embedded_3Dplot.png\".format(exp), bbox_inches='tight', dpi=600)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }
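
The `forward` method in modules.py returns `z`, `mu`, and `logvar`. The sampling step in `reparameterize` and the KL term named in the README can be sketched in plain Python. This is a minimal sketch, not the repository's training code. The helper names `reparameterize_sample` and `kl_to_standard_normal` are hypothetical, and lists stand in for tensors. The KL expression is the standard closed form for a diagonal Gaussian against N(0, I), which is an assumption here because the training scripts are not part of this section.

```python
import math
import random


def reparameterize_sample(mu, logvar, rng=None, training=True):
    """Return z = mu + std * eps elementwise, with eps drawn from N(0, 1).

    Hypothetical list-based stand-in for the tensor method in modules.py:
    outside training the mean is returned unchanged.
    """
    if not training:
        return list(mu)
    rng = rng or random.Random()
    # std = exp(0.5 * logvar), the same transform the module applies to logvar
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, exp(logvar)) || N(0, 1)), summed over dimensions.

    Assumed standard formula; the repository's exact loss is not shown here.
    """
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


if __name__ == "__main__":
    mu = [0.5, -1.0, 0.0]
    logvar = [0.0, -2.0, 0.0]
    z = reparameterize_sample(mu, logvar, rng=random.Random(0))
    print("sample z:", z)
    print("KL term:", kl_to_standard_normal(mu, logvar))
```

The KL term is zero exactly when `mu` is all zeros and `logvar` is all zeros, which is why minimising it pulls the encoder's distribution toward the standard normal.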