Repository: hsinyilin19/ResNetVAE
Branch: master
Commit: 0befd92878da
Files: 7
Total size: 39.1 KB

Directory structure:
gitextract_r_nthkwy/

├── README.md
├── ResNetVAE_FACE.py
├── ResNetVAE_MNIST.py
├── ResNetVAE_cifar10.py
├── ResNetVAE_reconstruction.ipynb
├── modules.py
└── plot_latent.ipynb

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
# Variational Autoencoder (VAE) + Transfer learning (ResNet + VAE)

This repository implements the VAE in PyTorch, using a pretrained ResNet model as its encoder, and a transposed convolutional network as decoder.  


## Datasets

### 1. MNIST

The [MNIST](http://yann.lecun.com/exdb/mnist/) database contains 60,000 training images and 10,000 testing images. Each image is saved as a 28x28 matrix.

<img src="./fig/MNIST.png" width="650">


### 2. CIFAR10

The [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class.

<img src="./fig/cifar10.png" width="650">


### 3. Olivetti faces dataset

The [Olivetti](https://scikit-learn.org/0.19/datasets/olivetti_faces.html) faces dataset consists of 10 64x64 images for 40 distinct subjects.

<img src="./fig/face.png" width="650">


## Model


A [VAE](https://arxiv.org/pdf/1312.6114.pdf) model contains a pair of encoder and decoder. An encoder <img src="./fig/theta.png" width="8"> compresses an 2D image *x* into a vector *z* in a lower dimension space, which is normally called the latent space, while the decoder <img src="./fig/phi.png" width="10"> receives the vectors in latent space, and outputs objects in the same space as the inputs of the encoder. The training goal is to make the composition of encoder and decoder to be "as close to identity as possible". Precisely, the loss function is:
<img src="./fig/loss.png" width="350">,
where <img src="./fig/DKL.png" width="30"> is the Kullback-Leibler divergence, and <img src="./fig/normal.png" width="18"> is the standard normal distribution. The first term measures how good the reconstruction is, and second term measures how close the normal distribution and q are. After training two applications will be granted. First, the encoder can do dimension reduction. Second, the decoder can be used to reproduce input images, or even generate new images. We shall show the results of our experiments in the end.


  - For our **encoder**, we do fine tuning, a technique in transfer learning, on [ResNet-152](https://arxiv.org/abs/1512.03385). ResNet-152 is a [CNN](https://en.wikipedia.org/wiki/Convolutional_neural_network) pretrained on ImageNet [ILSVRC-2012-CLS](http://www.image-net.org/challenges/LSVRC/2012/). Our **decoder** uses transposed convolution network. 
  
  
## Training 

- The input images are resized to **(channels, x-dim, y-dim) = (3, 224, 224)**, which is reqiured by the ResNet-152 model. 
- We use ADAM in our optimization process.
   
<img src="./fig/training_curve.png" width="750">


## Usage 

### Prerequisites
- [Python 3.6](https://www.python.org/)
- [PyTorch 1.0.0](https://pytorch.org/)
- [Numpy 1.15.0](http://www.numpy.org/)
- [Sklearn 0.19.2](https://scikit-learn.org/stable/)
- [Matplotlib](https://matplotlib.org/)


### Model ouputs

We saved labels (y coordinates), resulting latent space (z coordinates), models, and optimizers.


 - Run plot_latent.ipynb to see the clustering results
   
 - Run ResNetVAE_reconstruction.ipynb to reproduce or generate images
  
 - Optimizer recordings are convenient for re-training. 


## Results 

### Clustering

With encoder compressing high dimension inputs to low dimension latent space, we can use it to see the clustering of data points. 

<img src="./fig/cluster_MNIST.png" width="850">
<img src="./fig/cluster_cifar10.png" width="850">


### Reproduce and generate images

The decoder reproduces the input images from the latent space. Not only so, it can even generate new images, which are not in the original datasets.

<img src="./fig/reconstruction_MNIST.png" width="450">
<img src="./fig/reconstruction_face.png" width="450">

<img src="./fig/generated_MNIST.png" width="550">
<img src="./fig/generated_face.png" width="550">


================================================
FILE: ResNetVAE_FACE.py
================================================
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
import torch.utils.data as data
import torchvision
from torch.autograd import Variable
import matplotlib.pyplot as plt
from modules import *
from sklearn.datasets import fetch_olivetti_faces
from torch.utils.data import Dataset, DataLoader, TensorDataset
from skimage.transform import resize
from sklearn.model_selection import train_test_split
import pickle

# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# EncoderCNN architecture
CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024
CNN_embed_dim = 256     # latent dim extracted by 2D CNN
res_size = 224        # ResNet image size
dropout_p = 0.2       # dropout probability


# training parameters
epochs = 100        # training epochs
batch_size = 50
learning_rate = 1e-3
log_interval = 10   # interval for displaying training info

# save model
save_model_path = './results_Olivetti_face'

def check_mkdir(dir_name):
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)


def loss_function(recon_x, x, mu, logvar):
    # MSE = F.mse_loss(recon_x, x, reduction='sum')
    MSE = F.binary_cross_entropy(recon_x, x, reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return MSE + KLD


def train(log_interval, model, device, train_loader, optimizer, epoch):
    # set model as training mode
    model.train()

    losses = []
    all_X, all_y, all_z, all_mu, all_logvar = [], [], [], [], []
    N_count = 0   # counting total trained sample in one epoch
    for batch_idx, (X, y) in enumerate(train_loader):
        # distribute data to device
        X, y = X.to(device), y.to(device).view(-1, )
        N_count += X.size(0)

        optimizer.zero_grad()
        X_reconst, z, mu, logvar = model(X)  # VAE
        loss = loss_function(X_reconst, X, mu, logvar)
        losses.append(loss.item())

        loss.backward()
        optimizer.step()
        
        all_X.extend(X.data.cpu().numpy())
        all_y.extend(y.data.cpu().numpy())
        all_z.extend(z.data.cpu().numpy())
        all_mu.extend(mu.data.cpu().numpy())
        all_logvar.extend(logvar.data.cpu().numpy())

        # show information
        if (batch_idx + 1) % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch + 1, N_count, len(train_loader.dataset), 100. * (batch_idx + 1) / len(train_loader), loss.item()))
    
    all_X = np.stack(all_X, axis=0)
    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # save Pytorch models of best record
    torch.save(model.state_dict(), os.path.join(save_model_path, 'model_epoch{}.pth'.format(epoch + 1)))  # save motion_encoder
    torch.save(optimizer.state_dict(), os.path.join(save_model_path, 'optimizer_epoch{}.pth'.format(epoch + 1)))      # save optimizer
    print("Epoch {} model saved!".format(epoch + 1))

    return all_X, all_y, all_z, all_mu, all_logvar, losses


def validation(model, device, optimizer, test_loader):
    # set model as testing mode
    model.eval()

    test_loss = 0
    all_X, all_y, all_z, all_mu, all_logvar = [], [], [], [], []
    with torch.no_grad():
        for X, y in test_loader:
            # distribute data to device
            X, y = X.to(device), y.to(device).view(-1, )
            X_reconst, z, mu, logvar = model(X)

            loss = loss_function(X_reconst, X, mu, logvar)
            test_loss += loss.item()  # sum up batch loss

            all_X.extend(X.data.cpu().numpy())
            all_y.extend(y.data.cpu().numpy())
            all_z.extend(z.data.cpu().numpy())
            all_mu.extend(mu.data.cpu().numpy())
            all_logvar.extend(logvar.data.cpu().numpy())

    test_loss /= len(test_loader.dataset)
    all_X = np.stack(all_X, axis=0)
    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # show information
    print('\nTest set ({:d} samples): Average loss: {:.4f}\n'.format(len(test_loader.dataset), test_loss))
    return all_X, all_y, all_z, all_mu, all_logvar, test_loss


# Detect devices
use_cuda = torch.cuda.is_available()                   # check if GPU exists
device = torch.device("cuda" if use_cuda else "cpu")   # use CPU or GPU

# Data loading parameters
params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': 2, 'pin_memory': True} if use_cuda else {}


# Load the faces datasets
data = fetch_olivetti_faces()
face_img = data.images.reshape((data.images.shape[0], data.images.shape[1], data.images.shape[2]))
face_img_resized = [np.tile(np.expand_dims(resize(face_img[i, :, :], (res_size, res_size), anti_aliasing=True), axis=0), (3, 1, 1)) for i in range(face_img.shape[0])]
face_img_resized = np.stack(face_img_resized, axis=0)
face_img_resized = torch.from_numpy(face_img_resized).float()
labels = torch.from_numpy(data.target)

olivetti_data = TensorDataset(face_img_resized, labels)
# Data loader (input pipeline)
train_loader = torch.utils.data.DataLoader(dataset=olivetti_data, **params)
valid_loader = torch.utils.data.DataLoader(dataset=olivetti_data, **params)

# Create model
resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)

print("Using", torch.cuda.device_count(), "GPU!")
model_params = list(resnet_vae.parameters())
optimizer = torch.optim.Adam(model_params, lr=learning_rate)


# record training process
epoch_train_losses = []
epoch_test_losses = []
check_mkdir(save_model_path)

# start training
for epoch in range(epochs):
    # train, test model
    X_train, y_train, z_train, mu_train, logvar_train, train_losses = train(log_interval, resnet_vae, device, train_loader, optimizer, epoch)
    X_test, y_test, z_test, mu_test, logvar_test, epoch_test_loss = validation(resnet_vae, device, optimizer, valid_loader)

    # save results
    epoch_train_losses.append(train_losses)
    epoch_test_losses.append(epoch_test_loss)

    # save all train test results
    A = np.array(epoch_train_losses)
    C = np.array(epoch_test_losses)
    
    np.save(os.path.join(save_model_path, 'ResNet_VAE_training_loss.npy'), A)
    np.save(os.path.join(save_model_path, 'X_Olivetti_train_epoch{}.npy'.format(epoch + 1)), X_train)
    np.save(os.path.join(save_model_path, 'y_Olivetti_train_epoch{}.npy'.format(epoch + 1)), y_train)
    np.save(os.path.join(save_model_path, 'z_Olivetti_train_epoch{}.npy'.format(epoch + 1)), z_train)


================================================
FILE: ResNetVAE_MNIST.py
================================================
import os
import glob
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
import torch.utils.data as data
import torchvision
from torch.autograd import Variable
import matplotlib.pyplot as plt
from modules import *
from sklearn.model_selection import train_test_split
import pickle

# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# EncoderCNN architecture
CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024
CNN_embed_dim = 256     # latent dim extracted by 2D CNN
res_size = 224        # ResNet image size
dropout_p = 0.2       # dropout probability

# training parameters
epochs = 20        # training epochs
batch_size = 50
learning_rate = 1e-3
log_interval = 10   # interval for displaying training info


# save model
save_model_path = './results_MNIST'


def check_mkdir(dir_name):
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)

def loss_function(recon_x, x, mu, logvar):
    # MSE = F.mse_loss(recon_x, x, reduction='sum')
    MSE = F.binary_cross_entropy(recon_x, x, reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return MSE + KLD


def train(log_interval, model, device, train_loader, optimizer, epoch):
    # set model as training mode
    model.train()

    losses = []
    all_y, all_z, all_mu, all_logvar = [], [], [], []
    N_count = 0   # counting total trained sample in one epoch
    for batch_idx, (X, y) in enumerate(train_loader):
        # distribute data to device
        X, y = X.to(device), y.to(device).view(-1, )
        N_count += X.size(0)

        optimizer.zero_grad()
        X_reconst, z, mu, logvar = model(X)  # VAE
        loss = loss_function(X_reconst, X, mu, logvar)
        losses.append(loss.item())

        loss.backward()
        optimizer.step()

        all_y.extend(y.data.cpu().numpy())
        all_z.extend(z.data.cpu().numpy())
        all_mu.extend(mu.data.cpu().numpy())
        all_logvar.extend(logvar.data.cpu().numpy())

        # show information
        if (batch_idx + 1) % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch + 1, N_count, len(train_loader.dataset), 100. * (batch_idx + 1) / len(train_loader), loss.item()))

    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # save Pytorch models of best record
    torch.save(model.state_dict(), os.path.join(save_model_path, 'model_epoch{}.pth'.format(epoch + 1)))  # save motion_encoder
    torch.save(optimizer.state_dict(), os.path.join(save_model_path, 'optimizer_epoch{}.pth'.format(epoch + 1)))      # save optimizer
    print("Epoch {} model saved!".format(epoch + 1))

    return X.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, losses


def validation(model, device, optimizer, test_loader):
    # set model as testing mode
    model.eval()

    test_loss = 0
    all_y, all_z, all_mu, all_logvar = [], [], [], []
    with torch.no_grad():
        for X, y in test_loader:
            # distribute data to device
            X, y = X.to(device), y.to(device).view(-1, )
            X_reconst, z, mu, logvar = model(X)

            loss = loss_function(X_reconst, X, mu, logvar)
            test_loss += loss.item()  # sum up batch loss

            all_y.extend(y.data.cpu().numpy())
            all_z.extend(z.data.cpu().numpy())
            all_mu.extend(mu.data.cpu().numpy())
            all_logvar.extend(logvar.data.cpu().numpy())

    test_loss /= len(test_loader.dataset)
    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # show information
    print('\nTest set ({:d} samples): Average loss: {:.4f}\n'.format(len(test_loader.dataset), test_loss))
    return X.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, test_loss


# Detect devices
use_cuda = torch.cuda.is_available()                   # check if GPU exists
device = torch.device("cuda" if use_cuda else "cpu")   # use CPU or GPU

# Data loading parameters
params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': 4, 'pin_memory': True} if use_cuda else {}
transform = transforms.Compose([transforms.Resize([res_size, res_size]),
                                transforms.ToTensor(),
                                transforms.Lambda(lambda x: x.repeat(3, 1, 1)),  # gray -> GRB 3 channel (lambda function)
                                transforms.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0])])  # for grayscale images

# MNIST dataset (images and labels)
MNIST_train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
MNIST_test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform)

# Data loader (input pipeline)
train_loader = torch.utils.data.DataLoader(dataset=MNIST_train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = torch.utils.data.DataLoader(dataset=MNIST_test_dataset, batch_size=batch_size, shuffle=False)

# Create model
resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)

print("Using", torch.cuda.device_count(), "GPU!")
model_params = list(resnet_vae.parameters())
optimizer = torch.optim.Adam(model_params, lr=learning_rate)


# record training process
epoch_train_losses = []
epoch_test_losses = []
check_mkdir(save_model_path)

# start training
for epoch in range(epochs):

    # train, test model
    X_train, y_train, z_train, mu_train, logvar_train, train_losses = train(log_interval, resnet_vae, device, train_loader, optimizer, epoch)
    X_test, y_test, z_test, mu_test, logvar_test, epoch_test_loss = validation(resnet_vae, device, optimizer, valid_loader)

    # save results
    epoch_train_losses.append(train_losses)
    epoch_test_losses.append(epoch_test_loss)

    
    # save all train test results
    A = np.array(epoch_train_losses)
    C = np.array(epoch_test_losses)
    
    np.save(os.path.join(save_model_path, 'ResNet_VAE_training_loss.npy'), A)
    np.save(os.path.join(save_model_path, 'X_MNIST_train_epoch{}.npy'.format(epoch + 1)), X_train) #save last batch
    np.save(os.path.join(save_model_path, 'y_MNIST_train_epoch{}.npy'.format(epoch + 1)), y_train)
    np.save(os.path.join(save_model_path, 'z_MNIST_train_epoch{}.npy'.format(epoch + 1)), z_train)

================================================
FILE: ResNetVAE_cifar10.py
================================================
import os
import glob
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
import torchvision.transforms as transforms
import torch.utils.data as data
import torchvision
from torch.autograd import Variable
import matplotlib.pyplot as plt
from modules import *
from sklearn.model_selection import train_test_split
import pickle

# os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"   
# os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# EncoderCNN architecture
CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024
CNN_embed_dim = 256     # latent dim extracted by 2D CNN
res_size = 224        # ResNet image size
dropout_p = 0.2       # dropout probability


# training parameters
epochs = 20        # training epochs
batch_size = 50
learning_rate = 1e-3
log_interval = 10   # interval for displaying training info

# save model
save_model_path = './results_cifar10'

def check_mkdir(dir_name):
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)


def loss_function(recon_x, x, mu, logvar):
    # MSE = F.mse_loss(recon_x, x, reduction='sum')
    MSE = F.binary_cross_entropy(recon_x, x, reduction='sum')
    KLD = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return MSE + KLD


def train(log_interval, model, device, train_loader, optimizer, epoch):
    # set model as training mode
    model.train()

    losses = []
    all_y, all_z, all_mu, all_logvar = [], [], [], []
    N_count = 0   # counting total trained sample in one epoch
    for batch_idx, (X, y) in enumerate(train_loader):
        # distribute data to device
        X, y = X.to(device), y.to(device).view(-1, )
        N_count += X.size(0)

        optimizer.zero_grad()
        X_reconst, z, mu, logvar = model(X)  # VAE
        loss = loss_function(X_reconst, X, mu, logvar)
        losses.append(loss.item())

        loss.backward()
        optimizer.step()

        all_y.extend(y.data.cpu().numpy())
        all_z.extend(z.data.cpu().numpy())
        all_mu.extend(mu.data.cpu().numpy())
        all_logvar.extend(logvar.data.cpu().numpy())

        # show information
        if (batch_idx + 1) % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch + 1, N_count, len(train_loader.dataset), 100. * (batch_idx + 1) / len(train_loader), loss.item()))

    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # save Pytorch models of best record
    torch.save(model.state_dict(), os.path.join(save_model_path, 'model_epoch{}.pth'.format(epoch + 1)))  # save motion_encoder
    torch.save(optimizer.state_dict(), os.path.join(save_model_path, 'optimizer_epoch{}.pth'.format(epoch + 1)))      # save optimizer
    print("Epoch {} model saved!".format(epoch + 1))

    return X_reconst.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, losses


def validation(model, device, optimizer, test_loader):
    # set model as testing mode
    model.eval()

    test_loss = 0
    all_y, all_z, all_mu, all_logvar = [], [], [], []
    with torch.no_grad():
        for X, y in test_loader:
            # distribute data to device
            X, y = X.to(device), y.to(device).view(-1, )
            X_reconst, z, mu, logvar = model(X)

            loss = loss_function(X_reconst, X, mu, logvar)
            test_loss += loss.item()  # sum up batch loss

            all_y.extend(y.data.cpu().numpy())
            all_z.extend(z.data.cpu().numpy())
            all_mu.extend(mu.data.cpu().numpy())
            all_logvar.extend(logvar.data.cpu().numpy())

    test_loss /= len(test_loader.dataset)
    all_y = np.stack(all_y, axis=0)
    all_z = np.stack(all_z, axis=0)
    all_mu = np.stack(all_mu, axis=0)
    all_logvar = np.stack(all_logvar, axis=0)

    # show information
    print('\nTest set ({:d} samples): Average loss: {:.4f}\n'.format(len(test_loader.dataset), test_loss))
    return X_reconst.data.cpu().numpy(), all_y, all_z, all_mu, all_logvar, test_loss


# Detect devices
use_cuda = torch.cuda.is_available()                   # check if GPU exists
device = torch.device("cuda" if use_cuda else "cpu")   # use CPU or GPU

# Data loading parameters
params = {'batch_size': batch_size, 'shuffle': True, 'num_workers': 2, 'pin_memory': True} if use_cuda else {}
# transform = transforms.Compose([transforms.Resize([res_size, res_size]),
#                                 transforms.ToTensor(),
#                                 transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])])

transform = transforms.Compose([transforms.Resize([res_size, res_size]),
                                transforms.ToTensor(),
                                transforms.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0])])

# cifar10 dataset (images and labels)
cifar10_train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
cifar10_test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)


# classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# Data loader (input pipeline)
train_loader = torch.utils.data.DataLoader(dataset=cifar10_train_dataset, batch_size=batch_size, shuffle=True)
valid_loader = torch.utils.data.DataLoader(dataset=cifar10_test_dataset, batch_size=batch_size, shuffle=False)

# Create model
resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)

print("Using", torch.cuda.device_count(), "GPU!")
model_params = list(resnet_vae.parameters())
optimizer = torch.optim.Adam(model_params, lr=learning_rate)


# record training process
epoch_train_losses = []
epoch_test_losses = []
check_mkdir(save_model_path)

# start training
for epoch in range(epochs):
    # train, test model
    X_reconst_train, y_train, z_train, mu_train, logvar_train, train_losses = train(log_interval, resnet_vae, device, train_loader, optimizer, epoch)
    X_reconst_test, y_test, z_test, mu_test, logvar_test, epoch_test_loss = validation(resnet_vae, device, optimizer, valid_loader)

    # save results
    epoch_train_losses.append(train_losses)
    epoch_test_losses.append(epoch_test_loss)

    # save all train test results
    A = np.array(epoch_train_losses)
    C = np.array(epoch_test_losses)
    
    np.save(os.path.join(save_model_path, 'ResNet_VAE_training_loss.npy'), A)
    np.save(os.path.join(save_model_path, 'y_cifar10_train_epoch{}.npy'.format(epoch + 1)), y_train)
    np.save(os.path.join(save_model_path, 'z_cifar10_train_epoch{}.npy'.format(epoch + 1)), z_train)


================================================
FILE: ResNetVAE_reconstruction.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib\n",
    "import matplotlib.pyplot as plt\n",
    "import os\n",
    "import glob\n",
    "import numpy as np\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "import torchvision.models as models\n",
    "import torchvision.transforms as transforms\n",
    "import torch.utils.data as data\n",
    "import torchvision\n",
    "from torch.autograd import Variable\n",
    "import matplotlib.pyplot as plt\n",
    "from modules import *\n",
    "from sklearn.model_selection import train_test_split\n",
    "import pickle\n",
    "from sklearn.datasets import fetch_olivetti_faces\n",
    "from torch.utils.data import Dataset, DataLoader, TensorDataset\n",
    "from skimage.transform import resize\n",
    "\n",
    "\n",
    "def decoder(model, device, z):\n",
    "    model.eval()\n",
    "    z = Variable(torch.FloatTensor(z)).to(device)\n",
    "    new_images = model.decode(z).squeeze_().data.cpu().numpy().transpose((1, 2, 0))\n",
    "    return new_images\n",
    "\n",
    "saved_model_path = './results_Olivetti_face'\n",
    "# saved_model_path = './results_MNIST'\n",
    "\n",
    "exp = 'Olivetti'\n",
    "# exp = 'MNIST'\n",
    "\n",
    "# use same ResNet Encoder saved earlier!\n",
    "CNN_fc_hidden1, CNN_fc_hidden2 = 1024, 1024\n",
    "CNN_embed_dim = 256\n",
    "res_size = 224        # ResNet image size\n",
    "dropout_p = 0.2       # dropout probability\n",
    "\n",
    "epoch = 20\n",
    "\n",
    "\n",
    "use_cuda = torch.cuda.is_available()                   # check if GPU exists\n",
    "device = torch.device(\"cuda\" if use_cuda else \"cpu\")   # use CPU or GPU\n",
    "\n",
    "# reload ResNetVAE model\n",
    "resnet_vae = ResNet_VAE(fc_hidden1=CNN_fc_hidden1, fc_hidden2=CNN_fc_hidden2, drop_p=dropout_p, CNN_embed_dim=CNN_embed_dim).to(device)\n",
    "resnet_vae.load_state_dict(torch.load(os.path.join(saved_model_path, 'model_epoch{}.pth'.format(epoch))))\n",
    "\n",
    "print('ResNetVAE epoch {} model reloaded!'.format(epoch))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Reconstruction "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z_train = np.load(os.path.join(saved_model_path, 'z_{}_train_epoch{}.npy').format(exp, epoch))\n",
    "X_train = np.load(os.path.join(saved_model_path, 'X_{}_train_epoch{}.npy').format(exp, epoch))\n",
    "\n",
    "ind = 1\n",
    "zz = torch.from_numpy(z_train[ind]).view(1, -1)\n",
    "X = np.transpose(X_train[ind], (1, 2, 0))\n",
    "\n",
    "new_imgs = decoder(resnet_vae, device, zz)\n",
    "\n",
    "fig = plt.figure(figsize=(10, 10))\n",
    "\n",
    "plt.subplot(1, 2, 1)\n",
    "plt.imshow(X)\n",
    "plt.title('original')\n",
    "plt.axis('off')\n",
    "\n",
    "plt.subplot(1, 2, 2)\n",
    "plt.imshow(new_imgs)\n",
    "plt.title('reconstructed')\n",
    "plt.axis('off')\n",
    "plt.savefig(\"./reconstruction_{}.png\".format(exp), bbox_inches='tight', dpi=600)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate new images from latent points"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# choose two original images\n",
    "sample1, sample2 = 0, 1\n",
    "w = 0.4 # weight for fusing two images\n",
    "\n",
    "X1 = np.transpose(X_train[-sample1], (1, 2, 0))\n",
    "X2 = np.transpose(X_train[-sample2], (1, 2, 0))\n",
    "\n",
    "# generate image using decoder\n",
    "z_train = np.load(os.path.join(saved_model_path, 'z_{}_train_epoch{}.npy').format(exp, epoch))\n",
    "z = z_train[-sample1] * w + z_train[-sample2] * (1 - w)\n",
    "new_imgs = decoder(resnet_vae, device, torch.from_numpy(z).view(1, -1))\n",
    "\n",
    "fig = plt.figure(figsize=(15, 15))\n",
    "\n",
    "plt.subplot(1, 3, 1)\n",
    "plt.imshow(X1)\n",
    "plt.title('original 1')\n",
    "plt.axis('off')\n",
    "\n",
    "plt.subplot(1, 3, 2)\n",
    "plt.imshow(X2)\n",
    "plt.title('original 2')\n",
    "plt.axis('off')\n",
    "\n",
    "\n",
    "plt.subplot(1, 3, 3)\n",
    "plt.imshow(new_imgs)\n",
    "plt.title('new image')\n",
    "plt.axis('off')\n",
    "plt.savefig(\"./generated_{}.png\".format(exp), bbox_inches='tight', dpi=600)\n",
    "plt.show()\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}


================================================
FILE: modules.py
================================================
import os
import numpy as np
from PIL import Image
from torch.utils import data
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models as models
from torch.autograd import Variable
import torchvision.transforms as transforms


class Dataset(data.Dataset):
    "Characterizes a dataset for PyTorch"
    def __init__(self, filenames, labels, transform=None):
        "Initialization"
        self.filenames = filenames
        self.labels = labels
        self.transform = transform

    def __len__(self):
        "Denotes the total number of samples"
        return len(self.filenames)


    def __getitem__(self, index):
        "Generates one sample of data"
        # Select sample
        filename = self.filenames[index]
        X = Image.open(filename)

        if self.transform:
            X = self.transform(X)     # transform

        y = torch.LongTensor([self.labels[index]])
        return X, y

## ---------------------- end of Dataloaders ---------------------- ##

def conv2D_output_size(img_size, padding, kernel_size, stride):
    # compute output shape of conv2D
    outshape = (np.floor((img_size[0] + 2 * padding[0] - (kernel_size[0] - 1) - 1) / stride[0] + 1).astype(int),
                np.floor((img_size[1] + 2 * padding[1] - (kernel_size[1] - 1) - 1) / stride[1] + 1).astype(int))
    return outshape

def convtrans2D_output_size(img_size, padding, kernel_size, stride):
    # compute output shape of conv2D
    outshape = ((img_size[0] - 1) * stride[0] - 2 * padding[0] + kernel_size[0],
                (img_size[1] - 1) * stride[1] - 2 * padding[1] + kernel_size[1])
    return outshape

## ---------------------- ResNet VAE ---------------------- ##

class ResNet_VAE(nn.Module):
    def __init__(self, fc_hidden1=1024, fc_hidden2=768, drop_p=0.3, CNN_embed_dim=256):
        super(ResNet_VAE, self).__init__()

        self.fc_hidden1, self.fc_hidden2, self.CNN_embed_dim = fc_hidden1, fc_hidden2, CNN_embed_dim

        # CNN architechtures
        self.ch1, self.ch2, self.ch3, self.ch4 = 16, 32, 64, 128
        self.k1, self.k2, self.k3, self.k4 = (5, 5), (3, 3), (3, 3), (3, 3)      # 2d kernal size
        self.s1, self.s2, self.s3, self.s4 = (2, 2), (2, 2), (2, 2), (2, 2)      # 2d strides
        self.pd1, self.pd2, self.pd3, self.pd4 = (0, 0), (0, 0), (0, 0), (0, 0)  # 2d padding

        # encoding components
        resnet = models.resnet152(pretrained=True)
        modules = list(resnet.children())[:-1]      # delete the last fc layer.
        self.resnet = nn.Sequential(*modules)
        self.fc1 = nn.Linear(resnet.fc.in_features, self.fc_hidden1)
        self.bn1 = nn.BatchNorm1d(self.fc_hidden1, momentum=0.01)
        self.fc2 = nn.Linear(self.fc_hidden1, self.fc_hidden2)
        self.bn2 = nn.BatchNorm1d(self.fc_hidden2, momentum=0.01)
        # Latent vectors mu and sigma
        self.fc3_mu = nn.Linear(self.fc_hidden2, self.CNN_embed_dim)      # output = CNN embedding latent variables
        self.fc3_logvar = nn.Linear(self.fc_hidden2, self.CNN_embed_dim)  # output = CNN embedding latent variables

        # Sampling vector
        self.fc4 = nn.Linear(self.CNN_embed_dim, self.fc_hidden2)
        self.fc_bn4 = nn.BatchNorm1d(self.fc_hidden2)
        self.fc5 = nn.Linear(self.fc_hidden2, 64 * 4 * 4)
        self.fc_bn5 = nn.BatchNorm1d(64 * 4 * 4)
        self.relu = nn.ReLU(inplace=True)

        # Decoder
        self.convTrans6 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=self.k4, stride=self.s4,
                               padding=self.pd4),
            nn.BatchNorm2d(32, momentum=0.01),
            nn.ReLU(inplace=True),
        )
        self.convTrans7 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=32, out_channels=8, kernel_size=self.k3, stride=self.s3,
                               padding=self.pd3),
            nn.BatchNorm2d(8, momentum=0.01),
            nn.ReLU(inplace=True),
        )

        self.convTrans8 = nn.Sequential(
            nn.ConvTranspose2d(in_channels=8, out_channels=3, kernel_size=self.k2, stride=self.s2,
                               padding=self.pd2),
            nn.BatchNorm2d(3, momentum=0.01),
            nn.Sigmoid()    # y = (y1, y2, y3) \in [0 ,1]^3
        )


    def encode(self, x):
        x = self.resnet(x)  # ResNet
        x = x.view(x.size(0), -1)  # flatten output of conv

        # FC layers
        x = self.bn1(self.fc1(x))
        x = self.relu(x)
        x = self.bn2(self.fc2(x))
        x = self.relu(x)
        # x = F.dropout(x, p=self.drop_p, training=self.training)
        mu, logvar = self.fc3_mu(x), self.fc3_logvar(x)
        return mu, logvar

    def reparameterize(self, mu, logvar):
        if self.training:
            std = logvar.mul(0.5).exp_()
            eps = Variable(std.data.new(std.size()).normal_())
            return eps.mul(std).add_(mu)
        else:
            return mu

    def decode(self, z):
        x = self.relu(self.fc_bn4(self.fc4(z)))
        x = self.relu(self.fc_bn5(self.fc5(x))).view(-1, 64, 4, 4)
        x = self.convTrans6(x)
        x = self.convTrans7(x)
        x = self.convTrans8(x)
        x = F.interpolate(x, size=(224, 224), mode='bilinear')
        return x

    def forward(self, x):
        mu, logvar = self.encode(x)
        z = self.reparameterize(mu, logvar)
        x_reconst = self.decode(z)

        return x_reconst, z, mu, logvar


================================================
FILE: plot_latent.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "import matplotlib.pyplot as plt\n",
    "from mpl_toolkits.mplot3d import Axes3D\n",
    "import matplotlib.cm as cm\n",
    "from sklearn.manifold import TSNE\n",
    "import numpy as np\n",
    "import pickle\n",
    "\n",
    "      \n",
    "epoch = 20\n",
    "exp = 'cifar10'\n",
    "# exp = 'MNIST'\n",
    "\n",
    "N = 6000 # image number\n",
    "\n",
    "y_train = np.load('./results_{}/y_{}_train_epoch{}.npy'.format(exp, exp, epoch))\n",
    "z_train = np.load('./results_{}/z_{}_train_epoch{}.npy'.format(exp, exp, epoch))\n",
    "classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']  # cifar10\n",
    "# classes = np.arange(10) #MNIST\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Direct projection of latent space"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "y_train = y_train[:N]\n",
    "z_train = z_train[:N]\n",
    "\n",
    "fig = plt.figure(figsize=(12, 10))\n",
    "plots = []\n",
    "markers = ['o', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']\n",
    "for i, c in enumerate(classes):\n",
    "    ind = (y_train == i).tolist() or ([j < N // len(classes) for j in range(len(y_train))])\n",
    "    color = cm.jet([i / len(classes)] * sum(ind))\n",
    "    plots.append(plt.scatter(z_train[ind, 1], z_train[ind, 2], marker=markers[i], c=color, s=8, label=i))\n",
    "\n",
    "plt.axis('off')\n",
    "plt.legend(plots, classes, fontsize=14, loc='upper right')\n",
    "plt.title('{} (direct projection: {}-dim -> 2-dim)'.format(exp, z_train.shape[1]), fontsize=14)\n",
    "plt.savefig(\"./ResNetVAE_{}_direct_plot.png\".format(exp), bbox_inches='tight', dpi=600)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Use t-SNE for dimension reduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### compressed to 2-dimension"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z_embed = TSNE(n_components=2, n_iter=12000).fit_transform(z_train[:N])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(12, 10))\n",
    "plots = []\n",
    "markers = ['o', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']  # select different markers\n",
    "for i, c in enumerate(classes):\n",
    "    ind = (y_train[:N] == i).tolist()\n",
    "    color = cm.jet([i / len(classes)] * sum(ind))\n",
    "    # plot each category one at a time \n",
    "    plots.append(plt.scatter(z_embed[ind, 0], z_embed[ind, 1], c=color, marker=markers[i], s=8, label=i))\n",
    "\n",
    "plt.axis('off')\n",
    "plt.xlim(-150, 150)\n",
    "plt.ylim(-150, 150)\n",
    "plt.legend(plots, classes, fontsize=14, loc='upper right')\n",
    "plt.title('{} (t-SNE: {}-dim -> 2-dim)'.format(exp, z_train.shape[1]), fontsize=14)\n",
    "plt.savefig(\"./ResNetVAE_{}_embedded_plot.png\".format(exp), bbox_inches='tight', dpi=600)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### compressed to 3-dimension"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "z_embed3D = TSNE(n_components=3, n_iter=12000).fit_transform(z_train[:N])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "fig = plt.figure(figsize=(12, 10))\n",
    "ax = fig.add_subplot(111, projection='3d')\n",
    "\n",
    "plots = []\n",
    "markers = ['o', ',', 'x', '+', 'v', '^', '<', '>', 's', 'd']  # select different markers\n",
    "for i, c in enumerate(classes):\n",
    "    ind = (y_train[:N] == i).tolist()\n",
    "    color = cm.jet([i / len(classes)] * sum(ind))\n",
    "    # plot each category one at a time \n",
    "    ax.scatter(z_embed3D[ind, 0], z_embed3D[ind, 1], c=color, marker=markers[i], s=8, label=i)\n",
    "\n",
    "ax.axis('on')\n",
    "\n",
    "r_max = 20\n",
    "r_min = -r_max\n",
    "\n",
    "ax.set_xlim(r_min, r_max)\n",
    "ax.set_ylim(r_min, r_max)\n",
    "ax.set_zlim(r_min, r_max)\n",
    "ax.set_xlabel('z-dim 1')\n",
    "ax.set_ylabel('z-dim 2')\n",
    "ax.set_zlabel('z-dim 3')\n",
    "ax.set_title('{} (t-SNE: {}-dim -> 3-dim)'.format(exp, z_train.shape[1]), fontsize=14)\n",
    "ax.legend(plots, classes, fontsize=14, loc='upper right')\n",
    "plt.savefig(\"./ResNetVAE_{}_embedded_3Dplot.png\".format(exp), bbox_inches='tight', dpi=600)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}