Repository: elieJalbout/Clustering-with-Deep-learning Branch: master Commit: 95eae15b8799 Files: 10 Total size: 71.3 KB Directory structure: gitextract_5766_38r/ ├── .gitignore ├── .project ├── .pydevproject ├── README.md ├── archs/ │ ├── coil.json │ └── mnist.json ├── customlayers.py ├── main.py ├── misc.py └── network.py ================================================ FILE CONTENTS ================================================ ================================================ FILE: .gitignore ================================================ *.pyc outputs/*.png logs/* .idea/ ================================================ FILE: .project ================================================ DeepConvJointClustering org.python.pydev.PyDevBuilder org.python.pydev.pythonNature ================================================ FILE: .pydevproject ================================================ /${PROJECT_DIR_NAME} python 2.7 Default ================================================ FILE: README.md ================================================ Deep Learning for Clustering ======================= Code for project "Deep Learning for Clustering" under lab course "Deep Learning for Computer Vision and Biomedicine" - TUM. Depends on **numpy**, **theano**, **lasagne**, **scikit-learn**, **matplotlib**. 
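The dependencies listed above can be collected into a requirements file. This is a hypothetical, unpinned `requirements.txt` (the repository does not ship one; Theano/Lasagne versions compatible with Python 2.7 would be needed):

```
numpy
theano
lasagne
scikit-learn
matplotlib
```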
#### Contributors - [Mohd Yawar Nihal Siddiqui](mailto:yawarnihal@gmail.com) - [Elie Aljalbout](mailto:elie.aljalbout@tum.de) - [Vladimir Golkov](mailto:vladimir.golkov@tum.de) (Supervisor) #### Related Papers: This repository is an implementation of the paper: Elie Aljalbout, Vladimir Golkov, Yawar Siddiqui, Daniel Cremers "Clustering with Deep Learning: Taxonomy and new methods" - arxiv: https://arxiv.org/abs/1801.07648 Usage -------- Use the main script for training, visualizing clusters and/or reporting clustering metrics ``` python main.py ``` Option | | -------- | --- ```-d DATASET_NAME, --dataset DATASET_NAME ```| ``(Required) Dataset on which the autoencoder is to be trained, or metrics/visualizations are to be performed [MNIST,COIL20]`` ```-a ARCH_IDX, --architecture ARCH_IDX```| ``(Required) Index of architecture of autoencoder in the json file (archs/)`` ``--pretrain EPOCHS`` | ``Pretrain the autoencoder for the specified #epochs on the specified dataset`` ``--cluster EPOCHS``| ``Refine the autoencoder for the specified #epochs with clustering loss, assumes that pretraining results are available`` ``--metrics``| ``Report k-means clustering metrics on the clustered latent space, assumes pretrain and cluster based training have been performed`` ``--visualize``|``Visualize the image space and latent space, assumes pre-training and cluster based training have been performed`` Project Structure ------------------------ Folder / File | Description| -------- | --- archs| Contains json files specifying architectures for autoencoder networks used. File ``mnist.json`` contains architectures for MNIST dataset. We use the second architecture for the reported results (command line argument ``-a 1``) coil, mnist | Contains the datasets COIL20 and MNIST respectively logs| Output folder for logs generated by the scripts.
Named by date and time of script execution plots|Scatter plots showing the raw, pre-trained latent space, and the final latent space clusters saved_params | Contains saved network parameters and saved representation of inputs in latent space customlayers.py | Custom lasagne layers, Unpool2D - which performs inverse max pooling by replicating input pixels as dictated by the filter size, and the ClusteringLayer - a layer that outputs soft cluster assignments based on k-means cluster distance main.py | The main Python script for training and evaluating the network misc.py | Contains dataset handlers and other utility methods network.py| Contains classes for parsing and building the network from json files and also for training the network Autoencoder Builder ----------------------------- We've implemented a **NetworkBuilder** class that can be used to quickly describe the architecture of an autoencoder through a **json** file. The json specification of the architecture is a dictionary with the following fields | Field | Description ---------|------------ name| Name identifier given to the architecture, used for file naming while saving parameters batch_size| Batch size to be used while training the network use_batch_norm| Whether to use batch normalization for convolutional/deconvolutional layers network_type| Type of network - convolutional or fully connected layers| A list describing the encoder part of the autoencoder Further, each item in the layers list is a dictionary with the following fields | Field | Description ---------|------------ type| Can be Input, Conv2D, MaxPool2D, MaxPool2D*, Dense, Reshape, Deconv2D num_filters| For Conv2D/MaxPool2D/MaxPool2D*/Deconv2D layers this field specifies the number of filters filter_size| Dimensions of kernel for the above layers num_units| For Dense layers, the number of hidden units non_linearity| Non-linearity function used at output of the layer conv_mode| Can be used to specify the convolution mode like same, valid etc.
for convolutional layers output_non_linearity| If you want a different non-linearity function at the output than the one which would be obtained by mirroring Only the encoder part of the autoencoder needs to be specified; the decoder will be automatically generated by the class. Example of a network description ```json { "name": "c-5-6_p_c-5-16_p_c-4-120", "use_batch_norm": 1, "batch_size": 100, "layers": [ { "type": "Input", "output_shape":[1, 28, 28] }, { "type": "Conv2D", "num_filters": 50, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type": "MaxPool2D*", "filter_size": [2, 2] }, { "type": "Conv2D", "num_filters": 50, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type": "MaxPool2D*", "filter_size": [2, 2] }, { "type": "Conv2D", "num_filters": 120, "filter_size": [4, 4], "non_linearity": "linear" } ] } ``` This would generate the network ``50[5x5] 50[5x5]_bn max[2x2] 50[5x5] 50[5x5]_bn max[2x2] 120[4x4] 120[4x4]_bn 50[4x4] 50[4x4]_bn ups*[2x2] 50[5x5] 50[5x5]_bn ups*[2x2] 1[5x5]`` Experiments and Results ----------------------------------- We trained and tested the network on two datasets - MNIST and COIL20 |Dataset| Image size | Number of samples | Number of clusters -------- | ---|---|--- MNIST| 28x28x1|60000|10 COIL20| 128x128x1|1440|20 Clustering was performed with two different loss functions - - Loss = ``KL-Divergence(soft assignment distribution, target distribution) + Autoencoder Reconstruction loss ``, where the target distribution is a distribution that improves cluster purity and puts more emphasis on data points assigned with a high confidence. For more details check out the DEC paper [[1]](https://arxiv.org/abs/1511.06335).
- Loss = ``k-Means loss + Autoencoder Reconstruction loss`` #### **MNIST** ##### Our network | Clustering space| Clustering Accuracy| Normalized Mutual Information -------- | ---|---- Image pixels | 0.542|0.480 Autoencoder| 0.760|0.667 Autoencoder + k-Means Loss| 0.781| 0.796 Autoencoder + KLDiv Loss| **0.859**| **0.825** ##### Other networks |Method| Clustering Accuracy| Normalized Mutual Information -------- | ---|---- DEC|0.843|0.800 DCN|0.830|0.810 CNN-RC| - |0.915 CNN-FD|-|0.876 DBC| 0.964|0.917 > Note: The commit b34743114f68624b5371cd0d4c059b141422902f gives up to 0.96 accuracy and 0.92 NMI on the MNIST dataset. We will merge it into the main branch once we can get better results with the COIL architecture. ##### **Latent space visualizations** ###### Pixel space ![](/plots/MNIST/raw.png) ###### Autoencoder ![](/plots/MNIST/autoencoder.png) ###### Autoencoder Latent Space Evolution (video) [![Autoencoder](http://img.youtube.com/vi/_WuUB3gD984/0.jpg)](https://www.youtube.com/watch?v=_WuUB3gD984) ###### Autoencoder + KLDivergence ![](/plots/MNIST/clustered_kld.png) ###### Autoencoder + KLDivergence Latent Space Evolution (video) [![Autoencoder](http://img.youtube.com/vi/XYS7DFkVx_A/0.jpg)](https://www.youtube.com/watch?v=XYS7DFkVx_A) ###### Autoencoder + k-Means ![](/plots/MNIST/clustered_km.png) #### **COIL20** ##### Our network | Clustering space| Clustering Accuracy| Normalized Mutual Information -------- | ---|---- Image pixels | 0.689|0.793 Autoencoder| 0.739|0.828 Autoencoder + k-Means Loss| 0.745| 0.846 Autoencoder + KLDiv Loss| 0.762| 0.848 ##### Other networks |Method| Clustering Accuracy| Normalized Mutual Information -------- | ---|---- DEN|0.725|0.870 CNN-RC| - |1.000 DBC| 0.793|0.895 ##### **Latent space visualizations** ###### Pixel space ![](/plots/COIL20/raw.png) ###### Autoencoder ![](/plots/COIL20/autoencoder.png) ###### Autoencoder + k-Means ![](/plots/COIL20/clustered_km.png) ###### Autoencoder + KLDivergence
![](/plots/COIL20/clustered_kld.png) ================================================ FILE: archs/coil.json ================================================ [ { "name": "c-9-20_p-2_c-5-20_p-2_c-5-40_p-2_c-4-320", "use_batch_norm":1, "batch_size": 10, "layers": [ { "type":"Input", "output_shape": [1, 128, 128] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [9, 9], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 40, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 320, "filter_size": [4, 4], "non_linearity": "linear" } ] }, { "name": "c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-400", "use_batch_norm":1, "batch_size": 10, "layers": [ { "type":"Input", "output_shape": [1, 128, 128] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [9, 9], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 320, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type": "Dense", "num_units": 5120, "non_linearity": "rectify" }, { "type": "Dense", "num_units": 400, "non_linearity": "rectify" } ] }, { "name": "c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-400-fc-200", "use_batch_norm":1, "batch_size": 10, "layers": [ { "type":"Input", "output_shape": [1, 128, 128] }, { 
"type":"Conv2D", "num_filters": 20, "filter_size": [9, 9], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 320, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type": "Dense", "num_units": 5120, "non_linearity": "rectify" }, { "type": "Dense", "num_units": 400, "non_linearity": "rectify" }, { "type": "Dense", "num_units": 200, "non_linearity": "rectify" } ] }, { "name": "c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-200", "use_batch_norm":1, "batch_size": 10, "layers": [ { "type":"Input", "output_shape": [1, 128, 128] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [9, 9], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 320, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type": "Dense", "num_units": 5120, "non_linearity": "rectify" }, { "type": "Dense", "num_units": 200, "non_linearity": "rectify" } ] }, { "name": "c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-80", "use_batch_norm":1, "batch_size": 10, "layers": [ { "type":"Input", "output_shape": [1, 128, 128] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [9, 9], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, 
{ "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 320, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type": "Dense", "num_units": 5120, "non_linearity": "rectify" }, { "type": "Dense", "num_units": 80, "non_linearity": "rectify" } ] }, { "name": "c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-32", "use_batch_norm":1, "batch_size": 10, "layers": [ { "type":"Input", "output_shape": [1, 128, 128] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [9, 9], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 320, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type": "Dense", "num_units": 5120, "non_linearity": "rectify" }, { "type": "Dense", "num_units": 32, "non_linearity": "rectify" } ] }, { "name": "c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-1", "use_batch_norm":0, "batch_size": 10, "layers": [ { "type":"Input", "output_shape": [1, 128, 128] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [9, 9], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 320, "filter_size": [5, 5], "non_linearity": "rectify" }, { 
"type":"MaxPool2D*", "filter_size": [2, 2] }, { "type": "Dense", "num_units": 5120, "non_linearity": "rectify" }, { "type": "Dense", "num_units": 1, "non_linearity": "rectify" } ] }, { "name": "c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-20", "use_batch_norm":1, "batch_size": 10, "layers": [ { "type":"Input", "output_shape": [1, 128, 128] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [9, 9], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 20, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D", "filter_size": [2, 2] }, { "type":"Conv2D", "num_filters": 320, "filter_size": [5, 5], "non_linearity": "rectify" }, { "type":"MaxPool2D*", "filter_size": [2, 2] }, { "type": "Dense", "num_units": 5120, "non_linearity": "rectify" }, { "type": "Dense", "num_units": 20, "non_linearity": "rectify" } ] } ] ================================================ FILE: archs/mnist.json ================================================ [ { "name": "c-3-32_p_c-3-64_p_fc-32", "batch_size": 50, "layers": [ { "type": "Input", "output_shape": [ 1, 28, 28 ] }, { "type": "Conv2D", "num_filters": 32, "filter_size": [ 3, 3 ], "non_linearity": "rectify", "conv_mode": "same" }, { "type": "MaxPool2D", "filter_size": [ 2, 2 ] }, { "type": "Conv2D", "num_filters": 64, "filter_size": [ 3, 3 ], "non_linearity": "rectify", "conv_mode": "same" }, { "type": "MaxPool2D", "filter_size": [ 2, 2 ] }, { "type": "Dense", "num_units": 3136, "non_linearity": "rectify" }, { "type": "Dense", "num_units": 32, "non_linearity": "rectify" } ] }, { "name": "c-5-6_p_c-5-16_p_c-4-120", "use_batch_norm": 1, "batch_size": 100, "layers": [ { "type": "Input", "output_shape": [ 1, 28, 28 ] }, { "type": "Conv2D", "num_filters": 50, "filter_size": [ 5, 5 ], "non_linearity": "rectify" }, { "type": 
"MaxPool2D*", "filter_size": [ 2, 2 ] }, { "type": "Conv2D", "num_filters": 50, "filter_size": [ 5, 5 ], "non_linearity": "rectify" }, { "type": "MaxPool2D*", "filter_size": [ 2, 2 ] }, { "type": "Conv2D", "num_filters": 120, "filter_size": [ 4, 4 ], "non_linearity": "linear" } ] } ] ================================================ FILE: customlayers.py ================================================ ''' Created on Jul 25, 2017 ''' from lasagne import layers import theano import theano.tensor as T class Unpool2DLayer(layers.Layer): """ This layer performs unpooling over the last two dimensions of a 4D tensor. Layer borrowed from: https://swarbrickjones.wordpress.com/2015/04/29/convolutional-autoencoders-in-pythontheanolasagne/ """ def __init__(self, incoming, ds, **kwargs): super(Unpool2DLayer, self).__init__(incoming, **kwargs) self.ds = ds def get_output_shape_for(self, input_shape): output_shape = list(input_shape) output_shape[2] = input_shape[2] * self.ds[0] output_shape[3] = input_shape[3] * self.ds[1] return tuple(output_shape) def get_output_for(self, incoming, **kwargs): ''' Repeats each input element along the spatial axes to produce the upscaled image ''' ds = self.ds return incoming.repeat(ds[0], axis=2).repeat(ds[1], axis=3) class ClusteringLayer(layers.Layer): ''' This layer gives soft assignments for the clusters based on distance from k-means based cluster centers.
The weights of the layer are the cluster centers so that they can be learned while optimizing the loss ''' def __init__(self, incoming, num_clusters, initial_clusters, num_samples, latent_space_dim, **kwargs): super(ClusteringLayer, self).__init__(incoming, **kwargs) self.num_clusters = num_clusters self.W = self.add_param(theano.shared(initial_clusters), initial_clusters.shape, 'W') self.num_samples = num_samples self.latent_space_dim = latent_space_dim def get_output_shape_for(self, input_shape): ''' Output shape is number of inputs x number of clusters, i.e. for each input the soft assignments corresponding to all clusters ''' return (input_shape[0], self.num_clusters) def get_output_for(self, incoming, **kwargs): return getSoftAssignments(incoming, self.W, self.num_clusters, self.latent_space_dim, self.num_samples) def getSoftAssignments(latent_space, cluster_centers, num_clusters, latent_space_dim, num_samples): ''' Returns cluster membership distribution for each sample :param latent_space: latent space representation of inputs :param cluster_centers: the coordinates of cluster centers in latent space :param num_clusters: total number of clusters :param latent_space_dim: dimensionality of latent space :param num_samples: total number of input samples :return: soft assignment based on the equation qij = (1+|zi - uj|^2)^(-1)/sum_j'((1+|zi - uj'|^2)^(-1)) ''' z_expanded = latent_space.reshape((num_samples, 1, latent_space_dim)) z_expanded = T.tile(z_expanded, (1, num_clusters, 1)) u_expanded = T.tile(cluster_centers, (num_samples, 1, 1)) distances_from_cluster_centers = (z_expanded - u_expanded).norm(2, axis=2) qij_numerator = 1 + distances_from_cluster_centers * distances_from_cluster_centers qij_numerator = 1 / qij_numerator normalizer_q = qij_numerator.sum(axis=1).reshape((num_samples, 1)) return qij_numerator / normalizer_q ================================================ FILE: main.py ================================================ ''' Created on Jul 9, 2017 '''
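As a quick sanity check, the soft-assignment equation qij = (1+|zi - uj|^2)^(-1)/sum_j'((1+|zi - uj'|^2)^(-1)) from `getSoftAssignments` in customlayers.py can be sketched in plain NumPy. This is an illustrative re-implementation for checking values, not part of the repository; the name `soft_assignments` is hypothetical:

```python
import numpy as np

def soft_assignments(Z, centers):
    # q_ij = (1 + ||z_i - u_j||^2)^(-1) / sum_j' (1 + ||z_i - u_j'||^2)^(-1)
    d2 = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # squared distances, shape (n_samples, n_clusters)
    q = 1.0 / (1.0 + d2)
    return q / q.sum(axis=1, keepdims=True)  # each row sums to 1

# Example: two samples sitting exactly on two cluster centers
Z = np.array([[0.0, 0.0], [4.0, 0.0]])
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
Q = soft_assignments(Z, centers)
```

Each row of `Q` is a probability distribution over the clusters; this is what the ClusteringLayer outputs and what the KL-divergence clustering loss consumes.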
import numpy import json from misc import DatasetHelper, evaluateKMeans, visualizeData from network import DCJC, rootLogger from copy import deepcopy import argparse def testOnlyClusterInitialization(dataset_name, arch, epochs): ''' Creates an autoencoder defined by the architecture arch and trains it on the specified dataset :param dataset_name: Name of the dataset with which the network will be trained [MNIST, COIL20] :param arch: Architecture of the network as a dictionary. Specification for architecture can be found in readme.md :param epochs: Number of training epochs :return: None - (side effect) saves the latent space and params of trained network in an appropriate location in saved_params folder ''' arch_copy = deepcopy(arch) rootLogger.info("Loading dataset") dataset = DatasetHelper(dataset_name) dataset.loadDataset() rootLogger.info("Done loading dataset") rootLogger.info("Creating network") dcjc = DCJC(arch_copy) rootLogger.info("Done creating network") rootLogger.info("Starting training") dcjc.pretrainWithData(dataset, epochs, False) def testOnlyClusterImprovement(dataset_name, arch, epochs, method): ''' Uses an initialized autoencoder and trains it along with a clustering loss. Assumes that pretrained autoencoder params are available, i.e. testOnlyClusterInitialization has been run already with the given params :param dataset_name: Name of the dataset with which the network will be trained [MNIST, COIL20] :param arch: Architecture of the network as a dictionary.
Specification for architecture can be found in readme.md :param epochs: Number of training epochs :param method: Can be KM or KLD - depending on whether the clustering loss is the KL divergence loss between the current soft-assignment distribution (Q) and a more desired one (Q^2), or just the k-means loss :return: None - (side effect) saves latent space and params of the trained network ''' arch_copy = deepcopy(arch) rootLogger.info("Loading dataset") dataset = DatasetHelper(dataset_name) dataset.loadDataset() rootLogger.info("Done loading dataset") rootLogger.info("Creating network") dcjc = DCJC(arch_copy) rootLogger.info("Starting cluster improvement") if method == 'KM': dcjc.doClusteringWithKMeansLoss(dataset, epochs) elif method == 'KLD': dcjc.doClusteringWithKLdivLoss(dataset, True, epochs) def testKMeans(dataset_name, archs): ''' Performs k-means clustering and reports metrics on the output latent space produced by the networks defined in archs, with given dataset. Assumes that testOnlyClusterInitialization and testOnlyClusterImprovement have been run before this for the specified archs/datasets, as the results saved by them are used for clustering :param dataset_name: Name of dataset [MNIST, COIL20] :param archs: Architectures as a dictionary :return: None - reports the accuracy and nmi clustering metrics ''' rootLogger.info('Initial Cluster Quality Comparison') rootLogger.info(80 * '_') rootLogger.info('%-50s %8s %8s' % ('method', 'ACC', 'NMI')) rootLogger.info(80 * '_') dataset = DatasetHelper(dataset_name) dataset.loadDataset() rootLogger.info(evaluateKMeans(dataset.input_flat, dataset.labels, dataset.getClusterCount(), 'image')[0]) for arch in archs: Z = numpy.load('saved_params/' + dataset.name + '/z_' + arch['name'] + '.npy') rootLogger.info(evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), arch['name'])[0]) Z = numpy.load('saved_params/' + dataset.name + '/pc_z_' + arch['name'] + '.npy') rootLogger.info(evaluateKMeans(Z,
dataset.labels, dataset.getClusterCount(), arch['name'])[0]) Z = numpy.load('saved_params/' + dataset.name + '/pc_km_z_' + arch['name'] + '.npy') rootLogger.info(evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), arch['name'])[0]) rootLogger.info(80 * '_') def visualizeLatentSpace(dataset_name, arch): ''' Plots and saves graphs of the image space, the autoencoder latent space, and the final clustering latent space :param dataset_name: Name of dataset [MNIST, COIL20] :param arch: Architecture as a dictionary :return: None - (side effect) saved graphs in plots/ folder ''' rootLogger.info("Loading dataset") dataset = DatasetHelper(dataset_name) dataset.loadDataset() rootLogger.info("Done loading dataset") # We consider only the first 5000 points for better visualization max_points = min(dataset.input_flat.shape[0], 5000) # Image space visualizeData(dataset.input_flat[0:max_points], dataset.labels[0:max_points], dataset.getClusterCount(), "plots/%s/raw.png" % dataset.name) # Latent space - autoencoder Z = numpy.load('saved_params/' + dataset.name + '/z_' + arch['name'] + '.npy') visualizeData(Z[0:max_points], dataset.labels[0:max_points], dataset.getClusterCount(), "plots/%s/autoencoder.png" % dataset.name) # Latent space - kl div clustering network Z = numpy.load('saved_params/' + dataset.name + '/pc_z_' + arch['name'] + '.npy') visualizeData(Z[0:max_points], dataset.labels[0:max_points], dataset.getClusterCount(), "plots/%s/clustered_kld.png" % dataset.name) # Latent space - kmeans clustering network Z = numpy.load('saved_params/' + dataset.name + '/pc_km_z_' + arch['name'] + '.npy') visualizeData(Z[0:max_points], dataset.labels[0:max_points], dataset.getClusterCount(), "plots/%s/clustered_km.png" % dataset.name) if __name__ == '__main__': ''' usage: main.py [-h] -d DATASET -a ARCHITECTURE [--pretrain PRETRAIN] [--cluster CLUSTER] [--metrics METRICS] [--visualize VISUALIZE] required arguments: -d DATASET, --dataset DATASET Dataset on which
autoencoder is trained [MNIST,COIL20] -a ARCHITECTURE, --architecture ARCHITECTURE Index of architecture of autoencoder in the json file (archs/) optional arguments: -h, --help show this help message and exit --pretrain PRETRAIN Pretrain the autoencoder for specified #epochs specified by architecture on specified dataset --cluster CLUSTER Refine the autoencoder for specified #epochs with clustering loss, assumes that pretraining results are available --metrics METRICS Report k-means clustering metrics on the clustered latent space, assumes pretrain and cluster based training have been performed --visualize VISUALIZE Visualize the image space and latent space, assumes pretraining and cluster based training have been performed ''' # Load architectures from the json files mnist_archs = [] coil_archs = [] with open("archs/coil.json") as archs_file: coil_archs = json.load(archs_file) with open("archs/mnist.json") as archs_file: mnist_archs = json.load(archs_file) # Argument parsing parser = argparse.ArgumentParser() requiredArgs = parser.add_argument_group('required arguments') requiredArgs.add_argument("-d", "--dataset", help="Dataset on which autoencoder is trained [MNIST,COIL20]", required=True) requiredArgs.add_argument("-a", "--architecture", type=int, help="Index of architecture of autoencoder in the json file (archs/)", required=True) parser.add_argument("--pretrain", type=int, help="Pretrain the autoencoder for specified #epochs specified by architecture on specified dataset") parser.add_argument("--cluster", type=int, help="Refine the autoencoder for specified #epochs with clustering loss, assumes that pretraining results are available") parser.add_argument("--metrics", action='store_true', help="Report k-means clustering metrics on the clustered latent space, assumes pretrain and cluster based training have been performed") parser.add_argument("--visualize", action='store_true', help="Visualize the image space and latent space, assumes pretraining and cluster 
based training have been performed") args = parser.parse_args() # Train/Visualize as per the arguments dataset_name = args.dataset arch_index = args.architecture if dataset_name == 'MNIST': archs = mnist_archs elif dataset_name == 'COIL20': archs = coil_archs if args.pretrain: testOnlyClusterInitialization(dataset_name, archs[arch_index], args.pretrain) if args.cluster: testOnlyClusterImprovement(dataset_name, archs[arch_index], args.cluster, "KLD") if args.metrics: testKMeans(dataset_name, [archs[arch_index]]) if args.visualize: visualizeLatentSpace(dataset_name, archs[arch_index]) ================================================ FILE: misc.py ================================================ ''' Created on Jul 11, 2017 ''' import cPickle import gzip import numpy as np from PIL import Image import matplotlib # For plotting graphs via ssh with no display # Ref: https://stackoverflow.com/questions/2801882/generating-a-png-with-matplotlib-when-display-is-undefined matplotlib.use('Agg') from matplotlib import pyplot as plt from numpy import float32 from sklearn import metrics from sklearn.cluster.k_means_ import KMeans from sklearn import manifold from sklearn.utils.linear_assignment_ import linear_assignment class DatasetHelper(object): ''' Utility class for handling different datasets ''' def __init__(self, name): ''' A dataset instance keeps dataset name, the input set, the flat version of input set and the cluster labels ''' self.name = name if name == 'MNIST': self.dataset = MNISTDataset() elif name == 'STL': self.dataset = STLDataset() elif name == 'COIL20': self.dataset = COIL20Dataset() def loadDataset(self): ''' Load the appropriate dataset based on the dataset name ''' self.input, self.labels, self.input_flat = self.dataset.loadDataset() def getClusterCount(self): ''' Number of clusters in the dataset - e.g 10 for mnist, 20 for coil20 ''' return self.dataset.cluster_count def iterate_minibatches(self, set_type, batch_size, targets=None, shuffle=False): ''' 
Utility method for getting batches out of a dataset :param set_type: IMAGE - suitable input for CNNs or FLAT - suitable for DNN :param batch_size: Size of minibatches :param targets: None if the output should be the same as the inputs (autoencoders), otherwise takes a target array from which batches can be extracted. Must have the same order as the dataset, e.g. the nth sample of the inputs has its output at the nth element of targets :param shuffle: If the dataset needs to be shuffled or not :return: generates batches of size batch_size from the dataset, each batch is the pair (input, output) ''' inputs = None if set_type == 'IMAGE': inputs = self.input if targets is None: targets = self.input elif set_type == 'FLAT': inputs = self.input_flat if targets is None: targets = self.input_flat assert len(inputs) == len(targets) if shuffle: indices = np.arange(len(inputs)) np.random.shuffle(indices) for start_idx in range(0, len(inputs) - batch_size + 1, batch_size): if shuffle: excerpt = indices[start_idx:start_idx + batch_size] else: excerpt = slice(start_idx, start_idx + batch_size) yield inputs[excerpt], targets[excerpt] class MNISTDataset(object): ''' Class for reading and preparing MNIST dataset ''' def __init__(self): self.cluster_count = 10 def loadDataset(self): f = gzip.open('mnist/mnist.pkl.gz', 'rb') train_set, _, test_set = cPickle.load(f) train_input, train_input_flat, train_labels = self.prepareDatasetForAutoencoder(train_set[0], train_set[1]) test_input, test_input_flat, test_labels = self.prepareDatasetForAutoencoder(test_set[0], test_set[1]) f.close() # combine test and train samples return [np.concatenate((train_input, test_input)), np.concatenate((train_labels, test_labels)), np.concatenate((train_input_flat, test_input_flat))] def prepareDatasetForAutoencoder(self, inputs, targets): ''' Returns the image, flat and labels as a tuple ''' X = inputs X = X.reshape((-1, 1, 28, 28)) return (X, X.reshape((-1, 28 * 28)), targets) class STLDataset(object): ''' Class for
preparing and reading the STL dataset ''' def __init__(self): self.cluster_count = 10 def loadDataset(self): train_x = np.fromfile('stl/train_X.bin', dtype=np.uint8) train_y = np.fromfile('stl/train_y.bin', dtype=np.uint8) test_x = np.fromfile('stl/test_X.bin', dtype=np.uint8) test_y = np.fromfile('stl/test_y.bin', dtype=np.uint8) train_input = np.reshape(train_x, (-1, 3, 96, 96)) train_labels = train_y train_input_flat = np.reshape(train_x, (-1, 3 * 96 * 96)) test_input = np.reshape(test_x, (-1, 3, 96, 96)) test_labels = test_y test_input_flat = np.reshape(test_x, (-1, 3 * 96 * 96)) return [np.concatenate((train_input, test_input)), np.concatenate((train_labels, test_labels)), np.concatenate((train_input_flat, test_input_flat))] class COIL20Dataset(object): ''' Class for reading and preparing the COIL20Dataset ''' def __init__(self): self.cluster_count = 20 def loadDataset(self): train_x = np.load('coil/coil_X.npy').astype(np.float32) / 256.0 train_y = np.load('coil/coil_y.npy') train_x_flat = np.reshape(train_x, (-1, 128 * 128)) return [train_x, train_y, train_x_flat] def rescaleReshapeAndSaveImage(image_sample, out_filename): ''' For saving the reconstructed output as an image :param image_sample: output of the autoencoder :param out_filename: filename for the saved image :return: None (side effect) Image saved ''' image_sample = ((image_sample - np.amin(image_sample)) / (np.amax(image_sample) - np.amin(image_sample))) * 255 image_sample = np.rint(image_sample).astype(int) image_sample = np.clip(image_sample, a_min=0, a_max=255).astype('uint8') img = Image.fromarray(image_sample, 'L') img.save(out_filename) def cluster_acc(y_true, y_pred): ''' Uses the Hungarian algorithm to find the best permutation mapping between predicted cluster labels and true labels, and then calculates the accuracy wrt this mapping. Implementation inspired by https://github.com/piiswrong/dec, since scikit-learn does not implement this metric :param y_true: True cluster labels :param y_pred: Predicted cluster labels :return:
accuracy score for the clustering ''' D = int(max(y_pred.max(), y_true.max()) + 1) w = np.zeros((D, D), dtype=np.int32) for i in range(y_pred.size): idx1 = int(y_pred[i]) idx2 = int(y_true[i]) w[idx1, idx2] += 1 ind = linear_assignment(w.max() - w) return sum([w[i, j] for i, j in ind]) * 1.0 / y_pred.size def getClusterMetricString(method_name, labels_true, labels_pred): ''' Creates a formatted string containing the method name and acc, nmi metrics - can be used for printing :param method_name: Name of the clustering method (just for printing) :param labels_true: True label for each sample :param labels_pred: Predicted label for each sample :return: Formatted string containing metrics and method name ''' acc = cluster_acc(labels_true, labels_pred) nmi = metrics.normalized_mutual_info_score(labels_true, labels_pred) return '%-50s %8.3f %8.3f' % (method_name, acc, nmi) def evaluateKMeans(data, labels, nclusters, method_name): ''' Clusters data with kmeans algorithm and then returns the string containing method name and metrics, and also the evaluated cluster centers :param data: Points that need to be clustered as a numpy array :param labels: True labels for the given points :param nclusters: Total number of clusters :param method_name: Name of the method from which the clustering space originates (only used for printing) :return: Formatted string containing metrics and method name, cluster centers ''' kmeans = KMeans(n_clusters=nclusters, n_init=20) kmeans.fit(data) return getClusterMetricString(method_name, labels, kmeans.labels_), kmeans.cluster_centers_ def visualizeData(Z, labels, num_clusters, title): ''' TSNE visualization of the points in latent space Z :param Z: Numpy array containing points in latent space in which clustering was performed :param labels: True labels - used for coloring points :param num_clusters: Total number of clusters :param title: filename where the plot should be saved :return: None - (side effect) saves clustering visualization plot 
in specified location ''' labels = labels.astype(int) tsne = manifold.TSNE(n_components=2, init='pca', random_state=0) Z_tsne = tsne.fit_transform(Z) fig = plt.figure() plt.scatter(Z_tsne[:, 0], Z_tsne[:, 1], s=2, c=labels, cmap=plt.cm.get_cmap("jet", num_clusters)) plt.colorbar(ticks=range(num_clusters)) fig.savefig(title, dpi=fig.dpi) ================================================ FILE: network.py ================================================ ''' Created on Jul 11, 2017 ''' from datetime import datetime import logging from lasagne import layers import lasagne from lasagne.layers.helper import get_all_layers import theano import signal from customlayers import ClusteringLayer, Unpool2DLayer, getSoftAssignments from misc import evaluateKMeans, visualizeData, rescaleReshapeAndSaveImage import numpy as np import theano.tensor as T from lasagne.layers import batch_norm # Logging utilities - logs get saved in folder logs named by date and time, and also output # at standard output logFormatter = logging.Formatter("[%(asctime)s] %(message)s", datefmt='%m/%d %I:%M:%S') rootLogger = logging.getLogger() rootLogger.setLevel(logging.DEBUG) fileHandler = logging.FileHandler(datetime.now().strftime('logs/dcjc_%H_%M_%d_%m.log')) fileHandler.setFormatter(logFormatter) rootLogger.addHandler(fileHandler) consoleHandler = logging.StreamHandler() consoleHandler.setFormatter(logFormatter) rootLogger.addHandler(consoleHandler) class DCJC(object): # Main class holding autoencoder network and training functions def __init__(self, network_description): signal.signal(signal.SIGINT, self.signal_handler) self.name = network_description['name'] netbuilder = NetworkBuilder(network_description) self.shouldStopNow = False # Get the lasagne network using the network builder class that creates autoencoder with the specified architecture self.network = netbuilder.buildNetwork() self.encode_layer, self.encode_size = netbuilder.getEncodeLayerAndSize() self.t_input, self.t_target = 
netbuilder.getInputAndTargetVars()
        self.input_type = netbuilder.getInputType()
        self.batch_size = netbuilder.getBatchSize()
        rootLogger.info("Network: " + self.networkToStr())
        # Reconstruction is just the output of the network
        recon_prediction_expression = layers.get_output(self.network)
        # Latent/encoded space is the output of the bottleneck/encode layer
        encode_prediction_expression = layers.get_output(self.encode_layer, deterministic=True)
        # Loss for autoencoder = reconstruction loss + weight decay regularizer
        loss = self.getReconstructionLossExpression(recon_prediction_expression, self.t_target)
        weightsl2 = lasagne.regularization.regularize_network_params(self.network, lasagne.regularization.l2)
        loss += (5e-5 * weightsl2)
        params = lasagne.layers.get_all_params(self.network, trainable=True)
        # SGD with momentum + decaying learning rate
        self.learning_rate = theano.shared(lasagne.utils.floatX(0.01))
        updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=self.learning_rate)
        # Theano functions for calculating loss, predicting reconstruction, encoding
        self.trainAutoencoder = theano.function([self.t_input, self.t_target], loss, updates=updates)
        self.predictReconstruction = theano.function([self.t_input], recon_prediction_expression)
        self.predictEncoding = theano.function([self.t_input], encode_prediction_expression)

    def getReconstructionLossExpression(self, prediction_expression, t_target):
        '''
        Reconstruction loss = mean squared error between input and reconstructed input
        '''
        loss = lasagne.objectives.squared_error(prediction_expression, t_target)
        loss = loss.mean()
        return loss

    def signal_handler(self, signal, frame):
        command = raw_input('\nWhat is your command?')
        if str(command).lower() == "stop":
            self.shouldStopNow = True
        else:
            exec(command)

    def pretrainWithData(self, dataset, epochs, continue_training=False):
        '''
        Pretrains the autoencoder on the given dataset
        :param dataset: Data on which the autoencoder is trained
        :param epochs: number of training epochs
:param continue_training: Resume training if saved params available
        :return: None - (side effect) saves the trained network params and latent space in appropriate location
        '''
        batch_size = self.batch_size
        # Array for holding the latent space representation of the input
        Z = np.zeros((dataset.input.shape[0], self.encode_size), dtype=np.float32)
        # In case we're continuing training, load the saved network params
        if continue_training:
            with np.load('saved_params/%s/m_%s.npz' % (dataset.name, self.name)) as f:
                param_values = [f['arr_%d' % i] for i in range(len(f.files))]
                lasagne.layers.set_all_param_values(self.network, param_values, trainable=True)
        for epoch in range(epochs):
            error = 0
            total_batches = 0
            for batch in dataset.iterate_minibatches(self.input_type, batch_size, shuffle=True):
                inputs, targets = batch
                error += self.trainAutoencoder(inputs, targets)
                total_batches += 1
            # Learning rate decay
            self.learning_rate.set_value(self.learning_rate.get_value() * lasagne.utils.floatX(0.9999))
            # Every 2nd epoch, print the clustering accuracy and nmi - for checking whether the network
            # is actually doing something meaningful - the labels are never used for training
            if (epoch + 1) % 2 == 0:
                for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)):
                    Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0])
                    # Uncomment the next two lines to create reconstruction outputs in folder dumps/ (may need to be created)
                    # for i, x in enumerate(self.predictReconstruction(batch[0])):
                    #     rescaleReshapeAndSaveImage(x[0], "dumps/%02d%03d.jpg" % (epoch, i))
                rootLogger.info(evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(),
                                               "%d/%d [%.4f]" % (epoch + 1, epochs, error / total_batches))[0])
            else:
                # Just report the training loss
                rootLogger.info("%-30s %8s %8s" % ("%d/%d [%.4f]" % (epoch + 1, epochs, error / total_batches), "", ""))
            if self.shouldStopNow:
                break
        # The inputs in latent space after pretraining
        for i, batch in
enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)): Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0]) # Save network params and latent space np.save('saved_params/%s/z_%s.npy' % (dataset.name, self.name), Z) # Borrowed from mnist lasagne example np.savez('saved_params/%s/m_%s.npz' % (dataset.name, self.name), *lasagne.layers.get_all_param_values(self.network, trainable=True)) def doClusteringWithKLdivLoss(self, dataset, combined_loss, epochs): ''' Trains the autoencoder with combined kldivergence loss and reconstruction loss, or just the kldivergence loss At the moment does not give good results :param dataset: Data on which the autoencoder is trained :param combined_loss: boolean - whether to use both reconstruction and kl divergence loss or just kldivergence loss :param epochs: Number of training epochs :return: None - (side effect) saves the trained network params and latent space in appropriate location ''' batch_size = self.batch_size # Load saved network params and inputs in latent space obtained after pretraining with np.load('saved_params/%s/m_%s.npz' % (dataset.name, self.name)) as f: param_values = [f['arr_%d' % i] for i in range(len(f.files))] lasagne.layers.set_all_param_values(self.network, param_values, trainable=True) Z = np.load('saved_params/%s/z_%s.npy' % (dataset.name, self.name)) # Find initial cluster centers quality_desc, cluster_centers = evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), 'Initial') rootLogger.info(quality_desc) # P is the more pure target distribution we want to achieve P = T.matrix('P') # Extend the network so it calculates soft assignment cluster distribution for the inputs in latent space clustering_network = ClusteringLayer(self.encode_layer, dataset.getClusterCount(), cluster_centers, batch_size,self.encode_size) soft_assignments = layers.get_output(clustering_network) reconstructed_output_exp = layers.get_output(self.network) # Clustering loss = kl 
divergence between the pure distribution P and current distribution clustering_loss = self.getKLDivLossExpression(soft_assignments, P) reconstruction_loss = self.getReconstructionLossExpression(reconstructed_output_exp, self.t_target) params_ae = lasagne.layers.get_all_params(self.network, trainable=True) params_dec = lasagne.layers.get_all_params(clustering_network, trainable=True) # Total loss = weighted sum of the two losses w_cluster_loss = 1 w_reconstruction_loss = 1 total_loss = w_cluster_loss * clustering_loss if (combined_loss): total_loss = total_loss + w_reconstruction_loss * reconstruction_loss all_params = params_dec if combined_loss: all_params.extend(params_ae) # Parameters = unique parameters in the new network all_params = list(set(all_params)) # SGD with momentum, LR = 0.01, Momentum = 0.9 updates = lasagne.updates.nesterov_momentum(total_loss, all_params, learning_rate=0.01) # Function to calculate the soft assignment distribution getSoftAssignments = theano.function([self.t_input], soft_assignments) # Train function - based on whether complete loss is used or not trainFunction = None if combined_loss: trainFunction = theano.function([self.t_input, self.t_target, P], total_loss, updates=updates) else: trainFunction = theano.function([self.t_input, P], clustering_loss, updates=updates) for epoch in range(epochs): # Get the current distribution qij = np.zeros((dataset.input.shape[0], dataset.getClusterCount()), dtype=np.float32) for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)): qij[i * batch_size: (i + 1) * batch_size] = getSoftAssignments(batch[0]) # Calculate the desired distribution pij = self.calculateP(qij) error = 0 total_batches = 0 for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, pij, shuffle=True)): if (combined_loss): error += trainFunction(batch[0], batch[0], batch[1]) else: error += trainFunction(batch[0], batch[1]) total_batches += 1 for i, batch in 
enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)):
                Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0])
            # Every 10th epoch, print the clustering accuracy and nmi - for checking whether the network
            # is actually doing something meaningful - the labels are never used for training
            if (epoch + 1) % 10 == 0:
                rootLogger.info(evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(),
                                               "%d [%.4f]" % (epoch + 1, error / total_batches))[0])
            if self.shouldStopNow:
                break
        # Save the inputs in latent space and the network parameters
        for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)):
            Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0])
        np.save('saved_params/%s/pc_z_%s.npy' % (dataset.name, self.name), Z)
        np.savez('saved_params/%s/pc_m_%s.npz' % (dataset.name, self.name),
                 *lasagne.layers.get_all_param_values(self.network, trainable=True))

    def calculateP(self, Q):
        # Calculates the target distribution P from the soft assignments Q - for details refer to the DEC paper
        f = Q.sum(axis=0)
        pij_numerator = Q * Q
        pij_numerator = pij_numerator / f
        normalizer_p = pij_numerator.sum(axis=1).reshape((Q.shape[0], 1))
        P = pij_numerator / normalizer_p
        return P

    def getKLDivLossExpression(self, Q_expression, P_expression):
        # Loss = KL divergence between the two distributions
        log_arg = P_expression / Q_expression
        log_exp = T.log(log_arg)
        sum_arg = P_expression * log_exp
        loss = sum_arg.sum(axis=1).sum(axis=0)
        return loss

    def doClusteringWithKMeansLoss(self, dataset, epochs):
        '''
        Trains the autoencoder with combined k-means loss and reconstruction loss
        At the moment does not give good results
        :param dataset: Data on which the autoencoder is trained
        :param epochs: Number of training epochs
        :return: None - (side effect) saves the trained network params and latent space in appropriate location
        '''
        batch_size = self.batch_size
        # Load the inputs in latent space produced by the pretrained autoencoder and
use it to initialize cluster centers Z = np.load('saved_params/%s/z_%s.npy' % (dataset.name, self.name)) quality_desc, cluster_centers = evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), 'Initial') rootLogger.info(quality_desc) # Load network parameters - code borrowed from mnist lasagne example with np.load('saved_params/%s/m_%s.npz' % (dataset.name, self.name)) as f: param_values = [f['arr_%d' % i] for i in range(len(f.files))] lasagne.layers.set_all_param_values(self.network, param_values, trainable=True) # reconstruction loss is just rms loss between input and reconstructed input reconstruction_loss = self.getReconstructionLossExpression(layers.get_output(self.network), self.t_target) # extent the network to do soft cluster assignments clustering_network = ClusteringLayer(self.encode_layer, dataset.getClusterCount(), cluster_centers, batch_size, self.encode_size) soft_assignments = layers.get_output(clustering_network) # k-means loss is the sum of distances from the cluster centers weighted by the soft assignments to the clusters kmeansLoss = self.getKMeansLoss(layers.get_output(self.encode_layer), soft_assignments, clustering_network.W, dataset.getClusterCount(), self.encode_size, batch_size) params = lasagne.layers.get_all_params(self.network, trainable=True) # total loss = reconstruction loss + lambda * kmeans loss weight_reconstruction = 1 weight_kmeans = 0.1 total_loss = weight_kmeans * kmeansLoss + weight_reconstruction * reconstruction_loss updates = lasagne.updates.nesterov_momentum(total_loss, params, learning_rate=0.01) trainKMeansWithAE = theano.function([self.t_input, self.t_target], total_loss, updates=updates) for epoch in range(epochs): error = 0 total_batches = 0 for batch in dataset.iterate_minibatches(self.input_type, batch_size, shuffle=True): inputs, targets = batch error += trainKMeansWithAE(inputs, targets) total_batches += 1 # For every 10th epoch, update the cluster centers and print the clustering accuracy and nmi - for 
checking if the network # is actually doing something meaningful - the labels are never used for training if (epoch + 1) % 10 == 0: for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)): Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0]) quality_desc, cluster_centers = evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), "%d/%d [%.4f]" % (epoch + 1, epochs, error / total_batches)) rootLogger.info(quality_desc) else: # Just print the training loss rootLogger.info("%-30s %8s %8s" % ("%d/%d [%.4f]" % (epoch + 1, epochs, error / total_batches), "", "")) if self.shouldStopNow: break # Save the inputs in latent space and the network parameters for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)): Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0]) np.save('saved_params/%s/pc_km_z_%s.npy' % (dataset.name, self.name), Z) np.savez('saved_params/%s/pc_km_m_%s.npz' % (dataset.name, self.name), *lasagne.layers.get_all_param_values(self.network, trainable=True)) def getKMeansLoss(self, latent_space_expression, soft_assignments, t_cluster_centers, num_clusters, latent_space_dim, num_samples, soft_loss=False): # Kmeans loss = weighted sum of latent space representation of inputs from the cluster centers z = latent_space_expression.reshape((num_samples, 1, latent_space_dim)) z = T.tile(z, (1, num_clusters, 1)) u = t_cluster_centers.reshape((1, num_clusters, latent_space_dim)) u = T.tile(u, (num_samples, 1, 1)) distances = (z - u).norm(2, axis=2).reshape((num_samples, num_clusters)) if soft_loss: weighted_distances = distances * soft_assignments loss = weighted_distances.sum(axis=1).mean() else: loss = distances.min(axis=1).mean() return loss def networkToStr(self): # Utility method for printing the network structure in a shortened form layers = lasagne.layers.get_all_layers(self.network) result = '' for layer in layers: t = type(layer) if t is 
lasagne.layers.input.InputLayer:
                pass
            else:
                result += ' ' + layer.name
        return result.strip()


class NetworkBuilder(object):
    '''
    Class that handles parsing the architecture dictionary and creating an autoencoder out of it
    '''

    def __init__(self, network_description):
        '''
        :param network_description: python dictionary specifying the autoencoder architecture
        '''
        # Populate the missing values in the dictionary with defaults, and also add the decoder part
        # of the autoencoder, which is missing from the dictionary
        self.network_description = self.populateMissingDescriptions(network_description)
        # Create theano variables for input and output - these are of different types for simple and convolutional autoencoders
        if self.network_description['network_type'] == 'CAE':
            self.t_input = T.tensor4('input_var')
            self.t_target = T.tensor4('target_var')
            self.input_type = "IMAGE"
        else:
            self.t_input = T.matrix('input_var')
            self.t_target = T.matrix('target_var')
            self.input_type = "FLAT"
        self.network_type = self.network_description['network_type']
        self.batch_norm = bool(self.network_description["use_batch_norm"])
        self.layer_list = []

    def getBatchSize(self):
        return self.network_description["batch_size"]

    def getInputAndTargetVars(self):
        return self.t_input, self.t_target

    def getInputType(self):
        return self.input_type

    def buildNetwork(self):
        '''
        :return: Lasagne autoencoder network based on the network description dictionary
        '''
        network = None
        for layer in self.network_description['layers']:
            network = self.processLayer(network, layer)
        return network

    def getEncodeLayerAndSize(self):
        '''
        :return: The encode layer - the layer between encoder and decoder (bottleneck)
        '''
        return self.encode_layer, self.encode_size

    def populateDecoder(self, encode_layers):
        '''
        Creates a specification for the mirror of the encode layers - which completes the autoencoder specification
        '''
        decode_layers = []
        for i, layer in reversed(list(enumerate(encode_layers))):
            if (layer["type"] == "MaxPool2D*"):
                # Inverse max pool doesn't
upscale the input, but does reverse of what happened when maxpool # operation was performed decode_layers.append({ "type": "InverseMaxPool2D", "layer_index": i, 'filter_size': layer['filter_size'] }) elif (layer["type"] == "MaxPool2D"): # Unpool just upscales the input back decode_layers.append({ "type": "Unpool2D", 'filter_size': layer['filter_size'] }) elif (layer["type"] == "Conv2D"): # Inverse convolution = deconvolution decode_layers.append({ 'type': 'Deconv2D', 'conv_mode': layer['conv_mode'], 'non_linearity': layer['non_linearity'], 'filter_size': layer['filter_size'], 'num_filters': encode_layers[i - 1]['output_shape'][0] }) elif (layer["type"] == "Dense" and not layer["is_encode"]): # Inverse of dense layers is just a dense layer, though we dont create an inverse layer corresponding to bottleneck layer decode_layers.append({ 'type': 'Dense', 'num_units': encode_layers[i]['output_shape'][2], 'non_linearity': encode_layers[i]['non_linearity'] }) # if the layer following the dense layer is one of these, we need to reshape the output if (encode_layers[i - 1]['type'] in ("Conv2D", "MaxPool2D", "MaxPool2D*")): decode_layers.append({ "type": "Reshape", "output_shape": encode_layers[i - 1]['output_shape'] }) encode_layers.extend(decode_layers) def populateShapes(self, layers): # Fills the dictionary with shape information corresponding to each layer, which will be used in creating the decode layers last_layer_dimensions = layers[0]['output_shape'] for layer in layers[1:]: if (layer['type'] == 'MaxPool2D' or layer['type'] == 'MaxPool2D*'): layer['output_shape'] = [last_layer_dimensions[0], last_layer_dimensions[1] / layer['filter_size'][0], last_layer_dimensions[2] / layer['filter_size'][1]] elif (layer['type'] == 'Conv2D'): multiplier = 1 if (layer['conv_mode'] == "same"): multiplier = 0 layer['output_shape'] = [layer['num_filters'], last_layer_dimensions[1] - (layer['filter_size'][0] - 1) * multiplier, last_layer_dimensions[2] - (layer['filter_size'][1] - 1) * 
multiplier] elif (layer['type'] == 'Dense'): layer['output_shape'] = [1, 1, layer['num_units']] last_layer_dimensions = layer['output_shape'] def populateMissingDescriptions(self, network_description): # Complete the architecture dictionary by filling in default values and populating description for decoder if 'network_type' not in network_description: if (network_description['name'].split('_')[0].split('-')[0] == 'fc'): network_description['network_type'] = 'AE' else: network_description['network_type'] = 'CAE' for layer in network_description['layers']: if 'conv_mode' not in layer: layer['conv_mode'] = 'valid' layer['is_encode'] = False network_description['layers'][-1]['is_encode'] = True if 'output_non_linearity' not in network_description: network_description['output_non_linearity'] = network_description['layers'][1]['non_linearity'] self.populateShapes(network_description['layers']) self.populateDecoder(network_description['layers']) if 'use_batch_norm' not in network_description: network_description['use_batch_norm'] = False for layer in network_description['layers']: if 'is_encode' not in layer: layer['is_encode'] = False layer['is_output'] = False network_description['layers'][-1]['is_output'] = True network_description['layers'][-1]['non_linearity'] = network_description['output_non_linearity'] return network_description def getInitializationFct(self): return lasagne.init.GlorotUniform() def processLayer(self, network, layer_definition): ''' Create a lasagne layer corresponding to the "layer definition" ''' if (layer_definition["type"] == "Input"): if self.network_type == 'CAE': network = lasagne.layers.InputLayer(shape=tuple([None] + layer_definition['output_shape']), input_var=self.t_input) elif self.network_type == 'AE': network = lasagne.layers.InputLayer(shape=(None, layer_definition['output_shape'][2]), input_var=self.t_input) elif (layer_definition['type'] == 'Dense'): network = lasagne.layers.DenseLayer(network, 
num_units=layer_definition['num_units'], nonlinearity=self.getNonLinearity(layer_definition['non_linearity']), name=self.getLayerName(layer_definition),W=self.getInitializationFct()) elif (layer_definition['type'] == 'Conv2D'): network = lasagne.layers.Conv2DLayer(network, num_filters=layer_definition['num_filters'], filter_size=tuple(layer_definition["filter_size"]), pad=layer_definition['conv_mode'], nonlinearity=self.getNonLinearity(layer_definition['non_linearity']), name=self.getLayerName(layer_definition),W=self.getInitializationFct()) elif (layer_definition['type'] == 'MaxPool2D' or layer_definition['type'] == 'MaxPool2D*'): network = lasagne.layers.MaxPool2DLayer(network, pool_size=tuple(layer_definition["filter_size"]), name=self.getLayerName(layer_definition)) elif (layer_definition['type'] == 'InverseMaxPool2D'): network = lasagne.layers.InverseLayer(network, self.layer_list[layer_definition['layer_index']], name=self.getLayerName(layer_definition)) elif (layer_definition['type'] == 'Unpool2D'): network = Unpool2DLayer(network, tuple(layer_definition['filter_size']), name=self.getLayerName(layer_definition)) elif (layer_definition['type'] == 'Reshape'): network = lasagne.layers.ReshapeLayer(network, shape=tuple([-1] + layer_definition["output_shape"]), name=self.getLayerName(layer_definition)) elif (layer_definition['type'] == 'Deconv2D'): network = lasagne.layers.Deconv2DLayer(network, num_filters=layer_definition['num_filters'], filter_size=tuple(layer_definition['filter_size']), crop=layer_definition['conv_mode'], nonlinearity=self.getNonLinearity(layer_definition['non_linearity']), name=self.getLayerName(layer_definition)) self.layer_list.append(network) # Batch normalization on all convolutional layers except if at output if (self.batch_norm and (not layer_definition["is_output"]) and layer_definition['type'] in ("Conv2D", "Deconv2D")): network = batch_norm(network) # Save the encode layer separately if (layer_definition['is_encode']): 
self.encode_layer = lasagne.layers.flatten(network, name='fl') self.encode_size = layer_definition['output_shape'][0] * layer_definition['output_shape'][1] * layer_definition['output_shape'][2] return network def getLayerName(self, layer_definition): ''' Utility method to name layers ''' if (layer_definition['type'] == 'Dense'): return 'fc[{}]'.format(layer_definition['num_units']) elif (layer_definition['type'] == 'Conv2D'): return '{}[{}]'.format(layer_definition['num_filters'], 'x'.join([str(fs) for fs in layer_definition['filter_size']])) elif (layer_definition['type'] == 'MaxPool2D' or layer_definition['type'] == 'MaxPool2D*'): return 'max[{}]'.format('x'.join([str(fs) for fs in layer_definition['filter_size']])) elif (layer_definition['type'] == 'InverseMaxPool2D'): return 'ups*[{}]'.format('x'.join([str(fs) for fs in layer_definition['filter_size']])) elif (layer_definition['type'] == 'Unpool2D'): return 'ups[{}]'.format( str(layer_definition['filter_size'][0]) + 'x' + str(layer_definition['filter_size'][1])) elif (layer_definition['type'] == 'Deconv2D'): return '{}[{}]'.format(layer_definition['num_filters'], 'x'.join([str(fs) for fs in layer_definition['filter_size']])) elif (layer_definition['type'] == 'Reshape'): return "rsh" def getNonLinearity(self, non_linearity): return { 'rectify': lasagne.nonlinearities.rectify, 'linear': lasagne.nonlinearities.linear, 'elu': lasagne.nonlinearities.elu }[non_linearity]
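A standalone NumPy sketch (illustration only, not repository code; the function name `calculate_p` is ours) of the target-distribution computation performed by `DCJC.calculateP` above, following the DEC paper: each soft assignment is squared and divided by the soft cluster frequency, then rows are renormalized.

```python
import numpy as np

def calculate_p(Q):
    """Target distribution from DEC: p_ij = (q_ij^2 / f_j) / sum_k (q_ik^2 / f_k),
    where f_j = sum_i q_ij are the soft cluster frequencies."""
    f = Q.sum(axis=0)                             # soft cluster frequencies f_j
    numerator = (Q * Q) / f                       # q_ij^2 / f_j (broadcasts over rows)
    return numerator / numerator.sum(axis=1, keepdims=True)  # row-normalize

# Each row of P remains a valid distribution; dividing by f_j down-weights
# clusters that already absorb a lot of assignment mass.
Q = np.array([[0.9, 0.1],
              [0.6, 0.4]], dtype=np.float32)
P = calculate_p(Q)
```

For the first sample, f = [1.5, 0.5], so the numerator row is [0.81/1.5, 0.01/0.5] = [0.54, 0.02], which normalizes to roughly [0.964, 0.036].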