[
  {
    "path": ".gitignore",
    "content": "*.pyc\noutputs/*.png\nlogs/*\n.idea/\n"
  },
  {
    "path": ".project",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<projectDescription>\n\t<name>DeepConvJointClustering</name>\n\t<comment></comment>\n\t<projects>\n\t</projects>\n\t<buildSpec>\n\t\t<buildCommand>\n\t\t\t<name>org.python.pydev.PyDevBuilder</name>\n\t\t\t<arguments>\n\t\t\t</arguments>\n\t\t</buildCommand>\n\t</buildSpec>\n\t<natures>\n\t\t<nature>org.python.pydev.pythonNature</nature>\n\t</natures>\n</projectDescription>\n"
  },
  {
    "path": ".pydevproject",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n<?eclipse-pydev version=\"1.0\"?><pydev_project>\n<pydev_pathproperty name=\"org.python.pydev.PROJECT_SOURCE_PATH\">\n<path>/${PROJECT_DIR_NAME}</path>\n</pydev_pathproperty>\n<pydev_property name=\"org.python.pydev.PYTHON_PROJECT_VERSION\">python 2.7</pydev_property>\n<pydev_property name=\"org.python.pydev.PYTHON_PROJECT_INTERPRETER\">Default</pydev_property>\n</pydev_project>\n"
  },
  {
    "path": "README.md",
    "content": "Deep Learning for Clustering\n=======================\nCode for project \"Deep Learning for Clustering\" under lab course  \"Deep Learning for Computer Vision and Biomedicine\" - TUM. Depends on **numpy**, **theano**, **lasagne**, **scikit-learn**, **matplotlib**. \n\n#### Contributors\n- [Mohd Yawar Nihal Siddiqui](mailto:yawarnihal@gmail.com)\n- [Elie Aljalbout](mailto:elie.aljalbout@tum.de)\n- [Vladimir Golkov](mailto:vladimir.golkov@tum.de) (Supervisor)\n\n#### Related Papers:\n This repository is an implementation of the paper :\n Elie Aljalbout, Vladimir Golkov, Yawar Siddiqui, Daniel Cremers \"Clustering with Deep Learning: Taxonomy and new methods\"\n - arxiv: https://arxiv.org/abs/1801.07648\n\nUsage\n--------\nUse the main script for training, visualizing clusters and/or reporting clustering metrics \n```\npython main.py <options>\n```\nOption     | |\n-------- | ---\n```-d DATASET_NAME, --dataset DATASET_NAME ```| ``(Required) Dataset on which autoencoder is to be trained trained, or metrics/visualizations are to be performed [MNIST,COIL20]``\n```-a ARCH_IDX, --architecture ARCH_IDX```| ``(Required) Index of architecture of autoencoder in the json file (archs/)``\n``--pretrain EPOCHS`` | ``Pretrain the autoencoder for specified #epochs specified by architecture on specified dataset``\n``--cluster EPOCHS``| ``Refine the autoencoder for specified #epochs with clustering loss, assumes that pretraining results are available``\n``--metrics``| ``Report k-means clustering metrics on the clustered latent space, assumes pretrain and cluster based training have been performed``\n``--visualize``|``Visualize the image space and latent space, assumes pre-training and cluster based training have been performed``\n\nProject Structure\n------------------------\nFolder / File     | Description|\n-------- | ---\n<i class=\"icon-folder-open\"></i> archs| Contains json files specifying architectures for autoencoder networks used. 
File ``mnist.json`` contains architectures for the MNIST dataset. We use the second architecture for the reported results (command line argument ``-a 1``).\n<i class=\"icon-folder-open\"></i> coil, mnist | Contains the datasets COIL20 and MNIST respectively\n<i class=\"icon-folder-open\"></i> logs| Output folder for logs generated by the scripts. Named by date and time of script execution\n<i class=\"icon-folder-open\"></i>plots|Scatter plots showing the raw image space, the pre-trained latent space, and the final clustered latent space\n<i class=\"icon-folder-open\"></i>saved_params | Contains saved network parameters and saved representations of inputs in latent space\n<i class=\"icon-file\"></i> customlayers.py | Custom lasagne layers: Unpool2D, which performs inverse max pooling by replicating input pixels as dictated by the filter size, and ClusteringLayer, which outputs soft cluster assignments based on distance from k-means cluster centers\n<i class=\"icon-file\"></i> main.py | The main python script for training and evaluating the network\n<i class=\"icon-file\"></i> misc.py | Contains dataset handlers and other utility methods\n<i class=\"icon-file\"></i>network.py| Contains classes for parsing and building the network from json files, and for training the network\n\nAutoencoder Builder\n-----------------------------\nWe've implemented a **NetworkBuilder** class that can be used to quickly describe the architecture of an autoencoder through a **json** file. 
\nThe json specification of the architecture is a dictionary with the following fields\n\n| Field | Description \n---------|------------\nname| Name identifier given to the architecture, used for file naming while saving parameters \nbatch_size| Batch size to be used while training the network\nuse_batch_norm| Whether to use batch normalization for convolutional/deconvolutional layers \nnetwork_type| Type of network - convolutional or fully connected\nlayers| A list describing the encoder part of the autoencoder\n\nFurther, each item in the layers list is a dictionary with the following fields\n\n| Field | Description \n---------|------------\ntype| Can be Input, Conv2D, MaxPool2D, MaxPool2D*, Dense, Reshape, Deconv2D\nnum_filters| For Conv2D/MaxPool2D/MaxPool2D*/Deconv2D layers, the number of filters\nfilter_size| Dimensions of the kernel for the above layers\nnum_units| For Dense layers, the number of hidden units\nnon_linearity| Non-linearity function used at the output of the layer\nconv_mode| Convolution mode (e.g. same, valid) for convolutional layers\noutput_non_linearity| Use if a different non-linearity is desired at the network output than the one obtained by mirroring\n\nOnly the encoder part of the autoencoder needs to be specified; the decoder is generated automatically by the class. 
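As a rough illustration of how these fields fit together, the encoder's feature map shapes can be walked layer by layer. The helper below is not part of the repository; it is a sketch that assumes 'valid' convolutions and non-overlapping max pooling, as in the MNIST example architecture that follows.

```python
# Hypothetical shape-walking helper (not the repo's NetworkBuilder); assumes
# 'valid' convolutions and non-overlapping max pooling.
def encoder_shapes(layers):
    shapes = []
    c = h = w = None
    for layer in layers:
        kind = layer['type']
        if kind == 'Input':
            c, h, w = layer['output_shape']
        elif kind == 'Conv2D':
            fh, fw = layer['filter_size']
            c, h, w = layer['num_filters'], h - fh + 1, w - fw + 1
        elif kind.startswith('MaxPool2D'):
            fh, fw = layer['filter_size']
            h, w = h // fh, w // fw
        shapes.append((c, h, w))
    return shapes

# Encoder of the MNIST example architecture below
mnist_encoder = [
    {'type': 'Input', 'output_shape': [1, 28, 28]},
    {'type': 'Conv2D', 'num_filters': 50, 'filter_size': [5, 5]},
    {'type': 'MaxPool2D*', 'filter_size': [2, 2]},
    {'type': 'Conv2D', 'num_filters': 50, 'filter_size': [5, 5]},
    {'type': 'MaxPool2D*', 'filter_size': [2, 2]},
    {'type': 'Conv2D', 'num_filters': 120, 'filter_size': [4, 4]},
]
print(encoder_shapes(mnist_encoder)[-1])  # (120, 1, 1)
```

The final ``(120, 1, 1)`` feature map corresponds to the ``120[4x4]`` latent layer of the example network.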
\n\nExample of a network description\n\n```json\n{\n    \"name\": \"c-5-6_p_c-5-16_p_c-4-120\",\n    \"use_batch_norm\": 1,\n    \"batch_size\": 100,\n    \"layers\": [\n      {\n        \"type\": \"Input\",\n        \"output_shape\":[1, 28, 28]\n      },\n      {\n        \"type\": \"Conv2D\",\n        \"num_filters\": 50,\n        \"filter_size\": [5, 5],\n        \"non_linearity\": \"rectify\"\n      },\n      {\n        \"type\": \"MaxPool2D*\",\n        \"filter_size\": [2, 2]\n      },\n      {\n        \"type\": \"Conv2D\",\n        \"num_filters\": 50,\n        \"filter_size\": [5, 5],\n        \"non_linearity\": \"rectify\"\n      },\n      {\n        \"type\": \"MaxPool2D*\",\n        \"filter_size\": [2, 2]\n      },\n      {\n        \"type\": \"Conv2D\",\n        \"num_filters\": 120,\n        \"filter_size\": [4, 4],\n        \"non_linearity\": \"linear\"\n      }\n    ]\n  }\n``` \n\nThis would generate the network\n``50[5x5] 50[5x5]_bn max[2x2] 50[5x5] 50[5x5]_bn max[2x2]`` **``120[4x4] 120[4x4]_bn``** ``50[4x4] 50[4x4]_bn ups*[2x2] 50[5x5] 50[5x5]_bn ups*[2x2] 1[5x5]``\n\n\nExperiments and Results\n-----------------------------------\nWe trained and tested the network on two datasets - MNIST and COIL20 \n\n|Dataset| Image size | Number of samples | Number of clusters \n-------- | ---|---|---\nMNIST| 28x28x1|60000|10\nCOIL20| 128x128x1|1440|20\n\nClustering was performed with two different loss functions - \n\n- Loss = ``KL-Divergence(soft assignment distribution, target distribution) + Autoencoder Reconstruction loss ``, where the target distribution is one that improves cluster purity and puts more emphasis on data points assigned with high confidence. 
For more details check out the DEC paper [[1]](https://arxiv.org/abs/1511.06335).\n- Loss = ``k-Means loss + Autoencoder Reconstruction loss``\n\n#### **MNIST**\n\n##### Our network\n| Clustering space| Clustering Accuracy| Normalized Mutual Information \n-------- | ---|----\nImage pixels | 0.542|0.480\nAutoencoder| 0.760|0.667\nAutoencoder + k-Means Loss| 0.781| 0.796\nAutoencoder + KLDiv Loss| **0.859**| **0.825**\n##### Other networks\n|Method| Clustering Accuracy| Normalized Mutual Information \n-------- | ---|----\nDEC|0.843|0.800\nDCN|0.830|0.810\nCNN-RC| - |0.915\nCNN-FD|-|0.876\nDBC| 0.964|0.917\n\n> Note: The commit b34743114f68624b5371cd0d4c059b141422902f gives up to 0.96 accuracy and 0.92 NMI on the MNIST dataset. We will merge it into the main branch once we get better results with the COIL architecture. \n\n##### **Latent space visualizations**\n\n###### Pixel space\n![](/plots/MNIST/raw.png)\n###### Autoencoder\n![](/plots/MNIST/autoencoder.png)\n###### Autoencoder Latent Space Evolution (video)\n[![Autoencoder](http://img.youtube.com/vi/_WuUB3gD984/0.jpg)](https://www.youtube.com/watch?v=_WuUB3gD984)\n###### Autoencoder + KLDivergence\n![](/plots/MNIST/clustered_kld.png)\n###### Autoencoder + KLDivergence Latent Space Evolution (video)\n[![Autoencoder](http://img.youtube.com/vi/XYS7DFkVx_A/0.jpg)](https://www.youtube.com/watch?v=XYS7DFkVx_A)\n###### Autoencoder + k-Means\n![](/plots/MNIST/clustered_km.png)\n\n#### **COIL20**\n##### Our network\n| Clustering space| Clustering Accuracy| Normalized Mutual Information \n-------- | ---|----\nImage pixels | 0.689|0.793\nAutoencoder| 0.739|0.828\nAutoencoder + k-Means Loss| 0.745| 0.846\nAutoencoder + KLDiv Loss| 0.762| 0.848\n##### Other networks\n|Method| Clustering Accuracy| Normalized Mutual Information \n-------- | ---|----\nDEN|0.725|0.870\nCNN-RC| - |1.000\nDBC| 0.793|0.895\n\n##### **Latent space visualizations**\n###### Pixel space\n![](/plots/COIL20/raw.png)\n###### 
Autoencoder\n![](/plots/COIL20/autoencoder.png)\n###### Autoencoder + k-Means\n![](/plots/COIL20/clustered_km.png)\n###### Autoencoder + KLDivergence\n![](/plots/COIL20/clustered_kld.png)\n"
  },
  {
    "path": "archs/coil.json",
    "content": "[\n    {\n      \"name\": \"c-9-20_p-2_c-5-20_p-2_c-5-40_p-2_c-4-320\",\n      \"use_batch_norm\":1,\n      \"batch_size\": 10,\n      \"layers\": [\n        {\n          \"type\":\"Input\",\n          \"output_shape\": [1, 128, 128]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [9, 9],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 40,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 320,\n          \"filter_size\": [4, 4],\n          \"non_linearity\": \"linear\"\n        }\n      ]\n    },\n    {\n    \"name\": \"c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-400\",\n    \"use_batch_norm\":1,\n    \"batch_size\": 10,\n    \"layers\": [\n        {\n          \"type\":\"Input\",\n          \"output_shape\": [1, 128, 128]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [9, 9],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          
\"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 320,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 5120,\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 400,\n          \"non_linearity\": \"rectify\"\n        }\n      ]\n    },\n    {\n    \"name\": \"c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-400-fc-200\",\n    \"use_batch_norm\":1,\n    \"batch_size\": 10,\n    \"layers\": [\n        {\n          \"type\":\"Input\",\n          \"output_shape\": [1, 128, 128]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [9, 9],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          
\"type\":\"MaxPool2D\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 320,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 5120,\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 400,\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 200,\n          \"non_linearity\": \"rectify\"\n        }\n      ]\n    },\n    {\n    \"name\": \"c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-200\",\n    \"use_batch_norm\":1,\n    \"batch_size\": 10,\n    \"layers\": [\n        {\n          \"type\":\"Input\",\n          \"output_shape\": [1, 128, 128]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [9, 9],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 320,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n    
    },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 5120,\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 200,\n          \"non_linearity\": \"rectify\"\n        }\n      ]\n    },\n    {\n    \"name\": \"c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-80\",\n    \"use_batch_norm\":1,\n    \"batch_size\": 10,\n    \"layers\": [\n        {\n          \"type\":\"Input\",\n          \"output_shape\": [1, 128, 128]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [9, 9],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 320,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 5120,\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 80,\n          \"non_linearity\": \"rectify\"\n        }\n      ]\n    },\n    {\n    \"name\": \"c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-32\",\n    \"use_batch_norm\":1,\n    \"batch_size\": 10,\n    \"layers\": [\n        {\n          
\"type\":\"Input\",\n          \"output_shape\": [1, 128, 128]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [9, 9],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 320,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 5120,\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 32,\n          \"non_linearity\": \"rectify\"\n        }\n      ]\n    },\n    {\n    \"name\": \"c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-1\",\n    \"use_batch_norm\":0,\n    \"batch_size\": 10,\n    \"layers\": [\n        {\n          \"type\":\"Input\",\n          \"output_shape\": [1, 128, 128]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [9, 9],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n         
 \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 320,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 5120,\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 1,\n          \"non_linearity\": \"rectify\"\n        }\n      ]\n    },\n    {\n    \"name\": \"c-9-20_p_c-5-20_p_c-5-20_p_c-5-320_p_fc-20\",\n    \"use_batch_norm\":1,\n    \"batch_size\": 10,\n    \"layers\": [\n        {\n          \"type\":\"Input\",\n          \"output_shape\": [1, 128, 128]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [9, 9],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          \"num_filters\": 20,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\":\"Conv2D\",\n          
\"num_filters\": 320,\n          \"filter_size\": [5, 5],\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\":\"MaxPool2D*\",\n          \"filter_size\": [2, 2]\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 5120,\n          \"non_linearity\": \"rectify\"\n        },\n        {\n          \"type\": \"Dense\",\n          \"num_units\": 20,\n          \"non_linearity\": \"rectify\"\n        }\n      ]\n    }\n]\n"
  },
  {
    "path": "archs/mnist.json",
    "content": "[\n  {\n    \"name\": \"c-3-32_p_c-3-64_p_fc-32\",\n    \"batch_size\": 50,\n    \"layers\": [\n      {\n        \"type\": \"Input\",\n        \"output_shape\": [\n          1,\n          28,\n          28\n        ]\n      },\n      {\n        \"type\": \"Conv2D\",\n        \"num_filters\": 32,\n        \"filter_size\": [\n          3,\n          3\n        ],\n        \"non_linearity\": \"rectify\",\n        \"conv_mode\": \"same\"\n      },\n      {\n        \"type\": \"MaxPool2D\",\n        \"filter_size\": [\n          2,\n          2\n        ]\n      },\n      {\n        \"type\": \"Conv2D\",\n        \"num_filters\": 64,\n        \"filter_size\": [\n          3,\n          3\n        ],\n        \"non_linearity\": \"rectify\",\n        \"conv_mode\": \"same\"\n      },\n      {\n        \"type\": \"MaxPool2D\",\n        \"filter_size\": [\n          2,\n          2\n        ]\n      },\n      {\n        \"type\": \"Dense\",\n        \"num_units\": 3136,\n        \"non_linearity\": \"rectify\"\n      },\n      {\n        \"type\": \"Dense\",\n        \"num_units\": 32,\n        \"non_linearity\": \"rectify\"\n      }\n    ]\n  },\n  {\n    \"name\": \"c-5-6_p_c-5-16_p_c-4-120\",\n    \"use_batch_norm\": 1,\n    \"batch_size\": 100,\n    \"layers\": [\n      {\n        \"type\": \"Input\",\n        \"output_shape\": [\n          1,\n          28,\n          28\n        ]\n      },\n      {\n        \"type\": \"Conv2D\",\n        \"num_filters\": 50,\n        \"filter_size\": [\n          5,\n          5\n        ],\n        \"non_linearity\": \"rectify\"\n      },\n      {\n        \"type\": \"MaxPool2D*\",\n        \"filter_size\": [\n          2,\n          2\n        ]\n      },\n      {\n        \"type\": \"Conv2D\",\n        \"num_filters\": 50,\n        \"filter_size\": [\n          5,\n          5\n        ],\n        \"non_linearity\": \"rectify\"\n      },\n      {\n        \"type\": \"MaxPool2D*\",\n        \"filter_size\": [\n       
   2,\n          2\n        ]\n      },\n      {\n        \"type\": \"Conv2D\",\n        \"num_filters\": 120,\n        \"filter_size\": [\n          4,\n          4\n        ],\n        \"non_linearity\": \"linear\"\n      }\n    ]\n  }\n]\n"
  },
  {
    "path": "customlayers.py",
    "content": "'''\nCreated on Jul 25, 2017\n'''\n\nfrom lasagne import layers\nimport theano\nimport theano.tensor as T\n\n\nclass Unpool2DLayer(layers.Layer):\n    \"\"\"\n    This layer performs unpooling over the last two dimensions\n    of a 4D tensor.\n    Layer borrowed from: https://swarbrickjones.wordpress.com/2015/04/29/convolutional-autoencoders-in-pythontheanolasagne/\n    \"\"\"\n\n    def __init__(self, incoming, ds, **kwargs):\n        super(Unpool2DLayer, self).__init__(incoming, **kwargs)\n        self.ds = ds\n\n    def get_output_shape_for(self, input_shape):\n        output_shape = list(input_shape)\n        output_shape[2] = input_shape[2] * self.ds[0]\n        output_shape[3] = input_shape[3] * self.ds[1]\n        return tuple(output_shape)\n\n    def get_output_for(self, incoming, **kwargs):\n        '''\n        Repeats each input element along the last two axes to form the upscaled image\n        '''\n        ds = self.ds\n        return incoming.repeat(ds[0], axis=2).repeat(ds[1], axis=3)\n\n\nclass ClusteringLayer(layers.Layer):\n    '''\n    This layer gives soft assignments for the clusters based on distance from k-means based\n    cluster centers. 
The weights of the layer are the cluster centers, so that they can be learned\n    while optimizing the loss\n    '''\n\n    def __init__(self, incoming, num_clusters, initial_clusters, num_samples, latent_space_dim, **kwargs):\n        super(ClusteringLayer, self).__init__(incoming, **kwargs)\n        self.num_clusters = num_clusters\n        self.W = self.add_param(theano.shared(initial_clusters), initial_clusters.shape, 'W')\n        self.num_samples = num_samples\n        self.latent_space_dim = latent_space_dim\n\n    def get_output_shape_for(self, input_shape):\n        '''\n        Output shape is number of inputs x number of clusters, i.e. for each input the soft assignments\n        corresponding to all clusters\n        '''\n        return (input_shape[0], self.num_clusters)\n\n    def get_output_for(self, incoming, **kwargs):\n        return getSoftAssignments(incoming, self.W, self.num_clusters, self.latent_space_dim, self.num_samples)\n\n\ndef getSoftAssignments(latent_space, cluster_centers, num_clusters, latent_space_dim, num_samples):\n    '''\n    Returns the cluster membership distribution for each sample\n    :param latent_space: latent space representation of inputs\n    :param cluster_centers: the coordinates of cluster centers in latent space\n    :param num_clusters: total number of clusters\n    :param latent_space_dim: dimensionality of latent space\n    :param num_samples: total number of input samples\n    :return: soft assignment based on the equation qij = (1+|zi - uj|^2)^(-1)/sum_j'((1+|zi - uj'|^2)^(-1))\n    '''\n    z_expanded = latent_space.reshape((num_samples, 1, latent_space_dim))\n    z_expanded = T.tile(z_expanded, (1, num_clusters, 1))\n    u_expanded = T.tile(cluster_centers, (num_samples, 1, 1))\n\n    distances_from_cluster_centers = (z_expanded - u_expanded).norm(2, axis=2)\n    qij_numerator = 1 + distances_from_cluster_centers * distances_from_cluster_centers\n    qij_numerator = 1 / qij_numerator\n    normalizer_q = 
qij_numerator.sum(axis=1).reshape((num_samples, 1))\n\n    return qij_numerator / normalizer_q\n"
  },
  {
    "path": "main.py",
    "content": "'''\nCreated on Jul 9, 2017\n'''\nimport numpy\nimport json\nfrom misc import DatasetHelper, evaluateKMeans, visualizeData\nfrom network import DCJC, rootLogger\nfrom copy import deepcopy\nimport argparse\n\n\ndef testOnlyClusterInitialization(dataset_name, arch, epochs):\n    '''\n    Trains an autoencoder defined by the architecture arch on the specified dataset\n    :param dataset_name: Name of the dataset with which the network will be trained [MNIST, COIL20]\n    :param arch: Architecture of the network as a dictionary. Specification for architecture can be found in readme.md\n    :param epochs: Number of training epochs\n    :return: None - (side effect) saves the latent space and params of the trained network in an appropriate location in the saved_params folder\n    '''\n    arch_copy = deepcopy(arch)\n    rootLogger.info(\"Loading dataset\")\n    dataset = DatasetHelper(dataset_name)\n    dataset.loadDataset()\n    rootLogger.info(\"Done loading dataset\")\n    rootLogger.info(\"Creating network\")\n    dcjc = DCJC(arch_copy)\n    rootLogger.info(\"Done creating network\")\n    rootLogger.info(\"Starting training\")\n    dcjc.pretrainWithData(dataset, epochs, False)\n\n\ndef testOnlyClusterImprovement(dataset_name, arch, epochs, method):\n    '''\n    Takes an initialized autoencoder and trains it further with a clustering loss. It is assumed that pretrained autoencoder params\n    are available, i.e. testOnlyClusterInitialization has already been run with the given params\n    :param dataset_name: Name of the dataset with which the network will be trained [MNIST, COIL20]\n    :param arch: Architecture of the network as a dictionary. 
Specification for architecture can be found in readme.md\n    :param epochs: Number of training epochs\n    :param method: Can be KM or KLD, depending on whether the clustering loss is the KL-divergence loss between the current soft assignment distribution (Q) and a more desirable one (Q^2), or just the k-means loss\n    :return: None - (side effect) saves latent space and params of the trained network\n    '''\n    arch_copy = deepcopy(arch)\n    rootLogger.info(\"Loading dataset\")\n    dataset = DatasetHelper(dataset_name)\n    dataset.loadDataset()\n    rootLogger.info(\"Done loading dataset\")\n    rootLogger.info(\"Creating network\")\n    dcjc = DCJC(arch_copy)\n    rootLogger.info(\"Starting cluster improvement\")\n    if method == 'KM':\n        dcjc.doClusteringWithKMeansLoss(dataset, epochs)\n    elif method == 'KLD':\n        dcjc.doClusteringWithKLdivLoss(dataset, True, epochs)\n\n\ndef testKMeans(dataset_name, archs):\n    '''\n    Performs k-means clustering and reports metrics on the latent spaces produced by the networks defined in archs\n    for the given dataset. 
Assumes that testOnlyClusterInitialization and testOnlyClusterImprovement have been run before\n    this for the specified archs/datasets, as the results saved by them are used for clustering\n    :param dataset_name: Name of dataset [MNIST, COIL20]\n    :param archs: Architectures as a dictionary\n    :return: None - reports the accuracy and nmi clustering metrics\n    '''\n    rootLogger.info('Initial Cluster Quality Comparison')\n    rootLogger.info(80 * '_')\n    rootLogger.info('%-50s     %8s     %8s' % ('method', 'ACC', 'NMI'))\n    rootLogger.info(80 * '_')\n    dataset = DatasetHelper(dataset_name)\n    dataset.loadDataset()\n    rootLogger.info(evaluateKMeans(dataset.input_flat, dataset.labels, dataset.getClusterCount(), 'image')[0])\n    for arch in archs:\n        Z = numpy.load('saved_params/' + dataset.name + '/z_' + arch['name'] + '.npy')\n        rootLogger.info(evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), arch['name'])[0])\n        Z = numpy.load('saved_params/' + dataset.name + '/pc_z_' + arch['name'] + '.npy')\n        rootLogger.info(evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), arch['name'])[0])\n        Z = numpy.load('saved_params/' + dataset.name + '/pc_km_z_' + arch['name'] + '.npy')\n        rootLogger.info(evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), arch['name'])[0])\n    rootLogger.info(80 * '_')\n\n\ndef visualizeLatentSpace(dataset_name, arch):\n    '''\n    Plots and saves graphs for the image space, the autoencoder latent space, and the final clustering latent space\n    :param dataset_name: Name of dataset [MNIST, COIL20]\n    :param arch: Architecture as a dictionary\n    :return: None - (side effect) saves graphs in the plots/ folder\n    '''\n    rootLogger.info(\"Loading dataset\")\n    dataset = DatasetHelper(dataset_name)\n    dataset.loadDataset()\n    rootLogger.info(\"Done loading dataset\")\n    # We consider only the first 5000 points or fewer for better visualization\n    
max_points = min(dataset.input_flat.shape[0], 5000)\n    # Image space\n    visualizeData(dataset.input_flat[0:max_points], dataset.labels[0:max_points], dataset.getClusterCount(), \"plots/%s/raw.png\" % dataset.name)\n    # Latent space - autoencoder\n    Z = numpy.load('saved_params/' + dataset.name + '/z_' + arch['name'] + '.npy')\n    visualizeData(Z[0:max_points], dataset.labels[0:max_points], dataset.getClusterCount(), \"plots/%s/autoencoder.png\" % dataset.name)\n    # Latent space - kl div clustering network\n    Z = numpy.load('saved_params/' + dataset.name + '/pc_z_' + arch['name'] + '.npy')\n    visualizeData(Z[0:max_points], dataset.labels[0:max_points], dataset.getClusterCount(), \"plots/%s/clustered_kld.png\" % dataset.name)\n    # Latent space - kmeans clustering network\n    Z = numpy.load('saved_params/' + dataset.name + '/pc_km_z_' + arch['name'] + '.npy')\n    visualizeData(Z[0:max_points], dataset.labels[0:max_points], dataset.getClusterCount(), \"plots/%s/clustered_km.png\" % dataset.name)\n\n\nif __name__ == '__main__':\n    '''\n    usage: main.py [-h] -d DATASET -a ARCHITECTURE [--pretrain PRETRAIN]\n               [--cluster CLUSTER] [--metrics METRICS] [--visualize VISUALIZE]\n        \n    required arguments:\n      -d DATASET, --dataset DATASET\n                            Dataset on which autoencoder is trained [MNIST,COIL20]\n      -a ARCHITECTURE, --architecture ARCHITECTURE\n                            Index of architecture of autoencoder in the json file\n                            (archs/)\n                            \n    optional arguments:\n      -h, --help            show this help message and exit\n      --pretrain PRETRAIN   Pretrain the autoencoder for specified #epochs\n                            specified by architecture on specified dataset\n      --cluster CLUSTER     Refine the autoencoder for specified #epochs with\n                            clustering loss, assumes that pretraining results are\n                   
         available\n      --metrics METRICS     Report k-means clustering metrics on the clustered\n                            latent space, assumes pretrain and cluster based\n                            training have been performed\n      --visualize VISUALIZE\n                            Visualize the image space and latent space, assumes\n                            pretraining and cluster based training have been\n                            performed\n    '''\n    # Load architectures from the json files\n    mnist_archs = []\n    coil_archs = []\n    with open(\"archs/coil.json\") as archs_file:\n        coil_archs = json.load(archs_file)\n    with open(\"archs/mnist.json\") as archs_file:\n        mnist_archs = json.load(archs_file)\n    # Argument parsing\n    parser = argparse.ArgumentParser()\n    requiredArgs = parser.add_argument_group('required arguments')\n    requiredArgs.add_argument(\"-d\", \"--dataset\", help=\"Dataset on which autoencoder is trained [MNIST,COIL20]\", required=True)\n    requiredArgs.add_argument(\"-a\", \"--architecture\", type=int, help=\"Index of architecture of autoencoder in the json file (archs/)\", required=True)\n    parser.add_argument(\"--pretrain\", type=int, help=\"Pretrain the autoencoder for specified #epochs specified by architecture on specified dataset\")\n    parser.add_argument(\"--cluster\", type=int, help=\"Refine the autoencoder for specified #epochs with clustering loss, assumes that pretraining results are available\")\n    parser.add_argument(\"--metrics\", action='store_true', help=\"Report k-means clustering metrics on the clustered latent space, assumes pretrain and cluster based training have been performed\")\n    parser.add_argument(\"--visualize\", action='store_true', help=\"Visualize the image space and latent space, assumes pretraining and cluster based training have been performed\")\n    args = parser.parse_args()\n    # Train/Visualize as per the arguments\n    dataset_name = args.dataset\n  
  arch_index = args.architecture\n    if dataset_name == 'MNIST':\n        archs = mnist_archs\n    elif dataset_name == 'COIL20':\n        archs = coil_archs\n    else:\n        parser.error(\"Unsupported dataset '%s', expected one of [MNIST, COIL20]\" % dataset_name)\n    if args.pretrain:\n        testOnlyClusterInitialization(dataset_name, archs[arch_index], args.pretrain)\n    if args.cluster:\n        testOnlyClusterImprovement(dataset_name, archs[arch_index], args.cluster, \"KLD\")\n    if args.metrics:\n        testKMeans(dataset_name, [archs[arch_index]])\n    if args.visualize:\n        visualizeLatentSpace(dataset_name, archs[arch_index])\n"
  },
  {
    "path": "misc.py",
    "content": "'''\nCreated on Jul 11, 2017\n'''\n\nimport cPickle\nimport gzip\n\nimport numpy as np\nfrom PIL import Image\nimport matplotlib\n\n# For plotting graphs via ssh with no display\n# Ref: https://stackoverflow.com/questions/2801882/generating-a-png-with-matplotlib-when-display-is-undefined\nmatplotlib.use('Agg')\n\nfrom matplotlib import pyplot as plt\nfrom sklearn import metrics\nfrom sklearn.cluster import KMeans\nfrom sklearn import manifold\nfrom sklearn.utils.linear_assignment_ import linear_assignment\n\n\nclass DatasetHelper(object):\n    '''\n    Utility class for handling different datasets\n    '''\n\n    def __init__(self, name):\n        '''\n        A dataset instance keeps the dataset name, the input set, the flat version of the input set\n        and the cluster labels\n        '''\n        self.name = name\n        if name == 'MNIST':\n            self.dataset = MNISTDataset()\n        elif name == 'STL':\n            self.dataset = STLDataset()\n        elif name == 'COIL20':\n            self.dataset = COIL20Dataset()\n\n    def loadDataset(self):\n        '''\n        Load the appropriate dataset based on the dataset name\n        '''\n        self.input, self.labels, self.input_flat = self.dataset.loadDataset()\n\n    def getClusterCount(self):\n        '''\n        Number of clusters in the dataset - e.g. 10 for MNIST, 20 for COIL20\n        '''\n        return self.dataset.cluster_count\n\n    def iterate_minibatches(self, set_type, batch_size, targets=None, shuffle=False):\n        '''\n        Utility method for getting batches out of a dataset\n        :param set_type: IMAGE - suitable input for CNNs or FLAT - suitable for DNN\n        :param batch_size: Size of minibatches\n        :param targets: None if the output should be the same as the inputs (autoencoders), otherwise takes a target array from which batches can be extracted. 
Must have the same order as the dataset, e.g. the nth input sample of the dataset must have its output at the nth element of targets\n        :param shuffle: Whether the dataset should be shuffled\n        :return: generates batches of size batch_size from the dataset, each batch being an (input, output) pair\n        '''\n        inputs = None\n        if set_type == 'IMAGE':\n            inputs = self.input\n            if targets is None:\n                targets = self.input\n        elif set_type == 'FLAT':\n            inputs = self.input_flat\n            if targets is None:\n                targets = self.input_flat\n        assert len(inputs) == len(targets)\n        if shuffle:\n            indices = np.arange(len(inputs))\n            np.random.shuffle(indices)\n        for start_idx in range(0, len(inputs) - batch_size + 1, batch_size):\n            if shuffle:\n                excerpt = indices[start_idx:start_idx + batch_size]\n            else:\n                excerpt = slice(start_idx, start_idx + batch_size)\n            yield inputs[excerpt], targets[excerpt]\n\n\nclass MNISTDataset(object):\n    '''\n    Class for reading and preparing MNIST dataset\n    '''\n\n    def __init__(self):\n        self.cluster_count = 10\n\n    def loadDataset(self):\n        f = gzip.open('mnist/mnist.pkl.gz', 'rb')\n        train_set, _, test_set = cPickle.load(f)\n        train_input, train_input_flat, train_labels = self.prepareDatasetForAutoencoder(train_set[0], train_set[1])\n        test_input, test_input_flat, test_labels = self.prepareDatasetForAutoencoder(test_set[0], test_set[1])\n        f.close()\n        # combine the train and test samples (the validation set is discarded)\n        return [np.concatenate((train_input, test_input)), np.concatenate((train_labels, test_labels)),\n                np.concatenate((train_input_flat, test_input_flat))]\n\n    def prepareDatasetForAutoencoder(self, inputs, targets):\n        '''\n        Returns the image, flat and labels as a tuple\n        '''\n        X = inputs\n   
     X = X.reshape((-1, 1, 28, 28))\n        return (X, X.reshape((-1, 28 * 28)), targets)\n\n\nclass STLDataset(object):\n    '''\n    Class for preparing and reading the STL dataset\n    '''\n\n    def __init__(self):\n        self.cluster_count = 10\n\n    def loadDataset(self):\n        train_x = np.fromfile('stl/train_X.bin', dtype=np.uint8)\n        train_y = np.fromfile('stl/train_y.bin', dtype=np.uint8)\n        test_x = np.fromfile('stl/test_X.bin', dtype=np.uint8)\n        test_y = np.fromfile('stl/test_y.bin', dtype=np.uint8)\n        train_input = np.reshape(train_x, (-1, 3, 96, 96))\n        train_labels = train_y\n        train_input_flat = np.reshape(train_x, (-1, 3 * 96 * 96))\n        test_input = np.reshape(test_x, (-1, 3, 96, 96))\n        test_labels = test_y\n        test_input_flat = np.reshape(test_x, (-1, 3 * 96 * 96))\n        return [np.concatenate((train_input, test_input)), np.concatenate((train_labels, test_labels)),\n                np.concatenate((train_input_flat, test_input_flat))]\n\n\nclass COIL20Dataset(object):\n    '''\n    Class for reading and preparing the COIL20Dataset\n    '''\n\n    def __init__(self):\n        self.cluster_count = 20\n\n    def loadDataset(self):\n        train_x = np.load('coil/coil_X.npy').astype(np.float32) / 256.0\n        train_y = np.load('coil/coil_y.npy')\n        train_x_flat = np.reshape(train_x, (-1, 128 * 128))\n        return [train_x, train_y, train_x_flat]\n\n\ndef rescaleReshapeAndSaveImage(image_sample, out_filename):\n    '''\n    For saving the reconstructed output as an image\n    :param image_sample: output of the autoencoder\n    :param out_filename: filename for the saved image\n    :return: None (side effect) Image saved\n    '''\n    image_sample = ((image_sample - np.amin(image_sample)) / (np.amax(image_sample) - np.amin(image_sample))) * 255\n    image_sample = np.rint(image_sample).astype(int)\n    image_sample = np.clip(image_sample, a_min=0, a_max=255).astype('uint8')\n    
img = Image.fromarray(image_sample, 'L')\n    img.save(out_filename)\n\n\ndef cluster_acc(y_true, y_pred):\n    '''\n    Uses the Hungarian algorithm to find the best permutation mapping between the predicted and true cluster labels,\n    and then calculates the accuracy with respect to this mapping\n    Implementation inspired by https://github.com/piiswrong/dec, since scikit-learn does not implement this metric\n    :param y_true: True cluster labels\n    :param y_pred: Predicted cluster labels\n    :return: accuracy score for the clustering\n    '''\n    D = int(max(y_pred.max(), y_true.max()) + 1)\n    w = np.zeros((D, D), dtype=np.int32)\n    for i in range(y_pred.size):\n        idx1 = int(y_pred[i])\n        idx2 = int(y_true[i])\n        w[idx1, idx2] += 1\n    ind = linear_assignment(w.max() - w)\n    return sum([w[i, j] for i, j in ind]) * 1.0 / y_pred.size\n\n\ndef getClusterMetricString(method_name, labels_true, labels_pred):\n    '''\n    Creates a formatted string containing the method name and the acc, nmi metrics - can be used for printing\n    :param method_name: Name of the clustering method (just for printing)\n    :param labels_true: True label for each sample\n    :param labels_pred: Predicted label for each sample\n    :return: Formatted string containing metrics and method name\n    '''\n    acc = cluster_acc(labels_true, labels_pred)\n    nmi = metrics.normalized_mutual_info_score(labels_true, labels_pred)\n    return '%-50s     %8.3f     %8.3f' % (method_name, acc, nmi)\n\n\ndef evaluateKMeans(data, labels, nclusters, method_name):\n    '''\n    Clusters data with the k-means algorithm, and then returns a string containing the method name and metrics, along with the evaluated cluster centers\n    :param data: Points that need to be clustered as a numpy array\n    :param labels: True labels for the given points\n    :param nclusters: Total number of clusters\n    :param method_name: Name of the method from which the clustering space originates (only used for printing)\n    :return: Formatted string 
containing metrics and method name, cluster centers\n    '''\n    kmeans = KMeans(n_clusters=nclusters, n_init=20)\n    kmeans.fit(data)\n    return getClusterMetricString(method_name, labels, kmeans.labels_), kmeans.cluster_centers_\n\n\ndef visualizeData(Z, labels, num_clusters, title):\n    '''\n    TSNE visualization of the points in latent space Z\n    :param Z: Numpy array containing points in latent space in which clustering was performed\n    :param labels: True labels - used for coloring points\n    :param num_clusters: Total number of clusters\n    :param title: filename where the plot should be saved\n    :return: None - (side effect) saves clustering visualization plot in specified location\n    '''\n    labels = labels.astype(int)\n    tsne = manifold.TSNE(n_components=2, init='pca', random_state=0)\n    Z_tsne = tsne.fit_transform(Z)\n    fig = plt.figure()\n    plt.scatter(Z_tsne[:, 0], Z_tsne[:, 1], s=2, c=labels, cmap=plt.cm.get_cmap(\"jet\", num_clusters))\n    plt.colorbar(ticks=range(num_clusters))\n    fig.savefig(title, dpi=fig.dpi)\n"
  },
  {
    "path": "network.py",
    "content": "'''\nCreated on Jul 11, 2017\n'''\n\nfrom datetime import datetime\nimport logging\n\nfrom lasagne import layers\nimport lasagne\nfrom lasagne.layers.helper import get_all_layers\nimport theano\nimport signal\n\nfrom customlayers import ClusteringLayer, Unpool2DLayer, getSoftAssignments\nfrom misc import evaluateKMeans, visualizeData, rescaleReshapeAndSaveImage\nimport numpy as np\nimport theano.tensor as T\n\nfrom lasagne.layers import batch_norm\n\n# Logging utilities - logs get saved in folder logs named by date and time, and also output\n# at standard output\n\nlogFormatter = logging.Formatter(\"[%(asctime)s]  %(message)s\", datefmt='%m/%d %I:%M:%S')\n\nrootLogger = logging.getLogger()\nrootLogger.setLevel(logging.DEBUG)\n\nfileHandler = logging.FileHandler(datetime.now().strftime('logs/dcjc_%H_%M_%d_%m.log'))\nfileHandler.setFormatter(logFormatter)\nrootLogger.addHandler(fileHandler)\n\nconsoleHandler = logging.StreamHandler()\nconsoleHandler.setFormatter(logFormatter)\nrootLogger.addHandler(consoleHandler)\n\n\nclass DCJC(object):\n    # Main class holding autoencoder network and training functions\n    def __init__(self, network_description):\n\n        signal.signal(signal.SIGINT, self.signal_handler)\n        self.name = network_description['name']\n        netbuilder = NetworkBuilder(network_description)\n        self.shouldStopNow  = False\n        # Get the lasagne network using the network builder class that creates autoencoder with the specified architecture\n        self.network = netbuilder.buildNetwork()\n        self.encode_layer, self.encode_size = netbuilder.getEncodeLayerAndSize()\n        self.t_input, self.t_target = netbuilder.getInputAndTargetVars()\n        self.input_type = netbuilder.getInputType()\n        self.batch_size = netbuilder.getBatchSize()\n        rootLogger.info(\"Network: \" + self.networkToStr())\n        # Reconstruction is just output of the network\n        recon_prediction_expression = 
layers.get_output(self.network)\n        # Latent/Encoded space is the output of the bottleneck/encode layer\n        encode_prediction_expression = layers.get_output(self.encode_layer, deterministic=True)\n        # Loss for autoencoder = reconstruction loss + weight decay regularizer\n        loss = self.getReconstructionLossExpression(recon_prediction_expression, self.t_target)\n        weightsl2 = lasagne.regularization.regularize_network_params(self.network, lasagne.regularization.l2)\n        loss += (5e-5 * weightsl2)\n        params = lasagne.layers.get_all_params(self.network, trainable=True)\n        # SGD with momentum + Decaying learning rate\n        self.learning_rate = theano.shared(lasagne.utils.floatX(0.01))\n        updates = lasagne.updates.nesterov_momentum(loss, params, learning_rate=self.learning_rate)\n        # Theano functions for calculating loss, predicting reconstruction, encoding\n        self.trainAutoencoder = theano.function([self.t_input, self.t_target], loss, updates=updates)\n        self.predictReconstruction = theano.function([self.t_input], recon_prediction_expression)\n        self.predictEncoding = theano.function([self.t_input], encode_prediction_expression)\n\n    def getReconstructionLossExpression(self, prediction_expression, t_target):\n        '''\n        Reconstruction loss = mean squared error between the input and the reconstructed input\n        '''\n        loss = lasagne.objectives.squared_error(prediction_expression, t_target)\n        loss = loss.mean()\n        return loss\n\n    def signal_handler(self, signal, frame):\n\n        command = raw_input('\\nWhat is your command?')\n        if str(command).lower() == \"stop\":\n            self.shouldStopNow = True\n        else:\n            exec(command)\n\n    def pretrainWithData(self, dataset, epochs, continue_training=False):\n        '''\n        Pretrains the autoencoder on the given dataset\n        :param dataset: Data on which the autoencoder is trained\n        
:param epochs: number of training epochs\n        :param continue_training: Resume training if saved params are available\n        :return: None - (side effect) saves the trained network params and latent space in appropriate location\n        '''\n        batch_size = self.batch_size\n        # array for holding the latent space representation of the input\n        Z = np.zeros((dataset.input.shape[0], self.encode_size), dtype=np.float32)\n        # in case we're continuing training, load the saved network params\n        if continue_training:\n            with np.load('saved_params/%s/m_%s.npz' % (dataset.name, self.name)) as f:\n                param_values = [f['arr_%d' % i] for i in range(len(f.files))]\n                lasagne.layers.set_all_param_values(self.network, param_values, trainable=True)\n        for epoch in range(epochs):\n            error = 0\n            total_batches = 0\n            for batch in dataset.iterate_minibatches(self.input_type, batch_size, shuffle=True):\n                inputs, targets = batch\n                error += self.trainAutoencoder(inputs, targets)\n                total_batches += 1\n            # learning rate decay\n            self.learning_rate.set_value(self.learning_rate.get_value() * lasagne.utils.floatX(0.9999))\n            # Every 2nd epoch, print the clustering accuracy and nmi - for checking if the network\n            # is actually doing something meaningful - the labels are never used for training\n            if (epoch + 1) % 2 == 0:\n                for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)):\n                    Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0])\n                    # Uncomment the next two lines to create reconstruction outputs in folder dumps/ (may need to be created)\n                    #for i, x in enumerate(self.predictReconstruction(batch[0])):\n                    #\tprint('dump')\n                    
#\trescaleReshapeAndSaveImage(x[0], \"dumps/%02d%03d.jpg\"%(epoch,i));\n                rootLogger.info(evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), \"%d/%d [%.4f]\" % (epoch + 1, epochs, error / total_batches))[0])\n            else:\n                # Just report the training loss\n                rootLogger.info(\"%-30s     %8s     %8s\" % (\"%d/%d [%.4f]\" % (epoch + 1, epochs, error / total_batches), \"\", \"\"))\n            if self.shouldStopNow:\n            \tbreak\n        # The inputs in latent space after pretraining\n        for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)):\n            Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0])\n        # Save network params and latent space\n        np.save('saved_params/%s/z_%s.npy' % (dataset.name, self.name), Z)\n        # Borrowed from mnist lasagne example\n        np.savez('saved_params/%s/m_%s.npz' % (dataset.name, self.name), *lasagne.layers.get_all_param_values(self.network, trainable=True))\n\n    def doClusteringWithKLdivLoss(self, dataset, combined_loss, epochs):\n        '''\n        Trains the autoencoder with combined kldivergence loss and reconstruction loss, or just the kldivergence loss\n        At the moment does not give good results\n        :param dataset: Data on which the autoencoder is trained\n        :param combined_loss: boolean - whether to use both reconstruction and kl divergence loss or just kldivergence loss\n        :param epochs: Number of training epochs\n        :return: None - (side effect) saves the trained network params and latent space in appropriate location\n        '''\n        batch_size = self.batch_size\n        # Load saved network params and inputs in latent space obtained after pretraining\n        with np.load('saved_params/%s/m_%s.npz' % (dataset.name, self.name)) as f:\n            param_values = [f['arr_%d' % i] for i in range(len(f.files))]\n            
lasagne.layers.set_all_param_values(self.network, param_values, trainable=True)\n        Z = np.load('saved_params/%s/z_%s.npy' % (dataset.name, self.name))\n        # Find initial cluster centers\n        quality_desc, cluster_centers = evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), 'Initial')\n        rootLogger.info(quality_desc)\n        # P is the more pure target distribution we want to achieve\n        P = T.matrix('P')\n        # Extend the network so it calculates soft assignment cluster distribution for the inputs in latent space\n        clustering_network = ClusteringLayer(self.encode_layer, dataset.getClusterCount(), cluster_centers, batch_size,self.encode_size)\n        soft_assignments = layers.get_output(clustering_network)\n        reconstructed_output_exp = layers.get_output(self.network)\n        # Clustering loss = kl divergence between the pure distribution P and current distribution\n        clustering_loss = self.getKLDivLossExpression(soft_assignments, P)\n        reconstruction_loss = self.getReconstructionLossExpression(reconstructed_output_exp, self.t_target)\n        params_ae = lasagne.layers.get_all_params(self.network, trainable=True)\n        params_dec = lasagne.layers.get_all_params(clustering_network, trainable=True)\n        # Total loss = weighted sum of the two losses\n        w_cluster_loss = 1\n        w_reconstruction_loss = 1\n        total_loss = w_cluster_loss * clustering_loss\n        if (combined_loss):\n            total_loss = total_loss + w_reconstruction_loss * reconstruction_loss\n        all_params = params_dec\n        if combined_loss:\n            all_params.extend(params_ae)\n        # Parameters = unique parameters in the new network\n        all_params = list(set(all_params))\n        # SGD with momentum, LR = 0.01, Momentum = 0.9\n        updates = lasagne.updates.nesterov_momentum(total_loss, all_params, learning_rate=0.01)\n        # Function to calculate the soft assignment distribution\n  
      get_soft_assignments = theano.function([self.t_input], soft_assignments)  # local name avoids shadowing the imported getSoftAssignments\n        # Train function - based on whether the combined loss is used or not\n        trainFunction = None\n        if combined_loss:\n            trainFunction = theano.function([self.t_input, self.t_target, P], total_loss, updates=updates)\n        else:\n            trainFunction = theano.function([self.t_input, P], clustering_loss, updates=updates)\n        for epoch in range(epochs):\n            # Get the current distribution\n            qij = np.zeros((dataset.input.shape[0], dataset.getClusterCount()), dtype=np.float32)\n            for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)):\n                qij[i * batch_size: (i + 1) * batch_size] = get_soft_assignments(batch[0])\n            # Calculate the desired distribution\n            pij = self.calculateP(qij)\n            error = 0\n            total_batches = 0\n            for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, pij, shuffle=True)):\n                if (combined_loss):\n                    error += trainFunction(batch[0], batch[0], batch[1])\n                else:\n                    error += trainFunction(batch[0], batch[1])\n                total_batches += 1\n            for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)):\n                Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0])\n            # For every 10th epoch, print the clustering accuracy and nmi - for checking if the network\n            # is actually doing something meaningful - the labels are never used for training\n            if (epoch + 1) % 10 == 0:\n                rootLogger.info(evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), \"%d [%.4f]\" % (\n                    epoch + 1, error / total_batches))[0])\n            if self.shouldStopNow:\n                break\n        # Save the 
inputs in latent space and the network parameters\n        for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)):\n            Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0])\n        np.save('saved_params/%s/pc_z_%s.npy' % (dataset.name, self.name), Z)\n        np.savez('saved_params/%s/pc_m_%s.npz' % (dataset.name, self.name),\n                 *lasagne.layers.get_all_param_values(self.network, trainable=True))\n\n    def calculateP(self, Q):\n        # Function to calculate the desired distribution Q^2, for more details refer to DEC paper\n        f = Q.sum(axis=0)\n        pij_numerator = Q * Q\n        pij_numerator = pij_numerator / f\n        normalizer_p = pij_numerator.sum(axis=1).reshape((Q.shape[0], 1))\n        P = pij_numerator / normalizer_p\n        return P\n\n    def getKLDivLossExpression(self, Q_expression, P_expression):\n        # Loss = KL Divergence between the two distributions\n        log_arg = P_expression / Q_expression\n        log_exp = T.log(log_arg)\n        sum_arg = P_expression * log_exp\n        loss = sum_arg.sum(axis=1).sum(axis=0)\n        return loss\n\n    def doClusteringWithKMeansLoss(self, dataset, epochs):\n        '''\n        Trains the autoencoder with combined kMeans loss and reconstruction loss\n        At the moment does not give good results\n        :param dataset: Data on which the autoencoder is trained\n        :param epochs: Number of training epochs\n        :return: None - (side effect) saves the trained network params and latent space in appropriate location\n        '''\n        batch_size = self.batch_size\n        # Load the inputs in latent space produced by the pretrained autoencoder and use it to initialize cluster centers\n        Z = np.load('saved_params/%s/z_%s.npy' % (dataset.name, self.name))\n        quality_desc, cluster_centers = evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), 'Initial')\n        
rootLogger.info(quality_desc)\n        # Load network parameters - code borrowed from the mnist lasagne example\n        with np.load('saved_params/%s/m_%s.npz' % (dataset.name, self.name)) as f:\n            param_values = [f['arr_%d' % i] for i in range(len(f.files))]\n            lasagne.layers.set_all_param_values(self.network, param_values, trainable=True)\n        # reconstruction loss is just the mean squared error between the input and the reconstructed input\n        reconstruction_loss = self.getReconstructionLossExpression(layers.get_output(self.network), self.t_target)\n        # extend the network to do soft cluster assignments\n        clustering_network = ClusteringLayer(self.encode_layer, dataset.getClusterCount(), cluster_centers, batch_size, self.encode_size)\n        soft_assignments = layers.get_output(clustering_network)\n        # k-means loss is the sum of distances from the cluster centers weighted by the soft assignments to the clusters\n        kmeansLoss = self.getKMeansLoss(layers.get_output(self.encode_layer), soft_assignments, clustering_network.W, dataset.getClusterCount(), self.encode_size, batch_size)\n        params = lasagne.layers.get_all_params(self.network, trainable=True)\n        # total loss = reconstruction loss + lambda * kmeans loss\n        weight_reconstruction = 1\n        weight_kmeans = 0.1\n        total_loss = weight_kmeans * kmeansLoss + weight_reconstruction * reconstruction_loss\n        updates = lasagne.updates.nesterov_momentum(total_loss, params, learning_rate=0.01)\n        trainKMeansWithAE = theano.function([self.t_input, self.t_target], total_loss, updates=updates)\n        for epoch in range(epochs):\n            error = 0\n            total_batches = 0\n            for batch in dataset.iterate_minibatches(self.input_type, batch_size, shuffle=True):\n                inputs, targets = batch\n                error += trainKMeansWithAE(inputs, targets)\n                total_batches += 1\n            # For every 10th epoch, update 
the cluster centers and print the clustering accuracy and nmi - for checking if the network\n            # is actually doing something meaningful - the labels are never used for training\n            if (epoch + 1) % 10 == 0:\n                for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)):\n                    Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0])\n                quality_desc, cluster_centers = evaluateKMeans(Z, dataset.labels, dataset.getClusterCount(), \"%d/%d [%.4f]\" % (epoch + 1, epochs, error / total_batches))\n                rootLogger.info(quality_desc)\n            else:\n                # Just print the training loss\n                rootLogger.info(\"%-30s     %8s     %8s\" % (\"%d/%d [%.4f]\" % (epoch + 1, epochs, error / total_batches), \"\", \"\"))\n            if self.shouldStopNow:\n                break\n\n        # Save the inputs in latent space and the network parameters\n        for i, batch in enumerate(dataset.iterate_minibatches(self.input_type, batch_size, shuffle=False)):\n            Z[i * batch_size:(i + 1) * batch_size] = self.predictEncoding(batch[0])\n        np.save('saved_params/%s/pc_km_z_%s.npy' % (dataset.name, self.name), Z)\n        np.savez('saved_params/%s/pc_km_m_%s.npz' % (dataset.name, self.name),\n                 *lasagne.layers.get_all_param_values(self.network, trainable=True))\n\n\n    def getKMeansLoss(self, latent_space_expression, soft_assignments, t_cluster_centers, num_clusters, latent_space_dim, num_samples, soft_loss=False):\n        # k-means loss = weighted sum of the distances of the latent space representations of the inputs from the cluster centers\n        z = latent_space_expression.reshape((num_samples, 1, latent_space_dim))\n        z = T.tile(z, (1, num_clusters, 1))\n        u = t_cluster_centers.reshape((1, num_clusters, latent_space_dim))\n        u = T.tile(u, (num_samples, 1, 1))\n        distances = (z - u).norm(2, 
axis=2).reshape((num_samples, num_clusters))\n        if soft_loss:\n            weighted_distances = distances * soft_assignments\n            loss = weighted_distances.sum(axis=1).mean()\n        else:\n            loss = distances.min(axis=1).mean()\n        return loss\n\n    def networkToStr(self):\n        # Utility method for printing the network structure in a shortened form\n        layers = lasagne.layers.get_all_layers(self.network)\n        result = ''\n        for layer in layers:\n            t = type(layer)\n            if t is lasagne.layers.input.InputLayer:\n                pass\n            else:\n                result += ' ' + layer.name\n        return result.strip()\n\n\nclass NetworkBuilder(object):\n    '''\n    Class that handles parsing the architecture dictionary and creating an autoencoder out of it\n    '''\n\n    def __init__(self, network_description):\n        '''\n        :param network_description: python dictionary specifying the autoencoder architecture\n        '''\n        # Populate the missing values in the dictionary with defaults, also add the missing decoder part\n        # of the autoencoder which is missing in the dictionary\n        self.network_description = self.populateMissingDescriptions(network_description)\n        # Create theano variables for input and output - would be of different types for simple and convolutional autoencoders\n        if self.network_description['network_type'] == 'CAE':\n            self.t_input = T.tensor4('input_var')\n            self.t_target = T.tensor4('target_var')\n            self.input_type = \"IMAGE\"\n        else:\n            self.t_input = T.matrix('input_var')\n            self.t_target = T.matrix('target_var')\n            self.input_type = \"FLAT\"\n        self.network_type = self.network_description['network_type']\n        self.batch_norm = bool(self.network_description[\"use_batch_norm\"])\n        self.layer_list = []\n\n    def getBatchSize(self):\n        return 
self.network_description[\"batch_size\"]\n\n    def getInputAndTargetVars(self):\n        return self.t_input, self.t_target\n\n    def getInputType(self):\n        return self.input_type\n\n    def buildNetwork(self):\n        '''\n        :return: Lasagne autoencoder network based on the network description dictionary\n        '''\n        network = None\n        for layer in self.network_description['layers']:\n            network = self.processLayer(network, layer)\n        return network\n\n    def getEncodeLayerAndSize(self):\n        '''\n        :return: The encode (bottleneck) layer - the layer between encoder and decoder - and its flattened size\n        '''\n        return self.encode_layer, self.encode_size\n\n    def populateDecoder(self, encode_layers):\n        '''\n        Creates a specification for the mirror of the encode layers, which completes the autoencoder specification\n        '''\n        decode_layers = []\n        for i, layer in reversed(list(enumerate(encode_layers))):\n            if (layer[\"type\"] == \"MaxPool2D*\"):\n                # Inverse max pool doesn't just upscale the input; it reverses the original max-pool\n                # operation, restoring values to the positions the maxima were taken from\n                decode_layers.append({\n                    \"type\": \"InverseMaxPool2D\",\n                    \"layer_index\": i,\n                    'filter_size': layer['filter_size']\n                })\n            elif (layer[\"type\"] == \"MaxPool2D\"):\n                # Unpool simply upscales the input back to the pre-pooling size\n                decode_layers.append({\n                    \"type\": \"Unpool2D\",\n                    'filter_size': layer['filter_size']\n                })\n            elif (layer[\"type\"] == \"Conv2D\"):\n                # The inverse of a convolution is a deconvolution (transposed convolution)\n                decode_layers.append({\n                    'type': 'Deconv2D',\n                    'conv_mode': layer['conv_mode'],\n                    'non_linearity': layer['non_linearity'],\n                    
'filter_size': layer['filter_size'],\n                    'num_filters': encode_layers[i - 1]['output_shape'][0]\n                })\n            elif (layer[\"type\"] == \"Dense\" and not layer[\"is_encode\"]):\n                # The inverse of a dense layer is just a dense layer; no inverse layer is created for the bottleneck layer itself\n                decode_layers.append({\n                    'type': 'Dense',\n                    'num_units': encode_layers[i]['output_shape'][2],\n                    'non_linearity': encode_layers[i]['non_linearity']\n                })\n                # if the corresponding encoder layer is preceded by a convolutional or pooling layer,\n                # the flat output must be reshaped back into a feature map\n                if (encode_layers[i - 1]['type'] in (\"Conv2D\", \"MaxPool2D\", \"MaxPool2D*\")):\n                    decode_layers.append({\n                        \"type\": \"Reshape\",\n                        \"output_shape\": encode_layers[i - 1]['output_shape']\n                    })\n        encode_layers.extend(decode_layers)\n\n    def populateShapes(self, layers):\n        # Fills each layer's entry with shape information, which is later used when creating the decode layers\n        last_layer_dimensions = layers[0]['output_shape']\n        for layer in layers[1:]:\n            if (layer['type'] == 'MaxPool2D' or layer['type'] == 'MaxPool2D*'):\n                layer['output_shape'] = [last_layer_dimensions[0], last_layer_dimensions[1] / layer['filter_size'][0],\n                                         last_layer_dimensions[2] / layer['filter_size'][1]]\n            elif (layer['type'] == 'Conv2D'):\n                multiplier = 1\n                if (layer['conv_mode'] == \"same\"):\n                    multiplier = 0\n                layer['output_shape'] = [layer['num_filters'],\n                                         last_layer_dimensions[1] - (layer['filter_size'][0] - 1) * multiplier,\n                                         
last_layer_dimensions[2] - (layer['filter_size'][1] - 1) * multiplier]\n            elif (layer['type'] == 'Dense'):\n                layer['output_shape'] = [1, 1, layer['num_units']]\n            last_layer_dimensions = layer['output_shape']\n\n    def populateMissingDescriptions(self, network_description):\n        # Complete the architecture dictionary by filling in default values and populating the description for the decoder\n        if 'network_type' not in network_description:\n            if (network_description['name'].split('_')[0].split('-')[0] == 'fc'):\n                network_description['network_type'] = 'AE'\n            else:\n                network_description['network_type'] = 'CAE'\n        for layer in network_description['layers']:\n            if 'conv_mode' not in layer:\n                layer['conv_mode'] = 'valid'\n            layer['is_encode'] = False\n        network_description['layers'][-1]['is_encode'] = True\n        if 'output_non_linearity' not in network_description:\n            network_description['output_non_linearity'] = network_description['layers'][1]['non_linearity']\n        self.populateShapes(network_description['layers'])\n        self.populateDecoder(network_description['layers'])\n        if 'use_batch_norm' not in network_description:\n            network_description['use_batch_norm'] = False\n        for layer in network_description['layers']:\n            if 'is_encode' not in layer:\n                layer['is_encode'] = False\n            layer['is_output'] = False\n        network_description['layers'][-1]['is_output'] = True\n        network_description['layers'][-1]['non_linearity'] = network_description['output_non_linearity']\n        return network_description\n\n    def getInitializationFct(self):\n        return lasagne.init.GlorotUniform()\n\n    def processLayer(self, network, layer_definition):\n        '''\n        Create a lasagne layer corresponding to the \"layer definition\"\n        '''\n        if (layer_definition[\"type\"] == \"Input\"):\n            if self.network_type == 'CAE':\n                network = lasagne.layers.InputLayer(shape=tuple([None] + layer_definition['output_shape']), input_var=self.t_input)\n            elif self.network_type == 'AE':\n                network = lasagne.layers.InputLayer(shape=(None, layer_definition['output_shape'][2]), input_var=self.t_input)\n        elif (layer_definition['type'] == 'Dense'):\n            network = lasagne.layers.DenseLayer(network, num_units=layer_definition['num_units'], nonlinearity=self.getNonLinearity(layer_definition['non_linearity']), name=self.getLayerName(layer_definition), W=self.getInitializationFct())\n        elif (layer_definition['type'] == 'Conv2D'):\n            network = lasagne.layers.Conv2DLayer(network, num_filters=layer_definition['num_filters'], filter_size=tuple(layer_definition[\"filter_size\"]), pad=layer_definition['conv_mode'], nonlinearity=self.getNonLinearity(layer_definition['non_linearity']), name=self.getLayerName(layer_definition), W=self.getInitializationFct())\n        elif (layer_definition['type'] == 'MaxPool2D' or layer_definition['type'] == 'MaxPool2D*'):\n            network = lasagne.layers.MaxPool2DLayer(network, pool_size=tuple(layer_definition[\"filter_size\"]), name=self.getLayerName(layer_definition))\n        elif (layer_definition['type'] == 'InverseMaxPool2D'):\n            network = lasagne.layers.InverseLayer(network, self.layer_list[layer_definition['layer_index']], name=self.getLayerName(layer_definition))\n        elif (layer_definition['type'] == 'Unpool2D'):\n            network = Unpool2DLayer(network, tuple(layer_definition['filter_size']), name=self.getLayerName(layer_definition))\n        elif (layer_definition['type'] == 'Reshape'):\n            network = lasagne.layers.ReshapeLayer(network, shape=tuple([-1] + layer_definition[\"output_shape\"]), name=self.getLayerName(layer_definition))\n        elif (layer_definition['type'] == 
'Deconv2D'):\n            network = lasagne.layers.Deconv2DLayer(network, num_filters=layer_definition['num_filters'], filter_size=tuple(layer_definition['filter_size']), crop=layer_definition['conv_mode'], nonlinearity=self.getNonLinearity(layer_definition['non_linearity']), name=self.getLayerName(layer_definition))\n        self.layer_list.append(network)\n        # Apply batch normalization to all convolutional layers except the output layer\n        if (self.batch_norm and (not layer_definition[\"is_output\"]) and layer_definition['type'] in (\"Conv2D\", \"Deconv2D\")):\n            network = batch_norm(network)\n        # Save the encode layer separately\n        if (layer_definition['is_encode']):\n            self.encode_layer = lasagne.layers.flatten(network, name='fl')\n            self.encode_size = layer_definition['output_shape'][0] * layer_definition['output_shape'][1] * layer_definition['output_shape'][2]\n        return network\n\n    def getLayerName(self, layer_definition):\n        '''\n        Utility method to name layers\n        '''\n        if (layer_definition['type'] == 'Dense'):\n            return 'fc[{}]'.format(layer_definition['num_units'])\n        elif (layer_definition['type'] == 'Conv2D'):\n            return '{}[{}]'.format(layer_definition['num_filters'],\n                                   'x'.join([str(fs) for fs in layer_definition['filter_size']]))\n        elif (layer_definition['type'] == 'MaxPool2D' or layer_definition['type'] == 'MaxPool2D*'):\n            return 'max[{}]'.format('x'.join([str(fs) for fs in layer_definition['filter_size']]))\n        elif (layer_definition['type'] == 'InverseMaxPool2D'):\n            return 'ups*[{}]'.format('x'.join([str(fs) for fs in layer_definition['filter_size']]))\n        elif (layer_definition['type'] == 'Unpool2D'):\n            return 'ups[{}]'.format('x'.join([str(fs) for fs in layer_definition['filter_size']]))\n        elif 
(layer_definition['type'] == 'Deconv2D'):\n            return '{}[{}]'.format(layer_definition['num_filters'],\n                                   'x'.join([str(fs) for fs in layer_definition['filter_size']]))\n        elif (layer_definition['type'] == 'Reshape'):\n            return \"rsh\"\n\n    def getNonLinearity(self, non_linearity):\n        return {\n            'rectify': lasagne.nonlinearities.rectify,\n            'linear': lasagne.nonlinearities.linear,\n            'elu': lasagne.nonlinearities.elu\n        }[non_linearity]\n"
  }
]