Repository: mkocaoglu/CausalGAN
Branch: master
Commit: 9d52b520b5ef
Files: 51
Total size: 288.4 KB

Directory structure:
gitextract_fvn7v_0h/

├── .gitignore
├── LICENSE
├── README.md
├── assets/
│   ├── 0808_112404_cbcg.csv
│   ├── 0810_191625_bcg.csv
│   ├── 0821_213901_rcbcg.csv
│   ├── guide_to_gifs.txt
│   └── tvdplot.ipynb
├── causal_began/
│   ├── CausalBEGAN.py
│   ├── __init__.py
│   ├── config.py
│   ├── models.py
│   └── utils.py
├── causal_controller/
│   ├── ArrayDict.py
│   ├── CausalController.py
│   ├── __init__.py
│   ├── config.py
│   ├── models.py
│   └── utils.py
├── causal_dcgan/
│   ├── CausalGAN.py
│   ├── __init__.py
│   ├── config.py
│   ├── models.py
│   ├── ops.py
│   └── utils.py
├── causal_graph.py
├── config.py
├── data_loader.py
├── download.py
├── figure_scripts/
│   ├── __init__.py
│   ├── distributions.py
│   ├── encode.py
│   ├── high_level.py
│   ├── pairwise.py
│   ├── probability_table.txt
│   ├── sample.py
│   └── utils.py
├── main.py
├── synthetic/
│   ├── README.md
│   ├── collect_stats.py
│   ├── config.py
│   ├── figure_generation.ipynb
│   ├── main.py
│   ├── models.py
│   ├── run_datasets.sh
│   ├── tboard.py
│   ├── trainer.py
│   └── utils.py
├── tboard.py
├── trainer.py
└── utils.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
data/
data
.*.swp

logs
old

final_checkpoints
checkpoint/
figures/
*.pyc
.DS_Store
.ipynb_checkpoints
[._]*.s[a-v][a-z]
[._]*.sw[a-p]
[._]s[a-v][a-z]
[._]sw[a-p]

samples
outputs


================================================
FILE: LICENSE
================================================
MIT License

Copyright (c) 2017 Murat Kocaoglu, Christopher Snyder

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


================================================
FILE: README.md
================================================
# CausalGAN/CausalBEGAN in Tensorflow

Tensorflow implementation of [CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training](https://arxiv.org/abs/1709.02023)

### Top: Random samples from do(Bald=1); Bottom: Random samples from cond(Bald=1)
![alt text](./assets/314393_began_Bald_topdo1_botcond1.png)
### Top: Random samples from do(Mustache=1); Bottom: Random samples from cond(Mustache=1)
![alt text](./assets/314393_began_Mustache_topdo1_botcond1.png)


## Requirements
- Python 2.7
- [Pillow](https://pillow.readthedocs.io/en/4.0.x/)
- [tqdm](https://github.com/tqdm/tqdm)
- [requests](https://github.com/kennethreitz/requests) (Only used for downloading CelebA dataset)
- [TensorFlow 1.1.0](https://github.com/tensorflow/tensorflow)

## Getting Started

First download [CelebA](http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html) datasets with:

    $ apt-get install p7zip-full # ubuntu
    $ brew install p7zip # Mac
    $ pip install tqdm
    $ python download.py

## Usage

The CausalGAN/CausalBEGAN code factorizes into two components, which can be trained or loaded independently: the causal_controller module specifies the model which learns a causal generative model over labels, and the causal_dcgan or causal_began modules learn a GAN over images given those labels. We denote training the causal controller over labels as "pretraining" (--is_pretrain=True), and training a GAN over images given labels as "training" (--is_train=True)

To train a causal implicit model over labels and then over the image given the labels use

    $ python main.py --causal_model big_causal_graph --is_pretrain True --model_type began --is_train True

where "big_causal_graph" is one of the causal graphs specified by the keys in the causal_graphs dictionary in causal_graph.py. 

Alternatively, one can first train a causal implicit model over labels only with the following command:

    $ python main.py --causal_model big_causal_graph --is_pretrain True

One can then train a conditional generative model for the images given the trained causal generative model for the labels (causal controller), which yields a causal implicit generative model for the image and the labels, as suggested in [arXiv link to the paper]:

    $ echo CC-MODEL_PATH='./logs/celebA_0810_191625_0.145tvd_bcg/controller/checkpoints/CC-Model-20000'
    $ python main.py --causal_model big_causal_graph --pt_load_path $CC-MODEL_PATH --model_type began --is_train True 

Instead of loading the model piecewise, once image training has been run once, the entire joint model can be loaded more simply by specifying the model directory:

    $ python main.py --causal_model big_causal_graph --load_path ./logs/celebA_0815_170635 --model_type began --is_train True 

Tensorboard visualization of the most recently created model is simply (as long as port 6006 is free):

    $ python tboard.py


To interact with an already trained model I recommend the following procedure:

    ipython
    In [1]: %run main --causal_model big_causal_graph --load_path './logs/celebA_0815_170635' --model_type 'began'

For example to sample N=22 interventional images from do(Smiling=1) (as long as your causal graph includes a "Smiling" node:

    In [2]: sess.run(model.G,{cc.Smiling.label:np.ones((22,1), trainer.batch_size:22})

Conditional sampling is most efficiently done through 2 session calls: the first to cc.sample_label to get, and the second feeds that sampled label to get an image. See trainer.causal_sampling for a more extensive example. Note that is also possible combine conditioning and intervention during sampling.

    In [3]: lab_samples=cc.sample_label(sess,do_dict={'Bald':1}, cond_dict={'Mustache':1},N=22)

will sample all labels from the joint distribution conditioned on Mustache=1 and do(Bald=1). These label samples can be turned into image samples as follows:

    In [4]: feed_dict={cc.label_dict[k]:v for k,v in lab_samples.iteritems()}
    In [5]: feed_dict[trainer.batch_size]=22
    In [6]: images=sess.run(trainer.G,feed_dict)


### Configuration
Since this really controls training of 3 different models (CausalController, CausalGAN, and CausalBEGAN), many configuration options are available. To make things managable, there are 4 files corresponding to configurations specific to different parts of the model. Not all configuration combinations are tested. Default parameters are gauranteed to work.

configurations:
./config.py  :  generic data and scheduling
./causal_controller/config  :  specific to CausalController
./causal_dcgan/config  :  specific to CausalGAN
./causal_began/config  :  specific to CausalBEGAN

For convenience, the configurations used are saved in 4 .json files in the model directory for future reference.


## Results

### Causal Controller convergence
We show convergence in TVD for Causal Graph 1 (big_causal_graph in causal_graph.py), a completed version of Causal Graph 1 (complete_big_causal_graph in causal_graph.py, and an edge reversed version of the complete Causal Graph 1 (reverse_big_causal_graph in causal_graph.py). We could get reasonable marginals with a complete DAG containing all 40 nodes, but TVD becomes very difficult to measure. We show TVD convergence for 9 nodes for two complete graphs. When the graph is incomplete, there is a "TVD gap" but reasonable convergence.

![alt text](./assets/tvd_vs_step.png)

### Conditional vs Interventional Sampling:
We trained a causal implicit generative model assuming we are given the following causal graph over labels:
For the following images when we condition or intervene, these operations can be reasoned about from the graph structure. e.g., conditioning on mustache=1 should give more male whereas intervening should not (since the edges from the parents are disconnected in an intervention).

### CausalGAN Conditioning vs Intervening
For each label, images were randomly sampled by either _intervening_ (top row) or _conditioning_ (bottom row) on label=1.

![alt text](./assets/causalgan_pictures/45507_intvcond_Bald=1_2x10.png) Bald

![alt text](./assets/causalgan_pictures/45507_intvcond_Mouth_Slightly_Open=1_2x10.png) Mouth Slightly Open

![alt text](./assets/causalgan_pictures/45507_intvcond_Mustache=1_2x10.png) Mustache

![alt text](./assets/causalgan_pictures/45507_intvcond_Narrow_Eyes=1_2x10.png) Narrow Eyes

![alt text](./assets/causalgan_pictures/45507_intvcond_Smiling=1_2x10.png) Smiling

![alt text](./assets/causalgan_pictures/45507_intvcond_Eyeglasses=1_2x10.png) Eyeglasses

![alt text](./assets/causalgan_pictures/45507_intvcond_Wearing_Lipstick=1_2x10.png) Wearing Lipstick

### CausalBEGAN Conditioning vs Intervening
For each label, images were randomly sampled by either _intervening_ (top row) or _conditioning_ (bottom row) on label=1.

![alt text](./assets/causalbegan_pictures/190001_intvcond_Bald=1_2x10.png) Bald

![alt text](./assets/causalbegan_pictures/190001_intvcond_Mouth_Slightly_Open=1_2x10.png) Mouth Slightly Open

![alt text](./assets/causalbegan_pictures/190001_intvcond_Mustache=1_2x10.png) Mustache

![alt text](./assets/causalbegan_pictures/190001_intvcond_Narrow_Eyes=1_2x10.png) Narrow Eyes

![alt text](./assets/causalbegan_pictures/190001_intvcond_Smiling=1_2x10.png) Smiling

![alt text](./assets/causalbegan_pictures/190001_intvcond_Eyeglasses=1_2x10.png)  Eyeglasses

![alt text](./assets/causalbegan_pictures/190001_intvcond_Wearing_Lipstick=1_2x10.png) Wearing Lipstick

### CausalGAN Generator output (10x10) (randomly sampled label)
![alt text](https://user-images.githubusercontent.com/10726729/30076306-09743002-923e-11e7-8011-8523cd914f25.gif)

### CausalBEGAN Generator output (10x10) (randomly sampled label)
![alt text](https://user-images.githubusercontent.com/10726729/30076379-38b407fc-923e-11e7-81aa-4310c76a2e39.gif)

<---
  Repo originally forked from these two
- [BEGAN-tensorflow](https://github.com/carpedm20/BEGAN-tensorflow)
- [DCGAN-tensorflow](https://github.com/carpedm20/DCGAN-tensorflow)
-->

## Related works
- [Generative Adversarial Networks](https://arxiv.org/abs/1406.2661)
- [Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434)
- [Wasserstein GAN](https://arxiv.org/abs/1701.07875)
- [BEGAN: Boundary Equilibrium Generative Adversarial Networks](https://arxiv.org/abs/1703.10717)

## Authors

Christopher Snyder / [@22csnyder](http://22csnyder.github.io)
Murat Kocaoglu / [@mkocaoglu](http://mkocaoglu.github.io)


================================================
FILE: assets/0808_112404_cbcg.csv
================================================
Wall time,Step,Value
1502209477.065396,1,0.9871935844421387
1502210175.629644,1001,0.5611526370048523
1502210858.027971,2001,0.48091334104537964
1502211539.450148,3001,0.3693326711654663
1502212228.305266,4001,0.2690610885620117
1502212916.163691,5001,0.1852252036333084
1502213605.455342,6001,0.11786147207021713
1502214290.655429,7001,0.10585799068212509
1502214974.834744,8001,0.11575613915920258
1502215664.377923,9001,0.09277261048555374
1502216342.813149,10001,0.08084549009799957
1502217004.542623,11001,0.07447165995836258
1502217677.840079,12001,0.07388914376497269
1502218338.794636,13001,0.06354445964097977
1502219000.20777,14001,0.058855485171079636
1502219659.079145,15001,0.06558254361152649
1502220348.8056,16001,0.051907140761613846
1502221033.399544,17001,0.04890892282128334
1502221718.709654,18001,0.04604059085249901
1502222403.268966,19001,0.04389917105436325
1502223087.183902,20001,0.04280887916684151
1502223772.410776,21001,0.04196497052907944
1502224457.815937,22001,0.038901761174201965
1502225141.198389,23001,0.04273799806833267
1502225826.618027,24001,0.041886329650878906
1502226518.698883,25001,0.04319506511092186
1502227208.700241,26001,0.042861778289079666
1502227899.513253,27001,0.04321207478642464
1502228588.126751,28001,0.035417430102825165
1502229277.24218,29001,0.03713845834136009
1502229964.6007,30001,0.03938867151737213


================================================
FILE: assets/0810_191625_bcg.csv
================================================
Wall time,Step,Value
1502410626.387592,1,0.9544087648391724
1502411081.292726,1001,0.5290326476097107
1502411533.622933,2001,0.44044023752212524
1502411981.535893,3001,0.35751280188560486
1502412434.074014,4001,0.2676760256290436
1502412884.345166,5001,0.20682139694690704
1502413336.727762,6001,0.1853639930486679
1502413786.845507,7001,0.19252602756023407
1502414239.265506,8001,0.19284175336360931
1502414689.356373,9001,0.16991157829761505
1502415145.18223,10001,0.15723274648189545
1502415595.021095,11001,0.15078511834144592
1502416037.124821,12001,0.14841803908348083
1502416478.158467,13001,0.1522006243467331
1502416920.270544,14001,0.15191766619682312
1502417364.060506,15001,0.14936088025569916
1502417803.97219,16001,0.14549562335014343
1502418242.907475,17001,0.14224907755851746
1502418684.820146,18001,0.13779735565185547
1502419124.551228,19001,0.14404024183750153


================================================
FILE: assets/0821_213901_rcbcg.csv
================================================
Wall time,Step,Value
1503369574.677247,1,0.8920440077781677
1503370041.447478,1001,0.512530505657196
1503370517.215026,2001,0.44317319989204407
1503370985.171754,3001,0.35666027665138245
1503371450.274446,4001,0.2928802967071533
1503371929.346399,5001,0.19688302278518677
1503372408.39261,6001,0.13801704347133636
1503372886.733545,7001,0.1106921136379242
1503373363.362404,8001,0.08717407286167145
1503373839.834317,9001,0.0857364684343338
1503374318.503915,10001,0.07331433147192001
1503374802.444324,11001,0.07706638425588608
1503375279.389205,12001,0.06169278547167778
1503375752.728541,13001,0.059477031230926514
1503376226.577342,14001,0.061632610857486725
1503376699.448754,15001,0.06138858571648598
1503377174.465165,16001,0.05955960601568222
1503377653.261056,17001,0.04774799197912216
1503378126.625743,18001,0.05300581455230713
1503378604.128631,19001,0.047743991017341614
1503379079.647434,20001,0.05426724627614021
1503379555.901424,21001,0.04658582806587219
1503380028.219916,22001,0.04909271374344826
1503380498.204313,23001,0.05326574668288231
1503380962.853232,24001,0.05447468161582947
1503381428.927937,25001,0.05708151310682297
1503381893.354328,26001,0.051777616143226624
1503382360.002207,27001,0.046131476759910583
1503382825.077767,28001,0.04513547569513321
1503383290.90524,29001,0.044165026396512985


================================================
FILE: assets/guide_to_gifs.txt
================================================
#Approach uses imagemagick
#Take the first 20 images in a folder and convert to gif
ls -v | head -20 | xargs cp -t newfolder
cd newfolder
mogrify -format png *.pdf
mogrify -crop 62.5%x62.5%+0+0 +repage *.png
rm *.pdf
convert -delay 20 $(ls -v) -loop 0 -layers optimize mygifname.gif


================================================
FILE: assets/tvdplot.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using matplotlib backend: TkAgg\n"
     ]
    }
   ],
   "source": [
    "import matplotlib.pyplot as plt\n",
    "import tensorflow as tf\n",
    "import pandas as pd\n",
    "%matplotlib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "\n",
    "raw_data={'cG1': pd.read_csv('0808_112404_cbcg.csv'),\n",
    "      'G1' : pd.read_csv('0810_191625_bcg.csv'),\n",
    "      'rcG1': pd.read_csv('0821_213901_rcbcg.csv')}\n",
    "xlabel='Training Step'\n",
    "dfs=[pd.DataFrame(data={k:v['Value'].values,xlabel:v['Step'].values}) for k,v in raw_data.items()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "\n",
    "raw_data={'Causal Graph 1' : pd.read_csv('0810_191625_bcg.csv'),\n",
    "          'complete Causal Graph 1': pd.read_csv('0808_112404_cbcg.csv'),      \n",
    "          'edge-reversed complete Causal Graph 1': pd.read_csv('0821_213901_rcbcg.csv')}\n",
    "xlabel='Training Step'\n",
    "dfs=[pd.DataFrame(data={k:v['Value'].values,xlabel:v['Step'].values}) for k,v in raw_data.items()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def my_merge(df1,df2):\n",
    "    return pd.merge(df1,df2,how='outer',on=xlabel)\n",
    "    \n",
    "\n",
    "plot_data=reduce(my_merge,dfs)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.text.Text at 0x7f376528c690>"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "ax=plot_data.plot.line(x=xlabel,xlim=[0,18000],ylim=[0,1],style = ['bs-','ro-','y^-'])\n",
    "ax.set_ylabel('Total Variation Distance',fontsize=18)\n",
    "ax.set_title('TVD of Label Generation',fontsize=18)\n",
    "ax.set_xlabel(xlabel,fontsize=18)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.savefig('tvd_vs_step.pdf')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}


================================================
FILE: causal_began/CausalBEGAN.py
================================================
from __future__ import print_function
from utils import save_image,distribute_input_data,summary_stats,make_summary
import pandas as pd
import os
import StringIO
import scipy.misc
import numpy as np
from glob import glob
from tqdm import trange
from itertools import chain
from collections import deque
from figure_scripts.pairwise import crosstab
from figure_scripts.sample import intervention2d,condition2d

from utils import summary_stats
from models import *

class CausalBEGAN(object):
    '''
    A quick quirk about this class.
    if the model is built with a gpu, it must
    later be loaded with a gpu in order to preserve
    tensor structure: NCHW/NHCW (number-channel-height-width/number-height-channel-width)

    in paper <-> in code
    b1,c1    <-> b_k, k_t
    b2,c2    <-> b_l, l_t
    b3,c3    <-> b_z, z_t
    '''

    def __init__(self,batch_size,config):
        '''
        batch_size: again a tensorflow placeholder
        config    : see causal_began/config.py
        '''

        self.batch_size=batch_size #a tensor
        self.config=config
        self.use_gpu = config.use_gpu
        self.data_format=self.config.data_format#NHWC or NCHW
        self.TINY = 10**-6

        #number of calls to self.g_optim
        self.step = tf.Variable(0, name='step', trainable=False)

        #optimizers
        self.g_lr = tf.Variable(config.g_lr, name='g_lr')
        self.d_lr = tf.Variable(config.d_lr, name='d_lr')

        self.g_lr_update = tf.assign(self.g_lr, self.g_lr * 0.5, name='g_lr_update')
        self.d_lr_update = tf.assign(self.d_lr, self.d_lr * 0.5, name='d_lr_update')

        optimizer = tf.train.AdamOptimizer
        self.g_optimizer, self.d_optimizer = optimizer(self.g_lr), optimizer(self.d_lr)

        self.lambda_k = config.lambda_k
        self.lambda_l = config.lambda_l
        self.lambda_z = config.lambda_z
        self.gamma = config.gamma
        self.gamma_label = config.gamma_label
        self.zeta=config.zeta
        self.z_dim = config.z_dim
        self.conv_hidden_num = config.conv_hidden_num

        self.model_dir = config.model_dir

        self.start_step = 0
        self.log_step = config.log_step
        self.max_step = config.max_step
        self.lr_update_step = config.lr_update_step
        self.is_train = config.is_train

        #Keeps track of params from different devices
        self.tower_dict=dict(
                    c_tower_grads=[],
                    dcc_tower_grads=[],
                    g_tower_grads=[],
                    d_tower_grads=[],
                    tower_g_loss_image=[],
                    tower_d_loss_real=[],
                    tower_g_loss_label=[],
                    tower_d_loss_real_label=[],
                    tower_d_loss_fake_label=[],
            )
        self.k_t = tf.get_variable(name='k_t',initializer=0.,trainable=False)
        self.l_t = tf.get_variable(name='l_t',initializer=0.,trainable=False)
        self.z_t = tf.get_variable(name='z_t',initializer=0.,trainable=False)

    def __call__(self, real_inputs, fake_inputs):
        '''
        in a multi gpu setting, self.__call__ is done once for every device with variables shared so
        that a copy of the tensorflow variables created in self.__call__ resides on
        each device. This would be run multiple times in a loop over devices.

        Parameters:
        fake inputs : a dictionary of labels from cc
        real_inputs : also a dictionary of labels
                      with an additional key 'x' for the real image
        '''
        config=self.config

        #The keys are all the labels union 'x'
        self.real_inputs=real_inputs
        self.fake_inputs=fake_inputs
        n_labels=len(fake_inputs)#number of labels in graph, not dataset

        #[0,255] NHWC
        self.x = self.real_inputs.pop('x')

        #used to change dataformat in data queue
        if self.data_format == 'NCHW':
            #self.x = tf.transpose(self.x, [2, 0, 1])#3D
            self.x = tf.transpose(self.x, [0, 3, 1, 2])#4D
        elif self.data_format == 'NHWC':
            pass
        else:
            raise Exception("[!] Unkown data_format: {}".format(self.data_format))

        _, height, width, self.channel = \
                get_conv_shape(self.x, self.data_format)
        self.config.repeat_num= int(np.log2(height)) - 2
        self.config.channel=self.channel

        #There are two versions: "x" and "self.x".
        #    "x" is normalized for computation
        #    "self.x" is unnormalized for saving and summaries
        #    likewise for "G" and "self.G"
        #x in [-1,1]
        x = norm_img(self.x)

        self.real_labels=tf.concat(self.real_inputs.values(),-1)
        self.fake_labels=tf.concat(self.fake_inputs.values(),-1)

        #noise given to generate image in addition to labels
        self.z_gen = tf.random_uniform(
            (self.batch_size, self.z_dim), minval=-1.0, maxval=1.0)

        if self.config.round_fake_labels:#default
            self.z= tf.concat( [tf.round(self.fake_labels), self.z_gen],axis=-1,name='z')
        else:
            self.z= tf.concat( [self.fake_labels, self.z_gen],axis=-1,name='z')

        G, self.G_var = GeneratorCNN(self.z,config)
        d_out, self.D_z, self.D_var = DiscriminatorCNN(tf.concat([G, x],0),config)
        AE_G, AE_x = tf.split(d_out, 2)
        self.D_encode_G, self.D_encode_x=tf.split(self.D_z, 2)#axis=0 by default

        if not self.config.separate_labeler:
            self.D_fake_labels_logits=tf.slice(self.D_encode_G,[0,0],[-1,n_labels])
            self.D_real_labels_logits=tf.slice(self.D_encode_x,[0,0],[-1,n_labels])
        else:#default
            self.D_fake_labels_logits,self.DL_var=Discriminator_labeler(G,n_labels,config)
            self.D_real_labels_logits,_=Discriminator_labeler(x,n_labels,config,reuse=True)
            self.D_var += self.DL_var

        self.D_real_labels=tf.sigmoid(self.D_real_labels_logits)
        self.D_fake_labels=tf.sigmoid(self.D_fake_labels_logits)
        self.D_real_labels_list=tf.split(self.D_real_labels,n_labels,axis=1)
        self.D_fake_labels_list=tf.split(self.D_fake_labels,n_labels,axis=1)

        # sigmoid_cross_entropy_with_logits
        def sxe(logits,labels):
            #use zeros or ones if pass in scalar
            if not isinstance(labels,tf.Tensor):
                labels=labels*tf.ones_like(logits)
            return tf.nn.sigmoid_cross_entropy_with_logits(
                logits=logits,labels=labels)

        #Round fake labels before calc loss
        if self.config.round_fake_labels:#default
            fake_labels=tf.round(self.fake_labels)
        else:
            fake_labels=self.fake_labels

        #This is here because it's used in cross_entropy calc, but it's not used by default
        self.fake_labels_logits= -tf.log(1/(self.fake_labels+self.TINY)-1)

        #One of three label losses available
        # Default is squared loss, "squarediff"
        self.d_xe_real_label=sxe(self.D_real_labels_logits,self.real_labels)
        self.d_xe_fake_label=sxe(self.D_fake_labels_logits,fake_labels)
        self.g_xe_label=sxe(self.fake_labels_logits, self.D_fake_labels)

        self.d_absdiff_real_label=tf.abs(self.D_real_labels  - self.real_labels)
        self.d_absdiff_fake_label=tf.abs(self.D_fake_labels  - fake_labels)
        self.g_absdiff_label     =tf.abs(fake_labels  -  self.D_fake_labels)

        self.d_squarediff_real_label=tf.square(self.D_real_labels  - self.real_labels)
        self.d_squarediff_fake_label=tf.square(self.D_fake_labels  - fake_labels)
        self.g_squarediff_label     =tf.square(fake_labels  -  self.D_fake_labels)

        if self.config.label_loss=='xe':
            self.d_loss_real_label = tf.reduce_mean(self.d_xe_real_label)
            self.d_loss_fake_label = tf.reduce_mean(self.d_xe_fake_label)
            self.g_loss_label      = tf.reduce_mean(self.g_xe_label)
        elif self.config.label_loss=='absdiff':
            self.d_loss_real_label = tf.reduce_mean(self.d_absdiff_real_label)
            self.d_loss_fake_label = tf.reduce_mean(self.d_absdiff_fake_label)
            self.g_loss_label      = tf.reduce_mean(self.g_absdiff_label)
        elif self.config.label_loss=='squarediff':
            self.d_loss_real_label = tf.reduce_mean(self.d_squarediff_real_label)
            self.d_loss_fake_label = tf.reduce_mean(self.d_squarediff_fake_label)
            self.g_loss_label      = tf.reduce_mean(self.g_squarediff_label)

        #"self.G" is [0,255], "G" is [-1,1]
        self.G = denorm_img(G, self.data_format)
        self.AE_G, self.AE_x = denorm_img(AE_G, self.data_format), denorm_img(AE_x, self.data_format)

        u1=tf.abs(AE_x - x)
        u2=tf.abs(AE_G - G)
        m1=tf.reduce_mean(u1)
        m2=tf.reduce_mean(u2)
        c1=tf.reduce_mean(tf.square(u1-m1))
        c2=tf.reduce_mean(tf.square(u2-m2))
        self.eqn2 = tf.square(m1-m2)#from orig began paper
        self.eqn1 = (c1+c2-2*tf.sqrt(c1*c2))/self.eqn2#from orig began paper

        self.d_loss_real = tf.reduce_mean(u1)
        self.d_loss_fake = tf.reduce_mean(u2)
        self.g_loss_image = tf.reduce_mean(tf.abs(AE_G - G))

        self.d_loss_image=self.d_loss_real       -   self.k_t*self.d_loss_fake
        self.d_loss_label=self.d_loss_real_label -   self.l_t*self.d_loss_fake_label
        self.d_loss=self.d_loss_image+self.d_loss_label

        if not self.config.no_third_margin:#normal mode
            #Careful on z_t sign!#(z_t <==> c_3 from paper)
            self.g_loss = self.g_loss_image + self.z_t*self.g_loss_label
        else:
            print('Warning: not using third margin')
            self.g_loss = self.g_loss_image + 1.*self.g_loss_label

        # Calculate the gradients for the batch of data,
        # on this particular gpu tower.
        g_grad=self.g_optimizer.compute_gradients(self.g_loss,var_list=self.G_var)
        d_grad=self.d_optimizer.compute_gradients(self.d_loss,var_list=self.D_var)

        self.tower_dict['g_tower_grads'].append(g_grad)
        self.tower_dict['d_tower_grads'].append(d_grad)
        self.tower_dict['tower_g_loss_image'].append(self.g_loss_image)
        self.tower_dict['tower_d_loss_real'].append(self.d_loss_real)
        self.tower_dict['tower_g_loss_label'].append(self.g_loss_label)
        self.tower_dict['tower_d_loss_real_label'].append(self.d_loss_real_label)
        self.tower_dict['tower_d_loss_fake_label'].append(self.d_loss_fake_label)

        self.var=self.G_var+self.D_var+[self.step]

    def build_train_op(self):
        #Now outside gpu loop

        #attributes starting with ave_ are averaged over devices
        self.ave_d_loss_real       =tf.reduce_mean(self.tower_dict['tower_d_loss_real'])
        self.ave_g_loss_image      =tf.reduce_mean(self.tower_dict['tower_g_loss_image'])
        self.ave_d_loss_real_label =tf.reduce_mean(self.tower_dict['tower_d_loss_real_label'])
        self.ave_d_loss_fake_label =tf.reduce_mean(self.tower_dict['tower_d_loss_fake_label'])
        self.ave_g_loss_label      =tf.reduce_mean(self.tower_dict['tower_g_loss_label'])

        #recalculate balance equations (b1,b2,b3 in paper)
        self.balance_k = self.gamma * self.ave_d_loss_real - self.ave_g_loss_image
        self.balance_l = self.gamma_label * self.ave_d_loss_real_label - self.ave_d_loss_fake_label
        self.balance_z = self.zeta*tf.nn.relu(self.balance_k) - tf.nn.relu(self.balance_l)

        self.measure = self.ave_d_loss_real + tf.abs(self.balance_k)
        self.measure_complete = self.ave_d_loss_real + self.ave_d_loss_real_label + \
            tf.abs(self.balance_k)+tf.abs(self.balance_l)+tf.abs(self.balance_z)

        #update margins coefficients (c1,c2,c3 in paper)
        k_update = tf.assign(
            self.k_t, tf.clip_by_value(self.k_t + self.lambda_k*self.balance_k, 0, 1))
        l_update = tf.assign(
            self.l_t, tf.clip_by_value(self.l_t + self.lambda_l*self.balance_l, 0, 1))
        z_update = tf.assign(
            self.z_t, tf.clip_by_value(self.z_t + self.lambda_z*self.balance_z, 0, 1))

        g_grads=average_gradients(self.tower_dict['g_tower_grads'])
        d_grads=average_gradients(self.tower_dict['d_tower_grads'])

        g_optim = self.g_optimizer.apply_gradients(g_grads, global_step=self.step)
        d_optim = self.d_optimizer.apply_gradients(d_grads)

        #every time train_op is run, run k_update, l_update, z_update
        with tf.control_dependencies([k_update,l_update,z_update]):
            #when train_op is run, run [g_optim,d_optim]
            self.train_op=tf.group(g_optim, d_optim)

    def train_step(self,sess,counter):
        sess.run(self.train_op)

        if counter % self.config.lr_update_step == self.lr_update_step - 1:
            sess.run([self.g_lr_update, self.d_lr_update])

    def build_summary_op(self):
        names,real_labels_list=zip(*self.real_inputs.items())
        _    ,fake_labels_list=zip(*self.fake_inputs.items())
        LabelList=[names,real_labels_list,fake_labels_list,
                   self.D_fake_labels_list,self.D_real_labels_list]
        for name,rlabel,flabel,d_fake_label,d_real_label in zip(*LabelList):
            with tf.name_scope(name):

                d_flabel=tf.cast(tf.round(d_fake_label),tf.int32)
                d_rlabel=tf.cast(tf.round(d_real_label),tf.int32)
                f_acc=tf.contrib.metrics.accuracy(tf.cast(tf.round(flabel),tf.int32),d_flabel)
                r_acc=tf.contrib.metrics.accuracy(tf.cast(tf.round(rlabel),tf.int32),d_rlabel)

                summary_stats('d_fake_label',d_fake_label,hist=True)
                summary_stats('d_real_label',d_real_label,hist=True)

                tf.summary.scalar('ave_d_fake_abs_diff',tf.reduce_mean(tf.abs(flabel-d_fake_label)))
                tf.summary.scalar('ave_d_real_abs_diff',tf.reduce_mean(tf.abs(rlabel-d_real_label)))

                tf.summary.scalar('real_label_ave',tf.reduce_mean(rlabel))
                tf.summary.scalar('real_label_accuracy',r_acc)
                tf.summary.scalar('fake_label_accuracy',f_acc)

        ##Summaries picked from last gpu to run
        tf.summary.scalar('losslabel/d_loss_real_label',tf.reduce_mean(self.ave_d_loss_real_label))
        tf.summary.scalar('losslabel/d_loss_fake_label',tf.reduce_mean(self.ave_d_loss_fake_label))
        tf.summary.scalar('losslabel/g_loss_label',self.g_loss_label)

        tf.summary.image("G", self.G),
        tf.summary.image("AE_G", self.AE_G),
        tf.summary.image("AE_x", self.AE_x),

        tf.summary.scalar("loss/d_loss", self.d_loss),
        tf.summary.scalar("loss/d_loss_fake", self.d_loss_fake),
        tf.summary.scalar("loss/g_loss", self.g_loss),

        tf.summary.scalar("misc/d_lr", self.d_lr),
        tf.summary.scalar("misc/g_lr", self.g_lr),
        tf.summary.scalar("misc/eqn1", self.eqn1),#From orig BEGAN paper
        tf.summary.scalar("misc/eqn2", self.eqn2),#From orig BEGAN paper

        #summaries of gpu-averaged values
        tf.summary.scalar("loss/d_loss_real",self.ave_d_loss_real),
        tf.summary.scalar("loss/g_loss_image", self.ave_g_loss_image),
        tf.summary.scalar("balance/l", self.balance_l),
        tf.summary.scalar("balance/k", self.balance_k),
        tf.summary.scalar("balance/z", self.balance_z),
        tf.summary.scalar("misc/measure", self.measure),
        tf.summary.scalar("misc/measure_complete", self.measure_complete),
        tf.summary.scalar("misc/k_t", self.k_t),
        tf.summary.scalar("misc/l_t", self.l_t),
        tf.summary.scalar("misc/z_t", self.z_t),

        #doesn't include summaries from causal controller
        #TODO: rework so only 1 copy of summaries if multiple gpu
        self.summary_op=tf.summary.merge_all()


================================================
FILE: causal_began/__init__.py
================================================


================================================
FILE: causal_began/config.py
================================================
#-*- coding: utf-8 -*-
import argparse

def str2bool(v):
    #return (v is True) or (v.lower() in ('true', '1'))
    return v is True or v.lower() in ('true', '1')

arg_lists = []
parser = argparse.ArgumentParser()

def add_argument_group(name):
    arg = parser.add_argument_group(name)
    arg_lists.append(arg)
    return arg


#Network
net_arg = add_argument_group('Network')
net_arg.add_argument('--c_dim',type=int, default=3,
                     help='''number of color channels. I wouldn't really change
                     this from 3''')
net_arg.add_argument('--conv_hidden_num', type=int, default=128,
                     choices=[64, 128],help='n in the paper')
net_arg.add_argument('--separate_labeler', type=str2bool, default=True)
net_arg.add_argument('--z_dim', type=int, default=64, choices=[64, 128],
                    help='''dimension of the noise input to the generator along
                    with the labels''')
net_arg.add_argument('--z_num', type=int, default=64,
                    help='''dimension of the hidden space of the autoencoder''')


# Data
data_arg = add_argument_group('Data')
data_arg.add_argument('--dataset', type=str, default='celebA')
data_arg.add_argument('--split', type=str, default='train')
data_arg.add_argument('--batch_size', type=int, default=16)

# Training / test parameters
train_arg = add_argument_group('Training')
train_arg.add_argument('--beta1', type=float, default=0.5)
train_arg.add_argument('--beta2', type=float, default=0.999)
train_arg.add_argument('--d_lr', type=float, default=0.00008)
train_arg.add_argument('--g_lr', type=float, default=0.00008)
train_arg.add_argument('--label_loss',type=str,default='squarediff',choices=['xe','absdiff','squarediff'],
                      help='''what comparison should be made between the
                       labeler output and the actual labels''')
train_arg.add_argument('--lr_update_step', type=int, default=100000, choices=[100000, 75000])
train_arg.add_argument('--max_step', type=int, default=50000)
train_arg.add_argument('--num_iter',type=int,default=250000,
                       help='the number of training iterations to run the model for')
train_arg.add_argument('--optimizer', type=str, default='adam')
train_arg.add_argument('--round_fake_labels',type=str2bool,default=True,
                       help='''Whether the label outputs of the causal
                       controller should be rounded first before calculating
                       the loss of generator or d-labeler''')
train_arg.add_argument('--use_gpu', type=str2bool, default=True)
train_arg.add_argument('--num_gpu', type=int, default=1,
                      help='specify 0 for cpu. If k specified, will default to\
                      first k of n gpus detected. If use_gpu=True but num_gpu not\
                      specified will default to 1')

margin_arg = add_argument_group('Margin')
margin_arg.add_argument('--gamma', type=float, default=0.5)
margin_arg.add_argument('--gamma_label', type=float, default=0.5)
margin_arg.add_argument('--lambda_k', type=float, default=0.001)
margin_arg.add_argument('--lambda_l', type=float, default=0.00008,
                       help='''As mentioned in the paper this is lower because
                       this margin can be responded to more quickly than the
                        other margins. Im not sure if it definitely needs to be lower''')
margin_arg.add_argument('--lambda_z', type=float, default=0.01)
margin_arg.add_argument('--no_third_margin', type=str2bool, default=False,
                       help='''Use True for appendix figure in paper. This is
                        used to neglect the third margin (c3,b3)''')
margin_arg.add_argument('--zeta', type=float, default=0.5,
                       help='''This is gamma_3 in the paper''')

# Misc
misc_arg = add_argument_group('Misc')
misc_arg.add_argument('--is_train',type=str2bool,default=False,
                      help='''whether to enter the image training loop''')
misc_arg.add_argument('--build_all', type=str2bool, default=False,
                     help='''normally specifying is_pretrain=False will cause
                     the pretraining components not to be built and likewise
                      with is_train=False only the pretrain compoenent will
                      (possibly) be built. This is here as a debug helper to
                      enable building out the whole model without doing any
                      training''')
misc_arg.add_argument('--data_dir', type=str, default='data')
misc_arg.add_argument('--dry_run', action='store_true')
#misc_arg.add_argument('--dry_run', type=str2bool, default='False')
misc_arg.add_argument('--log_step', type=int, default=100,
                     help='''how often to log stuff. Sample images are created
                     every 10*log_step''')
misc_arg.add_argument('--num_log_samples', type=int, default=3)
misc_arg.add_argument('--log_level', type=str, default='INFO', choices=['INFO', 'DEBUG', 'WARN'])
misc_arg.add_argument('--log_dir', type=str, default='logs')


def gpu_logic(config):
    #consistency between use_gpu and num_gpu
    if config.num_gpu>0:
        config.use_gpu=True
    else:
        config.use_gpu=False
#        if config.use_gpu and config.num_gpu==0:
#            config.num_gpu=1
    return config


def get_config():
    config, unparsed = parser.parse_known_args()
    config=gpu_logic(config)

    #this has to respect gpu/cpu
    #data_format = 'NCHW'
    if config.use_gpu:
        data_format = 'NCHW'
    else:
        data_format = 'NHWC'
    setattr(config, 'data_format', data_format)


    print('Loaded ./causal_began/config.py')

    return config, unparsed

if __name__=='__main__':
    #for debug of config
    config, unparsed = get_config()


================================================
FILE: causal_began/models.py
================================================
import numpy as np
import tensorflow as tf
slim = tf.contrib.slim


def lrelu(x,leak=0.2,name='lrelu'):
    with tf.variable_scope(name):
        f1=0.5 * (1+leak)
        f2=0.5 * (1-leak)
        return f1*x + f2*tf.abs(x)

def GeneratorCNN( z, config, reuse=None):
    hidden_num=config.conv_hidden_num
    output_num=config.c_dim
    repeat_num=config.repeat_num
    data_format=config.data_format

    with tf.variable_scope("G",reuse=reuse) as vs:
        x = slim.fully_connected(z, np.prod([8, 8, hidden_num]),activation_fn=None,scope='fc1')
        x = reshape(x, 8, 8, hidden_num, data_format)

        for idx in range(repeat_num):
            x = slim.conv2d(x, hidden_num, 3, 1, activation_fn=tf.nn.elu,
                            data_format=data_format,scope='conv'+str(idx)+'a')
            x = slim.conv2d(x, hidden_num, 3, 1, activation_fn=tf.nn.elu,
                            data_format=data_format,scope='conv'+str(idx)+'b')
            if idx < repeat_num - 1:
                x = upscale(x, 2, data_format)

        out = slim.conv2d(x, 3, 3, 1, activation_fn=None,data_format=data_format,scope='conv'+str(idx+1))

    variables = tf.contrib.framework.get_variables(vs)
    return out, variables

def DiscriminatorCNN(image, config, reuse=None):
    hidden_num=config.conv_hidden_num
    data_format=config.data_format
    input_channel=config.channel

    with tf.variable_scope("D",reuse=reuse) as vs:
        # Encoder
        with tf.variable_scope('encoder'):
            x = slim.conv2d(image, hidden_num, 3, 1, activation_fn=tf.nn.elu,
                            data_format=data_format,scope='conv0')

            prev_channel_num = hidden_num
            for idx in range(config.repeat_num):
                channel_num = hidden_num * (idx + 1)
                x = slim.conv2d(x, channel_num, 3, 1, activation_fn=tf.nn.elu,
                                data_format=data_format,scope='conv'+str(idx+1)+'a')
                x = slim.conv2d(x, channel_num, 3, 1, activation_fn=tf.nn.elu,
                                data_format=data_format,scope='conv'+str(idx+1)+'b')
                if idx < config.repeat_num - 1:
                    x = slim.conv2d(x, channel_num, 3, 2, activation_fn=tf.nn.elu,
                                    data_format=data_format,scope='conv'+str(idx+1)+'c')
                    #x = tf.contrib.layers.max_pool2d(x, [2, 2], [2, 2], padding='VALID')

            x = tf.reshape(x, [-1, np.prod([8, 8, channel_num])])
            z = x = slim.fully_connected(x, config.z_num, activation_fn=None,scope='proj')

        # Decoder
        with tf.variable_scope('decoder'):
            x = slim.fully_connected(x, np.prod([8, 8, hidden_num]), activation_fn=None)
            x = reshape(x, 8, 8, hidden_num, data_format)

            for idx in range(config.repeat_num):
                x = slim.conv2d(x, hidden_num, 3, 1, activation_fn=tf.nn.elu,
                                data_format=data_format,scope='conv'+str(idx)+'a')
                x = slim.conv2d(x, hidden_num, 3, 1, activation_fn=tf.nn.elu,
                                data_format=data_format,scope='conv'+str(idx)+'b')
                if idx < config.repeat_num - 1:
                    x = upscale(x, 2, data_format)
            out = slim.conv2d(x, input_channel, 3, 1, activation_fn=None,
                              data_format=data_format,scope='proj')

    variables = tf.contrib.framework.get_variables(vs)
    return out, z, variables


def Discriminator_labeler(image, output_size, config, reuse=None):
    hidden_num=config.conv_hidden_num
    repeat_num=config.repeat_num
    data_format=config.data_format
    with tf.variable_scope("discriminator_labeler",reuse=reuse) as scope:

        x = slim.conv2d(image, hidden_num, 3, 1, activation_fn=tf.nn.elu,
                        data_format=data_format,scope='conv0')

        prev_channel_num = hidden_num
        for idx in range(repeat_num):
            channel_num = hidden_num * (idx + 1)
            x = slim.conv2d(x, channel_num, 3, 1, activation_fn=tf.nn.elu,
                            data_format=data_format,scope='conv'+str(idx+1)+'a')
            x = slim.conv2d(x, channel_num, 3, 1, activation_fn=tf.nn.elu,
                            data_format=data_format,scope='conv'+str(idx+1)+'b')
            if idx < repeat_num - 1:
                x = slim.conv2d(x, channel_num, 3, 2, activation_fn=tf.nn.elu,
                                data_format=data_format,scope='conv'+str(idx+1)+'c')
                #x = tf.contrib.layers.max_pool2d(x, [2, 2], [2, 2], padding='VALID')

        x = tf.reshape(x, [-1, np.prod([8, 8, channel_num])])
        label_logit = slim.fully_connected(x, output_size, activation_fn=None,scope='proj')

        variables = tf.contrib.framework.get_variables(scope)
        return label_logit,variables

def next(loader):
    return loader.next()[0].data.numpy()

def to_nhwc(image, data_format):
    if data_format == 'NCHW':
        #Isn't this backward?
        new_image = nchw_to_nhwc(image)
    else:
        new_image = image
    return new_image

def to_nchw_numpy(image):
    if image.shape[3] in [1, 3]:
        new_image = image.transpose([0, 3, 1, 2])
    else:
        new_image = image
    return new_image

def norm_img(image, data_format=None):
    image = image/127.5 - 1.
    if data_format:
        image = to_nhwc(image, data_format)
    return image

def denorm_img(norm, data_format):
    return tf.clip_by_value(to_nhwc((norm + 1)*127.5, data_format), 0, 255)

def slerp(val, low, high):
    """Code from https://github.com/soumith/dcgan.torch/issues/14"""
    omega = np.arccos(np.clip(np.dot(low/np.linalg.norm(low), high/np.linalg.norm(high)), -1, 1))
    so = np.sin(omega)
    if so == 0:
        return (1.0-val) * low + val * high # L'Hopital's rule/LERP
    return np.sin((1.0-val)*omega) / so * low + np.sin(val*omega) / so * high

def int_shape(tensor):
    shape = tensor.get_shape().as_list()
    return [num if num is not None else -1 for num in shape]

def get_conv_shape(tensor, data_format):
    shape = int_shape(tensor)
    # always return [N, H, W, C]
    if data_format == 'NCHW':
        return [shape[0], shape[2], shape[3], shape[1]]
    elif data_format == 'NHWC':
        return shape

def nchw_to_nhwc(x):
    return tf.transpose(x, [0, 2, 3, 1])

def nhwc_to_nchw(x):
    return tf.transpose(x, [0, 3, 1, 2])

def reshape(x, h, w, c, data_format):
    if data_format == 'NCHW':
        x = tf.reshape(x, [-1, c, h, w])
    else:
        x = tf.reshape(x, [-1, h, w, c])
    return x

def resize_nearest_neighbor(x, new_size, data_format):
    if data_format == 'NCHW':
        x = nchw_to_nhwc(x)
        x = tf.image.resize_nearest_neighbor(x, new_size)
        x = nhwc_to_nchw(x)
    else:
        x = tf.image.resize_nearest_neighbor(x, new_size)
    return x

def upscale(x, scale, data_format):
    _, h, w, _ = get_conv_shape(x, data_format)
    return resize_nearest_neighbor(x, (h*scale, w*scale), data_format)


#https://github.com/tensorflow/models/blob/master/tutorials/image/cifar10/cifar10_multi_gpu_train.py#L168
def average_gradients(tower_grads):
    """Calculate the average gradient for each shared variable across all towers.
    Note that this function provides a synchronization point across all towers.
    Args:
    tower_grads: List of lists of (gradient, variable) tuples.
    The outer list
    is over individual gradients. The inner list is over the gradient
    calculation for each tower.
    Returns:
    List of pairs of (gradient, variable) where the gradient has been averaged across all towers.
    """
    average_grads = []
    for grad_and_vars in zip(*tower_grads):
        # Note that each grad_and_vars looks like the following:
        #   ((grad0_gpu0, var0_gpu0), ... , (grad0_gpuN, var0_gpuN))
        grads = []
        for g, _ in grad_and_vars:
            # Add 0 dimension to the gradients to represent the tower.
            expanded_g = tf.expand_dims(g, 0)

            # Append on a 'tower' dimension which we will average over below.
            grads.append(expanded_g)

        # Average over the 'tower' dimension.
        grad = tf.concat(axis=0, values=grads)
        grad = tf.reduce_mean(grad, 0)

        # Keep in mind that the Variables are redundant because they are shared
        # across towers.  So ..  we will just return the first tower's pointer to the Variable.
        v = grad_and_vars[0][1]
        grad_and_var = (grad, v)
        average_grads.append(grad_and_var)
    return average_grads


================================================
FILE: causal_began/utils.py
================================================
from __future__ import print_function
import tensorflow as tf
import os
from os import listdir
from os.path import isfile, join
import shutil
import sys
import math
import json
import logging
import numpy as np
from PIL import Image
from datetime import datetime
from tensorflow.core.framework import summary_pb2

def make_summary(name, val):
    return summary_pb2.Summary(value=[summary_pb2.Summary.Value(tag=name, simple_value=val)])

def summary_stats(name,tensor,collections=None,hist=False):
    collections=collections or [tf.GraphKeys.SUMMARIES]
    ave=tf.reduce_mean(tensor)
    std=tf.sqrt(tf.reduce_mean(tf.square(ave-tensor)))
    tf.summary.scalar(name+'_ave',ave,collections)
    tf.summary.scalar(name+'_std',std,collections)
    if hist:
        tf.summary.histogram(name+'_hist',tensor,collections)


def prepare_dirs_and_logger(config):
    formatter = logging.Formatter("%(asctime)s:%(levelname)s::%(message)s")
    logger = logging.getLogger()

    for hdlr in logger.handlers:
        logger.removeHandler(hdlr)

    handler = logging.StreamHandler()
    handler.setFormatter(formatter)

    logger.addHandler(handler)

    if config.load_path:
        if config.load_path.startswith(config.log_dir):
            config.model_dir = config.load_path
        else:
            if config.load_path.startswith(config.dataset):
                config.model_name = config.load_path
            else:
                config.model_name = "{}_{}".format(config.dataset, config.load_path)
    else:
        config.model_name = "{}_{}".format(config.dataset, get_time())

    if not hasattr(config, 'model_dir'):
        config.model_dir = os.path.join(config.log_dir, config.model_name)
    config.data_path = os.path.join(config.data_dir, config.dataset)

    if not config.load_path:
        config.log_code_dir=os.path.join(config.model_dir,'code')
        for path in [config.log_dir, config.data_dir,
                     config.model_dir, config.log_code_dir]:
            if not os.path.exists(path):
                os.makedirs(path)

        #Copy python code in directory into model_dir/code for future reference:
        code_dir=os.path.dirname(os.path.realpath(sys.argv[0]))
        model_files = [f for f in listdir(code_dir) if isfile(join(code_dir, f))]
        for f in model_files:
            if f.endswith('.py'):
                shutil.copy2(f,config.log_code_dir)

def get_time():
    return datetime.now().strftime("%m%d_%H%M%S")

def save_config(config):
    param_path = os.path.join(config.model_dir, "params.json")

    print("[*] MODEL dir: %s" % config.model_dir)
    print("[*] PARAM path: %s" % param_path)

    with open(param_path, 'w') as fp:
        json.dump(config.__dict__, fp, indent=4, sort_keys=True)

def get_available_gpus():
    from tensorflow.python.client import device_lib
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type=='GPU']

def distribute_input_data(data_loader,num_gpu):
    '''
    data_loader is a dictionary of tensors that are fed into our model

    This function takes that dictionary of n*batch_size dimension tensors
    and breaks it up into n dictionaries with the same key of tensors with
    dimension batch_size. One is given to each gpu
    '''
    if num_gpu==0:
        return {'/cpu:0':data_loader}

    gpus=get_available_gpus()
    if num_gpu > len(gpus):
        raise ValueError('number of gpus specified={}, more than gpus available={}'.format(num_gpu,len(gpus)))

    gpus=gpus[:num_gpu]


    data_by_gpu={g:{} for g in gpus}
    for key,value in data_loader.items():
        spl_vals=tf.split(value,num_gpu)
        for gpu,val in zip(gpus,spl_vals):
            data_by_gpu[gpu][key]=val

    return data_by_gpu


def rank(array):
    return len(array.shape)

def make_grid(tensor, nrow=8, padding=2,
              normalize=False, scale_each=False):
    """Code based on https://github.com/pytorch/vision/blob/master/torchvision/utils.py"""
    nmaps = tensor.shape[0]
    xmaps = min(nrow, nmaps)
    ymaps = int(math.ceil(float(nmaps) / xmaps))
    height, width = int(tensor.shape[1] + padding), int(tensor.shape[2] + padding)
    grid = np.zeros([height * ymaps + 1 + padding // 2, width * xmaps + 1 + padding // 2, 3], dtype=np.uint8)
    k = 0
    for y in range(ymaps):
        for x in range(xmaps):
            if k >= nmaps:
                break
            h, h_width = y * height + 1 + padding // 2, height - padding
            w, w_width = x * width + 1 + padding // 2, width - padding

            grid[h:h+h_width, w:w+w_width] = tensor[k]
            k = k + 1
    return grid

def save_image(tensor, filename, nrow=8, padding=2,
               normalize=False, scale_each=False):
    ndarr = make_grid(tensor, nrow=nrow, padding=padding,
                            normalize=normalize, scale_each=scale_each)
    im = Image.fromarray(ndarr)
    im.save(filename)


================================================
FILE: causal_controller/ArrayDict.py
================================================
import numpy as np
class ArrayDict(object):

    '''
    This is a class for manipulating dictionaries of arrays
    or dictionaries of scalars. I find this comes up pretty often when dealing
    with tensorflow, because you can pass dictionaries to feed_dict and get
    dictionaries back. If you use a smaller batch_size, you then want to
    "concatenate" these outputs for each key.
    '''

    def __init__(self):
        self.dict={}
    def __len__(self):
        if len(self.dict)==0:
            return 0
        else:
            return len(self.dict.values()[0])
    def __repr__(self):
        return repr(self.dict)
    def keys(self):
        return self.dict.keys()
    def items(self):
        return self.dict.items()

    def validate_dict(self,a_dict):
        #Check keys
        for key,val in self.dict.items():
            if not key in a_dict.keys():
                raise ValueError('key:',key,'was not in a_dict.keys()')

        for key,val in a_dict.items():
            #Check same keys
            if not key in self.dict.keys():
                raise ValueError('argument key:',key,'was not in self.dict')

            if isinstance(val,np.ndarray):
                #print('ndarray')
                my_val=self.dict[key]
                if not np.all(val.shape[1:]==my_val.shape[1:]):
                    raise ValueError('key:',key,'value shape',val.shape,'does\
                                     not match existing shape',my_val.shape)
            else: #scalar
                a_val=np.array([[val]])#[1,1]shape array
                my_val=self.dict[key]
                if not np.all(my_val.shape[1:]==a_val.shape[1:]):
                    raise ValueError('key:',key,'value shape',val.shape,'does\
                                     not match existing shape',my_val.shape)
    def arr_dict(self,a_dict):
        if isinstance(a_dict.values()[0],np.ndarray):
            return a_dict
        else:
            return {k:np.array([[v]]) for k,v in a_dict.items()}


    def concat(self,a_dict):
        if self.dict=={}:
            self.dict=self.arr_dict(a_dict)#store interally as array
        else:
            self.validate_dict(a_dict)
            self.dict={k:np.vstack([v,a_dict[k]]) for k,v in self.items()}

    def __getitem__(self,at):
        return {k:v[at] for k,v in self.items()}

#debug, run tests
if __name__=='__main__':
    out1=ArrayDict()
    d1={'Male':np.ones((3,1)),'Young':2*np.ones((3,1))}
    d2={'Male':3,'Young':33}
    d3={'Male':4*np.ones((4,1)),'Young':4*np.ones((4,1))}

    out1.concat(d1)
    out1.concat(d2)

    out2=ArrayDict()
    out2.concat(d2)
    out2.concat(d1)
    out2.concat(d3)


================================================
FILE: causal_controller/CausalController.py
================================================
from __future__ import print_function
from itertools import chain
import numpy as np
import tensorflow as tf
import pandas as pd
import os
slim = tf.contrib.slim
from models import lrelu,DiscriminatorW,Grad_Penalty
from utils import summary_stats,did_succeed
from ArrayDict import ArrayDict#Collector of outputs

debug=False

class CausalController(object):
    model_type='controller'
    summs=['cc_summaries']
    def summary_scalar(self,name,ten):
        tf.summary.scalar(name,ten,collections=self.summs)
    def summary_stats(self,name,ten,hist=False):
        summary_stats(name,ten,collections=self.summs,hist=hist)

    def load(self,sess,path):
        '''
        sess is a tf.Session object
        path is the path of the file you want to load, (not the directory)
        Example
        ./checkpoint/somemodel/saved/model.ckpt-3000
        (leave off the extensions)
        '''
        if not hasattr(self,'saver'):#should have one now
            self.saver=tf.train.Saver(var_list=self.var)
        print('Attempting to load model:',path)
        self.saver.restore(sess,path)

    def __init__(self,batch_size,config):
        '''
        Args:
            config    : This carries all the aguments defined in
            causal_controller/config.py with it. It also defines config.graph,
            which is a nested list that specifies the graph

            batch_size: This is separate from config because it is actually a
            tf.placeholder so that batch_size can be set during sess.run, but
            also synchronized between the models.

        A causal graph (config.graph) is specified as follows:
            just supply a list of pairs (node, node_parents)

            Example: A->B<-C; D->E

            [ ['A',[]],
              ['B',['A','C']],
              ['C',[]],
              ['D',[]],
              ['E',['D']]
            ]

            I use a list right now instead of a dict because I don't think
            dict.keys() are gauranteed to be returned a particular order.
            TODO:A good improvement would be to use collections.OrderedDict

            #old
            #Pass indep_causal=True to use Unif[0,1] labels
            #input_dict allows the model to take in some aritrary input instead
            #of using tf_random_uniform nodes
            #pass reuse if constructing for a second time

            Access nodes ether with:
            model.cc.node_dict['Male']
            or with:
            model.cc.Male


        Other models such as began/dcgan are intended to be build more than
        once (for example on 2 gpus), but causal_controller is just built once.

        '''

        self.config=config
        self.batch_size=batch_size #tf.placeholder_with_default
        self.graph=config.graph
        print('causal graph size:',len(self.graph))
        self.node_names, self.parent_names=zip(*self.graph)
        self.node_names=list(self.node_names)
        self.label_names=self.node_names

        #set nodeclass attributes
        if debug:
            print('Using ',self.config.cc_n_layers,'between each causal node')
        CausalNode.n_layers=self.config.cc_n_layers
        CausalNode.n_hidden=self.config.cc_n_hidden
        CausalNode.batch_size=self.batch_size

        with tf.variable_scope('causal_controller') as vs:
            self.step=tf.Variable(0, name='step', trainable=False)
            self.inc_step=tf.assign(self.step,self.step+1)

            self.nodes=[CausalNode(name=n,config=config) for n in self.node_names]

            for node,rents in zip(self.nodes,self.parent_names):
                node.parents=[n for n in self.nodes if n.name in rents]

            ##construct graph##
            #Lazy construction avoids the pain of traversing the causal graph explicitly
            #python recursion error if the graph is not a DAG
            for node in self.nodes:
                node.setup_tensor()

            self.labels=tf.concat(self.list_labels(),-1)
            self.fake_labels=self.labels
            self.fake_labels_logits= tf.concat( self.list_label_logits(),-1 )

        self.label_dict={n.name:n.label for n in self.nodes}
        self.node_dict={n.name:n for n in self.nodes}
        self.z_dict={n.name:n.z for n in self.nodes}

        #enable access directly. Little dangerous
        #Please don't have any nodes named "batch_size" for example
        self.__dict__.update(self.node_dict)

        #dcc variables are not saved, so if you reload in the middle of a
        #pretrain, that might be a quirk. I don't find it makes much of a
        #difference though
        self.var = tf.contrib.framework.get_variables(vs)
        trainable=tf.get_collection('trainable_variables')
        self.train_var=[v for v in self.var if v in trainable]

        #wont save dcc var
        self.saver=tf.train.Saver(var_list=self.var)
        self.model_dir=os.path.join(self.config.model_dir,self.model_type)
        self.save_model_dir=os.path.join(self.model_dir,'checkpoints')
        self.save_model_name=os.path.join(self.save_model_dir,'CC-Model')

        if not os.path.exists(self.model_dir):
            os.mkdir(self.model_dir)
        if not os.path.exists(self.save_model_dir):
            os.mkdir(self.save_model_dir)


    def build_pretrain(self,label_loader):
        '''
        This is not called if for example using an existing model
        label_loader is a queue of only labels that moves quickly because no
        images
        '''
        config=self.config

        #Pretraining setup
        self.DCC=DiscriminatorW

        #if self.config.pt_factorized:
            #self.DCC=FactorizedNetwork(self.graph,self.DCC,self.config)

        #reasonable alternative with equal performance
        if self.config.pt_factorized:#Each node owns a dcc
            print('CC is factorized!')
            for node in self.nodes:
                node.setup_pretrain(config,label_loader,self.DCC)

            with tf.control_dependencies([self.inc_step]):
                self.c_optim=tf.group(*[n.c_optim for n in self.nodes])
            self.dcc_optim=tf.group(*[n.dcc_optim for n in self.nodes])
            self.train_op=tf.group(self.c_optim,self.dcc_optim)

            self.c_loss=tf.reduce_sum([n.c_loss for n in self.nodes])
            self.dcc_loss=tf.reduce_sum([n.dcc_loss for n in self.nodes])

            self.summary_stats('total_c_loss',self.c_loss)
            self.summary_stats('total_dcc_loss',self.dcc_loss)

        #default.
        else:#Not factorized. CC owns dcc
            print('setting up pretrain:','CausalController')
            real_inputs=tf.concat([label_loader[n] for n in self.node_names],axis=1)
            fake_inputs=self.labels
            n_hidden=self.config.critic_hidden_size
            real_prob,self.dcc_real_logit,self._dcc_var=self.DCC(real_inputs,self.batch_size,n_hidden,self.config)
            fake_prob,self.dcc_fake_logit,_=self.DCC(fake_inputs,self.batch_size,n_hidden,self.config,reuse=True)
            grad_cost,self.dcc_slopes=Grad_Penalty(real_inputs,fake_inputs,self.DCC,self.config)

            self.dcc_diff = self.dcc_fake_logit - self.dcc_real_logit
            self.dcc_gan_loss=tf.reduce_mean(self.dcc_diff)
            self.dcc_grad_loss=grad_cost
            self.dcc_loss=self.dcc_gan_loss+self.dcc_grad_loss#
            self.c_loss=-tf.reduce_mean(self.dcc_fake_logit)#

            optimizer = tf.train.AdamOptimizer
            self.c_optimizer, self.dcc_optimizer = optimizer(config.pt_cc_lr),optimizer(config.pt_dcc_lr)

            with tf.control_dependencies([self.inc_step]):
                self.c_optim=self.c_optimizer.minimize(self.c_loss,var_list=self.train_var)
            self.dcc_optim=self.dcc_optimizer.minimize(self.dcc_loss,var_list=self.dcc_var)
            self.train_op=tf.group(self.c_optim,self.dcc_optim)

            self.summary_stats('total_c_loss',self.c_loss)
            self.summary_stats('total_dcc_loss',self.dcc_loss)


            for node in self.nodes:
                with tf.name_scope(node.name):
                    #TODO:replace with summary_stats
                    self.summary_stats(node.name+'_fake',node.label,hist=True)
                    self.summary_stats(node.name+'_real',label_loader[node.name],hist=True)


        self.summaries=tf.get_collection(self.summs[0])
        print('causalcontroller has',len(self.summaries),'summaries')
        self.summary_op=tf.summary.merge(self.summaries)


    @property
    def dcc_var(self):
        if self.config.is_pretrain:
            if self.config.pt_factorized:
                return list(chain.from_iterable([n.dcc_var for n in self.nodes]))
            else:
                return self._dcc_var
        else:
            return []


    def critic_update(self,sess):
        fetch_dict = {"critic_op":self.dcc_optim }
        for i in range(self.config.n_critic):
            result = sess.run(fetch_dict)


    def __len__(self):
        return len(self.node_dict)


    def list_placeholders(self):
        return [n.z for n in self.nodes]
    def list_labels(self):
        return [n.label for n in self.nodes]
    def list_label_logits(self):
        return [n.label_logit for n in self.nodes]

    def do2feed(self,do_dict):
        '''
        used internally to convert a dictionary to a feed_dict
        '''
        feed_dict={}
        for key,value in do_dict.items():
            feed_dict[self.label_dict[key]]=value
        return feed_dict

    def sample_label(self, sess, cond_dict=None,do_dict=None,N=None,verbose=False):
        '''
        This is a method to sample conditional and internventional
        distributions over labels. This is disconnected from
        interventions/conditioning that include the image because it is
        potentially faster. (images are not generated for rejected samples).
        The intent is to pass these labels to the image generator.

        This is low level. One experiment type(N times) per function call.
        values of dictionaries should be scalars

        Assumed that label_dict is always the fetch

        may combine conditioning and intervening
        '''

        do_dict= do_dict or {}
        cond_dict= cond_dict or {}
        fetch_dict=self.label_dict

        #boolean scalars are all that is allowed
        for v in cond_dict.values():
            assert(v==0 or v==1)
        for v in do_dict.values():
            assert(v==0 or v==1)

        arr_do_dict={k:v*np.ones([N,1]) for k,v in do_dict.items()}

        feed_dict = self.do2feed(arr_do_dict)#{tensor:array}
        feed_dict.update({self.batch_size:N})

        if verbose:
            print('feed_dict',feed_dict)
            print('fetch_dict',fetch_dict)

        #No conditioning loop needed
        if not cond_dict:
            return sess.run(fetch_dict, feed_dict)

        else:#cond_dict not None

            rows=np.arange(N)#what idx do we need
            #init
            max_fail=4000
            n_fails=0
            outputs=ArrayDict()
            iter_rows=np.arange(N)
            n_remaining=N

            ii=0
            while( n_remaining > 0 ):
                ii+=1

                #Run N samples
                out=sess.run(fetch_dict, feed_dict)

                bool_pass = did_succeed(out,cond_dict)
                pass_idx=iter_rows[bool_pass]
                pass_idx=pass_idx[:n_remaining]
                pass_dict={k:v[pass_idx] for k,v in out.items()}

                outputs.concat(pass_dict)
                n_remaining=N-len(outputs)

                #    :(
                if ii>max_fail:
                    print('WARNING: for cond_dict:',cond_dict,)
                    print('could not condition in ',max_fail*N, 'samples')
                    break

            else:
                if verbose:
                    print('for cond_dict:',cond_dict,)
                    print('conditioning finished normally with ',ii,'tries')

            return outputs.dict


class CausalNode(object):
    '''
    A CausalNode sets up a small neural network:
    z_noise+[,other causes] -> label_logit

    Everything is defined in terms of @property
    to allow tensorflow graph to be lazily generated as called
    because I don't enforce that a node's parent tf graph
    is constructed already during class.setup_tensor

    Uniform[-1,1] + other causes pases through n_layers fully connected layers.
    '''
    train = True
    name=None
    #logit is going to be 1 dim with sigmoid
    #as opposed to 2 dim with softmax
    _label_logit=None
    _label=None
    parents=[]#list of CausalNodes
    n_layers=3
    n_hidden=10
    batch_size=-1#Must be set by cc
    summs=['cc_summaries']

    def summary_scalar(self,name,ten):
        tf.summary.scalar(name,ten,collections=self.summs)
    def summary_stats(self,name,ten,hist=False):
        summary_stats(name,ten,collections=self.summs,hist=hist)

    def __init__(self,name,config):
        self.name=name
        self.config=config

        if self.batch_size==-1:
            raise Exception('class attribute CausalNode.batch_size must be set')

        with tf.variable_scope(self.name) as vs:
            #I think config.seed would have to be passed explicitly here
            self.z=tf.random_uniform((self.batch_size,self.n_hidden),minval=-1.0,maxval=1.0)
            self.init_var = tf.contrib.framework.get_variables(vs)
            self.setup_var=[]#empty until setup_tensor runs

    def setup_tensor(self):
        if self._label is not None:#already setup
            if debug:
                #Notify that already setup (normal behavior)
                print('self.',self.name,' has refuted setting up tensor')
            return

        tf_parents=[self.z]+[node.label for node in self.parents]


        with tf.variable_scope(self.name) as vs:
            h=tf.concat(tf_parents,-1)#tensor of parent values
            for l in range(self.n_layers-1):
                h=slim.fully_connected(h,self.n_hidden,activation_fn=lrelu,scope='layer'+str(l))

            self._label_logit = slim.fully_connected(h,1,activation_fn=None,scope='proj')
            self._label=tf.nn.sigmoid( self._label_logit )
            if debug:
                print('self.',self.name,' has setup _label=',self._label)

            #There could actually be some (quiet) error here I think if one of the
            #names in the causal graph is a substring of some other name.
                #e.g. 'hair' and 'black_hair'
            #Sorry, not coded to anticipate corner case
            self.setup_var=tf.contrib.framework.get_variables(vs)
    @property
    def var(self):
        if len(self.setup_var)==0:
            print('WARN: node var was accessed before it was constructed')
        return self.init_var+self.setup_var
    @property
    def train_var(self):
        trainable=tf.get_collection('trainable_variables')
        return [v for v in self.var if v in trainable]

    @property
    def label_logit(self):
        #Less stable. Better to access labels
        #for input to another model
        if self._label_logit is not None:
            return self._label_logit
        else:
            self.setup_tensor()
            return self._label_logit
    @property
    def label(self):
        if self._label is not None:
            return self._label
        else:
            self.setup_tensor()
            return self._label


    def setup_pretrain(self,config,label_loader,DCC):
        '''
        This function is not functional because
        this only happens if cc_config.pt_factorized=True.

        In this case convergence of each node is treated like its
        own gan conditioned on the parent nodes labels.

        I couldn't bring myself to delete it, but it's not needed
        to get good convergence for the models we tested.
        '''

        print('setting up pretrain:',self.name)

        with tf.variable_scope(self.name,reuse=self.reuse) as vs:
            self.config=config
            n_hidden=self.config.critic_hidden_size

            parent_names=[p.name for p in self.parents]
            real_inputs=tf.concat([label_loader[n] for n in parent_names]+[label_loader[self.name]],axis=1)
            fake_inputs=tf.concat([p.label for p in self.parents]+[self.label],axis=1)

            real_prob,self.dcc_real_logit,self.dcc_var=DCC(real_inputs,self.batch_size,n_hidden,self.config)
            fake_prob,self.dcc_fake_logit,_=DCC(fake_inputs,self.batch_size,n_hidden,self.config,reuse=True)

            grad_cost,self.dcc_slopes=Grad_Penalty(real_inputs,fake_inputs,DCC,self.config)

            self.dcc_diff = self.dcc_fake_logit - self.dcc_real_logit
            self.dcc_gan_loss=tf.reduce_mean(self.dcc_diff)
            self.dcc_grad_loss=grad_cost
            self.dcc_loss=self.dcc_gan_loss+self.dcc_grad_loss#
            self.c_loss=-tf.reduce_mean(self.dcc_fake_logit)#

            self.summary_scalar('dcc_gan_loss',self.dcc_gan_loss)
            self.summary_scalar('dcc_grad_loss',self.dcc_grad_loss)
            self.summary_stats('dcc_slopes',self.dcc_slopes,hist=True)

            if config.optimizer == 'adam':
                optimizer = tf.train.AdamOptimizer
            else:
                raise Exception("[!] Caution! Optimizer untested {}. Only tested Adam".format(config.optimizer))
            self.c_optimizer, self.dcc_optimizer = optimizer(config.pt_cc_lr),optimizer(config.pt_dcc_lr)

            self.c_optim=self.c_optimizer.minimize(self.c_loss,var_list=self.train_var)
            self.dcc_optim=self.dcc_optimizer.minimize(self.dcc_loss,var_list=self.dcc_var)

            self.summary_stats('c_loss',self.c_loss)
            self.summary_stats('dcc_loss',self.c_loss)
            self.summary_stats('dcc_real_logit',self.dcc_real_logit,hist=True)
            self.summary_stats('dcc_fake_logit',self.dcc_fake_logit,hist=True)


================================================
FILE: causal_controller/__init__.py
================================================


================================================
FILE: causal_controller/config.py
================================================
'''

These are the command line parameters that pertain exlusively to the
CausalController.

'''

from __future__ import print_function
import argparse

def str2bool(v):
    #return (v is True) or (v.lower() in ('true', '1'))
    return v is True or v.lower() in ('true', '1')

arg_lists = []
parser = argparse.ArgumentParser()

def add_argument_group(name):
    arg = parser.add_argument_group(name)
    arg_lists.append(arg)
    return arg

#Pretrain network
pretrain_arg=add_argument_group('Pretrain')
pretrain_arg.add_argument('--pt_load_path', type=str, default='')
pretrain_arg.add_argument('--is_pretrain',type=str2bool,default=False,
                         help='to do pretraining')
#pretrain_arg.add_argument('--only_pretrain', action='store_true',
#                         help='simply complete pretrain and exit')

#Used to be an option, but now is solved
#pretrain_arg.add_argument('--pretrain_type',type=str,default='wasserstein',choices=['wasserstein','gan'])

pretrain_arg.add_argument('--pt_cc_lr',type=float,default=0.00008,#
                          help='learning rate for causal controller')
pretrain_arg.add_argument('--pt_dcc_lr',type=float,default=0.00008,#
                          help='learning rate for causal controller')
pretrain_arg.add_argument('--lambda_W',type=float,default=0.1,#
                          help='penalty for gradient of W critic')
pretrain_arg.add_argument('--n_critic',type=int,default=20,#5 for speed
                          help='number of critic iterations between gen update')
pretrain_arg.add_argument('--critic_layers',type=int,default=6,#4 usual.8 might help
                          help='number of layers in the Wasserstein discriminator')
pretrain_arg.add_argument('--critic_hidden_size',type=int,default=15,#10,15
                         help='hidden_size for critic of discriminator')

pretrain_arg.add_argument('--min_tvd',type=float,default=0.02,
                          help='if tvd<min_tvd then stop pretrain')
pretrain_arg.add_argument('--min_pretrain_iter',type=int,default=5000,
                          help='''pretrain for at least this long before
                          stopping early due to tvd convergence. This is to
                          avoid being able to get a low tvd without labels
                          being clustered near integers''')
pretrain_arg.add_argument('--pretrain_iter',type=int,default=10000,
                          help='if iter>pretrain_iter then stop pretrain')
#pretrain_arg.add_argument('--pretrain_labeler',type=str2bool,default=False,
#                          help='''whether to train the labeler on real images
#                          during pretraining''')

pretrain_arg.add_argument('--pt_factorized',type=str2bool,default=False,
                          help='''Interesting approach that seemed to stabalize
                          training, but is not needed in this application.
                          It turned out that we could get very good training without
                          this complication, so we did not include in the paper.
                          I've left it commented out here in the code.

                          Whether the discriminator should be
                          factorized according to the structure of the graph
                          to speed/stabalize convergence.
                          
                          This creates a separate discriminator for each node
                          that only looks at each causal nodes value and its
                          parents''')

#Network
net_arg = add_argument_group('Network')

net_arg.add_argument('--cc_n_layers',type=int, default=6,
                     help='''This is the number of neural network fc layers
                     between the causes of a node and the node itsef.''')
net_arg.add_argument('--cc_n_hidden',type=int, default=10,
                     help='''number of neurons per layer in causal controller.
                     Also functions as the dimensionality of the uniform noise
                     input to the controller''')

# Data
data_arg = add_argument_group('Data')
data_arg.add_argument('--causal_model', type=str)
data_arg.add_argument('--dataset', type=str, default='celebA')

data_arg.add_argument('--batch_size', type=int, default=16)
data_arg.add_argument('--num_worker', type=int, default=24,
     help='number of threads to use for loading and preprocessing data')

# Training / test parameters
train_arg = add_argument_group('Training')


# Misc
misc_arg = add_argument_group('Misc')
misc_arg.add_argument('--load_path', type=str, default='')
misc_arg.add_argument('--log_step', type=int, default=100)
misc_arg.add_argument('--save_step', type=int, default=5000)
misc_arg.add_argument('--num_log_samples', type=int, default=3)
misc_arg.add_argument('--log_level', type=str, default='INFO', choices=['INFO', 'DEBUG', 'WARN'])
misc_arg.add_argument('--log_dir', type=str, default='logs')


def get_config():
    config, unparsed = parser.parse_known_args()
    print('Loaded ./causal_controller/config.py')
    return config, unparsed

if __name__=='__main__':
    #for debug of config
    config, unparsed = get_config()


================================================
FILE: causal_controller/models.py
================================================
import numpy as np
import tensorflow as tf
slim = tf.contrib.slim


def lrelu(x,leak=0.2,name='lrelu'):
    with tf.variable_scope(name):
        #Trick that saves memory by avoiding tf.max
        f1=0.5 * (1+leak)
        f2=0.5 * (1-leak)
        return f1*x + f2*tf.abs(x)


def DiscriminatorW(labels,batch_size, n_hidden, config, reuse=None):
    '''
    A simple discriminator to be used with Wasserstein optimization.
    No minibatch features or batch normalization is used.
    '''
    with tf.variable_scope("WasserDisc") as scope:
        if reuse:
            scope.reuse_variables()
        h=labels
        act_fn=lrelu
        n_neurons=n_hidden
        for i in range(config.critic_layers):
            if i==config.critic_layers-1:
                act_fn=None
                n_neurons=1
            scp='WD'+str(i)
            h = slim.fully_connected(h,n_neurons,activation_fn=act_fn,scope=scp)
        variables = tf.contrib.framework.get_variables(scope)
        return tf.nn.sigmoid(h),h,variables


def Grad_Penalty(real_data,fake_data,Discriminator,config):
    '''
    Implemention from "Improved training of Wasserstein"
    Interpolation based estimation of the gradient of the discriminator.
    Used to penalize the derivative rather than explicitly constrain lipschitz.
    '''
    batch_size=config.batch_size
    LAMBDA=config.lambda_W
    n_hidden=config.critic_hidden_size
    alpha = tf.random_uniform([batch_size,1],0.,1.)
    interpolates = alpha*real_data + ((1-alpha)*fake_data)#Could do more if not fixed batch_size
    disc_interpolates = Discriminator(interpolates,batch_size,n_hidden=n_hidden,config=config, reuse=True)[1]#logits
    gradients = tf.gradients(disc_interpolates,[interpolates])[0]#orig
    slopes = tf.sqrt(tf.reduce_sum(tf.square(gradients),
                           reduction_indices=[1]))
    gradient_penalty = tf.reduce_mean((slopes-1)**2)
    grad_cost = LAMBDA*gradient_penalty
    return grad_cost,slopes


================================================
FILE: causal_controller/utils.py
================================================
from __future__ import print_function
import numpy as np
import tensorflow as tf

def summary_stats(name,tensor,collections=None,hist=False):
    collections=collections or [tf.GraphKeys.SUMMARIES]
    ave=tf.reduce_mean(tensor)
    std=tf.sqrt(tf.reduce_mean(tf.square(ave-tensor)))
    tf.summary.scalar(name+'_ave',ave,collections)
    tf.summary.scalar(name+'_std',std,collections)
    if hist:
        tf.summary.histogram(name+'_hist',tensor,collections)

def did_succeed( output_dict, cond_dict ):
    '''
    Used in rejection sampling:
    for each row, determine if cond is satisfied
    for every cond in cond_dict

    success is hardcoded as round(label) being exactly equal
    to the integer in cond_dict
    '''

    #definition success:
    def is_win(key):
        #cond=np.squeeze(cond_dict[key])
        cond=np.squeeze(cond_dict[key])
        val=np.squeeze(output_dict[key])
        condition= np.round(val)==cond
        return condition

    scoreboard=[is_win(key) for key in cond_dict]
    #print('scoreboard', scoreboard)
    all_victories_bool=np.logical_and.reduce(scoreboard)
    return all_victories_bool.flatten()


================================================
FILE: causal_dcgan/CausalGAN.py
================================================
from __future__ import division,print_function
from figure_scripts.pairwise import crosstab
from figure_scripts.sample import intervention2d
import os
import time
import math
from glob import glob
import tensorflow as tf
import numpy as np
from six.moves import xrange
import pandas as pd
import sys
import scipy.stats as stats

from models import GeneratorCNN,DiscriminatorCNN,discriminator_labeler
from models import discriminator_gen_labeler,discriminator_on_z

from tensorflow.core.framework import summary_pb2
from tensorflow.contrib import slim

from ops import batch_norm,lrelu

from causal_graph import get_causal_graph

def norm_img(image):
    image = image/127.5 - 1.
    return image
def denorm_img(norm):
    return tf.clip_by_value((norm + 1)*127.5, 0, 255)

def tf_truncexpon(batch_size,rate,right):
    '''
    a tensorflow node that returns a random variable
    sampled from an Exp(rate) random variable
    which has been truncated and normalized to [0,right]

    #Leverages that log of uniform is exponential

    batch_size: a tensorflow placeholder to sync batch_size everywhere
    rate: lambda rate parameter for exponential dist
    right: float in (0,inf) where to truncate exp distribution
    '''

    uleft=tf.exp(-1*rate*right)
    U=tf.random_uniform(shape=(batch_size,1),minval=uleft,maxval=1)
    tExp=(-1/rate)*tf.log(U)

    return tExp

def add_texp_noise(batch_size,labels01):
    labels=0.3+labels01*0.4#{0.3,0.7}
    lower, upper, scale = 0, 0.2, 1/25.0
    lower_tail, upper_tail, scale_tail = 0, 0.3, 1/50.0
    #before #t = stats.truncexpon(b=(upper-lower)/scale, loc=lower, scale=scale)
    #b*scale was the right-boundary
    b=(upper-lower)/scale
    b_tail=(upper_tail-lower_tail)/scale_tail

    s=tf_truncexpon(batch_size,rate=b,right=upper)
    s_tail=tf_truncexpon(batch_size,rate=b_tail,right=upper_tail)
    labels = labels + ((0.5-labels)/0.2)*s + ((-0.5+labels)/0.2)*s_tail
    return labels, [s,s_tail]

class CausalGAN(object):
    model_type='dcgan'

    def __init__(self,batch_size,config):

        self.batch_size = batch_size #a tensor
        self.config=config
        self.model_dir=config.model_dir
        self.TINY = 10**-6

        self.step = tf.Variable(0, name='step', trainable=False)
        self.inc_step=tf.assign(self.step,self.step+1)

        #########################################
        ##### Following is not used anymore #####
        #########################################
        self.gamma_k = tf.get_variable(name='gamma_k',initializer=config.gamma_k,trainable=False)
        self.lambda_k = config.lambda_k#0.05
        self.gamma_l = config.gamma_l#self.label_loss_hyperparameter
        self.lambda_l = config.lambda_l#0.005
        self.gamma_m = 1./(self.gamma_k+self.TINY)#gamma_m#4.0 # allowing gan loss to be 8 times labelerR loss
        #self.gamma_m=config.gamma_m
        self.lambda_m =config.lambda_m#0.05
        #########################################

        self.k_t = tf.get_variable(name='k_t',initializer=1.,trainable=False) # kt is the closed loop feedback coefficient to balance the loss between LR and LG

        self.rec_loss_coeff = 0.0
        print('WARNING:CausalGAN.rec_loss_coff=',self.rec_loss_coeff)

        self.hidden_size=config.critic_hidden_size

        self.gf_dim = config.gf_dim
        self.df_dim = config.df_dim

        self.loss_function = config.loss_function

    def __call__(self, real_inputs, fake_inputs):
        '''
        This builds the model on the inputs. Potentially this would be called
        multiple times in a multi-gpu situation. Put "setup" type stuff in
        __init__ instead.

        This is like self.build_model()

        fake inputs is a dictionary of labels from cc
        real_inputs is also a dictionary of labels
            with an additional key 'x' for the real image
        '''
        config=self.config#used many times

        #dictionaries
        self.real_inputs=real_inputs
        self.fake_inputs=fake_inputs

        n_labels=len(fake_inputs)
        self.x = self.real_inputs.pop('x')#[0,255]
        x = norm_img(self.x)#put in [-1,1]

        #These are 0,1 labels. To add noise, add noise from here.
        self.real_labels=tf.concat(self.real_inputs.values(),-1)
        self.fake_labels=tf.concat(self.fake_inputs.values(),-1)

        ##BEGIN manipulating labels##

        #Fake labels will already be nearly discrete
        if config.round_fake_labels: #default
            fake_labels=tf.round(self.fake_labels)#{0,1}
            real_labels=tf.round(self.real_labels)#should already be rounded
        else:
            fake_labels=self.fake_labels#{0,1}
            real_labels=self.real_labels

        if config.label_type=='discrete':
            fake_labels=0.3+fake_labels*0.4#{0.3,0.7}
            real_labels=0.3+real_labels*0.4#{0.3,0.7}

        elif config.label_type=='continuous':

            #this is so that they can be set to 0 in label_interpolation
            self.noise_variables=[]

            if config.label_specific_noise:
                #TODO#uniform see above #REFERENCE
                raise Exception('label_specific_noise=True not yet implemented')
            else:#default
                fake_labels,nvfake=add_texp_noise(self.batch_size,fake_labels)
                real_labels,nvreal=add_texp_noise(self.batch_size,real_labels)
                self.noise_variables.extend(nvfake)
                self.noise_variables.extend(nvreal)

            tf.summary.histogram('noisy_fake_labels',fake_labels)
            tf.summary.histogram('noisy_real_labels',real_labels)

        self.fake_labels_logits= -tf.log(1/(fake_labels+self.TINY)-1)
        self.real_labels_logits = -tf.log(1/(real_labels+self.TINY)-1)

        self.noisy_fake_labels=fake_labels
        self.noisy_real_labels=real_labels

        if config.type_input_to_generator=='labels':
            self.fake_labels_inputs=fake_labels
            self.real_labels_inputs=real_labels#for reconstruction
        elif config.type_input_to_generator=='logits': #default
            self.fake_labels_inputs=self.fake_labels_logits
            self.real_labels_inputs=self.real_labels_logits

        ##FINISHED manipulating labels##

        self.z_gen = tf.random_uniform( [self.batch_size, config.z_dim],minval=-1.0, maxval=1.0,name='z_gen')

        self.z= tf.concat( [self.z_gen, self.fake_labels_inputs],axis=-1,name='z')

        G, self.g_vars = GeneratorCNN(self.z,config)#[-1,1]float
        self.G=denorm_img(G)#[0,255]

        #Discriminator
        D_on_real=DiscriminatorCNN(x,config)
        D_on_fake=DiscriminatorCNN(G,config,reuse=True)
        self.D, self.D_logits ,self.features_to_estimate_z_on_input ,self.d_vars=D_on_real
        self.D_,self.D_logits_,self.features_to_estimate_z_on_generated,_ =D_on_fake

        #Discriminator Labeler
        self.D_labels_for_real, self.D_labels_for_real_logits, self.dl_vars =\
                discriminator_labeler(x,n_labels,config)
        self.D_labels_for_fake, self.D_labels_for_fake_logits, _ =\
                discriminator_labeler(G,n_labels,config,reuse=True)

        #Other discriminators
        self.D_gen_labels_for_fake,self.D_gen_labels_for_fake_logits,self.dl_gen_vars=\
            discriminator_gen_labeler(G,n_labels,config)
            #discriminator_gen_labeler(self.G,n_labels,config)

        self.D_on_z_real,_ =discriminator_on_z(self.features_to_estimate_z_on_input,config)
        self.D_on_z,self.dz_vars=discriminator_on_z(self.features_to_estimate_z_on_generated,config,reuse=True)

        #order of concat matters
        self.z_for_real = tf.concat([self.D_on_z_real,self.real_labels_inputs], axis=1 , name ='z_real')
        self.inputs_reconstructed,_ = GeneratorCNN(self.z_for_real,self.config, reuse = True)
        # Reconstructability is an idea that we tried. It does not provide big improvements, hence is not used ini the current version.

        tf.summary.histogram('d',self.D)
        tf.summary.histogram('d_',self.D_)
        tf.summary.image('G',self.G,max_outputs=10)

        def sigmoid_cross_entropy_with_logits(x, y):
            return tf.nn.sigmoid_cross_entropy_with_logits(logits=x, labels=y)

        # We tried different loss functions: 0,1,2 all have the order of terms in the cross entropy loss flipped, whereas 3,4,5 are not (consistent with theory).
        # Although all works to some extent, we have seen the sharpest images and best image quality with "loss function 1".
        # Difference between 0, 1, 2: This is to see the effect of using different GAN losses, as mentioned in the paper.
        if self.loss_function == 0:
            self.g_lossLabels= tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.fake_labels_logits,self.D_labels_for_fake))
            self.g_lossGAN = tf.reduce_mean(
              -sigmoid_cross_entropy_with_logits(self.D_logits_, tf.zeros_like(self.D_))+sigmoid_cross_entropy_with_logits(self.D_logits_, tf.ones_like(self.D_)))
        elif self.loss_function == 1:#default
            self.g_lossLabels= tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.fake_labels_logits,self.D_labels_for_fake))
            self.g_lossGAN = tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.D_logits_, tf.ones_like(self.D_)))
        elif self.loss_function == 2:
            self.g_lossLabels= tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.fake_labels_logits,self.D_labels_for_fake))
            self.g_lossGAN = tf.reduce_mean(-sigmoid_cross_entropy_with_logits(self.D_logits_, tf.zeros_like(self.D_)))
        elif self.loss_function == 3:
            self.g_lossLabels= tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.D_labels_for_fake_logits, self.fake_labels))
            self.g_lossGAN = tf.reduce_mean(
              -sigmoid_cross_entropy_with_logits(self.D_logits_, tf.zeros_like(self.D_))+sigmoid_cross_entropy_with_logits(self.D_logits_, tf.ones_like(self.D_)))
        elif self.loss_function == 4:
            self.g_lossLabels= tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.D_labels_for_fake_logits, self.fake_labels))
            self.g_lossGAN = tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.D_logits_, tf.ones_like(self.D_)))
        elif self.loss_function == 5:
            self.g_lossLabels= tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.D_labels_for_fake_logits, self.fake_labels))
            self.g_lossGAN = tf.reduce_mean(-sigmoid_cross_entropy_with_logits(self.D_logits_, tf.zeros_like(self.D_)))
        else:
            raise Exception('Something is wrong with the loss function.\
                            self.loss_function=',self.loss_function)

        self.g_lossLabels_GLabeler = tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.fake_labels_logits,self.D_gen_labels_for_fake))
        tf.summary.scalar("g_loss_labelerG",self.g_lossLabels_GLabeler)

        self.g_loss_on_z = tf.reduce_mean(tf.abs(self.z_gen - self.D_on_z)**2)
        #x is the real input image
        self.real_reconstruction_loss = tf.reduce_mean(tf.abs(x-self.inputs_reconstructed)**2)

        tf.summary.scalar('real_reconstruction_loss', self.real_reconstruction_loss)

        self.d_loss_real = tf.reduce_mean(
          sigmoid_cross_entropy_with_logits(self.D_logits, tf.ones_like(self.D)))
        self.d_loss_fake = tf.reduce_mean(
          sigmoid_cross_entropy_with_logits(self.D_logits_, tf.zeros_like(self.D_)))

        if config.reconstr_loss:
            g_loss_on_z=self.g_loss_on_z
        else:
            g_loss_on_z=0.
            # Default value for now, since reconstructability is not used in the current version.

        if config.off_label_losses:
            self.g_loss = self.g_lossGAN
        else:#default
            self.g_loss = self.g_lossGAN - 1.0*self.k_t*self.g_lossLabels_GLabeler + self.g_lossLabels + g_loss_on_z

        tf.summary.scalar('g_loss_labelerR', self.g_lossLabels)
        tf.summary.scalar('g_lossGAN', self.g_lossGAN)
        tf.summary.scalar('g_loss_on_z', self.g_loss_on_z)
        tf.summary.scalar('coeff_of_negLabelerG_loss_k_t', self.k_t)
        tf.summary.scalar('gamma_k_summary', self.gamma_k)

        self.d_labelLossReal = tf.reduce_mean(sigmoid_cross_entropy_with_logits(self.D_labels_for_real_logits,self.real_labels))

        tf.summary.scalar("d_loss_real", self.d_loss_real)
        tf.summary.scalar("d_loss_fake", self.d_loss_fake)
        tf.summary.scalar("d_loss_real_label", self.d_labelLossReal)

        self.d_loss = self.d_loss_real + self.d_loss_fake

        tf.summary.scalar("g_loss", self.g_loss)
        tf.summary.scalar("d_loss", self.d_loss)

    def build_train_op(self):
        config=self.config

        self.g_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
                  .minimize(self.g_loss, var_list=self.g_vars)

        self.d_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
                  .minimize(self.d_loss, var_list=self.d_vars)

        self.d_label_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
                  .minimize(self.d_labelLossReal, var_list=self.dl_vars)

        self.d_gen_label_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
                  .minimize(self.g_lossLabels_GLabeler, var_list=self.dl_gen_vars)

        self.d_on_z_optim = tf.train.AdamOptimizer(config.learning_rate, beta1=config.beta1) \
                  .minimize(self.g_loss_on_z + self.rec_loss_coeff*self.real_reconstruction_loss, var_list=self.dz_vars)

        self.k_t_update = tf.assign(self.k_t, self.k_t*tf.exp(-1.0/config.tau) )

        self.train_op=tf.group(self.d_gen_label_optim,self.d_label_optim,self.d_optim,self.g_optim,self.d_on_z_optim)

    def build_summary_op(self):
        self.summary_op=tf.summary.merge_all()

    def train_step(self,sess,counter):
        '''
        This is a generic function that will be called by the Trainer class
        once per iteration. The simplest body for this part would be simply
        "sess.run(self.train_op)". But you may have more complications.

        Running self.summary_op is handeled by Trainer.Supervisor and doesn't
        need to be addressed here

        Only counters, not epochs are explicitly kept track of
        '''

        ###You can wait until counter>N to do stuff for example:
        if self.config.pretrain_LabelerR and counter < self.config.pretrain_LabelerR_no_of_iters:
            sess.run(self.d_label_optim)

        else:
            if np.mod(counter, 3) == 0:

                sess.run(self.g_optim)
                sess.run([self.train_op,self.k_t_update,self.inc_step])#all ops

            else:
                sess.run([self.g_optim, self.k_t_update ,self.inc_step])
                sess.run(self.g_optim)

================================================
FILE: causal_dcgan/__init__.py
================================================


================================================
FILE: causal_dcgan/config.py
================================================
from __future__ import print_function
import argparse

def str2bool(v):
    return v is True or v.lower() in ('true', '1')

arg_lists = []
parser = argparse.ArgumentParser()

def add_argument_group(name):
    arg = parser.add_argument_group(name)
    arg_lists.append(arg)
    return arg

# Data
data_arg = add_argument_group('Data')
data_arg.add_argument('--batch_size', type=int, default=64,
                     help='''default batch_size when using this model and not
                      specifying the batch_size elsewhere''')


data_arg.add_argument('--label_specific_noise',type=str2bool,default=False,
                      help='whether to add noise dependent on the data mean')

#This flag doesn't function. Model is designed to take in CC.labels
data_arg.add_argument('--fakeLabels_distribution',type=str,choices=['real_joint','iid_uniform'],default='real_joint')


data_arg.add_argument('--label_type',type=str,choices=['discrete','continuous'],default='continuous')
data_arg.add_argument('--round_fake_labels',type=str2bool,default=True,
                    help='''whether to round the outputs of causal controller
                      before (possibly) adding noise to them or using them as
                      input to the image generator. I highly recommend as a
                      small improvement.''')

data_arg.add_argument('--type_input_to_generator',type=str,choices=['labels','logits'],
                      default='logits',help='''Whether to send labels or logits to the generator
                      to form images. Chris recommends labels''')

#Network
net_arg = add_argument_group('Network')

#TODO need help strings
net_arg.add_argument('--df_dim',type=int, default=64 )
net_arg.add_argument('--gf_dim',type=int, default=64,
                    help='''output dimensions [gf_dim,gf_dim] for generator''')
net_arg.add_argument('--c_dim',type=int, default=3,
                     help='''number of color channels. I wouldn't really change
                     this from 3''')

net_arg.add_argument('--z_dim',type=int,default=100,
                     help='''the number of dimensions for the noise input that
                     will be concatenated with labels and fed to the image
                     generator''')

net_arg.add_argument('--loss_function',type=int,default=1,
                     help='''which loss function to choose. See CausalGAN.py''')

net_arg.add_argument('--critic_hidden_size',type=int,default=10,
                    help='''number of neurons per fc layer in discriminator''')

net_arg.add_argument('--reconstr_loss',type=str2bool,default=False,
                     help='''whether to inclue g_loss_on_z in the generator
                     loss. This was True by default until recently which is where there are a lot of unneccsary networks''')


net_arg.add_argument('--stab_proj',type=str2bool,default=False,
                     help='''stabalizing projection method used for
                     discriminator. Stabalizing GAN Training with Multiple
                     Random Projections
                     https://arxiv.org/abs/1705.07831''')

net_arg.add_argument('--n_stab_proj',type=int,default=256,
                     help='''number of stabalizing projections. Need
                     stab_proj=True for this to have effect''')


# Training / test parameters
train_arg = add_argument_group('Training')
train_arg.add_argument('--num_iter',type=int,default=100000,
                       help='the number of training iterations to run the model for')
train_arg.add_argument('--learning_rate',type=float,default=0.0002,
                       help='Learning rate for adam [0.0002]')
train_arg.add_argument('--beta1',type=float,default=0.5,
                       help='Momentum term of adam [0.5]')

train_arg.add_argument('--off_label_losses',type=str2bool,default=False)

#TODO unclear on default for these two arguments
#Not yet setup. Use False
train_arg.add_argument('--pretrain_LabelerR',type=str2bool,default=False)

#counters over epochs preferred
#train_arg.add_argument('--pretrain_LabelerR_no_of_epochs',type=int,default=5)
train_arg.add_argument('--pretrain_LabelerR_no_of_iters',type=int,default=15000)


#TODO: add help strings describing params
train_arg.add_argument('--lambda_m',type=float,default=0.05,)#0.05
train_arg.add_argument('--lambda_k',type=float,default=0.05,)#0.05
train_arg.add_argument('--lambda_l',type=float,default=0.001,)#0.005
train_arg.add_argument('--gamma_m',type=float,default=-1.0,)# NOT USED!
train_arg.add_argument('--gamma_k',type=float,default=-1.0,#0.8#FLAGS.gamma_k not used
                       help='''default initial value''')
train_arg.add_argument('--gamma_l',type=float,default=-1.0,
                      )

train_arg.add_argument('--tau',type=float,default=3000,
                       help='''time constant. Every tau calls of k_t_update will
                       reduce k_t by a factor of 1/e.''')


#old config file differed from implementation:
#    FLAGS.gamma_k = -1.0
#    FLAGS.gamma_m = -1.0 # set to 1/gamma_k in the code
#    FLAGS.gamma_l = -1.0 # made more extreme
#    FLAGS.lambda_k = 0.05
#    FLAGS.lambda_m = 0.05
#    FLAGS.lambda_l = 0.001


# Misc
misc_arg = add_argument_group('Misc')
misc_arg.add_argument('--is_train',type=str2bool,default=False,
                      help='''whether to enter the image training loop''')
misc_arg.add_argument('--log_level', type=str, default='INFO', choices=['INFO', 'DEBUG', 'WARN'])
misc_arg.add_argument('--log_dir', type=str, default='logs')
misc_arg.add_argument('--log_step', type=int, default=100,
                     help='''how often to log stuff. Sample images are created
                     every 10*log_step''')


##REFERENCE
#  elif model_ID == 44:
#    FLAGS.is_train = True
#    #FLAGS.graph = "big_causal_graph"
#    FLAGS.graph = "complete_big_causal_graph"
#    FLAGS.loss_function = 1
#    FLAGS.pretrain_LabelerR = False
#    FLAGS.pretrain_LabelerR_no_of_epochs = 3
#    FLAGS.fakeLabels_distribution = "real_joint"
#    FLAGS.gamma_k = -1.0
#    FLAGS.gamma_m = -1.0 # set to 1/gamma_k in the code
#    FLAGS.gamma_l = -1.0 # made more extreme
#    FLAGS.lambda_k = 0.05
#    FLAGS.lambda_m = 0.05
#    FLAGS.lambda_l = 0.001
#    FLAGS.label_type = 'continuous'
#    return FLAGS


def get_config():
    config, unparsed = parser.parse_known_args()

    print('Loaded ./causal_dcgan/config.py')
    return config, unparsed

if __name__=='__main__':
    #for debug of config
    config, unparsed = get_config()


================================================
FILE: causal_dcgan/models.py
================================================
import tensorflow as tf
import numpy as np
slim = tf.contrib.slim
import math

from ops import lrelu,linear,conv_cond_concat,batch_norm,add_minibatch_features

from ops import conv2d,deconv2d


def conv_out_size_same(size, stride):
  return int(math.ceil(float(size) / float(stride)))

def GeneratorCNN( z, config, reuse=None):
    '''
    maps z to a 64x64 images with values in [-1,1]
    uses batch normalization internally
    '''

    #trying to get around batch_size like this:
    batch_size=tf.shape(z)[0]
    #batch_size=tf.placeholder_with_default(64,[],'bs')

    with tf.variable_scope("generator",reuse=reuse) as vs:
        g_bn0 = batch_norm(name='g_bn0')
        g_bn1 = batch_norm(name='g_bn1')
        g_bn2 = batch_norm(name='g_bn2')
        g_bn3 = batch_norm(name='g_bn3')

        s_h, s_w = config.gf_dim, config.gf_dim#64,64
        s_h2, s_w2 = conv_out_size_same(s_h, 2), conv_out_size_same(s_w, 2)
        s_h4, s_w4 = conv_out_size_same(s_h2, 2), conv_out_size_same(s_w2, 2)
        s_h8, s_w8 = conv_out_size_same(s_h4, 2), conv_out_size_same(s_w4, 2)
        s_h16, s_w16 = conv_out_size_same(s_h8, 2), conv_out_size_same(s_w8, 2)


        # project `z` and reshape
        z_, self_h0_w, self_h0_b = linear(
            z, config.gf_dim*8*s_h16*s_w16, 'g_h0_lin', with_w=True)

        self_h0 = tf.reshape(
            z_, [-1, s_h16, s_w16, config.gf_dim * 8])
        h0 = tf.nn.relu(g_bn0(self_h0))

        h1, h1_w, h1_b = deconv2d(
            h0, [batch_size, s_h8, s_w8, config.gf_dim*4], name='g_h1', with_w=True)
        h1 = tf.nn.relu(g_bn1(h1))

        h2, h2_w, h2_b = deconv2d(
            h1, [batch_size, s_h4, s_w4, config.gf_dim*2], name='g_h2', with_w=True)
        h2 = tf.nn.relu(g_bn2(h2))

        h3, h3_w, h3_b = deconv2d(
            h2, [batch_size, s_h2, s_w2, config.gf_dim*1], name='g_h3', with_w=True)
        h3 = tf.nn.relu(g_bn3(h3))

        h4, h4_w, h4_b = deconv2d(
            h3, [batch_size, s_h, s_w, config.c_dim], name='g_h4', with_w=True)
        out=tf.nn.tanh(h4)

    variables = tf.contrib.framework.get_variables(vs)
    return out, variables

def DiscriminatorCNN(image, config, reuse=None):
    '''
    Discriminator for GAN model.

    image      : batch_size x 64x64x3 image
    config     : see causal_dcgan/config.py
    reuse      : pass True if not calling for first time

    returns: probabilities(real)
           : logits(real)
           : first layer activation used to estimate z from
           : variables list
    '''
    with tf.variable_scope("discriminator",reuse=reuse) as vs:
        d_bn1 = batch_norm(name='d_bn1')
        d_bn2 = batch_norm(name='d_bn2')
        d_bn3 = batch_norm(name='d_bn3')

        if not config.stab_proj:
            h0 = lrelu(conv2d(image, config.df_dim, name='d_h0_conv'))#16,32,32,64

        else:#method to restrict disc from winning
            #I think this is equivalent to just not letting disc optimize first layer
            #and also removing nonlinearity

            #k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
            #paper used 8x8 kernel, but I'm using 5x5 because it is more similar to my achitecture
            #n_projs=config.df_dim#64 instead of 32 in paper
            n_projs=config.n_stab_proj#64 instead of 32 in paper

            print("WARNING:STAB_PROJ active, using ",n_projs," projections")

            w_proj = tf.get_variable('w_proj', [5, 5, image.get_shape()[-1],n_projs],
                initializer=tf.truncated_normal_initializer(stddev=0.02),trainable=False)
            conv = tf.nn.conv2d(image, w_proj, strides=[1, 2, 2, 1], padding='SAME')

            b_proj = tf.get_variable('b_proj', [n_projs],#does nothing
                 initializer=tf.constant_initializer(0.0),trainable=False)
            h0=tf.nn.bias_add(conv,b_proj)


        h1_ = lrelu(d_bn1(conv2d(h0, config.df_dim*2, name='d_h1_conv')))#16,16,16,128

        h1 = add_minibatch_features(h1_, config.df_dim)
        h2 = lrelu(d_bn2(conv2d(h1, config.df_dim*4, name='d_h2_conv')))#16,16,16,248
        h3 = lrelu(d_bn3(conv2d(h2, config.df_dim*8, name='d_h3_conv')))
        #print('h3shape: ',h3.get_shape().as_list())
        #print('8df_dim:',config.df_dim*8)
        #dim3=tf.reduce_prod(tf.shape(h3)[1:])
        dim3=np.prod(h3.get_shape().as_list()[1:])
        h3_flat=tf.reshape(h3, [-1,dim3])
        h4 = linear(h3_flat, 1, 'd_h3_lin')

        prob=tf.nn.sigmoid(h4)

        variables = tf.contrib.framework.get_variables(vs,collection=tf.GraphKeys.TRAINABLE_VARIABLES)

    return prob, h4, h1_, variables


def discriminator_labeler(image, output_dim, config, reuse=None):
    batch_size=tf.shape(image)[0]
    with tf.variable_scope("disc_labeler",reuse=reuse) as vs:
        dl_bn1 = batch_norm(name='dl_bn1')
        dl_bn2 = batch_norm(name='dl_bn2')
        dl_bn3 = batch_norm(name='dl_bn3')

        h0 = lrelu(conv2d(image, config.df_dim, name='dl_h0_conv'))#16,32,32,64
        h1 = lrelu(dl_bn1(conv2d(h0, config.df_dim*2, name='dl_h1_conv')))#16,16,16,128
        h2 = lrelu(dl_bn2(conv2d(h1, config.df_dim*4, name='dl_h2_conv')))#16,16,16,248
        h3 = lrelu(dl_bn3(conv2d(h2, config.df_dim*8, name='dl_h3_conv')))
        dim3=np.prod(h3.get_shape().as_list()[1:])
        h3_flat=tf.reshape(h3, [-1,dim3])
        D_labels_logits = linear(h3_flat, output_dim, 'dl_h3_Label')
        D_labels = tf.nn.sigmoid(D_labels_logits)
        variables = tf.contrib.framework.get_variables(vs)
    return D_labels, D_labels_logits, variables

def discriminator_gen_labeler(image, output_dim, config, reuse=None):
    batch_size=tf.shape(image)[0]
    with tf.variable_scope("disc_gen_labeler",reuse=reuse) as vs:
        dl_bn1 = batch_norm(name='dl_bn1')
        dl_bn2 = batch_norm(name='dl_bn2')
        dl_bn3 = batch_norm(name='dl_bn3')

        h0 = lrelu(conv2d(image, config.df_dim, name='dgl_h0_conv'))#16,32,32,64
        h1 = lrelu(dl_bn1(conv2d(h0, config.df_dim*2, name='dgl_h1_conv')))#16,16,16,128
        h2 = lrelu(dl_bn2(conv2d(h1, config.df_dim*4, name='dgl_h2_conv')))#16,16,16,248
        h3 = lrelu(dl_bn3(conv2d(h2, config.df_dim*8, name='dgl_h3_conv')))
        dim3=np.prod(h3.get_shape().as_list()[1:])
        h3_flat=tf.reshape(h3, [-1,dim3])
        D_labels_logits = linear(h3_flat, output_dim, 'dgl_h3_Label')
        D_labels = tf.nn.sigmoid(D_labels_logits)
        variables = tf.contrib.framework.get_variables(vs)
    return D_labels, D_labels_logits,variables

def discriminator_on_z(image, config, reuse=None):
    batch_size=tf.shape(image)[0]
    with tf.variable_scope("disc_z_labeler",reuse=reuse) as vs:
        dl_bn1 = batch_norm(name='dl_bn1')
        dl_bn2 = batch_norm(name='dl_bn2')
        dl_bn3 = batch_norm(name='dl_bn3')

        h0 = lrelu(conv2d(image, config.df_dim, name='dzl_h0_conv'))#16,32,32,64
        h1 = lrelu(dl_bn1(conv2d(h0, config.df_dim*2, name='dzl_h1_conv')))#16,16,16,128
        h2 = lrelu(dl_bn2(conv2d(h1, config.df_dim*4, name='dzl_h2_conv')))#16,16,16,248
        h3 = lrelu(dl_bn3(conv2d(h2, config.df_dim*8, name='dzl_h3_conv')))
        dim3=np.prod(h3.get_shape().as_list()[1:])
        h3_flat=tf.reshape(h3, [-1,dim3])
        D_labels_logits = linear(h3_flat, config.z_dim, 'dzl_h3_Label')
        D_labels = tf.nn.tanh(D_labels_logits)
        variables = tf.contrib.framework.get_variables(vs)
    return D_labels,variables


================================================
FILE: causal_dcgan/ops.py
================================================
import math
import numpy as np
import tensorflow as tf

from tensorflow.python.framework import ops

from utils import *


class batch_norm(object):
    def __init__(self, epsilon=1e-5, momentum = 0.9, name="batch_norm"):
        with tf.variable_scope(name):
            self.epsilon  = epsilon
            self.momentum = momentum
            self.name = name

    def __call__(self, x, train=True):
        return tf.contrib.layers.batch_norm(x,
                                          decay=self.momentum,
                                          updates_collections=None,
                                          epsilon=self.epsilon,
                                          scale=True,
                                          is_training=train,
                                          scope=self.name)

def conv_cond_concat(x, y):
    """Concatenate conditioning vector on feature map axis."""
    #print('input x:',x.get_shape().as_list())
    #print('input y:',y.get_shape().as_list())

    xshape=x.get_shape()
    #tile by [1,64,64,1]

    tile_shape=tf.stack([1,xshape[1],xshape[2],1])
    tile_y=tf.tile(y,tile_shape)

    #print('tile y:',tile_y.get_shape().as_list())

    return tf.concat([x,tile_y],axis=3)


    #x_shapes = x.get_shape()
    #y_shapes = y.get_shape()
    #return tf.concat([
    #x, y*tf.ones([x_shapes[0], x_shapes[1], x_shapes[2], y_shapes[3]])], 3)


def conv2d(input_, output_dim,
       k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
       name="conv2d"):
  with tf.variable_scope(name):
    w = tf.get_variable('w', [k_h, k_w, input_.get_shape()[-1], output_dim],
              initializer=tf.truncated_normal_initializer(stddev=stddev))
    conv = tf.nn.conv2d(input_, w, strides=[1, d_h, d_w, 1], padding='SAME')

    biases = tf.get_variable('biases', [output_dim], initializer=tf.constant_initializer(0.0))
    #conv = tf.reshape(tf.nn.bias_add(conv, biases), conv.get_shape())
    conv=tf.nn.bias_add(conv,biases)

    return conv

def deconv2d(input_, output_shape,
       k_h=5, k_w=5, d_h=2, d_w=2, stddev=0.02,
       name="deconv2d", with_w=False):
    with tf.variable_scope(name):
        # filter : [height, width, output_channels, in_channels]
        w = tf.get_variable('w', [k_h, k_w, output_shape[-1], input_.get_shape()[-1]],
                  initializer=tf.random_normal_initializer(stddev=stddev))

        tf_output_shape=tf.stack(output_shape)
        deconv = tf.nn.conv2d_transpose(input_, w, output_shape=tf_output_shape,
                strides=[1, d_h, d_w, 1])

        biases = tf.get_variable('biases', [output_shape[-1]], initializer=tf.constant_initializer(0.0))
        #deconv = tf.reshape(tf.nn.bias_add(deconv, biases), deconv.get_shape())
        deconv = tf.reshape(tf.nn.bias_add(deconv, biases), tf_output_shape)

        if with_w:
            return deconv, w, biases
        else:
            return deconv

def lrelu(x,leak=0.2,name='lrelu'):
    with tf.variable_scope(name):
        f1=0.5 * (1+leak)
        f2=0.5 * (1-leak)
        return f1*x + f2*tf.abs(x)

#This takes more memory than above
#def lrelu(x, leak=0.2, name="lrelu"):
#  return tf.maximum(x, leak*x)

def linear(input_, output_size, scope=None, stddev=0.02, bias_start=0.0, with_w=False):
    shape = input_.get_shape().as_list()

    #mat_shape=tf.stack([tf.shape(input_)[1],output_size])
    mat_shape=[shape[1],output_size]

    with tf.variable_scope(scope or "Linear"):
        #matrix = tf.get_variable("Matrix", [shape[1], output_size], tf.float32,
        matrix = tf.get_variable("Matrix", mat_shape, tf.float32,
                     tf.random_normal_initializer(stddev=stddev))
        bias = tf.get_variable("bias", [output_size],
                   initializer=tf.constant_initializer(bias_start))
        if with_w:
            return tf.matmul(input_, matrix) + bias, matrix, bias
        else:
            return tf.matmul(input_, matrix) + bias


#minibatch method that improves on openai
#because it doesn't fix batchsize:
#TODO: recheck when not sleepy
def add_minibatch_features(image,df_dim):
    shape = image.get_shape().as_list()
    dim = np.prod(shape[1:])            # dim = prod(9,2) = 18
    h_mb0 = lrelu(conv2d(image, df_dim, name='d_mb0_conv'))
    h_mb1 = conv2d(h_mb0, df_dim, name='d_mbh1_conv')

    dims=h_mb1.get_shape().as_list()
    conv_dims=np.prod(dims[1:])

    image_ = tf.reshape(h_mb1, tf.stack([-1, conv_dims]))
    #image_ = tf.reshape(h_mb1, tf.stack([batch_size, -1]))

    n_kernels = 300
    dim_per_kernel = 50
    x = linear(image_, n_kernels * dim_per_kernel,'d_mbLinear')
    act = tf.reshape(x, (-1, n_kernels, dim_per_kernel))

    act= tf.reshape(x, (-1, n_kernels, dim_per_kernel))
    act_tp=tf.transpose(act, [1,2,0])
    #bs x n_ker x dim_ker x bs -> bs x n_ker x bs :
    abs_dif = tf.reduce_sum(tf.abs(tf.expand_dims(act, 3) - tf.expand_dims(act_tp, 0)), 2)
    eye=tf.expand_dims( tf.eye( tf.shape(abs_dif)[0] ), 1)#bs x 1 x bs
    masked=tf.exp(-abs_dif) - eye
    f1=tf.reduce_mean( masked, 2)
    mb_features = tf.reshape(f1, [-1, 1, 1, n_kernels])
    return conv_cond_concat(image, mb_features)

## following is from https://github.com/openai/improved-gan/blob/master/imagenet/discriminator.py#L88
#def add_minibatch_features(image,df_dim,batch_size):
#    shape = image.get_shape().as_list()
#    dim = np.prod(shape[1:])            # dim = prod(9,2) = 18
#    h_mb0 = lrelu(conv2d(image, df_dim, name='d_mb0_conv'))
#    h_mb1 = conv2d(h_mb0, df_dim, name='d_mbh1_conv')
#
#    dims=h_mb1.get_shape().as_list()
#    conv_dims=np.prod(dims[1:])
#
#    image_ = tf.reshape(h_mb1, tf.stack([-1, conv_dims]))
#    #image_ = tf.reshape(h_mb1, tf.stack([batch_size, -1]))
#
#    n_kernels = 300
#    dim_per_kernel = 50
#    x = linear(image_, n_kernels * dim_per_kernel,'d_mbLinear')
#    activation = tf.reshape(x, (batch_size, n_kernels, dim_per_kernel))
#    big = np.zeros((batch_size, batch_size), dtype='float32')
#    big += np.eye(batch_size)
#    big = tf.expand_dims(big, 1)
#    abs_dif = tf.reduce_sum(tf.abs(tf.expand_dims(activation, 3) - tf.expand_dims(tf.transpose(activation, [1, 2, 0]), 0)), 2)
#    mask = 1. - big
#    masked = tf.exp(-abs_dif) * mask
#    f1 = tf.reduce_sum(masked, 2) / tf.reduce_sum(mask)
#    mb_features = tf.reshape(f1, [batch_size, 1, 1, n_kernels])
#    return conv_cond_concat(image, mb_features)


================================================
FILE: causal_dcgan/utils.py
================================================
"""
Some codes from https://github.com/Newmu/dcgan_code
"""
from __future__ import division
import math
import json
import random
import pprint
import scipy.misc
import numpy as np
from time import gmtime, strftime
from six.moves import xrange
import os

pp = pprint.PrettyPrinter()

get_stddev = lambda x, k_h, k_w: 1/math.sqrt(k_w*k_h*x.get_shape()[-1])


def get_image(image_path, input_height, input_width,
              resize_height=64, resize_width=64,
              is_crop=True, is_grayscale=False):
  image = imread(image_path, is_grayscale)
  return transform(image, input_height, input_width,
                   resize_height, resize_width, is_crop)

def save_images(images, size, image_path):
  return imsave(inverse_transform(images), size, image_path)

def imread(path, is_grayscale = False):
  if (is_grayscale):
    return scipy.misc.imread(path, flatten = True).astype(np.float)
  else:
    return scipy.misc.imread(path).astype(np.float)

def merge_images(images, size):
  return inverse_transform(images)

def merge(images, size):
  h, w = images.shape[1], images.shape[2]
  img = np.zeros((h * size[0], w * size[1], 3))
  for idx, image in enumerate(images):
    i = idx % size[1]
    j = idx // size[1]
    img[j*h:j*h+h, i*w:i*w+w, :] = image
  return img

def imsave(images, size, path):
  return scipy.misc.imsave(path, merge(images, size))

def center_crop(x, crop_h, crop_w,
                resize_h=64, resize_w=64):
  if crop_w is None:
    crop_w = crop_h
  h, w = x.shape[:2]
  j = int(round((h - crop_h)/2.))
  i = int(round((w - crop_w)/2.))
  return scipy.misc.imresize(
      x[j:j+crop_h, i:i+crop_w], [resize_h, resize_w])

def transform(image, input_height, input_width,
              resize_height=64, resize_width=64, is_crop=True):
  if is_crop:
    cropped_image = center_crop(
      image, input_height, input_width,
      resize_height, resize_width)
  else:
    cropped_image = scipy.misc.imresize(image, [resize_height, resize_width])
  return np.array(cropped_image)/127.5 - 1.

def inverse_transform(images):
  return (images+1.)/2.

def to_json(output_path, *layers):
  with open(output_path, "w") as layer_f:
    lines = ""
    for w, b, bn in layers:
      layer_idx = w.name.split('/')[0].split('h')[1]

      B = b.eval()

      if "lin/" in w.name:
        W = w.eval()
        depth = W.shape[1]
      else:
        W = np.rollaxis(w.eval(), 2, 0)
        depth = W.shape[0]

      biases = {"sy": 1, "sx": 1, "depth": depth, "w": ['%.2f' % elem for elem in list(B)]}
      if bn != None:
        gamma = bn.gamma.eval()
        beta = bn.beta.eval()

        gamma = {"sy": 1, "sx": 1, "depth": depth, "w": ['%.2f' % elem for elem in list(gamma)]}
        beta = {"sy": 1, "sx": 1, "depth": depth, "w": ['%.2f' % elem for elem in list(beta)]}
      else:
        gamma = {"sy": 1, "sx": 1, "depth": 0, "w": []}
        beta = {"sy": 1, "sx": 1, "depth": 0, "w": []}

      if "lin/" in w.name:
        fs = []
        for w in W.T:
          fs.append({"sy": 1, "sx": 1, "depth": W.shape[0], "w": ['%.2f' % elem for elem in list(w)]})

        lines += """
          var layer_%s = {
            "layer_type": "fc",
            "sy": 1, "sx": 1,
            "out_sx": 1, "out_sy": 1,
            "stride": 1, "pad": 0,
            "out_depth": %s, "in_depth": %s,
            "biases": %s,
            "gamma": %s,
            "beta": %s,
            "filters": %s
          };""" % (layer_idx.split('_')[0], W.shape[1], W.shape[0], biases, gamma, beta, fs)
      else:
        fs = []
        for w_ in W:
          fs.append({"sy": 5, "sx": 5, "depth": W.shape[3], "w": ['%.2f' % elem for elem in list(w_.flatten())]})

        lines += """
          var layer_%s = {
            "layer_type": "deconv",
            "sy": 5, "sx": 5,
            "out_sx": %s, "out_sy": %s,
            "stride": 2, "pad": 1,
            "out_depth": %s, "in_depth": %s,
            "biases": %s,
            "gamma": %s,
            "beta": %s,
            "filters": %s
          };""" % (layer_idx, 2**(int(layer_idx)+2), 2**(int(layer_idx)+2),
               W.shape[0], W.shape[3], biases, gamma, beta, fs)
    layer_f.write(" ".join(lines.replace("'","").split()))

def make_gif(images, fname, duration=2, true_image=False):
    import moviepy.editor as mpy

    def make_frame(t):
        try:
            x = images[int(len(images)/duration*t)]
        except:
            x = images[-1]

    if true_image:
        return x.astype(np.uint8)
    else:
        return ((x+1)/2*255).astype(np.uint8)

    clip = mpy.VideoClip(make_frame, duration=duration)
    clip.write_gif(fname, fps = len(images) / duration)


================================================
FILE: causal_graph.py
================================================
'''
To use a particular causal graph, just specify it here


Strings specified have to match *exactly* to keys in attribute text file


A graph lists each node and it's parents in pairs

A->B, C->D, D->B:
    [['A',[]],
     ['B',['A','D']],
     ['C',[]],
     ['D',[]]]

'''

#A reminder of what labels are available
#Make sure to use caps-sensitive correct spelling
all_nodes=[
        ['5_o_Clock_Shadow',[]],
        ['Arched_Eyebrows',[]],
        ['Attractive',[]],
        ['Bags_Under_Eyes',[]],
        ['Bald',[]],
        ['Bangs',[]],
        ['Big_Lips',[]],
        ['Big_Nose',[]],
        ['Black_Hair',[]],
        ['Blond_Hair',[]],
        ['Blurry',[]],
        ['Brown_Hair',[]],
        ['Bushy_Eyebrows',[]],
        ['Chubby',[]],
        ['Double_Chin',[]],
        ['Eyeglasses',[]],
        ['Goatee',[]],
        ['Gray_Hair',[]],
        ['Heavy_Makeup',[]],
        ['High_Cheekbones',[]],
        ['Male',[]],
        ['Mouth_Slightly_Open',[]],
        ['Mustache',[]],
        ['Narrow_Eyes',[]],
        ['No_Beard',[]],
        ['Oval_Face',[]],
        ['Pale_Skin',[]],
        ['Pointy_Nose',[]],
        ['Receding_Hairline',[]],
        ['Rosy_Cheeks',[]],
        ['Sideburns',[]],
        ['Smiling',[]],
        ['Straight_Hair',[]],
        ['Wavy_Hair',[]],
        ['Wearing_Earrings',[]],
        ['Wearing_Hat',[]],
        ['Wearing_Lipstick',[]],
        ['Wearing_Necklace',[]],
        ['Wearing_Necktie',[]],
        ['Young',[]]
    ]

causal_graphs={
#'complete_all':[
#        ['Young',[]],
#        ['Male',['Young']],
#        ['Eyeglasses',['Male','Young']],
#        ['Bald',            ['Male','Young','Eyeglasses']],
#        ['Mustache',        ['Male','Young','Eyeglasses','Bald']],
#        ['Smiling',         ['Male','Young','Eyeglasses','Bald','Mustache']],
#        ['Wearing_Lipstick',['Male','Young','Eyeglasses','Bald','Mustache','Smiling']],
#        ['Mouth_Slightly_Open',['Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick']],
#        ['Narrow_Eyes',['Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['5_o_Clock_Shadow',['Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Arched_Eyebrows',['5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Attractive',['Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Bags_Under_Eyes',['Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Bangs',['Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Big_Lips',['Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Big_Nose',['Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Black_Hair',['Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Blond_Hair',['Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Blurry',['Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Brown_Hair',['Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Bushy_Eyebrows',['Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Chubby',['Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#        ['Double_Chin',['Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Goatee',['Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Gray_Hair',['Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Heavy_Makeup',['Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['High_Cheekbones',['Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Mouth_Slightly_Open',['High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Mustache',['Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Narrow_Eyes',['Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['No_Beard',['Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Oval_Face',['No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Pale_Skin',['Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Pointy_Nose',['Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Receding_Hairline',['Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Rosy_Cheeks',['Receding_Hairline','Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Sideburns',['Rosy_Cheeks','Receding_Hairline','Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Smiling',['Sideburns','Rosy_Cheeks','Receding_Hairline','Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Straight_Hair',['Smiling','Sideburns','Rosy_Cheeks','Receding_Hairline','Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Wavy_Hair',['Straight_Hair','Smiling','Sideburns','Rosy_Cheeks','Receding_Hairline','Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Wearing_Earrings',['Wavy_Hair','Straight_Hair','Smiling','Sideburns','Rosy_Cheeks','Receding_Hairline','Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Wearing_Hat',['Wearing_Earrings','Wavy_Hair','Straight_Hair','Smiling','Sideburns','Rosy_Cheeks','Receding_Hairline','Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Wearing_Lipstick',['Wearing_Hat','Wearing_Earrings','Wavy_Hair','Straight_Hair','Smiling','Sideburns','Rosy_Cheeks','Receding_Hairline','Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Wearing_Necklace',['Wearing_Lipstick','Wearing_Hat','Wearing_Earrings','Wavy_Hair','Straight_Hair','Smiling','Sideburns','Rosy_Cheeks','Receding_Hairline','Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
        #['Wearing_Necktie',['Wearing_Necklace','Wearing_Lipstick','Wearing_Hat','Wearing_Earrings','Wavy_Hair','Straight_Hair','Smiling','Sideburns','Rosy_Cheeks','Receding_Hairline','Pointy_Nose','Pale_Skin','Oval_Face','No_Beard','Narrow_Eyes','Mustache','Mouth_Slightly_Open','High_Cheekbones','Heavy_Makeup','Gray_Hair','Goatee','Double_Chin','Chubby','Bushy_Eyebrows','Brown_Hair','Blurry','Blond_Hair','Black_Hair','Big_Nose','Big_Lips','Bangs','Bags_Under_Eyes','Attractive','Arched_Eyebrows','5_o_Clock_Shadow','Narrow_Eyes','Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
#    ],

'subset1_nodes':[
    ['Bald',[]],
#        ['Blurry',[]],
#        ['Brown_Hair',[]],
#        ['Bushy_Eyebrows',[]],
#        ['Chubby',[]],
    ['Double_Chin',[]],
#        ['Eyeglasses',[]],
#        ['Goatee',[]],
#        ['Gray_Hair',[]],
    ['Male',[]],
    ['Mustache',[]],
    ['No_Beard',[]],
    ['Smiling',[]],
#        ['Straight_Hair',[]],
#        ['Wavy_Hair',[]],
    ['Wearing_Earrings',[]],
#        ['Wearing_Hat',[]],
    ['Wearing_Lipstick',[]],
    ['Young',[]]
],


'standard_graph':[
   ['Male'   , []              ],
   ['Young'  , []              ],
   ['Smiling', ['Male','Young']]
   ],

'male_causes_beard':[
    ['Male',[]],
    ['No_Beard',['Male']],
],
'male_causes_mustache':[
    ['Male',[]],
    ['Mustache',['Male']],
],

'mustache_causes_male':[
    ['Male',['Mustache']],
    ['Mustache',[]],
],

'young_causes_gray':[
    ['Young',[]],
    ['Gray_Hair',['Young']],
    ],

'gray_causes_young':[
    ['Young',['Gray_Hair']],
    ['Gray_Hair',[]],
    ],

'young_ind_gray':[
        ['Young',[]],
        ['Gray_Hair',[]],
        ],


'small_causal_graph':[
        ['Young',[]],
        ['Male',[]],
        ['Mustache',        ['Male','Young']],
        ['Smiling',         ['Male','Young']],
        ['Wearing_Lipstick',['Male','Young']],
        ['Mouth_Slightly_Open',['Male','Young','Smiling']],
        ['Narrow_Eyes',        ['Male','Young','Smiling']],
    ],


'big_causal_graph':[
        ['Young',[]],
        ['Male',[]],
        ['Eyeglasses',['Young']],
        ['Bald',            ['Male','Young']],
        ['Mustache',        ['Male','Young']],
        ['Smiling',         ['Male','Young']],
        ['Wearing_Lipstick',['Male','Young']],
        ['Mouth_Slightly_Open',['Young','Smiling']],
        ['Narrow_Eyes',        ['Male','Young','Smiling']],
    ],

'complete_big_causal_graph':[
        ['Young',[]],
        ['Male',['Young']],
        ['Eyeglasses',['Male','Young']],
        ['Bald',            ['Male','Young','Eyeglasses']],
        ['Mustache',        ['Male','Young','Eyeglasses','Bald']],
        ['Smiling',         ['Male','Young','Eyeglasses','Bald','Mustache']],
        ['Wearing_Lipstick',['Male','Young','Eyeglasses','Bald','Mustache','Smiling']],
        ['Mouth_Slightly_Open',['Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick']],
        ['Narrow_Eyes',['Male','Young','Eyeglasses','Bald','Mustache','Smiling','Wearing_Lipstick','Mouth_Slightly_Open']],
    ],

'reverse_complete_big_causal_graph':[

        ['Narrow_Eyes',        []],
        ['Mouth_Slightly_Open',['Narrow_Eyes']],
        ['Wearing_Lipstick',   ['Narrow_Eyes','Mouth_Slightly_Open']],
        ['Smiling',            ['Narrow_Eyes','Mouth_Slightly_Open','Wearing_Lipstick']],
        ['Mustache',           ['Narrow_Eyes','Mouth_Slightly_Open','Wearing_Lipstick','Smiling']],
        ['Bald',               ['Narrow_Eyes','Mouth_Slightly_Open','Wearing_Lipstick','Smiling','Mustache']],
        ['Eyeglasses',         ['Narrow_Eyes','Mouth_Slightly_Open','Wearing_Lipstick','Smiling','Mustache','Bald']],
        ['Male',               ['Narrow_Eyes','Mouth_Slightly_Open','Wearing_Lipstick','Smiling','Mustache','Bald','Eyeglasses']],
        ['Young',              ['Narrow_Eyes','Mouth_Slightly_Open','Wearing_Lipstick','Smiling','Mustache','Bald','Eyeglasses','Male']],

    ],

'indep_big_causal_graph':[
        ['Young',[]],
        ['Male',[]],
        ['Eyeglasses',[]],
        ['Bald',            []],
        ['Mustache',        []],
        ['Smiling',         []],
        ['Wearing_Lipstick',[]],
        ['Mouth_Slightly_Open',[]],
        ['Narrow_Eyes',        []],
    ],


'complete_minimal_graph':[
        ['Young',[]],
        ['Male',['Young']],
        ['Mustache',        ['Male','Young']],
        ['Wearing_Lipstick',['Male','Young','Mustache']],
        ['Smiling',         ['Male','Young','Mustache','Wearing_Lipstick']],
    ],

'male_ind_mustache ': [
        ['Male',[]],
        ['Mustache',[]]
    ],
'Smiling_MSO ': [
        ['Smiling',[]],
        ['Mouth_Slightly_Open',['Smiling']]
       ],

'Male_Young_Eyeglasses':[
    ['Male',[]],
    ['Young',[]],
    ['Eyeglasses',['Male','Young']]
    ],

'MYESO':[
    ['Male',[]],
    ['Young',['Male']],
    ['Eyeglasses',['Male','Young']],
    ['Smiling',['Male','Young','Eyeglasses']],
    ['Mouth_Slightly_Open',['Male','Young','Eyeglasses','Smiling']],
    ],

'mustache':[
    ['Mustache',[]]
    ],

'male_ind_mustache ': [
        ['Male',[]],
        ['Mustache',[]]
    ],

'male_smiling_lipstick':[
       ['Male'   , []],
       ['Wearing_Lipstick'  , ['Male']],
       ['Smiling', ['Male']]
       ],
'SLM':[
       ['Smiling'   , []],
       ['Wearing_Lipstick'  , ['Smiling']],
       ['Male', ['Smiling','Wearing_Lipstick']]
       ],
'MLS':[
       ['Male'   , []],
       ['Wearing_Lipstick'  , ['Male']],
       ['Smiling', ['Male','Wearing_Lipstick']]
       ],
'M':[
    ['Male',[]]
    ],

'Smiling_MSO ': [
        ['Smiling',[]],
        ['Mouth_Slightly_Open',['Smiling']]
       ],
'MYESO':[
    ['Male',[]],
    ['Young',['Male']],
    ['Eyeglasses',['Male','Young']],
    ['Smiling',['Male','Young','Eyeglasses']],
    ['Mouth_Slightly_Open',['Male','Young','Eyeglasses','Smiling']],
    ],

'MSO_smiling ': [
        ['Smiling',['Mouth_Slightly_Open']],
        ['Mouth_Slightly_Open',[]]
       ],
'Male_Young_Eyeglasses ': [
        ['Male',[]],
        ['Young',[]],
        ['Eyeglasses',['Male','Young']]
        ],
'Male_Young_Eyeglasses_complete ': [
        ['Male',[]],
        ['Young',['Male']],
        ['Eyeglasses',['Male','Young']]
        ],
'male_mustache_lipstick':[
       ['Male'   , []],
       ['Mustache', ['Male']],
       ['Wearing_Lipstick'  , ['Male','Mustache']]
       ]
}

def get_causal_graph(causal_model=None,*args,**kwargs):

    #define complete_all
    list_nodes,_=zip(*all_nodes)
    complete_all=[]
    so_far=[]
    for node in list_nodes:
        complete_all.append([node,so_far[:]])
        so_far.append(node)
    causal_graphs['complete_all']=complete_all


    if not causal_model in causal_graphs.keys():
        raise ValueError('the specified graph:',causal_model,' was not one of\
                         those listed in ',__file__)

    else:
        return causal_graphs[causal_model]


================================================
FILE: config.py
================================================
from __future__ import print_function
import argparse

def str2bool(v):
    #return (v is True) or (v.lower() in ('true', '1'))
    return v is True or v.lower() in ('true', '1')

arg_lists = []
parser = argparse.ArgumentParser()

def add_argument_group(name):
    arg = parser.add_argument_group(name)
    arg_lists.append(arg)
    return arg

# Data
data_arg = add_argument_group('Data')
#data_arg.add_argument('--batch_size', type=int, default=16)#default set elsewhere
data_arg.add_argument('--causal_model', type=str,
                     help='''Matches the argument with a key in ./causal_graph.py and sets the graph attribute of cc_config to be a list of lists defining the causal graph''')
data_arg.add_argument('--data_dir', type=str, default='data')
data_arg.add_argument('--dataset', type=str, default='celebA')
data_arg.add_argument('--do_shuffle', type=str2bool, default=True)#never used
data_arg.add_argument('--input_scale_size', type=int, default=64,
                     help='input image will be resized with the given value as width and height')
data_arg.add_argument('--is_crop', type=str2bool, default='True')
data_arg.add_argument('--grayscale', type=str2bool, default=False)#never used
data_arg.add_argument('--split', type=str, default='train')#never used
data_arg.add_argument('--num_worker', type=int, default=24,
                     help='number of threads to use for loading and preprocessing data')
data_arg.add_argument('--resize_method',type=str,default='AREA',choices=['AREA','BILINEAR','BICUBIC','NEAREST_NEIGHBOR'],
                     help='''methods to resize image to 64x64. AREA seems to work
                     best, possibly some scipy methods could work better. It
                     wasn't clear to me why the results should be so different''')


# Training / test parameters
train_arg = add_argument_group('Training')


train_arg.add_argument('--build_train', type=str2bool, default=False,
                      help='''You may want to build all the components for
                       training, without doing any training right away. This is
                      for that. This arg is effectively True when is_train=True''')
train_arg.add_argument('--build_pretrain', type=str2bool, default=False,
                      help='''You may want to build all the components for
                       training, without doing any training right away. This is
                      for that. This arg is effectively True when is_pretrain=True''')


train_arg.add_argument('--model_type',type=str,default='',choices=['dcgan','began'],
                      help='''Which model to use. If the argument is not
                       passed, only causal_controller is built. This overrides
                      is_train=True, since no image model to train''')
train_arg.add_argument('--use_gpu', type=str2bool, default=True)
train_arg.add_argument('--num_gpu', type=int, default=1,
                      help='specify 0 for cpu. If k specified, will default to\
                      first k of n detected. If use_gpu=True but num_gpu not\
                      specified will default to 1')

# Misc
misc_arg = add_argument_group('Misc')
#misc_arg.add_argument('--build_all', type=str2bool, default=False,
#                     help='''normally specifying is_pretrain=False will cause
#                     the pretraining components not to be built and likewise
#                      with is_train=False only the pretrain compoenent will
#                      (possibly) be built. This is here as a debug helper to
#                      enable building out the whole model without doing any
#                      training''')

misc_arg.add_argument('--descrip', type=str, default='',help='''
                      Only use this when creating a new model. New model folder names
                      are generated automatically by using the time-date. Then
                      you cant rename them while the model is running. If
                      provided, this is a short string that appends to the end
                      of a model folder name to help keep track of what the
                      contents of that folder were without getting into the
                      content of that folder. No weird characters''')

misc_arg.add_argument('--dry_run', action='store_true',help='''Build and load
                      the model and all the specified components, but don't actually do
                      any pretraining/training etc. This overrides
                      --is_pretrain, --is_train. This is mostly used for just
                      bringing the model into the workspace if you say wanted
                      to manipulated it in ipython''')

misc_arg.add_argument('--load_path', type=str, default='',
                     help='''This is a "global" load path. You can simply pass
                     the model_dir of the whatever run, and all the variables
                      (dcgan/began and causal_controller both). If you want to
                      just load one component: for example, the pretrained part
                      of a previous model, use pt_load_path from the
                      causal_controller.config section''')

misc_arg.add_argument('--log_step', type=int, default=100,
                     help='''this is used for generic summaries that are common
                     to both models. Use model specific config files for
                     logging done within train_step''')
#misc_arg.add_argument('--save_step', type=int, default=5000)
misc_arg.add_argument('--log_level', type=str, default='INFO', choices=['INFO', 'DEBUG', 'WARN'])
misc_arg.add_argument('--log_dir', type=str, default='logs', help='''where to store model and model results. Do not put a leading "./" out front''')

#misc_arg.add_argument('--sample_per_image', type=int, default=64,
#                      help='# of sample per image during test sample generation')

misc_arg.add_argument('--seed', type=int, default=22,help=
                      '''Not working right now: TF seed should be fixed to make sure exogenous noise for each causal node is fixed also''')

#Doesn't do anything atm
#misc_arg.add_argument('--visualize', action='store_true')


def gpu_logic(config):

    #consistency between use_gpu and num_gpu
    if config.num_gpu>0:
        config.use_gpu=True
    else:
        config.use_gpu=False
#        if config.use_gpu and config.num_gpu==0:
#            config.num_gpu=1
    return config


def get_config():
    config, unparsed = parser.parse_known_args()
    config=gpu_logic(config)
    config.num_devices=max(1,config.num_gpu)#that are used in backprop


    #Just for BEGAN:
    ##this has to respect gpu/cpu
    ##data_format = 'NCHW'
    #if config.use_gpu:
    #    data_format = 'NCHW'
    #else:
    #    data_format = 'NHWC'
    #setattr(config, 'data_format', data_format)

    print('Loaded ./config.py')

    return config, unparsed

if __name__=='__main__':
    #for debug of config
    config, unparsed = get_config()


================================================
FILE: data_loader.py
================================================
import os
import numpy as np
import pandas as pd
from PIL import Image
from glob import glob
import tensorflow as tf

from IPython.core import debugger
debug = debugger.Pdb().set_trace


def logodds(p):
    return np.log(p/(1.-p))

class DataLoader(object):
    '''This loads the image and the labels through a tensorflow queue.
    All of the labels are loaded regardless of what is specified in graph,
    because this model is gpu throttled anyway so there shouldn't be any
    overhead

    For multiple gpu, the strategy here is to have 1 queue with 2xbatch_size
    then use tf.split within trainer.train()
    '''
    def __init__(self,label_names,config):
        self.label_names=label_names
        self.config=config
        self.scale_size=config.input_scale_size
        #self.data_format=config.data_format
        self.split=config.split
        self.do_shuffle=config.do_shuffle
        self.num_worker=config.num_worker
        self.is_crop=config.is_crop
        self.is_grayscale=config.grayscale

        attr_file= glob("{}/*.{}".format(config.data_path, 'txt'))[0]
        setattr(config,'attr_file',attr_file)

        attributes = pd.read_csv(config.attr_file,delim_whitespace=True) #+-1
        #Store all labels for reference
        self.all_attr= 0.5*(attributes+1)# attributes is {0,1}
        self.all_label_means=self.all_attr.mean()

        #but only return desired labels in queues
        self.attr=self.all_attr[label_names]
        self.label_means=self.attr.mean()# attributes is 0,1

        self.image_dir=os.path.join(config.data_path,'images')
        self.filenames=[os.path.join(self.image_dir,j) for j in self.attr.index]

        self.num_examples_per_epoch=len(self.filenames)
        self.min_fraction_of_examples_in_queue=0.001#go faster during debug
        #self.min_fraction_of_examples_in_queue=0.01
        self.min_queue_examples=int(self.num_examples_per_epoch*self.min_fraction_of_examples_in_queue)


    def get_label_queue(self,batch_size):
        tf_labels = tf.convert_to_tensor(self.attr.values, dtype=tf.uint8)#0,1

        with tf.name_scope('label_queue'):
            uint_label=tf.train.slice_input_producer([tf_labels])[0]
        label=tf.to_float(uint_label)

        #All labels, not just those in causal_model
        dict_data={sl:tl for sl,tl in
                   zip(self.label_names,tf.split(label,len(self.label_names)))}


        num_preprocess_threads = max(self.num_worker-3,1)

        data_batch = tf.train.shuffle_batch(
                dict_data,
                batch_size=batch_size,
                num_threads=num_preprocess_threads,
                capacity=self.min_queue_examples + 3 * batch_size,
                min_after_dequeue=self.min_queue_examples,
                )

        return data_batch

    def get_data_queue(self,batch_size):
        image_files = tf.convert_to_tensor(self.filenames, dtype=tf.string)
        tf_labels = tf.convert_to_tensor(self.attr.values, dtype=tf.uint8)

        with tf.name_scope('filename_queue'):
            #must be list
            str_queue=tf.train.slice_input_producer([image_files,tf_labels])
        img_filename, uint_label= str_queue

        img_contents=tf.read_file(img_filename)
        image = tf.image.decode_jpeg(img_contents, channels=3)

        image=tf.cast(image,dtype=tf.float32)
        if self.config.is_crop:#use dcgan cropping
            #dcgan center-crops input to 108x108, outputs 64x64 #centrally crops it #We emulate that here
            image=tf.image.resize_image_with_crop_or_pad(image,108,108)
            #image=tf.image.resize_bilinear(image,[scale_size,scale_size])#must be 4D

            resize_method=getattr(tf.image.ResizeMethod,self.config.resize_method)
            image=tf.image.resize_images(image,[self.scale_size,self.scale_size],
                    method=resize_method)
            #Some dataset enlargement. Might as well.
            image=tf.image.random_flip_left_right(image)

            ##carpedm-began crops to 128x128 starting at (50,25), then resizes to 64x64
            #image=tf.image.crop_to_bounding_box(image, 50, 25, 128, 128)
            #image=tf.image.resize_nearest_neighbor(image, [scale_size, scale_size])

            tf.summary.image('real_image',tf.expand_dims(image,0))


        label=tf.to_float(uint_label)
        #Creates a dictionary  {'Male',male_tensor, 'Young',young_tensor} etc..
        dict_data={sl:tl for sl,tl in
                   zip(self.label_names,tf.split(label,len(self.label_names)))}
        assert not 'x' in dict_data.keys()#don't have a label named "x"
        dict_data['x']=image

        print ('Filling queue with %d Celeb images before starting to train. '
            'I don\'t know how long this will take' % self.min_queue_examples)
        num_preprocess_threads = max(self.num_worker,1)

        data_batch = tf.train.shuffle_batch(
                dict_data,
                batch_size=batch_size,
                num_threads=num_preprocess_threads,
                capacity=self.min_queue_examples + 3 * batch_size,
                min_after_dequeue=self.min_queue_examples,
                )
        return data_batch


================================================
FILE: download.py
================================================
"""
Modification of
https://github.com/carpedm20/BEGAN-tensorflow/blob/master/download.py
"""
from __future__ import print_function
import os
import zipfile
import requests
import subprocess
from tqdm import tqdm
from collections import OrderedDict

def download_file_from_google_drive(id, destination):
    URL = "https://docs.google.com/uc?export=download"
    session = requests.Session()

    response = session.get(URL, params={ 'id': id }, stream=True)
    token = get_confirm_token(response)

    if token:
        params = { 'id' : id, 'confirm' : token }
        response = session.get(URL, params=params, stream=True)

    save_response_content(response, destination)

def get_confirm_token(response):
    for key, value in response.cookies.items():
        if key.startswith('download_warning'):
            return value
    return None

def save_response_content(response, destination, chunk_size=32*1024):
    total_size = int(response.headers.get('content-length', 0))
    with open(destination, "wb") as f:
        for chunk in tqdm(response.iter_content(chunk_size), total=total_size,
                          unit='B', unit_scale=True, desc=destination):
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)

def unzip(filepath):
    print("Extracting: " + filepath)
    base_path = os.path.dirname(filepath)
    with zipfile.ZipFile(filepath) as zf:
        zf.extractall(base_path)
    os.remove(filepath)

def download_celeb_a(base_path):
    data_path = os.path.join(base_path, 'celebA')
    images_path = os.path.join(data_path, 'images')
    if os.path.exists(data_path):
        print('[!] Found celeb-A - skip')
        return

    filename, drive_id  = "img_align_celeba.zip", "0B7EVK8r0v71pZjFTYXZWM3FlRnM"
    save_path = os.path.join(base_path, filename)

    if os.path.exists(save_path):
        print('[*] {} already exists'.format(save_path))
    else:
        download_file_from_google_drive(drive_id, save_path)

    zip_dir = ''
    with zipfile.ZipFile(save_path) as zf:
        zip_dir = zf.namelist()[0]
        zf.extractall(base_path)
    if not os.path.exists(data_path):
        os.mkdir(data_path)
    os.rename(os.path.join(base_path, "img_align_celeba"), images_path)
    os.remove(save_path)

    download_attr_file(data_path)


def download_attr_file(data_path):
    attr_gdID='0B7EVK8r0v71pblRyaVFSWGxPY0U'
    attr_fname=os.path.join(data_path,'list_attr_celeba.txt')
    download_file_from_google_drive(attr_gdID, attr_fname)
    delete_top_line(attr_fname)#make pandas readable
    #Top line was just an integer saying how many samples there were

def prepare_data_dir(path = './data'):
    if not os.path.exists(path):
        os.mkdir(path)

# check, if file exists, make link
def check_link(in_dir, basename, out_dir):
    in_file = os.path.join(in_dir, basename)
    if os.path.exists(in_file):
        link_file = os.path.join(out_dir, basename)
        rel_link = os.path.relpath(in_file, out_dir)
        os.symlink(rel_link, link_file)

def add_splits(base_path):
    data_path = os.path.join(base_path, 'celebA')
    images_path = os.path.join(data_path, 'images')
    train_dir = os.path.join(data_path, 'splits', 'train')
    valid_dir = os.path.join(data_path, 'splits', 'valid')
    test_dir = os.path.join(data_path, 'splits', 'test')
    if not os.path.exists(train_dir):
        os.makedirs(train_dir)
    if not os.path.exists(valid_dir):
        os.makedirs(valid_dir)
    if not os.path.exists(test_dir):
        os.makedirs(test_dir)

    # these constants based on the standard celebA splits
    NUM_EXAMPLES = 202599
    TRAIN_STOP = 162770
    VALID_STOP = 182637

    for i in range(0, TRAIN_STOP):
        basename = "{:06d}.jpg".format(i+1)
        check_link(images_path, basename, train_dir)
    for i in range(TRAIN_STOP, VALID_STOP):
        basename = "{:06d}.jpg".format(i+1)
        check_link(images_path, basename, valid_dir)
    for i in range(VALID_STOP, NUM_EXAMPLES):
        basename = "{:06d}.jpg".format(i+1)
        check_link(images_path, basename, test_dir)

def delete_top_line(txt_fname):
    lines=open(txt_fname,'r').readlines()
    open(txt_fname,'w').writelines(lines[1:])

if __name__ == '__main__':
    base_path = './data'
    prepare_data_dir()
    download_celeb_a(base_path)
    add_splits(base_path)


================================================
FILE: figure_scripts/__init__.py
================================================


================================================
FILE: figure_scripts/distributions.py
================================================
import tensorflow as tf
import numpy as np
import os
import scipy.misc
import numpy as np
import pandas as pd
from tqdm import trange,tqdm
import pandas as pd
from itertools import combinations, product
import sys
from utils import save_figure_images,make_sample_dir,guess_model_step
from sample import get_joint,sample


def get_pdf(model, do_dict=None,cond_dict=None,name='',N=6400,return_discrete=True,step=''):
    str_step=str(step) or guess_model_step(model)

    joint=get_joint(model,int_do_dict=do_dict,int_cond_dict=cond_dict,N=N,return_discrete=return_discrete)

    sample_dir=make_sample_dir(model)

    if name:
        name+='_'
    f_pdf=os.path.join(sample_dir,str_step+name+'dist'+'.csv')

    pdf=pd.DataFrame.from_dict({k:val.mean() for k,val in joint.items()})

    #print 'get pdf cond_dict:',cond_dict
    if not do_dict and not cond_dict:
        data=model.attr.mean()
        pdf['data']=data
    if not do_dict and cond_dict:
        bool_cond=np.logical_and.reduce([model.attr[k]==v for k,v in cond_dict.items()])
        attr=model.attr[bool_cond]
        pdf['data']=attr.mean()

    print 'Writing to file',f_pdf
    pdf.to_csv(f_pdf)

    return pdf


TINY=1e-6
def get_interv_table(model,intrv=True):

    n_batches=25
    table_outputs=[]
    d_vals=np.linspace(TINY,0.6,n_batches)
    for name in model.cc.node_names:
        outputs=[]
        for d_val in d_vals:
            do_dict={model.cc.node_dict[name].label_logit : d_val*np.ones((model.batch_size,1))}
            outputs.append(model.sess.run(model.fake_labels,do_dict))

        out=np.vstack(outputs)
        table_outputs.append(out)

    table=np.stack(table_outputs,axis=2)

    np.mean(np.round(table),axis=0)

    return table

#dT=pd.DataFrame(index=p_names, data=T, columns=do_names)
#T=np.mean(np.round(table),axis=0)
#table=get_interv_table(model)


def record_interventional(model,step=''):
    '''
    designed for truncated exponential noise.
    For each node that could be intervened on,
    sample interventions from the continuous
    distribution that discrete intervention
    corresponds to. Collect the joint and output
    to a csv file
    '''
    make_sample_dir(model)

    str_step=str(step)
    if str_step=='':
        if hasattr(model,'step'):
            str_step=str( model.sess.run(model.step) )+'_'

    m=20
    do =lambda val: np.linspace(0,val*0.8,m)
    for name in model.cc.node_names:
        for int_val,intv in enumerate([do(-1), do(+1)]):
            do_dict={name:intv}

            joint=get_joint(model, do_dict=None, N=5,return_discrete=True,step='')

            lab_df=pd.DataFrame(data=joint['g_fake_label'])
            dfl_df=pd.DataFrame(data=joint['d_fake_label'])

            lab_fname=str_step+str(name)+str(int_val)+'.csv'
            dfl_fname=str_step+str(name)+str(int_val)+'.csv'

            lab_df.to_csv(lab_fname)
            dfl_df.to_csv(dfl_fname)

    #with open(dfl_xtab_fn,'w') as dlf_f, open(lab_xtab_fn,'w') as lab_f:


================================================
FILE: figure_scripts/encode.py
================================================
#from __future__ import print_function
import tensorflow as tf
#import scipy
import scipy.misc
import numpy as np
from tqdm import trange
import os
import pandas as pd
from itertools import combinations
import sys
from Causal_controller import *
from began.models import GeneratorCNN, DiscriminatorCNN
from utils import to_nhwc,read_prepared_uint8_image,make_encode_dir

from utils import transform, inverse_transform #dcgan img norm
from utils import norm_img, denorm_img #began norm image

def var_like_z(z_ten,name):
    z_dim=z_ten.get_shape().as_list()[-1]
    return tf.get_variable(name,shape=(1,z_dim))
def noise_like_z(z_ten,name):
    z_dim=z_ten.get_shape().as_list()[-1]
    noise=tf.random_uniform([1,z_dim],minval=-1.,maxval=1.,)
    return noise


class Encoder:
    '''
    This is a class where you pass a model, and an image file
    and it creates more tensorflow variables, along with
    surrounding saving and summary functionality for encoding
    that image back into the hidden space using gradient descent
    '''
    model_name = "Encode.model"
    model_type= 'encoder'
    summ_col='encoder_summaries'
    def __init__(self,model,image,image_name=None,max_tr_steps=50000,load_path=''):
        '''
        image is assumed to be a path to a precropped 64x64x3 uint8 image
        '''

        #Some hardcoded defaults here
        self.log_step=500
        self.lr=0.0005
        self.max_tr_steps=max_tr_steps

        self.model=model
        self.load_path=load_path

        self.image_name=image_name or os.path.basename(image).replace('.','_')
        self.encode_dir=make_encode_dir(model,self.image_name)
        self.model_dir=self.encode_dir#different from self.model.model_dir
        self.save_dir=os.path.join(self.model_dir,'save')

        self.sess=self.model.sess#session should already be in progress

        if model.model_type =='dcgan':
            self.data_format='NHWC'#Don't change
        elif model.model_type == 'began':
            self.data_format=model.data_format#'NCHW' if gpu
        else:
            raise Exception('Should not happen. model_type=',model.model_type)

        #Notation:
        #self.uint_x/G ; 3D [0,255]
        #self.x/G ; 4D [-1,1]
        self.uint_x=read_prepared_uint8_image(image)#x is [0,255]

        print('Read image shape',self.uint_x.shape)
        self.x=norm_img(np.expand_dims(self.uint_x,0),self.data_format)#bs=1
        #self.x=norm_img(tf.expand_dims(self.uint_x,0),self.data_format)#bs=1
        print('Shape after norm:',self.x.get_shape().as_list())


        ##All variables created under encoder have uniform init
        vs=tf.variable_scope('encoder',
             initializer=tf.random_uniform_initializer(minval=-1.,maxval=1.),
             dtype=tf.float32)


        with vs as scope:
            #avoid creating adams params
            optimizer = tf.train.GradientDescentOptimizer
            #optimizer = tf.train.AdamOptimizer
            self.g_optimizer = optimizer(self.lr)

            encode_var={n.name:var_like_z(n.z,n.name) for n in model.cc.nodes}
            encode_var['gen']=var_like_z(model.z_gen,'gen')
            print 'encode variables created'
            self.train_var = tf.contrib.framework.get_variables(scope)
            self.step=tf.Variable(0,name='step')
            self.var = tf.contrib.framework.get_variables(scope)

        #all encode vars created by now
        self.saver = tf.train.Saver(var_list=self.var)
        print('Summaries will be written to ',self.model_dir)
        self.summary_writer = tf.summary.FileWriter(self.model_dir)

        #load or initialize enmodel variables
        self.init()

        if model.model_type =='dcgan':
            self.cc=CausalController(graph=model.graph, input_dict=encode_var, reuse=True)
            self.fake_labels_logits= tf.concat( self.cc.list_label_logits(),-1 )
            self.z_fake_labels=self.fake_labels_logits
            #self.z_gen = noise_like_z( self.model.z_gen,'en_z_gen')
            self.z_gen=encode_var['gen']
            self.z= tf.concat( [self.z_gen, self.z_fake_labels], axis=1 , name='z')

            self.G=model.generator( self.z , bs=1, reuse=True)

        elif model.model_type == 'began':
            with tf.variable_scope('tower'):#reproduce variable scope
                self.cc=CausalController(graph=model.graph, input_dict=encode_var, reuse=True)

                self.fake_labels= tf.concat( self.cc.list_labels(),-1 )
                self.fake_labels_logits= tf.concat( self.cc.list_label_logits(),-1 )
                #self.z_gen = noise_like_z( self.model.z_gen,'en_z_gen')
                self.z_gen=encode_var['gen']
                self.z= tf.concat( [self.fake_labels, self.z_gen],axis=-1,name='z')

                self.G,_ = GeneratorCNN(
                        self.z, model.conv_hidden_num, model.channel,
                        model.repeat_num, model.data_format,reuse=True)

                d_out, self.D_zG, self.D_var = DiscriminatorCNN(
                        self.G, model.channel, model.z_num,
                    model.repeat_num, model.conv_hidden_num,
                    model.data_format,reuse=True)

                _   , self.D_zX, _           = DiscriminatorCNN(
                        self.x, model.channel, model.z_num,
                    model.repeat_num, model.conv_hidden_num,
                    model.data_format,reuse=True)
                self.norm_AE_G=d_out

                #AE_G, AE_x = tf.split(d_out, 2)
                self.AE_G=denorm_img(self.norm_AE_G, model.data_format)
            self.aeg_sum=tf.summary.image('encoder/AE_G',self.AE_G)

        node_summaries=[]
        for node in self.cc.nodes:
            with tf.name_scope(node.name):
                ave_label=tf.reduce_mean(node.label)
                node_summaries.append(tf.summary.scalar('ave',ave_label))


        #unclear how scope with adam param works
        #with tf.variable_scope('encoderGD') as scope:

        #use L1 loss
        #self.g_loss_image = tf.reduce_mean(tf.abs(self.x - self.G))

        #use L2 loss
        #self.g_loss_image = tf.reduce_mean(tf.square(self.x - self.G))

        #use autoencoder reconstruction loss  #3.1.1 series
        #self.g_loss_image = tf.reduce_mean(tf.abs(self.x - self.norm_AE_G))

        #use L1 in autoencoded space# 3.2
        self.g_loss_image = tf.reduce_mean(tf.abs(self.D_zX - self.D_zG))

        g_loss_sum=tf.summary.scalar( 'encoder/g_loss_image',\
                          self.g_loss_image,self.summ_col)

        self.g_loss= self.g_loss_image
        self.train_op=self.g_optimizer.minimize(self.g_loss,
               var_list=self.train_var,global_step=self.step)

        self.uint_G=tf.squeeze(denorm_img( self.G ,self.data_format))#3D[0,255]
        gimg_sum=tf.summary.image( 'encoder/Reconstruct',tf.stack([self.uint_x,self.uint_G]),\
                max_outputs=2,collections=self.summ_col)

        #self.summary_op=tf.summary.merge_all(self.summ_col)
        #self.summary_op=tf.summary.merge_all(self.summ_col)

        if model.model_type=='dcgan':
            self.summary_op=tf.summary.merge([g_loss_sum,gimg_sum]+node_summaries)
        elif model.model_type=='began':
            self.summary_op=tf.summary.merge([g_loss_sum,gimg_sum,self.aeg_sum]+node_summaries)


        #print 'encoder summaries:',self.summ_col
        #print 'encoder summaries:',tf.get_collection(self.summ_col)


    def init(self):
        if self.load_path:
            print 'Attempting to load directly from path:',
            print self.load_path
            self.saver.restore(self.sess,self.load_path)
        else:
            print 'New ENCODE Model..init new Z parameters'
            init=tf.variables_initializer(var_list=self.var)
            print 'Initializing following variables:'
            for v in self.var:
                print v.name, v.get_shape().as_list()

            self.model.sess.run(init)

    def save(self, step=None):
        if step is None:
            step=self.sess.run(self.step)

        if not os.path.exists(self.save_dir):
            print 'Creating Directory:',self.save_dir
            os.makedirs(self.save_dir)
        savefile=os.path.join(self.save_dir,self.model_name)
        print 'Saving file:',savefile
        self.saver.save(self.model.sess,savefile,global_step=step)

    def train(self, n_step=None):
        max_step=n_step or self.max_tr_steps

        if False:#debug
            print 'a'
            self.sess.run(self.train_op)
            print 'b'
            self.sess.run(self.summary_op)
            print 'c'
            self.sess.run(self.g_loss)
            print 'd'

        print 'max_step;',max_step
        for counter in trange(max_step):

            fetch_dict = {
                "train_op": self.train_op,
            }
            if counter%self.log_step==0:
                fetch_dict.update({
                    "summary": self.summary_op,
                    "g_loss": self.g_loss,
                    "global_step":self.step
                    })

            result = self.sess.run(fetch_dict)

            if counter % self.log_step == 0:
                g_loss=result['g_loss']
                step=result['global_step']
                self.summary_writer.add_summary(result['summary'],step)
                self.summary_writer.flush()

                print("[{}/{}] Reconstr Loss_G: {:.6f}".format(counter,max_step,g_loss))

            if counter % (10.*self.log_step) == 0:
                self.save(step=step)

        self.save()


##Just for reference##
    #def load(self, checkpoint_dir):
    #    print(" [*] Reading checkpoints...")
    #    checkpoint_dir = os.path.join(checkpoint_dir, self.model_dir)
    #    ckpt = tf.train.get_checkpoint_state(checkpoint_dir)
    #    if ckpt and ckpt.model_checkpoint_path:
    #        ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
    #        self.saver.restore(self.sess, os.path.join(checkpoint_dir, ckpt_name))
    #        print(" [*] Success to read {}".format(ckpt_name))
    #        return True
    #    else:
    #        print(" [*] Failed to find a checkpoint")
    #        return False
#def norm_img(image, data_format=None):
#    image = image/127.5 - 1.
#    if data_format:
#        image = to_nhwc(image, data_format)
#    return image
#def transform:
#    stuff
#  return np.array(cropped_image)/127.5 - 1.
#def denorm_img(norm, data_format):
#    return tf.clip_by_value(to_nhwc((norm + 1)*127.5, data_format), 0, 255)
#def inverse_transform(images):
#  return (images+1.)/2.


#if model.model_name=='began':
#    fake_labels=model.fake_labels
#    D_fake_labels=model.D_fake_labels
#    #result_dir=os.path.join('began',model.model_dir)
#    result_dir=model.model_dir
#    if str_step=='':
#        str_step=str( model.sess.run(model.step) )+'_'
#    attr=model.attr[list(model.cc.node_names)]
#elif model.model_name=='dcgan':
#    fake_labels=model.fake_labels
#    D_fake_labels=model.D_labels_for_fake
#    result_dir=model.checkpoint_dir
#    attr=0.5*(model.attributes+1)
#    attr=attr[list(model.cc.names)]


================================================
FILE: figure_scripts/high_level.py
================================================
import tensorflow as tf
import numpy as np
import os
import scipy.misc
import numpy as np
import pandas as pd
from tqdm import trange,tqdm
import pandas as pd
from itertools import combinations, product
import sys
from utils import save_figure_images,make_sample_dir,guess_model_step
from sample import get_joint,sample,find_logit_percentile


'''
This is a file where each function creates a particular figure. No real need
for this to be configurable. Just make a new function for each figure

This uses functions in sample.py and distribution.py, which are intended to
be lower level functions that can be used more generally.

'''


def fig1(model, output_folder):
    '''
    This function makes two 2x10 images
    showing the difference between conditioning
    and intervening
    '''

    str_step=guess_model_step(model)
    fname=os.path.join(output_folder,str_step+model.model_type)

    for key in ['Young','Smiling','Wearing_Lipstick','Male','Mouth_Slightly_Open','Narrow_Eyes']:
    #for key in ['Mustache','Bald']:
    #for key in ['Mustache']:
        print 'Starting ',key,
        #for key in ['Bald']:

        p50,n50=find_logit_percentile(model,key,50)
        do_dict={key:np.repeat([p50],10)}
        eps=3
        cond_dict={key:np.repeat([+eps],10)}

        out,_=sample(model,do_dict=do_dict)
        intv_images=out['G']

        out,_=sample(model,cond_dict=cond_dict)
        cond_images=out['G']

        images=np.vstack([intv_images,cond_images])
        dc_file=fname+'_'+key+'_topdo1_botcond1.pdf'
        save_figure_images(model.model_type,images,dc_file,size=[2,10])

        do_dict={key:np.repeat([p50,n50],10)}
        cond_dict={key:np.repeat([+eps,-eps],10)}

        dout,_=sample(model,do_dict=do_dict)
        cout,_=sample(model,cond_dict=cond_dict)

        itv_file  = fname+'_'+key+'_topdo1_botdo0.pdf'
        cond_file  = fname+'_'+key+'_topcond1_botcond0.pdf'
        eps=3

        save_figure_images(model.model_type,dout['G'],itv_file,size=[2,10])
        save_figure_images(model.model_type,cout['G'],cond_file,size=[2,10])
        print '..finished ',key

    #return images,cout['G'],dout['G']
    return key


================================================
FILE: figure_scripts/pairwise.py
================================================
from __future__ import print_function
import time
import tensorflow as tf
import os
import scipy.misc
import numpy as np
from tqdm import trange

import pandas as pd
from itertools import combinations
import sys
from sample import sample


def calc_tvd(label_dict,attr):
    '''
    attr should be a 0,1 pandas dataframe with
    columns corresponding to label names

    for example:
    names=zip(*self.graph)[0]
    calc_tvd(label_dict,attr[names])

    label_dict should be a dictionary key:1d-array of samples
    '''
    ####Calculate Total Variation####
    if np.min(attr.values)<0:
        raise ValueError('calc_tvd received \
                 attr that may not have been in {0,1}')

    label_names=label_dict.keys()
    attr=attr[label_names]

    df2=attr.drop_duplicates()
    df2 = df2.reset_index(drop = True).reset_index()
    df2=df2.rename(columns = {'index':'ID'})
    real_data_id=pd.merge(attr,df2)
    real_counts = pd.value_counts(real_data_id['ID'])
    real_pdf=real_counts/len(attr)

    label_list_dict={k:np.round(v.ravel()) for k,v in label_dict.items()}
    df_dat=pd.DataFrame.from_dict(label_list_dict)
    dat_id=pd.merge(df_dat,df2,on=label_names,how='left')
    dat_counts=pd.value_counts(dat_id['ID'])
    dat_pdf = dat_counts / dat_counts.sum()
    diff=real_pdf.subtract(dat_pdf, fill_value=0)
    tvd=0.5*diff.abs().sum()
    return tvd


def crosstab(model,result_dir=None,report_tvd=True,no_save=False,N=500000):
    '''
    This is a script for outputing [0,1/2], [1/2,1] binned pdfs
    including the marginals and the pairwise comparisons

    report_tvd is given as optional because it is somewhat time consuming

    result_dir is where to save the distribution text files. defaults to
    model.cc.model_dir

    '''
    result_dir=result_dir or model.cc.model_dir
    result={}

    n_labels=len(model.cc.nodes)

    #Not really sure how this should scale
    #N=1000*n_labels
    #N=500*n_labels**2#open to ideas that avoid a while loop
    #N=12000

    #tvd will not be reported as low unless N is large
    #N=500000 #default

    print('Calculating joint distribution with',)

    t0=time.time()
    label_dict=sample(model,fetch_dict=model.cc.label_dict,N=N)
    print('sampling model N=',N,' times took ',time.time()-t0,'sec')


    #fake_labels=model.cc.fake_labels

    str_step=str( model.sess.run(model.cc.step) )+'_'

    attr=model.data.attr
    attr=attr[model.cc.node_names]

    lab_xtab_fn = os.path.join(result_dir,str_step+'glabel_crosstab.txt')
    print('Writing to files:',lab_xtab_fn)

    if report_tvd:
        t0=time.time()
        tvd=calc_tvd(label_dict,attr)
        result['tvd']=tvd
        print('calculating tvd from samples took ',time.time()-t0,'sec')

        if no_save:
            return result

    t0=time.time()

    joint={}
    label_joint={}
    #for name, lab in zip(model.cc.node_names,list_labels):
    for name, lab in label_dict.items():
        joint[name]={ 'g_fake_label':lab }


    #with open(dfl_xtab_fn,'w') as dlf_f, open(lab_xtab_fn,'w') as lab_f, open(gvsd_xtab_fn,'w') as gldf_f:
    with open(lab_xtab_fn,'w') as lab_f:
        if report_tvd:
            lab_f.write('TVD:'+str(tvd)+'\n\n')
        lab_f.write('Marginals:\n')

        #Marginals
        for name in joint.keys():
            lab_f.write('Node: '+name+'\n')

            true_marg=np.mean((attr[name]>0.5).values)
            lab_marg=(joint[name]['g_fake_label'] > 0.5).astype('int')

            lab_f.write('  mean='+str(np.mean(lab_marg))+'\t'+\
                        'true mean='+str(true_marg)+'\n')

            lab_f.write('\n')


        #Pairs of labels
        lab_f.write('\nPairwise:\n')

        for node1,node2 in combinations(joint.keys(),r=2):

            lab_node1=(joint[node1]['g_fake_label']>0.5).astype('int')
            lab_node2=(joint[node2]['g_fake_label']>0.5).astype('int')
            lab_df=pd.DataFrame(data=np.hstack([lab_node1,lab_node2]),columns=[node1,node2])
            lab_ct=pd.crosstab(index=lab_df[node1],columns=lab_df[node2],margins=True,normalize=True)

            true_ct=pd.crosstab(index=attr[node1],columns=attr[node2],margins=True,normalize=True)


            lab_f.write('\n\tFake:\n')
            lab_ct.to_csv(lab_xtab_fn,mode='a')
            lab_f.write( lab_ct.__repr__() )
            lab_f.write('\n\tReal:\n')
            lab_f.write( true_ct.__repr__() )

            lab_f.write('\n\n')

    print('calculating pairwise crosstabs and saving results took ',time.time()-t0,'sec')
    return result


================================================
FILE: figure_scripts/probability_table.txt
================================================


model: celebA_0627_200239
    graph:MLS

    [img,cc,d_fake_labels,true]

    P(M=1|S=1) = [0.28, 

    
================================================
FILE: figure_scripts/sample.py
================================================
from __future__ import print_function
import tensorflow as tf
import numpy as np
import os
import scipy.misc
import numpy as np
from tqdm import trange,tqdm

import pandas as pd
from itertools import combinations, product
import sys

from utils import save_figure_images#makes grid image plots

#convenience functions
from utils import make_sample_dir,guess_model_step,infer_grid_image_shape


from IPython.core import debugger
debug = debugger.Pdb().set_trace


def find_logit_percentile(model, key, per):
    data=[]
    for _ in range(30):
        data.append(model.sess.run(model.cc.node_dict[key].label_logit))
    D=np.vstack(data)
    pos_logits,neg_logits=D[D>0], D[D<0]
    pos_tile = np.percentile(pos_logits,per)
    neg_tile = np.percentile(neg_logits,100-per)
    return pos_tile,neg_tile

def fixed_label_diversity(model, config,step=''):
    sample_dir=make_sample_dir(model)
    str_step=str(step) or guess_model_step(model)

    N=64#per image
    n_combo=5#n label combinations

    #0,1 label combinations
    fixed_labels=model.attr.sample(n_combo)[model.cc.node_names]
    size=infer_grid_image_shape(N)

    for j, fx_label in enumerate(fixed_labels.values):
        fx_label=np.reshape(fx_label,[1,-1])
        fx_label=np.tile(fx_label,[N,1])
        do_dict={model.cc.labels: fx_label}

        images, feed_dict= sample(model, do_dict=do_dict)
        fx_file=os.path.join(sample_dir, str_step+'fxlab'+str(j)+'.pdf')
        save_figure_images(model.model_type,images['G'],fx_file,size=size)

    #which image is what label
    fixed_labels=fixed_labels.reset_index(drop=True)
    fixed_labels.to_csv(os.path.join(sample_dir,str_step+'fxlab'+'.csv'))


def get_joint(model, int_do_dict=None,int_cond_dict=None, N=6400,return_discrete=True):
    '''
    Returns a dictionary of dataframes of samples.
    Each dataframe correponds to a different tensor i.e. cc labels, d_labeler
    labels etc.

    int_do_dict and int_cond_dict indicate that just a simple +1 or 0 should be
    passed in
    ex: int_do_dict={'Wearing_Lipstick':+1}


    Ex: if intervention=+1 corresponds to logits uniform in [0,0.6], pass
    np.linspace(0,0.6,n)

    N is number of batches to sample at each location in logitspace (num_labels
    dimensional)
    '''

    #values are either +1 or -1 in cond and do dict

    do_dict,cond_dict={},{}
    if int_do_dict is not None:
        for key,value in int_do_dict.items():
            #Intervene in the middle of where the model is used to operating
            print('calculating percentile...')
            data=[]
            for _ in range(30):
                data.append(model.sess.run(model.cc.node_dict[key].label_logit))
            D=np.vstack(data)
            pos_logits,neg_logits=D[D>0], D[D<0]
            if value == 1:
                intv = np.percentile(pos_logits,50)
            elif value == 0:
                intv = np.percentile(neg_logits,50)
            else:
                raise ValueError('pass either +1 or 0')
            do_dict[key]=np.repeat([intv],N)


    if int_cond_dict is not None:
        for key,value in int_cond_dict.items():
            eps=3.
            if value == 1:
                cond_dict[key]=np.repeat([+eps],N)
            elif value == 0:
                cond_dict[key]=np.repeat([-eps],N)
            else:
                raise ValueError('pass either +1 or 0')

    #print 'getjoint: cond_dict:',cond_dict
    #print 'getjoint: do_dict:',do_dict

    #Terminology
    if model.model_type=='began':
        fake_labels=model.fake_labels
        D_fake_labels=model.D_fake_labels
        D_real_labels=model.D_real_labels
    elif model.model_type=='dcgan':
        fake_labels=model.fake_labels
        D_fake_labels=model.D_labels_for_fake
        D_real_labels=model.D_labels_for_real

    #fetch_dict={'cc_labels':model.cc.labels}
    fetch_dict={'d_fake_labels':D_fake_labels,
                'cc_labels':model.cc.labels}

    if model.model_type=='began':#dcgan not fully connected
        if not cond_dict and not do_dict:
            #Havent coded conditioning on real data
            fetch_dict.update({'d_real_labels':D_real_labels})


    print('Calculating joint distribution')
    result,_=sample(model, cond_dict=cond_dict, do_dict=do_dict,N=N,
                    fetch=fetch_dict,return_failures=False)
    print('fetd keys:',fetch_dict.keys())
    result={k:result[k] for k in fetch_dict.keys()}

    n_labels=len(model.cc.node_names)
    #list_labels=np.split( result['cfl'],n_labels, axis=1)
    #list_d_fake_labels=np.split(result['dfl'],n_labels, axis=1)
    #list_d_real_labels=np.split(result['drl'],n_labels, axis=1)

    for k in result.keys():
        print('valshape',result[k].shape)
        print('result',result[k])
    list_result={k:np.split(val,n_labels, axis=1) for k,val in result.items()}

    pd_joint={}
    for key,r in list_result.items():
        joint={}
        for name,val in zip(model.cc.node_names,r):
            int_val=(val>0.5).astype('int')
            joint[name]=int_val.ravel()
        pd_joint[key]=pd.DataFrame.from_dict(joint)

    return pd_joint


    for name, lab, dfl in zip(model.cc.node_names,list_labels,list_d_fake_labels):
        if return_discrete:
            cfl_val=(lab>0.5).astype('int')
            dfl_val=(dfl>0.5).astype('int')

        joint['dfl'][name]=dfl_val
        joint['cfl'][name]=cfl_val


    cfl=pd.DataFrame.from_dict( {k:val.ravel() for k,val in joint['cfl'].items()} )
    dfl=pd.DataFrame.from_dict( {k:val.ravel() for k,val in joint['cfl'].items()} )

    print('get_joint successful')
    return cfl,dfl


#__________

def take_product(do_dict):
    '''
    this function takes some dictionary like:
        {key1:1, key2:[a,b], key3:[c,d]}
    and returns the dictionary:
        {key1:[1,1,1], key2[a,a,b,b,],key3[c,d,c,d]}
    computing the product of values
    '''
    values=[]
    for v in do_dict.values():
        if hasattr(v,'__iter__'):
            values.append(v)
        else:
            values.append([v])#allows scalar to be passed

    prod_values=np.vstack(product(*values))
    return {k:np.array(v) for k,v in zip(do_dict.keys(),zip(*prod_values))}


def chunks(input_dict, chunk_size):
    """
    Yield successive n-sized chunks.
    Takes a dictionary of iterables and makes an
    iterable of dictionaries
    """
    if len(input_dict)==0:
        return [{}]

    n=chunk_size
    batches=[]

    L=len(input_dict.values()[0])
    for i in xrange(0, L, n):
        fd={}
        n=n- max(0, (i+n) - L )#incase doesn't evenly divide
        for key,value in input_dict.items():
            fd[key]=value[i:i+n]

        batches.append(fd)
    return batches


def do2feed( do_dict, model, on_logits=True):
    '''
    this contains logit for parsing "do_dict"
    into a feed dict that can actually be worked with
    '''
    feed_dict={}
    for key,value in do_dict.items():
        if isinstance(key,tf.Tensor):
            feed_dict[key]=value
        elif isinstance(key,str):
            if key in model.cc.node_names:
                node=model.cc.node_dict[key]
                if on_logits:# intervene on logits by default
                    feed_dict[node.label_logit]=value
                else:
                    feed_dict[node.label]=value
            elif hasattr(model,key):
                feed_dict[getattr(model,key)]=value
            else:
                raise ValueError('string keys must be attributes of either\
                                 model.cc or model. Got string:',key)
        else:
            raise ValueError('keys must be tensors or strings but got',type(key))

    #Make sure [64,] isn't passed to [64,1] for example
    for tensor,value in feed_dict.items():
        #Make last dims line up:
        tf_shape=tensor.get_shape().as_list()
        shape=[len(value)]+tf_shape[1:]
        try:
            feed_dict[tensor]=np.reshape(value,shape)
        except Exception,e:
            print('Unexpected difficulty reshaping inputs:',tensor.name, tf_shape, len(value), np.size(value))
            raise e
    return feed_dict

def cond2fetch( cond_dict=None, model=None, on_logits=True):
    '''
    this contains logit for parsing "cond_dict"
    into a fetch dict that can actually be worked with.
    A fetch dict can be passed into the first argument
    of session.run and therefore has values that are all tensors
    '''
    cond_dict=cond_dict or {}

    fetch_dict={}
    for key,value in cond_dict.items():
        if isinstance(value,tf.Tensor):
            fetch_dict[key]=value#Nothing to be done
        elif isinstance(key,tf.Tensor):
            fetch_dict[key]=key#strange scenario, but possible
        elif isinstance(key,str):
            if key in model.cc.node_names:
                node=model.cc.node_dict[key]
                if on_logits:# intervene on logits by default
                    fetch_dict[key]=node.label_logit
                else:
                    fetch_dict[key]=node.label
            elif hasattr(model,key):
                fetch_dict[key]=getattr(model,key)
            else:
                raise ValueError('string keys must be attributes of either\
                                 model.cc or model. Got string:',key)
        else:
            raise ValueError('keys must be tensors or strings but got',type(key))

    return fetch_dict


def interpret_dict( a_dict, model,n_times=1, on_logits=True):
    '''
    pass either a do_dict or a cond_dict.
    The rules for converting arguments to numpy arrays to pass
    to tensorflow are identical
    '''
    if a_dict is None:
        return {}
    elif len(a_dict)==0:
        return {}

    if n_times>1:
        token=tf.placeholder_with_default(2.22)
        a_dict[token]=-2.22

    p_a_dict=take_product(a_dict)

    ##Need divisible batch_size for most models
    if len(p_a_dict)>0:
        L=len(p_a_dict.values()[0])
    else:
        L=0
    print("L is " + str(L))
    print(p_a_dict)

    ##Check compatability batch_size and L
    if L>=model.batch_size:
        if not L % model.batch_size == 0:
            raise ValueError('a_dict must be dividable by batch_size\
                             but instead product of inputs was of length',L)
    elif model.batch_size % L == 0:
        p_a_dict = {key:np.repeat(value,model.batch_size/L,axis=0) for key,value in p_a_dict.items()}
    else:
        raise ValueError('No. of intervened values must divide batch_size.')
    return p_a_dict


def slice_dict(feed_dict, rows):
    '''
    conditional sampling requires doing only certain indicies depending
    on the result of the previous iteration.
    This function takes a feed_dict and "slices" it,
    returning a dictionary with the same keys, but with values[rows,:]
    '''
    fd_out={}
    for key,value in feed_dict.iteritems():
        fd_out[key]=value[rows]
    return fd_out


def did_succeed( output_dict, cond_dict ):
    '''
    Used in rejection sampling:
    for each row, determine if cond is satisfied
    for every cond in cond_dict

    success is hardcoded as being more extreme
    than the condition specified
    '''
    test_key=cond_dict.keys()[0]
    #print('output_dict:',np.squeeze(output_dict[test_key]))
    #print('cond_dict:',cond_dict[test_key])


    #definition success:
    def is_win(key):
        cond=np.squeeze(cond_dict[key])
        val=np.squeeze(output_dict[key])
        cond1=np.sign(val)==np.sign(cond)
        cond2=np.abs(val)>np.abs(cond)
        return cond1*cond2


    scoreboard=[is_win(key) for key in cond_dict]
    #print('scoreboard', scoreboard)
    all_victories_bool=np.logical_and.reduce(scoreboard)
    return all_victories_bool.flatten()


def sample(model, cond_dict=None, do_dict=None, fetch_dict=None,N=None,
           on_logits=True,return_failures=True):
    '''
    fetch_dict should be a dict of tensors to do sess.run on
    do_dict is a list of strings or tensors of the form:
    {'Male':1, model.z_gen:[0,1], model.cc.Smiling:[0.1,0.9]}

    N is used only if cond_dict and do_dict are None
    '''

    do_dict= do_dict or {}
    cond_dict= cond_dict or {}
    fetch_dict=fetch_dict or {'G':model.G}

    ##Handle the case where len querry doesn't divide batch_size
    #a_dict=cond_dict or do_dict
    #if a_dict:
    #    nsamples=len(a_dict.values()[0])
    #elif N:
    #    nsamples=N
    #else:
    #    raise ValueError('either pass a dictionary or N')


    ##Pad to be batch_size divisible
    #npad=(64-nsamples)%64
    #if npad>0:
    #    print("Warn. nsamples doesnt divide batch_size, pad=",npad)
    ##N+=npad

    #if npad>0:
    #    if do_dict:
    #        for k in do_dict.keys():
    #            keypad=np.tile(do_dict[k][0],[npad])
    #            do_dict[k]=np.concatenate([do_dict[k],keypad])

    #    if cond_dict:
    #        for k in cond_dict.keys():
    #            keypad=np.tile(cond_dict[k][0],[npad])
    #            cond_dict[k]=np.concatenate([cond_dict[k],keypad])

    verbose=False
    #verbose=True


    feed_dict = do2feed(do_dict, model, on_logits=on_logits)#{tensor:array}
    cond_fetch_dict= cond2fetch(cond_dict,model,on_logits=on_logits) #{string:tensor}
    fetch_dict.update(cond_fetch_dict)


    #print('actual cond_dict', cond_dict )#{}
    #print('actual do_dict', do_dict )#{}

    if verbose:
        print('feed_dict',feed_dict)
        print('fetch_dict',fetch_dict)

    if not cond_dict and do_dict:
        #Simply do intervention w/o loop
        if verbose:
            print('sampler mode:Interventional')

        #fds=chunks(feed_dict,model.batch_size)
        fds=chunks(feed_dict,model.default_batch_size)

        outputs={k:[] for k in fetch_dict.keys()}
        for fd in fds:
            out=model.sess.run(fetch_dict, fd)
            #outputs.append(out['G'])
            for k,val in out.items():
                outputs[k].append(val)

        for k in outputs.keys():
            outputs[k]=np.vstack(outputs[k])[:nsamples]
        return outputs,feed_dict
        #return np.vstack(outputs), feed_dict

    elif not cond_dict and not do_dict:
        #neither passed, but get N samples
        assert(N>0)
        if verbose:
            print('sampling model N=',N,' times')

        ##Should be variable batch_size allowed
        outputs=model.sess.run(fetch_dict,{model.batch_size:N})

        ##fds=chunks({'idx':range(npad+N)},model.batch_size)
        #fds=chunks({'idx':range(npad+N)},model.default_batch_size)

        #outputs={k:[] for k in fetch_dict.keys()}
        #for fd in fds:
        #    out=model.sess.run(fetch_dict)
        #    for k,val in out.items():
        #        outputs[k].append(val)
        #for k in outputs.keys():
        #    outputs[k]=np.vstack(outputs[k])[:nsamples]
        #return outputs, feed_dict

        return outputs


    #elif cond_dict and not do_dict:
    elif cond_dict:
    #Could also pass do_dict here to be interesting
        ##Implements rejection sampling
        if verbose:
            print('sampler mode:Conditional')
            print('conddict',cond_dict)

        rows=np.arange( len(cond_dict.values()[0]))#what idx do we need
        assert(len(rows)>=model.batch_size)#should already be true.

        if verbose:
            print('nrows:',len(rows))

        #init
        max_fail=4000
        #max_fail=10000
        n_fails=np.zeros_like(rows)
        remaining_rows=rows.copy()
        completed_rows=[]
        bad_rows=set()

        #null=lambda :[-1 for r in rows]
        if verbose:
            print('cond fetch_dict',fetch_dict)
        outputs={key:[np.zeros(fetch_dict[key].get_shape().as_list()[1:]) for r in rows] for key in fetch_dict}
        if verbose:
            print('n keys in outputs:',len(outputs.keys()))

        #debug()

        ii=0
        while( len(remaining_rows)>0 ):
            #debug()
            ii+=1
            #loop
            if not return_failures:
                if len(completed_rows)>=nsamples:
                    if verbose:
                        print('Have enough for now; breaking')
                    break
            iter_rows=remaining_rows[:model.batch_size]
            n_pad = model.batch_size - len(iter_rows)
            if verbose:
                print('Iter:',ii, 'to go:',len(iter_rows))
                #print('iter_rows:',len(iter_rows),':',iter_rows)
            #iter_rows.extend( [iter_rows[-1]]*n_pad )#just duplicate
            pad_iter_rows=list(iter_rows)+ ( [iter_rows[-1]]*n_pad )

            iter_rows=np.array(iter_rows)
            pad_iter_rows=np.array(pad_iter_rows)

            fed=slice_dict( feed_dict, pad_iter_rows )
            cond=slice_dict( cond_dict, pad_iter_rows )

            out=model.sess.run(fetch_dict, fed)

            bool_pass = did_succeed(out,cond)[:len(iter_rows)]
            if verbose:
                print('bool_pass:',len(bool_pass),':',bool_pass)
            pass_idx=iter_rows[bool_pass]
            fail_idx=iter_rows[~bool_pass]


            #yuck
            for key in out:
                for i,row_pass in enumerate(bool_pass):
                    idx=iter_rows[i]
                    if row_pass:
                        outputs[key][idx]=out[key][i]
                    else:
                        n_fails[idx]+=1

            good_rows=set( iter_rows[bool_pass] )
            completed_rows.extend(list(good_rows))
            #print('good_rows',good_rows)
            bad_rows=set( rows[ n_fails>=max_fail ] )
            #print('bad_rows',bad_rows)

            for key in out:
                for idx_giveup in bad_rows:
                    shape=fetch_dict[key].get_shape().as_list()[1:]
                    outputs[key][idx_giveup]=np.zeros(shape)
                    if verbose:
                        print('key:',key,' shape giveup:',shape)


            ##Remove rows
            remaining_rows=list( set(remaining_rows)-good_rows-bad_rows )

            #debug()

        if verbose:
            print('conditioning took',ii,' tries')
            n_fails.sort()
            print('10 most fail counts(limit=',max_fail,'):',n_fails[-10:])

        if verbose:
            print('means:')
            for k in outputs.keys():
                for v in outputs[k]:
                    print(np.mean(v))


        if not return_failures:
            #useful for pdf calculations.
            #not useful for image grids
            if verbose:
                print('Not returning failures!..',)
            for k in outputs.keys():
                outputs[k]=[outputs[k][i] for i in completed_rows]
                if verbose:
                    print('..Returning', len(completed_rows),'/',len(cond_dict.values()[0]))
        else:
            for k in outputs.keys():
                outputs[k]=outputs[k][:nsamples]

        for k in outputs.keys():
            if verbose:
                print('tobestacked:',len(outputs[k]))
                print('tobestacked:',isinstance(outputs[k][0],np.ndarray))

            values=outputs[k][:nsamples]
            if verbose:
                for v in values:
                    try:
                        print(v.shape)
                    except:
                        print(type(v))

            if len(fetch_dict[k].get_shape().as_list())>1:
                outputs[k]=np.stack(outputs[k])
            else:
                outputs[k]=np.concatenate(outputs[k])


        return outputs,cond_dict

    else:
        raise Exception('This should not happen')


def condition2d( model, cond_dict,cond_dict_name,step='', on_logits=True):
    '''
    Function largely copied from intervention2d with minor changes.

    This function is a wrapper around the more general function "sample".
    In this function, the cond_dict is assumed to have only two varying
    parameters on which a 2d interventions plot can be made.
    '''
    #TODO: Unify function with intervention2d

    if not on_logits:
        raise ValueError('on_logits=False not implemented')

    #Interpret defaults:
    #n_defaults=len( filter(lambda l:l == 'model_default', cond_dict.values() ))
    #accept any string for now
    n_defaults=len( filter(lambda l: isinstance(l,str), cond_dict.values() ))

    if n_defaults>0:
        print(n_defaults,' default values given..using 8 for each of them')

    try:
        for key,value in cond_dict.items():
            if value == 'model_default':
                print('Warning! using 1/2*model.intervention_range\
                      to specify the conditioning defaults')
                cond_min,cond_max=model.intervention_range[key]
                #cond_dict[key]=np.linspace(cond_min,cond_max,8)
                cond_dict[key]=[0.5*cond_min,0.5*cond_max]
                print('Condition dict used:',cond_dict)

            elif value=='int':
                #for integer pretrained models
                #eps=0.1 #usually logits are around 4-20
                eps=3 #usually logits are around 4-10
                #sigmoid(3) ~ 0.95
                cond_dict[key]=np.repeat([+eps,-eps],64) #logit on either size of 0
            elif value=='percentile':
                ##I'm changing this to do 50th percentile
                #of positive or of negative class
                print('calculating percentile...')
                data=[]
                for _ in range(30):
                    data.append(model.sess.run(model.cc.node_dict[key].label_logit))
                D=np.vstack(data)
                pos_logits,neg_logits=D[D>0], D[D<0]
                print("Conditioning on 5th percentile")
                pos_intv = np.percentile(pos_logits,5)
                neg_intv = np.percentile(neg_logits,95)
                cond_dict[key]=np.repeat([pos_intv,neg_intv],64)
                print('percentile5 for',key,'is',np.percentile(D,5))
                print('percentile25 for',key,'is',np.percentile(D,25))
                print('percentile50 for',key,'is',np.percentile(D,50))
                print('percentile75 for',key,'is',np.percentile(D,75))
                print('percentile95 for',key,'is',np.percentile(D,95))

                #OLD:
                ##fetch=cond2fetch(cond_dict)
                #print('...calculating percentile')
                #data=[]
                #for _ in range(30):
                #    data.append(model.sess.run(model.cc.node_dict[key].label_logit))
                #D=np.vstack(data)
                #print('dat',D.flatten())
                #cond_dict[key]=np.repeat([np.percentile(D,95),np.percentile(D,5)],64)
                #print('percentiles for',key,'are',[np.percentile(D,5),np.percentile(D,95)])


            else:
                #otherwise pass a number, list, or array
                assert(not isinstance(value,str))

    except Exception, e:
        raise(e,'Difficulty accessing default model interventions')


    str_step=str(step)

    lengths = [ len(v) for v in cond_dict.values() if hasattr(v,'__len__') ]
    #print('lengths',lengths)
    print('lengths',lengths)

    gt_one = filter(lambda l:l>1,lengths)

    if not 0<=len(gt_one)<=2:
        raise ValueError('for visualizing intervention, must have < 3 parameters varying')
    if len(gt_one) == 0:
        image_dim = np.sqrt(model.batch_size).astype(int)
        size = [image_dim,image_dim]
#    if len(gt_one)==1 and lengths[0]>=model.batch_size:
#        size=[gt_one[0],1]
#    elif len(gt_one)==1 and lengths[0]<model.batch_size:
#        image_dim = np.sqrt(model.batch_size).astype(int)
#        size = [image_dim,image_dim]
#    elif len(gt_one)==2:
#        size=[gt_one[0],gt_one[1]]
#

    elif len(gt_one)==2:
        size=[gt_one[0],gt_one[1]]

    else:
        N=np.prod(lengths)
        if N%8==0:
            #size=[N/8,8]
            size=[8,N/8]
        else:
            size=[8,8]


    #Terminology
    if model.model_type=='began':
        result_dir=model.model_dir
        if str_step=='':
            str_step=str( model.sess.run(model.step) )+'_'
    elif model.model_type=='dcgan':
        print('DCGAN')
        result_dir=model.checkpoint_dir

    sample_dir=os.path.join(result_dir,'sample_figures')
    if not os.path.exists(sample_dir):
        os.mkdir(sample_dir)

    out, _= sample(model, cond_dict=cond_dict,on_logits=on_logits)
    images=out['G']

    #print('Images shape:',images.shape)


    #cond_file=os.path.join(sample_dir, str_step+str(cond_dict_name)+'_cond'+'.png')
    cond_file=os.path.join(sample_dir,str_step+str(cond_dict_name)+'_cond'+'.pdf')

    #if os.path.exists(cond_file):
    #    cond_file='new'+cond_file #don't overwrite

    save_figure_images(model.model_type,images,cond_file,size=size)


def intervention2d(model, fetch=None, do_dict=None, do_dict_name=None, on_logits=True, step=''):
    '''
    This function is a wrapper around the more general function "sample".
    In this function, the do_dict is assumed to have only two varying
    parameters on which a 2d interventions plot can be made.
    '''
    #TODO: Unify function with condition2d

    if not on_logits:
        raise ValueError('on_logits=False not implemented')

    #Interpret defaults:
    #n_defaults=len( filter(lambda l:l == 'model_default', do_dict.values() ))
    #accept any string for now
    n_defaults=len( filter(lambda l: isinstance(l,str), do_dict.values() ))

    if n_defaults>0:
        print(n_defaults,' default values given..using 8 for each of them')

    try:
        for key,value in do_dict.items():
            if value == 'model_default':
                itv_min,itv_max=model.intervention_range[key]
                do_dict[key]=np.linspace(itv_min,itv_max,8)

            elif value=='int':
                #for integer pretrained models
                #eps=0.1 #usually logits are around 4-20
                eps=3 #usually logits are around 4-10
                #sigmoid(3) ~ 0.95
                do_dict[key]=np.repeat([-eps,+eps],64) #logit on either size of 0

            elif value=='percentile':
                ##I'm changing this to do 50th percentile
                #of positive or of negative class
                print('calculating percentile...')
                data=[]
                for _ in range(30):
                    data.append(model.sess.run(model.cc.node_dict[key].label_logit))
                D=np.vstack(data)
                pos_logits,neg_logits=D[D>0], D[D<0]
                pos_intv = np.percentile(pos_logits,50)
                neg_intv = np.percentile(neg_logits,50)
                do_dict[key]=np.repeat([pos_intv,neg_intv],64)
                print('percentile5 for',key,'is',np.percentile(D,5))
                print('percentile25 for',key,'is',np.percentile(D,25))
                print('percentile50 for',key,'is',np.percentile(D,50))
                print('percentile75 for',key,'is',np.percentile(D,75))
                print('percentile95 for',key,'is',np.percentile(D,95))
            else:
                #otherwise pass a number, list, or array
                assert(not isinstance(value,str))

    except Exception, e:
        raise(e,'Difficulty accessing default model interventions')


    str_step=str(step)

    lengths = [ len(v) for v in do_dict.values() if hasattr(v,'__len__') ]
    #print('lengths',lengths)
    print('lengths',lengths)

    gt_one = filter(lambda l:l>1,lengths)

    if not 0<=len(gt_one)<=2:
        raise ValueError('for visualizing intervention, must have < 3 parameters varying')
    if len(gt_one) == 0:
        #image_dim = np.sqrt(model.batch_size).astype(int)
        image_dim = np.sqrt(64).astype(int)
        size = [image_dim,image_dim]

    #if len(gt_one)==1 and lengths[0]>=model.batch_size:
    #    size=[gt_one[0],1]
    #elif len(gt_one)==1 and lengths[0]<model.batch_size:
    #    #image_dim = np.sqrt(model.batch_size).astype(int)
    #    image_dim = np.sqrt(64).astype(int)
    #    size = [image_dim,image_dim]
    elif len(gt_one)==2:
        size=[gt_one[0],gt_one[1]]

    else:
        N=np.prod(lengths)
        if N%8==0:
            #size=[N/8,8]
            size=[8,N/8]
        else:
            size=[8,8]

    #Terminology
    if model.model_type=='began':
        result_dir=model.model_dir
        if str_step=='':
            str_step=str( model.sess.run(model.step) )+'_'
    elif model.model_type=='dcgan':
        print('DCGAN')
        result_dir=model.checkpoint_dir

    sample_dir=os.path.join(result_dir,'sample_figures')
    if not os.path.exists(sample_dir):
        os.mkdir(sample_dir)

    #print('do_dict DEBUG:',do_dict)
    out, feed_dict= sample(model, do_dict=do_dict,on_logits=on_logits)
    images=out['G']


    itv_file=os.path.join(sample_dir, str_step+str(do_dict_name)+'_intv'+'.pdf')
    #itv_file=os.path.join(sample_dir, str_step+str(do_dict_name)+'_intv'+'.png')

    #if os.path.exists(itv_file):
    #    itv_file='new'+itv_file #don't overwrite

    save_figure_images(model.model_type,images,itv_file,size=size)


================================================
FILE: figure_scripts/utils.py
================================================
from __future__ import print_function,division
import tensorflow as tf
import os
from os import listdir
from os.path import isfile, join
import shutil
import sys
import math
import json
import logging
import numpy as np
from PIL import Image
from datetime import datetime

import tensorflow as tf
from PIL import Image

import math
import random
import pprint
import scipy.misc
import numpy as np
from time import gmtime, strftime
from six.moves import xrange

pp = pprint.PrettyPrinter()

def nhwc_to_nchw(x):
    return tf.transpose(x, [0, 3, 1, 2])
def to_nchw_numpy(image):
    if image.shape[3] in [1, 3]:
        new_image = image.transpose([0, 3, 1, 2])
    else:
        new_image = image
    return new_image

def norm_img(image, data_format=None):
    #image = tf.cast(image,tf.float32)/127.5 - 1.
    image = image/127.5 - 1.
    #if data_format:
        #image = to_nhwc(image, data_format)
    if data_format=='NCHW':
        image = to_nchw_numpy(image)

    image=tf.cast(image,tf.float32)
    return image


#Denorming
def nchw_to_nhwc(x):
    return tf.transpose(x, [0, 2, 3, 1])
def to_nhwc(image, data_format):
    if data_format == 'NCHW':
        new_image = nchw_to_nhwc(image)
    else:
        new_image = image
    return new_image
def denorm_img(norm, data_format):
    return tf.clip_by_value(to_nhwc((norm + 1)*127.5, data_format), 0, 255)


def read_prepared_uint8_image(img_path):
    '''
    img_path should point to a uint8 image that is
    already cropped and resized
    '''
    cropped_image=scipy.misc.imread(img_path)
    if not np.all( np.array([64,64,3])==cropped_image.shape):
        raise ValueError('image must already be cropped and resized:',img_path)
    #TODO: warn if wrong dtype
    return cropped_image

def make_encode_dir(model,image_name):
    #Terminology
    if model.model_type=='began':
        result_dir=model.model_dir
    elif model.model_type=='dcgan':
        print('DCGAN')
        result_dir=model.checkpoint_dir
    encode_dir=os.path.join(result_dir,'encode_'+str(image_name))
    if not os.path.exists(encode_dir):
        os.mkdir(encode_dir)
    return encode_dir

def make_sample_dir(model):
    #Terminology
    if model.model_type=='began':
        result_dir=model.model_dir
    elif model.model_type=='dcgan':
        print('DCGAN')
        result_dir=model.checkpoint_dir

    sample_dir=os.path.join(result_dir,'sample_figures')
    if not os.path.exists(sample_dir):
        os.mkdir(sample_dir)
    return sample_dir

def guess_model_step(model):
    if model.model_type=='began':
        str_step=str( model.sess.run(model.step) )+'_'
    elif model.model_type=='dcgan':
        result_dir=model.checkpoint_dir
        ckpt = tf.train.get_checkpoint_state(result_dir)
        ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
        str_step=ckpt_name[-5:]+'_'
    return str_step

def infer_grid_image_shape(N):
    if N%8==0:
        size=[8,N//8]
    else:
        size=[8,8]
    return size


def save_figure_images(model_type, tensor, filename, size, padding=2, normalize=False, scale_each=False):

    print('[*] saving:',filename)

    #nrow=size[0]
    nrow=size[1]#Was this number per row and now number of rows?

    if model_type=='began':
        began_save_image(tensor,filename,nrow,padding,normalize,scale_each)
    elif model_type=='dcgan':
        #images = np.split(tensor,len(tensor))
        images=tensor
        dcgan_save_images(images,size,filename)


#Began originally
def make_grid(tensor, nrow=8, padding=2,
              normalize=False, scale_each=False):
    """Code based on https://github.com/pytorch/vision/blob/master/torchvision/utils.py"""
    nmaps = tensor.shape[0]
    xmaps = min(nrow, nmaps)
    ymaps = int(math.ceil(float(nmaps) / xmaps))
    height, width = int(tensor.shape[1] + padding), int(tensor.shape[2] + padding)
    grid = np.zeros([height * ymaps + 1 + padding // 2, width * xmaps + 1 + padding // 2, 3], dtype=np.uint8)
    k = 0
    for y in range(ymaps):
        for x in range(xmaps):
            if k >= nmaps:
                break
            h, h_width = y * height + 1 + padding // 2, height - padding
            w, w_width = x * width + 1 + padding // 2, width - padding

            grid[h:h+h_width, w:w+w_width] = tensor[k]
            k = k + 1
    return grid

def began_save_image(tensor, filename, nrow=8, padding=2,
               normalize=False, scale_each=False):
    ndarr = make_grid(tensor, nrow=nrow, padding=padding,
                            normalize=normalize, scale_each=scale_each)
    im = Image.fromarray(ndarr)
    im.save(filename)


#Dcgan originally
get_stddev = lambda x, k_h, k_w: 1/math.sqrt(k_w*k_h*x.get_shape()[-1])

def get_image(image_path, input_height, input_width,
              resize_height=64, resize_width=64,
              is_crop=True, is_grayscale=False):
  image = imread(image_path, is_grayscale)
  return transform(image, input_height, input_width,
                   resize_height, resize_width, is_crop)

def dcgan_save_images(images, size, image_path):
  return imsave(inverse_transform(images), size, image_path)

def imread(path, is_grayscale = False):
  if (is_grayscale):
    return scipy.misc.imread(path, flatten = True).astype(np.float)
  else:
    return scipy.misc.imread(path).astype(np.float)

def merge_images(images, size):
  return inverse_transform(images)

def merge(images, size):
  h, w = images.shape[1], images.shape[2]
  img = np.zeros((h * size[0], w * size[1], 3))
  for idx, image in enumerate(images):
    i = idx % size[1]
    j = idx // size[1]
    img[j*h:j*h+h, i*w:i*w+w, :] = image
  return img

def imsave(images, size, path):
  return scipy.misc.imsave(path, merge(images, size))

def center_crop(x, crop_h, crop_w,
                resize_h=64, resize_w=64):
  if crop_w is None:
    crop_w = crop_h
  h, w = x.shape[:2]
  j = int(round((h - crop_h)/2.))
  i = int(round((w - crop_w)/2.))
  return scipy.misc.imresize(
      x[j:j+crop_h, i:i+crop_w], [resize_h, resize_w])

def transform(image, input_height, input_width, 
              resize_height=64, resize_width=64, is_crop=True):
  if is_crop:
    cropped_image = center_crop(
      image, input_height, input_width, 
      resize_height, resize_width)
  else:
    cropped_image = scipy.misc.imresize(image, [resize_height, resize_width])
  return np.array(cropped_image)/127.5 - 1.

def inverse_transform(images):
  return (images+1.)/2.


================================================
FILE: main.py
================================================
from __future__ import print_function
import numpy as np
import os
import tensorflow as tf

from trainer import Trainer
from causal_graph import get_causal_graph
from utils import prepare_dirs_and_logger, save_configs

#Generic configuration arguments
from config import get_config
#Submodel specific configurations
from causal_controller.config import get_config as get_cc_config
from causal_dcgan.config import get_config as get_dcgan_config
from causal_began.config import get_config as get_began_config

from causal_began import CausalBEGAN
from causal_dcgan import CausalGAN

from IPython.core import debugger
debug = debugger.Pdb().set_trace

def get_trainer():
    print('tf: resetting default graph!')
    tf.reset_default_graph()#for repeated calls in ipython


    ####GET CONFIGURATION####
    #TODO:load configurations from previous model when loading previous model
    ##if load_path:
        #load config files from dir
    #except if pt_load_path, get cc_config from before
    #overwrite is_train, is_pretrain with current args--sort of a mess

    ##else:
    config,_=get_config()
    cc_config,_=get_cc_config()
    dcgan_config,_=get_dcgan_config()
    began_config,_=get_began_config()

    ###SEEDS###
    np.random.seed(config.seed)
    #tf.set_random_seed(config.seed) # Not working right now.

    prepare_dirs_and_logger(config)
    if not config.load_path:
        print('saving config because load path not given')
        save_configs(config,cc_config,dcgan_config,began_config)

    #Resolve model differences and batch_size
    if config.model_type:
        if config.model_type=='dcgan':
            config.batch_size=dcgan_config.batch_size
            cc_config.batch_size=dcgan_config.batch_size # make sure the batch size of cc is the same as the image model
            config.Model=CausalGAN.CausalGAN
            model_config=dcgan_config
        if config.model_type=='began':
            config.batch_size=began_config.batch_size
            cc_config.batch_size=began_config.batch_size # make sure the batch size of cc is the same as the image model
            config.Model=CausalBEGAN.CausalBEGAN
            model_config=began_config

    else:#no image model
        model_config=None
        config.batch_size=cc_config.batch_size

        if began_config.is_train or dcgan_config.is_train:
            raise ValueError('need to specify model_type for is_train=True')

    #Interpret causal_model keyword
    cc_config.graph=get_causal_graph(config.causal_model)

    #Builds and loads specified models:
    trainer=Trainer(config,cc_config,model_config)
    return trainer

def main(trainer):
    #Do pretraining
    if trainer.cc_config.is_pretrain:
        trainer.pretrain_loop()

    if trainer.model_config:
        if trainer.model_config.is_train:
            trainer.train_loop()

if __name__ == "__main__":
    trainer=get_trainer()

    #make ipython easier
    sess=trainer.sess
    cc=trainer.cc
    if hasattr(trainer,'model'):
        model=trainer.model

    main(trainer)

    tf.logging.set_verbosity(tf.logging.ERROR)

================================================
FILE: synthetic/README.md
================================================
# Causal(BE)GAN in Tensorflow

# (test comment)

Synthetic Data Figures
<> (Tensorflow implementation of [BEGAN: Boundary Equilibrium Generative Adversarial Networks](https://arxiv.org/abs/1703.10717).)

Authors' Tensorflow implementation Synthetic portion of [CausalGAN: Learning Implicit Causal Models with Adversarial Training]

<>some results files

## Setup.

If not already set, make sure that run_datasets.sh is an executable by running
    $ chmod +x run_datasets.sh

## Usage

A single run of main.py trains as many GANs as are in models.py (presently 6) for a single --data_type. This author can fit 3 such runs on a single gpu and conveniently there are 3 datasets considered.

    $ CUDA_VISIBLE_DEVICES='0' python main.py --data_type=linear

Again the tboard.py utility is available to view the most recent model summaries.

    $ python tboard.py

Recovering statistics means averaging over many runs. Mass usage follows the script run_datasets.sh. This bash script will train all GAN models on each of 3 datasets 30 times per dataset. The following will train 2(calls) x 30(loop/call) x 3(datasets/loop) x 6(gan models/dataset)=1080(gan models)


    $ (open first terminal)
    $ CUDA_VISIBLE_DEVICES='0' ./run_datasets.sh
    $ (open second terminal)
    $ CUDA_VISIBLE_DEVICES='1' ./run_datasets.sh


## Collecting Statistics


## Results


## Authors

Christopher Snyder / [@22csnyder](http://22csnyder.github.io)
Murat Kocaoglu / [@mkocaoglu](http://mkocaoglu.github.io)


================================================
FILE: synthetic/collect_stats.py
================================================
import pandas as pd
import numpy as np
import time
from scipy import stats
import os
import matplotlib.pyplot as plt
from models import GeneratorTypes,DataTypes
import brewer2mpl


def makeplots(x_iter,tvd_datastore,show=False,save=False,save_name=None):
    #Make plots
    dtypes=tvd_datastore.keys()
    fig,axes=plt.subplots(1,len(dtypes))

    #fig.subplots_adjust(hspace=0.5,wspace=0.025)
    fig.subplots_adjust(hspace=0.75,wspace=0.05)

    x_iter=x_iter.astype('float')/1000


    for ax,dtype in zip(axes,dtypes):
        if ax in axes[:-1]:
            use_legend = False
        else:
            use_legend = True

        if ax==axes[0]:
            prefix='Synthetic Data Graph:    '
            posfix='                                    '
        else:
            prefix=''
            posfix=''
        axtitle=prefix+dtype+posfix

        #df=pd.DataFrame.from_dict(tvd_datastore[dtype])


        df_tvd=pd.DataFrame(data={gtype:tvd_datastore[dtype][gtype]['tvd'] for gtype in gtypes})
        df_sem=pd.DataFrame(data={gtype:tvd_datastore[dtype][gtype]['sem'] for gtype in gtypes})
        df_tvd.index=x_iter;df_sem.index=x_iter


        df_tvd.plot.line(ax=ax,sharey=True,use_index=True,yerr=df_sem,legend=use_legend,capsize=5,capthick=3,elinewidth=1,errorevery=100)


        ax.set_title(axtitle.title(),fontsize=18)
        ax.set_ylabel('Total Variational Distance',fontsize=18)
        if ax is axes[1]:
            ax.set_xlabel('iter(thousands)',fontsize=18)

    t='Graph Structured Generator tvd Convergence on Synthetic Data with Known Causal Graph'
    plt.suptitle(t,fontsize=20)

    fig.set_figwidth(15,forward=True)
    fig.set_figheight(7,forward=True)

    if save:
        save_name=save_name or 'synth_tvd_vs_time.pdf'
        save_path=os.path.join('assets',save_name)

        plt.savefig(save_path,bbox_inches='tight')
        #plt.savefig(save_path)

    if show:
        plt.show(block=False)
    return fig,axes

def make_individual_plots(x_iter,tvd_datastore,smooth=True,show=False,save=False,save_name=None):
    fontsize=17.5
    tickfont=15

    gtypes=GeneratorTypes.keys()
    dtypes=tvd_datastore.keys()

    format_columns={
        'fc3'     :'FC3',
        'fc5'     :'FC5',
        'fc10'    :'FC10',
        'collider':'Collider',
        'linear'  :'Linear',
        'complete':'Complete',
        }

    #styles={
    #    'FC3'     :'bs-',
    #    'FC5'     :'ro-',
    #    'FC10'    :'y^-',
    #    'Collider':'g+-',
    #    'Linear'  :'m>-',
    #    'Complete':'kd-',
    #    }
    styles={
        'FC3'     :'s-',
        'FC5'     :'o-',
        'FC10'    :'^-',
        'Collider':'+-',
        'Linear'  :'>-',
        'Complete':'d-',
        }

    #bmap = brewer2mpl.get_map('Set2', 'qualitative', 7)
    #colors = bmap.mpl_colors
    colors=['b','r','y','g','m','k']
    markers=['s','o','^','+','>','d']

    #plt.style.use('seaborn-dark-palette')
    #plt.style.use('ggplot')
    plt.style.use('seaborn-deep')

    #Make plots

    #fig.subplots_adjust(hspace=0.5,wspace=0.025)
    #fig.subplots_adjust(hspace=0.75,wspace=0.05)

    x_iter=x_iter.astype('float')/1000

    for dtype in dtypes:
        use_legend=True

        #fig=plt.figure()

        df_tvd=pd.DataFrame(data={format_columns[gtype]:tvd_datastore[dtype][gtype]['tvd'] for gtype in gtypes})
        df_sem=pd.DataFrame(data={format_columns[gtype]:tvd_datastore[dtype][gtype]['sem'] for gtype in gtypes})
        df_tvd.index=x_iter;df_sem.index=x_iter


        if smooth:
            df_tvd=df_tvd.rolling(window=5,min_periods=1,center=True).mean()


        #styles=['bs-','ro-','y^-','g+-','m>-','kd-']

#        df_tvd.plot.line(use_index=True,yerr=df_sem,legend=use_legend,capsize=5,capthick=3,elinewidth=1,errorevery=100,figsize=(6,4),style=styles,markevery=10,markersize=100)
        #df_tvd.plot.line(use_index=True,yerr=df_sem,legend=use_legend,capsize=5,capthick=3,elinewidth=1,errorevery=100,figsize=(6,4),style=styles,markersize=100)

        fig=plt.figure()
        ax=fig.add_subplot(111)
        i=0
        for col in df_tvd.columns:
            #df_tvd[col].plot(ax=ax,use_index=True,yerr=df_sem[col],legend=use_legend,capsize=5,capthick=3,elinewidth=1,errorevery=100,figsize=(6,4),linestyle='-',color=colors[i],marker=markers[i],markevery=50,markersize=7)
            #print 'col',col#Linear last
            #df_tvd[col].plot(ax=ax,use_index=True,yerr=df_sem[col],legend=use_legend,capsize=5,capthick=3,elinewidth=1,errorevery=100,figsize=(6,4),linestyle='-',marker=markers[i],markevery=50,markersize=7)
            df_tvd[col].plot(ax=ax,use_index=True,yerr=df_sem[col],capsize=5,capthick=3,elinewidth=1,errorevery=100,figsize=(6,4),linestyle='-',marker=markers[i],markevery=50,markersize=7)
            i+=1

        ax.set_yscale('log')
        plt.legend()

        plt.xticks(fontsize=tickfont)
        plt.yticks(fontsize=tickfont)


        plt.ylim([0,1])

        plt.ylabel('Total Variation Distance',fontsize=fontsize)
        plt.xlabel('Iteration (in thousands)',fontsize=fontsize)

        if save:
            file_name=save_name or 'synth_tvd_vs_time.pdf'
            file_name=dtype+'_'+file_name
            save_path=os.path.join('assets',file_name)
            plt.savefig(save_path,bbox_inches='tight')
            #plt.savefig(save_path)

    if show:
        plt.show(block=False)


if __name__=='__main__':
    dtypes=DataTypes.keys()
    gtypes=GeneratorTypes.keys()

    logdir='logs/figure_logs'

    #init
    #Create a dictionary for each dataset, of dictionaries for each gen_type
    tvd_all_datastore={dt:{gt:[] for gt in gtypes} for dt in dtypes}
    tvd_datastore={dt:{} for dt in dtypes}
    runs=os.listdir(logdir)

    for dtype in dtypes:
        print ''
        print 'Collecting data for datatype ',dtype,'...'

        typed_runs=filter(lambda x:x.endswith(dtype),runs)

        for gtype in gtypes:
            n_runs=0

            #Go through all runs for each (dtype,gtype) pair
            for run in typed_runs:
                #tvd_csv={gt:os.path.join(logdir,run,gt,'tvd.csv') for gt in gtypes}
                tvd_csv=os.path.join(logdir,run,gtype,'tvd.csv')

                #cols=['step','tvd','mvd']
                dat=pd.read_csv(tvd_csv,sep=' ')

                if len(dat)!=1001:
                    print 'WARN: file',tvd_csv,'was of length:',len(dat),
                    print 'it may be in the process of optimizing.. not using'
                    continue

                #tvd_all_datastore[dtype][gtype]+=dat['tvd']
                tvd_all_datastore[dtype][gtype].append(dat['tvd'])
                n_runs+=1


            #after (dtype,gtype) collection
            if n_runs==0:
                #remove key since no matching gtype for dtype
                print 'Warning: for dtype',dtype,' no runs of gtype ',gtype
                #tvd_all_datastore[dtype].pop(gtype)
            else:
                df_concat=pd.concat(tvd_all_datastore[dtype][gtype],axis=1)
                gb=df_concat.groupby(by=df_concat.columns,axis=1)
                mean=gb.mean()
                sem=gb.sem().rename(columns={'tvd':'sem'})
                tvd_datastore[dtype][gtype]=pd.concat([mean,sem],axis=1)

                #tvd_all_datastore[dtype][gtype]/=n_runs

                #concat
                #groupby

        #after dtype collection
        if len(tvd_datastore[dtype])==0:
            print 'Warning: no runs of dtype ',dtype
            tvd_datastore.pop(dtype)


        print '...There were ',n_runs,' runs of ',dtype


    x_iter=dat['iter'].values


    #run in ipython depending on what you want
    #fig,axes=makeplots(x_iter,tvd_datastore,show=False,save=True)
    make_individual_plots(x_iter,tvd_datastore,smooth=True,show=True,save=True)


    time.sleep(10)


================================================
FILE: synthetic/config.py
================================================
import argparse
from models import DataTypes
def str2bool(v):
    return v is True or v.lower() in ('true', '1')

dtypes=DataTypes.keys()


arg_lists = []
parser = argparse.ArgumentParser()

def add_argument_group(name):
    arg = parser.add_argument_group(name)
    arg_lists.append(arg)
    return arg

#Pretrain network
data_arg=add_argument_group('Data')
gan_arg=add_argument_group('GAN')
misc_arg=add_argument_group('misc')
model_arg=add_argument_group('Model')

data_arg.add_argument('--data_type',type=str,choices=dtypes,
                      default='collider', help='''This is the graph structure
                      that generates the synthetic dataset through polynomials''')

gan_arg.add_argument('--gen_z_dim',type=int,default=10,
                     help='''dim of noise input for generator''')
gan_arg.add_argument('--gen_hidden_size',type=int,default=10,#3,
                     help='''hidden size used for layers of generator''')
gan_arg.add_argument('--disc_hidden_size',type=int,default=10,#6,
                     help='''hidden size used for layers of discriminator''')
gan_arg.add_argument('--lr_gen',type=float,default=0.0005,#0.005
                     help='''generator learning rate''')
gan_arg.add_argument('--lr_disc',type=float,default=0.0005,#0.0025
                     help='''discriminator learning rate''')

#broken
#misc_arg.add_argument('--save_pdfs',type=str2bool,default=False,
#                     help='''whether to save pdfs of scatterplots of x1x3 along
#                     with tensorboard summaries''')

misc_arg.add_argument('--model_dir',type=str,default='logs')
#misc_arg.add_argument('--np_random_seed', type=int, default=123)
#misc_arg.add_argument('--tf_random_seed', type=int, default=123)


model_arg.add_argument('--load_path',type=str,default='',
                       help='''Path to folder containing model to load. This
                       should be actual checkpoint to load. Example:
                       --load_path=./logs/0817_153755_collider/checkpoints/Model-50000''')
model_arg.add_argument('--is_train',type=str2bool,default=True,
                       help='''whether the model should train''')
model_arg.add_argument('--batch_size',type=int,default=64,
                      help='''batch_size for all generators and all
                       discriminators''')


def get_config():

    #setattr(config, 'data_dir', data_format)
    config, unparsed = parser.parse_known_args()
    return config, unparsed


================================================
FILE: synthetic/figure_generation.ipynb
================================================
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "tf: resetting default graph!\n",
      "Using data_type  linear\n",
      "Model directory is  ./logs/0818_072052_linear/checkpoints/Model-50000\n",
      "[*] MODEL dir: ./logs/0818_072052_linear/checkpoints/Model-50000\n",
      "[*] PARAM path: ./logs/0818_072052_linear/checkpoints/Model-50000/params.json\n",
      "GAN Model directory is  ./logs/0818_072052_linear/checkpoints/Model-50000/fc3\n",
      "GAN Model directory is  ./logs/0818_072052_linear/checkpoints/Model-50000/collider\n",
      "GAN Model directory is  ./logs/0818_072052_linear/checkpoints/Model-50000/fc5\n",
      "GAN Model directory is  ./logs/0818_072052_linear/checkpoints/Model-50000/linear\n",
      "GAN Model directory is  ./logs/0818_072052_linear/checkpoints/Model-50000/fc10\n",
      "GAN Model directory is  ./logs/0818_072052_linear/checkpoints/Model-50000/complete\n",
      " [*] Attempting to restore ./logs/0818_072052_linear/checkpoints/Model-50000\n",
      "INFO:tensorflow:Restoring parameters from ./logs/0818_072052_linear/checkpoints/Model-50000\n",
      "built trainer successfully\n"
     ]
    }
   ],
   "source": [
    "%run main.py --data_type 'linear' --load_path './logs/0818_072052_linear/checkpoints/Model-50000' --is_train False"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Using matplotlib backend: TkAgg\n"
     ]
    }
   ],
   "source": [
    "%matplotlib\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "sess=trainer.sess;gans=trainer.gans\n",
    "Xgs=[sess.run(g.gen.X,{g.gen.N:5000}) for g in gans]\n",
    "split_Xgs=[np.split(x,3,axis=1) for x in Xgs]\n",
    "X13gs=[[x[0],x[-1]] for x in split_Xgs]\n",
    "Xds=np.split(sess.run(trainer.data.X,{trainer.data.N:5000}),3,axis=1)\n",
    "X13d=[Xds[0],Xds[-1]]\n",
    "\n",
    "data_dict={'data':X13d}\n",
    "for g,dat in zip(gans,X13gs):\n",
    "    data_dict[g.gan_type]=dat\n",
    "\n",
    "gan_plots=['data','linear','collider','fc5']\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "titles={'data':'Data Distribution',\n",
    "        'linear':'Linear Generator',\n",
    "        'complete':'Complete Generator',\n",
    "        'collider':'Collider Generator',\n",
    "        'fc5':'Fully Connected Generator'}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#all at once\n",
    "fig,axes=plt.subplots(1,len(gan_plots),sharey=True)\n",
    "\n",
    "for gtype,ax in zip(gan_plots,axes):\n",
    "    data=data_dict[gtype]\n",
    "    ax.scatter(data[0],data[1])\n",
    "    \n",
    "    ax.set_title(titles[gtype])\n",
    "    ax.set_xlabel('X1')\n",
    "    if gtype==gan_plots[0]:\n",
    "        ax.set_ylabel('X3')\n",
    "\n",
    "        \n",
    "fig.canvas.draw()\n",
    "plt.show()    \n",
    "\n",
    "fig.subplots_adjust(wspace=0.04,left=0.05,hspace=0.04,right=0.98)\n",
    "\n",
    "fig.set_figheight(4)\n",
    "fig.set_figwidth(12)\n",
    "\n",
    "plt.savefig('assets/0818_072052_x1x3_all.pdf')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#one at a time\n",
    "\n",
    "for gtype in titles.keys():\n",
    "    data=data_dict[gtype]\n",
    "    fig=plt.figure()\n",
    "    plt.scatter(data[0],data[1])\n",
    "    plt.xlim([0,1])\n",
    "    plt.ylim([0,1])\n",
    "    \n",
    "    plt.title(titles[gtype],fontsize=20)\n",
    "\n",
    "    plt.ylabel('X3',fontsize=16)\n",
    "    plt.xlabel('X1',fontsize=16)\n",
    "    save_path='assets/'+'0818_072052/'+'x1x3_'+gtype+'.pdf'\n",
    "    plt.savefig(save_path)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#no titles\n",
    "\n",
    "for gtype in titles.keys():\n",
    "    data=data_dict[gtype]\n",
    "    fig=plt.figure()\n",
    "    plt.scatter(data[0],data[1])\n",
    "    plt.xlim([0,1])\n",
    "    plt.ylim([0,1])\n",
    "    \n",
    "    #plt.title(titles[gtype],fontsize=20)\n",
    "\n",
    "    plt.ylabel('X3',fontsize=16)\n",
    "    plt.xlabel('X1',fontsize=16)\n",
    "    save_path='assets/'+'0818_072052/'+'x1x3_notitle'+gtype+'.pdf'\n",
    "    plt.savefig(save_path)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "#no text\n",
    "#No titles: leave to latex to add titles/axes\n",
    "\n",
    "for gtype in titles.keys():\n",
    "    data=data_dict[gtype]\n",
    "    fig=plt.figure()\n",
    "    plt.scatter(data[0],data[1])\n",
    "    plt.xlim([0,1])\n",
    "    plt.ylim([0,1])\n",
    "    \n",
    "    #plt.title(titles[gtype],fontsize=14)\n",
    "\n",
    "    #plt.ylabel('X3',fontsize=14)\n",
    "    #plt.xlabel('X1',fontsize=14)\n",
    "    save_path='assets/'+'0818_072052/'+'x1x3_notext'+gtype+'.pdf'\n",
    "    plt.savefig(save_path)\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "fig.subplots_adjust?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "ename": "NameError",
     "evalue": "name 'trainer' is not defined",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-2-b98b42720f8e>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtrainer\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[0;31mNameError\u001b[0m: name 'trainer' is not defined"
     ]
    }
   ],
   "source": [
    "trainer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from utils import scatter2d"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}


================================================
FILE: synthetic/main.py
================================================
from __future__ import print_function
import numpy as np
import tensorflow as tf

from trainer import Trainer
from config import get_config
import os

from IPython.core import debugger
debug = debugger.Pdb().set_trace


'''main code for synthetic experiments

'''


def get_trainer(config):
    print('tf: resetting default graph!')
    tf.reset_default_graph()

    #tf.set_random_seed(config.random_seed)
    #np.random.seed(22)

    print('Using data_type ',config.data_type)
    trainer=Trainer(config,config.data_type)
    print('built trainer successfully')

    tf.logging.set_verbosity(tf.logging.ERROR)

    return trainer


def main(trainer,config):

    if config.is_train:
        trainer.train()


def get_model(config=None):
    if not None:
        config, unparsed = get_config()
    return get_trainer(config)

if __name__ == "__main__":
    config, unparsed = get_config()
    if not os.path.exists(config.model_dir):
        os.mkdir(config.model_dir)
    trainer=get_trainer(config)
    main(trainer,config)


================================================
FILE: synthetic/models.py
================================================
import tensorflow as tf
import matplotlib.pyplot as plt
from utils import *

#class Data3d

def sxe(logits,labels):
    #use zeros or ones if pass in scalar
    if not isinstance(labels,tf.Tensor):
        labels=labels*tf.ones_like(logits)
    return tf.nn.sigmoid_cross_entropy_with_logits(
        logits=logits,labels=labels)

#def linear(input_, output_dim, scope=None, stddev=10.):
def linear(input_, output_dim, scope=None, stddev=.7):
    unif = tf.uniform_unit_scaling_initializer()
    norm = tf.random_normal_initializer(stddev=stddev)
    const = tf.constant_initializer(0.0)
    with tf.variable_scope(scope or 'linear'):
        #w = tf.get_variable('w', [input_.get_shape()[1], output_dim], initializer=unif)
        w = tf.get_variable('w', [input_.get_shape()[1], output_dim], initializer=norm)
        b = tf.get_variable('b', [output_dim], initializer=const)
        return tf.matmul(input_, w) + b


class Arrows:
    x_dim=3
    e_dim=3
    bdry_buffer=0.05# output in [bdry_buffer,1-bdry_buffer]
    def __init__(self,N):
        with tf.variable_scope('Arrow') as scope:
            self.N=tf.placeholder_with_default(N,shape=[])
            #self.N=tf.constant(N) #how many to sample at a time
            self.e1=tf.random_uniform([self.N,1],0,1)
            self.e2=tf.random_uniform([self.N,1],0,1)
            self.e3=tf.random_uniform([self.N,1],0,1)
            self.build()
            #WARN. some of these are not trainable: i.e. poly
            self.var = tf.contrib.framework.get_variables(scope)
    def build(self):
        pass

    def normalize_output(self,X):
        '''
        I think that data literally in [0,1] was difficult for sigmoid network.
        Therefore, I am normalizing it to [bdry_buffer,1-bdry_buffer]

        X: assumed to be in [0,1]
        '''
        return (1.-2*self.bdry_buffer)*X + self.bdry_buffer


class Generator:
    x_dim=3
    def __init__(self, N, hidden_size=10,z_dim=10):
        with tf.variable_scope('Gen') as scope:
            self.N=tf.placeholder_with_default(N,shape=[])
            self.hidden_size=hidden_size
            self.z_dim=z_dim
            self.build()
            self.tr_var = tf.contrib.framework.get_variables(scope)
            self.step=tf.Variable(0,name='step',trainable=False)
            self.var = tf.contrib.framework.get_variables(scope)
    def build(self):
        raise Exception('must override')
    def smallNN(self,inputs,name='smallNN'):
        with tf.variable_scope(name):
            if isinstance(inputs,list):
                inputs=tf.concat(inputs,axis=1)
            h01 = tf.tanh(linear(inputs, self.hidden_size, name+'l1'))
            h11 = tf.tanh(linear(h01, self.hidden_size, name+'l21'))
            #h21 = output_nonlinearity(linear(h11, 1, name+'l31'))
            #h21 = linear(h11, 1, name+'l31')
            h21 = tf.sigmoid(linear(h11, 1, name+'l31'))

        return h21#rank2
        #return tf.sigmoid(h21)#rank2


randunif=tf.random_uniform_initializer(0,1,dtype=tf.float32)
def poly(cause,cause2=None,cause3=None,name='poly1d',reuse=None):
    #assumes input is in [0,1]. Enforces output is in [0,1]
    #if cause2 is not given, this is a cubic poly is 1 variable

    #cause and cause2 should be given as tensors like (N,1)

    #Check conditions
    if isinstance(cause2,str):
        raise ValueError('cause2 was a string. you probably forgot to include\
                         the "name=" keyword when specifying only 1 cause')
    if isinstance(cause3,str):
        raise ValueError('cause3 was a string. you probably forgot to include\
                         the "name=" keyword when specifying only 1 cause')
    if not len(cause.shape)>=2:
        cshape=cause.get_shape().as_list()
        raise ValueError('cause and cause2 must have len(shape)>=2. shape was' , cshape )
    if cause2 is not None:
        if not len(cause2.get_shape().as_list())>=2:
            cshape2=cause2.get_shape().as_list()
            raise ValueError('cause and cause2 must have len(shape)>=2. shape was %r'%(cshape2))
    if cause3 is not None:
        if not len(cause3.get_shape().as_list())>=2:
            cshape3=cause3.get_shape().as_list()
            raise ValueError('cause and cause3 must have len(shape)>=2. shape was %r'%(cshape3))

    #Start
    with tf.variable_scope(name,reuse=reuse):
        if cause2 is not None and cause3 is not None:
            inputs=[tf.ones_like(cause),cause,cause2,cause3]
        if cause2 is not None and cause3 is None:
            inputs=[tf.ones_like(cause),cause,cause2]
        else:
            inputs=[tf.ones_like(cause),cause]
        dim=len(inputs)#2 or 3 or 4

        C=np.random.rand(1,dim,dim,dim).astype(np.float32)#unif
        C=2*C-1 #unif[-1,1]

        n=200
        N=n**(dim-1)
        grids=np.mgrid[[slice(0,1,1./n) for i in inputs[1:]]]
        y=np.hstack([np.ones((N,1))]+[g.reshape(N,1) for g in grids])
        y1=np.reshape(y,[N,-1,1,1])
        y2=np.reshape(y,[N,1,-1,1])
        y3=np.reshape(y,[N,1,1,-1])

        test_poly=np.sum(y1*y2*y3*C,axis=(1,2,3))
        Cmin=np.min(test_poly)
        Cmax=np.max(test_poly)
        #normalize [0,1]->[0,1]
        C[0,0,0,0]-=Cmin
        C/=(Cmax-Cmin)

        coeff=tf.Variable(C,name='coef',trainable=False)

        #M=cause.get_shape.as_list()[0]
        x=tf.concat(inputs,axis=1)
        x1=tf.reshape(x,[-1,dim,1,1])
        x2=tf.reshape(x,[-1,1,dim,1])
        x3=tf.reshape(x,[-1,1,1,dim])

        poly=tf.reduce_sum(x1*x2*x3*coeff,axis=[1,2,3])
        return tf.reshape(poly,[-1,1])


class CompleteArrows(Arrows): # Data generated from the causal graph X1->X2->X3
    name='complete'
    def build(self):
        with tf.variable_scope(self.name):
            self.X1=poly(self.e1,name='X1')
            #self.X2=0.5*poly(self.X1,name='X1cX2')+0.5*self.e2
            #self.X3=0.5*poly(self.X1,self.X2,name='X1X2cX3')+0.5*self.e3
            self.X2=poly(self.X1,self.e2,name='X1cX2')
            self.X3=poly(self.X1,self.X2,self.e3,name='X1X2cX3')
            self.X=tf.concat([self.X1,self.X2,self.X3],axis=1)
            self.X=self.normalize_output(self.X)
            #print 'completearrowX.shape:',self.X.get_shape().as_list()
class CompleteGenerator(Generator):
    name='complete'
    def build(self):
        with tf.variable_scope(self.name):
            self.z=tf.random_uniform((self.N,self.x_dim*self.z_dim), 0,1,name='z')
            z1,z2,z3=tf.split( self.z ,3,axis=1)#3=x_dim
            self.X1=self.smallNN(z1,'X1')
            self.X2=self.smallNN([self.X1,z2],'X1cX2')
            self.X3=self.smallNN([self.X1,self.X2,z3],'X1X2cX3')
            self.X=tf.concat([self.X1,self.X2,self.X3],axis=1)
            #print 'completegenX.shape:',self.X.get_shape().as_list()

class ColliderArrows(Arrows):
    name='collider'
    def build(self):
        with tf.variable_scope(self.name):
            self.X1=poly(self.e1,name='X1')
            self.X3=poly(self.e3,name='X3')
            #self.X2=0.5*poly(self.X1,self.X3,name='X1X3cX2')+0.5*self.e2
            self.X2=poly(self.X1,self.X3,self.e2,name='X1X3cX2')
            self.X=tf.concat([self.X1,self.X2,self.X3],axis=1)
            self.X=self.normalize_output(self.X)
class ColliderGenerator(Generator):
    name='collider'
    def build(self):
        with tf.variable_scope(self.name):
            self.z=tf.random_uniform((self.N,self.x_dim*self.z_dim), 0,1,name='z')
            z1,z2,z3=tf.split( self.z ,3,axis=1)#3=x_dim
            self.X1=self.smallNN(z1,'X1')
            self.X3=self.smallNN(z3,'X3')
            self.X2=self.smallNN([self.X1,self.X3,z2],'X1X3cX2')
            self.X=tf.concat([self.X1,self.X2,self.X3],axis=1)

class LinearArrows(Arrows):
    name='linear'
    def build(self):
        with tf.variable_scope(self.name):
            self.X1=poly(self.e1,name='X1')
            #self.X2=0.5*poly(self.X1,name='X2')+0.5*self.e2
            #self.X3=0.5*poly(self.X2,name='X3')+0.5*self.e3
            self.X2=poly(self.X1,self.e2,name='X2')
            self.X3=poly(self.X2,self.e3,name='X3')
            self.X=tf.concat([self.X1,self.X2,self.X3],axis=1)
            self.X=self.normalize_output(self.X)
class LinearGenerator(Generator):
    name='linear'
    def build(self):
        with tf.variable_scope(self.name):
            self.z=tf.random_uniform((self.N,self.x_dim*self.z_dim), 0,1,name='z')
            z1,z2,z3=tf.split( self.z ,3,axis=1)#3=x_dim
            self.X1=self.smallNN(z1,'X1')
            self.X2=self.smallNN([self.X1,z2],'X2')
            self.X3=self.smallNN([self.X2,z3],'X3')
            self.X=tf.concat([self.X1,self.X2,self.X3],axis=1)

class NetworkArrows(Arrows):
    name='network'
    def build(self):
        with tf.variable_scope(self.name):
            self.hidden_size=10
            h0 = tf.tanh(linear(self.e1, self.hidden_size, 'netarrow0'))
            h1 = tf.tanh(linear(h0, self.hidden_size, 'netarrow1'))
            h2 = tf.tanh(linear(h1, self.hidden_size, 'netarrow2'))
            h3 = tf.tanh(linear(h2, self.hidden_size, 'netarrow3'))
            h4 = tf.sigmoid(linear(h3, self.x_dim, 'netarrow4'))
            self.X=self.normalize_output(h4)

class FC3_Generator(Generator):
    name='fc3'
    def build(self):
        z=tf.random_uniform((self.N,self.x_dim*self.z_dim), 0,1,name='z')
        z1,z2,z3=tf.split( z ,3,axis=1)#3=x_dim
        h0 = tf.tanh(linear(z1, self.hidden_size, 'fc3gen0'))
        h1 = tf.tanh(linear(h0, self.hidden_size, 'fc3gen1'))
        h2 = tf.sigmoid(linear(h1, self.x_dim, 'fc3gen2'))
        self.X=h2

class FC5_Generator(Generator):
    name='fc5'
    def build(self):
        z=tf.random_uniform((self.N,self.x_dim*self.z_dim), 0,1,name='z')
        z1,z2,z3=tf.split( z ,3,axis=1)#3=x_dim
        h0 = tf.tanh(linear(z1, self.hidden_size, 'fc5gen0'))
        h1 = tf.tanh(linear(h0, self.hidden_size, 'fc5gen1'))
        h2 = tf.tanh(linear(h1, self.hidden_size, 'fc5gen2'))
        h3 = tf.tanh(linear(h2, self.hidden_size, 'fc5gen3'))
        h4 = tf.sigmoid(linear(h3, self.x_dim, 'fc5gen4'))
        self.X=h4

class FC10_Generator(Generator):
    name='fc10'
    def build(self):
        z=tf.random_uniform((self.N,self.x_dim*self.z_dim), 0,1,name='z')
        z1,z2,z3=tf.split( z ,3,axis=1)#3=x_dim
        h0 = tf.tanh(linear(z1, self.hidden_size, 'fc10gen0'))
        h1 = tf.tanh(linear(h0, self.hidden_size, 'fc10gen1'))
        h2 = tf.tanh(linear(h1, self.hidden_size, 'fc10gen2'))
        h3 = tf.tanh(linear(h2, self.hidden_size, 'fc10gen3'))
        h4 = tf.tanh(linear(h3, self.hidden_size, 'fc10gen4'))
        h5 = tf.tanh(linear(h4, self.hidden_size, 'fc10gen5'))
        h6 = tf.tanh(linear(h5, self.hidden_size, 'fc10gen6'))
        h7 = tf.tanh(linear(h6, self.hidden_size, 'fc10gen7'))
        h8 = tf.tanh(linear(h7, self.hidden_size, 'fc10gen8'))
        h9 = tf.sigmoid(linear(h8, self.x_dim, 'fc10gen9'))
        self.X=h9


def minibatch(input_, num_kernels=5, kernel_dim=3):
    x = linear(input_, num_kernels * kernel_dim, scope='minibatch', stddev=0.02)
    activation = tf.reshape(x, (-1, num_kernels, kernel_dim))
    diffs = tf.expand_dims(activation, 3) - tf.expand_dims(tf.transpose(activation, [1, 2, 0]), 0)
    abs_diffs = tf.reduce_sum(tf.abs(diffs), 2)
    minibatch_features = tf.reduce_sum(tf.exp(-abs_diffs), 2)
    return tf.concat([input_, minibatch_features],1)


def Discriminator(input_, hidden_size,minibatch_layer=True,alpha=0.5,reuse=None):
    with tf.variable_scope('discriminator',reuse=reuse):
        h0_ = tf.nn.relu(linear(input_, hidden_size, 'disc0'))
        h0 = tf.maximum(alpha*h0_,h0_)
        h1_ = tf.nn.relu(linear(h0, hidden_size, 'disc1'))
        h1 = tf.maximum(alpha*h1_,h1_)
        if minibatch_layer:
            h2 = minibatch(h1)
        else:
            h2_ = tf.nn.relu(linear(h1, hidden_size, 'disc2'))
            h2 = tf.maximum(alpha*h2_,h2_)
        h3 = linear(h2, 1, 'disc3')
        return h3


GeneratorTypes={CompleteGenerator.name:CompleteGenerator,
            ColliderGenerator.name:ColliderGenerator,
            LinearGenerator.name:LinearGenerator,
            FC3_Generator.name:FC3_Generator,
            FC5_Generator.name:FC5_Generator,
            FC10_Generator.name:FC10_Generator}
DataTypes={CompleteArrows.name:CompleteArrows,
           ColliderArrows.name:ColliderArrows,
           LinearArrows.name:LinearArrows,
           NetworkArrows.name:NetworkArrows}

#def poly1d(cause,name='poly1d',reuse=None):
#    #assumes input is in [0,1]. Enforces output is in [0,1]
#    print 'Warning poly1d not ready yet'
#    with tf.variable_scope(name,initializer=randunif,reuse=reuse):
#        #C=np.random.rand(1,2,2).astype(np.float32)#unif
#        C=np.random.rand(1,2,2,2).astype(np.float32)#unif
#
#        #find min and max
#        N=2000
#        y=np.hstack([np.ones((N,1)),np.linspace(0,1.,N).reshape((N,1))])
#        y1=np.reshape(y,[N,2,1,1])
#        y2=np.reshape(y,[N,1,2,1])
#        y3=np.reshape(y,[N,1,1,2])
#
#        test_poly=np.sum(y1*y2*y3*C,axis=(1,2,3))
#        Cmin=np.min(test_poly)
#        Cmax=np.max(test_poly)
#
#        #normalize [0,1]->[0,1]
#        C[0,0,0,0]-=Cmin
#        C/=(Cmax-Cmin)
#
#        coeff=tf.Variable(C,name='coef',trainable=False)
#        x2=tf.reshape(tf.stack([tf.ones_like(cause),cause],axis=1),[-1,1,2])
#        x1=tf.transpose(x2,[0,2,1])
#        poly=tf.reduce_sum(x1*x2*coeff,axis=[1,2])
#        out= tf.squeeze(poly)
#        return poly
#
#        #coeff=tf.Variable(trainable=False,expected_shape=[1,3])
#    #    X=tf.stack([cause,cause*cause,cause*cause*cause],axis=1)
#    #    return tf.reduce_sum(coeff*X,axis=1)/tf.reduce_max(coeff)
#
#def poly2d(cause,cause2,name='poly2d',reuse=None):
#    with tf.variable_scope(name,initializer=randunif,reuse=reuse):
#        #coeff=tf.Variable(np.random.randn(1,2,2,2).astype(np.float32),trainable=False)
#        #x3=tf.reshape(tf.stack([cause,cause2],axis=0),[-1,1,1,2])
#        #x2=tf.transpose(x3,[0,2,3,1])
#        #x1=tf.transpose(x2,[0,2,3,1])
#
#        C=np.random.rand(1,3,3,3).astype(np.float32)
#        C[:,0,0,0]=0.#constant
#        C[:,0,2,0]=1.#x^3,y^3 coeff
#        C[:,0,0,2]=1.
#        coeff=tf.Variable(C, trainable=False)
#        x3=tf.reshape(tf.stack([tf.ones_like(cause),cause,cause2],axis=1),[-1,1,1,3])
#        x2=tf.transpose(x3,[0,2,3,1])
#        x1=tf.transpose(x2,[0,2,3,1])
#
#        poly=tf.reduce_sum(x1*x2*x3*coeff,axis=[1,2,3])
#
#        #out = tf.squeeze(poly)/tf.reduce_max(coeff)
#        out= tf.squeeze(poly)
#        return out


================================================
FILE: synthetic/run_datasets.sh
================================================
#!/bin/bash

#This script should be called with CUDA_VISIBLE_DEVICES
#already set. This script runs 1 of each gan model for
#1 of each dataset model

set -e

cvd=${CUDA_VISIBLE_DEVICES:?"Needs to be set"}
echo "DEVICES=$cvd"

#Sorry tqmd will produce some spastic output

#for i in {1..5}
for i in {1..30}
do
    echo "GPU "$CUDA_VISIBLE_DEVICES" Iter $i"

    python main.py --data_type=linear &
    sleep 2s
    python main.py --data_type=collider &
    sleep 2s
    python main.py --data_type=complete 

    #python main.py --data_type=linear &
    #sleep 2s
    #python main.py --data_type=linear &
    #sleep 2s
    #python main.py --data_type=linear 

    #python main.py --data_type=network &
    #python main.py --data_type=network &
    #python main.py --data_type=network 

    #Make sure all finished
    echo "Sleeping"
    sleep 5m

done


echo "finshed fork_datasets.sh"


================================================
FILE: synthetic/tboard.py
================================================
import os
import sys

from subprocess import call

def file2number(fname):
    nums=[s for s in fname.split('_') if s.isdigit()]
    if len(nums)==0:
        nums=['0']
    number=int(''.join(nums))
    return number

if __name__=='__main__':
    root='./logs'

    logs=os.listdir(root)
    logs.sort(key=lambda x:file2number(x))


    logdir=os.path.join(root,logs[-1])
    print 'running tensorboard on logdir:',logdir

    call(['tensorboard', '--logdir',logdir])


================================================
FILE: synthetic/trainer.py
================================================
from __future__ import print_function
import tensorflow as tf
import logging
import numpy as np
import pandas as pd
import shutil
import json
import sys
import os
from datetime import datetime
from tqdm import trange
import matplotlib.pyplot as plt

from os import listdir
from os.path import isfile,join

from utils import calc_tvd,summary_scatterplots,Timer,summary_losses,make_summary
from models import GeneratorTypes,DataTypes,Discriminator,sxe

class GAN(object):
    def __init__(self,config,gan_type,data,parent_dir):
        self.config=config
        self.gan_type=gan_type
        self.data=data
        self.Xd=data.X
        self.parent_dir=parent_dir
        self.prepare_model_dir()
        self.prepare_logger()

        with tf.variable_scope(gan_type):
            self.step=tf.Variable(0,'step')
            self.inc_step=tf.assign(self.step,self.step+1)
            self.build_model()
        self.build_summaries()#This can be either in var_scope(name) or out

    def build_model(self):
        Gen=GeneratorTypes[self.gan_type]
        config=self.config
        self.gen=Gen(config.batch_size,config.gen_hidden_size,config.gen_z_dim)

        with tf.variable_scope('Disc') as scope:
            self.D1 = Discriminator(self.data.X, config.disc_hidden_size)
            scope.reuse_variables()
            self.D2 = Discriminator(self.gen.X, config.disc_hidden_size)
            d_var = tf.contrib.framework.get_variables(scope)

        d_loss_real=tf.reduce_mean( sxe(self.D1,1) )
        d_loss_fake=tf.reduce_mean( sxe(self.D2,0) )
        self.loss_d =  d_loss_real  +  d_loss_fake
        self.loss_g = tf.reduce_mean( sxe(self.D2,1) )

        optimizer=tf.train.AdamOptimizer
        g_optimizer=optimizer(self.config.lr_gen)
        d_optimizer=optimizer(self.config.lr_disc)
        self.opt_d = d_optimizer.minimize(self.loss_d,var_list= d_var)
        self.opt_g = g_optimizer.minimize(self.loss_g,var_list= self.gen.tr_var,
                               global_step=self.gen.step)

        with tf.control_dependencies([self.inc_step]):
            self.train_op=tf.group(self.opt_d,self.opt_g)

    def build_summaries(self):
        d_summ=tf.summary.scalar(self.data.name+'_dloss',self.loss_d)
        g_summ=tf.summary.scalar(self.data.name+'_gloss',self.loss_g)
        self.summaries=[d_summ,g_summ]
        self.summary_op=tf.summary.merge(self.summaries)
        self.tf_scatter=tf.placeholder(tf.uint8,[3,480,640,3])
        scatter_name='scatter_D'+self.data.name+'_G'+self.gen.name
        self.g_scatter_summary=tf.summary.image(scatter_name,self.tf_scatter,max_outputs=3)
        self.summary_writer=tf.summary.FileWriter(self.model_dir)

    def record_losses(self,sess):
        step, sum_loss_g, sum_loss_d = summary_losses(sess,self)
        self.summary_writer.add_summary(sum_loss_g,step)
        self.summary_writer.add_summary(sum_loss_d,step)
        self.summary_writer.flush()

    def record_tvd(self,sess):
        step,tvd,mvd = calc_tvd(sess,self.gen,self.data)
        self.log_tvd(step,tvd,mvd)
        summ_tvd=make_summary(self.data.name+'_tvd',tvd)
        summ_mvd=make_summary(self.data.name+'_mvd',mvd)
        self.summary_writer.add_summary(summ_tvd,step)
        self.summary_writer.add_summary(summ_mvd,step)
        self.summary_writer.flush()
    def record_scatter(self,sess):
        Xg=sess.run(self.gen.X,{self.gen.N:5000})
        X1,X2,X3=np.split(Xg,3,axis=1)
        x1x2,x1x3,x2x3 = summary_scatterplots(X1,X2,X3)
        step,Pg_summ=sess.run([self.step,self.g_scatter_summary],{self.tf_scatter:np.concatenate([x1x2,x1x3,x2x3])})
        self.summary_writer.add_summary(Pg_summ,step)
        self.summary_writer.flush()

#        if self.config.save_pdfs:
#            self.save_np_scatter(step,X1,X3)

#Maybe it's the supervisor creating the segfault??
#Try just one model at a time

#   #will cause segfault ;)
#    def save_np_scatter(self,step,x,y,save_dir=None,ext='.pdf'):
#        '''
#        This is a convenience that just saves the image as a pdf in addition to putting it on
#        tensorboard. only does x1x3 because that's what I needed at the moment
#
#        sorry I wrote this really quickly
#        TODO: make less bad.
#        '''
#        plt.scatter(x,y)
#        plt.title('X1X3')
#        plt.xlabel('X1')
#        plt.ylabel('X3')
#        plt.xlim([0,1])
#        plt.ylim([0,1])
#
#        scatter_dir=os.path.join(self.model_dir,'scatter')
#
#        save_dir=save_dir or scatter_dir
#        if not os.path.exists(save_dir):
#            os.mkdir(save_dir)
#
#        save_name=os.path.join(save_dir,'{}_scatter_x1x3_{}_{}'+ext)
#        save_path=save_name.format(step,self.config.data_type,self.gan_type)
#
#        plt.savefig(save_path)


    def prepare_model_dir(self):
        self.model_dir=os.path.join(self.parent_dir,self.gan_type)
        if not os.path.exists(self.model_dir):
            os.mkdir(self.model_dir)
        print('GAN Model directory is ',self.model_dir)
    def prepare_logger(self):
        self.logger=logging.getLogger(self.gan_type)
        pth=os.path.join(self.model_dir,'tvd.csv')
        file_handler=logging.FileHandler(pth)
        self.logger.addHandler(file_handler)
        self.logger.setLevel(logging.INFO)
        self.logger.info('iter tvd mvd')
    def log_tvd(self,step,tvd,mvd):
        log_str=' '.join([str(step),str(tvd),str(mvd)])
        self.logger.info(log_str)


class Trainer(object):
    def __init__(self,config,data_type):
        self.config=config
        self.data_type=data_type
        self.prepare_model_dir()


        #with tf.variable_scope('trainer'):#commented to get summaries on same plot
        self.step=tf.Variable(0,'step')
        self.inc_step=tf.assign(self.step,self.step+1)
        self.build_model()

        self.summary_writer=tf.summary.FileWriter(self.model_dir)

        self.saver=tf.train.Saver()

        #sv = tf.train.Supervisor(
        #                        logdir=self.save_model_dir,
        #                        is_chief=True,
        #                        saver=self.saver,
        #                        summary_op=None,
        #                        summary_writer=self.summary_writer,
        #                        save_model_secs=300,
        #                        global_step=self.step,
        #                        ready_for_local_init_op=None
        #                        )

        gpu_options = tf.GPUOptions(allow_growth=True,
                                  per_process_gpu_memory_fraction=0.333)
        sess_config = tf.ConfigProto(allow_soft_placement=True,
                                    gpu_options=gpu_options)
        #self.sess = sv.prepare_or_wait_for_session(config=sess_config)
        self.sess = tf.Session(config=sess_config)


        init=tf.global_variables_initializer()
        self.sess.run(init)

        #if load_path, replace initialized values
        if self.config.load_path:
            print(" [*] Attempting to restore {}".format(self.config.load_path))
            self.saver.restore(self.sess,self.config.load_path)

            #print(" [*] Attempting to restore {}".format(ckpt))
            #self.saver.restore(self.sess,ckpt)
            #print(" [*] Success to read {}".format(ckpt))


        if not self.config.load_path:
            #once data scatterplot (doesn't change during training)
            self.data_scatterplot()


    def data_scatterplot(self):
        Xd=self.sess.run(self.data.X,{self.data.N:5000})
        X1,X2,X3=np.split(Xd,3,axis=1)
        x1x2,x1x3,x2x3 = summary_scatterplots(X1,X2,X3)
        step,Pg_summ=self.sess.run([self.step,self.d_scatter_summary],{self.tf_scatter:np.concatenate([x1x2,x1x3,x2x3])})
        self.summary_writer.add_summary(Pg_summ,step)
        self.summary_writer.flush()


    def build_model(self):
        self.data=DataTypes[self.data_type](self.config.batch_size)

        self.gans=[GAN(self.config,n,self.data,self.model_dir) for n in GeneratorTypes.keys()]

        with tf.control_dependencies([self.inc_step]):
            self.train_op=tf.group(*[gan.train_op for gan in self.gans])
            #self.train_op=tf.group(gan.train_op for gan in self.gans.values())

        #Used for generating image summaries of scatterplots
        self.tf_scatter=tf.placeholder(tf.uint8,[3,480,640,3])
        self.d_scatter_summary=tf.summary.image('scatter_Data_'+self.data_type,self.tf_scatter,max_outputs=3)


    def train(self):
        self.train_timer   =Timer()
        self.losses_timer  =Timer()
        self.tvd_timer     =Timer()
        self.scatter_timer =Timer()

        self.log_step=50
        self.max_step=50001
        #self.max_step=501
        for step in trange(self.max_step):

            if step % self.log_step == 0:
                for gan in self.gans:
                    self.losses_timer.on()
                    gan.record_losses(self.sess)
                    self.losses_timer.off()

                    self.tvd_timer.on()
                    gan.record_tvd(self.sess)
                    self.tvd_timer.off()

            if step % (10*self.log_step) == 0:
                for gan in self.gans:
                    self.scatter_timer.on()
                    gan.record_scatter(self.sess)

                    #DEBUG: reassure me nothing changes during optimization
                    #self.data_scatterplot()

                    self.scatter_timer.off()

            if step % (5000) == 0:
                self.saver.save(self.sess,self.save_model_name,step)

            self.train_timer.on()
            self.sess.run(self.train_op)
            self.train_timer.off()


        print("Timers:")
        print(self.train_timer)
        print(self.losses_timer)
        print(self.tvd_timer)
        print(self.scatter_timer)


    def prepare_model_dir(self):
        if self.config.load_path:
            self.model_dir=self.config.load_path
        else:
            pth=datetime.now().strftime("%m%d_%H%M%S")+'_'+self.data_type
            self.model_dir=os.path.join(self.config.model_dir,pth)


        if not os.path.exists(self.model_dir):
            os.mkdir(self.model_dir)
        print('Model directory is ',self.model_dir)

        self.save_model_dir=os.path.join(self.model_dir,'checkpoints')
        if not os.path.exists(self.save_model_dir):
            os.mkdir(self.save_model_dir)
        self.save_model_name=os.path.join(self.save_model_dir,'Model')


        param_path = os.path.join(self.model_dir, "params.json")
        print("[*] MODEL dir: %s" % self.model_dir)
        print("[*] PARAM path: %s" % param_path)
        with open(param_path, 'w') as fp:
            json.dump(self.config.__dict__, fp, indent=4, sort_keys=True)

        config=self.config
        if config.is_train and not config.load_path:
            config.log_code_dir=os.path.join(self.model_dir,'code')
            for path in [self.model_dir, config.log_code_dir]:
                if not os.path.exists(path):
                    os.makedirs(path)

            #Copy python code in directory into model_dir/code for future reference:
            code_dir=os.path.dirname(os.path.realpath(sys.argv[0]))
            model_files = [f for f in listdir(code_dir) if isfile(join(code_dir, f))]
            for f in model_files:
                if f.endswith('.py'):
                    shutil.copy2(f,config.log_code_dir)


================================================
FILE: synthetic/utils.py
================================================
from __future__ import print_function
import tensorflow as tf
import os
from os import listdir
from os.path import isfile, join
from skimage import io
import shutil
import sys
import math
import time
import json
import logging
import numpy as np
from PIL import Image
from datetime import datetime
from tensorflow.core.framework import summary_pb2
import matplotlib.pyplot as plt

def make_summary(name, val):
    return summary_pb2.Summary(value=[summary_pb2.Summary.Value(tag=name, simple_value=val)])

def summary_losses(sess,model,N=1000):
    step,loss_g,loss_d=sess.run([model.step,model.loss_g,model.loss_d],{model.data.N:N,model.gen.N:N})
    lgsum=make_summary(model.data.name+'_gloss',loss_g)
    ldsum=make_summary(model.data.name+'_dloss',loss_d)
    return step,lgsum, ldsum

def calc_tvd(sess,Generator,Data,N=50000,nbins=10):
    Xd=sess.run(Data.X,{Data.N:N})
    step,Xg=sess.run([Generator.step,Generator.X],{Generator.N:N})

    p_gen,_ = np.histogramdd(Xg,bins=nbins,range=[[0,1],[0,1],[0,1]],normed=True)
    p_dat,_ = np.histogramdd(Xd,bins=nbins,range=[[0,1],[0,1],[0,1]],normed=True)
    p_gen/=nbins**3
    p_dat/=nbins**3
    tvd=0.5*np.sum(np.abs( p_gen-p_dat ))
    mvd=np.max(np.abs( p_gen-p_dat ))

    return step,tvd, mvd

    s_tvd=make_summary(Data.name+'_tvd',tvd)
    s_mvd=make_summary(Data.name+'_mvd',mvd)

    return step,s_tvd,s_mvd
    #return make_summary('tvd/'+Generator.name,tvd)


def summary_stats(name,tensor,hist=False):
    ave=tf.reduce_mean(tensor)
    std=tf.sqrt(tf.reduce_mean(tf.square(ave-tensor)))
    tf.summary.scalar(name+'_ave',ave)
    tf.summary.scalar(name+'_std',std)
    if hist:
        tf.summary.histogram(name+'_hist',tensor)

def summary_scatterplots(X1,X2,X3):
    with tf.name_scope('scatter'):
        img1=summary_scatter2d(X1,X2,'X1X2',xlabel='X1',ylabel='X2')
        img2=summary_scatter2d(X1,X3,'X1X3',xlabel='X1',ylabel='X3')
        img3=summary_scatter2d(X2,X3,'X2X3',xlabel='X2',ylabel='X3')
        plt.close()
    return img1,img2,img3


def summary_scatter2d(x,y,title='2dscatterplot',xlabel=None,ylabel=None):
    fig=scatter2d(x,y,title,xlabel=xlabel,ylabel=ylabel)

    fig.canvas.draw()
    rgb=fig.canvas.tostring_rgb()
    buf=np.fromstring(rgb,dtype=np.uint8)

    w,h = fig.canvas.get_width_height()
    img=buf.reshape(1,h,w,3)
    #summary=tf.summary.image(title,img)
    plt.close(fig)
    #fig.clf()
    return img

def scatter2d(x,y,title='2dscatterplot',xlabel=None,ylabel=None):
    fig=plt.figure()
    plt.scatter(x,y)
    plt.title(title)
    if xlabel:
        plt.xlabel(xlabel)
    if ylabel:
        plt.ylabel(ylabel)

    if not 0<=np.min(x)<=np.max(x)<=1:
        raise ValueError('summary_scatter2d title:',title,' input x exceeded [0,1] range.\
                         min:',np.min(x),' max:',np.max(x))
    if not 0<=np.min(y)<=np.max(y)<=1:
        raise ValueError('summary_scatter2d title:',title,' input y exceeded [0,1] range.\
                         min:',np.min(y),' max:',np.max(y))

    plt.xlim([0,1])
    plt.ylim([0,1])
    return fig


def prepare_dirs_and_logger(config):
    formatter = logging.Formatter("%(asctime)s:%(levelname)s::%(message)s")
    logger = logging.getLogger()

    for hdlr in logger.handlers:
        logger.removeHandler(hdlr)

    handler = logging.StreamHandler()
    handler.setFormatter(formatter)

    logger.addHandler(handler)

    if config.load_path:
        if config.load_path.startswith(config.log_dir):
            config.model_dir = config.load_path
        else:
            if config.load_path.startswith(config.dataset):
                config.model_name = config.load_path
            else:
                config.model_name = "{}_{}".format(config.dataset, config.load_path)
    else:
        config.model_name = "{}_{}".format(config.dataset, get_time())

    if not hasattr(config, 'model_dir'):
        config.model_dir = os.path.join(config.log_dir, config.model_name)
    config.data_path = os.path.join(config.data_dir, config.dataset)

    if config.is_train:
        config.log_code_dir=os.path.join(config.model_dir,'code')
        for path in [config.log_dir, config.data_dir,
                     config.model_dir, config.log_code_dir]:
            if not os.path.exists(path):
                os.makedirs(path)

        #Copy python code in directory into model_dir/code for future reference:
        code_dir=os.path.dirname(os.path.realpath(sys.argv[0]))
        model_files = [f for f in listdir(code_dir) if isfile(join(code_dir, f))]
        for f in model_files:
            if f.endswith('.py'):
                shutil.copy2(f,config.log_code_dir)

def get_time():
    return datetime.now().strftime("%m%d_%H%M%S")

def save_config(config):
    param_path = os.path.join(config.model_dir, "params.json")

    print("[*] MODEL dir: %s" % config.model_dir)
    print("[*] PARAM path: %s" % param_path)

    with open(param_path, 'w') as fp:
        json.dump(config.__dict__, fp, indent=4, sort_keys=True)


class Timer(object):
    def __init__(self):
        self.total_section_time=0.
        self.iter=0
    def on(self):
        self.t0=time.time()
    def off(self):
        self.total_section_time+=time.time()-self.t0
        self.iter+=1
    def __str__(self):
        n_min=self.total_section_time/60.
        return '%.2fmin'%n_min


================================================
FILE: tboard.py
================================================
import os
import sys

from subprocess import call

def file2number(fname):
    nums=[s for s in fname.split('_') if s.isdigit()]
    if len(nums)==0:
        nums=['0']
    number=int(''.join(nums))
    return number

if __name__=='__main__':
    root='./logs'

    logs=os.listdir(root)
    logs.sort(key=lambda x:file2number(x))


    logdir=os.path.join(root,logs[-1])
    print 'running tensorboard on logdir:',logdir

    call(['tensorboard', '--logdir',logdir])


================================================
FILE: trainer.py
================================================
from __future__ import print_function
import numpy as np
import tensorflow as tf
from causal_controller.CausalController import CausalController
from tqdm import trange
import os
import pandas as pd

from utils import make_summary,distribute_input_data,get_available_gpus
from utils import save_image

from data_loader import DataLoader
from figure_scripts.pairwise import crosstab

class Trainer(object):

    def __init__(self, config, cc_config, model_config=None):
        self.config=config
        self.cc_config=cc_config
        self.model_dir = config.model_dir
        self.cc_config.model_dir=config.model_dir

        self.model_config=model_config
        if self.model_config:
            self.model_config.model_dir=config.model_dir

        self.save_model_dir=os.path.join(self.model_dir,'checkpoints')
        if not os.path.exists(self.save_model_dir):
            os.mkdir(self.save_model_dir)

        self.summary_dir=os.path.join(self.model_dir,'summaries')
        if not os.path.exists(self.summary_dir):
            os.mkdir(self.summary_dir)

        self.load_path = config.load_path
        self.use_gpu = config.use_gpu

        #This tensor controls batch_size for all models
        #Not expected to change during training, but during testing it can be
        #helpful to change it

        self.batch_size=tf.placeholder_with_default(self.config.batch_size,[],name='batch_size')

        loader_batch_size=config.num_devices*config.batch_size

        #Always need to build CC
        print('setting up CausalController')
        cc_batch_size=config.num_devices*self.batch_size#Tensor/placeholder
        self.cc=CausalController(cc_batch_size,cc_config)
        self.step=self.cc.step

        #Data
        print('setting up data')
        self.data=DataLoader(self.cc.label_names,config)

        if self.cc_config.is_pretrain or self.config.build_pretrain:
            print('setup pretrain')
            #queue system to feed labels quickly. This does not queue images
            label_queue= self.data.get_label_queue(loader_batch_size)
            self.cc.build_pretrain(label_queue)

        #Build Model
        if self.model_config:
            #Will build both gen and discrim
            self.model=self.config.Model(self.batch_size,self.model_config)

            #Trainer step is defined as cc.step+model.step
            #e.g. 10k iter pretrain and 100k iter image model
            #will have image summaries at 100k but trainer model saved at Model-110k
            self.step+=self.model.step

            # This queue holds (image,label) pairs, and is used for training conditional GANs
            data_queue=self.data.get_data_queue(loader_batch_size)

            self.real_data_by_gpu = distribute_input_data(data_queue,config.num_gpu)
            self.fake_data_by_gpu = distribute_input_data(self.cc.label_dict,config.num_gpu)

            with tf.variable_scope('tower'):
                for gpu in get_available_gpus():
                    print('using device:',gpu)

                    real_data=self.real_data_by_gpu[gpu]
                    fake_data=self.fake_data_by_gpu[gpu]
                    tower=gpu.replace('/','').replace(':','_')

                    with tf.device(gpu),tf.name_scope(tower):
                        #Build num_gpu copies of graph: inputs->gradient
                        #Updates self.tower_dict
                        self.model(real_data,fake_data)

                    #allow future gpu to use same variables
                    tf.get_variable_scope().reuse_variables()

            if self.model_config.is_train or self.config.build_train:
                self.model.build_train_op()
                self.model.build_summary_op()

        else:
            print('Image model not built')

        self.saver = tf.train.Saver(keep_checkpoint_every_n_hours=2)
        self.summary_writer = tf.summary.FileWriter(self.summary_dir)

        print('trainer.model_dir:',self.model_dir)
        gpu_options = tf.GPUOptions(allow_growth=True,
                                  per_process_gpu_memory_fraction=0.333)
        sess_config = tf.ConfigProto(allow_soft_placement=True,
                                    gpu_options=gpu_options)

        sv = tf.train.Supervisor(
                                logdir=self.save_model_dir,
                                is_chief=True,
                                saver=self.saver,
                                summary_op=None,
                                summary_writer=self.summary_writer,
                                save_model_secs=300,
                                global_step=self.step,
                                ready_for_local_init_op=None
                                )
        self.sess = sv.prepare_or_wait_for_session(config=sess_config)

        if cc_config.pt_load_path:
            print('Attempting to load pretrain model:',cc_config.pt_load_path)
            self.cc.load(self.sess,cc_config.pt_load_path)

            print('Check tvd after restore')
            info=crosstab(self,report_tvd=True)
            print('tvd after load:',info['tvd'])

            #save copy of cc model in new dir
            cc_step=self.sess.run(self.cc.step)
            self.cc.saver.save(self.sess,self.cc.save_model_name,cc_step)

        if config.load_path:#Declare loading point
            pnt_str='Loaded variables at ccStep:{}'
            cc_step=self.sess.run(self.cc.step)
            pnt_str=pnt_str.format(cc_step)
            print('pntstr',pnt_str)
            if self.model_config:
                pnt_str+=' imagemodelStep:{}'
                model_step=self.sess.run
                pnt_str=pnt_str.format(model_step)
            print(pnt_str)

        #PREPARE training:
        #TODO save as Variables so they are restored to same values when load model
        fixed_batch_size=256 #get this many fixed z values

        self.fetch_fixed_z={n.z:n.z for n in self.cc.nodes}
        if model_config:
            self.fetch_fixed_z[self.model.z_gen]=self.model.z_gen

        #feed_dict that ensures constant inputs
        #add feed_fixed_z[self.cc.Male.label]=1*ones() to intervene
        self.feed_fixed_z=self.sess.run(self.fetch_fixed_z,{self.batch_size:fixed_batch_size})

    def pretrain_loop(self,num_iter=None):
        '''
        num_iter : is the number of *additional* iterations to do
        baring one of the quit conditions (the model may already be
        trained for some number of iterations). Defaults to
        cc_config.pretrain_iter.

        '''
        #TODO: potentially should be moved into CausalController for consistency

        num_iter = num_iter or self.cc.config.pretrain_iter

        if hasattr(self,'model'):
            model_step=self.sess.run(self.model.step)
            assert model_step==0,'if pretraining, model should not be trained already'

        cc_step=self.sess.run(self.cc.step)
        if cc_step>0:
            print('Resuming training of already optimized CC model at\
                  step:',cc_step)

        label_stats=crosstab(self,report_tvd=True)

        def break_pretrain(label_stats,counter):
            c1=counter>=self.cc.config.min_pretrain_iter
            c2= (label_stats['tvd']<self.cc.config.min_tvd)
            return (c1 and c2)

        for counter in trange(cc_step,cc_step+num_iter):
            #Check for early exit
            if counter %(10*self.cc.config.log_step)==0:
                label_stats=crosstab(self,report_tvd=True)
                print('ptstep:',counter,'  TVD:',label_stats['tvd'])
                if break_pretrain(label_stats,counter):
                    print('Completed Pretrain by TVD Qualification')
                    break

            #Optimize critic
            self.cc.critic_update(self.sess)

            #one iter causal controller
            fetch_dict = {
                "pretrain_op": self.cc.train_op,
                'cc_step':self.cc.step,
                'step':self.step,
            }

            #update what to run
            if counter % self.cc.config.log_step == 0:
                fetch_dict.update({
                    "summary": self.cc.summary_op,
                    "c_loss": self.cc.c_loss,
                    "dcc_loss": self.cc.dcc_loss,
                })
            result = self.sess.run(fetch_dict)

            #update summaries
            if counter % self.cc.config.log_step == 0:
                if counter %(10*self.cc.config.log_step)==0:
                    sum_tvd=make_summary('misc/tvd', label_stats['tvd'])
                    self.summary_writer.add_summary(sum_tvd,result['cc_step'])

                self.summary_writer.add_summary(result['summary'],result['cc_step'])
                self.summary_writer.flush()

                c_loss = result['c_loss']
                dcc_loss = result['dcc_loss']
                print("[{}/{}] Loss_C: {:.6f} Loss_DCC: {:.6f}".\
                      format(counter, cc_step+ num_iter, c_loss, dcc_loss))

            if counter %(10*self.cc.config.log_step)==0:
                self.cc.saver.save(self.sess,self.cc.save_model_name,result['cc_step'])

        else:
            label_stats=crosstab(self,report_tvd=True)
            self.cc.saver.save(self.sess,self.cc.save_model_name,self.cc.step)
            print('Completed Pretrain by Exhausting all Pretrain Steps!')

        print('step:',result['cc_step'],'  TVD:',label_stats['tvd'])


    def train_loop(self,num_iter=None):
        '''
        This is a function for handling the training of either CausalBEGAN or
        CausalGAN models. The python function Model.train_step() is called
        num_iter times and some general image save features: intervening,
        conditioning, etc are done here too.
        '''
        num_iter=num_iter or self.model_config.num_iter

        #Train loop
        print('Entering train loop..')
        for counter in trange(num_iter):

            self.model.train_step(self.sess,counter)

            #scalar and histogram summaries
            if counter % self.config.log_step == 0:
                step,summ=self.sess.run([self.model.step,self.model.summary_op])
                self.summary_writer.add_summary(summ,step)
                self.summary_writer.flush()

            #expensive summaries
            if counter % (self.config.log_step * 50) == 0:
                self.causal_sampling([8,16])
                self.label_interpolation()
                self.sample_diversity()

            #more rare events
            if counter % (self.config.log_step * 100) == 0:
                self.causal_sampling([2,10])

    ##Wrapper methods
    def sample_label(self, cond_dict=None, do_dict=None,N=None):
        return self.cc.sample_label(self.sess,cond_dict=cond_dict,do_dict=do_dict,N=N)
    ##

    ##Sampling and figure methods
    def label_interpolation(self,inputs=None,save_dir=None,ext='.pdf'):
        '''
        Holding all other inputs the same, move a causal controller
        labels between 0 and 1. Recalculate the downstream effects to capture the causal effect.

        For each label, this makes an 8x8 image with each row being
        an instance of z_fixed with varying label
        '''

        interpolation_dir=os.path.join(self.model_dir,'label_interpolation')
        save_dir=save_dir or interpolation_dir
        if not os.path.exists(save_dir):
            os.mkdir(save_dir)

        inputs=inputs or {}

        #use the first 8 values
        #contrasting np.repeat and np.tile to get all combinations
        fixed_z=inputs or {k:np.repeat(v[:8],8,axis=0) for k,v in self.feed_fixed_z.items()}
        setval=np.tile(np.linspace(0,1,8),8).reshape([64,1])

        fixed_z.update({self.batch_size:64})
        save_name='{}/{}_G_interp_{}'+ext

        #make 8x8 image
        for node in self.cc.nodes:

            fd=fixed_z.copy()
            fd[node.label]=setval
            images,step=self.sess.run([self.model.G,self.model.step],fd)
            interp_path=save_name.format(save_dir,step,node.name)
            save_image(images,interp_path,nrow=8)

        out_str="[*] Interpolation Samples saved: "+save_name
        print(save_name.format(save_dir,step,'*'))

    def causal_sampling(self, img_shape ,ext='.pdf'):
        '''
        sampling new noise inputs each time, draw samples from
        interventional distributions.
        Recalculate downstream effects given a label value

        img_shape must have rows divisible by 2

        This function implements the following three sampling techniques: 
        1) Images where 
            Top half is sampled from the intervention do(label=1)
            Bottom half is sampled from the intervention do(label=0)
        2) Images where
            Top half is sampled from the intervention do(label=1/0)
            Bottom half is sampled conditioned on |label = 1/0
        3) Image where 
            Top half is sampled conditioned on |label = 1
            Bottom half is sampled conditioned on |label = 0
        '''

        assert len(img_shape)==2,'2d shape for output'
        assert img_shape[0]%2==0,'should have equal top and bot half'

        shape_str='_'+'x'.join(map(str,img_shape))

        #sample given(Label=1/0)
        conditioning_dir=os.path.join(self.model_dir,'label_conditioning')
        if not os.path.exists(conditioning_dir):
            os.mkdir(conditioning_dir)

        #sample do(Label=1/0)
        intervention_dir=os.path.join(self.model_dir,'label_intervention')
        if not os.path.exists(intervention_dir):
            os.mkdir(intervention_dir)

        #sample do(Label=1)/given(Label=1)
        #sample do(Label=0)/given(Label=0)
        intv_v_conditioning_dir=os.path.join(self.model_dir,'label_intv_v_conditioning')
        if not os.path.exists(intv_v_conditioning_dir):
            os.mkdir(intv_v_conditioning_dir)

        save_name_cond =os.path.join(conditioning_dir,'{}_condition_{}'+shape_str+ext)
        save_name_intv =os.path.join(intervention_dir,'{}_interv_{}'+shape_str+ext)
        save_name_intvcond=os.path.join(intv_v_conditioning_dir,'{}_intvcond_{}={}'+shape_str+ext)

        half_shape=[img_shape[0]//2, img_shape[1]]
        N=np.prod(half_shape)

        for name in self.cc.node_names:
            #First sample labels (two step more efficient)
            #ex:{'Male':1}
            c0=self.sample_label(cond_dict={name:0},N=N)
            c1=self.sample_label(cond_dict={name:1},N=N)
            d0=self.sample_label(do_dict=  {name:0},N=N)
            d1=self.sample_label(do_dict=  {name:1},N=N)

            feed_c0={self.cc.label_dict[k]:v for k,v in c0.iteritems()}
            feed_c1={self.cc.label_dict[k]:v for k,v in c1.iteritems()}
            feed_d0={self.cc.label_dict[k]:v for k,v in d0.iteritems()}
            feed_d1={self.cc.label_dict[k]:v for k,v in d1.iteritems()}

            feed_c0[self.batch_size]=N
            feed_c1[self.batch_size]=N
            feed_d0[self.batch_size]=N
            feed_d1[self.batch_size]=N

            step=self.sess.run(self.model.step)
            c0_images=self.sess.run(self.model.G,feed_c0)
            c1_images=self.sess.run(self.model.G,feed_c1)
            d0_images=self.sess.run(self.model.G,feed_d0)
            d1_images=self.sess.run(self.model.G,feed_d1)

            save_path_cond      = save_name_cond.format(step,name)
            save_path_intv      = save_name_intv.format(step,name)
            save_path_intvcond0 = save_name_intvcond.format(step,name,0)
            save_path_intvcond1 = save_name_intvcond.format(step,name,1)

            #saveimage fills row by row from top left
            save_image(np.concatenate([c1_images,c0_images]),save_path_cond,nrow=img_shape[0])
            save_image(np.concatenate([d1_images,d0_images]),save_path_intv,nrow=img_shape[0])
            save_image(np.concatenate([d0_images,c0_images]),save_path_intvcond0,nrow=img_shape[0])
            save_image(np.concatenate([d1_images,c1_images]),save_path_intvcond1,nrow=img_shape[0])

        print("[*] Conditioning Samples saved: "+conditioning_dir)
        print("[*] Intervention Samples saved: "+intervention_dir)
        print("[*] Intervention vs Condition Samples saved: "+intv_v_conditioning_dir)


    def sample_diversity(self,save_dir=None,ext='.pdf'):
        '''
        This is to make a 16x16 image from fixed inputs
        to examine the image diversity over time
        '''
        #Make 16x16 image
        nrow=16
        diversity_dir=os.path.join(self.model_dir,'image_diversity')
        save_dir=save_dir or diversity_dir
        if not os.path.exists(save_dir):
            os.mkdir(save_dir)
        save_name=os.path.join(save_dir,'{}_G_diversity'+ext)

        feed_fixed={k:v[:256] for k,v in self.feed_fixed_z.items()}
        feed_fixed.update({self.batch_size:256})

        step,images = self.sess.run([self.model.step,self.model.G], feed_dict=feed_fixed)

        print('image shape',images.shape)

        save_path=save_name.format(step)
        save_image(images, save_path, nrow=nrow)
        print("[*] Diversity Sample saved: {}".format(save_path))


================================================
FILE: utils.py
================================================
from __future__ import print_function
import tensorflow as tf
from functools import partial
import os
from os import listdir
from os.path import isfile, join
import shutil
import sys
from glob import glob
import math
import json
import logging
import numpy as np
from PIL import Image
from datetime import datetime
from tensorflow.core.framework import summary_pb2


def make_summary(name, val):
    return summary_pb2.Summary(value=[summary_pb2.Summary.Value(tag=name, simple_value=val)])

def summary_stats(name,tensor,collections=None,hist=False):
    collections=collections or [tf.GraphKeys.SUMMARIES]
    ave=tf.reduce_mean(tensor)
    std=tf.sqrt(tf.reduce_mean(tf.square(ave-tensor)))
    tf.summary.scalar(name+'_ave',ave,collections)
    tf.summary.scalar(name+'_std',std,collections)
    if hist:
        tf.summary.histogram(name+'_hist',tensor,collections)


def prepare_dirs_and_logger(config):

    if config.load_path:
        strip_lp=config.load_path.strip('./')
        if strip_lp.startswith(config.log_dir):
            config.model_dir = config.load_path
        else:
            if config.load_path.startswith(config.dataset):
                config.model_name = config.load_path
            else:
                config.model_name = "{}_{}".format(config.dataset, config.load_path)
    else:#new model
        config.model_name = "{}_{}".format(config.dataset, get_time())
        if config.descrip:
            config.model_name+='_'+config.descrip


    if not hasattr(config, 'model_dir'):
        config.model_dir = os.path.join(config.log_dir, config.model_name)
    config.data_path = os.path.join(config.data_dir, config.dataset)


    if not config.load_path:
        config.log_code_dir=os.path.join(config.model_dir,'code')
        for path in [config.log_dir, config.data_dir,
                     config.model_dir]:
            if not os.path.exists(path):
                os.makedirs(path)

        #Copy python code in directory into model_dir/code for future reference:
        #All python files in this directory are copied.
        code_dir=os.path.dirname(os.path.realpath(sys.argv[0]))

        ##additionally, all python files in these directories are also copied. Also symlinks are copied. The idea is to allow easier model loading in the future
        allowed_dirs=['causal_controller','causal_began','causal_dcgan','figure_scripts']

        #ignore copy of all non-*.py except for these directories
        #If you make another folder you want copied, you have to add it here
        ignore_these=partial(ignore_except,allowed_dirs=allowed_dirs)
        shutil.copytree(code_dir,config.log_code_dir,symlinks=True,ignore=ignore_these)


#        model_files = [f for f in listdir(code_dir) if isfile(join(code_dir, f))]
#        for f in model_files:
#            if f.endswith('.py'):
#                shutil.copy2(f,config.log_code_dir)


def ignore_except(src,contents,allowed_dirs):
    files=filter(os.path.isfile,contents)
    dirs=filter(os.path.isdir,contents)
    ignored_files=[f for f in files if not f.endswith('.py')]
    ignored_dirs=[d for d in dirs if not d in allowed_dirs]
    return ignored_files+ignored_dirs

def get_time():
    return datetime.now().strftime("%m%d_%H%M%S")

def save_configs(config,cc_config,dcgan_config,began_config):
    model_dir=config.model_dir
    print("[*] MODEL dir: %s" % model_dir)
    save_config(config)
    save_config(cc_config,'cc_params.json',model_dir)
    save_config(dcgan_config,'dcgan_params.json',model_dir)
    save_config(began_config,'began_params.json',model_dir)


def save_config(config,name="params.json",where=None):
    where=where or config.model_dir
    param_path = os.path.join(where, name)

    print("[*] PARAM path: %s" % param_path)

    with open(param_path, 'w') as fp:
        json.dump(config.__dict__, fp, indent=4, sort_keys=True)

def get_available_gpus():
    from tensorflow.python.client import device_lib
    local_device_protos = device_lib.list_local_devices()
    return [x.name for x in local_device_protos if x.device_type=='GPU']

def distribute_input_data(data_loader,num_gpu):
    '''
    data_loader is a dictionary of tensors that are fed into our model

    This function takes that dictionary of n*batch_size dimension tensors
    and breaks it up into n dictionaries with the same key of tensors with
    dimension batch_size. One is given to each gpu
    '''
    if num_gpu==0:
        return {'/cpu:0':data_loader}

    gpus=get_available_gpus()
    if num_gpu > len(gpus):
        raise ValueError('number of gpus specified={}, more than gpus available={}'.format(num_gpu,len(gpus)))

    gpus=gpus[:num_gpu]

    data_by_gpu={g:{} for g in gpus}
    for key,value in data_loader.items():
        spl_vals=tf.split(value,num_gpu)
        for gpu,val in zip(gpus,spl_vals):
            data_by_gpu[gpu][key]=val

    return data_by_gpu


def rank(array):
    return len(array.shape)

def make_grid(tensor, nrow=8, padding=2,
              normalize=False, scale_each=False):
    """Code based on https://github.com/pytorch/vision/blob/master/torchvision/utils.py
    minor improvement, row/col was reversed"""
    nmaps = tensor.shape[0]
    ymaps = min(nrow, nmaps)
    xmaps = int(math.ceil(float(nmaps) / ymaps))
    height, width = int(tensor.shape[1] + padding), int(tensor.shape[2] + padding)
    grid = np.zeros([height * ymaps + 1 + padding // 2, width * xmaps + 1 + padding // 2, 3], dtype=np.uint8)
    k = 0
    for y in range(ymaps):
        for x in range(xmaps):
            if k >= nmaps:
                break
            h, h_width = y * height + 1 + padding // 2, height - padding
            w, w_width = x * width + 1 + padding // 2, width - padding

            grid[h:h+h_width, w:w+w_width] = tensor[k]
            k = k + 1
    return grid

def save_image(tensor, filename, nrow=8, padding=2,
               normalize=False, scale_each=False):
    ndarr = make_grid(tensor, nrow=nrow, padding=padding,
                            normalize=normalize, scale_each=scale_each)
    im = Image.fromarray(ndarr)
    im.save(filename)