Full Code of yaroslavvb/stuff for AI

Repository: yaroslavvb/stuff
Branch: master
Commit: a8024ead315a
Files: 266
Total size: 15.6 MB
Tokens: 4.1M
Symbols: 3079

Note: showing preview only (16,458K chars total); the dump below is truncated.

Directory structure:
gitextract_ycct44t1/

├── .gitignore
├── README.md
├── akaitsuki-slow/
│   ├── config.py
│   ├── feed_dict.pbtxt
│   ├── feed_dict.py
│   └── main.py
├── autotune/
│   ├── README.md
│   ├── autograd_lib.py
│   ├── autograd_lib_test.py
│   ├── autograd_test.py
│   ├── ciresan_bench.py
│   ├── curvature_test.py
│   ├── eval_conv2d_approx.py
│   ├── factored_test.py
│   ├── globals.py
│   ├── hessian_test.py
│   ├── linalg_bench.py
│   ├── linesearch_test_disabled.py
│   ├── lyapunov_test.py
│   ├── mnist_end2end_test.py
│   ├── plotting_test.py
│   ├── pytorch_benchmark.py
│   ├── scipy_benchmark.py
│   ├── svd_benchmark.py
│   ├── test/
│   │   ├── bad_sigmas.pt
│   │   ├── factored.pt
│   │   └── gesvd_crash.txt
│   ├── train_ciresan.py
│   ├── train_ciresan_cca.py
│   ├── train_ciresan_factored.py
│   ├── train_ciresan_new.py
│   ├── train_medium.py
│   ├── train_small.py
│   ├── train_small_xent.py
│   ├── train_small_xent_factored.py
│   ├── train_tiny.py
│   ├── train_tiny_xent.py
│   ├── util.py
│   └── util_test.py
├── aws-recipes.ipynb
├── aws-scratch.ipynb
├── benchmark_huggingface_predict.py
├── bin/
│   └── tfversion
├── clipping-profile.ipynb
├── cluster/
│   ├── .gitignore
│   ├── README.md
│   ├── async_adder.py
│   ├── aws.py
│   ├── benchmark_grpc_recv.py
│   ├── benchmarks/
│   │   ├── .gitignore
│   │   ├── LICENSE
│   │   ├── README.md
│   │   ├── bower_components/
│   │   │   ├── d3/
│   │   │   │   ├── .bower.json
│   │   │   │   ├── .gitattributes
│   │   │   │   ├── CONTRIBUTING.md
│   │   │   │   ├── LICENSE
│   │   │   │   ├── README.md
│   │   │   │   ├── bower.json
│   │   │   │   ├── d3.js
│   │   │   │   └── package.js
│   │   │   └── plottable/
│   │   │       ├── .bower.json
│   │   │       ├── bower.json
│   │   │       ├── plottable.css
│   │   │       ├── plottable.d.ts
│   │   │       └── plottable.js
│   │   ├── dashboard_app/
│   │   │   ├── app.yaml
│   │   │   ├── main.py
│   │   │   ├── main_test.py
│   │   │   ├── requirements.txt
│   │   │   ├── static/
│   │   │   │   ├── css/
│   │   │   │   │   └── style.css
│   │   │   │   └── js/
│   │   │   │       └── benchmark_latency_chart.js
│   │   │   └── templates/
│   │   │       ├── index.html
│   │   │       └── test.html
│   │   ├── index.html
│   │   ├── js/
│   │   │   ├── csv_benchmark_chart.js
│   │   │   └── latency_chart.js
│   │   ├── scripts/
│   │   │   ├── Dockerfile.tf_cnn_benchmarks
│   │   │   ├── benchmark_configs.yml
│   │   │   ├── tf_cnn_benchmarks/
│   │   │   │   ├── README.md
│   │   │   │   ├── benchmark_cnn.py
│   │   │   │   ├── benchmark_storage.py
│   │   │   │   ├── cbuild_benchmark_storage.py
│   │   │   │   ├── cnn_util.py
│   │   │   │   ├── convnet_builder.py
│   │   │   │   ├── datasets.py
│   │   │   │   ├── models/
│   │   │   │   │   ├── __init__.py
│   │   │   │   │   ├── alexnet_model.py
│   │   │   │   │   ├── densenet_model.py
│   │   │   │   │   ├── googlenet_model.py
│   │   │   │   │   ├── inception_model.py
│   │   │   │   │   ├── lenet_model.py
│   │   │   │   │   ├── model.py
│   │   │   │   │   ├── model_config.py
│   │   │   │   │   ├── overfeat_model.py
│   │   │   │   │   ├── resnet_model.py
│   │   │   │   │   ├── trivial_model.py
│   │   │   │   │   └── vgg_model.py
│   │   │   │   ├── preprocessing.py
│   │   │   │   ├── tf_cnn_benchmarks.py
│   │   │   │   └── variable_mgr.py
│   │   │   └── util/
│   │   │       ├── __init__.py
│   │   │       ├── benchmark_util.py
│   │   │       ├── benchmark_util_test.py
│   │   │       ├── convert_csv_to_json.py
│   │   │       └── convert_csv_to_json_test.py
│   │   ├── soumith_benchmarks.html
│   │   └── tools/
│   │       ├── k8s_tensorflow_lib.py
│   │       ├── k8s_tensorflow_test.py
│   │       ├── kubectl_util.py
│   │       ├── kubectl_util_test.py
│   │       └── run_distributed_benchmarks.py
│   ├── client_transfer_benchmark.py
│   ├── cloud-formation-example/
│   │   ├── README.md
│   │   ├── iam.yaml
│   │   ├── tensorflow.yaml
│   │   └── zone.sh
│   ├── connect.py
│   ├── delete_placement_groups.py
│   ├── fill_efs.py
│   ├── imagenet64/
│   │   ├── README.md
│   │   ├── aws.py
│   │   ├── launch.py
│   │   ├── requirements.txt
│   │   └── variable_mgr.py
│   ├── instance_info.py
│   ├── launch_async_adder.py
│   ├── launch_micro.py
│   ├── launch_ray.py
│   ├── launch_simple_tf.py
│   ├── local_distributed_benchmark.py
│   ├── myutil.py
│   ├── ray_add.py
│   ├── simple_distributed.py
│   ├── terminate_instances.py
│   ├── test_aws.py
│   ├── tf-tools/
│   │   ├── .gitignore
│   │   ├── benchmark/
│   │   │   ├── multi_gpu/
│   │   │   │   ├── advanced_tweaks_compare.sh
│   │   │   │   ├── image_classification_bench_tests.sh
│   │   │   │   ├── stats_monitor.sh
│   │   │   │   ├── test_runner.sh
│   │   │   │   └── unit_test_stats_monitor.sh
│   │   │   └── runner/
│   │   │       ├── cluster_aws.py
│   │   │       ├── command_builder.py
│   │   │       ├── configs/
│   │   │       │   └── aws/
│   │   │       │       ├── multi_server.yaml
│   │   │       │       └── yaroslav.yaml
│   │   │       ├── instance_info.py
│   │   │       ├── launch_experiment.py
│   │   │       ├── test_cluster_aws.py
│   │   │       ├── test_command_builder.py
│   │   │       └── util.py
│   │   └── install/
│   │       ├── aws_amzlinux.md
│   │       └── aws_ubuntu16_04.md
│   ├── tmux.py
│   └── upload_test.txt
├── conditional_backprop.py
├── configure_tf.sh
├── configure_tf_cpu.sh
├── danjar_peek.py
├── distributed/
│   ├── README.md
│   ├── benchmark_grpc_recv.py
│   └── client_transfer_benchmark.py
├── double_memory_bug.py
├── dynamic_stitch_gpu.py
├── dynamic_stitch_gpu_profile.pbtxt
├── eager_lbfgs/
│   ├── .ipynb_checkpoints/
│   │   └── performance-checkpoint.ipynb
│   ├── common_gd.py
│   ├── data/
│   │   ├── short_batch.csv
│   │   ├── short_eager_batch.csv
│   │   ├── short_eager_loss.csv
│   │   ├── short_eager_time.csv
│   │   ├── short_pytorch_loss.csv
│   │   └── short_pytorch_time.csv
│   ├── eager_lbfgs.py
│   ├── performance.ipynb
│   ├── pytorch_lbfgs.py
│   ├── run_experiment.py
│   ├── torch_lbfgs.lua
│   └── util.py
├── enqueue_many_test.py
├── enqueue_many_test_singlerun.py
├── ericyue-slowreader/
│   ├── benchmark-batch-noqueuerunners-timeline.json
│   ├── benchmark-batch-noqueuerunners.profile
│   ├── benchmark-batch-noqueuerunners.py
│   ├── benchmark-batch.py
│   ├── benchmark-reader.py
│   ├── benchmark-synthetic-batch.py
│   ├── benchmark-synthetic.py
│   ├── benchmark.py
│   ├── data.zlib
│   └── profile-batch.py
├── free_gpus.py
├── github_pyfunc_slowness.py
├── gpu-memory-transfer.ipynb
├── gpu_oom.py
├── graph_template.py
├── imagenet15-scratch.ipynb
├── input_benchmarks/
│   ├── convert_to_records.py
│   ├── fully_connected_feed.py
│   ├── fully_connected_preloaded_var.py
│   ├── fully_connected_reader.py
│   ├── timeline.feed.json
│   ├── timeline.reader.json
│   └── timeline.var.json
├── inverse_segfault.py
├── keras_autoencoder/
│   ├── keras_large.py
│   ├── util.py
│   └── weightnorm.py
├── khatri_rao_benchmark.py
├── lazy_dog.py
├── linalg-benchmark/
│   ├── README.md
│   ├── bad_matrix.py
│   ├── benchmark.py
│   ├── environment.yml
│   ├── get_cores_per_socket.py
│   ├── launch.py
│   ├── launch_tensorflow_svd_crash.py
│   ├── requirements.txt
│   ├── results.txt
│   └── tensorflow_svd_crash.py
├── line_search_example/
│   ├── data/
│   │   └── step_lengths_ada.csv
│   ├── line_search_example.py
│   └── util.py
├── linearize/
│   ├── linearize.py
│   ├── linearize_test.py
│   └── memory_util.py
├── matmul_benchmark.py
├── matmul_benchmark_seq.py
├── matmul_times/
│   ├── 1080-float16.csv
│   ├── 1080-float32.csv
│   ├── g3-float16.csv
│   ├── g3-float32.csv
│   ├── nvidia-p3-float16.csv
│   ├── nvidia-p3-float32.csv
│   ├── p2-float16.csv
│   └── p2-float32.csv
├── mavelin/
│   ├── machine1.py
│   └── machine3.py
├── memory tracking.ipynb
├── memory-probe-examples.ipynb
├── memory-release-check.ipynb
├── natural_gradient_multilayer.py
├── node-merge.ipynb
├── notebook_util.py
├── numpy_initializers/
│   ├── kfac_cifar.py
│   └── util.py
├── parallel_dequeue_test.py
├── phantomjs-tryout.ipynb
├── phantomjs-tryout.js
├── pytorch-hessian.ipynb
├── queue_mismatch.py
├── queues_talk/
│   └── queues.ipynb
├── resnet_8_simple.pbtxt
├── resnet_leak_report.py
├── resnet_leak_report2.py
├── resource_variable_test.py
├── rotations_comparison.py
├── saving memory by using functions.ipynb
├── simple_rewiring.ipynb
├── simple_train.py
├── svd_benchmark.py
├── svd_noconverge.py
├── svd_test.py
├── tf_initializer_bug_report.py
├── tiny_runs/
│   ├── qr_test.py
│   └── tiny_tf.py
└── whitening_util.py

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
/__pycache__
/.ipynb_checkpoints
*#
*~
/linalg-benchmark/.idea/linalg-benchmark.iml
/linalg-benchmark/.idea/misc.xml
/linalg-benchmark/.idea/modules.xml
/linalg-benchmark/.idea/vcs.xml
/linalg-benchmark/.idea/workspace.xml
/linalg-benchmark/.idea
.DS_Store
__pycache__


================================================
FILE: README.md
================================================
# stuff


================================================
FILE: akaitsuki-slow/config.py
================================================
import argparse
 
 
def str2bool(v):
    return v.lower() in ('y', 'yes', 't', 'true', '1')
 
 
def get_args():
    parser = argparse.ArgumentParser()
    parser.register('type', 'bool', str2bool)
 
    parser.add_argument('--random_seed',
                        type=int,
                        default=1013,
                        help='Random seed')
 
    parser.add_argument('--vocab_size',
                        type=int,
                        default=10000,
                        help='Vocabulary size')
 
    parser.add_argument('--embed_size',
                        type=int,
                        default=128,
                        help='Default embedding size if embedding_file is not given')
 
    parser.add_argument('--hidden_size',
                        type=int,
                        default=128,
                        help='Hidden size of RNN units')
 
    parser.add_argument('--num_labels',
                        type=int,
                        default=96,
                        help='num labels')
 
    parser.add_argument('--bidir',
                        type='bool',
                        default=True,
                        help='bidir: whether to use a bidirectional RNN')
 
    parser.add_argument('--num_layers',
                        type=int,
                        default=1,
                        help='Number of RNN layers')
 
    parser.add_argument('--rnn_type',
                        type=str,
                        default='gru',
                        help='RNN type: lstm or gru (default)')
 
    parser.add_argument('--batch_size',
                        type=int,
                        default=32,
                        help='Batch size')
 
    parser.add_argument('--dropout_rate',
                        type=float,
                        default=0.2,
                        help='Dropout rate')
 
    parser.add_argument('--optimizer',
                        type=str,
                        default='sgd',
                        help='Optimizer: sgd (default) or adam or rmsprop')
 
    parser.add_argument('--learning_rate', '-lr',
                        type=float,
                        default=0.1,
                        help='Learning rate for SGD')
 
    return parser.parse_args()
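An editorial aside (not part of the original file): `str2bool` above only maps a fixed set of strings to True; everything else, including 'no' and arbitrary garbage, silently becomes False. A minimal standalone sketch:

```python
# Hypothetical standalone copy of str2bool from config.py above.
def str2bool(v):
    return v.lower() in ('y', 'yes', 't', 'true', '1')

print(str2bool('Yes'))   # True  (case-insensitive match)
print(str2bool('no'))    # False (not in the accepted set)
print(str2bool('junk'))  # False (unrecognized values are silently False)
```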



================================================
FILE: akaitsuki-slow/feed_dict.pbtxt
================================================
step_stats {
  dev_stats {
    device: "/job:localhost/replica:0/task:0/cpu:0"
    node_stats {
      node_name: "_SOURCE"
      all_start_micros: 1489437692909096
      op_start_rel_micros: 2
      op_end_rel_micros: 3
      all_end_rel_micros: 7
      memory {
        allocator_name: "cpu"
      }
      timeline_label: "_SOURCE = NoOp()"
      scheduled_micros: 1489437692909081
    }
    node_stats {
      node_name: "mul/y"
      all_start_micros: 1489437692909111
      op_end_rel_micros: 1
      all_end_rel_micros: 11
      memory {
        allocator_name: "cpu"
      }
      output {
        tensor_description {
          dtype: DT_FLOAT
          shape {
          }
          allocation_description {
            requested_bytes: 4
            allocator_name: "cpu"
            ptr: 4588618432
          }
        }
      }
      timeline_label: "mul/y = Const()"
      scheduled_micros: 1489437692909103
    }
    node_stats {
      node_name: "mul"
      all_start_micros: 1489437692909144
      op_start_rel_micros: 2
      op_end_rel_micros: 12
      all_end_rel_micros: 16
      memory {
        allocator_name: "cpu"
        total_bytes: 4
      }
      output {
        tensor_description {
          dtype: DT_FLOAT
          shape {
          }
          allocation_description {
            requested_bytes: 4
            allocator_name: "cpu"
            ptr: 4937671424
          }
        }
      }
      timeline_label: "mul = Mul(_recv_Placeholder_0, mul/y)"
      scheduled_micros: 1489437692909140
    }
  }
}
partition_graphs {
  node {
    name: "_recv_Placeholder_0"
    op: "_Recv"
    device: "/job:localhost/replica:0/task:0/cpu:0"
    attr {
      key: "client_terminated"
      value {
        b: true
      }
    }
    attr {
      key: "recv_device"
      value {
        s: "/job:localhost/replica:0/task:0/cpu:0"
      }
    }
    attr {
      key: "send_device"
      value {
        s: "/job:localhost/replica:0/task:0/cpu:0"
      }
    }
    attr {
      key: "send_device_incarnation"
      value {
        i: -2824119418009608211
      }
    }
    attr {
      key: "tensor_name"
      value {
        s: "Placeholder:0"
      }
    }
    attr {
      key: "tensor_type"
      value {
        type: DT_FLOAT
      }
    }
  }
  node {
    name: "mul/y"
    op: "Const"
    device: "/job:localhost/replica:0/task:0/cpu:0"
    attr {
      key: "dtype"
      value {
        type: DT_FLOAT
      }
    }
    attr {
      key: "value"
      value {
        tensor {
          dtype: DT_FLOAT
          tensor_shape {
          }
          float_val: 2.0
        }
      }
    }
  }
  node {
    name: "mul"
    op: "Mul"
    input: "_recv_Placeholder_0"
    input: "mul/y"
    device: "/job:localhost/replica:0/task:0/cpu:0"
    attr {
      key: "T"
      value {
        type: DT_FLOAT
      }
    }
  }
  node {
    name: "_send_mul_0"
    op: "_Send"
    input: "mul"
    device: "/job:localhost/replica:0/task:0/cpu:0"
    attr {
      key: "T"
      value {
        type: DT_FLOAT
      }
    }
    attr {
      key: "client_terminated"
      value {
        b: true
      }
    }
    attr {
      key: "recv_device"
      value {
        s: "/job:localhost/replica:0/task:0/cpu:0"
      }
    }
    attr {
      key: "send_device"
      value {
        s: "/job:localhost/replica:0/task:0/cpu:0"
      }
    }
    attr {
      key: "send_device_incarnation"
      value {
        i: -2824119418009608211
      }
    }
    attr {
      key: "tensor_name"
      value {
        s: "mul:0"
      }
    }
  }
  versions {
    producer: 21
  }
}


================================================
FILE: akaitsuki-slow/feed_dict.py
================================================
import numpy as np
import tensorflow as tf
from tensorflow.python.client import timeline 
 

sess = tf.Session()
a = tf.placeholder(tf.float32)
b = a*2
c0 = sess.run([b], feed_dict={a:2.})

run_metadata = tf.RunMetadata()
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_options.output_partition_graphs=True

c0 = sess.run([b], feed_dict={a:2.}, options=run_options,
              run_metadata=run_metadata)
with open("feed_dict.pbtxt", "w") as f:
    f.write(str(run_metadata))


================================================
FILE: akaitsuki-slow/main.py
================================================
import logging
import time
import config
import numpy as np
import tensorflow as tf
from tensorflow.python.ops import array_ops
from tensorflow.python.client import timeline 
 
def retrieve_seq_length_op2(data):
    return tf.reduce_sum(tf.cast(tf.greater(data, tf.zeros_like(data)), tf.int32), 1)
 
 
def advanced_indexing_op(input, index):
    batch_size = tf.shape(input)[0]
    max_length = tf.shape(input)[1]
    dim_size = int(input.get_shape()[2])
    index = tf.range(0, batch_size) * max_length + (index - 1)
    flat = tf.reshape(input, [-1, dim_size])
    relevant = tf.gather(flat, index)
    return relevant
 
 
def bidirectional_dynamic_rnn(inputs, cell_fn, n_hidden, sequence_length=None, return_last=False, name='bidyrnn'):
    with tf.variable_scope(name):
        batch_size = array_ops.shape(inputs)[0]
 
        fw_cell = cell_fn(num_units=n_hidden)
        bw_cell = cell_fn(num_units=n_hidden)
 
        fw_initial_state = fw_cell.zero_state(batch_size, dtype=tf.float32)
        bw_initial_state = bw_cell.zero_state(batch_size, dtype=tf.float32)
 
        outputs, _ = tf.nn.bidirectional_dynamic_rnn(
            cell_fw=fw_cell,
            cell_bw=bw_cell,
            inputs=inputs,
            sequence_length=sequence_length,
            initial_state_fw=fw_initial_state,
            initial_state_bw=bw_initial_state,
        )
        outputs = tf.concat(outputs, 2)
        if return_last:
            outputs = advanced_indexing_op(outputs, sequence_length)
        return outputs
 
 
def BilinearAttention(inputs, n_hidden, mask=None,
                      initializer=tf.random_uniform_initializer(-0.01, 0.01)):
    W = tf.get_variable('W', shape=(n_hidden, n_hidden),
                        initializer=initializer)
    M = tf.matmul(inputs[1], W)
    M = tf.expand_dims(M, axis=1)
    alpha = tf.nn.softmax(tf.reduce_sum(inputs[0] * M, axis=2))
    if mask is not None:
        alpha *= mask
        alpha /= tf.reduce_sum(alpha, axis=1, keep_dims=True)
    alpha = tf.expand_dims(alpha, axis=2)
    outputs = tf.reduce_sum(inputs[0] * alpha, axis=1)
 
    return outputs
 
 
def inference(x1, x2, mask1, mask2, l, y,
              args, embeddings, reuse=False, training=False):
    with tf.variable_scope('model', reuse=reuse):
        embed = tf.get_variable('embed', shape=embeddings.shape,
                                initializer=tf.constant_initializer(embeddings))
        embed1 = tf.nn.embedding_lookup(embed, x1)
        embed2 = tf.nn.embedding_lookup(embed, x2)
 
        keep = 1.0 - args.dropout_rate if training else 1.0
        dropout1 = tf.nn.dropout(embed1, keep)
        dropout2 = tf.nn.dropout(embed2, keep)
 
        rnn_cell = {'gru': tf.contrib.rnn.GRUCell,
                    'lstm': tf.contrib.rnn.LSTMCell}[args.rnn_type]
        rnn1 = bidirectional_dynamic_rnn(dropout1, cell_fn=rnn_cell,
                                         n_hidden=args.hidden_size,
                                         sequence_length=retrieve_seq_length_op2(mask1),
                                         name='rnn1')
        rnn2 = bidirectional_dynamic_rnn(dropout2, cell_fn=rnn_cell,
                                         n_hidden=args.hidden_size,
                                         sequence_length=retrieve_seq_length_op2(mask2),
                                         return_last=True,
                                         name='rnn2')
 
        args.rnn_output_size = 2 * args.hidden_size
        att = BilinearAttention([rnn1, rnn2], args.rnn_output_size, mask1)
 
        z = tf.layers.dense(att, units=args.num_labels,
                            kernel_initializer=tf.random_uniform_initializer(-0.1, 0.1),
                            use_bias=False)
 
        prob = tf.nn.softmax(z)
        prob = prob * l
        prob /= tf.reduce_sum(prob, axis=1, keep_dims=True)
 
        pred = tf.to_int32(tf.arg_max(prob, dimension=1))
        acc = tf.reduce_mean(tf.to_float(tf.equal(pred, y)))
 
        if not training:
            return acc
        else:
            epsilon = 1e-7
            prob = tf.clip_by_value(prob, epsilon, 1 - epsilon)
            loss = tf.one_hot(y, depth=args.num_labels) * -tf.log(prob)
            loss = tf.reduce_sum(loss, axis=1)
            loss = tf.reduce_mean(loss)
 
            if args.optimizer == 'sgd':
                optimizer = tf.train.GradientDescentOptimizer(learning_rate=args.learning_rate)
            elif args.optimizer == 'adam':
                optimizer = tf.train.AdamOptimizer()
            elif args.optimizer == 'rmsprop':
                optimizer = tf.train.RMSPropOptimizer(learning_rate=args.learning_rate)
            else:
                raise NotImplementedError('optimizer = %s' % args.optimizer)
            train_op = optimizer.minimize(loss)
            return train_op, loss, acc
 
 
def main(args):
    logging.info('-' * 50)
    logging.info('Preparing data..')
 
    embeddings = np.random.uniform(-1, 1, (args.vocab_size, args.embed_size))
    x1 = np.random.choice(args.vocab_size, (4 * args.batch_size, 2000))
    x2 = np.random.choice(args.vocab_size, (4 * args.batch_size, 50))
    len1 = np.random.choice(np.arange(1000, 2000), 4 * args.batch_size)
    mask1 = np.ones((4 * args.batch_size, 2000)).astype('float32')
    for l, mask in zip(len1, mask1):
        mask[l:] = 0
    len2 = np.random.choice(np.arange(25, 50), 4 * args.batch_size)
    mask2 = np.ones((4 * args.batch_size, 50)).astype('float32')
    for l, mask in zip(len2, mask2):
        mask[l:] = 0
    l = np.ones((4 * args.batch_size, args.num_labels)).astype('float32')
    y = np.random.choice(args.num_labels, 4 * args.batch_size)
 
    in_x1 = tf.placeholder(tf.int32, [None, None])
    in_x2 = tf.placeholder(tf.int32, [None, None])
    in_mask1 = tf.placeholder(tf.float32, [None, None])
    in_mask2 = tf.placeholder(tf.float32, [None, None])
    in_l = tf.placeholder(tf.float32, [None, None])
    in_y = tf.placeholder(tf.int32, [None])
    feed_dict = {in_x1: x1, in_x2: x2,
                 in_mask1: mask1, in_mask2: mask2,
                 in_l: l, in_y: y}
 
    q = tf.RandomShuffleQueue(capacity=2000 * args.batch_size, min_after_dequeue=0,
                              dtypes=[tf.int32, tf.int32, tf.float32, tf.float32, tf.float32, tf.int32],
                              shapes=[x1.shape[1:], x2.shape[1:],
                                      mask1.shape[1:], mask2.shape[1:],
                                      l.shape[1:], y.shape[1:]])
    q_size = q.size()
    enqueue_op = q.enqueue_many([in_x1, in_x2, in_mask1, in_mask2, in_l, in_y])
    qr = tf.train.QueueRunner(q, [enqueue_op])
    all_data = q.dequeue_many(args.batch_size)
 
    logging.info('Building Computation Graph..')
    train_op, loss, ac = inference(*all_data, args, embeddings, reuse=False, training=True)
    light_op = tf.square(all_data[0])
 
    logging.info('-' * 50)
    logging.info('Create TensorFlow session..')
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    config.allow_soft_placement = True
    sess = tf.Session(config=config)
 
    logging.info('Initialize model parameters..')
    sess.run(tf.global_variables_initializer())
 
    logging.info('-' * 50)
    logging.info(args)
 
    logging.info('-' * 50)
    logging.info('Start training..')
 
    num_samples_in_queue = sess.run(q_size)
    while num_samples_in_queue < 1999 * args.batch_size:
        sess.run(qr.enqueue_ops, feed_dict)
        num_samples_in_queue = sess.run(q_size)
        print("Recharging queue, current size = %i" % num_samples_in_queue)
 
    writer = tf.summary.FileWriter('summary', sess.graph)
    idx = 0
    while num_samples_in_queue > args.batch_size:
        idx += 1
        run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
        run_metadata = tf.RunMetadata()
        begin_time = time.time()
        #sess.run(light_op, options=run_options, run_metadata=run_metadata)
        sess.run(train_op, options=run_options, run_metadata=run_metadata)
        end_time = time.time()
        writer.add_run_metadata(run_metadata, 'step = %d' % idx)
        writer.flush()
        num_samples_in_queue = sess.run(q_size)
        logging.info('%d left in queue' % num_samples_in_queue)
        logging.info('elapsed time %.2f(s)' % (end_time - begin_time))

        with open('stepstats-%d.json'%(idx,), 'w') as f:
            f.write(str(run_metadata))

        tl = timeline.Timeline(run_metadata.step_stats)
        ctf = tl.generate_chrome_trace_format()
        with open('timeline-%d.json'%(idx,), 'w') as f:
            f.write(ctf)
        
    writer.close()
    logging.info('-' * 50)
    logging.info('Close TensorFlow session..')
    sess.close()
 
 
if __name__ == "__main__":
    args = config.get_args()
    np.random.seed(args.random_seed)
    tf.set_random_seed(args.random_seed)
 
    logging.basicConfig(level=logging.DEBUG,
                        format='%(asctime)s %(message)s', datefmt='%m-%d %H:%M')
 
    main(args)

    


================================================
FILE: autotune/README.md
================================================
To run tests in this directory

```
pytest
```

If there's a slow test, you can run the test file directly to see timings of individual tests, e.g.

```
python linesearch_test.py
```
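
Alternatively, pytest itself can report per-test timings via its standard `--durations` flag (for example, the ten slowest):

```
pytest --durations=10
```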


================================================
FILE: autotune/autograd_lib.py
================================================
"""
Library for extracting interesting quantities from autograd.

Not thread-safe because of module-level variables affecting state of autograd

Notation:
o: number of output classes (exact Hessian), number of Hessian samples (sampled Hessian)
n: batch-size
do: output dimension (output channels for convolution)
di: input dimension (input channels for convolution)
s: spatial dimension (Oh*Ow)
Oh, Ow: output height, output width (convolution)
Kh, Kw: kernel height, kernel width (convolution)

Hi: per-example Hessian
    Linear layer: shape [do*di, do*di]
    Conv2d layer: shape [do*di*Kh*Kw, do*di*Kh*Kw]
Hi_bias: per-example Hessian of bias
H: mean Hessian of matmul
H_bias: mean Hessian of bias


Jo: batch output Jacobian of matmul, gradient of output for each (output, example) pair, [o, n, ....]
Jo_bias: output Jacobian of bias

A, activations: inputs into matmul
    Linear: [n, di]
    Conv2d: [n, di, Ih, Iw] -> (unfold) -> [n, di, Oh, Ow]
B, backprops: backprop values (aka Lop aka Jacobian-vector product) for current layer
    Linear: [n, do]
    Conv2d: [n, do, Oh, Ow]

weight: matmul part of layer, Linear [di, do], Conv [do, di, Kh, Kw]

H, hess  -- hessian
S, sigma -- noise
L, lyap -- lyapunov matrix

"""
import math
from typing import List, Optional, Callable, Tuple, Dict

import torch
import torch.nn as nn
import torch.nn.functional as F

import util as u
import globals as gl
from attrdict import AttrDefault, AttrDict

_supported_layers = ['Linear', 'Conv2d']  # Supported layer class types  TODO(y): make non-private
_supported_methods = ['exact', 'kron', 'mean_kron', 'experimental_kfac']  # supported approximation methods
_supported_losses = ['LeastSquares', 'CrossEntropy']

# module-level variables affecting state of autograd
_global_hooks_disabled: bool = False  # work-around for https://github.com/pytorch/pytorch/issues/25723
_global_enforce_fresh_backprop: bool = False  # global switch to catch double backprop errors on Hessian computation
_global_backprops_prefix = ''  # hooks save backprops to, ie param.{_backprops_prefix}backprops_list


class LayerStats:
    # Some notation/background from https://docs.google.com/document/d/19Jmh4spbSAnAGX_eq7WSFPgLzrpJEhiZRpjX1jSYObo/edit#heading=h.9fi55aowtmgy
    sparsity: torch.Tensor
    mean_activation: torch.Tensor
    mean_backprop: torch.Tensor

    sigma_l2: torch.Tensor  # l2 norm of centered gradient covariance (noise covariance matrix)
    sigma_erank: torch.Tensor  # trace/l2 norm

    H_l2: torch.Tensor  # spectral norm of H (largest curvature)
    H_fro: torch.Tensor  # Frobenius norm of hessian
    grad_fro: torch.Tensor  # Frobenius norm of gradient update
    param_fro: torch.Tensor  # Frobenius norm of parameter tensor

    grad_curv: torch.Tensor  # curvature in direction of gradient
    newton_curv: torch.Tensor  # curvature in direction of Newton step

    step_openai: torch.Tensor  # optimal step length using gradient direction and Hessian curvature estimate
    step_div_inf: torch.Tensor  # divergent step size for infinite batch (2/spectral radius)
    step_div_1: torch.Tensor  # (2/trace, a bad attempt to approximate Jain divergent lr, should be 2/R^2)
    newton_fro: torch.Tensor  # Frobenius norm of newton step
    regret_gradient: torch.Tensor  # expected improvement if we took optimal step-size in gradient direction
    regret_newton: torch.Tensor  # expected improvement if we took Newton step
    batch_openai: torch.Tensor  # optimal batch size from gradient noise stat (loss change from noise part over loss change from deterministic part)
    batch_jain_simple: torch.Tensor  # optimal batch-size assuming well-specified model (trace/sigma)
    batch_jain_full: torch.Tensor  # optimal batch size using Jain/Kakade approach
    noise_variance_pinv: torch.Tensor  # asymptotic minimax rate (called noise variance in 1.1.4 of Jain/Kakade)

    # need: R^2 the largest Jacobian size
    # need: angle between gradient and newton step

    # 2 hessians, 2 covariance matrices, need l2, trace, rank, spectrum

    def __iter__(self):
        return iter(self.__dict__)

    def __init__(self):
        pass

    def __getitem__(self, item):
        return self.__dict__[item]

    def items(self):
        return self.__dict__.items()


def add_hooks(model: nn.Module) -> None:
    """
    Adds hooks to model to save activations and backprop values.

    The hooks will
    1. assign activations to layer.activations during forward pass
    2. assign layer output to layer.output during forward pass
    3. append backprops to layer.backprops_list during backward pass

    Call "clear_backprops" to clear backprops_list values for all parameters in the model
    Call "remove_hooks(model)" to undo this operation.


    Args:
        model:
    """

    global _global_hooks_disabled
    _global_hooks_disabled = False

    handles = []
    for layer in model.modules():
        if _layer_type(layer) in _supported_layers:
            handles.append(layer.register_forward_hook(_capture_activations))
            handles.append(layer.register_forward_hook(_capture_output))
            handles.append(layer.register_backward_hook(_capture_backprops))

    model.__dict__.setdefault('autograd_hacks_hooks', []).extend(handles)


def remove_hooks(model: nn.Module) -> None:
    """
    Remove hooks added by add_hooks.
    """

    assert model == 0, "not working, remove this after fix to https://github.com/pytorch/pytorch/issues/25723"

    if not hasattr(model, 'autograd_hacks_hooks'):
        print("Warning, asked to remove hooks, but no hooks found")
    else:
        for handle in model.autograd_hacks_hooks:
            handle.remove()
        del model.autograd_hacks_hooks


def disable_hooks() -> None:
    """
    Globally disable all hooks installed by this library.
    """

    global _global_hooks_disabled
    _global_hooks_disabled = True


def enable_hooks() -> None:
    """The opposite of disable_hooks()."""

    global _global_hooks_disabled
    _global_hooks_disabled = False


def is_supported(layer: nn.Module) -> bool:
    """Check if this layer is supported."""

    return _layer_type(layer) in _supported_layers


def _layer_type(layer: nn.Module) -> str:
    return layer.__class__.__name__


def _capture_activations(layer: nn.Module, input: List[torch.Tensor], output: torch.Tensor):
    """Save activations into layer.activations in forward pass"""

    if _global_hooks_disabled:
        return
    assert _layer_type(layer) in _supported_layers, "Hook installed on unsupported layer, this shouldn't happen"
    setattr(layer, "activations", input[0].detach())


def _capture_output(layer: nn.Module, input: List[torch.Tensor], output: torch.Tensor):
    """Save activations into layer.activations in forward pass"""

    if _global_hooks_disabled:
        return
    assert _layer_type(layer) in _supported_layers, "Hook installed on unsupported layer, this shouldn't happen"
    setattr(layer, "output", output.detach())


def _capture_backprops(layer: nn.Module, _input, output):
    """Append backprop to layer.backprops_list in backward pass."""
    global _global_enforce_fresh_backprop

    if _global_hooks_disabled:
        return

    backprops_list_attr = _global_backprops_prefix + 'backprops_list'
    if _global_enforce_fresh_backprop:
        assert not hasattr(layer,
                           backprops_list_attr), f"Seeing result of previous backprop in {backprops_list_attr}, use {_global_backprops_prefix}clear_backprops(model) to clear"
        _global_enforce_fresh_backprop = False

    if not hasattr(layer, backprops_list_attr):
        setattr(layer, backprops_list_attr, [])
    getattr(layer, backprops_list_attr).append(output[0].detach())


def clear_backprops(model: nn.Module) -> None:
    """Delete layer.backprops_list in every layer."""
    for layer in model.modules():
        if hasattr(layer, 'backprops_list'):
            del layer.backprops_list


def clear_hess_backprops(model: nn.Module) -> None:
    """Delete layer.backprops_list in every layer."""
    for layer in model.modules():
        if hasattr(layer, 'hess_backprops_list'):
            del layer.hess_backprops_list


def compute_grad1(model: nn.Module, loss_type: str = 'mean') -> None:
    """
    Compute per-example gradients and save them under 'param.grad1'. Must be called after loss.backward()

    Args:
        model:
        loss_type: either "mean" or "sum", depending on whether the backpropped loss was averaged or summed over the batch
    """

    assert loss_type in ('sum', 'mean')
    for layer in model.modules():
        if hasattr(layer, 'expensive'):
            continue

        layer_type = _layer_type(layer)
        if layer_type not in _supported_layers:
            continue
        assert hasattr(layer, 'activations'), "No activations detected, run forward after add_hooks(model)"
        assert hasattr(layer, 'backprops_list'), "No backprops detected, run backward after add_hooks(model)"
        assert len(layer.backprops_list) == 1, "Multiple backprops detected, make sure to call clear_backprops(model)"

        A = layer.activations
        n = A.shape[0]
        if loss_type == 'mean':
            B = layer.backprops_list[0] * n
        else:  # loss_type == 'sum':
            B = layer.backprops_list[0]

        if layer_type == 'Linear':
            setattr(layer.weight, 'grad1', torch.einsum('ni,nj->nij', B, A))
            if layer.bias is not None:
                setattr(layer.bias, 'grad1', B)

        elif layer_type == 'Conv2d':
            Kh, Kw = layer.kernel_size
            di, do = layer.in_channels, layer.out_channels
            Oh, Ow = layer.backprops_list[0].shape[2:]
            weight_shape = [n] + list(layer.weight.shape)  # n, do, di, Kh, Kw
            assert weight_shape == [n, do, di, Kh, Kw]
            A = torch.nn.functional.unfold(A, layer.kernel_size)  # n, di * Kh * Kw, Oh * Ow

            assert A.shape == (n, di * Kh * Kw, Oh * Ow)
            assert layer.backprops_list[0].shape == (n, do, Oh, Ow)

            # B = B.reshape(n, -1, A.shape[-1])
            B = B.reshape(n, do, Oh * Ow)
            # noinspection PyTypeChecker
            grad1 = torch.einsum('ijk,ilk->ijl', B, A)  # n, do, di * Kh * Kw
            assert grad1.shape == (n, do, di * Kh * Kw)

            setattr(layer.weight, 'grad1', grad1.reshape(weight_shape))
            if layer.bias is not None:
                setattr(layer.bias, 'grad1', torch.sum(B, dim=2))
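For the Linear branch above, per-example gradients are batched outer products of backprops and activations. A small numpy sketch (illustrative only, assuming a sum-reduced loss) checking that the per-example outer products sum to the usual aggregated gradient:

```python
import numpy as np

rng = np.random.default_rng(0)
n, di, do = 4, 3, 2
A = rng.standard_normal((n, di))   # activations (layer inputs)
B = rng.standard_normal((n, do))   # backprops dL/d(pre-activation) for a sum-reduced loss

grad1 = np.einsum('ni,nj->nij', B, A)   # per-example gradients, shape (n, do, di)
full_grad = B.T @ A                     # standard aggregated gradient, shape (do, di)

# summing the per-example gradients recovers the full-batch gradient
assert np.allclose(grad1.sum(axis=0), full_grad)
```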


def compute_hess(model: nn.Module, method='exact', attr_name=None, vecr_order=False, loss_aggregation='mean') -> None:
    """Compute Hessian (torch.Tensor) for each parameter and save it under 'param.hess' by default.
    Hessian can be a Tensor or a tensor-like object like KronFactored.

     If attr_name is specified, saves it under 'param.{attr_name}'

    Must be called after backprop_hess().

    Args:
        model:
        method: which method to use for computing the Hessian
            exact: exact Hessian from outer products of per-output Jacobians
            kron: kronecker product
            mean_kron: mean of kronecker products, one kronecker product per datapoint
            experimental_kfac: experimental method for Conv2d
        attr_name: which attribute of Parameter object to use for storing the Hessian.
        vecr_order: determines whether Hessian computed with respect to vectorized parameters (math notation default) or row-vectorized parameters (more efficient in PyTorch)
        loss_aggregation: 'mean' or 'sum', determines whether final loss is sum or mean of per-example losses
    """

    assert method in _supported_methods
    assert loss_aggregation in ['mean', 'sum']

    # TODO: get rid of hess_factored logic

    # legacy specification for factored version, remove
    if attr_name is None:
        hess_attr = 'hess' if (method == 'exact' or method == 'autograd') else 'hess_factored'
    else:
        hess_attr = attr_name

    li = 0
    for layer in model.modules():

        if hasattr(layer, 'expensive'):
            continue

        layer_type = _layer_type(layer)
        if layer_type not in _supported_layers:
            continue
        assert hasattr(layer, 'activations'), "No activations detected, run forward after add_hooks(model)"
        assert hasattr(layer, 'hess_backprops_list'), "No backprops detected, run hess_backprop"

        A = layer.activations
        n = A.shape[0]

        if layer_type == 'Linear':
            B = torch.stack(layer.hess_backprops_list)

            di = A.shape[1]
            do = layer.hess_backprops_list[0].shape[1]
            o = B.shape[0]

            original_A = A

            A = torch.stack([A] * o)

            if method == 'exact':
                Jo = torch.einsum("oni,onj->onij", B, A).reshape(n * o, -1)
                H = torch.einsum('ni,nj->ij', Jo, Jo) / n

                # Alternative way
                # Jo = torch.einsum("oni,onj->onij", B, A)
                # H = torch.einsum('onij,onkl->ijkl', Jo, Jo) / n
                # H = H.reshape(do*di, do*di)

                H_bias = torch.einsum('oni,onj->ij', B, B) / n
            else:  # TODO(y): can optimize this case by not stacking A
                assert method == 'kron'
                # AA = torch.einsum("oni,onj->ij", A, A) / (o * n)  # # TODO(y): makes more sense to apply o factor to B
                # BB = torch.einsum("oni,onj->ij", B, B) / n
                #                H = u.Kron(AA, BB)
                H_bias = u.Kron(torch.eye(1), torch.einsum("oni,onj->ij", B, B) / n)  # TODO: reuse BB

                hess = u.KronFactoredCov(di, do)
                hess.add_samples(original_A, B)
                H = hess.value()


        elif layer_type == 'Conv2d':
            Kh, Kw = layer.kernel_size
            di, do = layer.in_channels, layer.out_channels
            n, do, Oh, Ow = layer.hess_backprops_list[0].shape
            o = len(layer.hess_backprops_list)

            A = torch.nn.functional.unfold(A, kernel_size=layer.kernel_size,
                                           stride=layer.stride,
                                           padding=layer.padding,
                                           dilation=layer.dilation)  # n, di * Kh * Kw, Oh * Ow
            assert A.shape == (n, di * Kh * Kw, Oh * Ow)
            B = torch.stack([Bh.reshape(n, do, -1) for Bh in layer.hess_backprops_list])  # o, n, do, Oh*Ow

            A = torch.stack([A] * o)  # o, n, di * Kh * Kw, Oh*Ow
            if gl.debug_dump_stats:
                print(f'layerA {li}', A)
                print(f'layerB {li}', B)

            if method == 'exact':
                Jo = torch.einsum('onis,onks->onik', B, A)  # o, n, do, di * Kh * Kw
                Jo_bias = torch.einsum('onis->oni', B)

                Hi = torch.einsum('onij,onkl->nijkl', Jo, Jo)  # n, do, di*Kh*Kw, do, di*Kh*Kw
                Hi = Hi.reshape(n, do * di * Kh * Kw, do * di * Kh * Kw)  # n, do*di*Kh*Kw, do*di*Kh*Kw
                Hi_bias = torch.einsum('oni,onj->nij', Jo_bias, Jo_bias)  # n, do, do
                H = Hi.mean(dim=0)
                H_bias = Hi_bias.mean(dim=0)
            elif method == 'kron':
                AA = torch.einsum("onis->oni", A) / (Oh * Ow)  # group input channels
                AA = torch.einsum("oni,onj->onij", AA, AA) / (o * n)  # remove factor of o because A is repeated o times

                AA = torch.einsum("onij->ij", AA)  # sum out outputs/classes

                BB = torch.einsum("onip->oni", B)  # group output channels
                BB = torch.einsum("oni,onj->ij", BB, BB) / n
            elif method == 'mean_kron':
                AA = torch.einsum("onis->oni", A) / (Oh * Ow)  # group input channels
                AA = torch.einsum("oni,onj->onij", AA, AA) / (o)  # remove factor of o because A is repeated o times

                AA = torch.einsum("onij->nij", AA)  # sum out outputs/classes

                BB = torch.einsum("onip->oni", B)  # group output channels
                BB = torch.einsum("oni,onj->nij", BB, BB)

            elif method == 'experimental_kfac':
                AA = torch.einsum("onis,onjs->onijs", A, A)
                AA = torch.einsum("onijs->onij", AA) / (Oh * Oh)
                AA = torch.einsum("onij->oij", AA) / n
                AA = torch.einsum("oij->ij", AA) / o

                BB = torch.einsum("onip,onjp->onijp", B, B) / n
                BB = torch.einsum("onijp->onij", BB)
                BB = torch.einsum("onij->nij", BB)
                BB = torch.einsum("nij->ij", BB)

            if method != 'exact':
                if method == 'mean_kron':
                    H = u.MeanKronFactored(AA, BB)
                    # H = u.KronFactored(AA[0,...], BB[0,...])
                else:
                    H = u.Kron(AA, BB)

                BB_bias = torch.einsum("onip->oni", B)  # group output channels
                BB_bias = torch.einsum("oni,onj->onij", BB_bias, BB_bias) / n  # covariance
                BB_bias = torch.einsum("onij->ij", BB_bias)  # sum out outputs + examples
                H_bias = u.Kron(torch.eye(1), BB_bias)

        if loss_aggregation == 'sum':
            H = n * H
            H_bias = n * H_bias

        if vecr_order:
            H = H.commute()
            H_bias = H_bias.commute()

        setattr(layer.weight, hess_attr, H)
        if layer.bias is not None:
            setattr(layer.bias, hess_attr, H_bias)
        li += 1
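The 'kron' method above relies on the fact that for a linear layer the per-example gradient is a Kronecker product of backprop and activation vectors, so its outer product (the per-example curvature block) factors as the Kronecker product of the two smaller outer products. A numpy check of that identity (illustrative only, not part of the library):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(3)   # activations for one example (length di)
b = rng.standard_normal(2)   # backprops for one example (length do)

g = np.kron(b, a)            # vectorized per-example gradient of the do-by-di weight

full = np.outer(g, g)                               # (do*di) x (do*di) outer product
factored = np.kron(np.outer(b, b), np.outer(a, a))  # Kronecker-factored form

# outer(b ⊗ a) == outer(b) ⊗ outer(a): why storing two small factors suffices
assert np.allclose(full, factored)
```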


def backprop_hess(output: torch.Tensor, hess_type: str, model: Optional[nn.Module] = None) -> None:
    """
    Call backprop 1 or more times to accumulate values needed for Hessian computation.

    Values are accumulated under .backprops_list attr of each layer and used by downstream functions like compute_hess

    Args:
        output: prediction of neural network (ie, input of nn.CrossEntropyLoss())
        hess_type: 'LeastSquares', 'CrossEntropy' or 'DebugLeastSquares'. Type of Hessian propagation; 'CrossEntropy' results in the exact Hessian for cross-entropy loss
        model: optional model, used to freeze the parameters
    """

    global _global_enforce_fresh_backprop, _global_hooks_disabled, _global_backprops_prefix

    assert not _global_hooks_disabled
    _global_enforce_fresh_backprop = True  # enforce empty backprops_list on first backprop

    old_backprops_prefix = _global_backprops_prefix
    _global_backprops_prefix = 'hess_'  # backprops go into hess_backprops_list

    valid_hess_types = ('LeastSquares', 'CrossEntropy', 'DebugLeastSquares')
    assert hess_type in valid_hess_types, f"Unexpected hessian type: {hess_type}, valid types are {valid_hess_types}"
    n, o = output.shape

    if hess_type == 'CrossEntropy':
        batch = F.softmax(output, dim=1)

        mask = torch.eye(o).to(gl.device).expand(n, o, o)
        diag_part = batch.unsqueeze(2).expand(n, o, o) * mask
        outer_prod_part = torch.einsum('ij,ik->ijk', batch, batch)
        hess = diag_part - outer_prod_part
        assert hess.shape == (n, o, o)

        with u.timeit("xent-symsqrt"):
            for i in range(n):
                if torch.get_default_dtype() == torch.float64:
                    hess[i, :, :] = u.symsqrt_svd(
                        hess[i, :, :])  # more stable method since we don't care about speed with float64
                    print('warning, slow method for cross-entropy')
                else:
                    hess[i, :, :] = u.symsqrt(hess[i, :, :])
                u.nan_check(hess[i, :, :])
            hess = hess.transpose(0, 1)

    elif hess_type == 'LeastSquares':
        hess = []
        assert len(output.shape) == 2
        batch_size, output_size = output.shape

        id_mat = torch.eye(output_size).to(output.device).type(output.dtype)
        for out_idx in range(output_size):
            hess.append(torch.stack([id_mat[out_idx]] * batch_size))

    elif hess_type == 'DebugLeastSquares':
        hess = []
        assert len(output.shape) == 2
        batch_size, output_size = output.shape

        id_mat = torch.eye(output_size)
        id_mat[0, 0] = 10
        for out_idx in range(output_size):
            hess.append(torch.stack([id_mat[out_idx]] * batch_size))

    for out_idx in range(o):
        output.backward(hess[out_idx], retain_graph=True)

    # side-effect of Hessian backprop is that .grad buffers are updated.
    # Zero out those buffers to prevent accidental use
    if model is not None:
        model.zero_grad()

    _global_backprops_prefix = old_backprops_prefix
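For reference, the 'CrossEntropy' branch above builds, per example, the Hessian of cross-entropy loss with respect to the logits, which is diag(p) - p p^T for p = softmax(z). A self-contained numpy sketch verifying its two defining properties:

```python
import numpy as np

z = np.array([1.0, 2.0, 0.5])       # logits for one example
p = np.exp(z - z.max())
p /= p.sum()                        # softmax probabilities

hess = np.diag(p) - np.outer(p, p)  # Hessian of cross-entropy w.r.t. logits

# rows sum to zero: shifting all logits by a constant doesn't change the loss
assert np.allclose(hess.sum(axis=1), 0)
# positive semi-definite: v' H v = Var_p(v) >= 0
assert np.all(np.linalg.eigvalsh(hess) >= -1e-8)
```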


class LayerCov:
    """Class representing second-order information associated with a layer."""

    S: u.KronFactoredCov  # expected gradient outer product: E[gg'] where g=gradient of loss
    J: u.KronFactoredCov  # expected Jacobian outer product: E[hh'] where h=gradient of network output
    H: u.KronFactoredCov  # expected hessian: E[H] where H is per-example hessian

    def __init__(self):
        self.S = None
        self.J = None
        self.H = None


def compute_cov(model: nn.Module, loss_fn: Callable, stats_iter, batch_size, steps, loss_type='CrossEntropy'):
    """

    Augments model layers with their associated covariance matrices. At the end of the run, every layer will have an
    attribute 'cov' of type LayerCov

    Args:
        stats_iter: data iterator
        loss_type:
        model:
        loss_fn:
        batch_size: size of batch to use for estimation
        steps: number of steps to use to aggregate stats

    Returns:

    """

    assert loss_type == 'CrossEntropy', 'only cross entropy is implemented'

    enable_hooks()

    clear_backprops(model)
    clear_hess_backprops(model)
    model.zero_grad()

    for i in range(steps):
        data, targets = next(stats_iter)
        output = model(data)
        loss = loss_fn(output, targets)
        assert len(data) == batch_size

        with u.timeit("backprop_J"):
            backprop_hess(output, hess_type='LeastSquares')
            update_cov(model, 'activations', 'hess_backprops_list', 'J')
            clear_hess_backprops(model)

        with u.timeit("backprop_G"):
            loss.backward(retain_graph=True)
            update_cov(model, 'activations', 'backprops_list', 'S')
            clear_backprops(model)
            model.zero_grad()

        # disable because super-slow
        if not gl.hacks_disable_hess:
            with u.timeit("backprop_H"):
                backprop_hess(output, hess_type='CrossEntropy')
                update_cov(model, 'activations', 'hess_backprops_list', 'H')
                clear_hess_backprops(model)

    disable_hooks()


def update_cov(model, a_attr, b_attr, target_attr):
    """Update Kronecker-factored layer covariance of a,b values.
    For every layer in the model, will perform
        a = layer.{a_attr}
        b = layer.{b_attr}
        layer.cov.{target_attr}.add_samples(a, b)
    """

    for layer in model.modules():
        if not is_supported(layer):
            continue

        layer_type = _layer_type(layer)

        if hasattr(layer, 'cov'):
            layer_cov = layer.cov
        else:
            layer_cov = LayerCov()
            setattr(layer, 'cov', layer_cov)

        a_vals = getattr(layer, a_attr)
        b_vals = getattr(layer, b_attr)

        if layer_type == 'Conv2d':
            Kh, Kw = layer.kernel_size
            di, do = layer.in_channels, layer.out_channels
            n = a_vals.shape[0]

            a_vals = torch.nn.functional.unfold(a_vals, kernel_size=layer.kernel_size,
                                                stride=layer.stride,
                                                padding=layer.padding,
                                                dilation=layer.dilation)  # n, di * Kh * Kw, Oh * Ow
            num_locations = a_vals.shape[-1]  # Oh * Ow
            assert a_vals.shape == (n, di * Kh * Kw, num_locations)
            a_vals = torch.einsum("nis->ni", a_vals) / num_locations  # average over spatial locations
            b_vals = [b.reshape(n, do, -1).sum(dim=2) for b in b_vals]  # group output channels, as in compute_hess

        a_dim = a_vals.shape[-1]

        # backward vals are in a list
        # special handling for backprops, multiple vals are stacked into rank 3 tensor
        assert isinstance(b_vals, list)
        if len(b_vals) > 1:
            b_vals = torch.stack(b_vals)
        else:
            b_vals = b_vals[0]

        b_dim = b_vals.shape[-1]

        covmat = getattr(layer_cov, target_attr)
        if covmat is None:
            covmat = u.KronFactoredCov(a_dim, b_dim)
            setattr(layer_cov, target_attr, covmat)
        covmat.add_samples(a_vals, b_vals)


def compute_stats(model, attr_name='stats', factored=False, sigma_centering=True):
    """

    Combines activations and backprops to compute statistics for a model.
    Args:
        model:
        attr_name: stats are saved under this attribute name on corresponding Parameter

    """

    # obtain n
    n = 0
    for module in model.modules():
        if hasattr(module, 'activations'):
            n = module.activations.shape[0]
            break
    assert n, "Couldn't figure out size of activations"

    for (i, layer) in enumerate(model.layers):

        if hasattr(layer, 'expensive'):
            continue

        param_names = {layer.weight: "weight", layer.bias: "bias"}
        for param in [layer.weight, layer.bias]:

            if param is None:
                continue

            s = LayerStats()  # dictionary-like object for layer stats

            #############################
            # Gradient stats
            #############################
            A_t = layer.activations
            B_t = layer.backprops_list[0] * n

            s.sparsity = torch.sum(
                layer.output <= 0).float() / layer.output.numel()  # proportion of activations that are zero
            s.mean_activation = torch.mean(A_t)
            s.mean_backprop = torch.mean(B_t)

            # empirical Fisher
            G = param.grad1.reshape((n, -1))
            g = G.mean(dim=0, keepdim=True)

            print(G)
            u.nan_check(G)
            with u.timeit(f'sigma-{i}'):
                efisher = G.t() @ G / n
                if sigma_centering:
                    sigma = efisher - g.t() @ g
                else:
                    sigma = efisher

                s.sigma_l2 = u.sym_l2_norm(sigma)
                s.sigma_erank = torch.trace(sigma) / s.sigma_l2

            H = param.hess

            u.nan_check(H)

            with u.timeit(f"H_l2-{i}"):
                s.H_l2 = u.sym_l2_norm(H)

            with u.timeit(f"norms-{i}"):
                s.H_fro = H.flatten().norm()
                s.grad_fro = g.flatten().norm()
                s.param_fro = param.data.flatten().norm()

            # TODO(y): col vs row fix
            def loss_direction(dd: torch.Tensor, eps):
                """
                Args:
                    dd: direction, as a 1-by-n row matrix
                    eps: scalar step length

                Returns:
                   loss improvement if we take step eps in direction dd.
                """
                assert u.is_row_matrix(dd)
                return (eps * (dd @ g.t()) - 0.5 * eps ** 2 * dd @ H @ dd.t()).squeeze()

            def curv_direction(dd: torch.Tensor):
                """Curvature in direction dd (directional eigenvalue). """
                assert u.is_row_matrix(dd)
                return (dd @ H @ dd.t() / (dd.flatten().norm() ** 2)).squeeze()

            with u.timeit(f"pinvH-{i}"):
                pinvH = u.pinv(H)

            with u.timeit(f'curv-{i}'):
                s.grad_curv = curv_direction(g)  # curvature (eigenvalue) in direction g
                ndir = g @ pinvH  # newton direction (TODO(y): replace with lstsqsolve)
                s.newton_curv = curv_direction(ndir)

                setattr(layer.weight, 'pre', pinvH)  # save Newton preconditioner
                s.step_openai = 1 / s.grad_curv if s.grad_curv else 1234567
                s.step_div_inf = 2 / s.H_l2  # divergent step size for batch_size=infinity
                s.step_div_1 = torch.tensor(2) / torch.trace(H)  # divergent step for batch_size=1

                s.newton_fro = ndir.flatten().norm()  # frobenius norm of Newton update
                s.regret_newton = u.to_python_scalar(g @ pinvH @ g.t() / 2)  # replace with "quadratic_form"
                s.regret_gradient = loss_direction(g, s.step_openai)

            # todo: Lyapunov has to be redone
            with u.timeit(f'rho-{i}'):
                s.rho, lyap_erank, L_evals = u.truncated_lyapunov_rho(H, sigma)
                s.step_div_1_adjusted = s.step_div_1 / s.rho

            with u.timeit(f"batch-{i}"):
                # s.batch_openai = torch.trace(H @ sigma) / (g @ H @ g.t()).squeeze()
                s.batch_openai = torch.trace(H @ sigma) / (g @ H @ g.t())
                print('original sigma: ', torch.trace(H @ sigma) / (g @ H @ g.t()))
                denom = (g @ H @ g.t())
                print('subtracted1:', torch.trace(H @ (sigma - g.t() @ g)) / denom)
                print('subtracted2:', torch.trace(H @ sigma) / denom - torch.trace(H @ g.t() @ g) / denom)
                print("left term: ", torch.trace(H @ sigma))
                print("right term: ", torch.trace(H @ g.t() @ g))
                print('denom: ', denom)

                s.diversity = torch.norm(G, "fro") ** 2 / torch.norm(g) ** 2 / n  # Gradient diversity / n
                s.noise_variance_pinv = torch.trace(pinvH @ sigma)  # todo(y): replace with lsqtsolve
                s.H_erank = torch.trace(H) / s.H_l2
                s.batch_jain_simple = 1 + s.H_erank
                s.batch_jain_full = 1 + s.rho * s.H_erank

            param_name = f"{layer.name}={param_names[param]}"
            u.log_scalars(u.nest_stats(f"{param_name}", s))

            H_evals = u.symeig_pos_evals(H)
            S_evals = u.symeig_pos_evals(sigma)

            # s.H_evals = H_evals
            # s.S_evals = S_evals
            # s.L_evals = L_evals

            setattr(param, attr_name, s)

            # u.log_spectrum(f'{param_name}/hess', H_evals)
            # u.log_spectrum(f'{param_name}/sigma', S_evals)
            # u.log_spectrum(f'{param_name}/lyap', L_evals)

    return None


def compute_stats_factored(model, attr_name='stats', sigma_centering=True):
    """Combines activations and backprops to compute statistics for a model. Assumes factored hessian was saved into 'hess2' of each param"""

    ein = torch.einsum
    n = 0
    for module in model.modules():
        if hasattr(module, 'activations'):
            n = module.activations.shape[0]
            break
    assert n, "Couldn't figure out size of activations"

    for (i, layer) in enumerate(model.layers):

        if hasattr(layer, 'expensive'):
            continue

        param_names = {layer.weight: "weight", layer.bias: "bias"}
        for param in [layer.weight, layer.bias]:

            if param is None:
                continue

            s = AttrDefault(str, {})  # dictionary-like object for layer stats

            do, di = layer.weight.shape
            H: u.Kron = param.hess2

            if param is layer.weight:
                assert H.shape == ((di, di), (do, do))
            else:
                assert H.shape == ((1, 1), (do, do))

            # TODO(y): fix stats for bias
            if param is layer.bias:
                continue

            G = param.grad1.reshape((n, -1))
            g = G.mean(dim=0, keepdim=True)

            if param is layer.weight:
                vecG = u.Vec(g, shape=(do, di))
            else:  # bias
                vecG = u.Vec(g, shape=(do, 1))

            u.nan_check(G)

            A = layer.activations
            B = layer.backprops_list[0] * n

            AA = ein('ni,nj->ij', A, A)
            BB = ein('ni,nj->ij', B, B)
            Bc = B - torch.mean(B, dim=0)
            BBc = ein('ni,nj->ij', Bc, Bc)

            # subtracting mean breaks Kronecker factoring, so this is approximate
            if sigma_centering:
                sigma_k = u.Kron(AA, BBc) / n  # only center backprops, centering both leads to underestimate of cov
            else:
                sigma_k = u.Kron(AA, BB) / n
            sigma_k = sigma_k / n  # extra factor to average A's as well
            s.sparsity = torch.sum(
                layer.output <= 0).float() / layer.output.numel()  # proportion of activations that are zero
            s.mean_activation = torch.mean(A)
            s.mean_backprop = torch.mean(B)

            with u.timeit(f'sigma-{i}'):
                s.sigma_l2 = sigma_k.sym_l2_norm()
                s.sigma_erank = sigma_k.trace() / s.sigma_l2

            # u.nan_check(param.hess)

            with u.timeit(f"H_l2-{i}"):
                s.H_l2 = H.sym_l2_norm()

            with u.timeit(f"norms-{i}"):
                s.H_fro = H.frobenius_norm()
                s.grad_fro = g.flatten().norm()
                s.param_fro = param.data.flatten().norm()

            with u.timeit(f"pinvH-{i}"):
                pinvH = H.pinv()

            with u.timeit(f'curv-{i}'):
                s.grad_curv = u.matmul(vecG @ H, vecG) / (vecG @ vecG)
                newton_dir = vecG @ pinvH
                s.newton_curv = u.matmul(newton_dir @ H, newton_dir) / (newton_dir @ newton_dir)

                setattr(layer.weight, 'pre', pinvH)  # save Newton preconditioner
                s.step_openai = 1 / s.grad_curv if s.grad_curv else 1234567
                s.step_div_inf = 2 / s.H_l2  # divergent step size for batch_size=infinity
                s.step_div_1 = torch.tensor(2) / H.trace()  # divergent step for batch_size=1
                s.newton_fro = newton_dir.norm()  # frobenius norm of Newton update,

                def loss_direction(d: u.Vec, step):  # improvement in loss if we go eps units in direction dir
                    return step * (d @ vecG) - 0.5 * step ** 2 * (d @ H @ vecG)

                s.regret_gradient = loss_direction(vecG, s.step_openai)
                # can compute newton regret more efficiently by doing row-vectorized instead of col-vectorized
                vecG2 = u.Vecr(g, shape=(do, di))
                pinvH_rowvec = pinvH.commute()  # original H was for col-vectorized order
                s.regret_newton = vecG2 @ pinvH_rowvec @ vecG2 / 2

            with u.timeit(f'rho-{i}'):
                # lyapunov matrix
                Xk = u.lyapunov_spectral(H.RR, sigma_k.RR)  # compare backprops
                s.rho = u.erank(u.eye_like(Xk)) / u.erank(Xk)
                s.step_div_1_adjusted = s.step_div_1 / s.rho

            with u.timeit(f"batch-{i}"):
                s.batch_openai = (H @ sigma_k).trace() / (vecG @ H @ vecG)
                if not sigma_centering:
                    s.batch_openai -= 1

                expected_grad_norm_sq = torch.norm(G, "fro") ** 2 / n  # expected gradient norm squared
                s.diversity = expected_grad_norm_sq / torch.norm(g) ** 2  # Gradient diversity / n
                s.noise_variance_pinv = (pinvH @ sigma_k).trace()
                s.H_erank = H.trace() / s.H_l2
                s.batch_jain_simple = 1 + s.H_erank
                s.batch_jain_full = 1 + s.rho * s.H_erank

            param_name = f"{layer.name}={param_names[param]}"
            u.log_scalars(u.nest_stats(f"{param_name}", s))

            setattr(param, attr_name, s)


##########################################################################################
### post-refactoring
##########################################################################################

import torch.utils


class SecondOrderCov:
    """Streaming accumulator of second-order statistics (cross-covariance) for two variables X, Y."""

    # todo: add "symmetric" flag
    def __init__(self):
        self.mean_x = None
        self.mean_y = None
        self.cov_xy = None
        self.d_x = -1
        self.d_y = -1
        self.initialized = False
        self.n = 0

    def accumulate(self, data_x, data_y):
        assert u.is_matrix(data_x)
        assert u.is_matrix(data_y)
        if not self.initialized:
            self.d_x = data_x.shape[1]
            self.d_y = data_y.shape[1]
            self.cov_xy = torch.zeros(self.d_x, self.d_y).type(data_x.dtype).to(data_x.device)
            self.mean_x = torch.zeros(self.d_x).type(data_x.dtype).to(data_x.device)
            self.mean_y = torch.zeros(self.d_y).type(data_y.dtype).to(data_y.device)
            self.initialized = True
        n = data_x.shape[0]
        assert n == data_y.shape[0]
        self.cov_xy += torch.einsum("ni,nj->ij", data_x, data_y)
        self.mean_x += torch.einsum("ni->i", data_x)
        self.mean_y += torch.einsum("ni->i", data_y)
        self.n += n

    def zero_(self):
        self.n = 0
        if self.initialized:
            self.cov_xy.zero_()
            self.mean_x.zero_()
            self.mean_y.zero_()
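SecondOrderCov keeps raw sums, so statistics can be accumulated over many mini-batches and normalized by n at the end. A numpy sketch of the same accumulation scheme (stand-alone, not using the class):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 3))
Y = rng.standard_normal((10, 2))

# accumulate raw sums over two chunks, as SecondOrderCov.accumulate does
cov_xy = np.zeros((3, 2))
mean_x = np.zeros(3)
n = 0
for Xc, Yc in ((X[:6], Y[:6]), (X[6:], Y[6:])):
    cov_xy += np.einsum('ni,nj->ij', Xc, Yc)
    mean_x += Xc.sum(axis=0)
    n += Xc.shape[0]

# normalized sums match full-batch quantities
assert np.allclose(cov_xy / n, X.T @ Y / len(X))   # E[x y']
assert np.allclose(mean_x / n, X.mean(axis=0))     # E[x]
```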


class SymmetricFourthOrderCov:
    """Fourth-order generalized covariance.
    rank=4 gives exact stats
    rank=3 uses Isserlis' theorem for compact storage
    rank=2 is equivalent to Kronecker factoring
    rank=1 (not implemented) is analogous to batch normalization
    """
    xx: SecondOrderCov
    yy: SecondOrderCov
    xy: SecondOrderCov
    xxyy: SecondOrderCov

    def __init__(self, rank=3):
        self.xx = SecondOrderCov()    # rank2
        self.yy = SecondOrderCov()    # rank2
        self.xy = SecondOrderCov()    # rank3
        self.xxyy = SecondOrderCov()  # rank4
        self.rank = rank

    def accumulate(self, data_x: torch.Tensor, data_y: torch.Tensor, cached_xx=None):
        assert u.is_matrix(data_x)
        assert u.is_matrix(data_y[0])
        n = data_x.shape[0]
        assert data_y.shape[0] == n

        if self.rank == 4:
            Jo = torch.einsum("oni,onj->onij", data_y, data_x).reshape(n, -1)
            self.xxyy.accumulate(Jo, Jo)

            # Alternative way
            # Jo = torch.einsum("oni,onj->onij", B, A)
            # H = torch.einsum('onij,onkl->ijkl', Jo, Jo) / n
            # H = H.reshape(do*di, do*di)

        else:
            if cached_xx is None:  # only recompute xx when no cached version was supplied
                self.xx.accumulate(data_x, data_x)
            self.yy.accumulate(data_y, data_y)

            if self.rank == 3:
                self.xy.accumulate(data_x, data_y)

    def zero_(self):
        self.xx.zero_()
        self.yy.zero_()
        self.xy.zero_()
        self.xxyy.zero_()


class ModuleDict(dict):

    def __init__(self, defaultcreator=None, defaultvalue=None):
        super().__init__()
        assert (defaultcreator is None) or (defaultvalue is None), "at most one of defaultcreator/defaultvalue may be set"
        self.defaultcreator = defaultcreator
        self.defaultvalue = defaultvalue

    def __getitem__(self, item):
        if item not in self:
            if self.defaultcreator is not None:
                self[item] = self.defaultcreator()
            elif self.defaultvalue is not None:
                # note: the same default object is shared by every key
                self[item] = self.defaultvalue
            else:
                assert False, f"Requested value {item} which doesn't exist in ModuleDict, and neither defaultcreator nor defaultvalue is set"
        return dict.__getitem__(self, item)  # bypass this method to avoid infinite recursion
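The lazy-default behavior can be illustrated with a minimal standalone sketch; `LazyDict` below is an illustrative stand-in, not part of the library:

```python
class LazyDict(dict):
    """Minimal sketch of the ModuleDict semantics above: missing keys are
    filled from a factory (one fresh value per key) or a shared default."""

    def __init__(self, defaultcreator=None, defaultvalue=None):
        super().__init__()
        assert defaultcreator is None or defaultvalue is None
        self.defaultcreator = defaultcreator
        self.defaultvalue = defaultvalue

    def __getitem__(self, key):
        if key not in self:
            if self.defaultcreator is not None:
                self[key] = self.defaultcreator()
            elif self.defaultvalue is not None:
                self[key] = self.defaultvalue
            else:
                raise KeyError(key)
        return dict.__getitem__(self, key)  # avoid re-entering __getitem__


d = LazyDict(defaultcreator=list)
d["a"].append(1)   # first access creates a fresh list for "a"
d["a"].append(2)
assert d["a"] == [1, 2]
assert "b" not in d  # unread keys are never materialized
```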


# Namespace of global settings used by the library internally
# Using module-level variable for settings means this library is not thread-safe.

global_settings_initialized = False


class Settings(object):
    forward_hooks: List[Callable]   # forward subhooks called by the global hook
    backward_hooks: List[Callable]  # backward subhooks
    model: Optional[nn.Module]
    hook_handles: List[torch.utils.hooks.RemovableHandle]    # removal handles of global hooks registered with PyTorch
    default_activations: Optional[ModuleDict]


    def __init__(self):
        global global_settings_initialized
        assert global_settings_initialized is False, "Reinitializing Settings seems like a bug."
        global_settings_initialized = True
        self.model = None
        self.hook_handles = []
        self.forward_hooks = []
        self.backward_hooks = []
        self.default_activations = None
        self.default_Acov = None

        # temporary settings to aggregate gradient norm squared globally
        self._hack_gradient_norm_sum = ModuleDict(defaultvalue=torch.zeros(()))
        self._hack_gradient_norm_count = ModuleDict(defaultvalue=torch.zeros(()))
        self._hack_activations_squared = None   # initialized in set_default_activations

        # TODO(y): maybe remove all of this
        # store last activations captured for each layer. While this breaks encapsulation, this considerably simplifies the common use
        # case where the same set of activations is needed for several backward aggregation calls
        self.last_captured_activations = ModuleDict()

        # To prevent a mix of saved activations from different forward calls, keep counter which indicates in which
        # context each activation value was saved. This can be used to enforce that all activations were captured in the same context
        self.last_captured_activations_contextid = ModuleDict()
        self.activations_contextid = 0   # this gets incremented for each save_activations context





global_settings = Settings()


def layer_cov_dict(model):
    """Returns a dictionary of layer->KronFactoredCov for all supported layers in model."""
    return {layer: u.KronFactoredCov() for layer in model.modules() if is_supported(layer)}


def _forward_hook(layer: nn.Module, input: List[torch.Tensor], output: torch.Tensor):
    for hook in global_settings.forward_hooks:
        hook(layer, input, output)


# TODO(y): fix signature
def _backward_hook(layer: nn.Module, _input: torch.Tensor, output: torch.Tensor):
    for hook in global_settings.backward_hooks:
        hook(layer, _input, output)


def register(model: nn.Module):
    """
    Registers given model with autograd_lib. This allows user to use decorators like save_activations(A) and module_hook
    """
    global global_settings

    # TODO(y): make it work for multiple models and test. This needs check that hook list remains a singleton
    assert not global_settings.hook_handles, "Already called register in this thread"

    global_settings.model = model

    layer: nn.Module
    for layer in model.modules():
        if _layer_type(layer) in _supported_layers:
            global_settings.hook_handles.append(layer.register_forward_hook(_forward_hook))
            layer.register_backward_hook(_backward_hook)   # don't save handle, https://github.com/pytorch/pytorch/issues/25723


def _hack_zero_gradient_norms_squared():
    for layer in global_settings._hack_gradient_norm_count:
        global_settings._hack_gradient_norm_count[layer] = 0
        global_settings._hack_gradient_norm_sum[layer] = 0
        global_settings._hack_activations_squared[layer] = None


def _hack_update_gradient_norms_squared(layer: nn.Module, backprops: torch.Tensor):
    """Trick from Ian Goodfellow https://arxiv.org/abs/1510.01799

    Add up gradient norm squared of gradients
    """
    n = backprops.shape[0]
    A2 = global_settings._hack_activations_squared[layer]   # initialized in "set_default_activations"
    assert n == A2.shape[0]
    sq_components = A2 * backprops * backprops
    global_settings._hack_gradient_norm_sum[layer] += torch.sum(sq_components)
    global_settings._hack_gradient_norm_count[layer] += n


def unregister():
    # TODO(y): switch to tensor backward hooks
    for handle in global_settings.hook_handles:
        handle.remove()
    global_settings.hook_handles.clear()


from contextlib import contextmanager


@contextmanager
def save_activations(storage: ModuleDict):
    """Save activations to layer storage: storage[layer] = activations
    """

    assert global_settings.hook_handles, "No hooks have been registered."
    hook_called = [False]

    global_settings.activations_contextid += 1

    def hook(layer: nn.Module, input: List[torch.Tensor], _output: torch.Tensor):
        if layer in storage:
            print("warning, overwriting existing activation for layer ", layer)
        hook_called[0] = True
        activations = input[0].detach()
        storage[layer] = activations
        global_settings.last_captured_activations[layer] = activations
        global_settings.last_captured_activations_contextid[layer] = global_settings.activations_contextid
        assert len(global_settings.last_captured_activations) < 200, "warning, possibly leaking activations, got more than 200"

    global_settings.forward_hooks.append(hook)
    yield
    assert hook_called[0], "Forward hook was never called."
    global_settings.forward_hooks.pop()


@contextmanager
def extend_backprops(storage: ModuleDict):
    """Extends list of backprops in storage with current backprops. storage[layer].extend([backprops])
    """

    assert global_settings.hook_handles, "No hooks have been registered."
    hook_called = [False]

    def hook(layer: nn.Module, _input, output):
        storage.setdefault(layer, []).extend([output[0].detach()])
        hook_called[0] = True

    global_settings.backward_hooks.append(hook)
    yield
    assert hook_called[0], "Backward hook was never called."
    global_settings.backward_hooks.pop()


@contextmanager
def module_hook(hook: Callable):
    """Context manager for running given hook on forward or backward."""

    # TODO(y): maybe add checking for arg types on hook to catch forward/backward hook mismatches
    # TODO(y): use weak ref for the hook handles so they are removed when model goes out of scope
    assert global_settings.hook_handles, "Global hooks have not been registered. Make sure to call .register(model) on your model"
    forward_hook_called = [False]
    backward_hook_called = [False]

    def forward_hook(layer: nn.Module, input: Tuple[torch.Tensor], output: torch.Tensor):
        assert len(input) == 1, "Only support single input modules on forward."
        assert type(output) == torch.Tensor, "Only support single output modules on forward."
        activations = input[0].detach()
        hook(layer, activations, output)
        forward_hook_called[0] = True

    def backward_hook(layer: nn.Module, input: Tuple[torch.Tensor], output: Tuple[torch.Tensor]):
        assert len(output) == 1, "Only support single output modules on backward."
        backprops = output[0].detach()
        hook(layer, input, backprops)
        backward_hook_called[0] = True

    global_settings.forward_hooks.append(forward_hook)
    global_settings.backward_hooks.append(backward_hook)
    yield
    assert forward_hook_called[0] or backward_hook_called[0], "Hook was called neither on forward nor backward pass, did you register your model?"
    assert not (forward_hook_called[0] and backward_hook_called[0]), "Hook was called both on forward and backward pass, did you register your model?"
    global_settings.forward_hooks.pop()
    global_settings.backward_hooks.pop()


@contextmanager
def save_activations2():
    """Save activations to layer storage: storage[layer] = activations
    """

    activations = {}
    def saveit(layer, A, _):
        activations[layer] = A
    with module_hook(saveit):
        yield activations


""""
Concept: backprop_func

Given output tensor, create a batch of matrices used for backprop.

Can be used to compute per-example Hessians if this function returns X where XiXi'=Hi
Examples: identity -- torch.eye for each example
Examples: cross_entropy_hessian -- Hessian of cross-entropy function
Examples: cross_entropy_rank1 -- Hessian of cross-entropy function, rank-1 approximation per example
Examples: cross_entropy_average -- average Hessian, low rank approximation
"""


def backward(tensor, backward_func, retain_graph=False):
    """Custom backprop: backpropagates each matrix produced by backward_func(tensor),
    retaining the graph between intermediate passes."""

    vals = backward_func(tensor)
    o = len(vals)
    for idx, hess in enumerate(vals):
        tensor.backward(hess, retain_graph=(retain_graph or idx < o - 1))


def backward_accum(tensor, backward_func, storage: ModuleDict, retain_graph=False, update_gradient_norm=False):
    """
    Backpropagates from given tensor and updates FourthOrderCov for each layer in storage.


    Args:
        tensor: tensor to backpropagate from
        backward_func: function used to generate backward values. See "backward functions" below. Special value of 1 means a single backprop of ones
        storage: layer->FourthOrderCov ModuleDict storage
        retain_graph: whether to retain the graph (and activations) after the final backward pass
        update_gradient_norm: whether to update gradient norm squared estimates

    """

    if backward_func == 1:
        backward_func = backward_ones

    elif backward_func == 'identity':
        backward_func = backward_identity
    elif backward_func == 'xent':
        backward_func = backward_xent

    assert global_settings.hook_handles, "No hooks have been registered."
    hook_called = [False]

    def hook(layer: nn.Module, _input, output):
        backprops = output[0].detach()
        A = global_settings.default_activations
        Acov = global_settings.default_Acov
        if update_gradient_norm:
            _hack_update_gradient_norms_squared(layer, backprops)
        storage[layer].accumulate(data_x=A[layer], data_y=backprops, cached_xx=Acov)
        hook_called[0] = True

    global_settings.backward_hooks.append(hook)
    backward(tensor, backward_func, retain_graph)
    assert hook_called[0], "Backward hook was never called."
    global_settings.backward_hooks.pop()


# def backward_kron(target, tensor, A, A_cov, gradient):
#     """Calls backward, and aggrates covariance of backward values. If activations are provided, also updates cross covariance"""
#
#     def hook(module: nn.Module, _input, output):
#         """Appends all backprops (Jacobian Lops from upstream) to layer.backprops_list.
#         Using list in order to capture multiple backprop values for a single batch. Use util.clear_backprops(model)
#         to clear all saved values.
#         """
#         backprops = output[0].detach()
#         buffer[module].cov = KronFactoredCov
#         if activations:
#             activations = activations[module]
#
#         # compute covariance matrix
#
#     with backward_hook(hook):
#         tensor.backwards(gradient)


def set_default_activations(A):
    global_settings.default_activations = A

    global_settings._hack_activations_squared = ModuleDict()
    for layer in A:
        global_settings._hack_activations_squared[layer] = A[layer] * A[layer]


def set_default_Acov(Acov):
    global_settings.default_Acov = Acov


# Backward functions.
# These accept output tensor and produce a list of matrices [...,mi,...] suitable for output.backward(mi)
def backward_xent(output):
    raise NotImplementedError  # TODO(y): produce cross-entropy backward values


def backward_identity(tensor):
    assert u.is_matrix(tensor), "Only support rank-2 outputs."
    n, o = tensor.shape

    id_mat = u.eye(o)
    hess = []
    for out_idx in range(o):
        hess.append(torch.stack([id_mat[out_idx]] * n))

    return hess


def backprop_identity(output, retain_graph=False) -> None:
    """
    Helper to find Jacobian with respect to given tensor. Backpropagates a row of identity matrix
    for each output of tensor. Rows are replicated across batch dimension.

    Args:
        output: target of backward
        retain_graph: same meaning as PyTorch retain_graph
    """

    assert u.is_matrix(output), "Only support rank-2 outputs."""

    n, o = output.shape
    id_mat = u.eye(o)
    for idx in range(o):
        output.backward(torch.stack([id_mat[idx]] * n), retain_graph=(retain_graph or idx < o - 1))


# TODO(y): rename to backward or backward_jacobian


def backward_ones(output):
    return [torch.ones_like(output)]

# backward_jacobian(strategy='exact')
# backward_jacobian(strategy='sampled')
# backward_hessian(loss='cross_entropy', strategy='exact')
# backward_hessian(loss='cross_entropy', strategy='sampled')


def backward_jacobian(output, sampled=False, retain_graph=False) -> None:
    """
    Helper to find Jacobian with respect to given tensor. Backpropagates a row of identity matrix
    for each output of tensor. Rows are replicated across batch dimension.

    Args:
        output: target of backward
        retain_graph: same meaning as PyTorch retain_graph
        sampled:
    """

    assert u.is_matrix(output), "Only support rank-2 outputs."""
    # assert strategy in ('exact', 'sampled')

    n, o = output.shape
    if not sampled:
        id_mat = torch.eye(o).to(gl.device)
        for idx in range(o):
            output.backward(torch.stack([id_mat[idx]] * n), retain_graph=(retain_graph or idx < o - 1))
    else:
        vals = torch.LongTensor(n, o).to(gl.device).random_(0, 2) * 2 - 1
        vals = vals.type(torch.get_default_dtype())
        vals /= o  # factor to preserve magnitudes from exact case.
        # switching to subsampling, kfac_fro became 1000x smaller, diversity became 300x larger, kfac_l2 unaffected
        output.backward(vals, retain_graph=retain_graph)
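The sampled branch replaces the o exact backward passes with a single pass using a Rademacher probe vector, a Hutchinson-style estimator. A small numpy sketch (independent of this library) of the identity it relies on, E[(Jᵀv)(Jᵀv)ᵀ] = JᵀJ for v with independent ±1 entries:

```python
import numpy as np

rng = np.random.default_rng(0)
o, d = 3, 2
J = rng.standard_normal((o, d))           # stand-in Jacobian

# Rademacher probes: entries are +/-1, so E[v v^T] = I and hence
# E[(J^T v)(J^T v)^T] = J^T E[v v^T] J = J^T J
trials = 20000
V = rng.integers(0, 2, size=(trials, o)) * 2 - 1
JV = V @ J                                # each row is v^T J for one probe
est = JV.T @ JV / trials

assert np.allclose(est, J.T @ J, atol=0.1)
```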


def backward_hessian(output, loss='CrossEntropy', sampled=False, retain_graph=False) -> None:
    assert loss in ('CrossEntropy',), f"Only CrossEntropy loss is supported, got {loss}"
    assert u.is_matrix(output)

    # use Cholesky-like decomposition from https://www.wolframcloud.com/obj/yaroslavvb/newton/square-root-formulas.nb
    n, o = output.shape
    p = F.softmax(output, dim=1)

    mask = torch.eye(o).to(gl.device).expand(n, o, o)
    diag_part = p.sqrt().unsqueeze(2).expand(n, o, o) * mask
    hess_sqrt = diag_part - torch.einsum('ij,ik->ijk', p.sqrt(), p)   # n, o, o

    if not sampled:
        for out_idx in range(o):
            output.backward(hess_sqrt[:, out_idx, :], retain_graph=(retain_graph or out_idx < o - 1))
    else:
        vals = torch.LongTensor(n, o).to(gl.device).random_(0, 2) * 2 - 1
        vals = vals.type(torch.get_default_dtype())/o
        mixed_vector = torch.einsum('nop,no->np', hess_sqrt, vals)
        output.backward(mixed_vector, retain_graph=retain_graph)
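The Cholesky-like factor used above can be checked numerically: with p = softmax(output), the rows being backpropagated form S with S[j, k] = √p_j (δ_jk − p_k), and contracting over the backward passes gives SᵀS = diag(p) − p pᵀ, the exact Hessian of cross-entropy with respect to the logits. A numpy sketch of that identity (independent of this library):

```python
import numpy as np

def xent_hessian_sqrt(logits):
    """Square-root factor of the softmax cross-entropy Hessian:
    S[j, k] = sqrt(p_j) * (delta_jk - p_k), so that S.T @ S = diag(p) - p p^T."""
    z = np.exp(logits - logits.max())
    p = z / z.sum()
    S = np.sqrt(p)[:, None] * (np.eye(len(p)) - p[None, :])
    return p, S

p, S = xent_hessian_sqrt(np.array([1.0, 2.0, 0.5]))
H = np.diag(p) - np.outer(p, p)  # exact Hessian of cross-entropy wrt logits
assert np.allclose(S.T @ S, H)
```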


def grad_norms(A, B, m=None, approx='zero_order'):
    """
    Compute per-example gradient norms squared with respect to a given metric.

    "zero_order" approximation uses the Euclidean metric (standard gradient norms); otherwise the
    metric tensor is recovered from one or more moments defined as follows on the population
    activation/backprop values A, B:
        m.a = einsum('ni->i', A)
        m.b = einsum('nk->k', B)
        m.AA = einsum('ni,nj->ij', A, A)
        m.BB = einsum('nk,nl->kl', B, B)
        m.BA = einsum('nk,ni->ki', B, A)
        m.BABA = einsum('nk,ni,nl,nj->kilj', B, A, B, A)

    Args:
        A: (n, d1) tensor of activations
        B: (n, d2) tensor of backprops
        m: expected moments of covariance/curvature tensor
        approx: which approximation to use to reconstruct the metric tensor from the moments.
            "zero_order": ignore moments and use the Euclidean metric
            "kfac": use m.AA, m.BB as Mijkl=Mij*Mkl
            "isserlis": use m.AA, m.BB, m.BA, m.a, m.b as Mijkl=Mij*Mkl+Mik*Mjl+Mil*Mjk-2*Mi*Mj*Mk*Ml
            "full": full 4th-order moment, use m.BABA

    Returns:
        (n,) tensor of per-example gradient norms squared
    """
    if approx == 'zero_order':
        norms = (A * A).sum(dim=1) * (B * B).sum(dim=1)
    elif approx == 'kfac':
        Am, Bm = A @ m.AA, B @ m.BB
        norms = (Am * A).sum(dim=1) * (Bm * B).sum(dim=1)
        # equivalent to torch.einsum('nk,ni,lk,ij,nl,nj->n', B, A, BB, AA, B, A)

    elif approx == 'isserlis':
        kfac = torch.einsum('nk,ni,lk,ij,nl,nj->n', B, A, m.BB, m.AA, B, A)
        cross1 = torch.einsum('nk,ni,ki,lj,nl,nj->n', B, A, m.BA, m.BA, B, A)
        cross2 = torch.einsum('nk,ni,li,kj,nl,nj->n', B, A, m.BA, m.BA, B, A)
        first_order = torch.einsum('nk,ni,i,j,k,l,nl,nj->n', B, A, m.a, m.a, m.b, m.b, B, A)
        norms = kfac + cross1 + cross2 - 2*first_order
    else:
        assert approx == 'full'
        norms = torch.einsum('ni,nk,nj,nl,likj->n', A, B, A, B, m.BABA)
    return norms
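The 'zero_order' branch is the per-example norm trick cited earlier (Goodfellow, arXiv:1510.01799): for a linear layer the per-example gradient is the outer product b_n a_nᵀ, so its squared Frobenius norm factors into ‖a_n‖² ‖b_n‖². A numpy check of that factorization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d1, d2 = 4, 3, 5
A = rng.standard_normal((n, d1))   # activations
B = rng.standard_normal((n, d2))   # backprops

# fast path: factored norms, as in the zero_order branch above
norms_fast = (A * A).sum(axis=1) * (B * B).sum(axis=1)

# slow path: materialize each per-example gradient outer(b_n, a_n)
norms_slow = np.array([np.sum(np.outer(B[i], A[i]) ** 2) for i in range(n)])

assert np.allclose(norms_fast, norms_slow)
```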


def offset_losses(A, B, alpha, offset, m, approx='zero_order'):
    """
    Evaluates expected improvement in loss on example i after taking a gradient step along the gradient of example i+offset.

    If alpha is None, uses the optimal learning rate for minimizing the loss on example i in the direction of the i+offset gradient.

    Returns:
        (n,) tensor of improvements
    """
    if approx == 'zero_order':
        norms = (A * A).sum(dim=1) * (B * B).sum(dim=1)
    elif approx == 'kfac':
        Am, Bm = A @ m.AA, B @ m.BB
        norms = (Am * A).sum(dim=1) * (Bm * B).sum(dim=1)
        # equivalent to torch.einsum('nk,ni,lk,ij,nl,nj->n', B, A, BB, AA, B, A)
    elif approx == 'isserlis':  # TODO(y): currently this runs out of memory, optimize the einsums below 
        kfac = torch.einsum('nk,ni,lk,ij,nl,nj->n', B, A, m.BB, m.AA, B, A)
        cross1 = torch.einsum('nk,ni,ki,lj,nl,nj->n', B, A, m.BA, m.BA, B, A)
        cross2 = torch.einsum('nk,ni,li,kj,nl,nj->n', B, A, m.BA, m.BA, B, A)
        first_order = torch.einsum('nk,ni,i,j,k,l,nl,nj->n', B, A, m.a, m.a, m.b, m.b, B, A)
        norms = kfac + cross1 + cross2 - 2*first_order
    else:
        assert approx == 'full'
        norms = torch.einsum('ni,nk,nj,nl,likj->n', A, B, A, B, m.BABA)

    Ad = torch.roll(A, offset, 0)
    Bd = torch.roll(B, offset, 0)
    dot_prods = (A * Ad).sum(dim=1) * (B * Bd).sum(dim=1)
    if alpha is None:  # use optimal step for given direction
        improvements = 1/2 * dot_prods*dot_prods / norms
    else:
        improvements = alpha*dot_prods - 1/2 * alpha**2 * norms

    return improvements
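When alpha is None the code takes the maximizer of the quadratic model α·c − ½ α²·N (c the gradient dot product, N the squared norm): α* = c/N, which gives improvement ½ c²/N. A quick numerical check of that calculus with illustrative values:

```python
import numpy as np

c, N = 3.0, 4.0                        # illustrative dot product and squared norm
alphas = np.linspace(-5, 5, 100001)
improvements = alphas * c - 0.5 * alphas ** 2 * N

# maximum over step sizes matches the closed form 0.5 * c^2 / N used above
assert np.isclose(improvements.max(), 0.5 * c * c / N, atol=1e-6)
```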


def offset_cosines(A, B, offset=1):
    """
    Evaluates cosines between per-example gradients of example i and example i+offset.

    Returns:
        (n,) tensor of cosines
    """
    assert offset != 0

    Ad = torch.roll(A, offset, 0)
    Bd = torch.roll(B, offset, 0)
    dot_products = (A * Ad).sum(dim=1) * (B * Bd).sum(dim=1)
    norms1 = (A * A).sum(dim=1) * (B * B).sum(dim=1)
    norms2 = (Ad * Ad).sum(dim=1) * (Bd * Bd).sum(dim=1)
    cosines_squared = dot_products*dot_products/(norms1 * norms2)
    cosines = torch.sqrt(cosines_squared)
    # note: division by small norms is unstable in float32; cast dot_products/norms to float64 if cosines look off
    return cosines


def offset_dotprod(A, B, offset=1):
    """
    Evaluates dot products between per-example gradients of example i and example i+offset.

    Returns:
        (n,) tensor of dot products
    """
    assert offset != 0

    Ad = torch.roll(A, offset, 0)
    Bd = torch.roll(B, offset, 0)
    dot_products = (A * Ad).sum(dim=1) * (B * Bd).sum(dim=1)
    return dot_products


def grad_curvs(A, B, metric):
    Am, Bm = A @ metric.AA, B @ metric.BB
    norms_before = (A * A).sum(dim=1) * (B * B).sum(dim=1)
    norms_after = (Am * A).sum(dim=1) * (Bm * B).sum(dim=1)
    return norms_after / norms_before



================================================
FILE: autotune/autograd_lib_test.py
================================================
import sys
from collections import namedtuple, defaultdict

import autograd_lib
import pytest

import util as u

# Test exact Hessian computation

# import torch
from typing import Callable

import torch
import torch.nn as nn

from attrdict import AttrDefault, AttrDict


def simple_model(d, num_layers):
    """Creates simple linear neural network initialized to identity"""
    layers = []
    for i in range(num_layers):
        layer = nn.Linear(d, d, bias=False)
        layer.weight.data.copy_(torch.eye(d))
        layers.append(layer)
    return torch.nn.Sequential(*layers)


def test_hooks():
    d = 1
    model = simple_model(d, num_layers=5)
    autograd_lib.register(model)

    A1, A2, A3 = {}, {}, {}
    x = torch.ones(1, d)

    with autograd_lib.save_activations(A1):
        y = model(2 * x)

    with autograd_lib.save_activations(A2):
        with autograd_lib.save_activations(A3):
            y = model(x)

    B1 = {}
    B2 = {}
    with autograd_lib.extend_backprops(B1):
        y.backward(x, retain_graph=True)

    model[2].weight.requires_grad = False
    for layer in model:
        del layer.weight.grad

    # model.clear_grads()
    with autograd_lib.extend_backprops(B2):
        y.backward(2 * x)

    print(B2.values())
    for layer in model:
        print(layer.weight.grad)

    for layer in model:
        assert A1[layer] == 2 * x
        assert A2[layer] == x
        assert A3[layer] == x
        assert B1[layer] == [x]
        assert B2[layer] == [2 * x]

    autograd_lib.unregister()


def _test_activations_contextmanager():
    d = 5
    model = simple_model(d, num_layers=2)
    autograd_lib.register(model)

    A1, A2, A3 = {}, {}, {}
    x = torch.ones(1, d)

    with autograd_lib.save_activations(A1):
        y = model(x)
        with autograd_lib.save_activations(A2):
            z = model[1](x)

    context_ids = autograd_lib.global_settings.last_captured_activations_contextid
    assert context_ids[model[1]] == context_ids[model[0]] + 1


# def _test_backprop():
#     d = 1
#     model = simple_model(d, num_layers=5)
#     autograd_lib.register(model)
#
#     x = torch.ones(2, d)
#     y = model(x)
#
#     # make sure buffers get freed, second call will cause a crash
#     autograd_lib.backward(y, kind='identity')
#     with pytest.raises(RuntimeError, match=r".*retain_graph=True.*"):
#         autograd_lib.backward(y, kind='identity')
#
#     y = model(x)
#     B = {}
#     with autograd_lib.save_backprops(B):
#         autograd_lib.backward(y, kind='identity', retain_graph=True)
#     u.check_equal(B[model[0]], [x])
#
#     with autograd_lib.save_backprops(B):
#         autograd_lib.backward(y, kind='identity', retain_graph=True)
#     u.check_equal(B[model[0]], [x, x])
#
#     autograd_lib.unregister()


def test_jacobian():
    # ground truth for unit tests from
    # https://www.wolframcloud.com/obj/yaroslavvb/newton/linear-jacobians-and-hessians.nb

    def init_model(B, X, A):
        """  Initializes the model Y=B'XA
        """
        B, X, A = u.to_pytorches(B, X, A)
        n = A.shape[1]
        d1, d2 = X.shape
        d3 = B.shape[1]

        # Do a test using Linear layers instead of matrix multiplies
        model: u.SimpleFullyConnected2 = u.SimpleFullyConnected2([d1, d2, d3], bias=False)
        model.layers[0].weight.data.copy_(X)
        model.layers[1].weight.data.copy_(B.t())

        def eval():
            return model(A.t())

        return eval, model.layers[0].weight

    # default Kronecker rules give result in vec order.
    # A*B=>(B*A)'  gives scalar for vector or scalar jacobian in vecr order
    # For matrix/matrix Jacobian must also switch the first two dimensions

    # matrix variable, scalar output
    torch.set_default_dtype(torch.float64)
    B = torch.tensor([[-4.], [2]])
    X = torch.tensor([[-5., 0], [-2, -6]], requires_grad=True)
    A = torch.tensor([[-1.], [3]])
    d_out, d_in = X.shape
    Y_func, X_var = init_model(B, X, A)
    Y = Y_func()
    u.check_equal(Y, [[-52]])

    J = u.jacobian(Y, X_var)
    assert J.shape == (1, 1, 2, 2)
    J = J.reshape(2, 2)

    u.check_equal(J, u.kron(B, A).T.reshape(d_out, d_in))
    u.check_equal(J, [[4, -12], [-2, 6]])

    # matrix variable, vector output, dvecr Y/dvecr X
    B = [[-4, 3], [2, 6]]
    B, X, A = u.to_pytorches(B, X, A)
    Y_func, X_var = init_model(B, X, A)
    Y = Y_func()
    u.check_equal(Y, [[-52, -81]])
    J = u.jacobian(Y, X_var)
    assert J.shape == (1, 2, 2, 2)
    J1 = u.kron(B, A).T
    assert J1.shape == (2, 4)  # output and input directions are flattened
    u.check_equal(J, J1.reshape(J.shape))
    u.check_equal(J.reshape(J1.shape), J1)

    # matrix variable, matrix output, dvecr Y/dvecX
    A = torch.tensor([[-1., 4], [3, 0]])
    B, X, A = u.to_pytorches(B, X, A)
    Y_func, X_var = init_model(B, X, A)
    Y = Y_func()

    J = u.jacobian(Y, X_var)
    J = J.transpose(0, 1)  # dvecrY/dvecr X -> dvecY/dvecr X
    assert J.shape == (2, 2, 2, 2)

    J1 = u.kron(B, A).T  # this gives order where variable is row vectorized, but output is column vectorized
    assert J1.shape == (4, 4)
    u.check_equal(J, J1.reshape(J.shape))
    u.check_equal(J.reshape(J1.shape), J1)

    # Hessian of matrix variable,  x output
    loss = (Y * Y).sum() / 2
    hess = u.hessian(loss, X_var)
    assert hess.shape == (2, 2, 2, 2)
    hess1 = u.kron(B @ B.t(), A @ A.t())
    assert hess1.shape == (4, 4)
    u.check_equal(hess1.reshape(hess.shape), hess)
    u.check_equal(hess1, hess.reshape(hess1.shape))


def create_toy_model():
    """
    Create model from https://www.wolframcloud.com/obj/yaroslavvb/newton/linear-jacobians-and-hessians.nb
    PyTorch works on transposed representation, hence to obtain Y from notebook, do model(A.T).T
    """

    model: u.SimpleFullyConnected2 = u.SimpleFullyConnected2([2, 2, 2], bias=False)
    autograd_lib.register(model)

    A = torch.tensor([[-1., 4], [3, 0]])
    B = torch.tensor([[-4., 3], [2, 6]])
    X = torch.tensor([[-5., 0], [-2, -6]], requires_grad=True)

    model.layers[0].weight.data.copy_(X)
    model.layers[1].weight.data.copy_(B.t())
    return A, model


def test_gradient_norms():
    """Per-example gradient norms."""
    u.seed_random(1)
    A, model = create_toy_model()

    activations = {}

    def save_activations(layer, a, _):
        if layer != model.layers[0]:
            return
        activations[layer] = a

    with autograd_lib.module_hook(save_activations):
        Y = model(A.t())
        loss = torch.sum(Y * Y) / 2

    norms = {}

    def compute_norms(layer, _, b):
        if layer != model.layers[0]:
            return
        a = activations[layer]
        del activations[layer]  # release memory kept by activations
        norms[layer] = (a * a).sum(dim=1) * (b * b).sum(dim=1)

    with autograd_lib.module_hook(compute_norms):
        loss.backward()

    u.check_equal(norms[model.layers[0]], [3493250, 9708800])


def test_full_hessian():
    u.seed_random(1)
    A, model = create_toy_model()
    data = A.t()
    #    data = data.repeat(3, 1)
    activations = {}

    hess = defaultdict(float)

    def save_activations(layer, a, _):
        activations[layer] = a

    with autograd_lib.module_hook(save_activations):
        Y = model(A.t())
        loss = torch.sum(Y * Y) / 2

    def compute_hess(layer, _, B):
        A = activations[layer]
        n = A.shape[0]

        di = A.shape[1]
        do = B.shape[1]

        BA = torch.einsum("nl,ni->nli", B, A)
        hess[layer] += torch.einsum('nli,nkj->likj', BA, BA)

    with autograd_lib.module_hook(compute_hess):
        autograd_lib.backprop_identity(Y, retain_graph=True)

    # check against autograd
    hess_autograd = u.hessian(loss, model.layers[0].weight)
    hess0 = hess[model.layers[0]]
    u.check_equal(hess_autograd, hess0)

    # check against manual solution
    u.check_equal(hess0.reshape(4, 4),
                  [[425, -75, 170, -30], [-75, 225, -30, 90], [170, -30, 680, -120], [-30, 90, -120, 360]])


def test_full_fisher():
    u.seed_random(1)
    A, model = create_toy_model()

    activations = {}

    def save_activations(layer, a, _):
        if layer != model.layers[0]:
            return
        activations[layer] = a

    with autograd_lib.module_hook(save_activations):
        Y = model(A.t())
        loss = torch.sum(Y * Y) / 2

    fisher = [0]

    def compute_fisher(layer, _, B):
        if layer != model.layers[0]:
            return
        A = activations[layer]
        n = A.shape[0]

        di = A.shape[1]
        do = B.shape[1]

        Jo = torch.einsum("ni,nj->nij", B, A).reshape(n, -1)
        fisher[0] += torch.einsum('ni,nj->ij', Jo, Jo)

    with autograd_lib.module_hook(compute_fisher):
        loss.backward()

    result0 = torch.tensor([[5.383625e+06, -3.675000e+03, 4.846250e+06, -6.195000e+04],
                            [-3.675000e+03, 1.102500e+04, -6.195000e+04, 1.858500e+05],
                            [4.846250e+06, -6.195000e+04, 4.674500e+06, -1.044300e+06],
                            [-6.195000e+04, 1.858500e+05, -1.044300e+06, 3.132900e+06]])
    u.check_close(fisher[0], result0)


def test_full_fisher_multibatch():
    torch.set_default_dtype(torch.float64)
    u.seed_random(1)
    A, model = create_toy_model()

    activations = {}

    def save_activations(layer, a, _):
        if layer != model.layers[0]:
            return
        activations[layer] = a

    fisher = [0]

    def compute_fisher(layer, _, B):
        if layer != model.layers[0]:
            return
        A = activations[layer]
        n = A.shape[0]

        di = A.shape[1]
        do = B.shape[1]

        Jo = torch.einsum("ni,nj->nij", B, A).reshape(n, -1)
        fisher[0] += torch.einsum('ni,nj->ij', Jo, Jo)

    for x in A.t():
        with autograd_lib.module_hook(save_activations):
            y = model(x)
            loss = torch.sum(y * y) / 2

        with autograd_lib.module_hook(compute_fisher):
            loss.backward()

    # result computed using single step forward prop
    result0 = torch.tensor([[5.383625e+06, -3.675000e+03, 4.846250e+06, -6.195000e+04],
                            [-3.675000e+03, 1.102500e+04, -6.195000e+04, 1.858500e+05],
                            [4.846250e+06, -6.195000e+04, 4.674500e+06, -1.044300e+06],
                            [-6.195000e+04, 1.858500e+05, -1.044300e+06, 3.132900e+06]])
    u.check_close(fisher[0], result0)
    # check against autograd
    # hess0 = u.hessian(loss, model.layers[0].weight).reshape([4, 4])
    # u.check_equal(hess[0], hess0)


def test_kfac_hessian():
    A, model = create_toy_model()
    data = A.t()
    data = data.repeat(7, 1)
    n = float(len(data))

    activations = {}
    hess = defaultdict(lambda: AttrDefault(float))

    def save_activations(layer, a, _):
        activations[layer] = a

    def compute_hessian(layer, _, B):
        A = activations[layer]
        hess[layer].AA += torch.einsum("ni,nj->ij", A, A)
        hess[layer].BB += torch.einsum("ni,nj->ij", B, B)

    for x in data:
        with autograd_lib.module_hook(save_activations):
            y = model(x)
            o = y.shape[1]
            loss = torch.sum(y * y) / 2

        with autograd_lib.module_hook(compute_hessian):
            autograd_lib.backprop_identity(y)

    hess0 = hess[model.layers[0]]
    result = u.kron(hess0.BB / n, hess0.AA / o)

    # check result against autograd
    loss = u.least_squares(model(data), aggregation='sum')
    hess0 = u.hessian(loss, model.layers[0].weight).reshape(4, 4)
    u.check_equal(hess0, result)
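
# The KFAC factorization checked above is exact here because each example's
# contribution is rank one and the data repeats a single point: by the Kronecker
# mixed-product property, (b b^T) kron (a a^T) = (b kron a)(b kron a)^T. A minimal
# NumPy sketch of that identity (names are illustrative, unrelated to the harness):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(3)  # activation vector for one example
b = rng.standard_normal(2)  # backprop vector for one example

# Kronecker-factored form: (b b^T) kron (a a^T)
kfac = np.kron(np.outer(b, b), np.outer(a, a))

# full form: outer product of the flattened per-example Jacobian kron(b, a)
j = np.kron(b, a)
full = np.outer(j, j)

# exact for a single example; KFAC becomes an approximation once the two
# factors are averaged separately over a batch of distinct examples
assert np.allclose(kfac, full)
```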


def test_full_hessian_multibatch():
    A, model = create_toy_model()
    data = A.t()
    data = data.repeat(3, 1)
    n = float(len(data))

    activations = {}
    hess = defaultdict(float)

    def save_activations(layer, a, _):
        activations[layer] = a

    def compute_hessian(layer, _, B):
        A = activations[layer]
        BA = torch.einsum("nl,ni->nli", B, A)
        hess[layer] += torch.einsum('nli,nkj->likj', BA, BA)

    for x in data:
        with autograd_lib.module_hook(save_activations):
            y = model(x)
            loss = torch.sum(y * y) / 2

        with autograd_lib.module_hook(compute_hessian):
            autograd_lib.backprop_identity(y)

    result = hess[model.layers[0]]

    # check result against autograd
    loss = u.least_squares(model(data), aggregation='sum')
    hess0 = u.hessian(loss, model.layers[0].weight)
    u.check_equal(hess0, result)


def test_diagonal_hessian():
    u.seed_random(1)
    A, model = create_toy_model()

    activations = {}

    def save_activations(layer, a, _):
        if layer != model.layers[0]:
            return
        activations[layer] = a

    with autograd_lib.module_hook(save_activations):
        Y = model(A.t())
        loss = torch.sum(Y * Y) / 2

    hess = [0]

    def compute_hess(layer, _, B):
        if layer != model.layers[0]:
            return
        A = activations[layer]
        hess[0] += torch.einsum("ni,nj->ij", B * B, A * A).reshape(-1)

    with autograd_lib.module_hook(compute_hess):
        autograd_lib.backprop_identity(Y, retain_graph=True)

    # check against autograd
    hess0 = u.hessian(loss, model.layers[0].weight).reshape([4, 4])
    u.check_equal(hess[0], torch.diag(hess0))

    # check against manual solution
    u.check_equal(hess[0], [425., 225., 680., 360.])
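
# The B * B / A * A einsum above reads off the Hessian diagonal without
# materializing the full matrix: entry (li, li) of sum_n (b_n kron a_n)(b_n kron a_n)^T
# is sum_n b_nl^2 * a_ni^2. A standalone NumPy sketch of the equivalence
# (illustrative names, independent of the test harness):

```python
import numpy as np

rng = np.random.default_rng(0)
n, di, do = 4, 3, 2
A = rng.standard_normal((n, di))
B = rng.standard_normal((n, do))

# diagonal via the elementwise-square einsum used in the test
diag_fast = np.einsum("ni,nj->ij", B * B, A * A).reshape(-1)

# diagonal of the explicitly formed full matrix
J = np.einsum("ni,nj->nij", B, A).reshape(n, -1)
diag_full = np.diag(J.T @ J)

assert np.allclose(diag_fast, diag_full)
```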


def test_full_hessian_xent():
    u.seed_random(1)
    torch.set_default_dtype(torch.float64)

    batch_size = 1
    d = [2, 2]
    o = d[-1]
    n = batch_size
    train_steps = 1

    model: u.SimpleModel = u.SimpleFullyConnected(d, nonlin=True, bias=True)
    model.layers[0].weight.data.copy_(torch.eye(2))
    autograd_lib.register(model)
    loss_fn = torch.nn.CrossEntropyLoss()

    data = u.to_logits(torch.tensor([[0.7, 0.3]]))
    targets = torch.tensor([0])

    data = data.repeat([3, 1])
    targets = targets.repeat([3])
    n = len(data)

    activations = {}
    hess = defaultdict(float)

    def save_activations(layer, a, _):
        activations[layer] = a

    with autograd_lib.module_hook(save_activations):
        Y = model(data)
        loss = loss_fn(Y, targets)

    def compute_hess(layer, _, B):
        A = activations[layer]
        BA = torch.einsum("nl,ni->nli", B, A)
        hess[layer] += torch.einsum('nli,nkj->likj', BA, BA)

    with autograd_lib.module_hook(compute_hess):
        autograd_lib.backward_hessian(Y, loss='CrossEntropy', retain_graph=True)

    # check against autograd
    # 0.1459
    hess_autograd = u.hessian(loss, model.layers[0].weight)
    hess0 = hess[model.layers[0]] / n
    u.check_equal(hess_autograd, hess0)
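
# backward_hessian(..., loss='CrossEntropy') is assumed here to propagate the
# standard softmax cross-entropy Hessian with respect to the logits,
# diag(p) - p p^T. That closed form can be sanity-checked against finite
# differences in plain NumPy (illustrative, independent of the library):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def xent(z, t):
    return -np.log(softmax(z)[t])

z = np.array([0.7, 0.3, -0.2])
t = 0
p = softmax(z)
H_analytic = np.diag(p) - np.outer(p, p)  # Hessian of cross-entropy w.r.t. logits

# central second differences of the loss
eps = 1e-5
H_fd = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        zpp = z.copy(); zpp[i] += eps; zpp[j] += eps
        zpm = z.copy(); zpm[i] += eps; zpm[j] -= eps
        zmp = z.copy(); zmp[i] -= eps; zmp[j] += eps
        zmm = z.copy(); zmm[i] -= eps; zmm[j] -= eps
        H_fd[i, j] = (xent(zpp, t) - xent(zpm, t)
                      - xent(zmp, t) + xent(zmm, t)) / (4 * eps * eps)

assert np.allclose(H_analytic, H_fd, atol=1e-4)
```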


def test_full_hessian_xent_multibatch():
    u.seed_random(1)
    torch.set_default_dtype(torch.float64)

    batch_size = 1
    d = [2, 2]
    o = d[-1]
    n = batch_size
    train_steps = 1

    model: u.SimpleModel = u.SimpleFullyConnected2(d, nonlin=True, bias=True)
    model.layers[0].weight.data.copy_(torch.eye(2))
    autograd_lib.register(model)
    loss_fn = torch.nn.CrossEntropyLoss()

    data = u.to_logits(torch.tensor([[0.7, 0.3]]))
    targets = torch.tensor([0])

    data = data.repeat([3, 1])
    targets = targets.repeat([3])
    n = len(data)

    activations = {}
    hess = defaultdict(float)

    def save_activations(layer, a, _):
        activations[layer] = a

    for i in range(n):
        with autograd_lib.module_hook(save_activations):
            data_batch = data[i: i + 1]
            targets_batch = targets[i: i + 1]
            Y = model(data_batch)
            loss = loss_fn(Y, targets_batch)

        def compute_hess(layer, _, B):
            A = activations[layer]
            BA = torch.einsum("nl,ni->nli", B, A)
            hess[layer] += torch.einsum('nli,nkj->likj', BA, BA)

        with autograd_lib.module_hook(compute_hess):
            autograd_lib.backward_hessian(Y, loss='CrossEntropy')

    # check against autograd
    # 0.1459
    Y = model(data)
    loss = loss_fn(Y, targets)
    hess_autograd = u.hessian(loss, model.layers[0].weight)
    hess0 = hess[model.layers[0]] / n
    u.check_equal(hess_autograd, hess0)


def test_full_hessian_xent_kfac():
    u.seed_random(1)
    torch.set_default_dtype(torch.float64)

    batch_size = 1
    d = [2, 2]
    o = d[-1]
    n = batch_size
    train_steps = 1

    model: u.SimpleModel = u.SimpleFullyConnected2(d, nonlin=True, bias=True)
    model.layers[0].weight.data.copy_(torch.eye(2))
    autograd_lib.register(model)
    loss_fn = torch.nn.CrossEntropyLoss()

    data = u.to_logits(torch.tensor([[0.7, 0.3]]))
    targets = torch.tensor([0])

    data = data.repeat([3, 1])
    targets = targets.repeat([3])
    n = len(data)

    activations = {}
    hess = defaultdict(lambda: AttrDefault(float))

    for i in range(n):
        def save_activations(layer, a, _):
            activations[layer] = a

        with autograd_lib.module_hook(save_activations):
            data_batch = data[i: i + 1]
            targets_batch = targets[i: i + 1]
            Y = model(data_batch)
            o = Y.shape[1]
            loss = loss_fn(Y, targets_batch)

        def compute_hess(layer, _, B):
            A = activations[layer]
            hess[layer].AA += torch.einsum("ni,nj->ij", A, A)
            hess[layer].BB += torch.einsum("ni,nj->ij", B, B)

        with autograd_lib.module_hook(compute_hess):
            autograd_lib.backward_hessian(Y, loss='CrossEntropy')

    # expand Kronecker factors into the full Hessian
    hess_factored = hess[model.layers[0]]
    hess0 = torch.einsum('kl,ij->kilj', hess_factored.BB / n, hess_factored.AA / o)  # hess for sum loss
    hess0 /= n  # hess for mean loss

    # check against autograd
    # 0.1459
    Y = model(data)
    loss = loss_fn(Y, targets)
    hess_autograd = u.hessian(loss, model.layers[0].weight)
    u.check_equal(hess_autograd, hess0)

    # check diagonal hessian
    diag_autograd = torch.einsum('lili->li', hess_autograd)
    diag_kfac = torch.einsum('ll,ii->li', hess_factored.BB / n, hess_factored.AA / o / n)
    u.check_close(diag_autograd, diag_kfac)
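
# The einsum('ll,ii->li', ...) used for diag_kfac exploits the fact that the
# diagonal of a Kronecker product is the outer product of the factors'
# diagonals. A small NumPy sketch (names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
BB = rng.standard_normal((2, 2)); BB = BB @ BB.T   # symmetric PSD factor
AA = rng.standard_normal((3, 3)); AA = AA @ AA.T

# diagonal of the Kronecker product, read directly off the factors' diagonals
diag_fast = np.einsum("ll,ii->li", BB, AA)

# diagonal of the explicitly formed Kronecker product, reshaped to match
diag_full = np.diag(np.kron(BB, AA)).reshape(2, 3)

assert np.allclose(diag_fast, diag_full)
```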


def test_full_hessian_xent_kfac2():
    """Test with uneven layers."""
    u.seed_random(1)
    torch.set_default_dtype(torch.float64)

    batch_size = 1
    d = [3, 2]
    o = d[-1]
    n = batch_size
    train_steps = 1

    model: u.SimpleModel = u.SimpleFullyConnected2(d, nonlin=True, bias=False)
    autograd_lib.register(model)
    loss_fn = torch.nn.CrossEntropyLoss()

    data = u.to_logits(torch.tensor([[0.7, 0.2, 0.1]]))
    targets = torch.tensor([0])

    data = data.repeat([3, 1])
    targets = targets.repeat([3])
    n = len(data)

    activations = {}
    hess = defaultdict(lambda: AttrDefault(float))

    for i in range(n):
        def save_activations(layer, A, _):
            activations[layer] = A
            hess[layer].AA += torch.einsum("ni,nj->ij", A, A)

        with autograd_lib.module_hook(save_activations):
            data_batch = data[i: i + 1]
            targets_batch = targets[i: i + 1]
            Y = model(data_batch)
            o = Y.shape[1]
            loss = loss_fn(Y, targets_batch)

        def compute_hess(layer, _, B):
            hess[layer].BB += torch.einsum("ni,nj->ij", B, B)

        with autograd_lib.module_hook(compute_hess):
            autograd_lib.backward_hessian(Y, loss='CrossEntropy')

    # expand Kronecker factors into the full Hessian
    hess_factored = hess[model.layers[0]]
    hess0 = torch.einsum('kl,ij->kilj', hess_factored.BB / n, hess_factored.AA / o)  # hess for sum loss
    hess0 /= n  # hess for mean loss

    # check against autograd
    # 0.1459
    Y = model(data)
    loss = loss_fn(Y, targets)
    hess_autograd = u.hessian(loss, model.layers[0].weight)
    u.check_equal(hess_autograd, hess0)


def test_full_hessian_xent_mnist():
    u.seed_random(1)

    data_width = 3
    batch_size = 2
    d = [data_width ** 2, 10]
    o = d[-1]
    n = batch_size
    train_steps = 1

    model: u.SimpleModel = u.SimpleFullyConnected2(d, nonlin=False, bias=True)
    autograd_lib.register(model)
    dataset = u.TinyMNIST(dataset_size=batch_size, data_width=data_width, original_targets=True)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    train_iter = iter(trainloader)

    loss_fn = torch.nn.CrossEntropyLoss()

    hess = defaultdict(float)
    for train_step in range(train_steps):
        data, targets = next(train_iter)

        activations = {}

        def save_activations(layer, a, _):
            activations[layer] = a

        with autograd_lib.module_hook(save_activations):
            output = model(data)
            loss = loss_fn(output, targets)

        def compute_hess(layer, _, B):
            A = activations[layer]
            BA = torch.einsum("nl,ni->nli", B, A)
            hess[layer] += torch.einsum('nli,nkj->likj', BA, BA)

        with autograd_lib.module_hook(compute_hess):
            autograd_lib.backward_hessian(output, loss='CrossEntropy', retain_graph=True)

        # compute Hessian through autograd
        H_autograd = u.hessian(loss, model.layers[0].weight)
        u.check_close(hess[model.layers[0]] / n, H_autograd)


def test_full_hessian_xent_mnist_multilayer():
    """Test regular and diagonal hessian computation."""
    u.seed_random(1)

    data_width = 3
    batch_size = 2
    d = [data_width ** 2, 6, 10]
    o = d[-1]
    n = batch_size
    train_steps = 1

    model: u.SimpleModel = u.SimpleFullyConnected2(d, nonlin=False, bias=True)
    autograd_lib.register(model)
    dataset = u.TinyMNIST(dataset_size=batch_size, data_width=data_width, original_targets=True)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    train_iter = iter(trainloader)

    loss_fn = torch.nn.CrossEntropyLoss()

    hess = defaultdict(float)
    hess_diag = defaultdict(float)
    for train_step in range(train_steps):
        data, targets = next(train_iter)

        activations = {}

        def save_activations(layer, a, _):
            activations[layer] = a

        with autograd_lib.module_hook(save_activations):
            output = model(data)
            loss = loss_fn(output, targets)

        def compute_hess(layer, _, B):
            A = activations[layer]
            BA = torch.einsum("nl,ni->nli", B, A)
            hess[layer] += torch.einsum('nli,nkj->likj', BA, BA)
            hess_diag[layer] += torch.einsum("ni,nj->ij", B * B, A * A)

        with autograd_lib.module_hook(compute_hess):
            autograd_lib.backward_hessian(output, loss='CrossEntropy', retain_graph=True)

        # compute Hessian through autograd
        H_autograd = u.hessian(loss, model.layers[0].weight)
        u.check_close(hess[model.layers[0]] / batch_size, H_autograd)
        diag_autograd = torch.einsum('lili->li', H_autograd)
        u.check_close(diag_autograd, hess_diag[model.layers[0]] / batch_size)

        H_autograd = u.hessian(loss, model.layers[1].weight)
        u.check_close(hess[model.layers[1]] / batch_size, H_autograd)
        diag_autograd = torch.einsum('lili->li', H_autograd)
        u.check_close(diag_autograd, hess_diag[model.layers[1]] / batch_size)


def _test_kfac_hessian_xent_mnist():
    u.seed_random(1)

    data_width = 3
    batch_size = 2
    d = [data_width ** 2, 10]
    o = d[-1]
    n = batch_size
    train_steps = 1

    model: u.SimpleModel = u.SimpleFullyConnected2(d, nonlin=False, bias=True)
    autograd_lib.register(model)
    dataset = u.TinyMNIST(dataset_size=batch_size, data_width=data_width, original_targets=True)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    train_iter = iter(trainloader)

    loss_fn = torch.nn.CrossEntropyLoss()

    activations = {}
    hess = defaultdict(lambda: AttrDefault(float))
    for train_step in range(train_steps):
        data, targets = next(train_iter)

        activations = {}

        def save_activations(layer, a, _):
            activations[layer] = a

        with autograd_lib.module_hook(save_activations):
            output = model(data)
            loss = loss_fn(output, targets)

        def compute_hess(layer, _, B):
            A = activations[layer]
            hess[layer].AA += torch.einsum("ni,nj->ij", A, A)
            hess[layer].BB += torch.einsum("ni,nj->ij", B, B)

        with autograd_lib.module_hook(compute_hess):
            autograd_lib.backward_hessian(output, loss='CrossEntropy', retain_graph=True)

        hess_factored = hess[model.layers[0]]
        hess0 = torch.einsum('kl,ij->kilj', hess_factored.BB / n, hess_factored.AA / o)  # hess for sum loss
        hess0 /= n  # hess for mean loss

        # compute Hessian through autograd
        H_autograd = u.hessian(loss, model.layers[0].weight)
        rel_error = torch.norm((hess0 - H_autograd).flatten()) / torch.norm(H_autograd.flatten())
        assert rel_error < 0.01  # 0.0057


def test_kfac_jacobian_mnist():
    u.seed_random(1)

    data_width = 3
    d = [data_width ** 2, 8, 10]
    model: u.SimpleMLP = u.SimpleMLP(d, nonlin=False)
    autograd_lib.register(model)

    batch_size = 4
    stats_steps = 2
    n = batch_size * stats_steps

    dataset = u.TinyMNIST(dataset_size=n, data_width=data_width, original_targets=True)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    train_iter = iter(trainloader)

    loss_fn = torch.nn.CrossEntropyLoss()

    activations = {}
    jacobians = defaultdict(lambda: AttrDefault(float))
    total_data = []

    # sum up statistics over n examples
    for train_step in range(stats_steps):
        data, targets = next(train_iter)
        total_data.append(data)

        activations = {}

        def save_activations(layer, A, _):
            activations[layer] = A
            jacobians[layer].AA += torch.einsum("ni,nj->ij", A, A)

        with autograd_lib.module_hook(save_activations):
            output = model(data)
            loss = loss_fn(output, targets)

        def compute_jacobian(layer, _, B):
            A = activations[layer]
            jacobians[layer].BB += torch.einsum("ni,nj->ij", B, B)
            jacobians[layer].diag += torch.einsum("ni,nj->ij", B * B, A * A)

        with autograd_lib.module_hook(compute_jacobian):
            autograd_lib.backward_jacobian(output)

    for layer in model.layers:
        jacobian0 = jacobians[layer]
        jacobian_full = torch.einsum('kl,ij->kilj', jacobian0.BB / n, jacobian0.AA / n)
        jacobian_diag = jacobian0.diag / n

        J = u.jacobian(model(torch.cat(total_data)), layer.weight)
        J_autograd = torch.einsum('noij,nokl->ijkl', J, J) / n
        u.check_equal(jacobian_full, J_autograd)

        u.check_equal(jacobian_diag, torch.einsum('ikik->ik', J_autograd))


def test_kfac_fisher_mnist():
    u.seed_random(1)

    data_width = 3
    d = [data_width ** 2, 8, 10]
    model: u.SimpleMLP = u.SimpleMLP(d, nonlin=False)
    autograd_lib.register(model)

    batch_size = 4
    stats_steps = 2
    n = batch_size * stats_steps

    dataset = u.TinyMNIST(dataset_size=n, data_width=data_width, original_targets=True)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    train_iter = iter(trainloader)

    loss_fn = torch.nn.CrossEntropyLoss()

    activations = {}
    fishers = defaultdict(lambda: AttrDefault(float))
    total_data = []

    # sum up statistics over n examples
    for train_step in range(stats_steps):
        data, targets = next(train_iter)
        total_data.append(data)

        activations = {}

        def save_activations(layer, A, _):
            activations[layer] = A
            fishers[layer].AA += torch.einsum("ni,nj->ij", A, A)

        with autograd_lib.module_hook(save_activations):
            output = model(data)
            loss = loss_fn(output, targets) * len(data)  # undo CrossEntropyLoss's averaging over the batch

        def compute_fisher(layer, _, B):
            A = activations[layer]
            fishers[layer].BB += torch.einsum("ni,nj->ij", B, B)
            fishers[layer].diag += torch.einsum("ni,nj->ij", B * B, A * A)

        with autograd_lib.module_hook(compute_fisher):
            autograd_lib.backward_jacobian(output)

    for layer in model.layers:
        fisher0 = fishers[layer]
        fisher_full = torch.einsum('kl,ij->kilj', fisher0.BB / n, fisher0.AA / n)
        fisher_diag = fisher0.diag / n

        u.check_equal(torch.einsum('ikik->ik', fisher_full), fisher_diag)


# List replacement: works around AttrDict automatically converting list objects to tuples.
class MyList:
    def __init__(self, *args, **kwargs):
        super(MyList, self).__init__(*args, **kwargs)
        self.storage = list()

    def __getattr__(self, *_args, **_kwargs):
        return self.storage.__getattribute__(*_args, **_kwargs)

    def normal_form(self):
        return self.value()

    def value(self):
        return self.storage


def test_grad_norms():
    """Test computing gradient norms using various methods."""

    u.seed_random(1)
    # torch.set_default_dtype(torch.float64)

    data_width = 3
    batch_size = 2
    d = [data_width ** 2, 6, 10]
    o = d[-1]
    stats_steps = 2
    num_samples = batch_size * stats_steps  # number of samples used in computation of curvature stats

    model: u.SimpleModel = u.SimpleMLP(d, nonlin=True, bias=True)
    loss_fn = torch.nn.CrossEntropyLoss()
    autograd_lib.register(model)

    dataset = u.TinyMNIST(dataset_size=num_samples, data_width=data_width, original_targets=True)
    stats_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    stats_iter = iter(stats_loader)

    moments = defaultdict(lambda: AttrDefault(float))
    norms = defaultdict(lambda: AttrDefault(MyList))
    data_batches = []
    targets_batches = []
    for stats_step in range(stats_steps):
        data, targets = next(stats_iter)
        data_batches.append(data)
        targets_batches.append(targets)

        activations = {}
        def forward_aggregate(layer, A, _):
            activations[layer] = A
            moments[layer].AA += torch.einsum('ni,nj->ij', A, A)
            moments[layer].a += torch.einsum("ni->i", A)

        with autograd_lib.module_hook(forward_aggregate):
            output = model(data)
            loss_fn(output, targets)

        def backward_aggregate(layer, _, B):
            A = activations[layer]
            moments[layer].b += torch.einsum("nk->k", B)
            moments[layer].BA += torch.einsum("nl,ni->li", B, A)
            moments[layer].BB += torch.einsum("nk,nl->kl", B, B)
            moments[layer].BABA += torch.einsum('nl,ni,nk,nj->likj', B, A, B, A)

        with autograd_lib.module_hook(backward_aggregate):
            autograd_lib.backward_hessian(output, loss='CrossEntropy', retain_graph=True)

    # compare against results using autograd
    data = torch.cat(data_batches)
    targets = torch.cat(targets_batches)

    with autograd_lib.save_activations2() as activations:
        loss = loss_fn(model(data), targets)

    def normalize_moments(d, n):
        result = AttrDict()
        for val in d:
            if type(d[val]) == torch.Tensor:
                result[val] = d[val] / n
        return result

    def compute_norms(layer, _, B):
        A = activations[layer]
        for kind in ('zero_order', 'kfac', 'isserlis', 'full'):
            normalized_moments = normalize_moments(moments[layer], num_samples)
            norms_list = getattr(norms[layer], kind)
            norms_list.extend(autograd_lib.grad_norms(A, B, normalized_moments, approx=kind))

    with autograd_lib.module_hook(compute_norms):
        model.zero_grad()
        (len(data) * loss).backward(retain_graph=True)

        print(norms[model.layers[0]].zero_order.value())

    for layer in model.layers:
        output = model(data)
        losses = torch.stack([loss_fn(output[i:i + 1], targets[i:i + 1]) for i in range(len(data))])
        grads = u.jacobian(losses, layer.weight)
        grad_norms = torch.einsum('nij,nij->n', grads, grads)
        u.check_close(grad_norms, norms[layer].zero_order)

        # test gradient norms with custom metric
        kfac_norms, isserlis_norms, full_norms = [u.to_pytorch(getattr(norms[layer], k)) for k in ('kfac', 'isserlis', 'full')]
        error_kfac = max(abs(kfac_norms - full_norms))
        error_isserlis = max(abs(isserlis_norms - full_norms))
        assert error_isserlis < 1e-4
        assert error_kfac < 1e-4
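
# The zero_order norms checked above match einsum('nij,nij->n', grads, grads).
# For a linear layer the per-example gradient is rank one (G_n = b_n a_n^T), so
# the squared norms also factor into ||b_n||^2 * ||a_n||^2 without forming G_n at
# all. A NumPy sketch of both routes (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n, di, do = 5, 3, 2
A = rng.standard_normal((n, di))   # layer inputs
B = rng.standard_normal((n, do))   # per-example backprops

# direct route: materialize per-example gradients, then squared Frobenius norms
G = np.einsum("ni,nj->nij", B, A)
norms_direct = np.einsum("nij,nij->n", G, G)

# factored route: rank-1 structure gives ||G_n||^2 = ||b_n||^2 * ||a_n||^2
norms_factored = np.einsum("ni,ni->n", B, B) * np.einsum("nj,nj->n", A, A)

assert np.allclose(norms_direct, norms_factored)
```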


if __name__ == '__main__':
    # test_gradient_norms()
    # test_full_hessian()
    # test_diagonal_hessian()
    # test_full_fisher()
    # test_full_fisher_multibatch()
    #  test_full_hessian_multibatch()
    #  test_kfac_hessian()
    # test_full_hessian_xent()
    #    test_full_hessian_xent_multibatch()
    test_full_hessian_xent_kfac()
    test_full_hessian_xent_kfac2()
    # test_full_hessian_xent_mnist()
    test_full_hessian_xent_mnist_multilayer()
    test_kfac_jacobian_mnist()
    # _test_kfac_hessian_xent_mnist()
    test_kfac_fisher_mnist()

    test_grad_norms()
    # test_hooks()
    # test_activations_contextmanager()
    # test_jacobian()
    # test_backprop()
    # u.run_all_tests(sys.modules[__name__])


================================================
FILE: autotune/autograd_test.py
================================================
# Tests that compare manual computation of quantities against PyTorch autograd

import os
import sys

import globals as gl
import pytest
import torch
from torch import nn as nn
import wandb
from torch import optim
from torch.utils.tensorboard import SummaryWriter

import torch.nn.functional as F

import util as u

import numpy as np

import autograd_lib

unfold = torch.nn.functional.unfold
fold = torch.nn.functional.fold


def test_autoencoder_minimize():
    """Minimize autoencoder for a few steps."""
    u.seed_random(1)
    torch.set_default_dtype(torch.float32)
    data_width = 4
    targets_width = 2

    batch_size = 64
    dataset = u.TinyMNIST(data_width=data_width, targets_width=targets_width,
                          dataset_size=batch_size)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)

    d1 = data_width ** 2
    d2 = 10
    d3 = targets_width ** 2
    model: u.SimpleModel = u.SimpleFullyConnected([d1, d2, d3], nonlin=True)
    model.disable_hooks()

    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    def loss_fn(data, targets):
        err = data - targets.view(-1, data.shape[1])
        assert len(data) == batch_size
        return torch.sum(err * err) / 2 / len(data)

    loss = 0
    for i in range(10):
        data, targets = next(iter(trainloader))
        optimizer.zero_grad()
        loss = loss_fn(model(data), targets)
        if i == 0:
            assert loss > 0.054
            pass
        loss.backward()
        optimizer.step()

    assert loss < 0.0398


def test_autoencoder_newton():
    """Use Newton's method to train autoencoder."""

    image_size = 3
    batch_size = 64
    dataset = u.TinyMNIST(data_width=image_size, targets_width=image_size,
                          dataset_size=batch_size)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)

    d = image_size ** 2  # hidden layer size
    u.seed_random(1)
    model: u.SimpleModel = u.SimpleFullyConnected([d, d])
    model.disable_hooks()

    optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    def loss_fn(data, targets):
        err = data - targets.view(-1, data.shape[1])
        assert len(data) == batch_size
        return torch.sum(err * err) / 2 / len(data)

    for i in range(10):
        data, targets = next(iter(trainloader))
        optimizer.zero_grad()
        loss = loss_fn(model(data), targets)
        if i > 0:
            assert loss < 1e-9

        loss.backward()
        W = model.layers[0].weight
        grad = u.tvec(W.grad)

        loss = loss_fn(model(data), targets)
        H = u.hessian(loss, W)

        #  for col-major: H = H.transpose(0, 1).transpose(2, 3).reshape(d**2, d**2)
        H = H.reshape(d ** 2, d ** 2)

        #  For col-major: W1 = u.unvec(u.vec(W) - u.pinv(H) @ grad, d)
        # W1 = u.untvec(u.tvec(W) - grad @ u.pinv(H), d)
        W1 = u.untvec(u.tvec(W) - grad @ H.pinverse(), d)
        W.data.copy_(W1)
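
# The update above is a Newton step W <- W - H^+ grad in the row-major (tvec)
# convention. On an exactly quadratic loss a single Newton step lands on the
# minimizer, which is why the loss < 1e-9 assertion holds from the second
# iteration on. A self-contained NumPy sketch of that property (hypothetical
# quadratic, unrelated to the autoencoder here):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
M = rng.standard_normal((d, d))
H = M @ M.T + np.eye(d)          # SPD Hessian of a quadratic loss
w_star = rng.standard_normal(d)  # minimizer

def loss(w):
    return 0.5 * (w - w_star) @ H @ (w - w_star)

w = np.zeros(d)
grad = H @ (w - w_star)           # gradient at the current point
w = w - np.linalg.solve(H, grad)  # Newton step: subtract H^{-1} grad

assert np.isclose(loss(w), 0.0)   # a quadratic is minimized in one step
```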


def test_main_autograd():
    u.seed_random(1)
    log_wandb = False
    autograd_check = True
    use_double = False

    logdir = u.get_unique_logdir('/tmp/autoencoder_test/run')

    run_name = os.path.basename(logdir)
    gl.event_writer = SummaryWriter(logdir)

    batch_size = 5

    try:
        if log_wandb:
            wandb.init(project='test-autograd_test', name=run_name)
            wandb.tensorboard.patch(tensorboardX=False)
            wandb.config['batch'] = batch_size
    except Exception as e:
        print(f"wandb crash with {e}")

    data_width = 4
    targets_width = 2

    d1 = data_width ** 2
    d2 = 10
    d3 = targets_width ** 2
    o = d3
    n = batch_size
    d = [d1, d2, d3]
    model: u.SimpleModel = u.SimpleFullyConnected(d, nonlin=True, bias=True)
    if use_double:
        model = model.double()
    train_steps = 3

    dataset = u.TinyMNIST(data_width=data_width, targets_width=targets_width,
                          dataset_size=batch_size * train_steps)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    train_iter = iter(trainloader)

    def loss_fn(data, targets):
        err = data - targets.view(-1, data.shape[1])
        assert len(data) == batch_size
        return torch.sum(err * err) / 2 / len(data)

    loss_hessian = u.HessianExactSqrLoss()

    gl.token_count = 0
    for train_step in range(train_steps):
        data, targets = next(train_iter)
        if use_double:
            data, targets = data.double(), targets.double()

        # get gradient values
        model.skip_backward_hooks = False
        model.skip_forward_hooks = False
        u.clear_backprops(model)
        output = model(data)
        loss = loss_fn(output, targets)
        loss.backward(retain_graph=True)
        model.skip_forward_hooks = True

        output = model(data)
        for bval in loss_hessian(output):
            if use_double:
                bval = bval.double()
            output.backward(bval, retain_graph=True)

        model.skip_backward_hooks = True

        for (i, layer) in enumerate(model.layers):

            #############################
            # Gradient stats
            #############################
            A_t = layer.activations
            assert A_t.shape == (n, d[i])

            # add factor of n because backprop takes loss averaged over batch, while we need per-example loss
            B_t = layer.backprops_list[0] * n
            assert B_t.shape == (n, d[i + 1])

            # per example gradients
            G = u.khatri_rao_t(B_t, A_t)
            assert G.shape == (n, d[i+1] * d[i])
            Gbias = B_t
            assert Gbias.shape == (n, d[i + 1])

            # average gradient
            g = G.sum(dim=0, keepdim=True) / n
            gb = Gbias.sum(dim=0, keepdim=True) / n
            assert g.shape == (1, d[i] * d[i + 1])
            assert gb.shape == (1, d[i + 1])

            if autograd_check:
                u.check_close(B_t.t() @ A_t / n, layer.weight.saved_grad)
                u.check_close(g.reshape(d[i + 1], d[i]), layer.weight.saved_grad)
                u.check_close(torch.einsum('nj->j', B_t) / n, layer.bias.saved_grad)
                u.check_close(torch.mean(B_t, dim=0), layer.bias.saved_grad)
                u.check_close(torch.einsum('ni,nj->ij', B_t, A_t)/n, layer.weight.saved_grad)

            # empirical Fisher
            efisher = G.t() @ G / n
            _sigma = efisher - g.t() @ g

            #############################
            # Hessian stats
            #############################
            A_t = layer.activations
            Bh_t = [layer.backprops_list[out_idx + 1] for out_idx in range(o)]
            Amat_t = torch.cat([A_t] * o, dim=0)  # todo: can instead replace with a khatri-rao loop
            Bmat_t = torch.cat(Bh_t, dim=0)
            Amat_t2 = torch.stack([A_t]*o, dim=0)  # o, n, in_dim
            Bmat_t2 = torch.stack(Bh_t, dim=0)  # o, n, out_dim

            assert Amat_t.shape == (n * o, d[i])
            assert Bmat_t.shape == (n * o, d[i + 1])

            Jb = u.khatri_rao_t(Bmat_t, Amat_t)  # batch output Jacobian
            H = Jb.t() @ Jb / n
            Jb2 = torch.einsum('oni,onj->onij', Bmat_t2, Amat_t2)
            u.check_close(H.reshape(d[i+1], d[i], d[i+1], d[i]), torch.einsum('onij,onkl->ijkl', Jb2, Jb2)/n)

            Hbias = Bmat_t.t() @ Bmat_t / n
            u.check_close(Hbias, torch.einsum('ni,nj->ij', Bmat_t, Bmat_t) / n)

            if autograd_check:
                model.zero_grad()
                output = model(data)
                loss = loss_fn(output, targets)
                H_autograd = u.hessian(loss, layer.weight)
                Hbias_autograd = u.hessian(loss, layer.bias)
                u.check_close(H, H_autograd.reshape(d[i+1] * d[i], d[i+1] * d[i]))
                u.check_close(Hbias, Hbias_autograd)
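
# The per-example gradient construction above hinges on u.khatri_rao_t, the
# row-wise Kronecker product. A hypothetical NumPy reimplementation (not the
# actual u helper) showing that the mean of the per-example gradients reduces
# to the familiar B^T A / n matmul:

```python
import numpy as np

def khatri_rao_t(B, A):
    """Row-wise Kronecker product: row n of the result is kron(B[n], A[n])."""
    n = B.shape[0]
    return np.einsum("ni,nj->nij", B, A).reshape(n, -1)

rng = np.random.default_rng(0)
n, di, do = 4, 3, 2
A = rng.standard_normal((n, di))
B = rng.standard_normal((n, do))

G = khatri_rao_t(B, A)   # per-example flattened gradients, shape (n, do*di)
g = G.sum(axis=0) / n    # mean gradient

# the mean gradient also comes from one matmul on the stacked factors
assert np.allclose(g.reshape(do, di), B.T @ A / n)
```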


def test_unfold():
    """Reproduce convolution as a special case of matrix multiplication with unfolded input tensors"""
    gl.skip_backward_hooks = False
    gl.skip_forward_hooks = False
    gl.backward_idx = 0

    N, Xc, Xh, Xw = 1, 2, 3, 3
    model: u.SimpleModel = u.SimpleConvolutional([Xc, 2])

    weight_buffer = model.layers[0].weight.data
    weight_buffer.copy_(torch.ones_like(weight_buffer))
    dims = N, Xc, Xh, Xw

    size = np.prod(dims)
    X = torch.arange(0, size).reshape(*dims)

    def loss_fn(data):
        err = data.reshape(len(data), -1)
        return torch.sum(err * err) / 2 / len(data)

    layer = model.layers[0]
    output = model(X)
    loss = loss_fn(output)
    loss.backward()

    u.check_close(layer.activations, X)
    assert layer.backprops_list[0].shape == layer.output.shape

    unfold = torch.nn.functional.unfold
    fold = torch.nn.functional.fold
    out_unf = layer.weight.view(layer.weight.size(0), -1) @ unfold(layer.activations, (2, 2))
    u.check_close(fold(out_unf, layer.output.shape[2:], (1, 1)), output)
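
# The unfold/fold check above is the im2col view of convolution: each sliding
# window becomes a column, and the convolution reduces to one matrix multiply.
# A pure-NumPy sketch of the same idea for a 2x2 kernel (hypothetical im2col
# helper, not torch.nn.functional.unfold):

```python
import numpy as np

def im2col(x, kh, kw):
    """Unfold a (C, H, W) input into a (C*kh*kw, L) matrix of sliding patches."""
    c, h, w = x.shape
    cols = []
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            cols.append(x[:, i:i + kh, j:j + kw].reshape(-1))
    return np.stack(cols, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 3, 3))       # C=2, H=W=3 input
w = rng.standard_normal((2, 2, 2, 2))    # 2 output channels, 2x2 kernels

# convolution via unfold + matmul, as in the test: (O, C*kh*kw) @ (C*kh*kw, L)
out_unf = w.reshape(2, -1) @ im2col(x, 2, 2)
out = out_unf.reshape(2, 2, 2)           # fold back to the (O, 2, 2) output map

# reference: direct sliding-window convolution (cross-correlation)
ref = np.zeros((2, 2, 2))
for o in range(2):
    for i in range(2):
        for j in range(2):
            ref[o, i, j] = np.sum(w[o] * x[:, i:i + 2, j:j + 2])

assert np.allclose(out, ref)
```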


def test_cross_entropy_hessian_tiny():
    u.seed_random(1)

    batch_size = 1
    d = [2, 2]
    o = d[-1]
    n = batch_size
    train_steps = 1

    model: u.SimpleModel = u.SimpleFullyConnected(d, nonlin=True, bias=True)
    model.layers[0].weight.data.copy_(torch.eye(2))

    loss_fn = torch.nn.CrossEntropyLoss()
    loss_hessian = u.HessianExactCrossEntropyLoss()

    data = u.to_logits(torch.tensor([[0.7, 0.3]]))
    targets = torch.tensor([0])

    # get gradient values
    u.clear_backprops(model)
    model.skip_forward_hooks = False
    model.skip_backward_hooks = False
    output = model(data)

    for bval in loss_hessian(output):
        output.backward(bval, retain_graph=True)
    i = 0
    layer = model.layers[i]
    H, Hbias = u.hessian_from_backprops(layer.activations,
                                        layer.backprops_list,
                                        bias=True)
    model.skip_forward_hooks = True
    model.skip_backward_hooks = True

    # compute Hessian through autograd
    model.zero_grad()
    output = model(data)
    loss = loss_fn(output, targets)
    H_autograd = u.hessian(loss, layer.weight)
    u.check_close(H, H_autograd.reshape(d[i] * d[i + 1], d[i] * d[i + 1]))
    Hbias_autograd = u.hessian(loss, layer.bias)
    u.check_close(Hbias, Hbias_autograd)
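The exact cross-entropy Hessian backprop used above rests on the fact that the Hessian of softmax cross-entropy with respect to the logits is diag(p) - p p^T. A standalone check of that identity (illustrative values, plain PyTorch):

```python
import torch
from torch.autograd.functional import hessian

logits0 = torch.tensor([0.2, -0.5, 1.0])
target = torch.tensor([0])

def loss(z):
    # single-example cross-entropy as a function of the logits
    return torch.nn.functional.cross_entropy(z.unsqueeze(0), target)

H = hessian(loss, logits0)
p = torch.softmax(logits0, dim=0)
# Hessian of cross-entropy w.r.t. logits: diag(p) - p p^T
assert torch.allclose(H, torch.diag(p) - torch.outer(p, p), atol=1e-5)
```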


def test_cross_entropy_hessian_mnist():
    u.seed_random(1)

    data_width = 3
    batch_size = 2
    d = [data_width**2, 10]
    o = d[-1]
    n = batch_size
    train_steps = 1

    model: u.SimpleModel = u.SimpleFullyConnected(d, nonlin=False, bias=True)

    dataset = u.TinyMNIST(dataset_size=batch_size, data_width=data_width, original_targets=True)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    train_iter = iter(trainloader)

    loss_fn = torch.nn.CrossEntropyLoss()
    loss_hessian = u.HessianExactCrossEntropyLoss()

    gl.token_count = 0
    for train_step in range(train_steps):
        data, targets = next(train_iter)

        # get gradient values
        u.clear_backprops(model)
        model.skip_forward_hooks = False
        model.skip_backward_hooks = False
        output = model(data)
        for bval in loss_hessian(output):
            output.backward(bval, retain_graph=True)
        i = 0
        layer = model.layers[i]
        H, Hbias = u.hessian_from_backprops(layer.activations,
                                            layer.backprops_list,
                                            bias=True)
        model.skip_forward_hooks = True
        model.skip_backward_hooks = True

        # compute Hessian through autograd
        model.zero_grad()
        output = model(data)
        loss = loss_fn(output, targets)
        H_autograd = u.hessian(loss, layer.weight).reshape(d[i] * d[i + 1], d[i] * d[i + 1])
        u.check_close(H, H_autograd)

        Hbias_autograd = u.hessian(loss, layer.bias)
        u.check_close(Hbias, Hbias_autograd)


def test_hessian():
    """Tests of Hessian computation."""
    u.seed_random(1)
    batch_size = 500

    data_width = 4
    targets_width = 4

    d1 = data_width ** 2
    d2 = 10
    d3 = targets_width ** 2
    o = d3
    N = batch_size
    d = [d1, d2, d3]

    dataset = u.TinyMNIST(data_width=data_width, targets_width=targets_width, dataset_size=batch_size)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    train_iter = iter(trainloader)
    data, targets = next(train_iter)

    def loss_fn(data, targets):
        assert len(data) == len(targets)
        err = data - targets.view(-1, data.shape[1])
        return torch.sum(err * err) / 2 / len(data)

    u.seed_random(1)
    model: u.SimpleModel = u.SimpleFullyConnected(d, nonlin=False, bias=True)

    # backprop hessian and compare against autograd
    hessian_backprop = u.HessianExactSqrLoss()
    output = model(data)
    for bval in hessian_backprop(output):
        output.backward(bval, retain_graph=True)

    i, layer = next(enumerate(model.layers))
    A_t = layer.activations
    Bh_t = layer.backprops_list
    H, Hb = u.hessian_from_backprops(A_t, Bh_t, bias=True)

    model.disable_hooks()
    H_autograd = u.hessian(loss_fn(model(data), targets), layer.weight)
    u.check_close(H, H_autograd.reshape(d[i + 1] * d[i], d[i + 1] * d[i]),
                  rtol=1e-4, atol=1e-7)
    Hb_autograd = u.hessian(loss_fn(model(data), targets), layer.bias)
    u.check_close(Hb, Hb_autograd, rtol=1e-4, atol=1e-7)

    # check first few per-example Hessians
    Hi, Hb_i = u.per_example_hess(A_t, Bh_t, bias=True)
    u.check_close(H, Hi.mean(dim=0))
    u.check_close(Hb, Hb_i.mean(dim=0), atol=2e-6, rtol=1e-5)

    for xi in range(5):
        loss = loss_fn(model(data[xi:xi + 1, ...]), targets[xi:xi + 1])
        H_autograd = u.hessian(loss, layer.weight)
        u.check_close(Hi[xi], H_autograd.reshape(d[i + 1] * d[i], d[i + 1] * d[i]))
        Hbias_autograd = u.hessian(loss, layer.bias)
        u.check_close(Hb_i[xi], Hbias_autograd)

    # get subsampled Hessian
    u.seed_random(1)
    model = u.SimpleFullyConnected(d, nonlin=False)
    hessian_backprop = u.HessianSampledSqrLoss(num_samples=1)

    output = model(data)
    for bval in hessian_backprop(output):
        output.backward(bval, retain_graph=True)
    model.disable_hooks()
    i, layer = next(enumerate(model.layers))
    H_approx1 = u.hessian_from_backprops(layer.activations, layer.backprops_list)

    # get subsampled Hessian with more samples
    u.seed_random(1)
    model = u.SimpleFullyConnected(d, nonlin=False)

    hessian_backprop = u.HessianSampledSqrLoss(num_samples=o)
    output = model(data)
    for bval in hessian_backprop(output):
        output.backward(bval, retain_graph=True)
    model.disable_hooks()
    i, layer = next(enumerate(model.layers))
    H_approx2 = u.hessian_from_backprops(layer.activations, layer.backprops_list)

    assert abs(u.l2_norm(H) / u.l2_norm(H_approx1) - 1) < 0.08, abs(u.l2_norm(H) / u.l2_norm(H_approx1) - 1)  # 0.0612
    assert abs(u.l2_norm(H) / u.l2_norm(H_approx2) - 1) < 0.03, abs(u.l2_norm(H) / u.l2_norm(H_approx2) - 1)  # 0.0239
    assert u.kl_div_cov(H_approx1, H) < 0.3, u.kl_div_cov(H_approx1, H)  # 0.222
    assert u.kl_div_cov(H_approx2, H) < 0.2, u.kl_div_cov(H_approx2, H)  # 0.1233
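The sampled-Hessian comparison above relies on the estimator E[(Jᵀv)(Jᵀv)ᵀ] = JᵀJ for random probe vectors v with identity covariance. A standalone sanity check of that estimator (illustrative sizes, not from the repo):

```python
import torch

torch.manual_seed(0)
o, d = 10, 6
J = torch.randn(o, d)
H = J.t() @ J                  # exact Gauss-Newton factor J^T J

# sampled estimate: E[(J^T v)(J^T v)^T] = J^T E[v v^T] J = H when E[v v^T] = I
s = 20000
v = torch.randn(s, o)
Jv = v @ J                     # (s, d); each row is J^T v for one sample
H_est = Jv.t() @ Jv / s
assert (H_est - H).norm() / H.norm() < 0.1
```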


def test_conv_grad():
    """Test per-example gradient computation for a conv layer."""

    u.seed_random(1)
    N, Xc, Xh, Xw = 3, 2, 3, 7
    dd = [Xc, 2]

    Kh, Kw = 2, 3
    Oh, Ow = Xh - Kh + 1, Xw - Kw + 1
    model = u.SimpleConvolutional(dd, kernel_size=(Kh, Kw), bias=True).double()

    weight_buffer = model.layers[0].weight.data

    # output channels, input channels, height, width
    assert weight_buffer.shape == (dd[1], dd[0], Kh, Kw)

    input_dims = N, Xc, Xh, Xw
    size = int(np.prod(input_dims))
    X = torch.arange(0, size).reshape(*input_dims).double()

    def loss_fn(data):
        err = data.reshape(len(data), -1)
        return torch.sum(err * err) / 2 / len(data)

    layer = model.layers[0]
    output = model(X)
    loss = loss_fn(output)
    loss.backward()

    u.check_equal(layer.activations, X)

    assert layer.backprops_list[0].shape == layer.output.shape
    assert layer.output.shape == (N, dd[1], Oh, Ow)

    out_unf = layer.weight.view(layer.weight.size(0), -1) @ unfold(layer.activations, (Kh, Kw))
    assert out_unf.shape == (N, dd[1], Oh * Ow)
    reshaped_bias = layer.bias.reshape(1, dd[1], 1)  # (Co,) -> (1, Co, 1)
    out_unf = out_unf + reshaped_bias

    u.check_equal(fold(out_unf, (Oh, Ow), (1, 1)), output)  # two alternative ways of reshaping
    u.check_equal(out_unf.view(N, dd[1], Oh, Ow), output)

    # Unfold produces patches with output dimension merged, while in backprop they are not merged
    # Hence merge the output (width/height) dimension
    assert unfold(layer.activations, (Kh, Kw)).shape == (N, Xc * Kh * Kw, Oh * Ow)
    assert layer.backprops_list[0].shape == (N, dd[1], Oh, Ow)

    grads_bias = layer.backprops_list[0].sum(dim=(2, 3)) * N
    mean_grad_bias = grads_bias.sum(dim=0) / N
    u.check_equal(mean_grad_bias, layer.bias.grad)

    Bt = layer.backprops_list[0] * N   # remove factor of N applied during loss batch averaging
    assert Bt.shape == (N, dd[1], Oh, Ow)
    Bt = Bt.reshape(N, dd[1], Oh*Ow)
    At = unfold(layer.activations, (Kh, Kw))
    assert At.shape == (N, dd[0] * Kh * Kw, Oh*Ow)

    grad_unf = torch.einsum('ijk,ilk->ijl', Bt, At)
    assert grad_unf.shape == (N, dd[1], dd[0] * Kh * Kw)

    grads = grad_unf.reshape((N, dd[1], dd[0], Kh, Kw))
    u.check_equal(grads.mean(dim=0), layer.weight.grad)

    # compute per-example gradients using autograd, compare against manual computation
    for i in range(N):
        u.clear_backprops(model)
        output = model(X[i:i + 1, ...])
        loss = loss_fn(output)
        loss.backward()
        u.check_equal(grads[i], layer.weight.grad)
        u.check_equal(grads_bias[i], layer.bias.grad)
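The unfold + einsum recipe above can be demonstrated self-containedly: per-example conv weight gradients from input patches and output backprops, whose mean matches autograd's batch gradient. A sketch with illustrative names (not the repo's helpers):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
N, C, Co, Kh, Kw = 3, 2, 2, 2, 3
H_in, W_in = 4, 5
x = torch.randn(N, C, H_in, W_in)
conv = torch.nn.Conv2d(C, Co, (Kh, Kw), bias=False)

out = conv(x)
loss = (out * out).sum() / (2 * N)     # batch-averaged squared loss
loss.backward()

B = out.detach() / N                   # dloss/d(out), one slice per example
A = F.unfold(x, (Kh, Kw))              # (N, C*Kh*Kw, L) input patches
Bf = B.reshape(N, Co, -1)              # (N, Co, L)
# per-example gradients of the *unaveraged* per-example losses
per_ex = torch.einsum('ncl,nkl->nck', Bf, A).reshape(N, Co, C, Kh, Kw) * N
# their mean recovers the batch gradient computed by autograd
assert torch.allclose(per_ex.mean(0), conv.weight.grad, atol=1e-5)
```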


def test_conv_hessian():
    """Test per-example gradient computation for conv layer."""
    u.seed_random(1)
    n, Xc, Xh, Xw = 3, 2, 3, 7
    dd = [Xc, 2]

    Kh, Kw = 2, 3
    Oh, Ow = Xh - Kh + 1, Xw - Kw + 1
    model: u.SimpleModel = u.ReshapedConvolutional(dd, kernel_size=(Kh, Kw), bias=True)
    weight_buffer = model.layers[0].weight.data

    assert (Kh, Kw) == model.layers[0].kernel_size

    data = torch.randn((n, Xc, Xh, Xw))

    # output channels, input channels, height, width
    assert weight_buffer.shape == (dd[1], dd[0], Kh, Kw)

    def loss_fn(data):
        err = data.reshape(len(data), -1)
        return torch.sum(err * err) / 2 / len(data)

    loss_hessian = u.HessianExactSqrLoss()
    # o = Oh * Ow * dd[1]

    output = model(data)
    o = output.shape[1]
    for bval in loss_hessian(output):
        output.backward(bval, retain_graph=True)
    assert loss_hessian.num_samples == o

    i, layer = next(enumerate(model.layers))

    At = unfold(layer.activations, (Kh, Kw))    # -> n, Xc * Kh * Kw, Oh * Ow
    assert At.shape == (n, dd[0] * Kh * Kw, Oh*Ow)

    #  o, n, dd[1], Oh, Ow -> o, n, dd[1], Oh*Ow
    Bh_t = torch.stack([Bt.reshape(n, dd[1], Oh*Ow) for Bt in layer.backprops_list])
    assert Bh_t.shape == (o, n, dd[1], Oh*Ow)
    Ah_t = torch.stack([At]*o)
    assert Ah_t.shape == (o, n, dd[0] * Kh * Kw, Oh*Ow)

    # sum out the output patch dimension
    Jb = torch.einsum('onij,onkj->onik', Bh_t, Ah_t)  # => o, n, dd[1], dd[0] * Kh * Kw
    Hi = torch.einsum('onij,onkl->nijkl', Jb, Jb)     # => n, dd[1], dd[0]*Kh*Kw, dd[1], dd[0]*Kh*Kw
    Jb_bias = torch.einsum('onij->oni', Bh_t)
    Hb_i = torch.einsum('oni,onj->nij', Jb_bias, Jb_bias)
    H = Hi.mean(dim=0)
    Hb = Hb_i.mean(dim=0)

    model.disable_hooks()
    loss = loss_fn(model(data))
    H_autograd = u.hessian(loss, layer.weight)
    assert H_autograd.shape == (dd[1], dd[0], Kh, Kw, dd[1], dd[0], Kh, Kw)
    assert H.shape == (dd[1], dd[0]*Kh*Kw, dd[1], dd[0]*Kh*Kw)
    u.check_close(H, H_autograd.reshape(H.shape), rtol=1e-4, atol=1e-7)

    Hb_autograd = u.hessian(loss, layer.bias)
    assert Hb_autograd.shape == (dd[1], dd[1])
    u.check_close(Hb, Hb_autograd)

    assert len(Bh_t) == loss_hessian.num_samples == o
    for xi in range(n):
        loss = loss_fn(model(data[xi:xi + 1, ...]))
        H_autograd = u.hessian(loss, layer.weight)
        u.check_close(Hi[xi], H_autograd.reshape(H.shape))
        Hb_autograd = u.hessian(loss, layer.bias)
        u.check_close(Hb_i[xi], Hb_autograd)
        assert Hb_i[xi, 0, 0] == Oh*Ow   # each output has curvature 1, bias term adds up Oh*Ow of them
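The final assertion above reflects a general fact: with unit output curvature, the conv bias picks up one unit of curvature per output position, so its Hessian is (Oh*Ow) * I. A standalone check against autograd (illustrative names, plain PyTorch):

```python
import torch
from torch.autograd.functional import hessian

torch.manual_seed(0)
C, Co, Hh, Ww, Kh, Kw = 2, 3, 5, 5, 2, 2
x = torch.randn(1, C, Hh, Ww)
conv = torch.nn.Conv2d(C, Co, (Kh, Kw))

def loss(bias):
    # squared loss on the conv output, as a function of the bias only
    y = torch.nn.functional.conv2d(x, conv.weight.detach(), bias)
    return (y * y).sum() / 2

Hb = hessian(loss, torch.zeros(Co))
Oh, Ow = Hh - Kh + 1, Ww - Kw + 1
# each output position contributes curvature 1 to its channel's bias
assert torch.allclose(Hb, (Oh * Ow) * torch.eye(Co))
```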


# Lenet-5 from https://github.com/pytorch/examples/blob/master/mnist/main.py
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


# Tiny LeNet-5 for Hessian testing
class TinyNet(nn.Module):
    def __init__(self):
        super(TinyNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 2, 2, 1)
        self.conv2 = nn.Conv2d(2, 2, 2, 1)
        self.fc1 = nn.Linear(2, 2)
        self.fc2 = nn.Linear(2, 10)

    def forward(self, x):            # 28x28
        x = F.max_pool2d(x, 4, 4)    # 7x7
        x = F.relu(self.conv1(x))    # 6x6
        x = F.max_pool2d(x, 2, 2)    # 3x3
        x = F.relu(self.conv2(x))    # 2x2
        x = F.max_pool2d(x, 2, 2)    # 1x1
        x = x.view(-1, 2 * 1 * 1)    # C * W * H
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x


def test_end2end_grad1():
    torch.manual_seed(1)
    model = Net()
    loss_fn = nn.CrossEntropyLoss()

    n = 4
    data = torch.rand(n, 1, 28, 28)
    targets = torch.LongTensor(n).random_(0, 10)

    autograd_lib.add_hooks(model)
    output = model(data)
    loss_fn(output, targets).backward(retain_graph=True)
    autograd_lib.compute_grad1(model)
    autograd_lib.disable_hooks()

    # Compare values against autograd
    losses = torch.stack([loss_fn(output[i:i+1], targets[i:i+1]) for i in range(len(data))])

    for layer in model.modules():
        if not autograd_lib.is_supported(layer):
            continue
        for param in layer.parameters():
            assert torch.allclose(param.grad, param.grad1.mean(dim=0))
            assert torch.allclose(u.jacobian(losses, param), param.grad1)


def test_end2end_hess():
    u.setup_logdir_and_event_writer('test')
    subtest_hess_type('CrossEntropy')
    subtest_hess_type('LeastSquares')


def subtest_hess_type(hess_type):
    torch.manual_seed(1)
    model = TinyNet()

    def least_squares_loss(data_, targets_):
        assert len(data_) == len(targets_)
        err = data_ - targets_
        return torch.sum(err * err) / 2 / len(data_)

    n = 3
    data = torch.rand(n, 1, 28, 28)

    autograd_lib.add_hooks(model)
    output = model(data)

    if hess_type == 'LeastSquares':
        targets = torch.rand(output.shape)
        loss_fn = least_squares_loss
    else:  # hess_type == 'CrossEntropy':
        targets = torch.LongTensor(n).random_(0, 10)
        loss_fn = nn.CrossEntropyLoss()

    # Dummy backprop to make sure multiple backprops don't invalidate each other
    autograd_lib.backprop_hess(output, hess_type=hess_type)
    autograd_lib.clear_hess_backprops(model)

    autograd_lib.backprop_hess(output, hess_type=hess_type)

    autograd_lib.compute_hess(model)
    autograd_lib.disable_hooks()

    for layer in model.modules():
        if not autograd_lib.is_supported(layer):
            continue
        for param in layer.parameters():
            loss = loss_fn(output, targets)
            hess_autograd = u.hessian(loss, param)
            hess = param.hess
            assert torch.allclose(hess, hess_autograd.reshape(hess.shape))


def test_kron_nano():
    u.seed_random(1)

    d = [1, 2]
    n = 1
    # torch.set_default_dtype(torch.float32)

    loss_type = 'CrossEntropy'
    model: u.SimpleModel = u.SimpleFullyConnected2(d, nonlin=False, bias=True)

    if loss_type == 'LeastSquares':
        loss_fn = u.least_squares
    elif loss_type == 'DebugLeastSquares':
        loss_fn = u.debug_least_squares
    else:
        loss_fn = nn.CrossEntropyLoss()

    data = torch.randn(n, d[0])
    data = torch.ones(n, d[0])
    if loss_type.endswith('LeastSquares'):
        target = torch.randn(n, d[-1])
    elif loss_type == 'CrossEntropy':
        target = torch.LongTensor(n).random_(0, d[-1])
        target = torch.tensor([0])

    # Hessian computation, saves regular and Kronecker factored versions into .hess and .hess_factored attributes
    autograd_lib.add_hooks(model)
    output = model(data)
    autograd_lib.backprop_hess(output, hess_type=loss_type)
    autograd_lib.compute_hess(model, method='kron')
    autograd_lib.compute_hess(model)
    autograd_lib.disable_hooks()

    for layer in model.layers:
        Hk: u.Kron = layer.weight.hess_factored
        Hk_bias: u.Kron = layer.bias.hess_factored
        Hk, Hk_bias = u.expand_hess(Hk, Hk_bias)   # kronecker multiply the factors

        # old approach, using direct computation
        H2, H_bias2 = layer.weight.hess, layer.bias.hess

        # compute Hessian through autograd
        model.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        H_autograd = u.hessian(loss, layer.weight)
        H_bias_autograd = u.hessian(loss, layer.bias)

        # compare autograd with direct approach
        u.check_close(H2, H_autograd.reshape(Hk.shape))
        u.check_close(H_bias2, H_bias_autograd)

        # compare factored with direct approach
        assert(u.symsqrt_dist(Hk, H2) < 1e-6)


def test_kron_tiny():
    u.seed_random(1)

    d = [2, 3, 3, 4, 5]
    n = 5
    # torch.set_default_dtype(torch.float32)

    loss_type = 'CrossEntropy'
    model: u.SimpleModel = u.SimpleFullyConnected2(d, nonlin=False, bias=True)

    if loss_type == 'LeastSquares':
        loss_fn = u.least_squares
    elif loss_type == 'DebugLeastSquares':
        loss_fn = u.debug_least_squares
    else:
        loss_fn = nn.CrossEntropyLoss()

    data = torch.randn(n, d[0])
    data = torch.ones(n, d[0])
    if loss_type.endswith('LeastSquares'):
        target = torch.randn(n, d[-1])
    elif loss_type == 'CrossEntropy':
        target = torch.LongTensor(n).random_(0, d[-1])

    # Hessian computation, saves regular and Kronecker factored versions into .hess and .hess_factored attributes
    autograd_lib.add_hooks(model)
    output = model(data)
    autograd_lib.backprop_hess(output, hess_type=loss_type)
    autograd_lib.compute_hess(model, method='kron')
    autograd_lib.compute_hess(model)
    autograd_lib.disable_hooks()

    for layer in model.layers:
        H: u.Kron = layer.weight.hess_factored
        H_bias: u.Kron = layer.bias.hess_factored
        H, H_bias = u.expand_hess(H, H_bias)   # kronecker multiply the factors

        # old approach, using direct computation
        H2, H_bias2 = layer.weight.hess, layer.bias.hess

        # compute Hessian through autograd
        model.zero_grad()
        output = model(data)
        loss = loss_fn(output, target)
        H_autograd = u.hessian(loss, layer.weight)
        H_bias_autograd = u.hessian(loss, layer.bias)

        # compare autograd with direct approach
        u.check_close(H2, H_autograd.reshape(H.shape))
        u.check_close(H_bias2, H_bias_autograd)

        # compare factored with direct approach
        assert(u.symsqrt_dist(H, H2) < 1e-6)


def test_kron_mnist():
    u.seed_random(1)

    data_width = 3
    batch_size = 3
    d = [data_width**2, 10]
    o = d[-1]
    n = batch_size
    train_steps = 1

    # torch.set_default_dtype(torch.float64)

    model: u.SimpleModel2 = u.SimpleFullyConnected2(d, nonlin=False, bias=True)
    autograd_lib.add_hooks(model)

    dataset = u.TinyMNIST(dataset_size=batch_size, data_width=data_width, original_targets=True)
    trainloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=False)
    train_iter = iter(trainloader)

    loss_fn = torch.nn.CrossEntropyLoss()

    gl.token_count = 0
    for train_step in range(train_steps):
        data, targets = next(train_iter)

        # get gradient values
        u.clear_backprops(model)
        autograd_lib.enable_hooks()
        output = model(data)
        autograd_lib.backprop_hess(output, hess_type='CrossEntropy')

        i = 0
        layer = model.layers[i]
        autograd_lib.compute_hess(model, method='kron')
        autograd_lib.compute_hess(model)
        autograd_lib.disable_hooks()

        # direct Hessian computation
        H = layer.weight.hess
        H_bias = layer.bias.hess

        # factored Hessian computation
        H2 = layer.weight.hess_factored
        H2_bias = layer.bias.hess_factored
        H2, H2_bias = u.expand_hess(H2, H2_bias)

        # autograd Hessian computation
        loss = loss_fn(output, targets) # TODO: change to d[i+1]*d[i]
        H_autograd = u.hessian(loss, layer.weight).reshape(d[i] * d[i + 1], d[i] * d[i + 1])
        H_bias_autograd = u.hessian(loss, layer.bias)

        # compare direct against autograd
        u.check_close(H, H_autograd)
        u.check_close(H_bias, H_bias_autograd)

        approx_error = u.symsqrt_dist(H, H2)
        assert approx_error < 1e-2, approx_error


def test_kron_conv_exact():
    """Test per-example gradient computation for conv layer.


    Kronecker factoring is exact for 1x1 convolutions and linear activations.

    """
    u.seed_random(1)

    n, Xh, Xw = 2, 2, 2
    Kh, Kw = 1, 1
    dd = [2, 2, 2]
    o = dd[-1]

    model: u.SimpleModel = u.PooledConvolutional2(dd, kernel_size=(Kh, Kw), nonlin=False, bias=True)
    data = torch.randn((n, dd[0], Xh, Xw))

    #print(model)
    #print(data)

    loss_type = 'CrossEntropy'    #  loss_type = 'LeastSquares'
    if loss_type == 'LeastSquares':
        loss_fn = u.least_squares
    elif loss_type == 'DebugLeastSquares':
        loss_fn = u.debug_least_squares
    else:    # CrossEntropy
        loss_fn = nn.CrossEntropyLoss()

    sample_output = model(data)

    if loss_type.endswith('LeastSquares'):
        targets = torch.randn(sample_output.shape)
    elif loss_type == 'CrossEntropy':
        targets = torch.LongTensor(n).random_(0, o)

    autograd_lib.clear_backprops(model)
    autograd_lib.add_hooks(model)
    output = model(data)
    autograd_lib.backprop_hess(output, hess_type=loss_type)
    autograd_lib.compute_hess(model, method='mean_kron')
    autograd_lib.compute_hess(model, method='exact')
    autograd_lib.disable_hooks()

    for i in range(len(model.layers)):
        layer = model.layers[i]

        # direct Hessian computation
        H = layer.weight.hess
        H_bias = layer.bias.hess

        # factored Hessian computation
        Hk = layer.weight.hess_factored
        Hk_bias = layer.bias.hess_factored
        Hk = Hk.expand()
        Hk_bias = Hk_bias.expand()

        # autograd Hessian computation
        loss = loss_fn(output, targets)
        Ha = u.hessian(loss, layer.weight).reshape(H.shape)
        Ha_bias = u.hessian(loss, layer.bias)

        # compare direct against autograd
        Ha = Ha.reshape(H.shape)
        # rel_error = torch.max((H-Ha)/Ha)

        u.check_close(H, Ha, rtol=1e-5, atol=1e-7)
        u.check_close(Ha_bias, H_bias, rtol=1e-5, atol=1e-7)

        u.check_close(H_bias, Hk_bias)
        u.check_close(H, Hk)


def test_kron_1x2_conv():
    """Minimal example of a 1x2 convolution whose Hessian/grad covariance doesn't factor as Kronecker.

    Two convolutional layers stacked on top of each other, followed by least squares loss.

        Outputs:
        0 tensor([[[[0., 1., 1., 1.]]]])
        1 tensor([[[[2., 3.]]]])
        2 tensor([[[[8.]]]])

        Activations/backprops:
        layerA 0 tensor([[[[0., 1.],
                           [1., 1.]]]])
        layerB 0 tensor([[[[1., 2.]]]])

        layerA 1 tensor([[[[2.],
                           [3.]]]])
        layerB 1 tensor([[[[1.]]]])

        layer 0 discrepancy: 0.6597963571548462
        layer 1 discrepancy: 0.0

     """
    u.seed_random(1)

    n, Xh, Xw = 1, 1, 4
    Kh, Kw = 1, 2
    dd = [1, 1, 1]
    o = dd[-1]

    model: u.SimpleModel = u.StridedConvolutional2(dd, kernel_size=(Kh, Kw), nonlin=False, bias=True)
    data = torch.tensor([0, 1., 1, 1]).reshape((n, dd[0], Xh, Xw))

    model.layers[0].bias.data.zero_()
    model.layers[0].weight.data.copy_(torch.tensor([1, 2]))

    model.layers[1].bias.data.zero_()
    model.layers[1].weight.data.copy_(torch.tensor([1, 2]))

    sample_output = model(data)

    autograd_lib.clear_backprops(model)
    autograd_lib.add_hooks(model)
    output = model(data)
    autograd_lib.backprop_hess(output, hess_type='LeastSquares')
    autograd_lib.compute_hess(model, method='kron', attr_name='hess_kron')
    autograd_lib.compute_hess(model, method='exact')
    autograd_lib.disable_hooks()

    for i in range(len(model.layers)):
        layer = model.layers[i]
        H = layer.weight.hess
        Hk = layer.weight.hess_kron
        Hk = Hk.expand()
        print(u.symsqrt_dist(H, Hk))


@pytest.mark.skip(reason="need to update golden values")
def _test_kron_conv_golden():
    """Hardcoded error values to detect unexpected numeric changes."""
    u.seed_random(1)

    n, Xh, Xw = 2, 8, 8
    Kh, Kw = 2, 2
    dd = [3, 3, 3, 3]
    o = dd[-1]

    model: u.SimpleModel = u.PooledConvolutional2(dd, kernel_size=(Kh, Kw), nonlin=False, bias=True)
    data = torch.randn((n, dd[0], Xh, Xw))

    # print(model)
    # print(data)

    loss_type = 'CrossEntropy'    #  loss_type = 'LeastSquares'
    if loss_type == 'LeastSquares':
        loss_fn = u.least_squares
    elif loss_type == 'DebugLeastSquares':
        loss_fn = u.debug_least_squares
    else:    # CrossEntropy
        loss_fn = nn.CrossEntropyLoss()

    sample_output = model(data)

    if loss_type.endswith('LeastSquares'):
        targets = torch.randn(sample_output.shape)
    elif loss_type == 'CrossEntropy':
        targets = torch.LongTensor(n).random_(0, o)

    autograd_lib.clear_backprops(model)
    autograd_lib.add_hooks(model)
    output = model(data)
    autograd_lib.backprop_hess(output, hess_type=loss_type)
    autograd_lib.compute_hess(model, method='kron', attr_name='hess_kron')
    autograd_lib.compute_hess(model, method='mean_kron', attr_name='hess_mean_kron')
    autograd_lib.compute_hess(model, method='exact')
    autograd_lib.disable_hooks()

    errors1 = []
    errors2 = []
    for i in range(len(model.layers)):
        layer = model.layers[i]

        # direct Hessian computation
        H = layer.weight.hess
        H_bias = layer.bias.hess

        # factored Hessian computation
        Hk = layer.weight.hess_kron
        Hk_bias = layer.bias.hess_kron
        Hk = Hk.expand()
        Hk_bias = Hk_bias.expand()

        Hk2 = layer.weight.hess_mean_kron
        Hk2_bias = layer.bias.hess_mean_kron
        Hk2 = Hk2.expand()
        Hk2_bias = Hk2_bias.expand()

        # autograd Hessian computation
        loss = loss_fn(output, targets)
        Ha = u.hessian(loss, layer.weight).reshape(H.shape)
        Ha_bias = u.hessian(loss, layer.bias)

        # compare direct against autograd
        Ha = Ha.reshape(H.shape)
        # rel_error = torch.max((H-Ha)/Ha)

        u.check_close(H, Ha, rtol=1e-5, atol=1e-7)
        u.check_close(Ha_bias, H_bias, rtol=1e-5, atol=1e-7)

        errors1.extend([u.symsqrt_dist(H, Hk), u.symsqrt_dist(H_bias, Hk_bias)])
        errors2.extend([u.symsqrt_dist(H, Hk2), u.symsqrt_dist(H_bias, Hk2_bias)])

    errors1 = torch.tensor(errors1)
    errors2 = torch.tensor(errors2)
    golden_errors1 = torch.tensor([0.09458080679178238, 0.0, 0.13416489958763123, 0.0, 0.0003909761435352266, 0.0])
    golden_errors2 = torch.tensor([0.0945773795247078, 0.0, 0.13418318331241608, 0.0, 4.478318658129865e-07, 0.0])

    u.check_close(golden_errors1, errors1)
    u.check_close(golden_errors2, errors2)


if __name__ == '__main__':
    # test_kron_conv_exact1()
    # test_kron_conv_exact2()
    # test_kron_conv_exact()
    #    test_kron_1x2_conv()
    u.run_all_tests(sys.modules[__name__])


================================================
FILE: autotune/ciresan_bench.py
================================================
import os
import sys
import time
from typing import Optional, Tuple, Callable

# import torch
import scipy
import torch
from torchcurv.optim import SecondOrderOptimizer


import torch.nn as nn

import util as u

import numpy as np

"""
MKL version unknown
PyTorch version 1.2.0
Scipy version:  1.2.1
Numpy version:  1.16.4
1024-by-1024 matrix
 7079.93   linalg.solve_lyapunov
  280.11   linalg.pinvh
 1186.08   linalg.pinv
   49.18   linalg.inv
  118.23   qr
  413.42   svd
"""

class Net(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w = nn.Linear(d, 1, bias=False)

    def forward(self, x: torch.Tensor):
        result = self.w(x)
        return result

last_time = 0

class timeit:
    """Decorator to measure length of time spent in the block in millis and log
    it to TensorBoard. This function is
    """

    def __init__(self, tag=""):
        self.tag = tag

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *args):
        global last_time
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # finish pending GPU work before reading the clock
        self.end = time.perf_counter()
        interval_ms = 1000 * (self.end - self.start)
        print(f"{interval_ms:8.2f}   {self.tag}")
        last_time = interval_ms
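The timing pattern used by `timeit` can be sketched self-containedly (CPU-only, no TensorBoard or CUDA dependencies; `Timer` is an illustrative name):

```python
import time

class Timer:
    """Minimal context manager: perf_counter around a block, elapsed ms."""
    def __init__(self, tag=""):
        self.tag = tag

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, *exc):
        self.ms = 1000 * (time.perf_counter() - self.start)
        print(f"{self.ms:8.2f}   {self.tag}")

with Timer("sleep") as t:
    time.sleep(0.02)
```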


def get_mkl_version():
  import ctypes
  import numpy as np

  # this recipe only works on Linux
  try:
    ver = np.zeros(199, dtype=np.uint8)
    mkl = ctypes.cdll.LoadLibrary("libmkl_rt.so")
    mkl.MKL_Get_Version_String(ver.ctypes.data_as(ctypes.c_char_p), 198)
    return ver[ver != 0].tobytes()
  except:
    return 'unknown'


def print_cpu_info():
  ver = 'unknown'
  try:
    for l in open("/proc/cpuinfo").read().split('\n'):
      if 'model name' in l:
        ver = l
        break
  except:
    pass
  print("CPU: ", ver)


def linalg_bench():
    if np.__config__.get_info("lapack_mkl_info"):
        print("MKL version", get_mkl_version())
    else:
        print("not using MKL")

    print("PyTorch version", torch.version.__version__)

    print("Scipy version: ", scipy.version.full_version)
    print("Numpy version: ", np.version.full_version)

    times = {}
    dimlist = [768, 2500, 2000, 1500, 1000, 500]
    for device in ['cpu', 'cuda']:
        for d in dimlist*2:
            print(f"{d}-by-{d} matrix: ", device)
            n = 10000
            assert n > 2*d   # to prevent singularity
            X = u.from_numpy(np.random.random((d, n)))
            Y = u.from_numpy(np.random.random((d, n)))
            H = X @ X.t()
            S = Y @ Y.t()
            if torch.cuda.is_available():
                H = H.to(device)
                S = S.to(device)

            with timeit(f"symeig"):
                result = torch.symeig(H, eigenvectors=False)
            times.setdefault(d, []).append(last_time)
            
            with timeit(f"symeig"):
                result = torch.symeig(H, eigenvectors=True)
            times.setdefault(d, []).append(last_time)

            with timeit(f"svd"):
                result = torch.svd(H)
            times.setdefault(d, []).append(last_time)
    for d in times:
        print(f"{d}-by-{d}: {u.format_list(times[d][len(times[d])//2:])}")


if __name__ == '__main__':
    linalg_bench()


================================================
FILE: autotune/curvature_test.py
================================================
# Prototype batch-size quantities from
# Batch size formulas (https://docs.google.com/document/d/19Jmh4spbSAnAGX_eq7WSFPgLzrpJEhiZRpjX1jSYObo/edit)
import os
import sys
from typing import Optional, Tuple, Callable

# import torch
import torch.nn as nn
from torchcurv.optim import SecondOrderOptimizer

from util import *


class Net(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w = nn.Linear(d, 1, bias=False)

    def forward(self, x: torch.Tensor):
        result = self.w(x)
        return result


def test_singlelayer():
    # Reproduce Linear Regression example
    # https://www.wolframcloud.com/obj/yaroslavvb/newton/curvature-unit-tests.nb

    torch.set_default_dtype(torch.float32)

    d = 2
    n = 3
    model = Net(d)

    w0 = torch.tensor([[1, 2]]).float()
    assert w0.shape[1] == d
    model.w.weight.data.copy_(w0)

    X = torch.tensor([[-2, 0, 2], [-1, 1, 3]]).float()
    assert X.shape[0] == d
    assert X.shape[1] == n

    Y = torch.tensor([[0, 1, 2]]).float()
    assert Y.shape[1] == X.shape[1]

    data = X.t()  # PyTorch expects batch dimension first
    target = Y.t()
    assert data.shape[0] == n

    output = model(data)
    # residuals, aka e
    residuals = output - Y.t()

    def compute_loss(residuals_):
        return torch.sum(residuals_ * residuals_) / (2 * n)

    loss = compute_loss(residuals)

    assert abs(loss - 8.83333) < 1e-5, torch.norm(loss) - 8.83333

    # use learning rate 0 to avoid changing parameter vector
    optim_kwargs = dict(lr=0, momentum=0, weight_decay=0, l2_reg=0,
                        bias_correction=False, acc_steps=1,
                        curv_type="Cov", curv_shapes={"Linear": "Kron"},
                        momentum_type="preconditioned", )
    curv_args = dict(damping=1, ema_decay=1)  # todo: damping
    optimizer = SecondOrderOptimizer(model, **optim_kwargs, curv_kwargs=curv_args)

    def backward(last_layer: str) -> Callable:
        """Creates closure that backpropagates either from output layer or from loss layer"""

        def closure() -> Tuple[Optional[torch.Tensor], torch.Tensor]:
            optimizer.zero_grad()
            output = model(data)
            if last_layer == "output":
                output.backward(torch.ones_like(target))
                return None, output
            elif last_layer == 'loss':
                loss = compute_loss(output - target)
                loss.backward()
                return loss, output
            else:
                assert False, 'last layer must be "output" or "loss"'

        return closure

    #    loss = compute_loss(output - Y.t())
    #    loss.backward()

    loss, output = optimizer.step(closure=backward('loss'))
    check_equal(output.t(), [[-4, 2, 8]])
    check_equal(residuals.t(), [[-4, 1, 6]])
    check_equal(loss, 8.833333)

    # batch output Jacobian
    J = X.t()
    check_close(J, [[-2, -1], [0, 1], [2, 3]])

    # matrix of activations, (n, d)
    A = model.w.data_input
    check_close(A, J)

    # matrix of backprops; multiply by n to remove dependence on batch size
    B = model.w.grad_output * n
    check_close(B, residuals)

    # gradients, n,d
    # method 1, manual computation
    G = residuals.repeat(1, d) * J
    check_close(G, [[8., 4.], [0., 1.], [12., 18.]])

    # method 2, obtain them from activation + backprop values
    check_close(G, khatri_rao_t(A, B))

    # method 3, PyTorch autograd
    # (n,) losses vector
    losses = torch.stack([compute_loss(r) for r in residuals])
    # batch-loss jacobian
    G2 = jacobian(losses, model.w.weight) * n
    # per-example gradients are row-matrices, squeeze to stack them into a single matrix
    G2 = G2.squeeze(1)
    check_close(G2, G)

    # mean gradient
    g = G.sum(dim=0) / n
    check_close(g, [6.66667, 7.66667])

    # empirical Fisher
    efisher = G.t() @ G / n
    check_close(efisher, [[69.3333, 82.6667], [82.6667, 113.667]])

    # centered empirical Fisher (Sigma in OpenAI paper, estimate of Sigma in Jain paper)
    sigma = efisher - outer(g, g)
    check_close(sigma, [[24.8889, 31.5556], [31.5556, 54.8889]])
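The centered empirical Fisher above is just the (biased) covariance of the per-example gradients; a standalone NumPy cross-check using the values from this test:

```python
import numpy as np

G = np.array([[8., 4], [0, 1], [12, 18]])   # per-example gradients from above
n = len(G)
g = G.mean(axis=0)
sigma = G.T @ G / n - np.outer(g, g)        # E[g g^T] - E[g] E[g]^T
assert np.allclose(sigma, np.cov(G, rowvar=False, bias=True))
assert np.allclose(sigma, [[24.8889, 31.5556], [31.5556, 54.8889]], atol=1e-3)
```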

    # loss
    loss2 = (residuals * residuals).sum() / (2 * n)
    check_close(to_python_scalar(loss2), 8.83333)

    ####################################################################
    # Hessian
    ####################################################################

    # method 1, manual computation
    H = J.t() @ J / n
    check_close(H, [[2.66667, 2.66667], [2.66667, 3.66667]])

    # method 2, using activation + backprop values
    check_close(A.t() @ torch.eye(n) @ A / n, H)

    # method 3, PyTorch backprop
    hess = hessian(compute_loss(residuals), model.w.weight)
    hess = hess.squeeze(2)   # TODO(y): replace with transpose like in multilayer test
    hess = hess.squeeze(0)
    check_close(hess, H)

    sigma_norm = torch.norm(sigma)
    g_norm = torch.norm(g)

    g_ = g.unsqueeze(0)  # turn g into row matrix

    # predicted drop in loss if we take a Newton step
    excess = to_python_scalar(g_ @ H.inverse() @ g_.t() / 2)
    check_close(excess, 8.83333)

    def loss_direction(direction, eps):
        """loss improvement if we take step eps in direction dir"""
        return to_python_scalar(eps * (direction @ g.t()) - 0.5 * eps ** 2 * direction @ H @ direction.t())

    newtonImprovement = loss_direction(g_ @ H.inverse(), 1)
    check_close(newtonImprovement, 8.83333)

    ############################
    # OpenAI quantities
    grad_curvature = to_python_scalar(g_ @ H @ g_.t())  # curvature in direction of g
    stepOpenAI = to_python_scalar(g.norm() ** 2 / grad_curvature) if g_norm else 999
    check_close(stepOpenAI, 0.170157)
    batchOpenAI = to_python_scalar(torch.trace(H @ sigma) / grad_curvature) if g_norm else 999
    check_close(batchOpenAI, 0.718603)

    # improvement in loss when we take gradient step with optimal learning rate
    gradientImprovement = loss_direction(g_, stepOpenAI)
    assert newtonImprovement > gradientImprovement
    check_close(gradientImprovement, 8.78199)

    ############################
    # Gradient diversity quantities
    diversity = torch.norm(G, "fro") ** 2 / torch.norm(g) ** 2
    check_close(diversity, 5.31862)

    ############################
    # Jain/Kakade quantities

    # noise scale (Jain, minimax rate of estimator)
    noise_variance = torch.trace(H.inverse() @ sigma)
    check_close(noise_variance, 26.)

    isqrtH = pinv_square_root(H)
    # measure of misspecification between model and actual noise (Jain, \rho)
    # formula (3) of "Parallelizing Stochastic Gradient Descent"
    p_sigma = (kron(H, torch.eye(d)) + kron(torch.eye(d), H)).inverse() @ vec(sigma)
    p_sigma = unvec(p_sigma, d)
    rho = d / erank(p_sigma) if sigma_norm > 0 else 1
    check_close(rho, 1.21987)

    # use new method with Lyapunov factoring
    p_sigma2 = lyapunov_svd(H, sigma)
    rho2 = d / erank(p_sigma2)
    check_close(rho2, 1.21987)
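The kron-based formula above (and the lyapunov_svd variant) solve the continuous Lyapunov equation H·P + P·H = Σ; a self-contained NumPy sketch of that vectorization identity, using this test's 2×2 values:

```python
import numpy as np

H = np.array([[2.66667, 2.66667], [2.66667, 3.66667]])
sigma = np.array([[24.8889, 31.5556], [31.5556, 54.8889]])
d = 2

vec = lambda M: M.reshape(-1, order='F')   # column-major vectorization
# (H kron I + I kron H) vec(P) = vec(sigma) is the vectorized form of H P + P H = sigma
P = np.linalg.solve(np.kron(H, np.eye(d)) + np.kron(np.eye(d), H), vec(sigma))
P = P.reshape(d, d, order='F')
assert np.allclose(H @ P + P @ H, sigma, atol=1e-6)
```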

    rhoSimple = (d / erank(isqrtH @ sigma @ isqrtH)) if sigma_norm > 0 else 1
    check_close(rhoSimple, 1.4221)
    assert 1 <= rho <= d, rho

    # divergent learning rate for batch-size 1 (Jain). Approximates max||x_i|| with avg.
    # For more accurate results may want to add stddev of ||x_i||
    # noinspection PyTypeChecker
    stepMin = 2 / torch.trace(H)
    check_close(stepMin, 0.315789)

    # divergent learning rate for batch-size infinity
    stepMax = 2 / l2_norm(H)
    check_close(stepMax, 0.340147)

    # divergent learning rate for batch-size 1, adjusted for misspecification
    check_close(stepMin / rhoSimple, 0.222058)
    check_close(stepMin / rho, 0.258871)

    # batch size that provides an lr halfway between stepMin and stepMax
    batchJain = 1 + erank(H)
    check_close(batchJain, 2.07713)
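The checks above are consistent with erank being the effective rank trace(H)/||H||_2 (spectral norm); a numeric sketch under that assumption (the erank helper below is hypothetical, the repo's util version may differ):

```python
import numpy as np

def erank(M):
    # assumed definition: effective rank = trace / largest eigenvalue
    return np.trace(M) / np.linalg.norm(M, 2)

H = np.array([[2.66667, 2.66667], [2.66667, 3.66667]])
assert abs(2 / np.trace(H) - 0.315789) < 1e-5            # stepMin
assert abs(2 / np.linalg.norm(H, 2) - 0.340147) < 1e-5   # stepMax
assert abs(1 + erank(H) - 2.07713) < 1e-4                # batchJain
```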

    # batch size that provides halfway point after adjusting for misspecification
    check_close(1 + erank(H) * rhoSimple, 2.5318)
    check_close(1 + erank(H) * rho, 2.31397)


class Net2(nn.Module):
    def __init__(self, d1, d2):
        super().__init__()
        self.W = nn.Linear(d1, d2, bias=False)
        self.X2t = nn.Linear(d2, 1, bias=False)

    def forward(self, X1: torch.Tensor):
        result = self.W(X1)
        result = self.X2t(result)
        return result


def test_multilayer():
    # Reproduce multilayer example
    # https://www.wolframcloud.com/obj/yaroslavvb/newton/curvature-unit-tests.nb

    torch.set_default_dtype(torch.float64)

    d1 = 2
    d2 = 4
    n = 3
    model = Net2(d1, d2)

    W0 = u.to_pytorch([[3, 3], [0, 3], [1, -1], [-3, 1]])
    model.W.weight.data.copy_(W0)
    X2 = u.to_pytorch([[1], [-2], [-1], [3]])
    assert X2.shape == (d2, 1)
    model.X2t.weight.data.copy_(X2.t())

    X1 = u.to_pytorch([[2, -2, 3], [-3, 1, -3]])
    assert X1.shape == (d1, n)

    Y = u.to_pytorch([[-2, -3, 0]])
    assert Y.shape == (1, n)

    data = X1.t()  # PyTorch expects batch dimension first
    target = Y.t()
    assert data.shape[0] == n

    output = model(data)
    # residuals, aka e
    residuals = output - Y.t()

    def compute_loss(residuals_):
        return torch.sum(residuals_ * residuals_) / (2 * n)

    loss = compute_loss(residuals)
    assert abs(loss - 187.5) < 1e-5, torch.norm(loss) - 187.5

    # use learning rate 0 to avoid changing parameter vector
    optim_kwargs = dict(lr=0, momentum=0, weight_decay=0, l2_reg=0,
                        bias_correction=False, acc_steps=1,
                        curv_type="Cov", curv_shapes={"Linear": "Kron"},
                        momentum_type="preconditioned", update_inv=False, precondition_grad=False)
    curv_args = dict(damping=0, ema_decay=1)
    optimizer = SecondOrderOptimizer(model, **optim_kwargs, curv_kwargs=curv_args)

    # def set_requires_grad(v):
    #     for p in model.parameters():
    #         p.requires_grad = False
    #
    def backward(last_layer: str) -> Callable:
        """Creates closure that backpropagates either from output layer or from loss layer"""

        def closure() -> Tuple[Optional[torch.Tensor], torch.Tensor]:
            optimizer.zero_grad()
            output = model(data)
            if last_layer == "output":
                output.backward(torch.ones_like(target))
                return None, output
            elif last_layer == 'loss':
                loss = compute_loss(output - target)
                loss.backward()
                return loss, output
            else:
                assert False, 'last layer must be "output" or "loss"'

        return closure

    #    loss = compute_loss(output - Y.t())
    #    loss.backward()

    loss, output = optimizer.step(closure=backward('loss'))
    check_close(output.t(), [[-17, 15, -24]])
    check_close(residuals.t(), [[-15, 18, -24]])
    check_close(loss, 187.5)

    # batch output Jacobian, n rows, i'th row gives sensitivity of i'th output example to parameters
    J = kron(X1, X2).t()
    assert J.shape == (n, d1 * d2)
    check_close(J, [[2, -4, -2, 6, -3, 6, 3, -9], [-2, 4, 2, -6, 1, -2, -1, 3], [3, -6, -3, 9, -3, 6, 3, -9]])

    # matrix of activations, (n, d1)
    At = model.W.data_input
    A = At.t()
    check_close(At, X1.t())

    # matrix of backprops; multiply by n to remove dependence on batch size
    Bt = model.W.grad_output * n
    check_close(Bt, [[-15, 30, 15, -45], [18, -36, -18, 54], [-24, 48, 24, -72]])

    # gradients, n,d
    # mean gradient, 1, d
    # method 1, manual computation
    G = khatri_rao_t(At, Bt)
    assert G.shape == (n, d1 * d2)
    check_close(G, [[-30, 60, 30, -90, 45, -90, -45, 135], [-36, 72, 36, -108, 18, -36, -18, 54],
                    [-72, 144, 72, -216, 72, -144, -72, 216]])
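The check above relies on khatri_rao_t from util; a standalone NumPy sketch of the assumed row-wise Kronecker semantics (result[i] = kron(A[i], B[i])), using the same activation/backprop values:

```python
import numpy as np

def khatri_rao_t(A, B):
    """Row-wise Kronecker product: result[i] = kron(A[i], B[i])."""
    return np.einsum('ij,ik->ijk', A, B).reshape(A.shape[0], -1)

At = np.array([[2., -3], [-2, 1], [3, -3]])                                   # activations, (n, d1)
Bt = np.array([[-15., 30, 15, -45], [18, -36, -18, 54], [-24, 48, 24, -72]])  # backprops * n, (n, d2)
G = khatri_rao_t(At, Bt)                                                      # per-example gradients, (n, d1*d2)
assert np.allclose(G[0], np.kron(At[0], Bt[0]))
assert np.allclose(G.mean(axis=0), [-46, 92, 46, -138, 45, -90, -45, 135])    # mean gradient g
```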

    g = G.sum(dim=0, keepdim=True) / n
    check_close(g, [[-46, 92, 46, -138, 45, -90, -45, 135]])
    check_close(g, vec(get_param(model.W).grad).t())

    # method 2, explicit PyTorch autograd
    # (n,) losses vector
    losses = torch.stack([compute_loss(r) for r in residuals])
    # batch-loss jacobian
    G2 = jacobian(losses, model.W.weight) * n
    # each per-example gradient is a matrix; vectorize+transpose to turn it into a row
    G2 = G2.transpose(1, 2).reshape(n, d1 * d2)
    check_close(G2, G)

    # Hessian
    # method 1, manual computation
    H = J.t() @ J / n
    check_close(H * n,
                [[17, -34, -17, 51, -17, 34, 17, -51],
                 [-34, 68, 34, -102, 34, -68, -34, 102],
                 [-17, 34, 17, -51, 17, -34, -17, 51],
                 [51, -102, -51, 153, -51, 102, 51, -153],
                 [-17, 34, 17, -51, 19, -38, -19, 57],
                 [34, -68, -34, 102, -38, 76, 38, -114],
                 [17, -34, -17, 51, -19, 38, 19, -57],
                 [-51, 102, 51, -153, 57, -114, -57, 171]])

    # method 2, using activation + upstream matrices
    check_close(kron(A @ A.t(), X2 @ X2.t()) / n, H)

    # method 3, PyTorch autograd
    hess = hessian(compute_loss(residuals), model.W.weight)
    # Fix shape: vec flattens in column-major order, but PyTorch reshape is row-major,
    # so transpose H_{ijkl} -> H_{jilk} before reshaping
    hess = hess.transpose(2, 3).transpose(0, 1)
    hess = hess.reshape(d1 * d2, d1 * d2)
    check_close(hess, H)
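The transpose-then-reshape trick above converts between conventions: PyTorch reshape is row-major while vec is column-major. A minimal NumPy illustration:

```python
import numpy as np

M = np.arange(6).reshape(2, 3)                   # [[0, 1, 2], [3, 4, 5]]
vec_col = M.reshape(-1, order='F')               # column-major vec: [0, 3, 1, 4, 2, 5]
assert np.array_equal(vec_col, M.T.reshape(-1))  # transpose + row-major reshape == column-major vec
```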

    # method 4, get Jacobian + Hessian using backprop
    _loss, _output = optimizer.step(closure=backward('output'))
    B2t = model.W.grad_output

    # alternative way of getting batch Jacobian
    J2 = khatri_rao_t(At, B2t)
    check_close(J2, J)
    H2 = J2.t() @ J2 / n
    check_close(H2, H)

    # empirical Fisher
    efisher = G.t() @ G / n
    check_close(efisher, [[2460, -4920, -2460, 7380, -2394, 4788, 2394, -7182],
                          [-4920, 9840, 4920, -14760, 4788, -9576, -4788, 14364],
                          [-2460, 4920, 2460, -7380, 2394, -4788, -2394, 7182],
                          [7380, -14760, -7380, 22140, -7182, 14364, 7182, -21546],
                          [-2394, 4788, 2394, -7182, 2511, -5022, -2511, 7533],
                          [4788, -9576, -4788, 14364, -5022, 10044, 5022, -15066],
                          [2394, -4788, -2394, 7182, -2511, 5022, 2511, -7533],
                          [-7182, 14364, 7182, -21546, 7533, -15066, -7533, 22599]])

    # centered empirical Fisher (Sigma in OpenAI paper, estimate of Sigma in Jain paper)
    sigma = efisher - g.t() @ g
    check_close(sigma, [[344, -688, -344, 1032, -324, 648, 324, -972], [-688, 1376, 688, -2064, 648, -1296, -648, 1944],
                        [-344, 688, 344, -1032, 324, -648, -324, 972],
                        [1032, -2064, -1032, 3096, -972, 1944, 972, -2916],
                        [-324, 648, 324, -972, 486, -972, -486, 1458], [648, -1296, -648, 1944, -972, 1944, 972, -2916],
                        [324, -648, -324, 972, -486, 972, 486, -1458],
                        [-972, 1944, 972, -2916, 1458, -2916, -1458, 4374]])

    # loss
    loss2 = (residuals * residuals).sum() / (2 * n)
    check_close(to_python_scalar(loss2), 187.5)

    sigma_norm = torch.norm(sigma)
    g_norm = torch.norm(g)

    # predicted drop in loss if we take a Newton step
    excess = to_python_scalar(g @ H.pinverse() @ g.t() / 2)
    check_close(excess, 12747 / 68)

    def loss_direction(direction, eps):
        """loss improvement if we take step eps in direction dir"""
        return to_python_scalar(eps * (direction @ g.t()) - 0.5 * eps ** 2 * direction @ H @ direction.t())

    newtonImprovement = loss_direction(g @ H.pinverse(), 1)
    check_close(newtonImprovement, 12747/68)

    ############################
    # OpenAI quantities
    ############################
    grad_curvature = to_python_scalar(g @ H @ g.t())  # curvature in direction of g
    stepOpenAI = to_python_scalar(g.flatten().norm() ** 2 / grad_curvature) if g_norm else 999
    check_close(stepOpenAI, 0.00571855)
    batchOpenAI = to_python_scalar(torch.trace(H @ sigma) / grad_curvature) if g_norm else 999
    check_close(batchOpenAI, 0.180201)

    # improvement in loss when we take gradient step with optimal learning rate
    gradientImprovement = loss_direction(g, stepOpenAI)
    assert newtonImprovement > gradientImprovement
    check_close(gradientImprovement, 177.604)

    ############################
    # Gradient diversity quantities
    ############################
    diversity = torch.norm(G, "fro") ** 2 / torch.norm(g) ** 2
    check_close(diversity, 3.6013)

    ############################
    # Jain/Kakade quantities
    ############################

    # noise scale (Jain, minimax rate of estimator)
    noise_variance = torch.trace(H.pinverse() @ sigma)
    check_close(noise_variance, 333.706)

    isqrtH = pinv_square_root(H)
    #    isqrtH = torch.tensor(isqrtH)
    # measure of misspecification between model and actual noise (Jain, \rho)
    # formula (3) of "Parallelizing Stochastic Gradient Descent"
    p_sigma = torch.pinverse(kron(H, torch.eye(d1 * d2)) + kron(torch.eye(d1 * d2), H)) @ vec(sigma)
    p_sigma = unvec(p_sigma, d1 * d2)
    rho = d1 * d2 / erank(p_sigma) if sigma_norm > 0 else 1
    check_close(rho, 6.48399)

    rhoSimple = (d1 * d2 / erank(isqrtH @ sigma @ isqrtH)) if sigma_norm > 0 else 1
    check_close(rhoSimple, 6.55661)

    # divergent learning rate for batch-size 1 (Jain). Approximates max||x_i|| with avg.
    # For more accurate results may want to add stddev of ||x_i||
    # noinspection PyTypeChecker
    stepMin = 2 / torch.trace(H)
    check_close(stepMin, 0.0111111)

    # divergent learning rate for batch-size infinity
    stepMax = 2 / l2_norm(H)
    check_close(stepMax, 0.011419)

    # divergent learning rate for batch-size 1, adjusted for misspecification
    check_close(stepMin / rhoSimple, 0.00169464)
    check_close(stepMin / rho, 0.00171362)

    # batch size that provides an lr halfway between stepMin and stepMax
    batchJain = 1 + erank(H)
    check_close(batchJain, 2.02771)

    # batch size that provides halfway point after adjusting for misspecification
    check_close(1 + erank(H) * rhoSimple, 7.73829)
    check_close(1 + erank(H) * rho, 7.66365)


if __name__ == '__main__':
    run_all_tests(sys.modules[__name__])


================================================
FILE: autotune/eval_conv2d_approx.py
================================================
"""Evaluate approximation quality of factoring on conv2d layers.

Evaluates discrepancy in magnitude (l2 norm) and value (difference between normalized square roots); 0 means perfect agreement.

For LeastSquares loss: kron and mean_kron are exact for all combinations with kernel_size=1
For CrossEntropy loss: kron is exact for all combinations with kernel_size=1, num_channels=1
                       mean_kron is exact for all combinations with kernel_size=1


With kernel_size > 1, all methods over-estimate the Hessian magnitude, even in the 1-channel/LeastSquares case

========== mean_kron ========================================
image_size
   value: : [1.1446763892308809e-06, 9.212457143803476e-07, 1.4112242752162274e-06, 1.2591850691023865e-06, 1.194795800074644e-06]
   magnitude : 1.00, 1.00, 1.00, 1.00, 1.00
num_channels
   value: : [5.330160206540313e-07, 9.375009426548786e-07, 1.2972760714546894e-06, 1.1446763892308809e-06, 1.0141359780391213e-05, 9.359457180835307e-06, 6.479138392023742e-06, 6.426676463888725e-06, 8.470763532386627e-06, 5.907071681576781e-06]
   magnitude : 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00
kernel_size
   value: : [1.1446763892308809e-06, 0.8146935105323792, 0.3585125207901001, 0.6153226494789124, 0.2633444368839264]
   magnitude : 1.00, 1.12, 1.13, 1.02, 1.00

========== kron ========================================
image_size
   value: : [0.012090419419109821, 0.018052944913506508, 0.004046864341944456, 0.006037037819623947, 0.008428926579654217]
   magnitude : 1.00, 1.01, 1.00, 1.01, 1.01
num_channels
   value: : [0.004109012894332409, 0.029988422989845276, 0.0047996193170547485, 0.012090419419109821, 0.02191711962223053, 0.012329756282269955, 0.12856155633926392, 0.015723882243037224, 0.027835888788104057, 0.012887174263596535]
   magnitude : 1.00, 1.01, 1.00, 1.00, 1.03, 1.00, 1.02, 1.01, 1.00, 1.00
kernel_size
   value: : [0.012090419419109821, 0.8148165941238403, 0.358644038438797, 0.6153406500816345, 0.263365775346756]
   magnitude : 1.00, 1.12, 1.13, 1.02, 1.00
"""

# Tests that compare manual computation of quantities against PyTorch autograd

from typing import List

import autograd_lib
import torch
import util as u
from attrdict import AttrDict

unfold = torch.nn.functional.unfold
fold = torch.nn.functional.fold


def compute_hess(n: int = 1, image_size: int = 1, kernel_size: int = 1, num_channels: int = 1, num_layers: int = 1,
                 nonlin: bool = False,
                 loss: str = 'CrossEntropy', method='exact', param_name='weight') -> List[torch.Tensor]:
    """

    Compute Hessians for all layers for given architecture

    Args:
        param_name: which parameter to compute ('weight' or 'bias')
        n: number of examples
        image_size:  width of image (square image)
        kernel_size: kernel size
        num_channels:
        num_layers:
        nonlin
        loss: LeastSquares or CrossEntropy
        method: 'kron', 'mean_kron'
        num_layers: number of layers in the network

    Returns:
        list of num_layers Hessian matrices.
    """

    assert param_name in ['weight', 'bias']
    assert loss in autograd_lib._supported_losses
    assert method in autograd_lib._supported_methods

    u.seed_random(1)

    Xh, Xw = 1, image_size
    Kh, Kw = 1, kernel_size
    dd = [num_channels] * (num_layers + 1)

    model: u.SimpleModel2 = u.PooledConvolutional2(dd, kernel_size=(Kh, Kw), nonlin=nonlin, bias=True)
    # model: u.SimpleModel2 = u.StridedConvolutional2(dd, kernel_size=(Kh, Kw), nonlin=nonlin, bias=True)
    data = torch.randn((n, dd[0], Xh, Xw))

    autograd_lib.clear_backprops(model)
    autograd_lib.add_hooks(model)
    output = model(data)
    autograd_lib.backprop_hess(output, hess_type=loss)
    autograd_lib.compute_hess(model, method=method)
    autograd_lib.disable_hooks()

    result = []
    for i in range(len(model.layers)):
        param = getattr(model.layers[i], param_name)
        if method == 'exact' or method == 'autograd':
            result.append(param.hess)
        else:
            result.append(param.hess_factored.expand())
    return result


def main():
    # for kernel_size=1, mean kron factoring works for any image size
    main_vals = AttrDict(n=2, kernel_size=1, image_size=625, num_channels=5, num_layers=4, loss='CrossEntropy',
                         nonlin=False)

    hess_list1 = compute_hess(method='exact', **main_vals)
    hess_list2 = compute_hess(method='kron', **main_vals)
    value_error = max([u.symsqrt_dist(h1, h2) for h1, h2 in zip(hess_list1, hess_list2)])
    magnitude_error = max([u.l2_norm(h2) / u.l2_norm(h1) for h1, h2 in zip(hess_list1, hess_list2)])
    print(value_error)
    print(magnitude_error)

    dimension_vals = dict(image_size=[2, 3, 4, 5, 6], num_channels=range(2, 12), kernel_size=[1, 2, 3, 4, 5])
    for method in ['mean_kron', 'kron']:  # , 'experimental_kfac']:
        print()
        print('='*10, method, '='*40)
        for dimension in ['image_size', 'num_channels', 'kernel_size']:
            value_errors = []
            magnitude_errors = []
            for val in dimension_vals[dimension]:
                vals = AttrDict(main_vals.copy())
                vals.method = method
                vals[dimension] = val
                vals.image_size = max(vals.image_size, vals.kernel_size ** vals.num_layers)
                # print(vals)
                vals_exact = AttrDict(vals.copy())
                vals_exact.method = 'exact'
                hess_list1 = compute_hess(**vals_exact)
                hess_list2 = compute_hess(**vals)
                magnitude_error = max([u.l2_norm(h2) / u.l2_norm(h1) for h1, h2 in zip(hess_list1, hess_list2)])
                hess_list1 = [h/u.l2_norm(h) for h in hess_list1]
                hess_list2 = [h/u.l2_norm(h) for h in hess_list2]

                value_error = max([u.symsqrt_dist(h1, h2) for h1, h2 in zip(hess_list1, hess_list2)])
                value_errors.append(value_error)
                magnitude_errors.append(magnitude_error.item())
            print(dimension)
            print('   value: :', value_errors)
            print('   magnitude :', u.format_list(magnitude_errors))


if __name__ == '__main__':
    main()


================================================
FILE: autotune/factored_test.py
================================================
"""Test factored implementation of stats"""

import argparse
import os
import sys
import time

import autograd_lib
import globals as gl
# import torch
import torch
import util as u
import wandb
from attrdict import AttrDefault
from torch import nn as nn
from torch.utils.tensorboard import SummaryWriter


def test_factored_stats_golden_values():
    """Test stats from values generated by non-factored version"""
    u.seed_random(1)
    u.install_pdb_handler()
    torch.set_default_dtype(torch.float32)

    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
    args = parser.parse_args()

    logdir = u.create_local_logdir('/temp/runs/factored_test')
    run_name = os.path.basename(logdir)
    gl.event_writer = SummaryWriter(logdir)
    print('logging to ', logdir)

    loss_type = 'LeastSquares'

    args.data_width = 2
    args.dataset_size = 5
    args.stats_batch_size = 5
    d1 = args.data_width ** 2
    args.stats_batch_size = args.dataset_size
    args.stats_steps = 1

    n = args.stats_batch_size
    o = 10
    d = [d1, o]

    model = u.SimpleFullyConnected2(d, bias=False, nonlin=0)
    model = model.to(gl.device)
    print(model)

    dataset = u.TinyMNIST(data_width=args.data_width, dataset_size=args.dataset_size, loss_type=loss_type)
    stats_loader = torch.utils.data.DataLoader(dataset, batch_size=args.stats_batch_size, shuffle=False)
    stats_iter = u.infinite_iter(stats_loader)
    stats_data, stats_targets = next(stats_iter)

    if loss_type == 'LeastSquares':
        loss_fn = u.least_squares
    else:   # loss_type == 'CrossEntropy':
        loss_fn = nn.CrossEntropyLoss()

    autograd_lib.add_hooks(model)
    gl.reset_global_step()
    last_outer = 0
    for step in range(args.stats_steps):
        if last_outer:
            u.log_scalars({"time/outer": 1000*(time.perf_counter() - last_outer)})
        last_outer = time.perf_counter()

        data, targets = stats_data, stats_targets

        # Capture Hessian and gradient stats
        autograd_lib.enable_hooks()
        autograd_lib.clear_backprops(model)
        with u.timeit("backprop_g"):
            output = model(data)
            loss = loss_fn(output, targets)
            loss.backward(retain_graph=True)

        autograd_lib.clear_hess_backprops(model)
        with u.timeit("backprop_H"):
            autograd_lib.backprop_hess(output, hess_type=loss_type)
        autograd_lib.disable_hooks()   # TODO(y): use remove_hooks

        with u.timeit("compute_grad1"):
            autograd_lib.compute_grad1(model)
        with u.timeit("compute_hess"):
            autograd_lib.compute_hess(model)
            autograd_lib.compute_hess(model, method='kron', attr_name='hess2')

        autograd_lib.compute_stats_factored(model)

        params = list(model.parameters())
        assert len(params) == 1
        new_values = params[0].stats
        golden_values = torch.load('test/factored.pt')

        for valname in new_values:
            print("Checking ", valname)
            if valname == 'sigma_l2':
                u.check_close(new_values[valname], golden_values[valname], atol=1e-2)  # sigma is approximate
            elif valname == 'sigma_erank':
                u.check_close(new_values[valname], golden_values[valname], atol=0.11)  # 1.0 vs 1.1
            elif valname in ['rho', 'step_div_1_adjusted', 'batch_jain_full']:
                continue   # lyapunov stats weren't computed correctly in golden set
            elif valname in ['batch_openai']:
                continue   # batch sizes depend on sigma which is approximate
            elif valname in ['noise_variance_pinv']:
                pass  # went from 0.22 to 0.014 after kron factoring (0.01 with full centering, 0.3 with no centering)
            elif valname in ['sparsity']:
                pass   # had a bug in old calc (using integer arithmetic)
            else:
                u.check_close(new_values[valname], golden_values[valname], rtol=1e-4, atol=1e-6, label=valname)

    gl.event_writer.close()


def test_factored_vs_regular():
    """Take simple network, compute values in two different ways, compare."""

    u.seed_random(1)

    gl.project_name = 'test'
    gl.logdir_base = '/tmp/runs'
    u.setup_logdir_and_event_writer(run_name=sys._getframe().f_code.co_name)

    d = 3
    n = 3
    model: u.SimpleFullyConnected2 = u.SimpleFullyConnected2([d, d], bias=False, nonlin=False)
    param = model.layers[0].weight

    # param.data.copy_(torch.eye(d))
    #param.data.copy_(torch.arange(9).reshape(3, 3))
    # param.data.copy_(torch.zeros(d, d))

    # create simple matrix which is not quite symmetric
    source = 2*torch.eye(d)
    source[0, 0] = 3
    source[0, 1] = 4
    source[1, 0] = -2
    data = source.repeat([n, 1])
    noise = source.repeat_interleave(n, dim=0)

    autograd_lib.add_hooks(model)
    output = model(data)
    output.backward(retain_graph=True, gradient=noise)
    loss = u.least_squares(output)

    autograd_lib.backprop_hess(output, hess_type='LeastSquares', model=model)
    autograd_lib.compute_grad1(model)
    autograd_lib.compute_hess(model)
    autograd_lib.compute_hess(model, method='kron', attr_name='hess2')
    autograd_lib.compute_stats(model, attr_name='stats_regular', sigma_centering=True)
    autograd_lib.compute_stats_factored(model, attr_name='stats_factored', sigma_centering=False)

    stats = param.stats_regular
    stats_factored = param.stats_factored
    for name in stats:
        print(name, stats[name], stats_factored[name])
        # u.check_close(stats[name], stats_factored[name], label=name)


if __name__ == '__main__':
    test_factored_vs_regular()
    # u.run_all_tests(sys.modules[__name__])


================================================
FILE: autotune/globals.py
================================================
# Module to hold global variables for curvature computation functions.
# This is needed since functionality may be split over several modules.

from typing import Optional

import torch
from torch.utils.tensorboard import SummaryWriter

event_writer: Optional[SummaryWriter] = None
project_name: Optional[str] = 'train_ciresan'  # project name to use for wandb logging
logdir_base: str = '/ncluster/runs'
run_name: Optional[str] = None  # run name to use, corresponds to logging dir and wandb run name
logdir: Optional[str] = None  # logdir
token_count: int = 0   # TODO(y): rename to global-step. Meaning is context-specific; for sequences it's the number of tokens

args = None   #  global arg values
debug_dump_stats: bool = False   # print activations/backprops to console
debug_linalg_crashes: bool = False   # save matrices that cause linalg routines to crash


# debug_hard_crashes_on_nans: bool = True  # crash if encountering NaN

hacks_disable_hess = False


if torch.cuda.is_available():
    device = torch.device('cuda')
    print("Using GPU")
else:
    device = torch.device('cpu')


def reset_global_step():
    global token_count
    token_count = 0


def increment_global_step(incr: int):
    global token_count
    token_count += incr


def get_global_step() -> int:
    return token_count




================================================
FILE: autotune/hessian_test.py
================================================
# Test exact Hessian computation

import sys
from typing import Callable

import torch
import torch.nn as nn

import util as u


class Net(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w = nn.Linear(d, 1, bias=False)

    def forward(self, x: torch.Tensor):
        result = self.w(x)
        return result


def test_simple_hessian():
    # Compare against manual calculations in
    # https://www.wolframcloud.com/obj/yaroslavvb/newton/linear-jacobians-and-hessians.nb
    torch.set_default_dtype(torch.float32)

    d = [2, 3, 4, 2]
    n = d[0]
    c = d[-1]
    As = torch.tensor([[3, 1, -1], [1, -3, -2]]).float()
    Bs = torch.tensor([[[3, -3], [-3, -1], [-3, 3], [-3, 0]], [[2, -1], [-3, 0], [1, 1], [-2, 0]]]).float()

    # output Jacobian for first example
    Jo1 = u.kron(u.v2r(As[0]), Bs[0].t())
    u.check_equal(Jo1, [[9, -9, -9, -9, 3, -3, -3, -3, -3, 3, 3, 3], [-9, -3, 9, 0, -3, -1, 3, 0, 3, 1, -3, 0]])

    # batch output Jacobian
    Jb = torch.cat([u.kron(u.v2r(As[i]), Bs[i].t()) for i in range(n)])
    u.check_equal(Jb, [[9, -9, -9, -9, 3, -3, -3, -3, -3, 3, 3, 3], [-9, -3, 9, 0, -3, -1, 3, 0, 3, 1, -3, 0],
                       [2, -3, 1, -2, -6, 9, -3, 6, -4, 6, -2, 4], [-1, 0, 1, 0, 3, 0, -3, 0, 2, 0, -2, 0]])

    W = torch.nn.Parameter(torch.ones((d[2], d[1])))

    def loss(i):
        residuals = Bs[i].t() @ W @ u.v2c(As[i])
        return 0.5 * torch.sum(residuals * residuals)

    u.check_equal(loss(0), 333 / 2)

    # check against PyTorch autograd
    i = 0
    outputs = Bs[i].t() @ W @ u.v2c(As[i])
    jac = u.jacobian(outputs, W)

    u.check_equal(Jo1, jac.transpose(0, 1).transpose(2, 3).reshape((c, -1)))

    Jb = torch.cat([u.kron(u.v2r(As[i]), Bs[i].t()) for i in range(n)])
    manualHess = Jb.t() @ Jb
    u.check_equal(manualHess, [[167, -60, -161, -85, 39, 0, -57, -15, -64, 30, 52, 35],
                               [-60, 99, 51, 87, 0, 3, 27, 9, 30, -48, -12, -39],
                               [-161, 51, 164, 79, -57, 27, 48, 33, 52, -12, -58, -23],
                               [-85, 87, 79, 85, -15, 9, 33, 15, 35, -39, -23, -35],
                               [39, 0, -57, -15, 63, -60, -9, -45, 12, -30, 24, -15],
                               [0, 3, 27, 9, -60, 91, -21, 63, -30, 44, -24, 27],
                               [-57, 27, 48, 33, -9, -21, 36, -9, 24, -24, -6, -21],
                               [-15, 9, 33, 15, -45, 63, -9, 45, -15, 27, -21, 15],
                               [-64, 30, 52, 35, 12, -30, 24, -15, 38, -30, -14, -25],
                               [30, -48, -12, -39, -30, 44, -24, 27, -30, 46, -6, 33],
                               [52, -12, -58, -23, 24, -24, -6, -21, -14, -6, 26, 1],
                               [35, -39, -23, -35, -15, 27, -21, 15, -25, 33, 1, 25]])

    total_loss = torch.add(*[loss(i) for i in range(n)])
    u.check_equal(total_loss, 397 / 2)

    automaticHess = u.hessian(total_loss, W)
    automaticHess = automaticHess.transpose(0, 1).transpose(2, 3).reshape((d[1] * d[2], d[1] * d[2]))
    u.check_equal(automaticHess, manualHess)

    # Note: layers have dimensions (in, out), but the matrices have shape (out, in)
    layer = nn.Linear(d[1], d[2], bias=False)
    Blayer = nn.Linear(d[2], d[3], bias=False)
    model = torch.nn.Sequential(layer, nn.ReLU(), Blayer)
    layer.weight.data.copy_(torch.ones((d[2], d[1])))
    Blayer.weight.data.copy_(Bs[0].t())
    u.check_close(model(As[0]), [-18., -3.])
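
# Standalone numpy check (illustrative, not part of the original tests) of the
# identity the test above relies on: for y = B' W a, which is linear in W, the
# Jacobian of y w.r.t. column-major vec(W) is kron(a', B'). Verified here by
# central differences, which are exact for a linear map up to float error.
import numpy as np

_rng = np.random.default_rng(0)
_B = _rng.standard_normal((4, 2))
_a = _rng.standard_normal(3)
_W = _rng.standard_normal((4, 3))
_J = np.kron(_a[None, :], _B.T)            # shape (2, 12)
_eps = 1e-6
for _p in range(4):
    for _q in range(3):
        _dW = np.zeros_like(_W)
        _dW[_p, _q] = _eps
        _num = (_B.T @ (_W + _dW) @ _a - _B.T @ (_W - _dW) @ _a) / (2 * _eps)
        assert np.allclose(_J[:, _q * 4 + _p], _num, atol=1e-6)  # column-major index of (p, q)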


import autograd_lib
import globals as gl
import torch
import util as u
from torch import nn as nn
from torch.utils.tensorboard import SummaryWriter


def test_explicit_hessian():
    """Check computation of the Hessian of loss(B'WA) from https://github.com/yaroslavvb/kfac_pytorch/blob/master/derivation.pdf"""

    torch.set_default_dtype(torch.float64)
    A = torch.tensor([[-1., 4], [3, 0]])
    B = torch.tensor([[-4., 3], [2, 6]])
    X = torch.tensor([[-5., 0], [-2, -6]], requires_grad=True)

    Y = B.t() @ X @ A
    u.check_equal(Y, [[-52, 64], [-81, -108]])
    loss = torch.sum(Y * Y) / 2
    hess0 = u.hessian(loss, X).reshape([4, 4])
    hess1 = u.Kron(A @ A.t(), B @ B.t())

    u.check_equal(loss, 12512.5)

    # PyTorch autograd computes Hessian with respect to row-vectorized parameters, whereas
    # autograd_lib uses math convention and does column-vectorized.
    # Commuting order of Kronecker product switches between two representations
    u.check_equal(hess1.commute(), hess0)

    # Do a test using Linear layers instead of matrix multiplies
    model: u.SimpleFullyConnected2 = u.SimpleFullyConnected2([2, 2, 2], bias=False)
    model.layers[0].weight.data.copy_(X)

    # Transpose to match previous results, layers treat dim0 as batch dimension
    u.check_equal(model.layers[0](A.t()).t(), [[5, -20], [-16, -8]])  # XA = (A'X0)'

    model.layers[1].weight.data.copy_(B.t())
    u.check_equal(model(A.t()).t(), Y)

    Y = model(A.t()).t()    # transpose to data-dimension=columns
    loss = torch.sum(Y * Y) / 2
    loss.backward()

    u.check_equal(model.layers[0].weight.grad, [[-2285, -105], [-1490, -1770]])
    G = B @ Y @ A.t()
    u.check_equal(model.layers[0].weight.grad, G)

    u.check_equal(hess0, u.Kron(B @ B.t(), A @ A.t()))

    # compute newton step
    u.check_equal(u.Kron(A@A.t(), B@B.t()).pinv() @ u.vec(G), u.v2c([-5, -2, 0, -6]))

    # compute Newton step using factored representation
    autograd_lib.add_hooks(model)

    Y = model(A.t())
    n = 2
    loss = torch.sum(Y * Y) / 2
    autograd_lib.backprop_hess(Y, hess_type='LeastSquares')
    autograd_lib.compute_hess(model, method='kron', attr_name='hess_kron', vecr_order=False, loss_aggregation='sum')
    param = model.layers[0].weight

    hess2 = param.hess_kron
    print(hess2)

    u.check_equal(hess2, [[425, 170, -75, -30], [170, 680, -30, -120], [-75, -30, 225, 90], [-30, -120, 90, 360]])

    # Gradient test
    model.zero_grad()
    loss.backward()
    u.check_close(u.vec(G).flatten(), u.Vec(param.grad))

    # Newton step test
    # Method 0: PyTorch native autograd
    newton_step0 = param.grad.flatten() @ torch.pinverse(hess0)
    newton_step0 = newton_step0.reshape(param.shape)
    u.check_equal(newton_step0, [[-5, 0], [-2, -6]])

    # Method 1: column-major order
    ihess2 = hess2.pinv()
    u.check_equal(ihess2.LL, [[1/16, 1/48], [1/48, 17/144]])
    u.check_equal(ihess2.RR, [[2/45, -(1/90)], [-(1/90), 1/36]])
    u.check_equal(torch.flatten(hess2.pinv() @ u.vec(G)), [-5, -2, 0, -6])
    newton_step1 = (ihess2 @ u.Vec(param.grad)).matrix_form()

    # Method 2: row-major order
    ihess2_rowmajor = ihess2.commute()
    newton_step2 = ihess2_rowmajor @ u.Vecr(param.grad)
    newton_step2 = newton_step2.matrix_form()

    u.check_equal(newton_step0, newton_step1)
    u.check_equal(newton_step0, newton_step2)
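
# Standalone numpy sketch (illustrative) of the commuting convention above:
# column-major vec(Q X P') = (P kron Q) vec(X), while row-major (PyTorch-style)
# vecr(Q X P') = (Q kron P) vecr(X), so commuting the Kronecker factors
# converts between the two vectorization orders.
import numpy as np

_rng2 = np.random.default_rng(0)
_P = _rng2.standard_normal((3, 3))
_Q = _rng2.standard_normal((2, 2))
_X = _rng2.standard_normal((2, 3))
_Y = _Q @ _X @ _P.T
assert np.allclose(np.kron(_P, _Q) @ _X.reshape(-1, order='F'), _Y.reshape(-1, order='F'))
assert np.allclose(np.kron(_Q, _P) @ _X.reshape(-1), _Y.reshape(-1))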


def test_factored_hessian():
    """Simple test to ensure Hessian computation is working.

    In a linear neural network with squared loss, Newton step will converge in one step.
    Compute stats after minimizing, pass sanity checks.
    """

    u.seed_random(1)
    loss_type = 'LeastSquares'

    data_width = 2
    n = 5
    d1 = data_width ** 2
    o = 10
    d = [d1, o]

    model = u.SimpleFullyConnected2(d, bias=False, nonlin=False)
    model = model.to(gl.device)
    print(model)

    dataset = u.TinyMNIST(data_width=data_width, dataset_size=n, loss_type=loss_type)
    stats_loader = torch.utils.data.DataLoader(dataset, batch_size=n, shuffle=False)
    stats_iter = u.infinite_iter(stats_loader)
    stats_data, stats_targets = next(stats_iter)

    if loss_type == 'LeastSquares':
        loss_fn = u.least_squares
    else:  # loss_type == 'CrossEntropy':
        loss_fn = nn.CrossEntropyLoss()

    autograd_lib.add_hooks(model)
    gl.reset_global_step()
    last_outer = 0

    data, targets = stats_data, stats_targets

    # Capture Hessian and gradient stats
    autograd_lib.enable_hooks()
    autograd_lib.clear_backprops(model)

    output = model(data)
    loss = loss_fn(output, targets)
    print(loss)
    loss.backward(retain_graph=True)
    layer = model.layers[0]

    autograd_lib.clear_hess_backprops(model)
    autograd_lib.backprop_hess(output, hess_type=loss_type)
    autograd_lib.disable_hooks()

    # compute Hessian using direct method, compare against PyTorch autograd
    hess0 = u.hessian(loss, layer.weight)
    autograd_lib.compute_hess(model)
    hess1 = layer.weight.hess
    print(hess1)
    u.check_close(hess0.reshape(hess1.shape), hess1, atol=1e-9, rtol=1e-6)

    # compute Hessian using factored method
    autograd_lib.compute_hess(model, method='kron', attr_name='hess2', vecr_order=True)
    # s.regret_newton = vecG.t() @ pinvH.commute() @ vecG.t() / 2  # TODO(y): figure out why needed transposes

    hess2 = layer.weight.hess2
    u.check_close(hess1, hess2, atol=1e-9, rtol=1e-6)

    # Hessian-vector product of the gradient, in regular notation
    g1 = layer.weight.grad.flatten()
    newton1 = hess1 @ g1

    g2 = u.Vecr(layer.weight.grad)
    newton2 = g2 @ hess2

    u.check_close(newton1, newton2, atol=1e-9, rtol=1e-6)

    # compute regret in factored notation, compare against actual drop in loss
    regret1 = g1 @ hess1.pinverse() @ g1 / 2
    regret2 = g2 @ hess2.pinv() @ g2 / 2
    u.check_close(regret1, regret2)

    current_weight = layer.weight.detach().clone()
    param: torch.nn.Parameter = layer.weight
    # param.data.sub_((hess1.pinverse() @ g1).reshape(param.shape))
    # output = model(data)
    # loss = loss_fn(output, targets)
    # print("result 1", loss)

    # param.data.sub_((hess1.pinverse() @ u.vec(layer.weight.grad)).reshape(param.shape))
    # output = model(data)
    # loss = loss_fn(output, targets)
    # print("result 2", loss)

    # param.data.sub_((u.vec(layer.weight.grad).t() @ hess1.pinverse()).reshape(param.shape))
    # output = model(data)
    # loss = loss_fn(output, targets)
    # print("result 3", loss)
    #

    del layer.weight.grad
    output = model(data)
    loss = loss_fn(output, targets)
    loss.backward()
    param.data.sub_(u.unvec(hess1.pinverse() @ u.vec(layer.weight.grad), layer.weight.shape[0]))
    output = model(data)
    loss = loss_fn(output, targets)
    print("result 4", loss)

    # param.data.sub_((g1 @ hess1.pinverse() @ g1).reshape(param.shape))

    print(loss)
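
# Standalone numpy sketch (illustrative) of the property the test above
# exploits: for the quadratic loss 0.5*||X w - y||^2, a single Newton step
# w - pinv(H) @ g, with g = X'(X w - y) and H = X'X, lands exactly on the
# least-squares solution.
import numpy as np

_rng3 = np.random.default_rng(0)
_X = _rng3.standard_normal((5, 3))
_y = _rng3.standard_normal(5)
_w = _rng3.standard_normal(3)
_g = _X.T @ (_X @ _w - _y)
_H = _X.T @ _X
_w_new = _w - np.linalg.pinv(_H) @ _g
_w_star = np.linalg.lstsq(_X, _y, rcond=None)[0]
assert np.allclose(_w_new, _w_star)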


def test_hessian_multibatch():
    """Test that Kronecker-factored computations still work when splitting work over batches."""

    u.seed_random(1)

    # torch.set_default_dtype(torch.float64)

    gl.project_name = 'test'
    gl.logdir_base = '/tmp/runs'
    run_name = 'test_hessian_multibatch'
    u.setup_logdir_and_event_writer(run_name=run_name)

    loss_type = 'CrossEntropy'
    data_width = 2
    n = 4
    d1 = data_width ** 2
    o = 10
    d = [d1, o]

    model = u.SimpleFullyConnected2(d, bias=False, nonlin=False)
    model = model.to(gl.device)

    dataset = u.TinyMNIST(data_width=data_width, dataset_size=n, loss_type=loss_type)
    stats_loader = torch.utils.data.DataLoader(dataset, batch_size=n, shuffle=False)
    stats_iter = u.infinite_iter(stats_loader)

    if loss_type == 'LeastSquares':
        loss_fn = u.least_squares
    else:  # loss_type == 'CrossEntropy':
        loss_fn = nn.CrossEntropyLoss()

    autograd_lib.add_hooks(model)
    gl.reset_global_step()
    last_outer = 0

    stats_iter = u.infinite_iter(stats_loader)
    stats_data, stats_targets = next(stats_iter)
    data, targets = stats_data, stats_targets

    # Capture Hessian and gradient stats
    autograd_lib.enable_hooks()
    autograd_lib.clear_backprops(model)

    output = model(data)
    loss = loss_fn(output, targets)
    loss.backward(retain_graph=True)
    layer = model.layers[0]

    autograd_lib.clear_hess_backprops(model)
    autograd_lib.backprop_hess(output, hess_type=loss_type)
    autograd_lib.disable_hooks()

    # compute Hessian using direct method, compare against PyTorch autograd
    hess0 = u.hessian(loss, layer.weight)
    autograd_lib.compute_hess(model)
    hess1 = layer.weight.hess
    u.check_close(hess0.reshape(hess1.shape), hess1, atol=1e-8, rtol=1e-6)

    # compute Hessian using factored method. For cross-entropy the Hessian depends on the example, so the Kronecker factoring is not exact; raise tolerances
    autograd_lib.compute_hess(model, method='kron', attr_name='hess2', vecr_order=True)
    hess2 = layer.weight.hess2
    u.check_close(hess1, hess2, atol=1e-3, rtol=1e-1)

    # compute Hessian using multibatch
    # restart iterators
    dataset = u.TinyMNIST(data_width=data_width, dataset_size=n, loss_type=loss_type)
    assert n % 2 == 0
    stats_loader = torch.utils.data.DataLoader(dataset, batch_size=n//2, shuffle=False)
    stats_iter = u.infinite_iter(stats_loader)
    autograd_lib.compute_cov(model, loss_fn, stats_iter, batch_size=n//2, steps=2)

    cov: autograd_lib.LayerCov = layer.cov
    hess2: u.Kron = hess2.commute()    # get back into AA x BB order
    u.check_close(cov.H.value(), hess2)
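
# Standalone numpy sketch (illustrative): second-moment matrices decompose
# over batches, which is why the multibatch covariance above can be
# accumulated in chunks and still match the full-batch result.
import numpy as np

_A6 = np.random.default_rng(0).standard_normal((8, 3))
_full = _A6.T @ _A6 / 8
_chunked = sum(_c.T @ _c for _c in np.split(_A6, 4)) / 8
assert np.allclose(_full, _chunked)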


def _test_refactored_stats():
    gl.project_name = 'test'
    gl.logdir_base = '/tmp/runs'
    run_name = 'test_hessian_multibatch'
    u.setup_logdir_and_event_writer(run_name=run_name)

    loss_type = 'CrossEntropy'
    data_width = 2
    n = 4
    d1 = data_width ** 2
    o = 10
    d = [d1, o]

    model = u.SimpleFullyConnected2(d, bias=False, nonlin=False)
    model = model.to(gl.device)

    dataset = u.TinyMNIST(data_width=data_width, dataset_size=n, loss_type=loss_type)
    stats_loader = torch.utils.data.DataLoader(dataset, batch_size=n, shuffle=False)
    stats_iter = u.infinite_iter(stats_loader)

    if loss_type == 'LeastSquares':
        loss_fn = u.least_squares
    else:  # loss_type == 'CrossEntropy':
        loss_fn = nn.CrossEntropyLoss()

    autograd_lib.add_hooks(model)
    gl.reset_global_step()
    last_outer = 0

    stats_iter = u.infinite_iter(stats_loader)
    stats_data, stats_targets = next(stats_iter)
    data, targets = stats_data, stats_targets

    covG = autograd_lib.layer_cov_dict()
    covH = autograd_lib.layer_cov_dict()
    covJ = autograd_lib.layer_cov_dict()

    autograd_lib.register(model)

    A = {}
    with autograd_lib.save_activations(A):
        output = model(data)
        loss = loss_fn(output, targets)

    Acov = autograd_lib.ModuleDict(autograd_lib.SecondOrder)
    for layer, activations in A.items():
        Acov[layer].accumulate(activations)

    autograd_lib.set_default_activations(A)   # set activations to use by default when constructing cov matrices
    autograd_lib.set_default_Acov(Acov)

    # saves backprop covariances
    autograd_lib.backward_accum(loss, 1, covG)
    autograd_lib.backward_accum(output, autograd_lib.xent_bwd, covH)
    autograd_lib.backward_accum(output, autograd_lib.identity_bwd, covJ)

    #grad_cov = KronFactored(covA, covG.cov, covG.cross)
    #hess = KronFactored(covA, covH.cov, covH.cross)
    #grad_cov = KronFactored(covA, covJ.cov, covJ.cross)


def test_hessian_conv():
    """Test conv hessian computation using factored and regular method."""

    u.seed_random(1)
    unfold = torch.nn.functional.unfold
    fold = torch.nn.functional.fold

    import numpy as np

    u.seed_random(1)
    N, Xc, Xh, Xw = 3, 2, 3, 7
    dd = [Xc, 2]

    Kh, Kw = 2, 3
    Oh, Ow = Xh - Kh + 1, Xw - Kw + 1
    model = u.SimpleConvolutional(dd, kernel_size=(Kh, Kw), bias=True).double()

    weight_buffer = model.layers[0].weight.data

    # output channels, input channels, height, width
    assert weight_buffer.shape == (dd[1], dd[0], Kh, Kw)

    input_dims = N, Xc, Xh, Xw
    size = int(np.prod(input_dims))
    X = torch.arange(0, size).reshape(*input_dims).double()

    def loss_fn(data):
        err = data.reshape(len(data), -1)
        return torch.sum(err * err) / 2 / len(data)

    layer = model.layers[0]
    output = model(X)
    loss = loss_fn(output)
    loss.backward()

    u.check_equal(layer.activations, X)

    assert layer.backprops_list[0].shape == layer.output.shape
    assert layer.output.shape == (N, dd[1], Oh, Ow)

    out_unf = layer.weight.view(layer.weight.size(0), -1) @ unfold(layer.activations, (Kh, Kw))
    assert out_unf.shape == (N, dd[1], Oh * Ow)
    reshaped_bias = layer.bias.reshape(1, dd[1], 1)  # (Co,) -> (1, Co, 1)
    out_unf = out_unf + reshaped_bias

    u.check_equal(fold(out_unf, (Oh, Ow), (1, 1)), output)  # two alternative ways of reshaping
    u.check_equal(out_unf.view(N, dd[1], Oh, Ow), output)

    # Unfold produces patches with output dimension merged, while in backprop they are not merged
    # Hence merge the output (width/height) dimension
    assert unfold(layer.activations, (Kh, Kw)).shape == (N, Xc * Kh * Kw, Oh * Ow)
    assert layer.backprops_list[0].shape == (N, dd[1], Oh, Ow)

    grads_bias = layer.backprops_list[0].sum(dim=(2, 3)) * N
    mean_grad_bias = grads_bias.sum(dim=0) / N
    u.check_equal(mean_grad_bias, layer.bias.grad)

    Bt = layer.backprops_list[0] * N   # remove factor of N applied during loss batch averaging
    assert Bt.shape == (N, dd[1], Oh, Ow)
    Bt = Bt.reshape(N, dd[1], Oh*Ow)
    At = unfold(layer.activations, (Kh, Kw))
    assert At.shape == (N, dd[0] * Kh * Kw, Oh*Ow)

    grad_unf = torch.einsum('ijk,ilk->ijl', Bt, At)
    assert grad_unf.shape == (N, dd[1], dd[0] * Kh * Kw)

    grads = grad_unf.reshape((N, dd[1], dd[0], Kh, Kw))
    u.check_equal(grads.mean(dim=0), layer.weight.grad)

    # compute per-example gradients using autograd, compare against manual computation
    for i in range(N):
        u.clear_backprops(model)
        output = model(X[i:i + 1, ...])
        loss = loss_fn(output)
        loss.backward()
        u.check_equal(grads[i], layer.weight.grad)
        u.check_equal(grads_bias[i], layer.bias.grad)
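
# Standalone numpy sketch (illustrative) of the unfold/im2col identity used
# above, for a single-channel 4x4 input and a 2x2 kernel at stride 1: direct
# correlation equals a matmul of the flattened kernel with the patch matrix.
import numpy as np

_rng4 = np.random.default_rng(0)
_x = _rng4.standard_normal((4, 4))
_k = _rng4.standard_normal((2, 2))
_direct = np.array([[np.sum(_x[i:i + 2, j:j + 2] * _k) for j in range(3)]
                    for i in range(3)])
# each column of _cols is one flattened 2x2 patch, i-major then j, like unfold
_cols = np.stack([_x[i:i + 2, j:j + 2].reshape(-1)
                  for i in range(3) for j in range(3)], axis=1)   # (4, 9)
_via_matmul = (_k.reshape(1, -1) @ _cols).reshape(3, 3)
assert np.allclose(_direct, _via_matmul)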


def _test_explicit_hessian_refactored():
    """Check computation of the Hessian of loss(B'WA) from https://github.com/yaroslavvb/kfac_pytorch/blob/master/derivation.pdf"""

    torch.set_default_dtype(torch.float64)
    A = torch.tensor([[-1.,
SYMBOL INDEX (3079 symbols across 142 files)

FILE: akaitsuki-slow/config.py
  function str2bool (line 4) | def str2bool(v):
  function get_args (line 8) | def get_args():

FILE: akaitsuki-slow/main.py
  function retrieve_seq_length_op2 (line 9) | def retrieve_seq_length_op2(data):
  function advanced_indexing_op (line 13) | def advanced_indexing_op(input, index):
  function bidirectional_dynamic_rnn (line 23) | def bidirectional_dynamic_rnn(inputs, cell_fn, n_hidden, sequence_length...
  function BilinearAttention (line 47) | def BilinearAttention(inputs, n_hidden, mask=None,
  function inference (line 63) | def inference(x1, x2, mask1, mask2, l, y,
  function main (line 122) | def main(args):

FILE: autotune/autograd_lib.py
  class LayerStats (line 61) | class LayerStats:
    method __iter__ (line 94) | def __iter__(self):
    method __init__ (line 97) | def __init__(self):
    method __getitem__ (line 100) | def __getitem__(self, item):
    method items (line 103) | def items(self):
  function add_hooks (line 107) | def add_hooks(model: nn.Module) -> None:
  function remove_hooks (line 137) | def remove_hooks(model: nn.Module) -> None:
  function disable_hooks (line 155) | def disable_hooks() -> None:
  function enable_hooks (line 164) | def enable_hooks() -> None:
  function is_supported (line 171) | def is_supported(layer: nn.Module) -> bool:
  function _layer_type (line 177) | def _layer_type(layer: nn.Module) -> str:
  function _capture_activations (line 181) | def _capture_activations(layer: nn.Module, input: List[torch.Tensor], ou...
  function _capture_output (line 190) | def _capture_output(layer: nn.Module, input: List[torch.Tensor], output:...
  function _capture_backprops (line 199) | def _capture_backprops(layer: nn.Module, _input, output):
  function clear_backprops (line 217) | def clear_backprops(model: nn.Module) -> None:
  function clear_hess_backprops (line 224) | def clear_hess_backprops(model: nn.Module) -> None:
  function compute_grad1 (line 231) | def compute_grad1(model: nn.Module, loss_type: str = 'mean') -> None:
  function compute_hess (line 286) | def compute_hess(model: nn.Module, method='exact', attr_name=None, vecr_...
  function backprop_hess (line 445) | def backprop_hess(output: torch.Tensor, hess_type: str, model: Optional[...
  class LayerCov (line 519) | class LayerCov:
    method __init__ (line 526) | def __init__(self):
  function compute_cov (line 532) | def compute_cov(model: nn.Module, loss_fn: Callable, stats_iter, batch_s...
  function update_cov (line 585) | def update_cov(model, a_attr, b_attr, target_attr):
  function compute_stats (line 642) | def compute_stats(model, attr_name='stats', factored=False, sigma_center...
  function compute_stats_factored (line 789) | def compute_stats_factored(model, attr_name='stats', sigma_centering=True):
  class SecondOrderCov (line 923) | class SecondOrderCov:
    method __init__ (line 926) | def __init__(self):
    method accumulate (line 935) | def accumulate(self, data_x, data_y):
    method zero_ (line 952) | def zero_(self):
  class SymmetricFourthOrderCov (line 959) | class SymmetricFourthOrderCov:
    method __init__ (line 971) | def __init__(self, rank=3):
    method accumulate (line 978) | def accumulate(self, data_x: torch.Tensor, data_y: torch.Tensor, cache...
    method zero_ (line 1001) | def zero_(self):
  class ModuleDict (line 1007) | class ModuleDict(dict):
    method __init__ (line 1009) | def __init__(self, defaultcreator=None, defaultvalue=None):
    method __getitem__ (line 1018) | def __getitem__(self, item):
  class Settings (line 1035) | class Settings(object):
    method __init__ (line 1043) | def __init__(self):
  function layer_cov_dict (line 1074) | def layer_cov_dict(model):
  function _forward_hook (line 1079) | def _forward_hook(layer: nn.Module, input: List[torch.Tensor], output: t...
  function _backward_hook (line 1085) | def _backward_hook(layer: nn.Module, _input: torch.Tensor, output: torch...
  function register (line 1090) | def register(model: nn.Module):
  function _hack_zero_gradient_norms_squared (line 1109) | def _hack_zero_gradient_norms_squared():
  function _hack_update_gradient_norms_squared (line 1116) | def _hack_update_gradient_norms_squared(layer: nn.Module, backprops: tor...
  function unregister (line 1129) | def unregister():
  function save_activations (line 1139) | def save_activations(storage: ModuleDict):
  function extend_backprops (line 1165) | def extend_backprops(storage: ModuleDict):
  function module_hook (line 1183) | def module_hook(hook: Callable):
  function save_activations2 (line 1215) | def save_activations2():
  function backward (line 1239) | def backward(tensor, backward_func, retain_graph=False):
  function backward_accum (line 1249) | def backward_accum(tensor, backward_func, storage: ModuleDict, retain_gr...
  function set_default_activations (line 1308) | def set_default_activations(A):
  function set_default_Acov (line 1316) | def set_default_Acov(Acov):
  function backward_xent (line 1322) | def backward_xent(output):
  function backward_identity (line 1326) | def backward_identity(tensor):
  function backprop_identity (line 1339) | def backprop_identity(output, retain_graph=False) -> None:
  function backward_ones (line 1360) | def backward_ones(output):
  function backward_jacobian (line 1369) | def backward_jacobian(output, sampled=False, retain_graph=False) -> None:
  function backward_hessian (line 1396) | def backward_hessian(output, loss='CrossEntropy', sampled=False, retain_...
  function grad_norms (line 1418) | def grad_norms(A, B, m=None, approx='zero_order'):
  function offset_losses (line 1463) | def offset_losses(A, B, alpha, offset, m, approx='zero_order'):
  function offset_cosines (line 1499) | def offset_cosines(A, B, offset=1):
  function offset_dotprod (line 1525) | def offset_dotprod(A, B, offset=1):
  function grad_curvs (line 1542) | def grad_curvs(A, B, metric):

FILE: autotune/autograd_lib_test.py
  function simple_model (line 20) | def simple_model(d, num_layers):
  function test_hooks (line 30) | def test_hooks():
  function _test_activations_contextmanager (line 72) | def _test_activations_contextmanager():
  function test_jacobian (line 115) | def test_jacobian():
  function create_toy_model (line 196) | def create_toy_model():
  function test_gradient_norms (line 214) | def test_gradient_norms():
  function test_full_hessian (line 245) | def test_full_hessian():
  function test_full_fisher (line 284) | def test_full_fisher():
  function test_full_fisher_multibatch (line 323) | def test_full_fisher_multibatch():
  function test_kfac_hessian (line 368) | def test_kfac_hessian():
  function test_full_hessian_multibatch (line 403) | def test_full_hessian_multibatch():
  function test_diagonal_hessian (line 436) | def test_diagonal_hessian():
  function test_full_hessian_xent (line 470) | def test_full_hessian_xent():
  function test_full_hessian_xent_multibatch (line 517) | def test_full_hessian_xent_multibatch():
  function test_full_hessian_xent_kfac (line 569) | def test_full_hessian_xent_kfac():
  function test_full_hessian_xent_kfac2 (line 631) | def test_full_hessian_xent_kfac2():
  function test_full_hessian_xent_mnist (line 687) | def test_full_hessian_xent_mnist():
  function test_full_hessian_xent_mnist_multilayer (line 731) | def test_full_hessian_xent_mnist_multilayer():
  function _test_kfac_hessian_xent_mnist (line 785) | def _test_kfac_hessian_xent_mnist():
  function test_kfac_jacobian_mnist (line 835) | def test_kfac_jacobian_mnist():
  function test_kfac_fisher_mnist (line 892) | def test_kfac_fisher_mnist():
  class MyList (line 946) | class MyList:
    method __init__ (line 947) | def __init__(self, *args, **kwargs):
    method __getattr__ (line 951) | def __getattr__(self, *_args, **_kwargs):
    method normal_form (line 954) | def normal_form(self):
    method value (line 957) | def value(self):
  function test_grad_norms (line 961) | def test_grad_norms():

FILE: autotune/autograd_test.py
  function test_autoencoder_minimize (line 26) | def test_autoencoder_minimize():
  function test_autoencoder_newton (line 65) | def test_autoencoder_newton():
  function test_main_autograd (line 109) | def test_main_autograd():
  function test_unfold (line 245) | def test_unfold():
  function test_cross_entropy_hessian_tiny (line 279) | def test_cross_entropy_hessian_tiny():
  function test_cross_entropy_hessian_mnist (line 323) | def test_cross_entropy_hessian_mnist():
  function test_hessian (line 372) | def test_hessian():
  function test_conv_grad (line 460) | def test_conv_grad():
  function test_conv_hessian (line 535) | def test_conv_hessian():
  class Net (line 607) | class Net(nn.Module):
    method __init__ (line 608) | def __init__(self):
    method forward (line 615) | def forward(self, x):
  class TinyNet (line 627) | class TinyNet(nn.Module):
    method __init__ (line 628) | def __init__(self):
    method forward (line 635) | def forward(self, x):            # 28x28
  function test_end2end_grad1 (line 647) | def test_end2end_grad1():
  function test_end2end_hess (line 673) | def test_end2end_hess():
  function subtest_hess_type (line 679) | def subtest_hess_type(hess_type):
  function test_kron_nano (line 720) | def test_kron_nano():
  function test_kron_tiny (line 776) | def test_kron_tiny():
  function test_kron_mnist (line 831) | def test_kron_mnist():
  function test_kron_conv_exact (line 890) | def test_kron_conv_exact():
  function test_kron_1x2_conv (line 962) | def test_kron_1x2_conv():
  function _test_kron_conv_golden (line 1020) | def _test_kron_conv_golden():

FILE: autotune/ciresan_bench.py
  class Net (line 32) | class Net(nn.Module):
    method __init__ (line 33) | def __init__(self, d):
    method forward (line 37) | def forward(self, x: torch.Tensor):
  class timeit (line 43) | class timeit:
    method __init__ (line 48) | def __init__(self, tag=""):
    method __enter__ (line 51) | def __enter__(self):
    method __exit__ (line 55) | def __exit__(self, *args):
  function get_mkl_version (line 64) | def get_mkl_version():
  function print_cpu_info (line 78) | def print_cpu_info():
  function linalg_bench (line 89) | def linalg_bench():

FILE: autotune/curvature_test.py
  class Net (line 14) | class Net(nn.Module):
    method __init__ (line 15) | def __init__(self, d):
    method forward (line 19) | def forward(self, x: torch.Tensor):
  function test_singlelayer (line 24) | def test_singlelayer():
  class Net2 (line 237) | class Net2(nn.Module):
    method __init__ (line 238) | def __init__(self, d1, d2):
    method forward (line 243) | def forward(self, X1: torch.Tensor):
  function test_multilayer (line 249) | def test_multilayer():

FILE: autotune/eval_conv2d_approx.py
  function compute_hess (line 48) | def compute_hess(n: int = 1, image_size: int = 1, kernel_size: int = 1, ...
  function main (line 102) | def main():

FILE: autotune/factored_test.py
  function test_factored_stats_golden_values (line 19) | def test_factored_stats_golden_values():
  function test_factored_vs_regular (line 116) | def test_factored_vs_regular():

FILE: autotune/globals.py
  function reset_global_step (line 33) | def reset_global_step():
  function increment_global_step (line 38) | def increment_global_step(incr: int):
  function get_global_step (line 43) | def get_global_step() -> int:

FILE: autotune/hessian_test.py
  class Net (line 11) | class Net(nn.Module):
    method __init__ (line 12) | def __init__(self, d):
    method forward (line 16) | def forward(self, x: torch.Tensor):
  function test_simple_hessian (line 21) | def test_simple_hessian():
  function test_explicit_hessian (line 96) | def test_explicit_hessian():
  function test_factored_hessian (line 185) | def test_factored_hessian():
  function test_hessian_multibatch (line 295) | def test_hessian_multibatch():
  function _test_refactored_stats (line 371) | def _test_refactored_stats():
  function test_hessian_conv (line 432) | def test_hessian_conv():
  function _test_explicit_hessian_refactored (line 511) | def _test_explicit_hessian_refactored():
  function _test_new_setup (line 602) | def _test_new_setup():

FILE: autotune/linalg_bench.py
  class Net (line 32) | class Net(nn.Module):
    method __init__ (line 33) | def __init__(self, d):
    method forward (line 37) | def forward(self, x: torch.Tensor):
  class timeit (line 42) | class timeit:
    method __init__ (line 47) | def __init__(self, tag=""):
    method __enter__ (line 50) | def __enter__(self):
    method __exit__ (line 54) | def __exit__(self, *args):
  function get_mkl_version (line 60) | def get_mkl_version():
  function print_cpu_info (line 74) | def print_cpu_info():
  function linalg_bench (line 85) | def linalg_bench():

FILE: autotune/linesearch_test_disabled.py
  function install_pdb_handler (line 44) | def install_pdb_handler():
  class FastMNIST (line 80) | class FastMNIST(datasets.MNIST):
    method __init__ (line 81) | def __init__(self, *args, **kwargs):
    method __getitem__ (line 93) | def __getitem__(self, index):
  class FastBinaryMNIST (line 106) | class FastBinaryMNIST(datasets.MNIST):
    method __init__ (line 107) | def __init__(self, *args, **kwargs):
    method __getitem__ (line 122) | def __getitem__(self, index):
  class SimpleMNIST (line 135) | class SimpleMNIST(datasets.MNIST):
    method __init__ (line 137) | def __init__(self, *args, **kwargs):
    method __getitem__ (line 152) | def __getitem__(self, index):
  function compute_loss (line 165) | def compute_loss(output, target):
  function log (line 173) | def log(metrics, step):
  function test_lineasearch (line 187) | def test_lineasearch():
  function train (line 373) | def train(model, device, train_loader, optimizer, epoch, args, logger):

FILE: autotune/lyapunov_test.py
  class Net (line 18) | class Net(nn.Module):
    method __init__ (line 19) | def __init__(self, d):
    method forward (line 23) | def forward(self, x: torch.Tensor):
  function lyap_newton_schulz (line 30) | def lyap_newton_schulz(z, dldz, numIters, dtype):
  function test_lyapunov (line 44) | def test_lyapunov():
  function test_stability (line 143) | def test_stability():
  function compare_impl (line 153) | def compare_impl():
  function lyapunov_svd (line 181) | def lyapunov_svd(A, C, rtol=1e-4, use_svd=False):
  class timeit (line 200) | class timeit:
    method __init__ (line 204) | def __init__(self, tag=""):
    method __enter__ (line 207) | def __enter__(self):
    method __exit__ (line 211) | def __exit__(self, *args):
  function get_mkl_version (line 217) | def get_mkl_version():
  function print_cpu_info (line 231) | def print_cpu_info():

FILE: autotune/mnist_end2end_test.py
  function test_main (line 25) | def test_main():

FILE: autotune/plotting_test.py
  function install_pdb_handler (line 53) | def install_pdb_handler():
  class FastMNIST (line 89) | class FastMNIST(datasets.MNIST):
    method __init__ (line 90) | def __init__(self, *args, **kwargs):
    method __getitem__ (line 102) | def __getitem__(self, index):
  class FastBinaryMNIST (line 115) | class FastBinaryMNIST(datasets.MNIST):
    method __init__ (line 116) | def __init__(self, *args, **kwargs):
    method __getitem__ (line 131) | def __getitem__(self, index):
  class SimpleMNIST (line 144) | class SimpleMNIST(datasets.MNIST):
    method __init__ (line 146) | def __init__(self, *args, **kwargs):
    method __getitem__ (line 160) | def __getitem__(self, index):
  function compute_loss (line 173) | def compute_loss(output, target):
  function log_scalars (line 181) | def log_scalars(metrics: Dict[str, Any], parent_tag: str = '') -> None:
  function main (line 191) | def main():
  function train (line 506) | def train(model, device, train_loader, optimizer, epoch, args, logger):
  function validate (line 580) | def validate(model, device, val_loader, optimizer):

FILE: autotune/pytorch_benchmark.py
  function empty_aligned (line 51) | def empty_aligned(n, align):
  function benchmark (line 59) | def benchmark(method):

FILE: autotune/scipy_benchmark.py
  function empty_aligned (line 30) | def empty_aligned(n, align):
  function benchmark (line 38) | def benchmark(method, d):

FILE: autotune/svd_benchmark.py
  function empty_aligned (line 33) | def empty_aligned(n, align):
  function benchmark (line 41) | def benchmark(method):

FILE: autotune/train_ciresan.py
  function main (line 37) | def main():
  function validate (line 312) | def validate(model, val_loader, tag='validation'):

FILE: autotune/train_ciresan_cca.py
  function main (line 41) | def main():
  function validate (line 230) | def validate(model, val_loader, tag='validation'):

FILE: autotune/train_ciresan_factored.py
  function main (line 41) | def main():
  function validate (line 202) | def validate(model, val_loader, tag='validation'):

FILE: autotune/train_ciresan_new.py
  function skip_nans (line 49) | def skip_nans(t): return t[torch.isfinite(t)]
  function erank (line 52) | def erank(vals): return vals.sum() / vals.max()
  function srank (line 54) | def srank(vals): return (vals * vals).sum() / (vals.max() ** 2)
  function main (line 58) | def main():
  function validate (line 622) | def validate(model, val_loader, tag='validation'):
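
  The index shows the full one-line bodies of `erank` and `srank` above (`vals.sum() / vals.max()` and `(vals * vals).sum() / (vals.max() ** 2)`). A minimal pure-Python restatement of those spectral summaries, with an illustrative spectrum (the input values here are made up for the example):

  ```python
  def erank(vals):
      # effective rank of a spectrum: sum of values over the largest value,
      # mirroring the one-liner in autotune/train_ciresan_new.py
      return sum(vals) / max(vals)

  def srank(vals):
      # stable rank: sum of squared values over the square of the largest value
      return sum(v * v for v in vals) / (max(vals) ** 2)

  spectrum = [4.0, 2.0, 1.0, 1.0]
  print(erank(spectrum))  # 2.0   (= 8 / 4)
  print(srank(spectrum))  # 1.375 (= 22 / 16)
  ```

  Both quantities are scale-invariant summaries of how "spread out" an eigenvalue or singular-value spectrum is; srank is never larger than erank for nonnegative spectra.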

FILE: autotune/train_medium.py
  function main (line 26) | def main():

FILE: autotune/train_small.py
  function main (line 22) | def main():

FILE: autotune/train_small_xent.py
  function main (line 29) | def main():

FILE: autotune/train_small_xent_factored.py
  function main (line 29) | def main():

FILE: autotune/train_tiny.py
  function main (line 20) | def main():

FILE: autotune/train_tiny_xent.py
  function main (line 29) | def main():

FILE: autotune/util.py
  function get_condition (line 39) | def get_condition(dtype):
  function v2c (line 48) | def v2c(vec):
  function v2c_np (line 55) | def v2c_np(vec):
  function v2r (line 61) | def v2r(vec: torch.Tensor) -> torch.Tensor:
  function c2v (line 68) | def c2v(col: torch.Tensor) -> torch.Tensor:
  function vec (line 76) | def vec(mat):
  function test_vec (line 83) | def test_vec():
  function test_kron_trace (line 88) | def test_kron_trace():
  function tvec (line 97) | def tvec(mat):
  function test_tvec (line 103) | def test_tvec():
  function unvec (line 108) | def unvec(a, rows):
  function untvec (line 117) | def untvec(a, rows):
  function kron (line 125) | def kron(a: Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor]], b: O...
  function stable_kron (line 148) | def stable_kron(a, b):
  class SpecialForm (line 153) | class SpecialForm:
    method normal_form (line 154) | def normal_form(self):
  class Vec (line 158) | class Vec(SpecialForm):
    method __init__ (line 173) | def __init__(self, mat, shape: Tuple = None):
    method vec_form (line 189) | def vec_form(self):
    method matrix_form (line 192) | def matrix_form(self):
    method normal_form (line 195) | def normal_form(self):
    method __matmul__ (line 198) | def __matmul__(self, other):
    method __rmatmul__ (line 206) | def __rmatmul__(self, other):
    method __truediv__ (line 213) | def __truediv__(self, other):
    method norm (line 216) | def norm(self):
    method commute (line 219) | def commute(self):
    method __str__ (line 225) | def __str__(self):
  class Vecr (line 229) | class Vecr(SpecialForm):
    method __init__ (line 243) | def __init__(self, mat, shape: Tuple = None):
    method vec_form (line 259) | def vec_form(self):
    method matrix_form (line 262) | def matrix_form(self):
    method normal_form (line 265) | def normal_form(self):
    method __matmul__ (line 268) | def __matmul__(self, other):
    method __rmatmul__ (line 276) | def __rmatmul__(self, other):
    method __truediv__ (line 283) | def __truediv__(self, other):
    method norm (line 286) | def norm(self):
    method commute (line 289) | def commute(self):
    method __str__ (line 295) | def __str__(self):
  class Cov (line 299) | class Cov(SpecialForm):
  class FactoredCov (line 303) | class FactoredCov(SpecialForm):
  class KronFactoredCov (line 307) | class KronFactoredCov(SpecialForm):
    method __init__ (line 320) | def __init__(self, a_dim, b_dim):
    method add_samples (line 331) | def add_samples(self, A: torch.Tensor, B: torch.Tensor):
    method value (line 373) | def value(self) -> "Kron":
    method cross (line 376) | def cross(self) -> torch.Tensor:
    method wilks (line 380) | def wilks(self) -> torch.Tensor:
    method bartlett (line 392) | def bartlett(self):
    method prob_dep (line 401) | def prob_dep(self):
    method sigmas_indep (line 407) | def sigmas_indep(self):
    method __str__ (line 413) | def __str__(self):
  function square (line 417) | def square(a: torch.Tensor):
  class Kron (line 422) | class Kron(SpecialForm):
    method __init__ (line 431) | def __init__(self, LL, RR):
    method commute (line 447) | def commute(self):
    method normal_form (line 452) | def normal_form(self):
    method expand (line 455) | def expand(self):
    method expand_vec (line 459) | def expand_vec(self):
    method sym_l2_norm (line 463) | def sym_l2_norm(self):
    method symsqrt (line 466) | def symsqrt(self, cond=None, return_rank=False):
    method trace (line 476) | def trace(self):
    method frobenius_norm (line 479) | def frobenius_norm(self):
    method pinv (line 482) | def pinv(self):
    method inv (line 485) | def inv(self):
    method shape (line 489) | def shape(self):
    method qf (line 492) | def qf(self, G):
    method qf_vec (line 498) | def qf_vec(self, G):
    method __truediv__ (line 505) | def __truediv__(self, other):
    method __add__ (line 508) | def __add__(self, other):
    method __radd__ (line 512) | def __radd__(self, other):
    method __mul__ (line 516) | def __mul__(self, other):
    method __rmul__ (line 520) | def __rmul__(self, other):
    method __matmul__ (line 537) | def __matmul__(self, x):
    method __rmatmul__ (line 554) | def __rmatmul__(self, x):
    method __str__ (line 568) | def __str__(self):
    method __iter__ (line 571) | def __iter__(self):
  class MeanKronFactored (line 575) | class MeanKronFactored(SpecialForm):
    method __init__ (line 580) | def __init__(self, AA: torch.Tensor, BB: torch.Tensor):
    method expand (line 595) | def expand(self):
  function expand_hess (line 612) | def expand_hess(*v) -> Union[torch.Tensor, List[torch.Tensor]]:
  function test_kron (line 625) | def test_kron():
  function nan_check (line 634) | def nan_check(mat):
  function has_nan (line 642) | def has_nan(mat):
  function fro_norm (line 646) | def fro_norm(mat: torch.Tensor):
  function l2_norm (line 653) | def l2_norm(mat: torch.Tensor):
  function sym_l2_norm (line 667) | def sym_l2_norm(mat: torch.Tensor):
  function inv_square_root_numpy (line 685) | def inv_square_root_numpy(mat):
  function pinv_square_root_numpy (line 690) | def pinv_square_root_numpy(mat):
  function erank (line 696) | def erank(mat):
  function rank (line 701) | def rank(A):
  function sym_erank (line 709) | def sym_erank(mat):
  function lyapunov_spectral (line 714) | def lyapunov_spectral(A, B, cond=None):
  function lyapunov_svd (line 735) | def lyapunov_svd(A, C, rtol=1e-4, eps=1e-7, use_svd=False):
  function deleteme (line 761) | def deleteme():
  function lyapunov_svd2 (line 766) | def lyapunov_svd2(A, C, rtol=1e-4, eps=1e-7, use_svd=False):
  function lyapunov_truncated (line 795) | def lyapunov_truncated(A, C, use_svd=False, top_k=None, check_error=False):
  function lyapunov_lstsq (line 838) | def lyapunov_lstsq(A, C):
  function truncated_lyapunov_rho (line 847) | def truncated_lyapunov_rho(A, C):
  function outer (line 889) | def outer(x, y=None):
  function to_python_scalar (line 896) | def to_python_scalar(x):
  function is_scalar (line 905) | def is_scalar(x):
  function from_numpy (line 913) | def from_numpy(x) -> torch.Tensor:
  function pytorch_dtype_to_floating_numpy_dtype (line 934) | def pytorch_dtype_to_floating_numpy_dtype(dtype):
  function to_normal_form (line 947) | def to_normal_form(x):
  function to_pytorch (line 955) | def to_pytorch(x) -> torch.Tensor:
  function to_pytorches (line 966) | def to_pytorches(*xs) -> Tuple[torch.Tensor, ...]:
  function to_numpy (line 970) | def to_numpy(x, dtype: np.dtype = None) -> np.ndarray:
  function to_numpys (line 1007) | def to_numpys(*xs, dtype=np.float32):
  function khatri_rao (line 1011) | def khatri_rao(A: torch.Tensor, B: torch.Tensor):
  function khatri_rao_t (line 1022) | def khatri_rao_t(A: torch.Tensor, B: torch.Tensor):
  function jacobian (line 1034) | def jacobian(y: torch.Tensor, x: torch.Tensor, create_graph=False):
  function hessian (line 1046) | def hessian(y: torch.Tensor, x: torch.Tensor):
  function pinv (line 1050) | def pinv(mat: torch.Tensor, cond=None) -> torch.Tensor:
  function eig_real (line 1074) | def eig_real(mat: torch.Tensor) -> torch.Tensor:
  function pinv_square_root (line 1091) | def pinv_square_root(mat: torch.Tensor, eps=1e-4) -> torch.Tensor:
  function symeig_pos_evals (line 1100) | def symeig_pos_evals(mat: torch.Tensor) -> torch.Tensor:
  function svd_pos_svals (line 1107) | def svd_pos_svals(mat):
  function filter_evals (line 1114) | def filter_evals(vals, cond=None, remove_small=True, remove_negative=True):
  function isymsqrt (line 1130) | def isymsqrt(mat, *args):
  function symsqrt (line 1134) | def symsqrt(mat, cond=None, return_rank=False, inverse=False):
  function symsqrt_svd (line 1172) | def symsqrt_svd(mat: torch.Tensor):
  function robust_svd (line 1184) | def robust_svd(mat: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, t...
  function regularize_mat (line 1203) | def regularize_mat(mat, eps):
  function regularize_mat2 (line 1209) | def regularize_mat2(mat, eps):
  function symsqrt_dist (line 1227) | def symsqrt_dist(cov1: torch.Tensor, cov2: torch.Tensor) -> float:
  function check_symmetric (line 1237) | def check_symmetric(mat):
  function check_close (line 1245) | def check_close(a0, b0, rtol=1e-5, atol=1e-8, label: str = '') -> None:
  function check_equal (line 1251) | def check_equal(observed, truth, rtol=1e-9, atol=1e-12, label: str = '')...
  function get_param (line 1278) | def get_param(layer):  # TODO(y): deprecate?
  class timeit (line 1288) | class timeit:
    method __init__ (line 1292) | def __init__(self, tag=""):
    method __enter__ (line 1295) | def __enter__(self):
    method __exit__ (line 1299) | def __exit__(self, *args):
  function run_all_tests (line 1307) | def run_all_tests(module: nn.Module):
  function freeze (line 1333) | def freeze(layer: nn.Module):
  function unfreeze (line 1339) | def unfreeze(layer: nn.Module):
  function mark_expensive (line 1345) | def mark_expensive(layer: nn.Module):
  function nest_stats (line 1349) | def nest_stats(tag: str, stats) -> Dict:
  function seed_random (line 1357) | def seed_random(seed: int) -> None:
  class TinyMNIST (line 1366) | class TinyMNIST(datasets.MNIST):
    method __init__ (line 1375) | def __init__(self, dataset_root='/tmp/data', data_width=4, targets_wid...
    method __getitem__ (line 1430) | def __getitem__(self, index):
  class SimpleModel (line 1447) | class SimpleModel(nn.Module):
    method __init__ (line 1456) | def __init__(self, *args, **kwargs):
    method disable_hooks (line 1461) | def disable_hooks(self):
    method enable_hooks (line 1465) | def enable_hooks(self):
    method _finalize (line 1470) | def _finalize(self):
  function least_squares (line 1484) | def least_squares(data, targets=None, aggregation='mean'):
  function debug_least_squares (line 1495) | def debug_least_squares(data, targets=None):
  class SimpleModel2 (line 1507) | class SimpleModel2(nn.Module):
    method __init__ (line 1514) | def __init__(self, *args, **kwargs):
    method _finalize (line 1518) | def _finalize(self):
  function get_parent_model (line 1529) | def get_parent_model(module_or_param) -> Optional[nn.Module]:
  function capture_activations (line 1547) | def capture_activations(module: nn.Module, input: List[torch.Tensor], ou...
  function capture_backprops (line 1560) | def capture_backprops(module: nn.Module, _input, output):
  function save_grad (line 1578) | def save_grad(param: nn.Parameter) -> Callable[[torch.Tensor], None]:
  function clear_backprops (line 1589) | def clear_backprops(model: nn.Module) -> None:
  function register_hooks (line 1603) | def register_hooks(model: SimpleModel):
  class SimpleFullyConnected (line 1617) | class SimpleFullyConnected(SimpleModel):
    method __init__ (line 1620) | def __init__(self, d: List[int], nonlin=False, bias=False, dropout=Fal...
    method forward (line 1646) | def forward(self, x: torch.Tensor):
  class SimpleFullyConnected2 (line 1651) | class SimpleFullyConnected2(SimpleModel2):
    method __init__ (line 1654) | def __init__(self, d: List[int], nonlin=False, bias=False, last_layer_...
    method forward (line 1682) | def forward(self, x: torch.Tensor):
  class SimpleMLP (line 1687) | class SimpleMLP(nn.Module):
    method __init__ (line 1693) | def __init__(self, d: List[int], nonlin=False, bias=False):
    method forward (line 1715) | def forward(self, x: torch.Tensor):
  class RedundantFullyConnected2 (line 1720) | class RedundantFullyConnected2(SimpleModel2):
    method __init__ (line 1723) | def __init__(self, d: List[int], nonlin=False, bias=False, last_layer_...
    method forward (line 1758) | def forward(self, x: torch.Tensor):
  class SimpleConvolutional (line 1773) | class SimpleConvolutional(SimpleModel):
    method __init__ (line 1776) | def __init__(self, d: List[int], kernel_size=(2, 2), nonlin=False, bia...
    method forward (line 1797) | def forward(self, x: torch.Tensor):
  class SimpleConvolutional2 (line 1801) | class SimpleConvolutional2(SimpleModel2):
    method __init__ (line 1804) | def __init__(self, d: List[int], kernel_size=(2, 2), nonlin=False, bia...
    method forward (line 1830) | def forward(self, x: torch.Tensor):
  class ReshapedConvolutional2 (line 1834) | class ReshapedConvolutional2(SimpleConvolutional2):
    method __init__ (line 1837) | def __init__(self, *args, **kwargs):
    method forward (line 1845) | def forward(self, x: torch.Tensor):
  class PooledConvolutional2 (line 1850) | class PooledConvolutional2(SimpleConvolutional2):
    method __init__ (line 1853) | def __init__(self, *args, **kwargs):
    method forward (line 1861) | def forward(self, x: torch.Tensor):
  class StridedConvolutional2 (line 1867) | class StridedConvolutional2(SimpleModel2):
    method __init__ (line 1870) | def __init__(self, d: List[int], kernel_size=(2, 2), nonlin=False, bia...
    method forward (line 1898) | def forward(self, x: torch.Tensor):
  class GroupedConvolutional2 (line 1906) | class GroupedConvolutional2(SimpleModel2):
    method __init__ (line 1911) | def __init__(self, d: List[int], kernel_size=(2, 2), o=None, nonlin=Fa...
    method forward (line 1940) | def forward(self, x: torch.Tensor):
  class ReshapedConvolutional (line 1952) | class ReshapedConvolutional(SimpleConvolutional):
    method __init__ (line 1955) | def __init__(self, *args, **kwargs):
    method forward (line 1963) | def forward(self, x: torch.Tensor):
  function log_scalars (line 1968) | def log_scalars(metrics: Dict[str, Any]) -> None:
  function log_scalar (line 1977) | def log_scalar(**metrics) -> None:
  function log_spectrum (line 1991) | def log_spectrum(tag, vals: torch.Tensor, loglog=True, discard_tiny=False):
  function get_events (line 2015) | def get_events(fname, x_axis='step'):
  function infinite_iter (line 2054) | def infinite_iter(obj):
  function dump (line 2062) | def dump(result, fname):
  function print_version_info (line 2081) | def print_version_info():
  function print_cpu_info (line 2110) | def print_cpu_info():
  function move_to_gpu (line 2146) | def move_to_gpu(tensors):
  function fmt (line 2150) | def fmt(a):
  function to_logits (line 2157) | def to_logits(p: torch.Tensor) -> torch.Tensor:
  class CrossEntropySoft (line 2169) | class CrossEntropySoft(nn.Module):
    method __init__ (line 2178) | def __init__(self):
    method forward (line 2182) | def forward(self, inputs, target):
  function get_unique_logdir (line 2202) | def get_unique_logdir(root_logdir: str) -> str:
  function setup_logdir_and_event_writer (line 2214) | def setup_logdir_and_event_writer(run_name: str, init_wandb=False):
  class HessianBackprop (line 2244) | class HessianBackprop:
  class HessianExactSqrLoss (line 2248) | class HessianExactSqrLoss(HessianBackprop):
    method __init__ (line 2251) | def __init__(self):
    method __call__ (line 2254) | def __call__(self, output: torch.Tensor):
  class HessianSampledSqrLoss (line 2264) | class HessianSampledSqrLoss(HessianBackprop):
    method __init__ (line 2267) | def __init__(self, num_samples):
    method __call__ (line 2271) | def __call__(self, output: torch.Tensor):
  class HessianExactCrossEntropyLoss (line 2287) | class HessianExactCrossEntropyLoss(HessianBackprop):
    method __init__ (line 2290) | def __init__(self):
    method __call__ (line 2293) | def __call__(self, logits: torch.Tensor):
  function hessian_from_backprops (line 2314) | def hessian_from_backprops(A_t, Bh_t, bias=False):
  function per_example_hess (line 2342) | def per_example_hess(A_t, Bh_t, bias=False):
  function kl_div_cov (line 2395) | def kl_div_cov(mat1, mat2, eps=1e-3):
  function kron_quadratic_form (line 2414) | def kron_quadratic_form(H, dd):
  function kron_trace (line 2419) | def kron_trace(H: Tuple[torch.Tensor, torch.Tensor]):
  function kron_trace_matmul (line 2425) | def kron_trace_matmul(H, sigma):
  function kron_pinv (line 2434) | def kron_pinv(H: Tuple):
  function kron_nan_check (line 2439) | def kron_nan_check(H):
  function kron_fro_norm (line 2444) | def kron_fro_norm(H):
  function kron_sym_l2_norm (line 2448) | def kron_sym_l2_norm(H):
  function kron_inv (line 2452) | def kron_inv(H):
  function kron_sigma (line 2456) | def kron_sigma(G):
  function kron_batch_sum (line 2462) | def kron_batch_sum(G: Tuple):
  function chop (line 2468) | def chop(mat: torch.Tensor, eps=1e-10) -> torch.Tensor:
  function format_list (line 2476) | def format_list(ll: List) -> str:
  function create_local_logdir (line 2481) | def create_local_logdir(logdir) -> str:
  class NoOp (line 2490) | class NoOp:
    method __getattr__ (line 2493) | def __getattr__(self, *_args, **_kwargs):
  function install_pdb_handler (line 2499) | def install_pdb_handler():
  function randomly_rotate (line 2532) | def randomly_rotate(X: torch.Tensor) -> torch.Tensor:
  function random_cov (line 2544) | def random_cov(rank, d=None, n=20) -> torch.Tensor:
  function _to_mathematica (line 2564) | def _to_mathematica(x):
  function _from_mathematica (line 2575) | def _from_mathematica(x):
  function _dim_check (line 2582) | def _dim_check(d, rank=0):
  function random_cov_pair (line 2588) | def random_cov_pair(shared_rank, independent_rank, d, n=20, strength=1):
  function is_row_matrix (line 2604) | def is_row_matrix(dd):
  function is_col_matrix (line 2608) | def is_col_matrix(dd):
  function is_square_matrix (line 2612) | def is_square_matrix(dd):
  function is_vector (line 2616) | def is_vector(dd) -> bool:
  function is_matrix (line 2621) | def is_matrix(dd) -> bool:
  function eye (line 2626) | def eye(d: int) -> torch.Tensor:
  function eye_like (line 2630) | def eye_like(X: torch.Tensor) -> torch.Tensor:
  function rmul (line 2639) | def rmul(a: torch.Tensor, b):
  function matmul (line 2644) | def matmul(a, b):
  function rmatmul (line 2651) | def rmatmul(a: torch.Tensor, b):
  function norm_squared (line 2656) | def norm_squared(param):
  function dot_product (line 2660) | def dot_product(A, B):
  function copy_stats (line 2685) | def copy_stats(shared_stats, stats):
  function skip_nans (line 2692) | def skip_nans(t): return t[torch.isfinite(t)]
  class MyList (line 2696) | class MyList:
    method __init__ (line 2697) | def __init__(self, *args, **kwargs):
    method __getattr__ (line 2701) | def __getattr__(self, *_args, **_kwargs):
    method normal_form (line 2704) | def normal_form(self):
    method value (line 2707) | def value(self):
  function divide_attributes (line 2711) | def divide_attributes(d, n):
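
  autotune/util.py lists `vec`, `kron`, and Kronecker-structured classes like `Kron` and `KronFactoredCov`. The standard identity such helpers typically exploit is vec(AXB) = (Bᵀ ⊗ A) vec(X) under column-major vectorization; a NumPy sketch of that identity (the `vec` helper and random shapes below are illustrative, not the repo's implementation):

  ```python
  import numpy as np

  def vec(mat):
      # column-major vectorization: stack the columns of mat into one vector
      return mat.reshape(-1, order='F')

  rng = np.random.default_rng(0)
  A = rng.standard_normal((2, 3))
  X = rng.standard_normal((3, 4))
  B = rng.standard_normal((4, 2))

  # vec(A X B) == (B^T kron A) vec(X): lets a Kronecker-factored matrix
  # act on a matrix-shaped vector without ever materializing the big kron
  lhs = vec(A @ X @ B)
  rhs = np.kron(B.T, A) @ vec(X)
  assert np.allclose(lhs, rhs)
  ```

  This is why factored representations (e.g. KFAC-style curvature approximations elsewhere in this directory) can stay cheap: matrix-vector products with B ⊗ A cost two small matmuls instead of one huge one.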

FILE: autotune/util_test.py
  function test_khatri_rao (line 17) | def test_khatri_rao():
  function test_khatri_rao_t (line 25) | def test_khatri_rao_t():
  function test_to_logits (line 38) | def test_to_logits():
  function test_cross_entropy_soft (line 46) | def test_cross_entropy_soft():
  function test_symsqrt (line 82) | def test_symsqrt():
  function atest_pinv (line 134) | def atest_pinv():
  function test_pinverse (line 162) | def test_pinverse():
  function test_l2_norm (line 186) | def test_l2_norm():
  function test_symsqrt_neg (line 193) | def test_symsqrt_neg():
  function test_truncated_lyapunov (line 245) | def test_truncated_lyapunov():
  function test_lyapunov_lstsq (line 258) | def test_lyapunov_lstsq():
  function test_robust_svd (line 280) | def test_robust_svd():
  function test_misc (line 288) | def test_misc():
  function test_kron (line 294) | def test_kron():
  function test_contiguous (line 400) | def test_contiguous():
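
  util.py and its tests also name `khatri_rao`, the column-wise Kronecker product: for A (m×k) and B (n×k), column j of the (mn×k) result is kron(A[:, j], B[:, j]). A NumPy sketch of the standard operation (this is the textbook definition, not the repo's code; the sample matrices are illustrative):

  ```python
  import numpy as np

  def khatri_rao(A, B):
      # column-wise Kronecker product: T[i, j, c] = A[i, c] * B[j, c],
      # then fold (i, j) into a single row index of length m * n
      m, k = A.shape
      n, k2 = B.shape
      assert k == k2, "A and B must have the same number of columns"
      return np.einsum('ik,jk->ijk', A, B).reshape(m * n, k)

  A = np.array([[1.0, 2.0], [3.0, 4.0]])
  B = np.array([[0.0, 1.0], [1.0, 0.0]])
  out = khatri_rao(A, B)
  # column 0 is kron([1, 3], [0, 1]) = [0, 1, 0, 3]
  ```

  The transposed variant `khatri_rao_t` in the listing presumably pairs rows instead of columns; the same einsum pattern applies with the roles of the axes swapped.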

FILE: benchmark_huggingface_predict.py
  function to_list (line 37) | def to_list(tensor):
  function predict (line 40) | def predict(line, max_predictions):
  class timeit (line 86) | class timeit:
    method __init__ (line 90) | def __init__(self, tag=""):
    method __enter__ (line 93) | def __enter__(self):
    method __exit__ (line 97) | def __exit__(self, *args):

FILE: cluster/async_adder.py
  function traced_run (line 50) | def traced_run(fetches):
  function sessrun (line 78) | def sessrun(fetches):
  function get_ps_device (line 84) | def get_ps_device(task=0, op_device_str=''):
  function get_worker_device (line 93) | def get_worker_device(task, op_device_str=''):
  function session_config (line 101) | def session_config():
  function write_event (line 109) | def write_event(tag, value, step):
  function make_params (line 123) | def make_params():
  function run_worker (line 132) | def run_worker():
  class MyClusterConfig (line 242) | class MyClusterConfig:
    method __init__ (line 243) | def __init__(self):
    method __str__ (line 248) | def __str__(self):
  function load_config (line 251) | def load_config():
  function run_ps (line 275) | def run_ps():
  function _get_master (line 296) | def _get_master():
  function main (line 343) | def main():

FILE: cluster/aws.py
  function _ExecuteCommandInThread (line 46) | def _ExecuteCommandInThread(ssh_client,
  function _StreamOutputToFile (line 76) | def _StreamOutputToFile(fd, file, line_extractor, cmd=None):
  function _ExecuteCommandAndStreamOutput (line 101) | def _ExecuteCommandAndStreamOutput(ssh_client,
  function lookup_aws_instances (line 140) | def lookup_aws_instances(name):
  function tf_job (line 165) | def tf_job(name, num_tasks, instance_type=None, placement_group=''):
  function terminate_job (line 209) | def terminate_job(name):
  function _ssh_to_host (line 225) | def _ssh_to_host(hostname,
  class Job (line 264) | class Job:
    method __init__ (line 265) | def __init__(self, name, instances):
    method wait_until_ready (line 272) | def wait_until_ready(self):
  function _encode_float (line 278) | def _encode_float(value):
  function _decode_float (line 282) | def _decode_float(b16):
  class Task (line 285) | class Task:
    method __init__ (line 286) | def __init__(self, instance, job, task_id):
    method wait_until_ready (line 297) | def wait_until_ready(self):
    method initialize (line 306) | def initialize(self):
    method run_sync (line 319) | def run_sync(self, cmd):
    method _setup_tasklogdir (line 329) | def _setup_tasklogdir(self):
    method run (line 333) | def run(self, cmd, mirror_output=False):
    method upload (line 360) | def upload(self, local_file, remote_file=None):
    method _upload_directory (line 371) | def _upload_directory(self, local_directory, remote_directory):
    method public_ip (line 375) | def public_ip(self):
    method port (line 380) | def port(self):
    method ip (line 384) | def ip(self):  # private ip

FILE: cluster/benchmark_grpc_recv.py
  function session_config (line 50) | def session_config():
  function clusterspec (line 59) | def clusterspec():
  function create_graph (line 64) | def create_graph(device0, device1):
  function create_done_queue (line 82) | def create_done_queue(i):
  function make_event (line 97) | def make_event(tag, value, step):
  function run_benchmark (line 106) | def run_benchmark(sess, init_op, add_op):
  function run_benchmark_local (line 134) | def run_benchmark_local():
  function run_benchmark_distributed (line 140) | def run_benchmark_distributed():

FILE: cluster/benchmarks/bower_components/d3/d3.js
  function d3_documentElement (line 9) | function d3_documentElement(node) {
  function d3_window (line 12) | function d3_window(node) {
  function d3_ascending (line 46) | function d3_ascending(a, b) {
  function d3_number (line 109) | function d3_number(x) {
  function d3_numeric (line 112) | function d3_numeric(x) {
  function d3_bisector (line 171) | function d3_bisector(compare) {
  function d3_zipLength (line 232) | function d3_zipLength(d) {
  function d3_range_integerScale (line 284) | function d3_range_integerScale(x) {
  function d3_class (line 289) | function d3_class(ctor, properties) {
  function d3_Map (line 311) | function d3_Map() {
  function d3_map_escape (line 344) | function d3_map_escape(key) {
  function d3_map_unescape (line 347) | function d3_map_unescape(key) {
  function d3_map_has (line 350) | function d3_map_has(key) {
  function d3_map_remove (line 353) | function d3_map_remove(key) {
  function d3_map_keys (line 356) | function d3_map_keys() {
  function d3_map_size (line 361) | function d3_map_size() {
  function d3_map_empty (line 366) | function d3_map_empty() {
  function map (line 372) | function map(mapType, array, depth) {
  function entries (line 396) | function entries(map, depth) {
  function d3_Set (line 438) | function d3_Set() {
  function d3_identity (line 456) | function d3_identity(d) {
  function d3_rebind (line 464) | function d3_rebind(target, source, method) {
  function d3_vendorSymbol (line 470) | function d3_vendorSymbol(object, name) {
  function d3_noop (line 479) | function d3_noop() {}
  function d3_dispatch (line 485) | function d3_dispatch() {}
  function d3_dispatch_event (line 500) | function d3_dispatch_event(dispatch) {
  function d3_eventPreventDefault (line 523) | function d3_eventPreventDefault() {
  function d3_eventSource (line 526) | function d3_eventSource() {
  function d3_eventDispatch (line 531) | function d3_eventDispatch(target) {
  function d3_selection (line 557) | function d3_selection(groups) {
  function d3_selection_selector (line 600) | function d3_selection_selector(selector) {
  function d3_selection_selectorAll (line 618) | function d3_selection_selectorAll(selector) {
  function d3_selection_attr (line 656) | function d3_selection_attr(name, value) {
  function d3_collapse (line 680) | function d3_collapse(s) {
  function d3_selection_classedRe (line 700) | function d3_selection_classedRe(name) {
  function d3_selection_classes (line 703) | function d3_selection_classes(name) {
  function d3_selection_classed (line 706) | function d3_selection_classed(name, value) {
  function d3_selection_classedName (line 719) | function d3_selection_classedName(name) {
  function d3_selection_style (line 748) | function d3_selection_style(name, value, priority) {
  function d3_selection_property (line 769) | function d3_selection_property(name, value) {
  function d3_selection_creator (line 808) | function d3_selection_creator(name) {
  function d3_selectionRemove (line 828) | function d3_selectionRemove() {
  function bind (line 843) | function bind(group, groupData) {
  function d3_selection_dataNode (line 911) | function d3_selection_dataNode(data) {
  function d3_selection_filter (line 933) | function d3_selection_filter(selector) {
  function d3_selection_sortComparator (line 954) | function d3_selection_sortComparator(comparator) {
  function d3_selection_each (line 965) | function d3_selection_each(groups, callback) {
  function d3_selection_enter (line 997) | function d3_selection_enter(selection) {
  function d3_selection_enterInsertBefore (line 1030) | function d3_selection_enterInsertBefore(enter) {
  function d3_selection_on (line 1075) | function d3_selection_on(type, listener, capture) {
  function d3_selection_onListener (line 1114) | function d3_selection_onListener(listener, argumentz) {
  function d3_selection_onFilter (line 1126) | function d3_selection_onFilter(listener, argumentz) {
  function d3_event_dragSuppress (line 1136) | function d3_event_dragSuppress(node) {
  function d3_mousePoint (line 1164) | function d3_mousePoint(container, e) {
  function drag (line 1203) | function drag() {
  function dragstart (line 1206) | function dragstart(id, position, subject, move, end) {
  function d3_behavior_dragTouchId (line 1250) | function d3_behavior_dragTouchId() {
  function d3_sgn (line 1262) | function d3_sgn(x) {
  function d3_cross2d (line 1265) | function d3_cross2d(a, b, c) {
  function d3_acos (line 1268) | function d3_acos(x) {
  function d3_asin (line 1271) | function d3_asin(x) {
  function d3_sinh (line 1274) | function d3_sinh(x) {
  function d3_cosh (line 1277) | function d3_cosh(x) {
  function d3_tanh (line 1280) | function d3_tanh(x) {
  function d3_haversin (line 1283) | function d3_haversin(x) {
  function interpolate (line 1290) | function interpolate(t) {
  function zoom (line 1316) | function zoom(g) {
  function location (line 1416) | function location(p) {
  function point (line 1419) | function point(l) {
  function scaleTo (line 1422) | function scaleTo(s) {
  function translateTo (line 1425) | function translateTo(p, l) {
  function zoomTo (line 1430) | function zoomTo(that, p, l, k) {
  function rescale (line 1442) | function rescale() {
  function zoomstarted (line 1450) | function zoomstarted(dispatch) {
  function zoomed (line 1455) | function zoomed(dispatch) {
  function zoomended (line 1463) | function zoomended(dispatch) {
  function mousedowned (line 1469) | function mousedowned() {
  function touchstarted (line 1484) | function touchstarted() {
  function mousewheeled (line 1554) | function mousewheeled() {
  function dblclicked (line 1567) | function dblclicked() {
  function d3_color (line 1575) | function d3_color() {}
  function d3_hsl (line 1580) | function d3_hsl(h, s, l) {
  function d3_hsl_rgb (line 1595) | function d3_hsl_rgb(h, s, l) {
  function d3_hcl (line 1615) | function d3_hcl(h, c, l) {
  function d3_hcl_lab (line 1628) | function d3_hcl_lab(h, c, l) {
  function d3_lab (line 1634) | function d3_lab(l, a, b) {
  function d3_lab_rgb (line 1649) | function d3_lab_rgb(l, a, b) {
  function d3_lab_hcl (line 1656) | function d3_lab_hcl(l, a, b) {
  function d3_lab_xyz (line 1659) | function d3_lab_xyz(x) {
  function d3_xyz_lab (line 1662) | function d3_xyz_lab(x) {
  function d3_xyz_rgb (line 1665) | function d3_xyz_rgb(r) {
  function d3_rgb (line 1669) | function d3_rgb(r, g, b) {
  function d3_rgbNumber (line 1672) | function d3_rgbNumber(value) {
  function d3_rgbString (line 1675) | function d3_rgbString(value) {
  function d3_rgb_hex (line 1698) | function d3_rgb_hex(v) {
  function d3_rgb_parse (line 1701) | function d3_rgb_parse(format, rgb, hsl) {
  function d3_rgb_hsl (line 1737) | function d3_rgb_hsl(r, g, b) {
  function d3_rgb_lab (line 1749) | function d3_rgb_lab(r, g, b) {
  function d3_rgb_xyz (line 1756) | function d3_rgb_xyz(r) {
  function d3_rgb_parseNumber (line 1759) | function d3_rgb_parseNumber(c) {
  function d3_functor (line 1916) | function d3_functor(v) {
  function d3_xhrType (line 1923) | function d3_xhrType(response) {
  function d3_xhr (line 1930) | function d3_xhr(url, mimeType, response, callback) {
  function d3_xhr_fixCallback (line 2005) | function d3_xhr_fixCallback(callback) {
  function d3_xhrHasResponse (line 2010) | function d3_xhrHasResponse(request) {
  function dsv (line 2016) | function dsv(url, row, callback) {
  function response (line 2024) | function response(request) {
  function typedResponse (line 2027) | function typedResponse(f) {
  function token (line 2046) | function token() {
  function formatRow (line 2108) | function formatRow(row) {
  function formatValue (line 2111) | function formatValue(text) {
  function d3_timer_step (line 2139) | function d3_timer_step() {
  function d3_timer_mark (line 2156) | function d3_timer_mark() {
  function d3_timer_sweep (line 2165) | function d3_timer_sweep() {
  function d3_format_precision (line 2178) | function d3_format_precision(x, p) {
  function d3_formatPrefix (line 2195) | function d3_formatPrefix(d, i) {
  function d3_locale_numberFormat (line 2206) | function d3_locale_numberFormat(locale) {
  function d3_format_typeDefault (line 2327) | function d3_format_typeDefault(x) {
  function d3_date_utc (line 2331) | function d3_date_utc() {
  function d3_time_interval (line 2397) | function d3_time_interval(local, step, number) {
  function d3_time_interval_utc (line 2445) | function d3_time_interval_utc(method) {
  function d3_locale_timeFormat (line 2505) | function d3_locale_timeFormat(locale) {
  function d3_time_formatPad (line 2725) | function d3_time_formatPad(value, fill, width) {
  function d3_time_formatRe (line 2729) | function d3_time_formatRe(names) {
  function d3_time_formatLookup (line 2732) | function d3_time_formatLookup(names) {
  function d3_time_parseWeekdayNumber (line 2737) | function d3_time_parseWeekdayNumber(date, string, i) {
  function d3_time_parseWeekNumberSunday (line 2742) | function d3_time_parseWeekNumberSunday(date, string, i) {
  function d3_time_parseWeekNumberMonday (line 2747) | function d3_time_parseWeekNumberMonday(date, string, i) {
  function d3_time_parseFullYear (line 2752) | function d3_time_parseFullYear(date, string, i) {
  function d3_time_parseYear (line 2757) | function d3_time_parseYear(date, string, i) {
  function d3_time_parseZone (line 2762) | function d3_time_parseZone(date, string, i) {
  function d3_time_expandYear (line 2766) | function d3_time_expandYear(d) {
  function d3_time_parseMonthNumber (line 2769) | function d3_time_parseMonthNumber(date, string, i) {
  function d3_time_parseDay (line 2774) | function d3_time_parseDay(date, string, i) {
  function d3_time_parseDayOfYear (line 2779) | function d3_time_parseDayOfYear(date, string, i) {
  function d3_time_parseHour24 (line 2784) | function d3_time_parseHour24(date, string, i) {
  function d3_time_parseMinutes (line 2789) | function d3_time_parseMinutes(date, string, i) {
  function d3_time_parseSeconds (line 2794) | function d3_time_parseSeconds(date, string, i) {
  function d3_time_parseMilliseconds (line 2799) | function d3_time_parseMilliseconds(date, string, i) {
  function d3_time_zone (line 2804) | function d3_time_zone(d) {
  function d3_time_parseLiteralPercent (line 2808) | function d3_time_parseLiteralPercent(date, string, i) {
  function d3_time_formatMulti (line 2813) | function d3_time_formatMulti(formats) {
  function d3_adder (line 2844) | function d3_adder() {}
  function d3_adderSum (line 2861) | function d3_adderSum(a, b, o) {
  function d3_geo_streamGeometry (line 2872) | function d3_geo_streamGeometry(geometry, listener) {
  function d3_geo_streamLine (line 2917) | function d3_geo_streamLine(coordinates, listener, closed) {
  function d3_geo_streamPolygon (line 2923) | function d3_geo_streamPolygon(coordinates, listener) {
  function d3_geo_areaRingStart (line 2952) | function d3_geo_areaRingStart() {
  function d3_geo_cartesian (line 2970) | function d3_geo_cartesian(spherical) {
  function d3_geo_cartesianDot (line 2974) | function d3_geo_cartesianDot(a, b) {
  function d3_geo_cartesianCross (line 2977) | function d3_geo_cartesianCross(a, b) {
  function d3_geo_cartesianAdd (line 2980) | function d3_geo_cartesianAdd(a, b) {
  function d3_geo_cartesianScale (line 2985) | function d3_geo_cartesianScale(vector, k) {
  function d3_geo_cartesianNormalize (line 2988) | function d3_geo_cartesianNormalize(d) {
  function d3_geo_spherical (line 2994) | function d3_geo_spherical(cartesian) {
  function d3_geo_sphericalEqual (line 2997) | function d3_geo_sphericalEqual(a, b) {
  function point (line 3022) | function point(λ, φ) {
  function linePoint (line 3027) | function linePoint(λ, φ) {
  function lineStart (line 3067) | function lineStart() {
  function lineEnd (line 3070) | function lineEnd() {
  function ringPoint (line 3075) | function ringPoint(λ, φ) {
  function ringStart (line 3083) | function ringStart() {
  function ringEnd (line 3086) | function ringEnd() {
  function angle (line 3093) | function angle(λ0, λ1) {
  function compareRanges (line 3096) | function compareRanges(a, b) {
  function withinRange (line 3099) | function withinRange(x, range) {
  function d3_geo_centroidPoint (line 3153) | function d3_geo_centroidPoint(λ, φ) {
  function d3_geo_centroidPointXYZ (line 3158) | function d3_geo_centroidPointXYZ(x, y, z) {
  function d3_geo_centroidLineStart (line 3164) | function d3_geo_centroidLineStart() {
  function d3_geo_centroidLineEnd (line 3185) | function d3_geo_centroidLineEnd() {
  function d3_geo_centroidRingStart (line 3188) | function d3_geo_centroidRingStart() {
  function d3_geo_compose (line 3218) | function d3_geo_compose(a, b) {
  function d3_true (line 3227) | function d3_true() {
  function d3_geo_clipPolygon (line 3230) | function d3_geo_clipPolygon(segments, compare, clipStartInside, interpol...
  function d3_geo_clipPolygonLinkCircular (line 3289) | function d3_geo_clipPolygonLinkCircular(array) {
  function d3_geo_clipPolygonIntersection (line 3300) | function d3_geo_clipPolygonIntersection(point, points, other, entry) {
  function d3_geo_clip (line 3308) | function d3_geo_clip(pointVisible, clipLine, interpolate, clipStart) {
  function d3_geo_clipSegmentLength1 (line 3400) | function d3_geo_clipSegmentLength1(segment) {
  function d3_geo_clipBufferListener (line 3403) | function d3_geo_clipBufferListener() {
  function d3_geo_clipSort (line 3424) | function d3_geo_clipSort(a, b) {
  function d3_geo_clipAntimeridianLine (line 3428) | function d3_geo_clipAntimeridianLine(listener) {
  function d3_geo_clipAntimeridianIntersect (line 3467) | function d3_geo_clipAntimeridianIntersect(λ0, φ0, λ1, φ1) {
  function d3_geo_clipAntimeridianInterpolate (line 3471) | function d3_geo_clipAntimeridianInterpolate(from, to, direction, listene...
  function d3_geo_pointInPolygon (line 3494) | function d3_geo_pointInPolygon(point, polygon) {
  function d3_geo_clipCircle (line 3523) | function d3_geo_clipCircle(radius) {
  function d3_geom_clipLine (line 3619) | function d3_geom_clipLine(x0, y0, x1, y1) {
  function d3_geo_clipExtent (line 3691) | function d3_geo_clipExtent(x0, y0, x1, y1) {
  function d3_geo_conic (line 3825) | function d3_geo_conic(projectAt) {
  function d3_geo_conicEqualArea (line 3833) | function d3_geo_conicEqualArea(φ0, φ1) {
  function albersUsa (line 3860) | function albersUsa(coordinates) {
  function d3_geo_pathAreaRingStart (line 3942) | function d3_geo_pathAreaRingStart() {
  function d3_geo_pathBoundsPoint (line 3964) | function d3_geo_pathBoundsPoint(x, y) {
  function d3_geo_pathBuffer (line 3970) | function d3_geo_pathBuffer() {
  function d3_geo_pathBufferCircle (line 4015) | function d3_geo_pathBufferCircle(radius) {
  function d3_geo_pathCentroidPoint (line 4031) | function d3_geo_pathCentroidPoint(x, y) {
  function d3_geo_pathCentroidLineStart (line 4036) | function d3_geo_pathCentroidLineStart() {
  function d3_geo_pathCentroidLineEnd (line 4050) | function d3_geo_pathCentroidLineEnd() {
  function d3_geo_pathCentroidRingStart (line 4053) | function d3_geo_pathCentroidRingStart() {
  function d3_geo_pathContext (line 4074) | function d3_geo_pathContext(context) {
  function d3_geo_resample (line 4114) | function d3_geo_resample(project) {
  function path (line 4194) | function path(object) {
  function reset (line 4233) | function reset() {
  function d3_geo_pathProjectStream (line 4239) | function d3_geo_pathProjectStream(project) {
  function d3_geo_transform (line 4256) | function d3_geo_transform(stream) {
  function d3_geo_transformPoint (line 4279) | function d3_geo_transformPoint(stream, point) {
  function d3_geo_projection (line 4301) | function d3_geo_projection(project) {
  function d3_geo_projectionMutator (line 4306) | function d3_geo_projectionMutator(projectAt) {
  function d3_geo_projectionRadians (line 4378) | function d3_geo_projectionRadians(stream) {
  function d3_geo_equirectangular (line 4383) | function d3_geo_equirectangular(λ, φ) {
  function forward (line 4391) | function forward(coordinates) {
  function d3_geo_identityRotation (line 4401) | function d3_geo_identityRotation(λ, φ) {
  function d3_geo_rotation (line 4405) | function d3_geo_rotation(δλ, δφ, δγ) {
  function d3_geo_forwardRotationλ (line 4408) | function d3_geo_forwardRotationλ(δλ) {
  function d3_geo_rotationλ (line 4413) | function d3_geo_rotationλ(δλ) {
  function d3_geo_rotationφγ (line 4418) | function d3_geo_rotationφγ(δφ, δγ) {
  function circle (line 4432) | function circle() {
  function d3_geo_circleInterpolate (line 4462) | function d3_geo_circleInterpolate(radius, precision) {
  function d3_geo_circleAngle (line 4479) | function d3_geo_circleAngle(cr, point) {
  function graticule (line 4492) | function graticule() {
  function lines (line 4498) | function lines() {
  function d3_geo_graticuleX (line 4564) | function d3_geo_graticuleX(y0, y1, dy) {
  function d3_geo_graticuleY (line 4572) | function d3_geo_graticuleY(x0, x1, dx) {
  function d3_source (line 4580) | function d3_source(d) {
  function d3_target (line 4583) | function d3_target(d) {
  function greatArc (line 4588) | function greatArc() {
  function d3_geo_interpolate (line 4615) | function d3_geo_interpolate(x0, y0, x1, y1) {
  function d3_geo_lengthLineStart (line 4640) | function d3_geo_lengthLineStart() {
  function d3_geo_azimuthal (line 4655) | function d3_geo_azimuthal(scale, angle) {
  function d3_geo_conicConformal (line 4681) | function d3_geo_conicConformal(φ0, φ1) {
  function d3_geo_conicEquidistant (line 4704) | function d3_geo_conicEquidistant(φ0, φ1) {
  function d3_geo_mercator (line 4726) | function d3_geo_mercator(λ, φ) {
  function d3_geo_mercatorProjection (line 4732) | function d3_geo_mercatorProjection(project) {
  function d3_geo_transverseMercator (line 4773) | function d3_geo_transverseMercator(λ, φ) {
  function d3_geom_pointX (line 4791) | function d3_geom_pointX(d) {
  function d3_geom_pointY (line 4794) | function d3_geom_pointY(d) {
  function hull (line 4800) | function hull(data) {
  function d3_geom_hullUpper (line 4822) | function d3_geom_hullUpper(points) {
  function d3_geom_hullOrder (line 4830) | function d3_geom_hullOrder(a, b) {
  function d3_geom_polygonInside (line 4884) | function d3_geom_polygonInside(p, a, b) {
  function d3_geom_polygonIntersect (line 4887) | function d3_geom_polygonIntersect(c, d, a, b) {
  function d3_geom_polygonClosed (line 4891) | function d3_geom_polygonClosed(coordinates) {
  function d3_geom_voronoiBeach (line 4896) | function d3_geom_voronoiBeach() {
  function d3_geom_voronoiCreateBeach (line 4900) | function d3_geom_voronoiCreateBeach(site) {
  function d3_geom_voronoiDetachBeach (line 4905) | function d3_geom_voronoiDetachBeach(beach) {
  function d3_geom_voronoiRemoveBeach (line 4911) | function d3_geom_voronoiRemoveBeach(beach) {
  function d3_geom_voronoiAddBeach (line 4947) | function d3_geom_voronoiAddBeach(site) {
  function d3_geom_voronoiLeftBreakPoint (line 5001) | function d3_geom_voronoiLeftBreakPoint(arc, directrix) {
  function d3_geom_voronoiRightBreakPoint (line 5013) | function d3_geom_voronoiRightBreakPoint(arc, directrix) {
  function d3_geom_voronoiCell (line 5019) | function d3_geom_voronoiCell(site) {
  function d3_geom_voronoiCloseCells (line 5032) | function d3_geom_voronoiCloseCells(extent) {
  function d3_geom_voronoiHalfEdgeOrder (line 5062) | function d3_geom_voronoiHalfEdgeOrder(a, b) {
  function d3_geom_voronoiCircle (line 5065) | function d3_geom_voronoiCircle() {
  function d3_geom_voronoiAttachCircle (line 5069) | function d3_geom_voronoiAttachCircle(arc) {
  function d3_geom_voronoiDetachCircle (line 5102) | function d3_geom_voronoiDetachCircle(arc) {
  function d3_geom_voronoiClipEdges (line 5112) | function d3_geom_voronoiClipEdges(extent) {
  function d3_geom_voronoiConnectEdge (line 5122) | function d3_geom_voronoiConnectEdge(edge, extent) {
  function d3_geom_voronoiEdge (line 5196) | function d3_geom_voronoiEdge(lSite, rSite) {
  function d3_geom_voronoiCreateEdge (line 5201) | function d3_geom_voronoiCreateEdge(lSite, rSite, va, vb) {
  function d3_geom_voronoiCreateBorderEdge (line 5210) | function d3_geom_voronoiCreateBorderEdge(lSite, va, vb) {
  function d3_geom_voronoiSetEdgeEnd (line 5217) | function d3_geom_voronoiSetEdgeEnd(edge, lSite, rSite, vertex) {
  function d3_geom_voronoiHalfEdge (line 5228) | function d3_geom_voronoiHalfEdge(edge, lSite, rSite) {
  function d3_geom_voronoiRedBlackTree (line 5242) | function d3_geom_voronoiRedBlackTree() {
  function d3_geom_voronoiRedBlackNode (line 5245) | function d3_geom_voronoiRedBlackNode(node) {
  function d3_geom_voronoiRedBlackRotateLeft (line 5408) | function d3_geom_voronoiRedBlackRotateLeft(tree, node) {
  function d3_geom_voronoiRedBlackRotateRight (line 5421) | function d3_geom_voronoiRedBlackRotateRight(tree, node) {
  function d3_geom_voronoiRedBlackFirst (line 5434) | function d3_geom_voronoiRedBlackFirst(node) {
  function d3_geom_voronoi (line 5438) | function d3_geom_voronoi(sites, bbox) {
  function d3_geom_voronoiVertexOrder (line 5467) | function d3_geom_voronoiVertexOrder(a, b) {
  function voronoi (line 5473) | function voronoi(data) {
  function sites (line 5484) | function sites(data) {
  function d3_geom_voronoiTriangleArea (line 5537) | function d3_geom_voronoiTriangleArea(a, b, c) {
  function quadtree (line 5555) | function quadtree(data) {
  function d3_geom_quadtreeCompatX (line 5650) | function d3_geom_quadtreeCompatX(d) {
  function d3_geom_quadtreeCompatY (line 5653) | function d3_geom_quadtreeCompatY(d) {
  function d3_geom_quadtreeNode (line 5656) | function d3_geom_quadtreeNode() {
  function d3_geom_quadtreeVisit (line 5665) | function d3_geom_quadtreeVisit(f, node, x1, y1, x2, y2) {
  function d3_geom_quadtreeFind (line 5674) | function d3_geom_quadtreeFind(root, x, y, x0, y0, x3, y3) {
  function d3_interpolateRgb (line 5711) | function d3_interpolateRgb(a, b) {
  function d3_interpolateObject (line 5720) | function d3_interpolateObject(a, b) {
  function d3_interpolateNumber (line 5740) | function d3_interpolateNumber(a, b) {
  function d3_interpolateString (line 5747) | function d3_interpolateString(a, b) {
  function d3_interpolate (line 5781) | function d3_interpolate(a, b) {
  function d3_interpolateArray (line 5791) | function d3_interpolateArray(a, b) {
  function d3_ease_clamp (line 5842) | function d3_ease_clamp(f) {
  function d3_ease_reverse (line 5847) | function d3_ease_reverse(f) {
  function d3_ease_reflect (line 5852) | function d3_ease_reflect(f) {
  function d3_ease_quad (line 5857) | function d3_ease_quad(t) {
  function d3_ease_cubic (line 5860) | function d3_ease_cubic(t) {
  function d3_ease_cubicInOut (line 5863) | function d3_ease_cubicInOut(t) {
  function d3_ease_poly (line 5869) | function d3_ease_poly(e) {
  function d3_ease_sin (line 5874) | function d3_ease_sin(t) {
  function d3_ease_exp (line 5877) | function d3_ease_exp(t) {
  function d3_ease_circle (line 5880) | function d3_ease_circle(t) {
  function d3_ease_elastic (line 5883) | function d3_ease_elastic(a, p) {
  function d3_ease_back (line 5891) | function d3_ease_back(s) {
  function d3_ease_bounce (line 5897) | function d3_ease_bounce(t) {
  function d3_interpolateHcl (line 5901) | function d3_interpolateHcl(a, b) {
  function d3_interpolateHsl (line 5912) | function d3_interpolateHsl(a, b) {
  function d3_interpolateLab (line 5923) | function d3_interpolateLab(a, b) {
  function d3_interpolateRound (line 5932) | function d3_interpolateRound(a, b) {
  function d3_transform (line 5948) | function d3_transform(m) {
  function d3_transformDot (line 5964) | function d3_transformDot(a, b) {
  function d3_transformNormalize (line 5967) | function d3_transformNormalize(a) {
  function d3_transformCombine (line 5975) | function d3_transformCombine(a, b, k) {
  function d3_interpolateTransform (line 5989) | function d3_interpolateTransform(a, b) {
  function d3_uninterpolateNumber (line 6041) | function d3_uninterpolateNumber(a, b) {
  function d3_uninterpolateClamp (line 6047) | function d3_uninterpolateClamp(a, b) {
  function d3_layout_bundlePath (line 6061) | function d3_layout_bundlePath(link) {
  function d3_layout_bundleAncestors (line 6074) | function d3_layout_bundleAncestors(node) {
  function d3_layout_bundleLeastCommonAncestor (line 6084) | function d3_layout_bundleLeastCommonAncestor(a, b) {
  function relayout (line 6096) | function relayout() {
  function resort (line 6162) | function resort() {
  function repulse (line 6209) | function repulse(node) {
  function position (line 6381) | function position(dimension, size) {
  function dragmove (line 6410) | function dragmove(d) {
  function d3_layout_forceDragstart (line 6416) | function d3_layout_forceDragstart(d) {
  function d3_layout_forceDragend (line 6419) | function d3_layout_forceDragend(d) {
  function d3_layout_forceMouseover (line 6422) | function d3_layout_forceMouseover(d) {
  function d3_layout_forceMouseout (line 6426) | function d3_layout_forceMouseout(d) {
  function d3_layout_forceAccumulate (line 6429) | function d3_layout_forceAccumulate(quad, alpha, charges) {
  function hierarchy (line 6459) | function hierarchy(root) {
  function d3_layout_hierarchyRebind (line 6515) | function d3_layout_hierarchyRebind(object, hierarchy) {
  function d3_layout_hierarchyVisitBefore (line 6521) | function d3_layout_hierarchyVisitBefore(node, callback) {
  function d3_layout_hierarchyVisitAfter (line 6531) | function d3_layout_hierarchyVisitAfter(node, callback) {
  function d3_layout_hierarchyChildren (line 6544) | function d3_layout_hierarchyChildren(d) {
  function d3_layout_hierarchyValue (line 6547) | function d3_layout_hierarchyValue(d) {
  function d3_layout_hierarchySort (line 6550) | function d3_layout_hierarchySort(a, b) {
  function d3_layout_hierarchyLinks (line 6553) | function d3_layout_hierarchyLinks(nodes) {
  function position (line 6565) | function position(node, x, dx, dy) {
  function depth (line 6580) | function depth(node) {
  function partition (line 6588) | function partition(d, i) {
  function pie (line 6602) | function pie(data) {
  function stack (line 6652) | function stack(data, index) {
  function d3_layout_stackX (line 6707) | function d3_layout_stackX(d) {
  function d3_layout_stackY (line 6710) | function d3_layout_stackY(d) {
  function d3_layout_stackOut (line 6713) | function d3_layout_stackOut(d, y0, y) {
  function d3_layout_stackOrderDefault (line 6780) | function d3_layout_stackOrderDefault(data) {
  function d3_layout_stackOffsetZero (line 6783) | function d3_layout_stackOffsetZero(data) {
  function d3_layout_stackMaxIndex (line 6788) | function d3_layout_stackMaxIndex(array) {
  function d3_layout_stackReduceSum (line 6798) | function d3_layout_stackReduceSum(d) {
  function d3_layout_stackSum (line 6801) | function d3_layout_stackSum(p, d) {
  function histogram (line 6806) | function histogram(data, i) {
  function d3_layout_histogramBinSturges (line 6850) | function d3_layout_histogramBinSturges(range, values) {
  function d3_layout_histogramBinFixed (line 6853) | function d3_layout_histogramBinFixed(range, n) {
  function d3_layout_histogramRange (line 6858) | function d3_layout_histogramRange(values) {
  function pack (line 6863) | function pack(d, i) {
  function d3_layout_packSort (line 6902) | function d3_layout_packSort(a, b) {
  function d3_layout_packInsert (line 6905) | function d3_layout_packInsert(a, b) {
  function d3_layout_packSplice (line 6912) | function d3_layout_packSplice(a, b) {
  function d3_layout_packIntersects (line 6916) | function d3_layout_packIntersects(a, b) {
  function d3_layout_packSiblings (line 6920) | function d3_layout_packSiblings(node) {
  function d3_layout_packLink (line 6984) | function d3_layout_packLink(node) {
  function d3_layout_packUnlink (line 6987) | function d3_layout_packUnlink(node) {
  function d3_layout_packTransform (line 6991) | function d3_layout_packTransform(node, x, y, k) {
  function d3_layout_packPlace (line 7001) | function d3_layout_packPlace(a, b, c) {
  function tree (line 7017) | function tree(d, i) {
  function wrapTree (line 7036) | function wrapTree(root0) {
  function firstWalk (line 7060) | function firstWalk(v) {
  function secondWalk (line 7076) | function secondWalk(v) {
  function apportion (line 7080) | function apportion(v, w, ancestor) {
  function sizeNode (line 7110) | function sizeNode(node) {
  function d3_layout_treeSeparation (line 7131) | function d3_layout_treeSeparation(a, b) {
  function d3_layout_treeLeft (line 7134) | function d3_layout_treeLeft(v) {
  function d3_layout_treeRight (line 7138) | function d3_layout_treeRight(v) {
  function d3_layout_treeMove (line 7142) | function d3_layout_treeMove(wm, wp, shift) {
  function d3_layout_treeShift (line 7150) | function d3_layout_treeShift(v) {
  function d3_layout_treeAncestor (line 7159) | function d3_layout_treeAncestor(vim, v, ancestor) {
  function cluster (line 7164) | function cluster(d, i) {
  function d3_layout_clusterY (line 7204) | function d3_layout_clusterY(children) {
  function d3_layout_clusterX (line 7209) | function d3_layout_clusterX(children) {
  function d3_layout_clusterLeft (line 7214) | function d3_layout_clusterLeft(node) {
  function d3_layout_clusterRight (line 7218) | function d3_layout_clusterRight(node) {
  function scale (line 7224) | function scale(children, k) {
  function squarify (line 7231) | function squarify(node) {
  function stickify (line 7258) | function stickify(node) {
  function worst (line 7275) | function worst(row, u) {
  function position (line 7286) | function position(row, u, rect, flush) {
  function treemap (line 7316) | function treemap(d) {
  function padFunction (line 7335) | function padFunction(node) {
  function padConstant (line 7339) | function padConstant(node) {
  function d3_layout_treemapPadNull (line 7370) | function d3_layout_treemapPadNull(node) {
  function d3_layout_treemapPad (line 7378) | function d3_layout_treemapPad(node, padding) {
  function d3_scaleExtent (line 7430) | function d3_scaleExtent(domain) {
  function d3_scaleRange (line 7434) | function d3_scaleRange(scale) {
  function d3_scale_bilinear (line 7437) | function d3_scale_bilinear(domain, range, uninterpolate, interpolate) {
  function d3_scale_nice (line 7443) | function d3_scale_nice(domain, nice) {
  function d3_scale_niceStep (line 7453) | function d3_scale_niceStep(step) {
  function d3_scale_polylinear (line 7467) | function d3_scale_polylinear(domain, range, uninterpolate, interpolate) {
  function d3_scale_linear (line 7485) | function d3_scale_linear(domain, range, interpolate, clamp) {
  function d3_scale_linearRebind (line 7537) | function d3_scale_linearRebind(scale, linear) {
  function d3_scale_linearNice (line 7540) | function d3_scale_linearNice(domain, m) {
  function d3_scale_linearTickRange (line 7543) | function d3_scale_linearTickRange(domain, m) {
  function d3_scale_linearTicks (line 7552) | function d3_scale_linearTicks(domain, m) {
  function d3_scale_linearTickFormat (line 7555) | function d3_scale_linearTickFormat(domain, m, format) {
  function d3_scale_linearPrecision (line 7583) | function d3_scale_linearPrecision(value) {
  function d3_scale_linearFormatPrecision (line 7586) | function d3_scale_linearFormatPrecision(type, range) {
  function d3_scale_log (line 7593) | function d3_scale_log(linear, base, positive, domain) {
  function d3_scale_pow (line 7665) | function d3_scale_pow(linear, exponent, domain) {
  function d3_scale_powPow (line 7699) | function d3_scale_powPow(e) {
  function d3_scale_ordinal (line 7713) | function d3_scale_ordinal(domain, ranger) {
  function d3_scale_quantile (line 7821) | function d3_scale_quantile(domain, range) {
  function d3_scale_quantize (line 7857) | function d3_scale_quantize(x0, x1, range) {
  function d3_scale_threshold (line 7891) | function d3_scale_threshold(domain, range) {
  function d3_scale_identity (line 7917) | function d3_scale_identity(domain) {
  function d3_zero (line 7939) | function d3_zero() {
  function arc (line 7944) | function arc() {
  function circleSegment (line 8017) | function circleSegment(r1, cw) {
  function d3_svg_arcInnerRadius (line 8062) | function d3_svg_arcInnerRadius(d) {
  function d3_svg_arcOuterRadius (line 8065) | function d3_svg_arcOuterRadius(d) {
  function d3_svg_arcStartAngle (line 8068) | function d3_svg_arcStartAngle(d) {
  function d3_svg_arcEndAngle (line 8071) | function d3_svg_arcEndAngle(d) {
  function d3_svg_arcPadAngle (line 8074) | function d3_svg_arcPadAngle(d) {
  function d3_svg_arcSweep (line 8077) | function d3_svg_arcSweep(x0, y0, x1, y1) {
  function d3_svg_arcCornerTangents (line 8080) | function d3_svg_arcCornerTangents(p0, p1, r1, rc, cw) {
  function d3_svg_line (line 8085) | function d3_svg_line(projection) {
  function d3_svg_lineLinear (line 8152) | function d3_svg_lineLinear(points) {
  function d3_svg_lineLinearClosed (line 8155) | function d3_svg_lineLinearClosed(points) {
  function d3_svg_lineStep (line 8158) | function d3_svg_lineStep(points) {
  function d3_svg_lineStepBefore (line 8164) | function d3_svg_lineStepBefore(points) {
  function d3_svg_lineStepAfter (line 8169) | function d3_svg_lineStepAfter(points) {
  function d3_svg_lineCardinalOpen (line 8174) | function d3_svg_lineCardinalOpen(points, tension) {
  function d3_svg_lineCardinalClosed (line 8177) | function d3_svg_lineCardinalClosed(points, tension) {
  function d3_svg_lineCardinal (line 8181) | function d3_svg_lineCardinal(points, tension) {
  function d3_svg_lineHermite (line 8184) | function d3_svg_lineHermite(points, tangents) {
  function d3_svg_lineCardinalTangents (line 8211) | function d3_svg_lineCardinalTangents(points, tension) {
  function d3_svg_lineBasis (line 8221) | function d3_svg_lineBasis(points) {
  function d3_svg_lineBasisOpen (line 8237) | function d3_svg_lineBasisOpen(points) {
  function d3_svg_lineBasisClosed (line 8257) | function d3_svg_lineBasisClosed(points) {
  function d3_svg_lineBundle (line 8276) | function d3_svg_lineBundle(points, tension) {
  function d3_svg_lineDot4 (line 8289) | function d3_svg_lineDot4(a, b) {
  function d3_svg_lineBasisBezier (line 8293) | function d3_svg_lineBasisBezier(path, x, y) {
  function d3_svg_lineSlope (line 8296) | function d3_svg_lineSlope(p0, p1) {
  function d3_svg_lineFiniteDifferences (line 8299) | function d3_svg_lineFiniteDifferences(points) {
  function d3_svg_lineMonotoneTangents (line 8307) | function d3_svg_lineMonotoneTangents(points) {
  function d3_svg_lineMonotone (line 8331) | function d3_svg_lineMonotone(points) {
  function d3_svg_lineRadial (line 8340) | function d3_svg_lineRadial(points) {
  function d3_svg_area (line 8351) | function d3_svg_area(projection) {
  function chord (line 8441) | function chord(d, i) {
  function subgroup (line 8445) | function subgroup(self, f, d, i) {
  function equals (line 8455) | function equals(a, b) {
  function arc (line 8458) | function arc(r, p, a) {
  function curve (line 8461) | function curve(r0, p0, r1, p1) {
  function d3_svg_chordRadius (line 8491) | function d3_svg_chordRadius(d) {
  function diagonal (line 8496) | function diagonal(d, i) {
  function d3_svg_diagonalProjection (line 8524) | function d3_svg_diagonalProjection(d) {
  function d3_svg_diagonalRadialProjection (line 8534) | function d3_svg_diagonalRadialProjection(projection) {
  function symbol (line 8542) | function symbol(d, i) {
  function d3_svg_symbolSize (line 8557) | function d3_svg_symbolSize() {
  function d3_svg_symbolType (line 8560) | function d3_svg_symbolType() {
  function d3_svg_symbolCircle (line 8563) | function d3_svg_symbolCircle(size) {
  function d3_selection_interruptNS (line 8612) | function d3_selection_interruptNS(ns) {
  function d3_transition (line 8622) | function d3_transition(groups, ns, id) {
  function d3_transition_tween (line 8694) | function d3_transition_tween(groups, name, value, tween) {
  function attrNull (line 8708) | function attrNull() {
  function attrNullNS (line 8711) | function attrNullNS() {
  function attrTween (line 8714) | function attrTween(b) {
  function attrTweenNS (line 8722) | function attrTweenNS(b) {
  function attrTween (line 8734) | function attrTween(d, i) {
  function attrTweenNS (line 8740) | function attrTweenNS(d, i) {
  function styleNull (line 8758) | function styleNull() {
  function styleString (line 8761) | function styleString(b) {
  function styleTween (line 8773) | function styleTween(d, i) {
  function d3_transition_text (line 8784) | function d3_transition_text(b) {
  function d3_transitionNamespace (line 8864) | function d3_transitionNamespace(name) {
  function d3_transitionNode (line 8867) | function d3_transitionNode(node, i, ns, id, inherit) {
  function axis (line 8931) | function axis(g) {
  function d3_svg_axisX (line 9026) | function d3_svg_axisX(selection, x0, x1) {
  function d3_svg_axisY (line 9032) | function d3_svg_axisY(selection, y0, y1) {
  function brush (line 9040) | function brush(g) {
  function redraw (line 9126) | function redraw(g) {
  function redrawX (line 9131) | function redrawX(g) {
  function redrawY (line 9135) | function redrawY(g) {
  function brushstart (line 9139) | function brushstart() {
  function d3_time_formatIsoNative (line 9332) | function d3_time_formatIsoNative(date) {
  function d3_time_scale (line 9379) | function d3_time_scale(linear, methods, format) {
  function d3_time_scaleDate (line 9429) | function d3_time_scaleDate(t) {
  function d3_json (line 9488) | function d3_json(request) {
  function d3_html (line 9494) | function d3_html(request) {

FILE: cluster/benchmarks/bower_components/plottable/plottable.d.ts
  class Map (line 61) | class Map<K, V> {
  class Set (line 77) | class Set<T> {
  class CallbackSet (line 241) | class CallbackSet<CB extends Function> extends Set<CB> {
  type StackedDatum (line 246) | type StackedDatum = {
  type StackingResult (line 250) | type StackingResult = Utils.Map<Dataset, Utils.Map<string, StackedDatum>>;
  class ClientToSVGTranslator (line 306) | class ClientToSVGTranslator {
  type DatasetCallback (line 340) | type DatasetCallback = (dataset: Dataset) => void;
  class Dataset (line 341) | class Dataset {
  interface RenderPolicy (line 400) | interface RenderPolicy {
  class Immediate (line 407) | class Immediate implements RenderPolicy {
  class AnimationFrame (line 414) | class AnimationFrame implements RenderPolicy {
  class Timeout (line 422) | class Timeout implements RenderPolicy {
  interface Accessor (line 475) | interface Accessor<T> {
  type Projector (line 482) | type Projector = (datum: any, index: number, dataset: Dataset) => any;
  type AttributeToProjector (line 487) | type AttributeToProjector = {
  type AppliedProjector (line 494) | type AppliedProjector = (datum: any, index: number) => any;
  type AttributeToAppliedProjector (line 498) | type AttributeToAppliedProjector = {
  type SpaceRequest (line 507) | type SpaceRequest = {
  type Range (line 514) | type Range = {
  type Point (line 521) | type Point = {
  type Bounds (line 528) | type Bounds = {
  interface Entity (line 535) | interface Entity<C extends Component> {
  type Formatter (line 543) | type Formatter = (d: any) => string;
  type SymbolFactory (line 635) | type SymbolFactory = (symbolSize: number) => string;
  interface IncludedValuesProvider (line 652) | interface IncludedValuesProvider<D> {
  interface PaddingExceptionsProvider (line 663) | interface PaddingExceptionsProvider<D> {
  interface ScaleCallback (line 668) | interface ScaleCallback<S extends Scale<any, any>> {
  class Scale (line 671) | class Scale<D, R> {
  class QuantitativeScale (line 769) | class QuantitativeScale<D> extends Scale<D, number> {
  class Linear (line 891) | class Linear extends QuantitativeScale<number> {
  class ModifiedLog (line 911) | class ModifiedLog extends QuantitativeScale<number> {
  class Category (line 993) | class Category extends Scale<string, number> {
  class Color (line 1072) | class Color extends Scale<string, string> {
  class Time (line 1106) | class Time extends QuantitativeScale<Date> {
  class InterpolatedColor (line 1142) | class InterpolatedColor extends Scale<number, string> {
  interface TickGenerator (line 1181) | interface TickGenerator<D> {
  type DrawStep (line 1207) | type DrawStep = {
  type AppliedDrawStep (line 1214) | type AppliedDrawStep = {
  class Drawer (line 1219) | class Drawer {
  class Line (line 1289) | class Line extends Drawer {
  class Area (line 1296) | class Area extends Drawer {
  class Rectangle (line 1303) | class Rectangle extends Drawer {
  class Arc (line 1308) | class Arc extends Drawer {
  class ArcOutline (line 1314) | class ArcOutline extends Drawer {
  class Symbol (line 1320) | class Symbol extends Drawer {
  class Segment (line 1325) | class Segment extends Drawer {
  type ComponentCallback (line 1330) | type ComponentCallback = (component: Component) => void;
  class Alignment (line 1332) | class Alignment {
  class Component (line 1340) | class Component {
  class ComponentContainer (line 1587) | class ComponentContainer extends Component {
  class Group (line 1619) | class Group extends ComponentContainer {
  class PlotGroup (line 1657) | class PlotGroup extends Group {
  class Axis (line 1667) | class Axis<D> extends Component {
  type TimeAxisTierConfiguration (line 1900) | type TimeAxisTierConfiguration = {
  type TimeAxisConfiguration (line 1910) | type TimeAxisConfiguration = TimeAxisTierConfiguration[];
  class Time (line 1911) | class Time extends Axis<Date> {
  class Numeric (line 1993) | class Numeric extends Axis<number> {
  class Category (line 2068) | class Category extends Axis<string> {
  class Label (line 2117) | class Label extends Component {
  class TitleLabel (line 2172) | class TitleLabel extends Label {
  class AxisLabel (line 2181) | class AxisLabel extends Label {
  class Legend (line 2192) | class Legend extends Component {
  class InterpolatedColorLegend (line 2319) | class InterpolatedColorLegend extends Component {
  class Gridlines (line 2394) | class Gridlines extends Component {
  class Table (line 2415) | class Table extends ComponentContainer {
  enum PropertyMode (line 2551) | enum PropertyMode {
  class SelectionBoxLayer (line 2555) | class SelectionBoxLayer extends Component {
  class GuideLineLayer (line 2668) | class GuideLineLayer<D> extends Component {
  interface PlotEntity (line 2738) | interface PlotEntity extends Entity<Plot> {
  interface AccessorScaleBinding (line 2743) | interface AccessorScaleBinding<D, R> {
  class Plot (line 2753) | class Plot extends Component {
  class Pie (line 2908) | class Pie extends Plot {
  class XYPlot (line 3032) | class XYPlot<X, Y> extends Plot {
  class Rectangle (line 3143) | class Rectangle<X, Y> extends XYPlot<X, Y> {
  class Scatter (line 3296) | class Scatter<X, Y> extends XYPlot<X, Y> {
  class Bar (line 3367) | class Bar<X, Y> extends XYPlot<X, Y> {
  class Line (line 3518) | class Line<X> extends XYPlot<X, number> {
  class Area (line 3622) | class Area<X> extends Line<X> {
  class ClusteredBar (line 3665) | class ClusteredBar<X, Y> extends Bar<X, Y> {
  class StackedArea (line 3685) | class StackedArea<X> extends Area<X> {
  class StackedBar (line 3740) | class StackedBar<X, Y> extends Bar<X, Y> {
  class Segment (line 3770) | class Segment<X, Y> extends XYPlot<X, Y> {
  class Waterfall (line 3869) | class Waterfall<X, Y> extends Bar<X, number> {
  interface Animator (line 3918) | interface Animator {
  class Null (line 3944) | class Null implements Animator {
  class Easing (line 3953) | class Easing implements Animator {
  class Dispatcher (line 4067) | class Dispatcher {
  type MouseCallback (line 4082) | type MouseCallback = (p: Point, event: MouseEvent) => void;
  class Mouse (line 4083) | class Mouse extends Dispatcher {
  type TouchCallback (line 4194) | type TouchCallback = (ids: number[], idToPoint: {
  class Touch (line 4197) | class Touch extends Dispatcher {
  type KeyCallback (line 4284) | type KeyCallback = (keyCode: number, event: KeyboardEvent) => void;
  class Key (line 4285) | class Key extends Dispatcher {
  class Interaction (line 4334) | class Interaction {
  type ClickCallback (line 4387) | type ClickCallback = (point: Point) => void;
  class Click (line 4390) | class Click extends Interaction {
  class DoubleClick (line 4421) | class DoubleClick extends Interaction {
  type KeyCallback (line 4458) | type KeyCallback = (keyCode: number) => void;
  class Key (line 4460) | class Key extends Interaction {
  type PointerCallback (line 4517) | type PointerCallback = (point: Point) => void;
  class Pointer (line 4520) | class Pointer extends Interaction {
  class PanZoom (line 4579) | class PanZoom extends Interaction {
  type DragCallback (line 4712) | type DragCallback = (start: Point, end: Point) => void;
  class Drag (line 4715) | class Drag extends Interaction {
  type DragBoxCallback (line 4804) | type DragBoxCallback = (bounds: Bounds) => void;
  class DragBoxLayer (line 4807) | class DragBoxLayer extends Components.SelectionBoxLayer {
  class XDragBoxLayer (line 4935) | class XDragBoxLayer extends DragBoxLayer {
  class YDragBoxLayer (line 4961) | class YDragBoxLayer extends DragBoxLayer {
  interface DragLineCallback (line 4987) | interface DragLineCallback<D> {
  class DragLineLayer (line 4992) | class DragLineLayer<D> extends GuideLineLayer<D> {

FILE: cluster/benchmarks/bower_components/plottable/plottable.js
  function __ (line 25) | function __() { this.constructor = d; }
  function inRange (line 43) | function inRange(x, a, b) {
  function clamp (line 55) | function clamp(x, min, max) {
  function max (line 59) | function max(array, firstArg, secondArg) {
  function min (line 68) | function min(array, firstArg, secondArg) {
  function isNaN (line 80) | function isNaN(n) {
  function isValidNumber (line 88) | function isValidNumber(n) {
  function range (line 96) | function range(start, stop, step) {
  function distanceSquared (line 116) | function distanceSquared(p1, p2) {
  function degreesToRadians (line 120) | function degreesToRadians(degree) {
  function Map (line 136) | function Map() {
  function Set (line 220) | function Set() {
  function elementBBox (line 289) | function elementBBox(element) {
  function requestAnimationFramePolyfill (line 312) | function requestAnimationFramePolyfill(callback) {
  function elementWidth (line 328) | function elementWidth(element) {
  function elementHeight (line 344) | function elementHeight(element) {
  function translate (line 353) | function translate(selection, x, y) {
  function clientRectsOverlap (line 372) | function clientRectsOverlap(clientRectA, clientRectB) {
  function clientRectInside (line 395) | function clientRectInside(innerClientRect, outerClientRect) {
  function boundingSVG (line 408) | function boundingSVG(element) {
  function generateUniqueClipPathId (line 423) | function generateUniqueClipPathId() {
  function intersectsBBox (line 439) | function intersectsBBox(xValOrRange, yValOrRange, bbox, tolerance) {
  function _parseRange (line 460) | function _parseRange(input) {
  function _parseStyleValue (line 471) | function _parseStyleValue(style, property) {
  function contrast (line 493) | function contrast(a, b) {
  function lightenColor (line 503) | function lightenColor(color, factor) {
  function colorTest (line 516) | function colorTest(colorTester, className) {
  function luminance (line 544) | function luminance(color) {
  function add (line 572) | function add(aList, bList) {
  function uniq (line 586) | function uniq(arr) {
  function flatten (line 602) | function flatten(a) {
  function createFilledArray (line 613) | function createFilledArray(value, count) {
  function CallbackSet (line 635) | function CallbackSet() {
  function stack (line 669) | function stack(datasets, keyAccessor, valueAccessor) {
  function stackedExtent (line 706) | function stackedExtent(stackingResult, keyAccessor, filter) {
  function normalizeKey (line 728) | function normalizeKey(key) {
  function warn (line 746) | function warn(warning) {
  function setTimeout (line 770) | function setTimeout(f, time) {
  function deprecated (line 794) | function deprecated(callingMethod, version, message) {
  function ClientToSVGTranslator (line 808) | function ClientToSVGTranslator(svg) {
  function Dataset (line 903) | function Dataset(data, metadata) {
  function Immediate (line 963) | function Immediate() {
  function AnimationFrame (line 976) | function AnimationFrame() {
  function Timeout (line 990) | function Timeout() {
  function renderPolicy (line 1032) | function renderPolicy(renderPolicy) {
  function registerToRender (line 1056) | function registerToRender(component) {
  function registerToComputeLayout (line 1069) | function registerToComputeLayout(component) {
  function requestRender (line 1075) | function requestRender() {
  function flush (line 1088) | function flush() {
  function currency (line 1128) | function currency(precision, symbol, prefix) {
  function fixed (line 1157) | function fixed(precision) {
  function general (line 1171) | function general(maxNumberOfDecimalPlaces) {
  function identity (line 1190) | function identity() {
  function percentage (line 1202) | function percentage(precision) {
  function siSuffix (line 1223) | function siSuffix(numberOfSignificantFigures) {
  function shortScale (line 1244) | function shortScale(precision) {
  function multiTime (line 1287) | function multiTime() {
  function time (line 1342) | function time(specifier) {
  function verifyPrecision (line 1346) | function verifyPrecision(precision) {
  function circle (line 1360) | function circle() {
  function square (line 1364) | function square() {
  function cross (line 1368) | function cross() {
  function diamond (line 1372) | function diamond() {
  function triangleUp (line 1376) | function triangleUp() {
  function triangleDown (line 1380) | function triangleDown() {
  function Scale (line 1394) | function Scale() {
  function QuantitativeScale (line 1543) | function QuantitativeScale() {
  function Linear (line 1775) | function Linear() {
  function ModifiedLog (line 1857) | function ModifiedLog(base) {
  function Category (line 2034) | function Category() {
  function Color (line 2144) | function Color(scaleType) {
  function Time (line 2258) | function Time() {
  function InterpolatedColor (line 2369) | function InterpolatedColor(scaleType) {
  function intervalTickGenerator (line 2524) | function intervalTickGenerator(interval) {
  function integerTickGenerator (line 2546) | function integerTickGenerator() {
  function Drawer (line 2565) | function Drawer(dataset) {
  function Line (line 2695) | function Line(dataset) {
  function Area (line 2718) | function Area(dataset) {
  function Rectangle (line 2741) | function Rectangle(dataset) {
  function Arc (line 2756) | function Arc(dataset) {
  function ArcOutline (line 2776) | function ArcOutline(dataset) {
  function Symbol (line 2796) | function Symbol(dataset) {
  function Segment (line 2812) | function Segment(dataset) {
  function Alignment (line 2826) | function Alignment() {
  function Component (line 2838) | function Component() {
  function ComponentContainer (line 3359) | function ComponentContainer() {
  function Group (line 3440) | function Group(components) {
  function PlotGroup (line 3522) | function PlotGroup() {
  function Axis (line 3569) | function Axis(scale, orientation) {
  function Time (line 4132) | function Time(scale, orientation) {
  function Numeric (line 4593) | function Numeric(scale, orientation) {
  function Category (line 4921) | function Category(scale, orientation) {
  function Label (line 5127) | function Label(displayText, angle) {
  function TitleLabel (line 5238) | function TitleLabel(text, angle) {
  function AxisLabel (line 5253) | function AxisLabel(text, angle) {
  function Legend (line 5275) | function Legend(colorScale) {
  function InterpolatedColorLegend (line 5555) | function InterpolatedColorLegend(interpolatedColorScale) {
  function Gridlines (line 5793) | function Gridlines(xScale, yScale) {
  function Table (line 5897) | function Table(rows) {
  function SelectionBoxLayer (line 6276) | function SelectionBoxLayer() {
  function GuideLineLayer (line 6493) | function GuideLineLayer(orientation) {
  function Plot (line 6645) | function Plot() {
  function Pie (line 7090) | function Pie() {
  function XYPlot (line 7414) | function XYPlot() {
  function Rectangle (line 7718) | function Rectangle() {
  function Scatter (line 8059) | function Scatter() {
  function Bar (line 8191) | function Bar(orientation) {
  function Line (line 8792) | function Line() {
  function Area (line 9189) | function Area() {
  function ClusteredBar (line 9360) | function ClusteredBar(orientation) {
  function StackedArea (line 9413) | function StackedArea() {
  function StackedBar (line 9598) | function StackedBar(orientation) {
  function Segment (line 9708) | function Segment() {
  function Waterfall (line 9875) | function Waterfall() {
  function Null (line 10075) | function Null() {
  function Easing (line 10101) | function Easing() {
  function Dispatcher (line 10204) | function Dispatcher() {
  function Mouse (line 10278) | function Mouse(svg) {
  function Touch (line 10465) | function Touch(svg) {
  function Key (line 10627) | function Key() {
  function Interaction (line 10703) | function Interaction() {
  function Click (line 10799) | function Click() {
  function DoubleClick (line 10881) | function DoubleClick() {
  function Key (line 10980) | function Key() {
  function Pointer (line 11092) | function Pointer() {
  function PanZoom (line 11217) | function PanZoom(xScale, yScale) {
  function Drag (line 11526) | function Drag() {
  function DragBoxLayer (line 11686) | function DragBoxLayer() {
  function XDragBoxLayer (line 12051) | function XDragBoxLayer() {
  function YDragBoxLayer (line 12105) | function YDragBoxLayer() {
  function DragLineLayer (line 12157) | function DragLineLayer(orientation) {
  function arrayEq (line 12343) | function arrayEq(a, b) {
  function objEq (line 12367) | function objEq(a, b) {
  function transform (line 12388) | function transform(s, x, y) {
  function getBBox (line 12402) | function getBBox(element) {
  function Cache (line 12436) | function Cache(compute, valueEq) {
  function Tokenizer (line 12476) | function Tokenizer() {
  function combineWhitespace (line 12521) | function combineWhitespace(str) {
  function isNotEmptyString (line 12525) | function isNotEmptyString(str) {
  function trimStart (line 12529) | function trimStart(str, c) {
  function trimEnd (line 12538) | function trimEnd(str, c) {
  function BaseAnimator (line 12559) | function BaseAnimator() {
  function __ (line 12638) | function __() { this.constructor = d; }
  function UnveilAnimator (line 12648) | function UnveilAnimator() {
  function __ (line 12704) | function __() { this.constructor = d; }
  function OpacityAnimator (line 12714) | function OpacityAnimator() {
  function Wrapper (line 12738) | function Wrapper() {
  function __ (line 12947) | function __() { this.constructor = d; }
  function SingleLineWrapper (line 12957) | function SingleLineWrapper() {
  function Writer (line 13003) | function Writer(measurer, wrapper) {
  function AbstractMeasurer (line 13128) | function AbstractMeasurer(area, className) {
  function __ (line 13186) | function __() { this.constructor = d; }
  function Measurer (line 13196) | function Measurer(area, className, useGuards) {
  function __ (line 13239) | function __() { this.constructor = d; }
  function CharacterMeasurer (line 13249) | function CharacterMeasurer() {
  function __ (line 13272) | function __() { this.constructor = d; }
  function CacheCharacterMeasurer (line 13282) | function CacheCharacterMeasurer(area, className) {

FILE: cluster/benchmarks/dashboard_app/main.py
  function argument_name (line 41) | def argument_name(argument):
  function index (line 59) | def index(pattern=None):
  function test (line 86) | def test(test_id):
  function benchmark_data (line 133) | def benchmark_data():
  function server_error (line 153) | def server_error(e):

FILE: cluster/benchmarks/dashboard_app/main_test.py
  class TestMain (line 21) | class TestMain(unittest.TestCase):
    method testArgumentInvalidFormat (line 23) | def testArgumentInvalidFormat(self):
    method testArgumentValidFormat (line 31) | def testArgumentValidFormat(self):
    method testIndexPage (line 35) | def testIndexPage(self):
    method testTestPage_InvalidTest (line 43) | def testTestPage_InvalidTest(self):
    method testTestPage_SampleTest (line 51) | def testTestPage_SampleTest(self):
    method testFetchBenchmarkData_InvalidTest (line 62) | def testFetchBenchmarkData_InvalidTest(self):
    method testFetchBenchmarkData_SampleTest (line 70) | def testFetchBenchmarkData_SampleTest(self):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py
  function define_flags (line 375) | def define_flags():
  class GlobalStepWatcher (line 394) | class GlobalStepWatcher(threading.Thread):
    method __init__ (line 401) | def __init__(self, sess, global_step_op,
    method run (line 414) | def run(self):
    method done (line 432) | def done(self):
    method num_steps (line 435) | def num_steps(self):
    method elapsed_time (line 438) | def elapsed_time(self):
  class CheckpointNotFoundException (line 442) | class CheckpointNotFoundException(Exception):
  function get_data_type (line 446) | def get_data_type(params):
  function loss_function (line 456) | def loss_function(logits, labels, aux_logits):
  function create_config_proto (line 471) | def create_config_proto(params):
  function get_mode_from_params (line 492) | def get_mode_from_params(params):
  function benchmark_one_step (line 511) | def benchmark_one_step(sess,
  function get_perf_timing_str (line 557) | def get_perf_timing_str(batch_size, step_train_times, scale=1):
  function load_checkpoint (line 571) | def load_checkpoint(saver, sess, ckpt_dir):
  function make_params (line 604) | def make_params(**kwargs):
  function make_params_from_flags (line 622) | def make_params_from_flags():
  class BenchmarkCNN (line 634) | class BenchmarkCNN(object):
    method __init__ (line 637) | def __init__(self, params):
    method reset_devices_for_task (line 838) | def reset_devices_for_task(self, task_num, is_local=False):
    method raw_devices_across_tasks (line 847) | def raw_devices_across_tasks(self, is_local=False):
    method print_info (line 858) | def print_info(self):
    method run (line 889) | def run(self):
    method _eval_cnn (line 918) | def _eval_cnn(self):
    method _eval_once (line 945) | def _eval_once(self, saver, summary_writer, target,
    method _benchmark_cnn (line 997) | def _benchmark_cnn(self):
    method _build_image_processing (line 1202) | def _build_image_processing(self, shift_ratio=0):
    method _build_model (line 1228) | def _build_model(self):
    method _build_fetches (line 1299) | def _build_fetches(self, global_step, all_logits, losses, device_grads,
    method _build_model_single_session (line 1407) | def _build_model_single_session(self):
    method add_forward_pass_and_gradients (line 1506) | def add_forward_pass_and_gradients(
    method get_image_preprocessor (line 1638) | def get_image_preprocessor(self):
    method add_sync_queues_and_barrier (line 1670) | def add_sync_queues_and_barrier(self, name_prefix,
  function store_benchmarks (line 1707) | def store_benchmarks(names_to_values, params):
  function setup (line 1712) | def setup(params):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/benchmark_storage.py
  function store_benchmark (line 18) | def store_benchmark(data, storage_type=None):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/cbuild_benchmark_storage.py
  function upload_to_benchmark_datastore (line 32) | def upload_to_benchmark_datastore(data, test_name=None, start_time=None):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/cnn_util.py
  function tensorflow_version_tuple (line 30) | def tensorflow_version_tuple():
  function tensorflow_version (line 36) | def tensorflow_version():
  function log_fn (line 41) | def log_fn(log):
  class Barrier (line 48) | class Barrier(object):
    method __init__ (line 58) | def __init__(self, parties):
    method wait (line 69) | def wait(self):
    method abort (line 87) | def abort(self):
  class ImageProducer (line 96) | class ImageProducer(object):
    method __init__ (line 120) | def __init__(self, sess, put_ops, batch_group_size):
    method _should_put (line 132) | def _should_put(self):
    method done (line 135) | def done(self):
    method start (line 141) | def start(self):
    method notify_image_consumption (line 148) | def notify_image_consumption(self):
    method _loop_producer (line 158) | def _loop_producer(self):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py
  class ConvNetBuilder (line 32) | class ConvNetBuilder(object):
    method __init__ (line 35) | def __init__(self,
    method get_custom_getter (line 58) | def get_custom_getter(self):
    method switch_to_aux_top_layer (line 98) | def switch_to_aux_top_layer(self):
    method get_variable (line 112) | def get_variable(self, name, shape, dtype, cast_dtype, *args, **kwargs):
    method _conv2d_impl (line 120) | def _conv2d_impl(self, input_layer, num_channels_in, filters, kernel_s...
    method conv (line 143) | def conv(self,
    method _pool (line 228) | def _pool(self,
    method mpool (line 263) | def mpool(self,
    method apool (line 275) | def apool(self,
    method reshape (line 288) | def reshape(self, shape, input_layer=None):
    method affine (line 295) | def affine(self,
    method inception_module (line 329) | def inception_module(self, name, cols, input_layer=None, in_size=None):
    method spatial_mean (line 367) | def spatial_mean(self, keep_dims=False):
    method dropout (line 375) | def dropout(self, keep_prob=0.5, input_layer=None):
    method _batch_norm_without_layers (line 391) | def _batch_norm_without_layers(self, input_layer, decay, use_scale, ep...
    method batch_norm (line 433) | def batch_norm(self, input_layer=None, decay=0.999, scale=False,
    method lrn (line 461) | def lrn(self, depth_radius, bias, alpha, beta):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/datasets.py
  function create_dataset (line 35) | def create_dataset(data_dir, data_name):
  class Dataset (line 62) | class Dataset(object):
    method __init__ (line 65) | def __init__(self, name, height=None, width=None, depth=None, data_dir...
    method tf_record_pattern (line 76) | def tf_record_pattern(self, subset):
    method reader (line 79) | def reader(self):
    method num_classes (line 83) | def num_classes(self):
    method num_classes (line 87) | def num_classes(self, val):
    method num_examples_per_epoch (line 91) | def num_examples_per_epoch(self, subset):
    method __str__ (line 94) | def __str__(self):
    method get_image_preprocessor (line 97) | def get_image_preprocessor(self):
    method queue_runner_required (line 100) | def queue_runner_required(self):
    method use_synthetic_gpu_images (line 103) | def use_synthetic_gpu_images(self):
  class ImagenetData (line 107) | class ImagenetData(Dataset):
    method __init__ (line 110) | def __init__(self, data_dir=None):
    method num_examples_per_epoch (line 115) | def num_examples_per_epoch(self, subset='train'):
    method get_image_preprocessor (line 123) | def get_image_preprocessor(self):
  class SyntheticData (line 127) | class SyntheticData(Dataset):
    method __init__ (line 130) | def __init__(self, unused_data_dir):
    method get_image_preprocessor (line 133) | def get_image_preprocessor(self):
    method use_synthetic_gpu_images (line 136) | def use_synthetic_gpu_images(self):
  class Cifar10Data (line 140) | class Cifar10Data(Dataset):
    method __init__ (line 146) | def __init__(self, data_dir=None):
    method read_data_files (line 153) | def read_data_files(self, subset='train'):
    method num_examples_per_epoch (line 175) | def num_examples_per_epoch(self, subset='train'):
    method get_image_preprocessor (line 183) | def get_image_preprocessor(self):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/alexnet_model.py
  class AlexnetModel (line 28) | class AlexnetModel(model.Model):
    method __init__ (line 31) | def __init__(self):
    method add_inference (line 34) | def add_inference(self, cnn):
  class AlexnetCifar10Model (line 51) | class AlexnetCifar10Model(model.Model):
    method __init__ (line 61) | def __init__(self):
    method add_inference (line 64) | def add_inference(self, cnn):
    method get_learning_rate (line 77) | def get_learning_rate(self, global_step, batch_size):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/densenet_model.py
  class DensenetCifar10Model (line 27) | class DensenetCifar10Model(model_lib.Model):
    method __init__ (line 30) | def __init__(self, model, layer_counts, growth_rate):
    method dense_block (line 36) | def dense_block(self, cnn, growth_rate):
    method transition_layer (line 46) | def transition_layer(self, cnn):
    method add_inference (line 53) | def add_inference(self, cnn):
    method get_learning_rate (line 77) | def get_learning_rate(self, global_step, batch_size):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/googlenet_model.py
  class GooglenetModel (line 28) | class GooglenetModel(model.Model):
    method __init__ (line 30) | def __init__(self):
    method add_inference (line 33) | def add_inference(self, cnn):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/inception_model.py
  class Inceptionv3Model (line 44) | class Inceptionv3Model(model.Model):
    method __init__ (line 46) | def __init__(self, auxiliary=False):
    method add_inference (line 50) | def add_inference(self, cnn):
  function inception_v4_sa (line 125) | def inception_v4_sa(cnn):
  function inception_v4_sb (line 130) | def inception_v4_sb(cnn):
  function inception_v4_sc (line 137) | def inception_v4_sc(cnn):
  function inception_v4_ra (line 144) | def inception_v4_ra(cnn, k, l, m, n):
  function inception_v4_rb (line 152) | def inception_v4_rb(cnn):
  class Inceptionv4Model (line 160) | class Inceptionv4Model(model.Model):
    method __init__ (line 162) | def __init__(self):
    method add_inference (line 165) | def add_inference(self, cnn):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/lenet_model.py
  class Lenet5Model (line 27) | class Lenet5Model(model.Model):
    method __init__ (line 29) | def __init__(self):
    method add_inference (line 32) | def add_inference(self, cnn):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/model.py
  class Model (line 18) | class Model(object):
    method __init__ (line 21) | def __init__(self,
    method get_model (line 38) | def get_model(self):
    method get_image_size (line 41) | def get_image_size(self):
    method get_batch_size (line 44) | def get_batch_size(self):
    method set_batch_size (line 47) | def set_batch_size(self, batch_size):
    method get_default_batch_size (line 50) | def get_default_batch_size(self):
    method get_layer_counts (line 53) | def get_layer_counts(self):
    method get_fp16_loss_scale (line 56) | def get_fp16_loss_scale(self):
    method get_learning_rate (line 59) | def get_learning_rate(self, global_step, batch_size):
    method add_inference (line 64) | def add_inference(self, unused_cnn):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/model_config.py
  function get_model_config (line 30) | def get_model_config(model, dataset):
  function get_cifar10_model_config (line 66) | def get_cifar10_model_config(model):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/overfeat_model.py
  class OverfeatModel (line 29) | class OverfeatModel(model.Model):
    method __init__ (line 31) | def __init__(self):
    method add_inference (line 34) | def add_inference(self, cnn):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/resnet_model.py
  function bottleneck_block_v1 (line 41) | def bottleneck_block_v1(cnn, depth, depth_bottleneck, stride):
  function bottleneck_block_v2 (line 81) | def bottleneck_block_v2(cnn, depth, depth_bottleneck, stride):
  function bottleneck_block (line 126) | def bottleneck_block(cnn, depth, depth_bottleneck, stride, pre_activation):
  function residual_block (line 142) | def residual_block(cnn, depth, stride, pre_activation):
  class ResnetModel (line 187) | class ResnetModel(model_lib.Model):
    method __init__ (line 190) | def __init__(self, model, layer_counts):
    method add_inference (line 204) | def add_inference(self, cnn):
    method get_learning_rate (line 227) | def get_learning_rate(self, global_step, batch_size):
  class ResnetCifar10Model (line 235) | class ResnetCifar10Model(model_lib.Model):
    method __init__ (line 245) | def __init__(self, model, layer_counts):
    method add_inference (line 250) | def add_inference(self, cnn):
    method get_learning_rate (line 277) | def get_learning_rate(self, global_step, batch_size):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/trivial_model.py
  class TrivialModel (line 20) | class TrivialModel(model.Model):
    method __init__ (line 23) | def __init__(self):
    method add_inference (line 26) | def add_inference(self, cnn):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/models/vgg_model.py
  function _construct_vgg (line 31) | def _construct_vgg(cnn, num_conv_layers):
  class Vgg11Model (line 56) | class Vgg11Model(model.Model):
    method __init__ (line 58) | def __init__(self):
    method add_inference (line 61) | def add_inference(self, cnn):
  class Vgg16Model (line 65) | class Vgg16Model(model.Model):
    method __init__ (line 67) | def __init__(self):
    method add_inference (line 70) | def add_inference(self, cnn):
  class Vgg19Model (line 74) | class Vgg19Model(model.Model):
    method __init__ (line 76) | def __init__(self):
    method add_inference (line 79) | def add_inference(self, cnn):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py
  function parse_example_proto (line 31) | def parse_example_proto(example_serialized):
  function get_image_resize_method (line 102) | def get_image_resize_method(resize_method, batch_position=0):
  function decode_jpeg (line 149) | def decode_jpeg(image_buffer, scope=None):  # , dtype=tf.float32):
  function eval_image (line 174) | def eval_image(image,
  function train_image (line 257) | def train_image(image_buffer,
  function distort_color (line 381) | def distort_color(image, batch_position=0, distort_color_in_yiq=False,
  class RecordInputImagePreprocessor (line 435) | class RecordInputImagePreprocessor(object):
    method __init__ (line 438) | def __init__(self,
    method preprocess (line 470) | def preprocess(self, image_buffer, bbox, batch_position):
    method parse_and_preprocess (line 490) | def parse_and_preprocess(self, value, batch_position):
    method minibatch (line 495) | def minibatch(self, dataset, subset, use_datasets, cache_data,
  class Cifar10ImagePreprocessor (line 564) | class Cifar10ImagePreprocessor(object):
    method __init__ (line 567) | def __init__(self,
    method _distort_image (line 603) | def _distort_image(self, image):
    method _eval_image (line 626) | def _eval_image(self, image):
    method preprocess (line 634) | def preprocess(self, raw_image):
    method minibatch (line 644) | def minibatch(self, dataset, subset, use_datasets, cache_data,
  class SyntheticImagePreprocessor (line 691) | class SyntheticImagePreprocessor(object):
    method __init__ (line 694) | def __init__(self, height, width, batch_size, num_splits,
    method minibatch (line 709) | def minibatch(self, dataset, subset, use_datasets, cache_data,
  class TestImagePreprocessor (line 739) | class TestImagePreprocessor(object):
    method __init__ (line 749) | def __init__(self,
    method set_fake_data (line 770) | def set_fake_data(self, fake_images, fake_labels):
    method minibatch (line 778) | def minibatch(self, dataset, subset, use_datasets, cache_data,

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py
  function main (line 33) | def main(extra_flags):

FILE: cluster/benchmarks/scripts/tf_cnn_benchmarks/variable_mgr.py
  class OverrideCachingDevice (line 37) | class OverrideCachingDevice(object):
    method __init__ (line 45) | def __init__(self, devices, device_for_small_variables,
    method __call__ (line 52) | def __call__(self, getter, *args, **kwargs):
  class OverrideToLocalVariableIfNotPsVar (line 69) | class OverrideToLocalVariableIfNotPsVar(object):
    method __call__ (line 74) | def __call__(self, getter, name, *args, **kwargs):
  class ParamServerDeviceSetter (line 90) | class ParamServerDeviceSetter(object):
    method __init__ (line 93) | def __init__(self, worker_device, ps_devices):
    method __call__ (line 105) | def __call__(self, op):
  class VariableMgr (line 120) | class VariableMgr(object):
    method __init__ (line 127) | def __init__(self, benchmark_cnn):
    method each_tower_has_variables (line 131) | def each_tower_has_variables(self):
    method supports_staged_vars (line 135) | def supports_staged_vars(self):
    method create_outer_variable_scope (line 139) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 144) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 161) | def get_gradients_to_apply(self, device_num, gradient_state):
    method append_apply_gradients_ops (line 172) | def append_apply_gradients_ops(
    method get_post_init_ops (line 186) | def get_post_init_ops(self):
    method get_devices (line 190) | def get_devices(self):
    method savable_variables (line 194) | def savable_variables(self):
    method trainable_variables_on_device (line 198) | def trainable_variables_on_device(self, rel_device_num, abs_device_num,
  class VariableMgrIndependent (line 221) | class VariableMgrIndependent(VariableMgr):
    method each_tower_has_variables (line 229) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 232) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 235) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 238) | def get_gradients_to_apply(self, device_num, gradient_state):
    method get_devices (line 242) | def get_devices(self):
  class VariableMgrLocalFetchFromPS (line 246) | class VariableMgrLocalFetchFromPS(VariableMgr):
    method each_tower_has_variables (line 254) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 257) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 260) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 263) | def get_gradients_to_apply(self, device_num, gradient_state):
    method get_devices (line 269) | def get_devices(self):
  class StagedModelVariable (line 279) | class StagedModelVariable(object):
    method __init__ (line 287) | def __init__(self, real_var, var_stage_get, variable_mgr):
    method _value (line 299) | def _value(self):
    method _ref (line 303) | def _ref(self):
    method read_value (line 307) | def read_value(self):
    method dtype (line 312) | def dtype(self):
    method assign_sub (line 316) | def assign_sub(self, delta, name=None):
    method _TensorConversionFunction (line 343) | def _TensorConversionFunction(self, dtype=None, name=None, as_ref=False):
  class StagedVariableGetter (line 356) | class StagedVariableGetter(object):
    method __init__ (line 363) | def __init__(self, device_num, devices, cpu_device, variable_mgr):
    method __call__ (line 378) | def __call__(self, getter, name, *args, **kwargs):
    method trainable_variables_on_device (line 414) | def trainable_variables_on_device(self, rel_device_num, abs_device_num,
  class VariableMgrLocalFetchFromStagedPS (line 439) | class VariableMgrLocalFetchFromStagedPS(VariableMgrLocalFetchFromPS):
    method __init__ (line 443) | def __init__(self, benchmark_cnn):
    method supports_staged_vars (line 452) | def supports_staged_vars(self):
    method create_outer_variable_scope (line 455) | def create_outer_variable_scope(self, device_num):
    method trainable_variables_on_device (line 461) | def trainable_variables_on_device(self, rel_device_num, abs_device_num,
  function parse_general_int (line 470) | def parse_general_int(s):
  function parse_all_reduce_spec (line 493) | def parse_all_reduce_spec(all_reduce_spec):
  function build_all_reduce_device_prefixes (line 577) | def build_all_reduce_device_prefixes(job_name, num_tasks):
  function group_device_names (line 596) | def group_device_names(devices, group_size):
  class VariableMgrLocalReplicated (line 624) | class VariableMgrLocalReplicated(VariableMgr):
    method __init__ (line 633) | def __init__(self, benchmark_cnn, all_reduce_spec):
    method each_tower_has_variables (line 644) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 647) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 650) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 665) | def get_gradients_to_apply(self, device_num, gradient_state):
    method get_post_init_ops (line 669) | def get_post_init_ops(self):
    method savable_variables (line 684) | def savable_variables(self):
    method get_devices (line 693) | def get_devices(self):
  class VariableMgrDistributedAllReduce (line 697) | class VariableMgrDistributedAllReduce(VariableMgr):
    method __init__ (line 705) | def __init__(self, benchmark_cnn, all_reduce_spec, job_name,
    method each_tower_has_variables (line 718) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 721) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 734) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 762) | def get_gradients_to_apply(self, device_num, gradient_state):
    method get_post_init_ops (line 769) | def get_post_init_ops(self):
    method savable_variables (line 784) | def savable_variables(self):
    method get_devices (line 793) | def get_devices(self):
  class VariableMgrDistributedFetchFromPS (line 797) | class VariableMgrDistributedFetchFromPS(VariableMgr):
    method each_tower_has_variables (line 805) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 808) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 818) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 822) | def get_gradients_to_apply(self, device_num, gradient_state):
    method get_devices (line 826) | def get_devices(self):
  class VariableMgrDistributedFetchFromStagedPS (line 835) | class VariableMgrDistributedFetchFromStagedPS(
    method __init__ (line 839) | def __init__(self, benchmark_cnn):
    method create_outer_variable_scope (line 845) | def create_outer_variable_scope(self, device_num):
    method supports_staged_vars (line 852) | def supports_staged_vars(self):
    method trainable_variables_on_device (line 855) | def trainable_variables_on_device(self, rel_device_num, abs_device_num,
  class VariableMgrDistributedReplicated (line 861) | class VariableMgrDistributedReplicated(VariableMgr):
    method each_tower_has_variables (line 870) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 873) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 878) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 881) | def get_gradients_to_apply(self, device_num, gradient_state):
    method append_apply_gradients_ops (line 898) | def append_apply_gradients_ops(self, gradient_state, opt,
    method _strip_port (line 916) | def _strip_port(self, s):
    method get_post_init_ops (line 921) | def get_post_init_ops(self):
    method _remove_shadow_var_prefix_if_present (line 940) | def _remove_shadow_var_prefix_if_present(self, var_name):
    method var_dict_name (line 946) | def var_dict_name(self, v):
    method savable_variables (line 949) | def savable_variables(self):
    method get_devices (line 970) | def get_devices(self):
  function split_grads_by_size (line 974) | def split_grads_by_size(threshold_size, device_grads):
  function sum_grad_and_var_all_reduce (line 1006) | def sum_grad_and_var_all_reduce(grad_and_vars, num_workers, alg, gpu_ind...
  function contains_any (line 1044) | def contains_any(haystack, needles):
  function sum_gradients_all_reduce (line 1061) | def sum_gradients_all_reduce(dev_prefixes, tower_grads, num_workers,
  function aggregate_gradients_using_copy_with_device_selection (line 1098) | def aggregate_gradients_using_copy_with_device_selection(
  function aggregate_gradients_using_copy_with_variable_colocation (line 1125) | def aggregate_gradients_using_copy_with_variable_colocation(
  function aggregate_gradients_using_copy (line 1154) | def aggregate_gradients_using_copy(tower_grads, use_mean):
  function aggregate_single_gradient_using_copy (line 1172) | def aggregate_single_gradient_using_copy(grad_and_vars, use_mean):

FILE: cluster/benchmarks/scripts/util/benchmark_util.py
  function store_data_in_json (line 44) | def store_data_in_json(

FILE: cluster/benchmarks/scripts/util/benchmark_util_test.py
  class BenchmarkUtilTest (line 30) | class BenchmarkUtilTest(unittest.TestCase):
    method testStoreDataWithNoEntries (line 32) | def testStoreDataWithNoEntries(self):
    method testStoreDataWithEntries (line 42) | def testStoreDataWithEntries(self):

FILE: cluster/benchmarks/scripts/util/convert_csv_to_json.py
  function get_data_from_csv (line 31) | def get_data_from_csv(csv_reader):
  function main (line 65) | def main():

FILE: cluster/benchmarks/scripts/util/convert_csv_to_json_test.py
  class ConvertCsvToJsonTest (line 23) | class ConvertCsvToJsonTest(unittest.TestCase):
    method testSingleEntryCSV (line 25) | def testSingleEntryCSV(self):
    method testTwoEntryCSV (line 37) | def testTwoEntryCSV(self):
    method testInvalidCSV_LessEntries (line 52) | def testInvalidCSV_LessEntries(self):
    method testInvalidCSV_MoreEntries (line 59) | def testInvalidCSV_MoreEntries(self):
    method testInvalidCSV_EmptyEntry (line 66) | def testInvalidCSV_EmptyEntry(self):
    method testInvalidCSV_InvalidDate (line 73) | def testInvalidCSV_InvalidDate(self):
    method testInvalidCSV_InvalidValue (line 79) | def testInvalidCSV_InvalidValue(self):

FILE: cluster/benchmarks/tools/k8s_tensorflow_lib.py
  function GenerateConfig (line 134) | def GenerateConfig(num_workers,
  function WorkerClusterSpecString (line 251) | def WorkerClusterSpecString(num_workers,
  function ParamServerClusterSpecString (line 259) | def ParamServerClusterSpecString(num_workers,
  function ClusterSpecString (line 268) | def ClusterSpecString(num_workers,
  function GetCommonArgs (line 288) | def GetCommonArgs(num_workers,
  function WorkerHosts (line 319) | def WorkerHosts(num_workers, port, name_prefix):
  function PsHosts (line 325) | def PsHosts(num_ps, port, name_prefix):

FILE: cluster/benchmarks/tools/k8s_tensorflow_test.py
  class K8sTensorflowTest (line 26) | class K8sTensorflowTest(unittest.TestCase):
    method testGenerateConfig_LoadBalancer (line 28) | def testGenerateConfig_LoadBalancer(self):
    method testGenerateConfig_SharedVolume (line 51) | def testGenerateConfig_SharedVolume(self):
    method testEnvVar (line 74) | def testEnvVar(self):
    method testClusterSpec (line 88) | def testClusterSpec(self):
    method testWorkerHosts (line 118) | def testWorkerHosts(self):
    method testPsHosts (line 126) | def testPsHosts(self):

FILE: cluster/benchmarks/tools/kubectl_util.py
  class TimeoutError (line 33) | class TimeoutError(Exception):
  function _WaitUntil (line 37) | def _WaitUntil(timeout, predicate, *args):
  function _GetPodNames (line 46) | def _GetPodNames(pod_name_prefix, job_name=None):
  function CreatePods (line 67) | def CreatePods(pod_name, yaml_file):
  function DeletePods (line 86) | def DeletePods(pod_name, yaml_file):
  function _GetJobSelector (line 107) | def _GetJobSelector(pod_name_prefix, job_name=None):
  function WaitForCompletion (line 114) | def WaitForCompletion(pod_name_prefix, job_name='worker', timeout=2*60*60):
  function _PrintLogs (line 173) | def _PrintLogs(pod_name_prefix, job_name):

FILE: cluster/benchmarks/tools/kubectl_util_test.py
  class KubectlUtilTest (line 31) | class KubectlUtilTest(unittest.TestCase):
    method testCreatePods (line 35) | def testCreatePods(self, mock_check_call, mock_check_output):
    method testDeletePods (line 46) | def testDeletePods(self, mock_check_call, mock_check_output):
    method testWaitForCompletion (line 56) | def testWaitForCompletion(self, mock_check_output):

FILE: cluster/benchmarks/tools/run_distributed_benchmarks.py
  function _ConvertToValidName (line 40) | def _ConvertToValidName(name):
  function _RunBenchmark (line 52) | def _RunBenchmark(name, yaml_file):
  function _BuildAndPushDockerImage (line 66) | def _BuildAndPushDockerImage(
  function _GetMostRecentDockerImageFromGcloud (line 102) | def _GetMostRecentDockerImageFromGcloud(docker_image):
  function get_gpu_volume_mounts (line 121) | def get_gpu_volume_mounts():
  class NoImageFoundError (line 142) | class NoImageFoundError(Exception):
  function main (line 146) | def main():

FILE: cluster/client_transfer_benchmark.py
  function clusterspec (line 79) | def clusterspec():
  function log (line 84) | def log(s):
  function session_config (line 88) | def session_config():
  function launch_distributed_service (line 97) | def launch_distributed_service():
  function run_benchmark (line 117) | def run_benchmark(master, direction=None):
  function create_done_queue (line 199) | def create_done_queue(i):

FILE: cluster/connect.py
  function toseconds (line 35) | def toseconds(dt):
  function main (line 42) | def main():

FILE: cluster/fill_efs.py
  function main (line 17) | def main():

FILE: cluster/imagenet64/aws.py
  class timeit (line 37) | class timeit:
    method __init__ (line 41) | def __init__(self, tag=""):
    method __enter__ (line 44) | def __enter__(self):
    method __exit__ (line 48) | def __exit__(self, *args):
  function _ExecuteCommandInThread (line 55) | def _ExecuteCommandInThread(ssh_client,
  function _StreamOutputToFile (line 85) | def _StreamOutputToFile(fd, file, line_extractor, cmd=None):
  function _ExecuteCommandAndStreamOutput (line 110) | def _ExecuteCommandAndStreamOutput(ssh_client,
  function lookup_aws_instances (line 149) | def lookup_aws_instances(name):
  function tf_job (line 174) | def tf_job(name, num_tasks, instance_type=None, placement_group=''):
  function terminate_job (line 223) | def terminate_job(name):
  function _ssh_to_host (line 238) | def _ssh_to_host(hostname,
  class Job (line 276) | class Job:
    method __init__ (line 277) | def __init__(self, name, instances):
    method wait_until_ready (line 284) | def wait_until_ready(self):
  function _encode_float (line 290) | def _encode_float(value):
  function _decode_float (line 294) | def _decode_float(b16):
  class Task (line 297) | class Task:
    method __init__ (line 298) | def __init__(self, instance, job, task_id):
    method wait_until_ready (line 309) | def wait_until_ready(self):
    method initialize (line 319) | def initialize(self):
    method run_sync (line 332) | def run_sync(self, cmd):
    method _setup_tasklogdir (line 342) | def _setup_tasklogdir(self):
    method run (line 346) | def run(self, cmd, mirror_output=False):
    method upload (line 373) | def upload(self, local_file, remote_file=None):
    method _upload_directory (line 384) | def _upload_directory(self, local_directory, remote_directory):
    method public_ip (line 388) | def public_ip(self):
    method port (line 393) | def port(self):
    method ip (line 397) | def ip(self):  # private ip

FILE: cluster/imagenet64/launch.py
  function ossystem (line 114) | def ossystem(cmd):
  class AWSInstance (line 119) | class AWSInstance(object):
    method __init__ (line 121) | def __init__(self, instance, ssh_key='', name='', username='ubuntu',
    method __del__ (line 135) | def __del__(self):
    method WaitUntilReady (line 138) | def WaitUntilReady(self):
    method CreateSshClient (line 156) | def CreateSshClient(self):
    method reuse_ssh_client (line 162) | def reuse_ssh_client(self):
    method CleanSshClient (line 168) | def CleanSshClient(self):
    method state (line 180) | def state(self):
    method SetNameTag (line 183) | def SetNameTag(self, name='tf'):
    method SetCustomTag (line 191) | def SetCustomTag(self, key, value):
    method Start (line 199) | def Start(self):
    method Stop (line 202) | def Stop(self):
    method StopAndWaitUntilStopped (line 206) | def StopAndWaitUntilStopped(self):
    method Terminate (line 210) | def Terminate(self):
    method TerminateAndWaitUntilTerminated (line 214) | def TerminateAndWaitUntilTerminated(self):
    method instance_id (line 219) | def instance_id(self):
    method ExecuteCommandAndWait (line 222) | def ExecuteCommandAndWait(self, cmd, print_error=False):
    method ExecuteCommandAndReturnStdout (line 226) | def ExecuteCommandAndReturnStdout(self, cmd):
    method ExecuteCommandAndStreamOutput (line 230) | def ExecuteCommandAndStreamOutput(self,
    method ExecuteCommandInThread (line 246) | def ExecuteCommandInThread(self,
    method RetrieveFile (line 261) | def RetrieveFile(self, remote_file, local_file):
    method UploadFile (line 266) | def UploadFile(self, local_file, remote_file):
  function CreateAwsInstances (line 271) | def CreateAwsInstances(num_instances=1,
  function setup_local_logdir (line 308) | def setup_local_logdir(run):
  function launch_job_aws (line 314) | def launch_job_aws(name, replicas):
  function tf_config_cmd (line 328) | def tf_config_cmd(full_cluster_spec, task_spec):
  function launch_aws (line 362) | def launch_aws():
  class Instance (line 404) | class Instance:
    method tf_env_setup (line 407) | def tf_env_setup(self, cluster_spec, task_spec):
  function launch_local (line 417) | def launch_local():
  function cnn_launcher (line 452) | def cnn_launcher():
  function main (line 541) | def main():

FILE: cluster/imagenet64/variable_mgr.py
  class OverrideCachingDevice (line 37) | class OverrideCachingDevice(object):
    method __init__ (line 45) | def __init__(self, devices, device_for_small_variables,
    method __call__ (line 52) | def __call__(self, getter, *args, **kwargs):
  class OverrideToLocalVariableIfNotPsVar (line 69) | class OverrideToLocalVariableIfNotPsVar(object):
    method __call__ (line 74) | def __call__(self, getter, name, *args, **kwargs):
  class ParamServerDeviceSetter (line 90) | class ParamServerDeviceSetter(object):
    method __init__ (line 93) | def __init__(self, worker_device, ps_devices):
    method __call__ (line 105) | def __call__(self, op):
  class VariableMgr (line 120) | class VariableMgr(object):
    method __init__ (line 127) | def __init__(self, benchmark_cnn):
    method each_tower_has_variables (line 131) | def each_tower_has_variables(self):
    method supports_staged_vars (line 135) | def supports_staged_vars(self):
    method create_outer_variable_scope (line 139) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 144) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 161) | def get_gradients_to_apply(self, device_num, gradient_state):
    method append_apply_gradients_ops (line 172) | def append_apply_gradients_ops(
    method get_post_init_ops (line 186) | def get_post_init_ops(self):
    method get_devices (line 190) | def get_devices(self):
    method savable_variables (line 194) | def savable_variables(self):
    method trainable_variables_on_device (line 198) | def trainable_variables_on_device(self, rel_device_num, abs_device_num,
  class VariableMgrIndependent (line 221) | class VariableMgrIndependent(VariableMgr):
    method each_tower_has_variables (line 229) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 232) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 235) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 238) | def get_gradients_to_apply(self, device_num, gradient_state):
    method get_devices (line 242) | def get_devices(self):
  class VariableMgrLocalFetchFromPS (line 246) | class VariableMgrLocalFetchFromPS(VariableMgr):
    method each_tower_has_variables (line 254) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 257) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 260) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 263) | def get_gradients_to_apply(self, device_num, gradient_state):
    method get_devices (line 269) | def get_devices(self):
  class StagedModelVariable (line 279) | class StagedModelVariable(object):
    method __init__ (line 287) | def __init__(self, real_var, var_stage_get, variable_mgr):
    method _value (line 299) | def _value(self):
    method _ref (line 303) | def _ref(self):
    method read_value (line 307) | def read_value(self):
    method dtype (line 312) | def dtype(self):
    method assign_sub (line 316) | def assign_sub(self, delta, name=None):
    method _TensorConversionFunction (line 343) | def _TensorConversionFunction(self, dtype=None, name=None, as_ref=False):
  class StagedVariableGetter (line 356) | class StagedVariableGetter(object):
    method __init__ (line 363) | def __init__(self, device_num, devices, cpu_device, variable_mgr):
    method __call__ (line 378) | def __call__(self, getter, name, *args, **kwargs):
    method trainable_variables_on_device (line 414) | def trainable_variables_on_device(self, rel_device_num, abs_device_num,
  class VariableMgrLocalFetchFromStagedPS (line 439) | class VariableMgrLocalFetchFromStagedPS(VariableMgrLocalFetchFromPS):
    method __init__ (line 443) | def __init__(self, benchmark_cnn):
    method supports_staged_vars (line 452) | def supports_staged_vars(self):
    method create_outer_variable_scope (line 455) | def create_outer_variable_scope(self, device_num):
    method trainable_variables_on_device (line 461) | def trainable_variables_on_device(self, rel_device_num, abs_device_num,
  function parse_general_int (line 470) | def parse_general_int(s):
  function parse_all_reduce_spec (line 493) | def parse_all_reduce_spec(all_reduce_spec):
  function build_all_reduce_device_prefixes (line 577) | def build_all_reduce_device_prefixes(job_name, num_tasks):
  function group_device_names (line 596) | def group_device_names(devices, group_size):
  class VariableMgrLocalReplicated (line 624) | class VariableMgrLocalReplicated(VariableMgr):
    method __init__ (line 633) | def __init__(self, benchmark_cnn, all_reduce_spec):
    method each_tower_has_variables (line 644) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 647) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 650) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 665) | def get_gradients_to_apply(self, device_num, gradient_state):
    method get_post_init_ops (line 669) | def get_post_init_ops(self):
    method savable_variables (line 684) | def savable_variables(self):
    method get_devices (line 693) | def get_devices(self):
  class VariableMgrDistributedAllReduce (line 697) | class VariableMgrDistributedAllReduce(VariableMgr):
    method __init__ (line 705) | def __init__(self, benchmark_cnn, all_reduce_spec, job_name,
    method each_tower_has_variables (line 718) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 721) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 734) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 762) | def get_gradients_to_apply(self, device_num, gradient_state):
    method get_post_init_ops (line 769) | def get_post_init_ops(self):
    method savable_variables (line 784) | def savable_variables(self):
    method get_devices (line 793) | def get_devices(self):
  class VariableMgrDistributedFetchFromPS (line 797) | class VariableMgrDistributedFetchFromPS(VariableMgr):
    method each_tower_has_variables (line 805) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 808) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 818) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 822) | def get_gradients_to_apply(self, device_num, gradient_state):
    method get_devices (line 826) | def get_devices(self):
  class VariableMgrDistributedFetchFromStagedPS (line 835) | class VariableMgrDistributedFetchFromStagedPS(
    method __init__ (line 839) | def __init__(self, benchmark_cnn):
    method create_outer_variable_scope (line 845) | def create_outer_variable_scope(self, device_num):
    method supports_staged_vars (line 852) | def supports_staged_vars(self):
    method trainable_variables_on_device (line 855) | def trainable_variables_on_device(self, rel_device_num, abs_device_num,
  class VariableMgrDistributedReplicated (line 861) | class VariableMgrDistributedReplicated(VariableMgr):
    method each_tower_has_variables (line 870) | def each_tower_has_variables(self):
    method create_outer_variable_scope (line 873) | def create_outer_variable_scope(self, device_num):
    method preprocess_device_grads (line 878) | def preprocess_device_grads(self, device_grads):
    method get_gradients_to_apply (line 881) | def get_gradients_to_apply(self, device_num, gradient_state):
    method append_apply_gradients_ops (line 898) | def append_apply_gradients_ops(self, gradient_state, opt,
    method _strip_port (line 916) | def _strip_port(self, s):
    method get_post_init_ops (line 921) | def get_post_init_ops(self):
    method _remove_shadow_var_prefix_if_present (line 940) | def _remove_shadow_var_prefix_if_present(self, var_name):
    method var_dict_name (line 946) | def var_dict_name(self, v):
    method savable_variables (line 949) | def savable_variables(self):
    method get_devices (line 970) | def get_devices(self):
  function split_grads_by_size (line 974) | def split_grads_by_size(threshold_size, device_grads):
  function sum_grad_and_var_all_reduce (line 1006) | def sum_grad_and_var_all_reduce(grad_and_vars, num_workers, alg, gpu_ind...
  function contains_any (line 1044) | def contains_any(haystack, needles):
  function sum_gradients_all_reduce (line 1061) | def sum_gradients_all_reduce(dev_prefixes, tower_grads, num_workers,
  function aggregate_gradients_using_copy_with_device_selection (line 1098) | def aggregate_gradients_using_copy_with_device_selection(
  function aggregate_gradients_using_copy_with_variable_colocation (line 1125) | def aggregate_gradients_using_copy_with_variable_colocation(
  function aggregate_gradients_using_copy (line 1154) | def aggregate_gradients_using_copy(tower_grads, use_mean):
  function aggregate_single_gradient_using_copy (line 1172) | def aggregate_single_gradient_using_copy(grad_and_vars, use_mean):
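The aggregation helpers listed above combine per-variable gradients gathered from several towers. A minimal pure-Python sketch of that pattern (plain lists instead of tensors; the real code operates on TF ops and handles device placement):

```python
# Hypothetical sketch: average per-variable gradients across towers.
# tower_grads is a list (one entry per tower) of (gradient, variable_name)
# pairs, mirroring the shape aggregate_gradients_using_copy() consumes.

def aggregate_gradients_using_copy(tower_grads, use_mean=True):
    """Return one (gradient, variable_name) list summed/averaged over towers."""
    agg = []
    # zip(*tower_grads) groups the i-th (grad, var) pair of every tower.
    for grad_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grad_and_vars]
        total = [sum(vals) for vals in zip(*grads)]  # elementwise sum
        if use_mean:
            total = [v / len(grads) for v in total]
        agg.append((total, grad_and_vars[0][1]))
    return agg

towers = [
    [([1.0, 2.0], 'w'), ([0.5], 'b')],
    [([3.0, 4.0], 'w'), ([1.5], 'b')],
]
result = aggregate_gradients_using_copy(towers, use_mean=True)
```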

FILE: cluster/launch_async_adder.py
  function ossystem (line 192) | def ossystem(cmd):
  function setup_local_logdir (line 199) | def setup_local_logdir(run):
  function setup_remote_logdir (line 205) | def setup_remote_logdir(run):
  function run_in_window (line 213) | def run_in_window(window, cmd_list):
  function launch_job_tmux (line 245) | def launch_job_tmux(role, num_tasks):
  class LocalJob (line 266) | class LocalJob:
    method __init__ (line 267) | def __init__(self, name, num_tasks):
  class LocalTask (line 275) | class LocalTask: # same as Instance
    method __init__ (line 279) | def __init__(self, job, task_id):
    method run (line 285) | def run(self, cmd):
    method tf_env_setup (line 290) | def tf_env_setup(self, full_cluster_spec, task_spec):
  function select_window (line 336) | def select_window(window):
  function launch_job_aws (line 340) | def launch_job_aws(name, replicas):
  function tf_config_cmd (line 355) | def tf_config_cmd(full_cluster_spec, task_spec):
  class BasicAwsJob (line 389) | class BasicAwsJob():
    method __init__ (line 390) | def __init__(name, num_tasks):
  function launch_aws (line 395) | def launch_aws():
  class Instance (line 437) | class Instance:
    method tf_env_setup (line 440) | def tf_env_setup(self, cluster_spec, task_spec):
  function launch_local (line 450) | def launch_local():
  function cnn_launcher (line 522) | def cnn_launcher():
  function main (line 677) | def main():
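A launcher like this has to hand each task its cluster and task identity, which `tf_config_cmd` assembles. A hedged sketch of the TF_CONFIG-style JSON convention distributed TensorFlow uses (the exact format emitted by launch_async_adder.py is not shown here, so treat the shape as an assumption):

```python
import json

# Hypothetical sketch of what a helper like tf_config_cmd() might build:
# a TF_CONFIG-style JSON string from a cluster spec and a task spec.

def make_tf_config(cluster_spec, job_name, task_index):
    """Serialize cluster/task info in the TF_CONFIG convention."""
    return json.dumps({
        'cluster': cluster_spec,
        'task': {'type': job_name, 'index': task_index},
    }, sort_keys=True)

cluster = {'worker': ['localhost:2222', 'localhost:2223'],
           'ps': ['localhost:2224']}
cfg = make_tf_config(cluster, 'worker', 0)
```

Each task would export this string as the `TF_CONFIG` environment variable before starting its server.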

FILE: cluster/launch_micro.py
  function main (line 56) | def main():

FILE: cluster/launch_ray.py
  function main (line 53) | def main():

FILE: cluster/launch_simple_tf.py
  function main (line 14) | def main():
  function launcher (line 23) | def launcher(do_local=False):
  function worker (line 43) | def worker():

FILE: cluster/local_distributed_benchmark.py
  function default_config (line 26) | def default_config():
  function create_graph (line 34) | def create_graph(device1, device2):
  function run_benchmark (line 55) | def run_benchmark(sess, init_op, add_op):
  function run_benchmark_local (line 68) | def run_benchmark_local():
  function run_benchmark_distributed (line 74) | def run_benchmark_distributed():

FILE: cluster/myutil.py
  class timeit (line 8) | class timeit:
    method __init__ (line 12) | def __init__(self, tag=""):
    method __enter__ (line 15) | def __enter__(self):
    method __exit__ (line 19) | def __exit__(self, *args):
  function get_instance_ip_map (line 24) | def get_instance_ip_map():
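A `timeit` context manager like the one here recurs across the repo (myutil.py, launch_experiment.py, test_cluster_aws.py, eager_lbfgs/util.py). A minimal sketch of the pattern, assuming the standard `__enter__`/`__exit__` protocol; the real versions may record timings elsewhere:

```python
import time

# Minimal timing context manager, modeled on the repeated timeit class.
class timeit:
    def __init__(self, tag=""):
        self.tag = tag
    def __enter__(self):
        self.start = time.perf_counter()
        return self
    def __exit__(self, *args):
        self.seconds = time.perf_counter() - self.start
        print(f"{self.tag}: {self.seconds * 1000:.1f} ms")
        return False  # do not suppress exceptions

with timeit("sleep") as t:
    time.sleep(0.01)
```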

FILE: cluster/ray_add.py
  class ParameterServer (line 48) | class ParameterServer(object):
    method __init__ (line 49) | def __init__(self, data_size, read):
    method push (line 55) | def push(self, value):
    method pull (line 60) | def pull(self):
    method update_times (line 65) | def update_times(self):
    method get_throughput (line 70) | def get_throughput(self):
  function worker_task (line 76) | def worker_task(data_size, read, *parameter_servers):
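The `ParameterServer` actor here follows the classic push/pull pattern: workers push updates, anyone can pull the current parameters. A sketch of that logic as a plain class (no Ray; the real actor additionally tracks update times and throughput for benchmarking):

```python
# Push/pull parameter-server pattern, sketched without Ray.
class ParameterServer:
    def __init__(self, data_size):
        self.params = [0.0] * data_size
        self.updates = 0

    def push(self, value):
        # accumulate a worker's contribution into the shared parameters
        self.params = [p + v for p, v in zip(self.params, value)]
        self.updates += 1

    def pull(self):
        # return a snapshot so callers can't mutate server state
        return list(self.params)

ps = ParameterServer(3)
ps.push([1.0, 2.0, 3.0])
ps.push([1.0, 0.0, -1.0])
snapshot = ps.pull()
```

Under Ray the same class would be decorated with `@ray.remote` and the methods invoked via `.remote()` handles.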

FILE: cluster/simple_distributed.py
  function clusterspec (line 40) | def clusterspec():
  function log (line 45) | def log(s):
  function session_config (line 49) | def session_config():
  function launch_distributed_service (line 58) | def launch_distributed_service():
  function run_benchmark (line 78) | def run_benchmark(master, direction=None):
  function create_done_queue (line 160) | def create_done_queue(i):

FILE: cluster/terminate_instances.py
  function main (line 19) | def main():

FILE: cluster/test_aws.py
  function test_new_job (line 21) | def test_new_job():
  function test_terminate_job (line 30) | def test_terminate_job():
  function test_reuse_job (line 34) | def test_reuse_job():
  function test_send_file (line 38) | def test_send_file():
  function test_upload_directory (line 50) | def test_upload_directory():
  function test_stream_output (line 53) | def test_stream_output():
  function main (line 63) | def main():

FILE: cluster/tf-tools/benchmark/runner/cluster_aws.py
  class AWSInstance (line 8) | class AWSInstance(object):
    method __init__ (line 10) | def __init__(self, instance, ssh_key='', name='', username='ubuntu'):
    method __del__ (line 19) | def __del__(self):
    method WaitUntilReady (line 22) | def WaitUntilReady(self):
    method CreateSshClient (line 40) | def CreateSshClient(self):
    method reuse_ssh_client (line 46) | def reuse_ssh_client(self):
    method CleanSshClient (line 52) | def CleanSshClient(self):
    method state (line 64) | def state(self):
    method SetNameTag (line 67) | def SetNameTag(self, name='tf'):
    method Start (line 75) | def Start(self):
    method Stop (line 78) | def Stop(self):
    method StopAndWaitUntilStopped (line 82) | def StopAndWaitUntilStopped(self):
    method Terminate (line 86) | def Terminate(self):
    method TerminateAndWaitUntilTerminated (line 90) | def TerminateAndWaitUntilTerminated(self):
    method instance_id (line 95) | def instance_id(self):
    method ExecuteCommandAndWait (line 98) | def ExecuteCommandAndWait(self, cmd, print_error=False):
    method ExecuteCommandAndReturnStdout (line 102) | def ExecuteCommandAndReturnStdout(self, cmd):
    method ExecuteCommandAndStreamOutput (line 105) | def ExecuteCommandAndStreamOutput(self,
    method ExecuteCommandInThread (line 121) | def ExecuteCommandInThread(self,
    method RetrieveFile (line 136) | def RetrieveFile(self, remote_file, local_file):
    method UploadFile (line 141) | def UploadFile(self, local_file, remote_file):
  function MaybeCreatePlacementGroup (line 147) | def MaybeCreatePlacementGroup(name='tf_bm'):
  function DeletePlacementGroup (line 168) | def DeletePlacementGroup(name='tf_bm'):
  function CreateAwsInstances (line 184) | def CreateAwsInstances(num_instances=1,
  function LookupAwsInstances (line 220) | def LookupAwsInstances(image_id=None,
  function AwsInstances (line 252) | def AwsInstances(num_instances=1,
  function ReuseAwsInstances (line 294) | def ReuseAwsInstances(image_id=None,

FILE: cluster/tf-tools/benchmark/runner/command_builder.py
  function BuildDistributedCommandWorker (line 5) | def BuildDistributedCommandWorker(run_config, worker_hosts, ps_hosts,
  function BuildDistributedCommandPS (line 82) | def BuildDistributedCommandPS(run_config, worker_hosts, ps_hosts, task_i...
  function WorkerUtil (line 107) | def WorkerUtil(workers):
  function GpuDecode (line 118) | def GpuDecode(raw_gpu_input):
  function LoadYamlRunConfig (line 126) | def LoadYamlRunConfig(full_config, debug_level):

FILE: cluster/tf-tools/benchmark/runner/launch_experiment.py
  class timeit (line 46) | class timeit:
    method __init__ (line 50) | def __init__(self, tag=""):
    method __enter__ (line 53) | def __enter__(self):
    method __exit__ (line 57) | def __exit__(self, *args):
  function get_instance_ip_map (line 63) | def get_instance_ip_map():
  function main (line 84) | def main():

FILE: cluster/tf-tools/benchmark/runner/test_cluster_aws.py
  class timeit (line 19) | class timeit:
    method __init__ (line 23) | def __init__(self, tag=""):
    method __enter__ (line 26) | def __enter__(self):
    method __exit__ (line 30) | def __exit__(self, *args):
  function test_two_machine (line 35) | def test_two_machine():
  function main (line 39) | def main():

FILE: cluster/tf-tools/benchmark/runner/test_command_builder.py
  function main (line 5) | def main():

FILE: cluster/tf-tools/benchmark/runner/util.py
  function ExtractErrorToConsole (line 11) | def ExtractErrorToConsole(line):
  function ExtractToStdout (line 36) | def ExtractToStdout(line):
  function ExtractImagePerSecond (line 40) | def ExtractImagePerSecond(line):
  function ExecuteCommandAndWait (line 45) | def ExecuteCommandAndWait(ssh_client, command, print_error=True, ok_exit...
  function ExecuteCommandAndReturnStdout (line 57) | def ExecuteCommandAndReturnStdout(ssh_client, command):
  function _StreamOutputToFile (line 63) | def _StreamOutputToFile(fd, file, line_extractor, command=None):
  function ExecuteCommandAndStreamOutput (line 89) | def ExecuteCommandAndStreamOutput(ssh_client,
  function ExecuteCommandInThread (line 128) | def ExecuteCommandInThread(ssh_client,
  function SshToHost (line 158) | def SshToHost(hostname,

FILE: cluster/tmux.py
  function _setup_logdir (line 30) | def _setup_logdir(job_name):
  function _ossystem (line 40) | def _ossystem(cmd):
  function kill_job (line 44) | def kill_job(name):
  function tf_job (line 49) | def tf_job(name, num_tasks):
  class Job (line 76) | class Job:
    method __init__ (line 77) | def __init__(self, name, num_tasks, tmux_windows):
  class Task (line 85) | class Task:
    method __init__ (line 89) | def __init__(self, tmux_window, job, task_id):
    method run (line 100) | def run(self, cmd):
    method upload (line 103) | def upload(self, cmd):  # compatiblity with aws.py:Task
    method tf_env_setup (line 106) | def tf_env_setup(self, full_cluster_spec, task_spec):

FILE: conditional_backprop.py
  function conditional_backprop (line 15) | def conditional_backprop(do_backprop, tensor):

FILE: danjar_peek.py
  class Queue (line 5) | class Queue(tf.FIFOQueue):
    method __init__ (line 7) | def __init__(self, capacity):
    method peek (line 18) | def peek(self):
    method enqueue (line 21) | def enqueue(self, element):

FILE: distributed/benchmark_grpc_recv.py
  function session_config (line 62) | def session_config():
  function clusterspec (line 71) | def clusterspec():
  function create_graph (line 76) | def create_graph(device0, device1):
  function create_done_queue (line 94) | def create_done_queue(i):
  function run_benchmark (line 101) | def run_benchmark(sess, init_op, add_op):
  function run_benchmark_local (line 114) | def run_benchmark_local():
  function run_benchmark_distributed (line 120) | def run_benchmark_distributed():

FILE: distributed/client_transfer_benchmark.py
  function clusterspec (line 71) | def clusterspec():
  function log (line 76) | def log(s):
  function session_config (line 80) | def session_config():
  function launch_distributed_service (line 89) | def launch_distributed_service():
  function run_benchmark (line 109) | def run_benchmark(master):
  function create_done_queue (line 147) | def create_done_queue(i):

FILE: double_memory_bug.py
  function sessrun (line 7) | def sessrun(*args, **kwargs):

FILE: eager_lbfgs/eager_lbfgs.py
  function dot (line 10) | def dot(a, b):
  function verbose_func (line 14) | def verbose_func(s):
  function lbfgs (line 19) | def lbfgs(opfunc, x, config, state, do_verbose):
  class dummy (line 216) | class dummy(object):
  class Struct (line 219) | class Struct(dummy):
    method __getattribute__ (line 220) | def __getattribute__(self, key):
  function benchmark (line 226) | def benchmark(batch_size, iters, seed=1, cuda=True, history=100, verbose...
  function main (line 285) | def main():
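The core of an `lbfgs` routine like the one listed is the two-loop recursion, which approximates the inverse-Hessian/gradient product from a history of parameter differences s_i and gradient differences y_i. A self-contained sketch with pure-Python vectors (illustration only, not the file's implementation):

```python
# L-BFGS two-loop recursion: approximate H^{-1} @ grad from history.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def two_loop_direction(grad, s_hist, y_hist):
    """Return q ~= H^{-1} @ grad; the descent direction is -q."""
    q = list(grad)
    alphas = []
    # first loop: newest pair first
    for s, y in reversed(list(zip(s_hist, y_hist))):
        a = dot(s, q) / dot(y, s)
        alphas.append(a)
        q = [qi - a * yi for qi, yi in zip(q, y)]
    if y_hist:
        # initial Hessian scaling gamma = s.y / y.y
        s, y = s_hist[-1], y_hist[-1]
        gamma = dot(s, y) / dot(y, y)
        q = [gamma * qi for qi in q]
    # second loop: oldest pair first
    for (s, y), a in zip(zip(s_hist, y_hist), reversed(alphas)):
        b = dot(y, q) / dot(y, s)
        q = [qi + (a - b) * si for qi, si in zip(q, s)]
    return q

# For f(x) = x^2 the Hessian is 2, so s=[1] gives y=H*s=[2],
# and H^{-1} applied to grad=[4] should be [2].
direction = two_loop_direction([4.0], s_hist=[[1.0]], y_hist=[[2.0]])
```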

FILE: eager_lbfgs/pytorch_lbfgs.py
  function benchmark (line 15) | def benchmark(batch_size, iters, seed=1, cuda=True, history=100, verbose...
  function main (line 90) | def main():

FILE: eager_lbfgs/run_experiment.py
  function run_experiment (line 14) | def run_experiment(iters, name):

FILE: eager_lbfgs/util.py
  function check_mkl (line 43) | def check_mkl():
  function set_global_args (line 48) | def set_global_args(local_args):
  function concat_blocks (line 55) | def concat_blocks(blocks, validate_dims=True):
  function concat_blocks_test (line 70) | def concat_blocks_test():
  function partition_matrix_evenly (line 79) | def partition_matrix_evenly(mat, splits):
  function partition_matrix_evenly_test (line 88) | def partition_matrix_evenly_test():
  function partition_matrix (line 96) | def partition_matrix(mat, sizes):
  function partition_matrix_test (line 99) | def partition_matrix_test():
  function pseudo_inverse (line 104) | def pseudo_inverse(mat, eps=1e-10):
  function symsqrt (line 112) | def symsqrt(mat, eps=1e-7):
  function pseudo_inverse_sqrt (line 120) | def pseudo_inverse_sqrt(mat, eps=1e-7):
  function pseudo_inverse_sqrt2 (line 127) | def pseudo_inverse_sqrt2(svd, eps=1e-7):
  function pseudo_inverse2 (line 139) | def pseudo_inverse2(svd, eps=1e-7):
  function pseudo_inverse_stable (line 153) | def pseudo_inverse_stable(svd, eps=1e-7):
  function regularized_inverse (line 168) | def regularized_inverse(mat, l=0.1):
  function regularized_inverse2 (line 172) | def regularized_inverse2(svd, L=1e-3):
  function regularized_inverse3 (line 184) | def regularized_inverse3(svd, L=1e-3):
  function regularized_inverse4 (line 200) | def regularized_inverse4(svd, L=1e-3):
  function pseudo_inverse_scipy (line 216) | def pseudo_inverse_scipy(tensor):
  function Identity (line 225) | def Identity(n, dtype=None, name=None):
  function ones (line 235) | def ones(n, dtype=None, name=None):
  function partition_list_np (line 241) | def partition_list_np(vec, sizes):
  function chunks (line 251) | def chunks(l, n):
  function partition_list (line 256) | def partition_list(l, sizes):
  function partition_list_test (line 267) | def partition_list_test():
  function v2c (line 275) | def v2c(vec):
  function v2c_np (line 280) | def v2c_np(vec):
  function v2r (line 285) | def v2r(vec):
  function c2v (line 290) | def c2v(col):
  function unvectorize_np (line 297) | def unvectorize_np(vec, rows):
  function unvec (line 304) | def unvec(vec, rows):
  function unvec_test (line 314) | def unvec_test():
  function vectorize_np (line 320) | def vectorize_np(mat):
  function vec (line 323) | def vec(mat):
  function vec_test (line 327) | def vec_test():
  function Kmat (line 333) | def Kmat(rows, cols):
  function Kmat_test (line 348) | def Kmat_test():
  function unflatten_np (line 367) | def unflatten_np(Wf, fs):
  function flatten_np (line 378) | def flatten_np(Ws):
  function flatten_np_test (line 381) | def flatten_np_test():
  function unflatten (line 388) | def unflatten(Wf, fs):
  function unflatten_test (line 402) | def unflatten_test():
  function flatten (line 411) | def flatten(Ws):
  function flatten_test (line 415) | def flatten_test():
  function check_close (line 423) | def check_close(a0, b0):
  function check_equal (line 426) | def check_equal(a0, b0, rtol=1e-9, atol=1e-12):
  function fix_shape (line 453) | def fix_shape(tf_shape):
  function kronecker_cols (line 456) | def kronecker_cols(a, b):
  function kronecker_cols_test (line 468) | def kronecker_cols_test():
  function kronecker (line 476) | def kronecker(A, B, do_shape_inference=True):
  function kronecker_test (line 508) | def kronecker_test():
  function col (line 522) | def col(A,i):
  function khatri_rao (line 529) | def khatri_rao(A, B):
  function khatri_rao_test (line 536) | def khatri_rao_test():
  function relu_mask (line 544) | def relu_mask(a, dtype=default_dtype):
  function relu_mask_test (line 550) | def relu_mask_test():
  function assert_rectangular (line 555) | def assert_rectangular(blocks):
  function empty_grid (line 559) | def empty_grid(rows, cols):
  function block_diagonal_inverse (line 566) | def block_diagonal_inverse(blocks):
  function block_diagonal_inverse_sqrt (line 586) | def block_diagonal_inverse_sqrt(blocks):
  function block_diagonal_inverse_test (line 605) | def block_diagonal_inverse_test():
  function t (line 615) | def t(x):
  function reset_time (line 622) | def reset_time():
  function record_time (line 627) | def record_time():
  function last_time (line 634) | def last_time():
  function summarize_time (line 642) | def summarize_time(time_list=None):
  function summarize_graph (line 660) | def summarize_graph(g=None):
  function disable_shape_inference (line 668) | def disable_shape_inference():
  function enable_shape_inference (line 671) | def enable_shape_inference():
  function dummy_collections_handler (line 678) | def dummy_collections_handler(info, elem, elem_): pass
  function disable_collections_handler (line 679) | def disable_collections_handler():
  function enable_collections_handler (line 681) | def enable_collections_handler():
  function dump_with_prompt (line 685) | def dump_with_prompt(result, fname, no_prefix=False):
  function dump (line 700) | def dump(result, fname, no_prefix=False):
  function dump32 (line 722) | def dump32(result, fname):
  function frobenius_np (line 732) | def frobenius_np(a):
  function nan_check (line 735) | def nan_check(result):
  function L2 (line 741) | def L2(t):
  class timeit (line 752) | class timeit:
    method __init__ (line 756) | def __init__(self, tag=""):
    method __enter__ (line 759) | def __enter__(self):
    method __exit__ (line 763) | def __exit__(self, *args):
  function record (line 777) | def record(tag, stat):
  function timeit_summarize (line 782) | def timeit_summarize():
  function parents (line 790) | def parents(op): return set(input.op for input in op.inputs)
  function children (line 791) | def children(op): return set(op for out in op.outputs for op in out.cons...
  function dict_graph (line 792) | def dict_graph():
  function nx_graph (line 798) | def nx_graph():
  function shortest_path (line 801) | def shortest_path(dep, target):
  function list_or_tuple (line 808) | def list_or_tuple(k):
  function is_numeric (line 811) | def is_numeric(ndarray):
  class VarInfo (line 815) | class VarInfo:
    method __init__ (line 817) | def __init__(self, setter, p):
  class SvdTuple (line 821) | class SvdTuple:
    method __init__ (line 825) | def __init__(self, suvi, *args):
  class SvdWrapper (line 846) | class SvdWrapper:
    method __init__ (line 853) | def __init__(self, target, name, do_inverses=False, use_resource=False):
    method update (line 926) | def update(self):
    method update_tf (line 933) | def update_tf(self):
    method update_scipy (line 938) | def update_scipy(self):
    method update_scipy_inv (line 944) | def update_scipy_inv(self):
    method update_scipy_svd (line 951) | def update_scipy_svd(self):
  function extract_grad (line 972) | def extract_grad(grads_and_vars, var):
  function intersept_op_creation (line 984) | def intersept_op_creation(op_type_name_to_intercept):
  function get_variable (line 997) | def get_variable(name, initializer, reuse=True):
  class VarStruct (line 1011) | class VarStruct:
    method __init__ (line 1026) | def __init__(self, initial_value, name, dtype=None):
    method set (line 1044) | def set(self, val):
    method initialize (line 1048) | def initialize(self):
  function get_var (line 1054) | def get_var(name, initializer, reuse=True):
  function run_all_tests (line 1075) | def run_all_tests(module):
  function capture_ops (line 1085) | def capture_ops():
  function capture_vars (line 1102) | def capture_vars():
  function Print (line 1118) | def Print(op):
  function get_host_prefix (line 1122) | def get_host_prefix():
  function summarize_difference (line 1126) | def summarize_difference(source, target):
  class BufferedWriter (line 1134) | class BufferedWriter:
    method __init__ (line 1137) | def __init__(self, outfn, save_every_secs=60*5):
    method write (line 1143) | def write(self, line):
    method flush (line 1152) | def flush():
  function ossystem (line 1158) | def ossystem(line):
  function setup_experiment_run_directory (line 1162) | def setup_experiment_run_directory(run, safe_mode=True):
  function get_last_logger (line 1187) | def get_last_logger(skip_existence_check=False):
  class TensorboardLogger (line 1195) | class TensorboardLogger:
    method __init__ (line 1204) | def __init__(self, run, step=0):
    method __call__ (line 1219) | def __call__(self, *args):
    method next_step (line 1224) | def next_step(self):
  function as_int32 (line 1235) | def as_int32(v):
  function add_dep (line 1239) | def add_dep(from_op, on_op):
  function register_default_session (line 1248) | def register_default_session(local_sess):
  function get_default_session (line 1253) | def get_default_session():
  function get_default_graph (line 1260) | def get_default_graph():
  function eval (line 1265) | def eval(tensor):
  function run (line 1271) | def run(fetches):
  function traced_run (line 1276) | def traced_run(fetches):
  function get_mnist_images (line 1298) | def get_mnist_images(max_images=0, fold='train'):
  function cachedGpuIdentityRegularizer (line 1364) | def cachedGpuIdentityRegularizer(n, Lambda):
  function ng_init (line 1377) | def ng_init(s1, s2): # uniform weight init from Ng UFLDL
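Several helpers in this util file revolve around column-major vectorization and Kronecker identities; `Kmat(rows, cols)` is the commutation matrix K satisfying K·vec(A) = vec(Aᵀ). A pure-Python sketch of that construction (no TF; the file's version builds a tensor):

```python
# Commutation matrix: the permutation K with K @ vec(A) = vec(A^T),
# using column-major vec, for a rows-by-cols matrix A.

def vec(mat):
    """Column-major vectorization of a list-of-rows matrix."""
    rows, cols = len(mat), len(mat[0])
    return [mat[r][c] for c in range(cols) for r in range(rows)]

def Kmat(rows, cols):
    n = rows * cols
    K = [[0] * n for _ in range(n)]
    for r in range(rows):
        for c in range(cols):
            # A[r][c] sits at c*rows+r in vec(A)
            # and at r*cols+c in vec(A^T)
            K[r * cols + c][c * rows + r] = 1
    return K

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
K = Kmat(2, 3)
vecAT = [sum(k * v for k, v in zip(row, vec(A))) for row in K]
```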

FILE: enqueue_many_test.py
  function create_session (line 6) | def create_session():
  function run_op (line 44) | def run_op(op):

FILE: enqueue_many_test_singlerun.py
  function create_session (line 7) | def create_session():
  function run_op (line 46) | def run_op(op):

FILE: ericyue-slowreader/benchmark-batch.py
  function let_queue_repopulate (line 72) | def let_queue_repopulate(size_tensor, min_elements=100000, sleep_delay=0...

FILE: ericyue-slowreader/benchmark-synthetic-batch.py
  function let_queue_repopulate (line 35) | def let_queue_repopulate(size_tensor, min_elements=100000, sleep_delay=0...

FILE: ericyue-slowreader/benchmark.py
  function read_and_decode (line 49) | def read_and_decode(filename_queue):

FILE: free_gpus.py
  function tokenize (line 9) | def tokenize(cmd):
  function run_command (line 19) | def run_command(cmd):
  function run_shell (line 27) | def run_shell(cmd):
  function run_shell_background (line 45) | def run_shell_background(cmd_orig):
  function get_pid_gpu_map (line 53) | def get_pid_gpu_map():
  function kill_pids (line 75) | def kill_pids(pids_to_kill):
  function owner (line 83) | def owner(pid):

FILE: graph_template.py
  function profile (line 9) | def profile(func): return func
  class GraphTemplate (line 26) | class GraphTemplate:
    method __init__ (line 49) | def __init__(self, inputs, outputs, within_ops=None):
    method apply (line 99) | def apply(self, new_inputs, update_colocation_groups=True):
  function clear_original_ops (line 204) | def clear_original_ops(ops):
  function tf_ops_to_graph (line 209) | def tf_ops_to_graph(ops):
  function ops_in_toposorted_order (line 217) | def ops_in_toposorted_order(ops):
  class _DeviceCaptureOp (line 229) | class _DeviceCaptureOp(object):
    method __init__ (line 230) | def __init__(self):
    method _set_device (line 232) | def _set_device(self, device):
  function get_current_device (line 235) | def get_current_device():
  function flatten1 (line 243) | def flatten1(list_of_lists):
  function count_gpus (line 256) | def count_gpus():
  function current_function_name (line 264) | def current_function_name():
  function check_equal (line 270) | def check_equal(a, b):
  function capture_ops (line 277) | def capture_ops():
  function capture_ops_test (line 291) | def capture_ops_test():
  function graph_template_test (line 297) | def graph_template_test():
  function graph_devices_test (line 310) | def graph_devices_test():
  function variables_test (line 325) | def variables_test():
  function multi_variables_test (line 335) | def multi_variables_test():
  function colocate_test (line 350) | def colocate_test():
  function between_devices_copy_test (line 366) | def between_devices_copy_test():
  function optimization_test (line 387) | def optimization_test():
  function multidevice_shared_params_test (line 425) | def multidevice_shared_params_test():
  function multidevice_separate_params_test (line 461) | def multidevice_separate_params_test():
  function run_all_tests (line 512) | def run_all_tests(module):
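A template-apply pass like this needs ops in dependency order, which `ops_in_toposorted_order` provides via the `parents`/`children` relations. A sketch of that ordering with Kahn's algorithm over a plain parent-to-children dict (a hypothetical graph shape, not `tf.Operation` objects):

```python
from collections import deque

# Topological sort: every node appears after all of its parents.
def toposort(children):
    indegree = {n: 0 for n in children}
    for kids in children.values():
        for k in kids:
            indegree[k] += 1
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for k in children[n]:
            indegree[k] -= 1
            if indegree[k] == 0:
                ready.append(k)
    if len(order) != len(children):
        raise ValueError("graph has a cycle")
    return order

# Tiny op graph: mul depends on x and w; add depends on mul and b.
g = {'x': ['mul'], 'w': ['mul'], 'mul': ['add'], 'b': ['add'], 'add': []}
order = toposort(g)
```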

FILE: input_benchmarks/convert_to_records.py
  function _int64_feature (line 43) | def _int64_feature(value):
  function _bytes_feature (line 47) | def _bytes_feature(value):
  function convert_to (line 51) | def convert_to(data_set, name):
  function main (line 78) | def main(argv):

FILE: input_benchmarks/fully_connected_feed.py
  function placeholder_inputs (line 46) | def placeholder_inputs(batch_size):
  function fill_feed_dict (line 68) | def fill_feed_dict(data_set, images_pl, labels_pl):
  function do_eval (line 96) | def do_eval(sess,
  function run_training (line 125) | def run_training():
  function main (line 240) | def main(_):

FILE: input_benchmarks/fully_connected_preloaded_var.py
  function run_training (line 54) | def run_training():
  function main (line 174) | def main(_):

FILE: input_benchmarks/fully_connected_reader.py
  function read_and_decode (line 53) | def read_and_decode(filename_queue):
  function inputs (line 84) | def inputs(train, batch_size, num_epochs):
  function run_training (line 123) | def run_training():
  function main (line 206) | def main(_):

FILE: inverse_segfault.py
  function load_MNIST_images (line 38) | def load_MNIST_images(filename):
  function set_global_args (line 61) | def set_global_args(local_args):
  function concat_blocks (line 66) | def concat_blocks(blocks, validate_dims=True):
  function concat_blocks_test (line 81) | def concat_blocks_test():
  function partition_matrix_evenly (line 90) | def partition_matrix_evenly(mat, splits):
  function partition_matrix_evenly_test (line 99) | def partition_matrix_evenly_test():
  function partition_matrix (line 107) | def partition_matrix(mat, sizes):
  function partition_matrix_test (line 110) | def partition_matrix_test():
  function pseudo_inverse (line 115) | def pseudo_inverse(mat, eps=1e-10):
  function symsqrt (line 123) | def symsqrt(mat, eps=1e-7):
  function pseudo_inverse_sqrt (line 131) | def pseudo_inverse_sqrt(mat, eps=1e-7):
  function pseudo_inverse_sqrt2 (line 138) | def pseudo_inverse_sqrt2(svd, eps=1e-7):
  function pseudo_inverse2 (line 150) | def pseudo_inverse2(svd, eps=1e-7):
  function pseudo_inverse_stable (line 164) | def pseudo_inverse_stable(svd, eps=1e-7):
  function regularized_inverse (line 179) | def regularized_inverse(mat, l=0.1):
  function regularized_inverse2 (line 184) | def regularized_inverse2(svd, L=1e-3):
  function regularized_inverse3 (line 196) | def regularized_inverse3(svd, L=1e-3):
  function regularized_inverse4 (line 212) | def regularized_inverse4(svd, L=1e-3):
  function pseudo_inverse_scipy (line 228) | def pseudo_inverse_scipy(tensor):
  function Identity (line 238) | def Identity(n, dtype=np.float32, name='dummy'):
  function ones (line 255) | def ones(n, dtype=None, name=None):
  function partition_list_np (line 261) | def partition_list_np(vec, sizes):
  function chunks (line 271) | def chunks(l, n):
  function partition_list (line 276) | def partition_list(l, sizes):
  function partition_list_test (line 287) | def partition_list_test():
  function v2c (line 295) | def v2c(vec):
  function v2c_np (line 300) | def v2c_np(vec):
  function v2r (line 305) | def v2r(vec):
  function c2v (line 310) | def c2v(col):
  function unvectorize_np (line 317) | def unvectorize_np(vec, rows):
  function unvec (line 324) | def unvec(vec, rows):
  function unvec_test (line 334) | def unvec_test():
  function vectorize_np (line 340) | def vectorize_np(mat):
  function vec (line 343) | def vec(mat):
  function vec_test (line 347) | def vec_test():
  function Kmat (line 353) | def Kmat(rows, cols):
  function Kmat_test (line 368) | def Kmat_test():
  function unflatten_np (line 387) | def unflatten_np(Wf, fs):
  function flatten_np (line 398) | def flatten_np(Ws):
  function flatten_np_test (line 401) | def flatten_np_test():
  function unflatten (line 408) | def unflatten(Wf, fs):
  function unflatten_test (line 422) | def unflatten_test():
  function flatten (line 431) | def flatten(Ws):
  function flatten_test (line 435) | def flatten_test():
  function check_close (line 443) | def check_close(a0, b0):
  function check_equal (line 446) | def check_equal(a0, b0, rtol=1e-9, atol=1e-12):
  function fix_shape (line 473) | def fix_shape(tf_shape):
  function kronecker_cols (line 476) | def kronecker_cols(a, b):
  function kronecker_cols_test (line 488) | def kronecker_cols_test():
  function kronecker (line 496) | def kronecker(A, B, do_shape_inference=True):
  function kronecker_test (line 528) | def kronecker_test():
  function col (line 542) | def col(A,i):
  function khatri_rao (line 549) | def khatri_rao(A, B):
  function khatri_rao_test (line 556) | def khatri_rao_test():
  function relu_mask (line 564) | def relu_mask(a, dtype=default_dtype):
  function relu_mask_test (line 570) | def relu_mask_test():
  function assert_rectangular (line 575) | def assert_rectangular(blocks):
  function empty_grid (line 579) | def empty_grid(rows, cols):
  function block_diagonal_inverse (line 586) | def block_diagonal_inverse(blocks):
  function block_diagonal_inverse_sqrt (line 606) | def block_diagonal_inverse_sqrt(blocks):
  function block_diagonal_inverse_test (line 625) | def block_diagonal_inverse_test():
  function t (line 635) | def t(x):
  function reset_time (line 642) | def reset_time():
  function record_time (line 647) | def record_time():
  function last_time (line 654) | def last_time():
  function summarize_time (line 661) | def summarize_time(time_list=None):
  function summarize_graph (line 678) | def summarize_graph(g=None):
  function disable_shape_inference (line 686) | def disable_shape_inference():
  function enable_shape_inference (line 689) | def enable_shape_inference():
  function dummy_collections_handler (line 696) | def dummy_collections_handler(info, elem, elem_): pass
  function disable_collections_handler (line 697) | def disable_collections_handler():
  function enable_collections_handler (line 699) | def enable_collections_handler():
  function dump_with_prompt (line 703) | def dump_with_prompt(result, fname, no_prefix=False):
  function dump (line 718) | def dump(result, fname, no_prefix=False):
  function dump32 (line 740) | def dump32(result, fname):
  function frobenius_np (line 750) | def frobenius_np(a):
  function nan_check (line 753) | def nan_check(result):
  function L2 (line 759) | def L2(t):
  class timeit (line 770) | class timeit:
    method __init__ (line 774) | def __init__(self, tag=""):
    method __enter__ (line 777) | def __enter__(self):
    method __exit__ (line 781) | def __exit__(self, *args):
  function record (line 796) | def record(tag, stat):
  function timeit_summarize (line 801) | def timeit_summarize():
  function parents (line 809) | def parents(op): return set(input.op for input in op.inputs)
  function children (line 810) | def children(op): return set(op for out in op.outputs for op in out.cons...
  function dict_graph (line 811) | def dict_graph():
  function nx_graph (line 817) | def nx_graph():
  function shortest_path (line 820) | def shortest_path(dep, target):
  function list_or_tuple (line 827) | def list_or_tuple(k):
  function is_numeric (line 830) | def is_numeric(ndarray):
  class VarInfo (line 834) | class VarInfo:
    method __init__ (line 836) | def __init__(self, setter, p):
  class SvdTuple (line 840) | class SvdTuple:
    method __init__ (line 844) | def __init__(self, suvi, *args):
  class SvdWrapper (line 865) | class SvdWrapper:
    method __init__ (line 872) | def __init__(self, target, name, do_inverses=False, use_resource=False):
    method update (line 945) | def update(self):
    method update_tf (line 952) | def update_tf(self):
    method update_scipy (line 956) | def update_scipy(self):
    method update_scipy_inv (line 962) | def update_scipy_inv(self):
    method update_scipy_svd (line 969) | def update_scipy_svd(self):
  function extract_grad (line 990) | def extract_grad(grads_and_vars, var):
  function intersept_op_creation (line 1002) | def intersept_op_creation(op_type_name_to_intercept):
  function get_variable (line 1015) | def get_variable(name, initializer, reuse=True):
  class VarStruct (line 1029) | class VarStruct:
    method __init__ (line 1044) | def __init__(self, initial_value, name, dtype=None):
    method set (line 1062) | def set(self, val):
    method initialize (line 1066) | def initialize(self):
  function get_var (line 1072) | def get_var(name, initializer, reuse=True):
  function run_all_tests (line 1093) | def run_all_tests(module):
  function capture_ops (line 1103) | def capture_ops():
  function capture_vars (line 1120) | def capture_vars():
  function Print (line 1136) | def Print(op):
  function get_host_prefix (line 1140) | def get_host_prefix():
  function summarize_difference (line 1144) | def summarize_difference(source, target):
  class BufferedWriter (line 1152) | class BufferedWriter:
    method __init__ (line 1155) | def __init__(self, outfn, save_every_secs=60*5):
    method write (line 1161) | def write(self, line):
    method flush (line 1170) | def flush():
  function ossystem (line 1176) | def ossystem(line):
  function setup_experiment_run_directory (line 1180) | def setup_experiment_run_directory(run, safe_mode=True):
  function get_last_logger (line 1205) | def get_last_logger(skip_existence_check=False):
  class TensorboardLogger (line 1213) | class TensorboardLogger:
    method __init__ (line 1222) | def __init__(self, run, step=0):
    method __call__ (line 1237) | def __call__(self, *args):
    method next_step (line 1242) | def next_step(self):
  function as_int32 (line 1253) | def as_int32(v):
  function add_dep (line 1257) | def add_dep(from_op, on_op):
  function register_default_session (line 1266) | def register_default_session(local_sess):
  function get_default_session (line 1271) | def get_default_session():
  function get_default_graph (line 1278) | def get_default_graph():
  function eval (line 1283) | def eval(tensor):
  function get_mnist_images (line 1289) | def get_mnist_images():
  function W_uniform (line 1330) | def W_uniform(s1, s2): # uniform weight init from Ng UFLDL
  function passthrough (line 1344) | def passthrough(obj, value): return value
  function main (line 1359) | def main():
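The util helpers above include `khatri_rao(A, B)` and `kronecker(A, B)`. As a rough sketch of what the name `khatri_rao` conventionally denotes, the column-wise Kronecker product, here in plain NumPy rather than the repository's TensorFlow implementation:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product: column i of the result is kron(A[:, i], B[:, i])."""
    assert A.shape[1] == B.shape[1], "operands must have the same number of columns"
    rows, cols = A.shape[0] * B.shape[0], A.shape[1]
    # einsum forms all pairwise products per column, then the two row axes are flattened
    return np.einsum('ik,jk->ijk', A, B).reshape(rows, cols)

A = np.arange(6.).reshape(2, 3)
B = np.arange(6., 12.).reshape(2, 3)
KR = khatri_rao(A, B)
# each column matches the Kronecker product of the corresponding columns
for i in range(3):
    assert np.allclose(KR[:, i], np.kron(A[:, i], B[:, i]))
```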

FILE: keras_autoencoder/keras_large.py
  class TestCallback (line 39) | class TestCallback(callbacks.Callback):
    method __init__ (line 40) | def __init__(self, data_train, data_test, fn):
    method on_epoch_end (line 49) | def on_epoch_end(self, epoch, logs={}):

FILE: keras_autoencoder/util.py
  function set_global_args (line 35) | def set_global_args(local_args):
  function concat_blocks (line 40) | def concat_blocks(blocks, validate_dims=True):
  function concat_blocks_test (line 55) | def concat_blocks_test():
  function partition_matrix_evenly (line 64) | def partition_matrix_evenly(mat, splits):
  function partition_matrix_evenly_test (line 73) | def partition_matrix_evenly_test():
  function partition_matrix (line 81) | def partition_matrix(mat, sizes):
  function partition_matrix_test (line 84) | def partition_matrix_test():
  function pseudo_inverse (line 89) | def pseudo_inverse(mat, eps=1e-10):
  function symsqrt (line 97) | def symsqrt(mat, eps=1e-7):
  function pseudo_inverse_sqrt (line 105) | def pseudo_inverse_sqrt(mat, eps=1e-7):
  function pseudo_inverse_sqrt2 (line 112) | def pseudo_inverse_sqrt2(svd, eps=1e-7):
  function pseudo_inverse2 (line 124) | def pseudo_inverse2(svd, eps=1e-7):
  function pseudo_inverse_stable (line 138) | def pseudo_inverse_stable(svd, eps=1e-7):
  function regularized_inverse (line 153) | def regularized_inverse(mat, l=0.1):
  function regularized_inverse2 (line 157) | def regularized_inverse2(svd, L=1e-3):
  function regularized_inverse3 (line 169) | def regularized_inverse3(svd, L=1e-3):
  function regularized_inverse4 (line 185) | def regularized_inverse4(svd, L=1e-3):
  function pseudo_inverse_scipy (line 201) | def pseudo_inverse_scipy(tensor):
  function Identity (line 211) | def Identity(n, dtype=None, name=None):
  function ones (line 221) | def ones(n, dtype=None, name=None):
  function partition_list_np (line 227) | def partition_list_np(vec, sizes):
  function chunks (line 237) | def chunks(l, n):
  function partition_list (line 242) | def partition_list(l, sizes):
  function partition_list_test (line 253) | def partition_list_test():
  function v2c (line 261) | def v2c(vec):
  function v2c_np (line 266) | def v2c_np(vec):
  function v2r (line 271) | def v2r(vec):
  function c2v (line 276) | def c2v(col):
  function unvectorize_np (line 283) | def unvectorize_np(vec, rows):
  function unvec (line 290) | def unvec(vec, rows):
  function unvec_test (line 300) | def unvec_test():
  function vectorize_np (line 306) | def vectorize_np(mat):
  function vec (line 309) | def vec(mat):
  function vec_test (line 313) | def vec_test():
  function Kmat (line 319) | def Kmat(rows, cols):
  function Kmat_test (line 334) | def Kmat_test():
  function unflatten_np (line 353) | def unflatten_np(Wf, fs):
  function flatten_np (line 364) | def flatten_np(Ws):
  function flatten_np_test (line 367) | def flatten_np_test():
  function unflatten (line 374) | def unflatten(Wf, fs):
  function unflatten_test (line 388) | def unflatten_test():
  function flatten (line 397) | def flatten(Ws):
  function flatten_test (line 401) | def flatten_test():
  function check_close (line 409) | def check_close(a0, b0):
  function check_equal (line 412) | def check_equal(a0, b0, rtol=1e-9, atol=1e-12):
  function fix_shape (line 439) | def fix_shape(tf_shape):
  function kronecker_cols (line 442) | def kronecker_cols(a, b):
  function kronecker_cols_test (line 454) | def kronecker_cols_test():
  function kronecker (line 462) | def kronecker(A, B, do_shape_inference=True):
  function kronecker_test (line 494) | def kronecker_test():
  function col (line 508) | def col(A,i):
  function khatri_rao (line 515) | def khatri_rao(A, B):
  function khatri_rao_test (line 522) | def khatri_rao_test():
  function relu_mask (line 530) | def relu_mask(a, dtype=default_dtype):
  function relu_mask_test (line 536) | def relu_mask_test():
  function assert_rectangular (line 541) | def assert_rectangular(blocks):
  function empty_grid (line 545) | def empty_grid(rows, cols):
  function block_diagonal_inverse (line 552) | def block_diagonal_inverse(blocks):
  function block_diagonal_inverse_sqrt (line 572) | def block_diagonal_inverse_sqrt(blocks):
  function block_diagonal_inverse_test (line 591) | def block_diagonal_inverse_test():
  function t (line 601) | def t(x):
  function reset_time (line 608) | def reset_time():
  function record_time (line 613) | def record_time():
  function last_time (line 619) | def last_time():
  function summarize_time (line 626) | def summarize_time(time_list=None):
  function summarize_graph (line 643) | def summarize_graph(g=None):
  function disable_shape_inference (line 651) | def disable_shape_inference():
  function enable_shape_inference (line 654) | def enable_shape_inference():
  function dump_with_prompt (line 658) | def dump_with_prompt(result, fname, no_prefix=False):
  function dump (line 673) | def dump(result, fname, no_prefix=False):
  function dump32 (line 695) | def dump32(result, fname):
  function frobenius_np (line 705) | def frobenius_np(a):
  function nan_check (line 708) | def nan_check(result):
  function L2 (line 714) | def L2(t):
  class timeit (line 724) | class timeit:
    method __init__ (line 725) | def __init__(self, tag=""):
    method __enter__ (line 728) | def __enter__(self):
    method __exit__ (line 732) | def __exit__(self, *args):
  function timeit_summarize (line 741) | def timeit_summarize():
  function parents (line 749) | def parents(op): return set(input.op for input in op.inputs)
  function children (line 750) | def children(op): return set(op for out in op.outputs for op in out.cons...
  function dict_graph (line 751) | def dict_graph():
  function nx_graph (line 757) | def nx_graph():
  function shortest_path (line 760) | def shortest_path(dep, target):
  function list_or_tuple (line 767) | def list_or_tuple(k):
  function is_numeric (line 770) | def is_numeric(ndarray):
  class VarInfo (line 774) | class VarInfo:
    method __init__ (line 776) | def __init__(self, setter, p):
  class SvdTuple (line 780) | class SvdTuple:
    method __init__ (line 784) | def __init__(self, suvi, *args):
  class SvdWrapper (line 805) | class SvdWrapper:
    method __init__ (line 812) | def __init__(self, target, name, do_inverses=False):
    method update (line 872) | def update(self):
    method update_tf (line 879) | def update_tf(self):
    method update_scipy (line 883) | def update_scipy(self):
    method update_scipy_inv (line 889) | def update_scipy_inv(self):
    method update_scipy_svd (line 896) | def update_scipy_svd(self):
  function extract_grad (line 917) | def extract_grad(grads_and_vars, var):
  function intersept_op_creation (line 929) | def intersept_op_creation(op_type_name_to_intercept):
  function get_variable (line 942) | def get_variable(name, initializer, reuse=True):
  class VarStruct (line 956) | class VarStruct:
    method __init__ (line 971) | def __init__(self, initial_value, name, dtype=None):
    method set (line 989) | def set(self, val):
    method initialize (line 993) | def initialize(self):
  function get_var (line 999) | def get_var(name, initializer, reuse=True):
  function run_all_tests (line 1020) | def run_all_tests(module):
  function capture_ops (line 1030) | def capture_ops():
  function capture_vars (line 1047) | def capture_vars():
  function Print (line 1063) | def Print(op):
  function get_host_prefix (line 1067) | def get_host_prefix():
  function summarize_difference (line 1071) | def summarize_difference(source, target):
  class BufferedWriter (line 1079) | class BufferedWriter:
    method __init__ (line 1082) | def __init__(self, outfn, save_every_secs=60*5):
    method write (line 1088) | def write(self, line):
    method flush (line 1097) | def flush():
  function ossystem (line 1103) | def ossystem(line):
  function setup_experiment_run_directory (line 1107) | def setup_experiment_run_directory(run, safe_mode=True):
  function get_last_logger (line 1132) | def get_last_logger():
  class TensorboardLogger (line 1137) | class TensorboardLogger:
    method __init__ (line 1146) | def __init__(self, run, step=0):
    method __call__ (line 1161) | def __call__(self, *args):
    method next_step (line 1166) | def next_step(self):
  function as_int32 (line 1176) | def as_int32(v):
  function add_dep (line 1180) | def add_dep(from_op, on_op):
  function register_default_session (line 1189) | def register_default_session(local_sess):
  function get_default_session (line 1194) | def get_default_session():
  function get_default_graph (line 1201) | def get_default_graph():
  function eval (line 1206) | def eval(tensor):
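Among the numerically delicate helpers this util.py repeats are `pseudo_inverse(mat, eps)` and `symsqrt(mat, eps)`, both eigendecomposition-based with a small-eigenvalue cutoff. A NumPy sketch of that pattern for a symmetric PSD matrix (the eps handling here is illustrative, not the repository's exact code):

```python
import numpy as np

def pseudo_inverse(mat, eps=1e-10):
    """Eigendecomposition-based pseudo-inverse for a symmetric PSD matrix."""
    s, u = np.linalg.eigh(mat)
    # invert only eigenvalues above the cutoff; zero out the rest
    s_inv = np.where(s > eps, 1.0 / np.maximum(s, eps), 0.0)
    return (u * s_inv) @ u.T

A = np.array([[2.0, 0.0], [0.0, 0.0]])  # rank-deficient PSD matrix
P = pseudo_inverse(A)
assert np.allclose(A @ P @ A, A)  # Moore-Penrose property holds
```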

FILE: keras_autoencoder/weightnorm.py
  class SGDWithWeightnorm (line 6) | class SGDWithWeightnorm(SGD):
    method get_updates (line 7) | def get_updates(self, params, constraints, loss):
  class AdamWithWeightnorm (line 75) | class AdamWithWeightnorm(Adam):
    method get_updates (line 76) | def get_updates(self, params, constraints, loss):
  function get_weightnorm_params_and_grads (line 146) | def get_weightnorm_params_and_grads(p, g):
  function add_weightnorm_param_updates (line 169) | def add_weightnorm_param_updates(updates, new_V_param, new_g_param, W, V...
  function data_based_init (line 182) | def data_based_init(model, input):
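weightnorm.py adds weight-normalized variants of SGD and Adam. The reparameterization these optimizers maintain (weight normalization, Salimans & Kingma) expresses a weight vector as a learned direction times a learned scale; a minimal NumPy sketch of the idea, not the Keras code itself:

```python
import numpy as np

def weightnorm(V, g):
    """Weight normalization: W = g * V / ||V||, decoupling direction from magnitude."""
    return g * V / np.linalg.norm(V)

V = np.array([3.0, 4.0])   # direction parameter
g = 2.0                    # scalar magnitude parameter
W = weightnorm(V, g)
assert np.isclose(np.linalg.norm(W), g)  # ||W|| equals g regardless of ||V||
```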

FILE: khatri_rao_benchmark.py
  function benchmark_construct (line 10) | def benchmark_construct(dims, iters, dtype):
  function benchmark_execute (line 20) | def benchmark_execute(dims, iters, dtype):

FILE: lazy_dog.py
  function argmax (line 30) | def argmax(t):
  function decode (line 33) | def decode(start_tokens, length=10):
  function main (line 59) | def main():
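lazy_dog.py pairs an `argmax` helper with a `decode(start_tokens, length)` loop, i.e. greedy decoding. A toy sketch of that loop over an assumed per-step scoring function (the scorer here is made up for illustration):

```python
def argmax(scores):
    """Index of the largest score."""
    return max(range(len(scores)), key=scores.__getitem__)

def decode(start_tokens, score_fn, length=10):
    """Greedy decoding: repeatedly append the highest-scoring next token."""
    tokens = list(start_tokens)
    for _ in range(length):
        tokens.append(argmax(score_fn(tokens)))
    return tokens

# toy scorer: the best next token is (last token + 1) mod 4
score_fn = lambda toks: [1.0 if i == (toks[-1] + 1) % 4 else 0.0 for i in range(4)]
assert decode([0], score_fn, length=3) == [0, 1, 2, 3]
```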

FILE: linalg-benchmark/benchmark.py
  function main (line 45) | def main():
  function get_tensorflow_version_url (line 125) | def get_tensorflow_version_url():
  function get_mkl_version (line 139) | def get_mkl_version():
  function traced_run (line 157) | def traced_run(fetches):
  function benchmark (line 179) | def benchmark(message, func):
  function print_cpu_info (line 204) | def print_cpu_info():
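benchmark.py wraps each linear-algebra operation in a `benchmark(message, func)` timer. A generic sketch of such a wrapper, reporting the best of several runs (the repository's version additionally traces TensorFlow executions):

```python
import time

def benchmark(message, func, iters=5):
    """Run func() iters times and report the best wall-clock time."""
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        func()
        times.append(time.perf_counter() - start)
    best = min(times)  # best-of-n suppresses warmup and scheduler noise
    print(f"{message}: {best * 1000:.3f} ms")
    return best

best = benchmark("sum of squares", lambda: sum(i * i for i in range(10000)))
assert best >= 0.0
```

Taking the minimum rather than the mean is the usual choice for microbenchmarks, since external interference can only inflate timings.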

FILE: linalg-benchmark/launch.py
  function launch (line 18) | def launch(instance):
  function main (line 36) | def main():

FILE: linalg-benchmark/launch_tensorflow_svd_crash.py
  function main (line 13) | def main():

FILE: line_search_example/line_search_example.py
  function W_uniform (line 27) | def W_uniform(s1, s2):
  function f (line 66) | def f(i): return fs[i+1]  # W[i] has shape f[i] x f[i-1]
  function init_var (line 71) | def init_var(val, name, trainable=False):
  function sigmoid (line 85) | def sigmoid(x):
  function d_sigmoid (line 87) | def d_sigmoid(y):
  function kl (line 89) | def kl(x, y):
  function d_kl (line 91) | def d_kl(x, y):
  function save_wf (line 148) | def save_wf(): sess.run(Wf_save_op)
  function restore_wf (line 149) | def restore_wf(): sess.run(Wf_restore_op)
  function save_grad (line 150) | def save_grad(): sess.run(grad_save_op)
  function step_wf (line 151) | def step_wf(step):
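line_search_example.py defines `sigmoid(x)` next to `d_sigmoid(y)`, where by the usual convention the derivative is expressed in terms of the already-computed activation y = sigmoid(x). A sketch of that identity, sigma'(x) = y(1 - y), checked against a finite difference:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_sigmoid(y):
    # derivative expressed in terms of the output y = sigmoid(x)
    return y * (1.0 - y)

x = 0.3
y = sigmoid(x)
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
assert abs(d_sigmoid(y) - numeric) < 1e-8
```

Reusing the forward-pass output this way avoids recomputing the exponential during backprop.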

FILE: line_search_example/util.py
  function set_global_args (line 35) | def set_global_args(local_args):
  function concat_blocks (line 40) | def concat_blocks(blocks, validate_dims=True):
  function concat_blocks_test (line 55) | def concat_blocks_test():
  function partition_matrix_evenly (line 64) | def partition_matrix_evenly(mat, splits):
  function partition_matrix_evenly_test (line 73) | def partition_matrix_evenly_test():
  function partition_matrix (line 81) | def partition_matrix(mat, sizes):
  function partition_matrix_test (line 84) | def partition_matrix_test():
  function pseudo_inverse (line 89) | def pseudo_inverse(mat, eps=1e-10):
  function symsqrt (line 97) | def symsqrt(mat, eps=1e-7):
  function pseudo_inverse_sqrt (line 105) | def pseudo_inverse_sqrt(mat, eps=1e-7):
  function pseudo_inverse_sqrt2 (line 112) | def pseudo_inverse_sqrt2(svd, eps=1e-7):
  function pseudo_inverse2 (line 124) | def pseudo_inverse2(svd, eps=1e-7):
  function pseudo_inverse_stable (line 138) | def pseudo_inverse_stable(svd, eps=1e-7):
  function regularized_inverse (line 153) | def regularized_inverse(mat, l=0.1):
  function regularized_inverse2 (line 157) | def regularized_inverse2(svd, L=1e-3):
  function regularized_inverse3 (line 169) | def regularized_inverse3(svd, L=1e-3):
  function regularized_inverse4 (line 185) | def regularized_inverse4(svd, L=1e-3):
  function pseudo_inverse_scipy (line 201) | def pseudo_inverse_scipy(tensor):
  function Identity (line 211) | def Identity(n, dtype=None, name=None):
  function ones (line 221) | def ones(n, dtype=None, name=None):
  function partition_list_np (line 227) | def partition_list_np(vec, sizes):
  function chunks (line 237) | def chunks(l, n):
  function partition_list (line 242) | def partition_list(l, sizes):
  function partition_list_test (line 253) | def partition_list_test():
  function v2c (line 261) | def v2c(vec):
  function v2c_np (line 266) | def v2c_np(vec):
  function v2r (line 271) | def v2r(vec):
  function c2v (line 276) | def c2v(col):
  function unvectorize_np (line 283) | def unvectorize_np(vec, rows):
  function unvec (line 290) | def unvec(vec, rows):
  function unvec_test (line 300) | def unvec_test():
  function vectorize_np (line 306) | def vectorize_np(mat):
  function vec (line 309) | def vec(mat):
  function vec_test (line 313) | def vec_test():
  function Kmat (line 319) | def Kmat(rows, cols):
  function Kmat_test (line 334) | def Kmat_test():
  function unflatten_np (line 353) | def unflatten_np(Wf, fs):
  function flatten_np (line 364) | def flatten_np(Ws):
  function flatten_np_test (line 367) | def flatten_np_test():
  function unflatten (line 374) | def unflatten(Wf, fs):
  function unflatten_test (line 388) | def unflatten_test():
  function flatten (line 397) | def flatten(Ws):
  function flatten_test (line 401) | def flatten_test():
  function check_close (line 409) | def check_close(a0, b0):
  function check_equal (line 412) | def check_equal(a0, b0, rtol=1e-9, atol=1e-12):
  function fix_shape (line 439) | def fix_shape(tf_shape):
  function kronecker_cols (line 442) | def kronecker_cols(a, b):
  function kronecker_cols_test (line 454) | def kronecker_cols_test():
  function kronecker (line 462) | def kronecker(A, B, do_shape_inference=True):
  function kronecker_test (line 494) | def kronecker_test():
  function col (line 508) | def col(A,i):
  function khatri_rao (line 515) | def khatri_rao(A, B):
  function khatri_rao_test (line 522) | def khatri_rao_test():
  function relu_mask (line 530) | def relu_mask(a, dtype=default_dtype):
  function relu_mask_test (line 536) | def relu_mask_test():
  function assert_rectangular (line 541) | def assert_rectangular(blocks):
  function empty_grid (line 545) | def empty_grid(rows, cols):
  function block_diagonal_inverse (line 552) | def block_diagonal_inverse(blocks):
  function block_diagonal_inverse_sqrt (line 572) | def block_diagonal_inverse_sqrt(blocks):
  function block_diagonal_inverse_test (line 591) | def block_diagonal_inverse_test():
  function t (line 601) | def t(x):
  function reset_time (line 608) | def reset_time():
  function record_time (line 613) | def record_time():
  function last_time (line 619) | def last_time():
  function summarize_time (line 626) | def summarize_time(time_list=None):
  function summarize_graph (line 643) | def summarize_graph(g=None):
  function disable_shape_inference (line 651) | def disable_shape_inference():
  function enable_shape_inference (line 654) | def enable_shape_inference():
  function dump_with_prompt (line 658) | def dump_with_prompt(result, fname, no_prefix=False):
  function dump (line 673) | def dump(result, fname, no_prefix=False):
  function dump32 (line 695) | def dump32(result, fname):
  function frobenius_np (line 705) | def frobenius_np(a):
  function nan_check (line 708) | def nan_check(result):
  function L2 (line 714) | def L2(t):
  class timeit (line 724) | class timeit:
    method __init__ (line 725) | def __init__(self, tag=""):
    method __enter__ (line 728) | def __enter__(self):
    method __exit__ (line 732) | def __exit__(self, *args):
  function timeit_summarize (line 741) | def timeit_summarize():
  function parents (line 749) | def parents(op): return set(input.op for input in op.inputs)
  function children (line 750) | def children(op): return set(op for out in op.outputs for op in out.cons...
  function dict_graph (line 751) | def dict_graph():
  function nx_graph (line 757) | def nx_graph():
  function shortest_path (line 760) | def shortest_path(dep, target):
  function list_or_tuple (line 767) | def list_or_tuple(k):
  function is_numeric (line 770) | def is_numeric(ndarray):
  class VarInfo (line 774) | class VarInfo:
    method __init__ (line 776) | def __init__(self, setter, p):
  class SvdTuple (line 780) | class SvdTuple:
    method __init__ (line 784) | def __init__(self, suvi, *args):
  class SvdWrapper (line 805) | class SvdWrapper:
    method __init__ (line 812) | def __init__(self, target, name, do_inverses=False):
    method update (line 872) | def update(self):
    method update_tf (line 879) | def update_tf(self):
    method update_scipy (line 883) | def update_scipy(self):
    method update_scipy_inv (line 889) | def update_scipy_inv(self):
    method update_scipy_svd (line 896) | def update_scipy_svd(self):
  function extract_grad (line 917) | def extract_grad(grads_and_vars, var):
  function intersept_op_creation (line 929) | def intersept_op_creation(op_type_name_to_intercept):
  function get_variable (line 942) | def get_variable(name, initializer, reuse=True):
  class VarStruct (line 956) | class VarStruct:
    method __init__ (line 971) | def __init__(self, initial_value, name, dtype=None):
    method set (line 989) | def set(self, val):
    method initialize (line 993) | def initialize(self):
  function get_var (line 999) | def get_var(name, initializer, reuse=True):
  function run_all_tests (line 1020) | def run_all_tests(module):
  function capture_ops (line 1030) | def capture_ops():
  function capture_vars (line 1047) | def capture_vars():
  function Print (line 1063) | def Print(op):
  function get_host_prefix (line 1067) | def get_host_prefix():
  function summarize_difference (line 1071) | def summarize_difference(source, target):
  class BufferedWriter (line 1079) | class BufferedWriter:
    method __init__ (line 1082) | def __init__(self, outfn, save_every_secs=60*5):
    method write (line 1088) | def write(self, line):
    method flush (line 1097) | def flush():
  function ossystem (line 1103) | def ossystem(line):
  function setup_experiment_run_directory (line 1107) | def setup_experiment_run_directory(run, safe_mode=True):
  function get_last_logger (line 1132) | def get_last_logger():
  class TensorboardLogger (line 1137) | class TensorboardLogger:
    method __init__ (line 1146) | def __init__(self, run, step=0):
    method __call__ (line 1161) | def __call__(self, *args):
    method next_step (line 1166) | def next_step(self):
  function as_int32 (line 1176) | def as_int32(v):
  function add_dep (line 1180) | def add_dep(from_op, on_op):
  function register_default_session (line 1185) | def register_default_session(local_sess):
  function get_default_session (line 1190) | def get_default_session():
  function eval (line 1195) | def eval(tensor):

FILE: linearize/linearize.py
  function run_after (line 11) | def run_after(a, b):
  function initialize_control_outputs (line 26) | def initialize_control_outputs(g):
  function nodesort (line 46) | def nodesort(ops):
  function parents_with_controls (line 51) | def parents_with_controls(op):
  function parents (line 57) | def parents(op):
  function children (line 61) | def children(op):
  function children_with_controls (line 65) | def children_with_controls(op):
  function get_graph (line 78) | def get_graph(g=None, as_hashes=False, exclude_controls=False):
  function print_tf_graph (line 103) | def print_tf_graph(graph):
  function memsorted (line 111) | def memsorted(nodes):
  function is_iterable (line 130) | def is_iterable(o):
  function linearize (line 140) | def linearize(targets=None, modify_graph=True):
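linearize.py forces a single execution order onto a TensorFlow graph by adding control dependencies; the core step is a topological sort of the ops. A minimal Kahn's-algorithm sketch over a plain parent dict, standing in for the graph utilities listed above:

```python
from collections import deque

def toposort(parents):
    """Kahn's algorithm. parents maps node -> set of nodes it depends on."""
    indegree = {n: len(ps) for n, ps in parents.items()}
    children = {n: set() for n in parents}
    for n, ps in parents.items():
        for p in ps:
            children[p].add(n)
    ready = deque(n for n, d in indegree.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    assert len(order) == len(parents), "graph has a cycle"
    return order

# diamond: a -> b -> d and a -> c -> d
order = toposort({'a': set(), 'b': {'a'}, 'c': {'a'}, 'd': {'b', 'c'}})
assert order.index('a') < order.index('b') < order.index('d')
```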

FILE: linearize/linearize_test.py
  function create_session (line 13) | def create_session():
  function setup_env (line 17) | def setup_env():
  function make_caterpillar_graph (line 34) | def make_caterpillar_graph(length=5, node_mbs=1):
  function test_print (line 64) | def test_print():
  function test_toposort (line 80) | def test_toposort():
  function test_linearize (line 86) | def test_linearize():

FILE: linearize/memory_util.py
  function vlog (line 9) | def vlog(level):
  class TemporaryFileHelper (line 13) | class TemporaryFileHelper:
    method __init__ (line 15) | def __init__(self, temporary_file):
    method getvalue (line 17) | def getvalue(self):
  class capture_stderr (line 23) | class capture_stderr:
    method __init__ (line 31) | def __init__(self, fd=STDERR):
    method __enter__ (line 35) | def __enter__(self):
    method __exit__ (line 41) | def __exit__(self, exc_type, exc_value, traceback):
  function _parse_logline (line 83) | def _parse_logline(l):
  function memory_timeline (line 138) | def memory_timeline(log):
  function peak_memory (line 201) | def peak_memory(log, gpu_only=False):
  function print_memory_timeline (line 215) | def print_memory_timeline(log, gpu_only=False, ignore_less_than_bytes=0):
  function plot_memory_timeline (line 230) | def plot_memory_timeline(log, gpu_only=False, ignore_less_than_bytes=1000):
  function smart_initialize (line 257) | def smart_initialize(variables=None, sess=None):
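memory_util.py reconstructs a memory timeline and peak from TensorFlow allocator log lines. The arithmetic at its heart is a running total over signed allocation deltas; a sketch over an assumed stream of byte deltas (the real code first parses these out of `VLOG` output):

```python
def peak_memory(deltas):
    """deltas: iterable of signed byte counts (alloc > 0, dealloc < 0). Returns peak usage."""
    total = peak = 0
    for d in deltas:
        total += d
        peak = max(peak, total)
    return peak

# running totals: 100, 300, 250, 550, 150 -> peak 550
assert peak_memory([100, 200, -50, 300, -400]) == 550
```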

FILE: matmul_benchmark_seq.py
  class timespec (line 21) | class timespec(ctypes.Structure):
  function clock_gettime (line 30) | def clock_gettime(clk_id):
  function bench (line 66) | def bench(n):
  function main (line 119) | def main():
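matmul_benchmark_seq.py reaches for `clock_gettime` through ctypes; on modern Python the stdlib exposes a monotonic high-resolution clock directly, which is enough to sketch the same matmul timing loop (matrix sizes here are illustrative):

```python
import time
import numpy as np

def bench_matmul(n, iters=3):
    """Time n x n float32 matrix multiplies; returns best seconds per matmul."""
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    best = float('inf')
    for _ in range(iters):
        t0 = time.perf_counter()   # monotonic, high resolution
        a @ b
        best = min(best, time.perf_counter() - t0)
    return best

elapsed = bench_matmul(64)
assert elapsed > 0.0
```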

FILE: natural_gradient_multilayer.py
  function gd_test (line 27) | def gd_test():
  function gd_manual_test (line 118) | def gd_manual_test():
  function gd_manual_vectorized_test (line 235) | def gd_manual_vectorized_test():
  function fisher_test (line 357) | def fisher_test():
  function natural_gradient_test (line 490) | def natural_gradient_test():
  function newton_test (line 619) | def newton_test():
  function relu_manual_vectorized_test (line 790) | def relu_manual_vectorized_test():
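natural_gradient_multilayer.py compares plain gradient descent against Fisher-preconditioned (natural gradient) and Newton steps. The natural-gradient update solves the Fisher system rather than using the raw gradient; a NumPy sketch on a toy diagonal Fisher (the Fisher and gradient values here are made up for illustration):

```python
import numpy as np

def natural_gradient_step(w, grad, fisher, lr=1.0, damping=1e-3):
    """w <- w - lr * (F + damping*I)^-1 grad"""
    F = fisher + damping * np.eye(len(w))
    return w - lr * np.linalg.solve(F, grad)

F = np.array([[4.0, 0.0], [0.0, 1.0]])   # toy Fisher: curvature 4x steeper in dim 0
g = np.array([4.0, 1.0])
w = np.array([1.0, 1.0])
w_new = natural_gradient_step(w, g, F)
# preconditioning equalizes the step across dimensions (up to damping)
assert np.allclose(w_new, [0.0, 0.0], atol=1e-2)
```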

FILE: notebook_util.py
  function run_command (line 8) | def run_command(cmd):
  function list_available_gpus (line 14) | def list_available_gpus():
  function gpu_memory_map (line 27) | def gpu_memory_map():
  function pick_gpu_lowest_memory (line 46) | def pick_gpu_lowest_memory():
  function setup_one_gpu (line 53) | def setup_one_gpu():
  function setup_no_gpu (line 60) | def setup_no_gpu():
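notebook_util.py selects the least-loaded GPU by parsing `nvidia-smi` output into a memory map. A sketch of the parsing-and-selection step on canned input; the line format and regex here are illustrative, not nvidia-smi's actual output:

```python
import re

def gpu_memory_map(smi_output):
    """Parse lines like 'GPU 0: 1234 MiB' into {gpu_id: used_mib} (format assumed)."""
    result = {}
    for line in smi_output.splitlines():
        m = re.match(r"GPU (\d+): (\d+) MiB", line.strip())
        if m:
            result[int(m.group(1))] = int(m.group(2))
    return result

def pick_gpu_lowest_memory(smi_output):
    """Return the id of the GPU with the least memory in use."""
    mem = gpu_memory_map(smi_output)
    return min(mem, key=mem.get)

sample = "GPU 0: 4120 MiB\nGPU 1: 312 MiB\nGPU 2: 2048 MiB"
assert pick_gpu_lowest_memory(sample) == 1
```

In a notebook the chosen id would typically be written into `CUDA_VISIBLE_DEVICES` before any framework import.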

FILE: numpy_initializers/kfac_cifar.py
  function W_uniform (line 104) | def W_uniform(s1, s2): # uniform weight init from Ng UFLDL
  function ng_init (line 110) | def ng_init(rows, cols):
  function sessrun (line 145) | def sessrun(*args, **kwargs):
  function model_creator (line 172) | def model_creator(batch_size, name="default", dtype=np.float32):
  function main (line 398) | def main():

FILE: numpy_initializers/util.py
  function set_global_args (line 36) | def set_global_args(local_args):
  function concat_blocks (line 41) | def concat_blocks(blocks, validate_dims=True):
  function concat_blocks_test (line 56) | def concat_blocks_test():
  function partition_matrix_evenly (line 65) | def partition_matrix_evenly(mat, splits):
  function partition_matrix_evenly_test (line 74) | def partition_matrix_evenly_test():
  function partition_matrix (line 82) | def partition_matrix(mat, sizes):
  function partition_matrix_test (line 85) | def partition_matrix_test():
  function pseudo_inverse (line 90) | def pseudo_inverse(mat, eps=1e-10):
  function symsqrt (line 98) | def symsqrt(mat, eps=1e-7):
  function pseudo_inverse_sqrt (line 106) | def pseudo_inverse_sqrt(mat, eps=1e-7):
  function pseudo_inverse_sqrt2 (line 113) | def pseudo_inverse_sqrt2(svd, eps=1e-7):
  function pseudo_inverse2 (line 125) | def pseudo_inverse2(svd, eps=1e-7):
  function pseudo_inverse_stable (line 139) | def pseudo_inverse_stable(svd, eps=1e-7):
  function regularized_inverse (line 154) | def regularized_inverse(mat, l=0.1):
  function regularized_inverse2 (line 158) | def regularized_inverse2(svd, L=1e-3):
  function regularized_inverse3 (line 170) | def regularized_inverse3(svd, L=1e-3):
  function regularized_inverse4 (line 186) | def regularized_inverse4(svd, L=1e-3):
  function pseudo_inverse_scipy (line 202) | def pseudo_inverse_scipy(tensor):
  function Identity (line 212) | def Identity(n, dtype=None, name=None):
  function ones (line 222) | def ones(n, dtype=None, name=None):
  function partition_list_np (line 228) | def partition_list_np(vec, sizes):
  function chunks (line 238) | def chunks(l, n):
  function partition_list (line 243) | def partition_list(l, sizes):
  function partition_list_test (line 254) | def partition
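The util.py index above lists several eps-thresholded inverse helpers (`pseudo_inverse(mat, eps=1e-10)`, `pseudo_inverse_sqrt`, `regularized_inverse`). As a hedged illustration of the common pattern behind these names — invert via SVD, zeroing singular values below a cutoff — here is a minimal NumPy sketch. It is an assumption based only on the signatures shown; the repository's actual implementations are not reproduced here and may differ (e.g. they may operate on TensorFlow tensors).

```python
# Illustrative sketch (not the repo's code): SVD-based pseudo-inverse with an
# eps cutoff, matching the signature pseudo_inverse(mat, eps=1e-10) in the index.
import numpy as np

def pseudo_inverse(mat, eps=1e-10):
    """Moore-Penrose pseudo-inverse; singular values <= eps are treated as zero."""
    u, s, vt = np.linalg.svd(mat)
    # Invert only the well-conditioned directions; drop the near-null space.
    s_inv = np.where(s > eps, 1.0 / s, 0.0)
    return vt.T @ np.diag(s_inv) @ u.T
```

For an invertible matrix this coincides with the ordinary inverse; for a rank-deficient one it inverts only the nonzero singular directions, which is what makes such helpers usable as building blocks for natural-gradient-style preconditioners.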
Condensed preview — 266 files, each showing path, character count, and a content snippet. Download the .json file for the full structured content (16,799K chars).
[
  {
    "path": ".gitignore",
    "chars": 269,
    "preview": "/__pycache__\n/.ipynb_checkpoints\n*#\n*~\n/linalg-benchmark/.idea/linalg-benchmark.iml\n/linalg-benchmark/.idea/misc.xml\n/li"
  },
  {
    "path": "README.md",
    "chars": 8,
    "preview": "# stuff\n"
  },
  {
    "path": "akaitsuki-slow/config.py",
    "chars": 2294,
    "preview": "import argparse\n \n \ndef str2bool(v):\n    return v.lower() in ('y', 'yes', 't', 'true', '1')\n \n \ndef get_args():\n    pars"
  },
  {
    "path": "akaitsuki-slow/feed_dict.pbtxt",
    "chars": 3595,
    "preview": "step_stats {\n  dev_stats {\n    device: \"/job:localhost/replica:0/task:0/cpu:0\"\n    node_stats {\n      node_name: \"_SOURC"
  },
  {
    "path": "akaitsuki-slow/feed_dict.py",
    "chars": 500,
    "preview": "import numpy as np\nimport tensorflow as tf\nfrom tensorflow.python.client import timeline \n \n\nsess = tf.Session()\na = tf."
  },
  {
    "path": "akaitsuki-slow/main.py",
    "chars": 9022,
    "preview": "import logging\nimport time\nimport config\nimport numpy as np\nimport tensorflow as tf\nfrom tensorflow.python.ops import ar"
  },
  {
    "path": "autotune/README.md",
    "chars": 177,
    "preview": "To run tests in this directory\n\n```\npytest\n```\n\nIf there's a slow test, you can run this file directly to see timings of"
  },
  {
    "path": "autotune/autograd_lib.py",
    "chars": 59417,
    "preview": "\"\"\"\nLibrary for extracting interesting quantites from autograd.\n\nNot thread-safe because of module-level variables affec"
  },
  {
    "path": "autotune/autograd_lib_test.py",
    "chars": 33262,
    "preview": "import sys\nfrom collections import namedtuple, defaultdict\n\nimport autograd_lib\nimport pytest\n\nimport util as u\n\n# Test "
  },
  {
    "path": "autotune/autograd_test.py",
    "chars": 36569,
    "preview": "# Tests that compare manual computation of quantities against PyTorch autograd\n\nimport os\nimport sys\n\nimport globals as "
  },
  {
    "path": "autotune/ciresan_bench.py",
    "chars": 3218,
    "preview": "import os\nimport sys\nimport time\nfrom typing import Optional, Tuple, Callable\n\n# import torch\nimport scipy\nimport torch\n"
  },
  {
    "path": "autotune/curvature_test.py",
    "chars": 18263,
    "preview": "# Prototype batch-size quantities from\n# Batch size formulas (https://docs.google.com/document/d/19Jmh4spbSAnAGX_eq7WSFP"
  },
  {
    "path": "autotune/eval_conv2d_approx.py",
    "chars": 6251,
    "preview": "\"\"\"Evaluate approximation quality of factoring on conv2d layers.\n\nEvaluates discrepancy in magnitude (l2 norm) and value"
  },
  {
    "path": "autotune/factored_test.py",
    "chars": 5713,
    "preview": "\"\"\"Test factored implementation of stats\"\"\"\n\nimport argparse\nimport os\nimport sys\nimport time\n\nimport autograd_lib\nimpor"
  },
  {
    "path": "autotune/globals.py",
    "chars": 1312,
    "preview": "# Module to hold global variables for curvature computation functions.\n# This is needed sincne functionality may be spli"
  },
  {
    "path": "autotune/hessian_test.py",
    "chars": 23219,
    "preview": "# Test exact Hessian computation\n\n# import torch\nimport sys\nfrom typing import Callable\n\nimport torch\nimport torch.nn as"
  },
  {
    "path": "autotune/linalg_bench.py",
    "chars": 3102,
    "preview": "import os\nimport sys\nimport time\nfrom typing import Optional, Tuple, Callable\n\n# import torch\nimport scipy\nimport torch\n"
  },
  {
    "path": "autotune/linesearch_test_disabled.py",
    "chars": 18470,
    "preview": "# Take simple MNIST model, test that line-search in Newton direction finds optimum\n# Additionally test Hessian manual vs"
  },
  {
    "path": "autotune/lyapunov_test.py",
    "chars": 7325,
    "preview": "import os\nimport sys\nimport time\nfrom typing import Optional, Tuple, Callable\n\n# import torch\nimport scipy\nimport torch\n"
  },
  {
    "path": "autotune/mnist_end2end_test.py",
    "chars": 15487,
    "preview": "import argparse\nimport os\nimport time\n\nimport autograd_lib\nimport globals as gl\n# import torch\nimport scipy\nimport torch"
  },
  {
    "path": "autotune/plotting_test.py",
    "chars": 22535,
    "preview": "# Plot simple minimization problem in wandb\n\nimport argparse\nimport json\nimport os\nimport random\nimport shutil\nimport sy"
  },
  {
    "path": "autotune/pytorch_benchmark.py",
    "chars": 3138,
    "preview": "\"\"\"\n(pytorch_p36) [ec2-user@ip-172-31-6-232 cifar]$ python pytorch_benchmark.py\nMKL version b'Intel(R) Math Kernel Libra"
  },
  {
    "path": "autotune/scipy_benchmark.py",
    "chars": 3486,
    "preview": "\"\"\"\nMKL version b'Intel(R) Math Kernel Library Version 2019.0.4 Product Build 20190411 for Intel(R) 64 architecture appl"
  },
  {
    "path": "autotune/svd_benchmark.py",
    "chars": 2334,
    "preview": "# Fastest way to compute eigenvectors for 4k matrix?\n#\n# Inverse on i3.metal\n# n=4096: 368 ms ± 1.51 ms per loop\n# \n# Xe"
  },
  {
    "path": "autotune/test/gesvd_crash.txt",
    "chars": 10437248,
    "preview": "6.559017896652221680e-01,0.000000000000000000e+00,0.000000000000000000e+00,0.000000000000000000e+00,0.000000000000000000"
  },
  {
    "path": "autotune/train_ciresan.py",
    "chars": 13925,
    "preview": "# Train Ciresan's 6-layer deep MNIST network\n# (from http://yann.lecun.com/exdb/mnist/)\n\nimport argparse\nimport os\nimpor"
  },
  {
    "path": "autotune/train_ciresan_cca.py",
    "chars": 10483,
    "preview": "# Train Ciresan's 6-layer deep MNIST network\n# (from http://yann.lecun.com/exdb/mnist/)\n\nimport argparse\nimport os\nimpor"
  },
  {
    "path": "autotune/train_ciresan_factored.py",
    "chars": 9679,
    "preview": "# Train Ciresan's 6-layer deep MNIST network\n# (from http://yann.lecun.com/exdb/mnist/)\n\nimport argparse\nimport os\nimpor"
  },
  {
    "path": "autotune/train_ciresan_new.py",
    "chars": 35478,
    "preview": "# TODO(y): FRI -- go over all formulas and results from ciresan run\n# TODO(y): add angle historgam\n# Train Ciresan's 6-l"
  },
  {
    "path": "autotune/train_medium.py",
    "chars": 15289,
    "preview": "import argparse\nimport os\nimport time\n\nimport autograd_lib\nimport globals as gl\n# import torch\nimport scipy\nimport torch"
  },
  {
    "path": "autotune/train_small.py",
    "chars": 16730,
    "preview": "# To verify Newton convergence in 1 step\n# python train_tiny.py --wandb=0 --method=newton --nonlin=0 --layer=0\n\nimport a"
  },
  {
    "path": "autotune/train_small_xent.py",
    "chars": 11858,
    "preview": "\"\"\"Train small network on MNIST with Cross-Entropy loss\"\"\"\n\nimport argparse\nimport os\nimport time\n\nimport autograd_lib\ni"
  },
  {
    "path": "autotune/train_small_xent_factored.py",
    "chars": 7464,
    "preview": "\"\"\"Train small network on MNIST with Cross-Entropy loss\"\"\"\n\nimport argparse\nimport os\nimport time\n\nimport autograd_lib\ni"
  },
  {
    "path": "autotune/train_tiny.py",
    "chars": 12087,
    "preview": "# To verify Newton convergence in 1 step\n# python train_tiny.py --wandb=0 --method=newton --nonlin=0 --layer=0\n\nimport a"
  },
  {
    "path": "autotune/train_tiny_xent.py",
    "chars": 11406,
    "preview": "\"\"\"Train small network on MNIST with Cross-Entropy loss\"\"\"\n\nimport argparse\nimport os\nimport time\n\nimport autograd_lib\ni"
  },
  {
    "path": "autotune/util.py",
    "chars": 86819,
    "preview": "# Take simple example, plot per-layer stats over time\n# This function allows you to visualize the statistics of a layer."
  },
  {
    "path": "autotune/util_test.py",
    "chars": 17375,
    "preview": "import math\nimport os\nimport sys\n\n# import torch\nimport pytest\nimport scipy\nfrom scipy import linalg\nimport torch\n\nimpor"
  },
  {
    "path": "aws-recipes.ipynb",
    "chars": 89439,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Init\"\n   ]\n  },\n  {\n   \"cell_type"
  },
  {
    "path": "aws-scratch.ipynb",
    "chars": 630237,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# Init\"\n   ]\n  },\n  {\n   \"cell_type"
  },
  {
    "path": "benchmark_huggingface_predict.py",
    "chars": 3396,
    "preview": "# Simple benchmark to time prediction using huggingface API\n# 150ms prediction on first word (18 word context)\n# 50ms pr"
  },
  {
    "path": "bin/tfversion",
    "chars": 506,
    "preview": "#!/usr/bin/env python\nimport os\nos.environ['TF_CPP_MIN_LOG_LEVEL']='2'\nimport tensorflow as tf\nversion=tf.__version__\npr"
  },
  {
    "path": "clipping-profile.ipynb",
    "chars": 48151,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {\n    \"collapsed\": true\n   },\n   \"source\": [\n    \"# Memory "
  },
  {
    "path": "cluster/.gitignore",
    "chars": 11,
    "preview": "/.DS_Store\n"
  },
  {
    "path": "cluster/README.md",
    "chars": 23,
    "preview": "# cluster\ntrain on AWS\n"
  },
  {
    "path": "cluster/async_adder.py",
    "chars": 12281,
    "preview": "#!/usr/bin/env python\nimport base64\nimport os\nimport portpicker\nimport subprocess\nimport sys\nimport tensorflow as tf\nimp"
  },
  {
    "path": "cluster/aws.py",
    "chars": 13143,
    "preview": "import threading\nimport base64\nimport struct\nfrom collections import OrderedDict\nfrom pprint import pprint as pp\nimport "
  },
  {
    "path": "cluster/benchmark_grpc_recv.py",
    "chars": 6388,
    "preview": "#!/usr/bin/env python\n#\n# Dependencies:\n# portpicker (pip install portpicker)\n# tcmalloc4 (sudo apt-get install google-p"
  },
  {
    "path": "cluster/benchmarks/.gitignore",
    "chars": 6,
    "preview": "*.pyc\n"
  },
  {
    "path": "cluster/benchmarks/LICENSE",
    "chars": 11357,
    "preview": "                                 Apache License\n                           Version 2.0, January 2004\n                   "
  },
  {
    "path": "cluster/benchmarks/README.md",
    "chars": 1806,
    "preview": "# Instructions for adding distributed benchmarks to continuous run:\n\n1. You can add your benchmark file under\n   [tensor"
  },
  {
    "path": "cluster/benchmarks/bower_components/d3/.bower.json",
    "chars": 691,
    "preview": "{\n  \"name\": \"d3\",\n  \"version\": \"3.5.5\",\n  \"main\": \"d3.js\",\n  \"scripts\": [\n    \"d3.js\"\n  ],\n  \"ignore\": [\n    \".DS_Store\""
  },
  {
    "path": "cluster/benchmarks/bower_components/d3/.gitattributes",
    "chars": 138,
    "preview": "bower.json -diff merge=ours\ncomponent.json -diff merge=ours\nd3.js -diff merge=ours\nd3.min.js -diff merge=ours\npackage.js"
  },
  {
    "path": "cluster/benchmarks/bower_components/d3/CONTRIBUTING.md",
    "chars": 2776,
    "preview": "# Contributing\n\n**Important:** these GitHub issues are for *bug reports and feature requests only*. Please use [StackOve"
  },
  {
    "path": "cluster/benchmarks/bower_components/d3/LICENSE",
    "chars": 1429,
    "preview": "Copyright (c) 2010-2015, Michael Bostock\nAll rights reserved.\n\nRedistribution and use in source and binary forms, with o"
  },
  {
    "path": "cluster/benchmarks/bower_components/d3/README.md",
    "chars": 698,
    "preview": "# Data-Driven Documents\n\n<a href=\"http://d3js.org\"><img src=\"http://d3js.org/logo.svg\" align=\"left\" hspace=\"10\" vspace=\""
  },
  {
    "path": "cluster/benchmarks/bower_components/d3/bower.json",
    "chars": 372,
    "preview": "{\n  \"name\": \"d3\",\n  \"version\": \"3.5.5\",\n  \"main\": \"d3.js\",\n  \"scripts\": [\n    \"d3.js\"\n  ],\n  \"ignore\": [\n    \".DS_Store\""
  },
  {
    "path": "cluster/benchmarks/bower_components/d3/d3.js",
    "chars": 335277,
    "preview": "!function() {\n  var d3 = {\n    version: \"3.5.5\"\n  };\n  var d3_arraySlice = [].slice, d3_array = function(list) {\n    ret"
  },
  {
    "path": "cluster/benchmarks/bower_components/d3/package.js",
    "chars": 365,
    "preview": "// Package metadata for Meteor.js.\n\nPackage.describe({\n  name: \"d3js:d3\", // http://atmospherejs.com/d3js/d3\n  summary: "
  },
  {
    "path": "cluster/benchmarks/bower_components/plottable/.bower.json",
    "chars": 1250,
    "preview": "{\n  \"name\": \"plottable\",\n  \"description\": \"A modular charting library built on D3\",\n  \"version\": \"2.2.0\",\n  \"main\": [\n  "
  },
  {
    "path": "cluster/benchmarks/bower_components/plottable/bower.json",
    "chars": 969,
    "preview": "{\n  \"name\": \"plottable\",\n  \"description\": \"A modular charting library built on D3\",\n  \"version\": \"2.2.0\",\n  \"main\": [\n  "
  },
  {
    "path": "cluster/benchmarks/bower_components/plottable/plottable.css",
    "chars": 4333,
    "preview": "\n.plottable-colors-0 {\n  background-color: #5279c7; /* INDIGO */\n}\n\n.plottable-colors-1 {\n  background-color: #fd373e; /"
  },
  {
    "path": "cluster/benchmarks/bower_components/plottable/plottable.d.ts",
    "chars": 194266,
    "preview": "declare namespace Plottable.Utils.Math {\n    /**\n     * Checks if x is between a and b.\n     *\n     * @param {number} x "
  },
  {
    "path": "cluster/benchmarks/bower_components/plottable/plottable.js",
    "chars": 633633,
    "preview": "/*!\nPlottable 2.2.0 (https://github.com/palantir/plottable)\nCopyright 2014-2015 Palantir Technologies\nLicensed under MIT"
  },
  {
    "path": "cluster/benchmarks/dashboard_app/app.yaml",
    "chars": 123,
    "preview": "runtime: python\nenv: flex\nentrypoint: gunicorn -b :$PORT main:app\nservice: benchmarks\n\nruntime_config:\n  python_version:"
  },
  {
    "path": "cluster/benchmarks/dashboard_app/main.py",
    "chars": 5504,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/dashboard_app/main_test.py",
    "chars": 2934,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/dashboard_app/requirements.txt",
    "chars": 44,
    "preview": "Flask==0.12.2\ngunicorn==19.7.1\ngoogle-cloud\n"
  },
  {
    "path": "cluster/benchmarks/dashboard_app/static/css/style.css",
    "chars": 1792,
    "preview": "/* Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"Lic"
  },
  {
    "path": "cluster/benchmarks/dashboard_app/static/js/benchmark_latency_chart.js",
    "chars": 2418,
    "preview": "// Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n//\n// Licensed under the Apache License, Version 2.0 (the"
  },
  {
    "path": "cluster/benchmarks/dashboard_app/templates/index.html",
    "chars": 1718,
    "preview": "<!--\n  @license\n  Copyright 2016 The TensorFlow Authors. All Rights Reserved.\n\n  Licensed under the Apache License, Vers"
  },
  {
    "path": "cluster/benchmarks/dashboard_app/templates/test.html",
    "chars": 1798,
    "preview": "<!--\n  @license\n  Copyright 2016 The TensorFlow Authors. All Rights Reserved.\n\n  Licensed under the Apache License, Vers"
  },
  {
    "path": "cluster/benchmarks/index.html",
    "chars": 283,
    "preview": "<!DOCTYPE html>\n<head>\n<meta charset=\"utf-8\">\n<style>\nbody {\n  font-family: roboto, sans-serif;\n}\na {\n  font-weight: 400"
  },
  {
    "path": "cluster/benchmarks/js/csv_benchmark_chart.js",
    "chars": 3127,
    "preview": "/**\n * @fileoverview Provides a way to create a mean latency chart based on a\n * csv file with latency data.\n */\n\n/**\n *"
  },
  {
    "path": "cluster/benchmarks/js/latency_chart.js",
    "chars": 3666,
    "preview": "/**\n * @fileoverview Combines all components needed to display a line chart for\n * benchmarks.\n * @param {string} title "
  },
  {
    "path": "cluster/benchmarks/scripts/Dockerfile.tf_cnn_benchmarks",
    "chars": 309,
    "preview": "FROM tensorflow/tensorflow:nightly-gpu\n\nRUN apt-get update && apt-get install -y python-pip && pip install google-cloud\n"
  },
  {
    "path": "cluster/benchmarks/scripts/benchmark_configs.yml",
    "chars": 1947,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/README.md",
    "chars": 1890,
    "preview": "# tf_cnn_benchmarks: High performance benchmarks\n\ntf_cnn_benchmarks contains implementations of several popular convolut"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/benchmark_cnn.py",
    "chars": 74994,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/benchmark_storage.py",
    "chars": 1679,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/cbuild_benchmark_storage.py",
    "chars": 3546,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/cnn_util.py",
    "chars": 5376,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/convnet_builder.py",
    "chars": 19017,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/datasets.py",
    "chars": 5462,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/alexnet_model.py",
    "chars": 2924,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/densenet_model.py",
    "chars": 3380,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/googlenet_model.py",
    "chars": 2185,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/inception_model.py",
    "chars": 8405,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/lenet_model.py",
    "chars": 1261,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/model.py",
    "chars": 2023,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/model_config.py",
    "chars": 3479,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/overfeat_model.py",
    "chars": 1518,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/resnet_model.py",
    "chars": 10524,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/trivial_model.py",
    "chars": 1024,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/models/vgg_model.py",
    "chars": 2276,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/preprocessing.py",
    "chars": 32720,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py",
    "chars": 1655,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/tf_cnn_benchmarks/variable_mgr.py",
    "chars": 44686,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/util/__init__.py",
    "chars": 0,
    "preview": ""
  },
  {
    "path": "cluster/benchmarks/scripts/util/benchmark_util.py",
    "chars": 3476,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/util/benchmark_util_test.py",
    "chars": 2276,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/util/convert_csv_to_json.py",
    "chars": 3034,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/scripts/util/convert_csv_to_json_test.py",
    "chars": 3356,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/soumith_benchmarks.html",
    "chars": 1607,
    "preview": "<!DOCTYPE html>\n<head>\n<meta charset=\"utf-8\">\n<link rel=\"stylesheet\" type=\"text/css\" href=\"./bower_components/plottable/"
  },
  {
    "path": "cluster/benchmarks/tools/k8s_tensorflow_lib.py",
    "chars": 10159,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/tools/k8s_tensorflow_test.py",
    "chars": 4331,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/tools/kubectl_util.py",
    "chars": 6712,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/tools/kubectl_util_test.py",
    "chars": 2633,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/benchmarks/tools/run_distributed_benchmarks.py",
    "chars": 8496,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/client_transfer_benchmark.py",
    "chars": 7636,
    "preview": "#!/usr/bin/env python\n# Benchmark transferring data from TF into Python runtime\n# Macbook\n# t->p 2919.72 MB/second\n# p->"
  },
  {
    "path": "cluster/cloud-formation-example/README.md",
    "chars": 1289,
    "preview": "# TensorFlow\n\n\nCreate Stack:\n```\naws --region ap-southeast-2 cloudformation create-stack --stack-name tensorflow --templ"
  },
  {
    "path": "cluster/cloud-formation-example/iam.yaml",
    "chars": 1001,
    "preview": "AWSTemplateFormatVersion: 2010-09-09\nDescription: TensorFlow IamInstanceProfile CloudFormation\nParameters:\n  InstancePro"
  },
  {
    "path": "cluster/cloud-formation-example/tensorflow.yaml",
    "chars": 5659,
    "preview": "AWSTemplateFormatVersion: 2010-09-09\nDescription: Distributed TensorFlow CloudFormation\nMappings: \n  AMI:\n    ap-southea"
  },
  {
    "path": "cluster/cloud-formation-example/zone.sh",
    "chars": 1095,
    "preview": "set -x\nset -e\n\noption=$1\nName=$2\nRegion=$3\nVPC=$4\n\nif [[ \"$option\" == \"create\" ]]; then\naws --region $Region route53 cre"
  },
  {
    "path": "cluster/connect.py",
    "chars": 1930,
    "preview": "#!/usr/bin/env python\n\"\"\"\n\nScript to connect to most recent instance with containing given fragment:\nUsage:\nconnect\n-- c"
  },
  {
    "path": "cluster/delete_placement_groups.py",
    "chars": 1773,
    "preview": "#!/usr/bin/env python\n\n# delete all placement groups\n\nimport boto3\n\n# {'PlacementGroups': [{'GroupName': 'gpu12',\n#    '"
  },
  {
    "path": "cluster/fill_efs.py",
    "chars": 965,
    "preview": "#!/usr/bin/env python\n\nimport numpy as np\nimport math\nimport argparse\n\nparser = argparse.ArgumentParser(description='scr"
  },
  {
    "path": "cluster/imagenet64/README.md",
    "chars": 5928,
    "preview": "# Performance\n\nReproducing 64-GPU ImageNet performance benchmark on AWS\n\nRun this:\n\n```\npython launch.py --num_workers=8"
  },
  {
    "path": "cluster/imagenet64/aws.py",
    "chars": 12496,
    "preview": "\"\"\"Utilities to launch jobs on AWS.\n\nExample usage:\njob = aws.tf_job('myjob', 1)\ntask = job.tasks[0]\ntask.upload(__file_"
  },
  {
    "path": "cluster/imagenet64/launch.py",
    "chars": 19924,
    "preview": "#!/usr/bin/env python\n\n# ImageNet experiments\n# 1 worker, 1 ps: 1 gpu/machine\n# python launch_async_adder.py --cluster=a"
  },
  {
    "path": "cluster/imagenet64/requirements.txt",
    "chars": 42,
    "preview": "boto3\nparamiko\npyyaml\ntensorflow-gpu==1.4\n"
  },
  {
    "path": "cluster/imagenet64/variable_mgr.py",
    "chars": 44686,
    "preview": "# Copyright 2017 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "cluster/instance_info.py",
    "chars": 1038,
    "preview": "from collections import defaultdict\n\nimport boto3\n\n\"\"\"\nA tool for retrieving basic information from the running EC2 inst"
  },
  {
    "path": "cluster/launch_async_adder.py",
    "chars": 26463,
    "preview": "#!/usr/bin/env python\n\n# ImageNet experiments\n# 1 worker, 1 ps: 1 gpu/machine\n# python launch_async_adder.py --cluster=a"
  },
  {
    "path": "cluster/launch_micro.py",
    "chars": 4180,
    "preview": "#!/usr/bin/env python\n# Launch single instance\n\nfrom collections import OrderedDict\nfrom pprint import pprint as pp\nimpo"
  },
  {
    "path": "cluster/launch_ray.py",
    "chars": 4235,
    "preview": "#!/usr/bin/env python\n# Launch single instance, \n\nfrom collections import OrderedDict\nfrom pprint import pprint as pp\nim"
  },
  {
    "path": "cluster/launch_simple_tf.py",
    "chars": 2222,
    "preview": "# simple example of launching tensorflow job\n\nimport time\nimport tensorflow as tf\n\nflags = tf.flags\nflags.DEFINE_string("
  },
  {
    "path": "cluster/local_distributed_benchmark.py",
    "chars": 3342,
    "preview": "\"\"\"Benchmark tensorflow distributed by adding vector of ones on worker2\nto variable on worker1 as fast as possible.\nOn 2"
  },
  {
    "path": "cluster/myutil.py",
    "chars": 1083,
    "preview": "from pprint import pprint as pp\nimport yaml\n#import util\nimport boto3\nfrom collections import OrderedDict\nimport time\n\nc"
  },
  {
    "path": "cluster/ray_add.py",
    "chars": 3623,
    "preview": "# To benchmark READ throughput, run the following.\n#\n#     python async_sgd_benchmark.py --num-workers=10 --num-paramete"
  },
  {
    "path": "cluster/simple_distributed.py",
    "chars": 6494,
    "preview": "#!/usr/bin/env python\n# Launches \n\nimport gc\nimport os\nimport portpicker\nimport subprocess\nimport sys\nimport tensorflow "
  },
  {
    "path": "cluster/terminate_instances.py",
    "chars": 2103,
    "preview": "#!/usr/bin/env python\n\"\"\"\n\nScript to kill all instances matching given prefix.\n\nUsage:\n\n./terminate_instances.py gpu   #"
  },
  {
    "path": "cluster/test_aws.py",
    "chars": 1774,
    "preview": "# simple example of launching tensorflow job\n\nimport aws\nimport os\nimport sys\nimport time\nimport tensorflow as tf\nimport"
  },
  {
    "path": "cluster/tf-tools/.gitignore",
    "chars": 22,
    "preview": "__pycache__\n.DS_Store\n"
  },
  {
    "path": "cluster/tf-tools/benchmark/multi_gpu/advanced_tweaks_compare.sh",
    "chars": 1224,
    "preview": "# Showing NCHW vs NHWC, NCCL and paramater server GPU vs CPU\n_NUM_GPUS=1,2,8\nLOG_FOLDER=advanced_tests\n\n# PS GPU vs. CPU"
  },
  {
    "path": "cluster/tf-tools/benchmark/multi_gpu/image_classification_bench_tests.sh",
    "chars": 1128,
    "preview": "# Runs tests for an 8 GPU server\n_NUM_GPUS=1,2,4,8\n# Inception v3\n./test_runner.sh --model inception3 --num_batches 100 "
  },
  {
    "path": "cluster/tf-tools/benchmark/multi_gpu/stats_monitor.sh",
    "chars": 1727,
    "preview": "#!/bin/bash\n\n\n# Get all nvidia-smi data worth having\n# There is no historical data so calling this after a run\n# when th"
  },
  {
    "path": "cluster/tf-tools/benchmark/multi_gpu/test_runner.sh",
    "chars": 10737,
    "preview": "#!/bin/bash\n\n# Set defaults to best performance for most scenarios and most used values\nGPUS_PER_HOST=1\nDATA_FORMAT=NCHW"
  },
  {
    "path": "cluster/tf-tools/benchmark/multi_gpu/unit_test_stats_monitor.sh",
    "chars": 677,
    "preview": "#!/bin/bash\n\n./monitor_nvidia.sh --log_full_path ./full_log.txt --log_summary_full_path ./log_summary.txt  &\n\nNVIDIA_MON"
  },
  {
    "path": "cluster/tf-tools/benchmark/runner/cluster_aws.py",
    "chars": 10565,
    "preview": "import boto3\nimport os\nimport time\nimport util\nfrom contextlib import contextmanager\n\n\nclass AWSInstance(object):\n\n  def"
  },
  {
    "path": "cluster/tf-tools/benchmark/runner/command_builder.py",
    "chars": 7114,
    "preview": "import os\nimport sys\nimport six\n\ndef BuildDistributedCommandWorker(run_config, worker_hosts, ps_hosts,\n                 "
  },
  {
    "path": "cluster/tf-tools/benchmark/runner/configs/aws/multi_server.yaml",
    "chars": 1031,
    "preview": "# Run config\ncloud_type: aws\n\ntf_url: tensorflow-gpu\n\n# Shared with AWS and GCE\ninstance_tag: tf-monster\ninstance_type: "
  },
  {
    "path": "cluster/tf-tools/benchmark/runner/configs/aws/yaroslav.yaml",
    "chars": 545,
    "preview": "# Run config\ncloud_type: aws\n\ntf_url: tensorflow-gpu\n\ninstance_tag: yaroslav\ninstance_type: p2.xlarge\ninstance_force_reu"
  },
  {
    "path": "cluster/tf-tools/benchmark/runner/instance_info.py",
    "chars": 1038,
    "preview": "from collections import defaultdict\n\nimport boto3\n\n\"\"\"\nA tool for retrieving basic information from the running EC2 inst"
  },
  {
    "path": "cluster/tf-tools/benchmark/runner/launch_experiment.py",
    "chars": 7237,
    "preview": "# launch imagenet experiment\n\nfrom command_builder import *\nfrom pprint import pprint as pp\nimport yaml\nimport cluster_a"
  },
  {
    "path": "cluster/tf-tools/benchmark/runner/test_cluster_aws.py",
    "chars": 2101,
    "preview": "from command_builder import *\nfrom pprint import pprint as pp\nimport yaml\nimport cluster_aws\n\nfrom collections import Or"
  },
  {
    "path": "cluster/tf-tools/benchmark/runner/test_command_builder.py",
    "chars": 721,
    "preview": "from command_builder import *\nfrom pprint import pprint as pp\nimport yaml\n\ndef main():\n\n  \n  with open('configs/aws/yaro"
  },
  {
    "path": "cluster/tf-tools/benchmark/runner/util.py",
    "chars": 6359,
    "preview": "#import exceptions\nimport functools\nimport logging\nimport os\nimport paramiko\nimport numpy\nimport sys\nimport threading\nim"
  },
  {
    "path": "cluster/tf-tools/install/aws_amzlinux.md",
    "chars": 5917,
    "preview": "# Install for Amazon Linux TensorFlow + CUDA\nThis install was created on the latest Amazon Linux image as of 12-FEB-2017"
  },
  {
    "path": "cluster/tf-tools/install/aws_ubuntu16_04.md",
    "chars": 6444,
    "preview": "# Install TensorFlow + CUDA on Ubuntu 16.04 on AWS\nThis install was done on AWS using Ubuntu 16.04 LTS Starting with m4."
  },
  {
    "path": "cluster/tmux.py",
    "chars": 4555,
    "preview": "\n#import util as myutil\nfrom collections import OrderedDict\nfrom collections import defaultdict\nfrom pprint import pprin"
  },
  {
    "path": "cluster/upload_test.txt",
    "chars": 10,
    "preview": "testfile3\n"
  },
  {
    "path": "conditional_backprop.py",
    "chars": 1425,
    "preview": "# Example of conditionally enabling backprop based on a variable.\n# variable \"switches\" determines which entries of \"y\" "
  },
  {
    "path": "configure_tf.sh",
    "chars": 1259,
    "preview": "#!/usr/bin/expect -d\n# Helper script that uses expect to automatically go through all configure\n# steps using the defaul"
  },
  {
    "path": "configure_tf_cpu.sh",
    "chars": 829,
    "preview": "#!/usr/bin/expect -d\n# Helper script that uses expect to automatically go through all configure\n# steps using the defaul"
  },
  {
    "path": "danjar_peek.py",
    "chars": 2176,
    "preview": "import tensorflow as tf\nfrom tensorflow.python.client import timeline\n\n\nclass Queue(tf.FIFOQueue):\n\n  def __init__(self,"
  },
  {
    "path": "distributed/README.md",
    "chars": 22,
    "preview": "TF distributed tools\n "
  },
  {
    "path": "distributed/benchmark_grpc_recv.py",
    "chars": 5534,
    "preview": "# Dependencies:\n# portpicker (pip install portpicker)\n# tcmalloc4 (sudo apt-get install google-perftools)\n# TF 0.12\n#\n#\n"
  },
  {
    "path": "distributed/client_transfer_benchmark.py",
    "chars": 6004,
    "preview": "# Benchmark transferring data from TF into Python runtime\n#\n## Dependencies:\n# portpicker (pip install portpicker)\n# tcm"
  },
  {
    "path": "double_memory_bug.py",
    "chars": 1272,
    "preview": "# Troubleshooting\n# https://github.com/tensorflow/tensorflow/issues/13433#issuecomment-351722017\n\nimport tensorflow as t"
  },
  {
    "path": "dynamic_stitch_gpu.py",
    "chars": 1029,
    "preview": "# from https://github.com/tensorflow/tensorflow/issues/7251\nimport os\nos.environ[\"CUDA_VISIBLE_DEVICES\"]=\"0\"\n\nimport ten"
  },
  {
    "path": "dynamic_stitch_gpu_profile.pbtxt",
    "chars": 67773,
    "preview": "step_stats {\n  dev_stats {\n    device: \"/job:localhost/replica:0/task:0/cpu:0\"\n    node_stats {\n      node_name: \"_SOURC"
  },
  {
    "path": "eager_lbfgs/.ipynb_checkpoints/performance-checkpoint.ipynb",
    "chars": 65876,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# l-BFGS performance: Eager vs PyTo"
  },
  {
    "path": "eager_lbfgs/common_gd.py",
    "chars": 1237,
    "preview": "import argparse\nparser = argparse.ArgumentParser(description='PyTorch MNIST Example')\n\nparser.add_argument('--batch-size"
  },
  {
    "path": "eager_lbfgs/data/short_batch.csv",
    "chars": 12,
    "preview": "100\n200\n300\n"
  },
  {
    "path": "eager_lbfgs/data/short_eager_batch.csv",
    "chars": 7,
    "preview": "10\n100\n"
  },
  {
    "path": "eager_lbfgs/data/short_eager_loss.csv",
    "chars": 75,
    "preview": "1.125071197748184204e-03\n1.720546046271920204e-03\n2.242934657260775566e-03\n"
  },
  {
    "path": "eager_lbfgs/data/short_eager_time.csv",
    "chars": 75,
    "preview": "9.806975307874381542e-01\n9.339727419428527355e-01\n9.292591358534991741e-01\n"
  },
  {
    "path": "eager_lbfgs/data/short_pytorch_loss.csv",
    "chars": 75,
    "preview": "1.125177601352334023e-03\n1.720896689221262932e-03\n2.242802875116467476e-03\n"
  },
  {
    "path": "eager_lbfgs/data/short_pytorch_time.csv",
    "chars": 75,
    "preview": "2.150501497089862823e-01\n2.058924520388245583e-01\n1.908177738077938557e-01\n"
  },
  {
    "path": "eager_lbfgs/eager_lbfgs.py",
    "chars": 7554,
    "preview": "import util as u\n\nimport tensorflow as tf\nimport numpy as np\nimport time\n\nfrom tensorflow.contrib.eager.python import tf"
  },
  {
    "path": "eager_lbfgs/performance.ipynb",
    "chars": 65876,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# l-BFGS performance: Eager vs PyTo"
  },
  {
    "path": "eager_lbfgs/pytorch_lbfgs.py",
    "chars": 2319,
    "preview": "import util as u\n\nimport torch\nimport torch.nn as nn\nimport torch.nn.functional as F\nimport torch.optim as optim\nfrom to"
  },
  {
    "path": "eager_lbfgs/run_experiment.py",
    "chars": 1586,
    "preview": "# compare timing for variety of batch-sizes\n# TODO: make PyTorch not run out of memory\n\nimport tensorflow as tf\nimport e"
  },
  {
    "path": "eager_lbfgs/torch_lbfgs.lua",
    "chars": 9020,
    "preview": "--[[ An implementation of L-BFGS, heavily inspired by minFunc (Mark Schmidt)\n\nThis implementation of L-BFGS relies on a "
  },
  {
    "path": "eager_lbfgs/util.py",
    "chars": 42312,
    "preview": "#!/usr/bin/env python\nimport socket\nimport contextlib\nimport inspect\nimport inspect\nimport networkx as nx\nimport numpy a"
  },
  {
    "path": "enqueue_many_test.py",
    "chars": 2473,
    "preview": "import os, sys\nimport numpy as np\nos.environ[\"CUDA_VISIBLE_DEVICES\"]=\"\"\nimport tensorflow as tf\n\ndef create_session():\n "
  },
  {
    "path": "enqueue_many_test_singlerun.py",
    "chars": 2478,
    "preview": "# Test multiple enqueue many in single .run call\nimport os, sys\nimport numpy as np\nos.environ[\"CUDA_VISIBLE_DEVICES\"]=\"\""
  },
  {
    "path": "ericyue-slowreader/benchmark-batch-noqueuerunners-timeline.json",
    "chars": 5205,
    "preview": "{\n    \"traceEvents\": [\n        {\n            \"ph\": \"M\",\n            \"pid\": 0,\n            \"name\": \"process_name\",\n      "
  },
  {
    "path": "ericyue-slowreader/benchmark-batch-noqueuerunners.profile",
    "chars": 132170,
    "preview": "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n \"http://www.w3.or"
  },
  {
    "path": "ericyue-slowreader/benchmark-batch-noqueuerunners.py",
    "chars": 3983,
    "preview": "# range size is  1000000\n# range queue 900000, batch queue 100000, 81510.55 per second\n# range d 900000, batch d 100000\n"
  },
  {
    "path": "ericyue-slowreader/benchmark-batch.py",
    "chars": 3306,
    "preview": "# measure the speed at which batches can be made\n# Only 14k \n# range queue 1996115, batch queue 3885, 6455.13 per second"
  },
  {
    "path": "ericyue-slowreader/benchmark-reader.py",
    "chars": 1441,
    "preview": "# [1484609202] time[  0.01] step[        20] speed[360350]\n# [1484609202] time[  0.00] step[        40] speed[1129322]\n#"
  },
  {
    "path": "ericyue-slowreader/benchmark-synthetic-batch.py",
    "chars": 2020,
    "preview": "# [1484611992] time[  0.00] step[       420] speed[613695]\n# [1484611992] time[  0.00] step[       440] speed[501141]\n# "
  },
  {
    "path": "ericyue-slowreader/benchmark-synthetic.py",
    "chars": 1782,
    "preview": "# [1484615767] time[  0.31] step[      2000] speed[652222]\n# [1484615767] time[  0.31] step[      4000] speed[654197]\n# "
  },
  {
    "path": "ericyue-slowreader/benchmark.py",
    "chars": 4628,
    "preview": "# On 32-core machine.\n\n# [2017-01-17 10:45:35] time[  1.84] step[       200] speed[ 10887]\n# [2017-01-17 10:45:37] time["
  },
  {
    "path": "ericyue-slowreader/profile-batch.py",
    "chars": 495,
    "preview": "# script for getting cpu profile of queue runners\n# \n# sudo apt-get install google-perftools\n# LD_PRELOAD has to be set "
  },
  {
    "path": "free_gpus.py",
    "chars": 2849,
    "preview": "#!/usr/bin/env python\n# Parse nvidia-smi for pids and kill all GPU users\n# Tested on nvidia-smi 370.23\nimport os, re, sy"
  },
  {
    "path": "github_pyfunc_slowness.py",
    "chars": 1320,
    "preview": "# Example of py_func slowing down future computations\n# On Mac\n# time 1 0.007195033016614616\n# time 2 0.0070790809113532"
  },
  {
    "path": "gpu-memory-transfer.ipynb",
    "chars": 77938,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"code\",\n   \"execution_count\": 1,\n   \"metadata\": {\n    \"collapsed\": false\n   },\n   \"out"
  },
  {
    "path": "gpu_oom.py",
    "chars": 395,
    "preview": "# Example of catching GPU OOM error\n# http://stackoverflow.com/questions/41942538/tensorflow-gpu-memory-error-try-except"
  },
  {
    "path": "graph_template.py",
    "chars": 16131,
    "preview": "\"\"\"Helpers to replicate computation specified as part of existing graph.\"\"\"\n\n\n# helper to allow @profile decorators even"
  },
  {
    "path": "imagenet15-scratch.ipynb",
    "chars": 52802,
    "preview": "{\n \"cells\": [\n  {\n   \"cell_type\": \"markdown\",\n   \"metadata\": {},\n   \"source\": [\n    \"# ImageNet in 15 minutes notes\"\n   "
  },
  {
    "path": "input_benchmarks/convert_to_records.py",
    "chars": 3172,
    "preview": "# Copyright 2015 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "input_benchmarks/fully_connected_feed.py",
    "chars": 9294,
    "preview": "# Copyright 2015 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "input_benchmarks/fully_connected_preloaded_var.py",
    "chars": 6509,
    "preview": "# Copyright 2015 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "input_benchmarks/fully_connected_reader.py",
    "chars": 7690,
    "preview": "# Copyright 2015 The TensorFlow Authors. All Rights Reserved.\n#\n# Licensed under the Apache License, Version 2.0 (the \"L"
  },
  {
    "path": "input_benchmarks/timeline.feed.json",
    "chars": 78175,
    "preview": "{\n    \"traceEvents\": [\n        {\n            \"ph\": \"M\",\n            \"pid\": 0,\n            \"args\": {\n                \"nam"
  },
  {
    "path": "input_benchmarks/timeline.reader.json",
    "chars": 574465,
    "preview": "{\n    \"traceEvents\": [\n        {\n            \"ph\": \"M\",\n            \"name\": \"process_name\",\n            \"args\": {\n      "
  },
  {
    "path": "input_benchmarks/timeline.var.json",
    "chars": 59643,
    "preview": "{\n    \"traceEvents\": [\n        {\n            \"args\": {\n                \"name\": \"Allocators\"\n            },\n            \""
  }
]

// ... and 66 more files (download for full content)

About this extraction

This page contains the full source code of the yaroslavvb/stuff GitHub repository, extracted and formatted as plain text for AI agents and large language models (LLMs). The extraction includes 266 files (15.6 MB), approximately 4.1M tokens, and a symbol index with 3079 extracted functions, classes, methods, constants, and types. Use this with OpenClaw, Claude, ChatGPT, Cursor, Windsurf, or any other AI tool that accepts text input. You can copy the full output to your clipboard or download it as a .txt file.

Extracted by GitExtract — free GitHub repo to text converter for AI. Built by Nikandr Surkov.
