[
  {
    "path": "README.md",
    "content": "## TensorFlow implementation of \"Iterative Pruning\"\n\n**CAUTION**: Out-of-date notices.\n\nCurrently, I've checked TF (>1.3) supports *sparse_matmul* and it seems that this is\nmore correct way to implement iterative pruning. This work is just naively done with quite old\nversions (0.8.0) and thus, I do not recommend to consider these codes for your serious cases. And there will be no updates or maintenance either.\n\n---\n\nThis work is based on \"Learning both Weights and Connections for Efficient\nNeural Network.\" [Song et al.](http://arxiv.org/pdf/1506.02626v3.pdf) @ NIPS '15.\nNote that these works are just for quantifying its effectiveness on latency (within TensorFlow),\nnot a best optimal. Thus, some details are abbreviated for simplicity. (e.g. # of iterations, adjusted dropout ratio, etc.)\n\nI applied Iterative Pruning on a small MNIST CNN model (13MB, originally), which can be\naccessed from [TensorFlow Tutorials](https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html).\nAfter pruning off some percentages of weights, I've simply retrained two epochs for\neach case and got compressed models (minimum 2.6MB with 90% off) with minor loss of accuracy.\n(99.17% -> 98.99% with 90% off and retraining) Again, this is not an optimal.\n\n## Issues\n\nDue to lack of supports on SparseTensor and its operations of TensorFlow (0.8.0),\nthis implementation has some limitations. This work uses [*embedding_lookup_sparse*](https://www.tensorflow.org/versions/r0.8/api_docs/python/nn.html#embedding_lookup_sparse) to compute sparse matrix-vector multiplication.\nIt is not solely for the purpose of sparse matrix vector multiplication, and thus its performance may be sub-optimal. (I'm not sure.)\nAlso, TensorFlow uses \\<index, value\\> pair for sparse matrix rather than\nusing typical CSR format which is more compact and performant.\nIn summary, because of the following reasons, I think this implementation has some limitations.\n\n1. 
*embedding_lookup_sparse* doesn't support ```broadcasting```, which prevents users from running tests with the normal test datasets.\n2. Performance may be somewhat sub-optimal.\n3. Because \"Sparse Variable\" is not supported, manual dense-to-sparse and sparse-to-dense transformations are required.\n4. The approach may also apply to 4D convolution tensors, but that is a bit tricky.\n5. The current *embedding_lookup_sparse* forces an additional matrix transpose, dimension squeeze and dimension reshape.\n\n## File descriptions and usage\n\nmodel_ckpt_dense: original model<br>\nmodel_ckpt_dense_pruned: 90% pruned-only model<br>\nmodel_ckpt_sparse_retrained: 90% pruned and retrained model<br>\n\n#### Python package requirements\n```bash\nsudo apt-get install python-scipy python-numpy python-matplotlib\n```\n\nTo regenerate these sparse models, first edit ```config.py``` to set your threshold configuration,\nand then run training with the second (pruning and retraining) and third (generate the sparse form of the weight data) round options.\n\n```bash\n./train.py -2 -3\n```\n\nTo run inference on a single image (seven.png) and measure its latency,\n\n```bash\n./deploy_test.py -d -m model_ckpt_dense\n./deploy_test_pruned.py -d -m model_ckpt_sparse_retrained\n```\n\nTo test the dense models,\n\n```bash\n./deploy_test.py -t -m model_ckpt_dense\n./deploy_test.py -t -m model_ckpt_dense_pruned\n./deploy_test.py -t -m model_ckpt_dense_retrained\n```\n\nTo draw a histogram that shows the weight distribution,\n\n```bash\n# After running train.py (it generates .dat files)\n./draw_histogram.py\n```\n\n## Performance\nResults are currently somewhat mediocre or degraded due to the indirection and additional storage overhead of the sparse matrix form.\nIt may also be because the model is too small. 
(12.49MB)\n\n#### Storage overhead\nBaseline: 12.49 MB<br>\n10 % pruned: 21.86 MB<br>\n20 % pruned: 19.45 MB<br>\n30 % pruned: 17.05 MB<br>\n40 % pruned: 14.64 MB<br>\n50 % pruned: 12.23 MB<br>\n60 % pruned: 9.83 MB<br>\n70 % pruned: 7.42 MB<br>\n80 % pruned: 5.02 MB<br>\n90 % pruned: 2.61 MB<br>\n\n#### CPU performance (5 times averaged)\nCPU: Intel Core i5-2500 @ 3.3 GHz,\nLLC size: 6 MB\n\n<img src=http://younghwanoh.github.io/images/cpu-desktop.png alt=http://younghwanoh.github.io/images/cpu-desktop.png>\n\nBaseline: 0.01118040085 s<br>\n10 % pruned: 1.919299984   s<br>\n20 % pruned: 0.2325239658  s<br>\n30 % pruned: 0.2111079693  s<br>\n40 % pruned: 0.1982570648  s<br>\n50 % pruned: 0.1691776752  s<br>\n60 % pruned: 0.1305227757  s<br>\n70 % pruned: 0.116039753   s<br>\n80 % pruned: 0.103564167   s<br>\n90 % pruned: 0.1058168888  s<br>\n\n#### GPU performance (5 times averaged)\nGPU: Nvidia Geforce GTX650 @ 1.058 GHz,\nLLC size: 256 KB\n\n<img src=http://younghwanoh.github.io/images/gpu-desktop.png alt=http://younghwanoh.github.io/images/gpu-desktop.png>\n\nBaseline: 0.1475181845 s<br>\n10 % pruned: 0.2954540253 s<br>\n20 % pruned: 0.2665398121 s<br>\n30 % pruned: 0.2585638046 s<br>\n40 % pruned: 0.2090051651 s<br>\n50 % pruned: 0.1995279789 s<br>\n60 % pruned: 0.1815193653 s<br>\n70 % pruned: 0.1436806202 s<br>\n80 % pruned: 0.135668993  s<br>\n90 % pruned: 0.1218701839 s<br>\n\n\n\n\n\n\n\n\n"
  },
  {
    "path": "config.py",
    "content": "#!/usr/bin/python\nimport thspace as ths\n\ndef _complex_concat(a, b):\n    tmp = []\n    for i in a:\n        for j in b:\n            tmp.append(i+j)\n    return tmp\n\ndef _add_prefix(a):\n    tmp = []\n    for idx, val in enumerate(a):\n        tmp.append(\"w_\" + val)\n        # tmp.append(\"b_\" + val)\n    return tmp\n\n# Pruning threshold setting (90 % off)\nth = ths.th90\n\n# CNN settings for pruned training\ntarget_layer = [\"fc1\", \"fc2\"]\nretrain_iterations = 10\n\n# Output data lists: do not change this\ntarget_all_layer = _add_prefix(target_layer)\n\ntarget_dat = _complex_concat(target_all_layer, [\".dat\"])\ntarget_p_dat = _complex_concat(target_all_layer, [\"_p.dat\"])\ntarget_tp_dat = _complex_concat(target_all_layer, [\"_tp.dat\"])\n\nweight_all = target_dat + target_p_dat + target_tp_dat\nsyn_all = [\"in_conv1.syn\", \"in_conv2.syn\", \"in_fc1.syn\", \"in_fc2.syn\"]\n\n# Data settings\nshow_zero = False\n\n# Graph settings\nalpha = 0.75\ncolor = \"green\"\npdf_prefix = \"\"\n"
  },
  {
    "path": "deploy_test.py",
    "content": "#!/usr/bin/python\n\nimport sys\nsys.dont_write_bytecode = True\n\nimport tensorflow as tf\nimport numpy as np\nimport argparse\nimport papl\nimport config\n\nargparser = argparse.ArgumentParser()\nargparser.add_argument(\"-t\", \"--test\", action=\"store_true\", help=\"Run test\")\nargparser.add_argument(\"-d\", \"--deploy\", action=\"store_true\", help=\"Run deploy with seven.png\")\nargparser.add_argument(\"-s\", \"--print_syn\", action=\"store_true\", help=\"Print synapses to .syn\")\nargparser.add_argument(\"-m\", \"--model\", default=\"./model_ckpt_dense\", help=\"Specify a target model file\")\nargs = argparser.parse_args()\n\nif (args.test or args.deploy or args.print_syn) == True:\n    from tensorflow.examples.tutorials.mnist import input_data\n    mnist = input_data.read_data_sets('/tmp/data/', one_hot=True)\nelse:\n    argparser.print_help()\n    sys.exit()\n\n# sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True)))\nsess = tf.InteractiveSession()\n# sess = tf.Session()\n\ndef imgread(path):\n    tmp = papl.imread(path)\n    img = np.zeros((28,28,1))\n    img[:,:,0]=tmp[:,:,0]\n    img = np.reshape(img, img.size)\n    return img\n\n# Restore values of variables\nsaver = tf.train.import_meta_graph(args.model+'.meta')\nsaver.restore(sess, args.model)\n\n# Calc results\nif args.test == True:\n    # Evaluate test sets\n    import time\n    accuracy = tf.get_collection(\"accuracy\")[0]\n\n    # To avoid OOM, run validation with 500/10000 test dataset\n    b = time.time()\n    result = 0\n    for i in range(20):\n        batch = mnist.test.next_batch(500)\n        result += sess.run(accuracy, feed_dict={\"x:0\": batch[0],\n                                                \"y_:0\": batch[1],\n                                                \"keep_prob:0\": 1.0})\n    result /= 20\n    a = time.time()\n\n    print(\"Test accuracy %g\" % result)\n    print \"Time: %s s\" % (a-b)\nelif args.deploy == 
True:\n    # Infer a single image & check its latency\n    import time\n    img = imgread('seven.png')\n    y_conv = tf.get_collection(\"y_conv\")[0]\n\n    b = time.time()\n    result = sess.run(tf.argmax(y_conv,1), feed_dict={\"x:0\":[img],\n                                                      \"y_:0\":mnist.test.labels,\n                                                      \"keep_prob:0\": 1.0})\n    a = time.time()\n\n    print \"Output: %s\" % result\n    print \"Time: %s s\" % (a-b)\n    papl.log(\"performance_ref.log\", a-b)\n\nelif args.print_syn == True:\n    # Print synapses (Input data of each neuron)\n    img = imgread('seven.png')\n    target_syn = config.syn_all\n    synapses = [ tf.get_collection(elem.split(\".\")[0])[0] for elem in target_syn ]\n    for i,j in zip(synapses, config.syn_all):\n        syn = sess.run(i, feed_dict={\"x:0\":[img],\n                                     \"y_:0\":mnist.test.labels,\n                                     \"keep_prob:0\": 1.0})\n        papl.print_synapse_nps(syn, j)\n    print \"Done! Synapse data is printed to x.syn\"\n"
  },
  {
    "path": "deploy_test_pruned.py",
    "content": "#!/usr/bin/python\n\nimport sys\nsys.dont_write_bytecode = True\n\nimport tensorflow as tf\nimport numpy as np\nimport argparse\nimport config\nimport papl\n\nargparser = argparse.ArgumentParser()\nargparser.add_argument(\"-t\", \"--test\", action=\"store_true\", help=\"Run test\")\nargparser.add_argument(\"-d\", \"--deploy\", action=\"store_true\", help=\"Run deploy with seven.png\")\nargparser.add_argument(\"-m\", \"--model\", default=\"./model_ckpt_sparse_retrained\", help=\"Specify a target model file\")\nargs = argparser.parse_args()\n\nif (args.test) == True:\n    print \"Error: TensorFlow 0.8 doesn't support broadcasts on sparse operations, cannot run test set now\"\n    sys.exit()\nelif (args.deploy) == True:\n    from tensorflow.examples.tutorials.mnist import input_data\n    mnist = input_data.read_data_sets('/tmp/data/', one_hot=True)\nelse:\n    argparser.print_help()\n    sys.exit()\n\n# sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True)))\nsess = tf.InteractiveSession(config=tf.ConfigProto(device_count={'GPU':0}))\n# sess = tf.Session()\n\ndef imgread(path):\n    tmp = papl.imread(path)\n    img = np.zeros((28,28,1))\n    img[:,:,0]=tmp[:,:,0]\n    return img\n\n# Declare weight variables\nsparse_w={\n    \"w_conv1\": tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1), name=\"w_conv1\"),\n    \"b_conv1\": tf.Variable(tf.constant(0.1, shape=[32]), name=\"b_conv1\"),\n    \"w_conv2\": tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1), name=\"w_conv2\"),\n    \"b_conv2\": tf.Variable(tf.constant(0.1, shape=[64]), name=\"b_conv2\"),\n    \"w_fc1\":      tf.Variable(tf.zeros([config.th[\"fc1_nnz\"]],  dtype=tf.float32),name=\"w_fc1\"),\n    \"w_fc1_idx\":  tf.Variable(tf.zeros([config.th[\"fc1_nnz\"],2],dtype=tf.int32),  name=\"w_fc1_idx\"),\n    \"w_fc1_shape\":tf.Variable(tf.zeros([2],     dtype=tf.int32),  name=\"w_fc1_shape\"),\n    \"b_fc1\":      
tf.Variable(tf.zeros([1024], dtype=tf.float32), name=\"b_fc1\"),\n    \"w_fc2\":      tf.Variable(tf.zeros([config.th[\"fc2_nnz\"]],  dtype=tf.float32),name=\"w_fc2\"),\n    \"w_fc2_idx\":  tf.Variable(tf.zeros([config.th[\"fc2_nnz\"],2],dtype=tf.int32),  name=\"w_fc2_idx\"),\n    \"w_fc2_shape\":tf.Variable(tf.zeros([2],     dtype=tf.int32),  name=\"w_fc2_shape\"),\n    \"b_fc2\":      tf.Variable(tf.zeros([10], dtype=tf.float32), name=\"b_fc2\"),\n}\n\ndef sparse_cnn_model(weights):\n    def conv2d(x, W):\n        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')\n    def max_pool_2x2(x):\n        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],\n                              strides=[1, 2, 2, 1], padding='SAME')\n\n    h_conv1 = tf.nn.relu(conv2d(x_image, weights[\"w_conv1\"]) + weights[\"b_conv1\"])\n    h_pool1 = max_pool_2x2(h_conv1)\n    h_conv2 = tf.nn.relu(conv2d(h_pool1, weights[\"w_conv2\"]) + weights[\"b_conv2\"])\n    h_pool2 = max_pool_2x2(h_conv2)\n    h_pool2_flat = tf.squeeze(tf.reshape(h_pool2, [-1, 7*7*64]))\n    h_fc1 = tf.nn.relu(tf.nn.embedding_lookup_sparse(h_pool2_flat, weights[\"w_fc1_ids\"], weights[\"w_fc1\"], combiner=\"sum\") + weights[\"b_fc1\"])\n    h_fc1_drop = tf.squeeze(tf.nn.dropout(h_fc1, keep_prob))\n    y_conv = tf.nn.relu(tf.nn.embedding_lookup_sparse(h_fc1_drop, weights[\"w_fc2_ids\"], weights[\"w_fc2\"], combiner=\"sum\") + weights[\"b_fc2\"])\n    y_conv = tf.nn.softmax(tf.reshape(y_conv, [1, -1]))\n\n    return y_conv\n\n# Restore values of variables\nsaver = tf.train.Saver()\nsaver.restore(sess, args.model)\n\n# Retrieve SparseTensor from serialized dense variables\nsparse_w[\"w_fc1\"] = tf.SparseTensor(sparse_w[\"w_fc1_idx\"].eval(),\n                                    sparse_w[\"w_fc1\"].eval(),\n                                    sparse_w[\"w_fc1_shape\"].eval())\nsparse_w[\"w_fc2\"] = tf.SparseTensor(sparse_w[\"w_fc2_idx\"].eval(),\n                                    sparse_w[\"w_fc2\"].eval(),\n         
                           sparse_w[\"w_fc2_shape\"].eval())\nsparse_w[\"w_fc1_ids\"] = tf.SparseTensor(sparse_w[\"w_fc1_idx\"].eval(),\n                                    sparse_w[\"w_fc1_idx\"].eval()[:,1],\n                                    sparse_w[\"w_fc1_shape\"].eval())\nsparse_w[\"w_fc2_ids\"] = tf.SparseTensor(sparse_w[\"w_fc2_idx\"].eval(),\n                                    sparse_w[\"w_fc2_idx\"].eval()[:,1],\n                                    sparse_w[\"w_fc2_shape\"].eval())\n\n# Construct a sparse model with retrieved variables\nif args.test == True:\n    x = tf.placeholder(\"float\", shape=[None, 784])\n    x_image = tf.reshape(x, [-1,28,28,1])\nelif args.deploy == True:\n    img = imgread(\"./seven.png\")\n    x = tf.placeholder(\"float\", shape=[None, 28, 28, 1])\n    x_image = x\ny_ = tf.placeholder(\"float\", shape=[None, 10])\nkeep_prob = tf.placeholder(\"float\")\n\ny_conv = sparse_cnn_model(sparse_w)\n\n# Calc results\nif args.test == True:\n    # Evaluate test sets\n    import time\n    correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))\n    accuracy = tf.reduce_mean(tf.cast(correct_prediction, \"float\"))\n\n    # To avoid OOM, run validation with 500/10000 test dataset\n    b = time.time()\n    result = 0\n    for i in range(20):\n        batch = mnist.test.next_batch(500)\n        result += accuracy.eval(feed_dict={x: batch[0],\n                                          y_: batch[1],\n                                          keep_prob: 1.0})\n    result /= 20\n    a = time.time()\n\n    print(\"test accuracy %g\" % result)\n    print \"time: %s s\" % (a-b)\nelif args.deploy == True:\n    # Infer a single image & check its latency\n    import time\n\n    b = time.time()\n    result = sess.run(tf.argmax(y_conv,1), feed_dict={x:[img], y_:mnist.test.labels, keep_prob: 1.0})\n    a = time.time()\n\n    print \"output: %s\" % result\n    print \"time: %s s\" % (a-b)\n    papl.log(\"performance_ref.log\", a-b)\n"
  },
  {
    "path": "draw_histogram.py",
    "content": "#!/usr/bin/python\n\nimport sys\nsys.dont_write_bytecode = True\n\nimport papl\nimport config\n\npapl.draw_histogram(config.weight_all, step=0.01)\npapl.draw_histogram(config.syn_all, step=1)\n"
  },
  {
    "path": "papl.py",
    "content": "import csv\nimport matplotlib\nimport matplotlib.pyplot as plt\nfrom matplotlib.ticker import FuncFormatter\nfrom matplotlib.backends.backend_pdf import PdfPages\nimport numpy as np\nimport sys\nsys.dont_write_bytecode = True\n\nimport config\n\n# =====================================================================================\n# Private methods\n# =====================================================================================\n\ndef _saveToPdf(output):\n    pp = PdfPages(output)\n    plt.savefig(pp, format='pdf')\n    pp.close()\n    plt.close()\n\n# Manipulate y-axis of histogarm\ndef _to_percent(y, position):\n    # tick locations calculated from fraction (global).\n    s = str(y*100)\n    # The percent symbol needs escaping in latex\n    if matplotlib.rcParams['text.usetex'] == True:\n        return s + r'$\\%$'\n    else:\n        return s + '%'\n\n# Calc min, max position of each histogram bin\ndef _minRuler(array):\n    minimum = min(array)\n    print \" - min: \", minimum\n    offset = minimum % step\n    return minimum - offset\n\ndef _maxRuler(array):\n    maximum = max(array)\n    print \" - max: \", maximum\n    offset = maximum % step\n    return maximum - offset + step\n\n# =====================================================================================\n# Start main methods (tools)\n# =====================================================================================\n\n# Input: x.dat from global variables (config) or arguments\n# Output: histogram. 
x.pdf\n# Histogram settings are configurable through config.py\ndef draw_histogram(*target, **kwargs):\n    if len(target) == 1:\n        target = target[0]\n        assert type(target) == list\n        file_list = target\n    else:\n        file_list = config.weight_all\n    global step\n    step = kwargs[\"step\"]\n\n    for target in file_list:\n        print \"Target: \", target\n        try:\n            with open(config.pdf_prefix+\"%s\" % target) as text:\n                x = np.float32(text.read().rstrip(\"\\n\").split(\"\\n\"))\n\n            # norm = np.ones_like(x) / float(len(x))\n            norm = np.ones_like(x)\n            binspace = np.arange(_minRuler(x), _maxRuler(x), step)\n            n, bins, patches = plt.hist(x, bins=binspace, weights=norm,\n                alpha=config.alpha, facecolor=config.color)\n\n            # formatter = FuncFormatter(_to_percent)\n            # plt.gca().yaxis.set_major_formatter(formatter)\n            plt.grid(True)\n\n            _saveToPdf(config.pdf_prefix+\"%s.pdf\" % target.split(\".\")[0])\n        except IOError as e:\n            print \"Warning: I/O error({0}) - {1}\".format(e.errno, e.strerror)\n            pass\n        except:\n            print \"Unexpected error:\", sys.exc_info()[0]\n            raise\n    print \"Graphs are drawn!\"\n\n# Input: model object list, Output: human-readable form of model as x.dat\ndef print_weight_vars(obj_dict, weight_obj_list, fname_list, show_zero=False):\n    for elem, fname in zip(weight_obj_list, fname_list):\n        weight_arr = obj_dict[elem].eval()\n        ndim = weight_arr.size\n        flat_weight_space = weight_arr.reshape(ndim)\n        with open(fname, \"w\") as filelog:\n            if show_zero == False:\n                flat_weight_space = flat_weight_space[flat_weight_space != 0]\n            writeLine = csv.writer(filelog, delimiter='\\n')\n            writeLine.writerow(flat_weight_space)\n\n# Input: synapse, Output: human-readable form of model 
as x.syn\ndef print_synapse_nps(syn_arr, fname, show_zero=False):\n    ndim = syn_arr.size\n    flat_syn_space = syn_arr.reshape(ndim)\n    with open(fname, \"w\") as filelog:\n        if show_zero == False:\n            flat_syn_space = flat_syn_space[flat_syn_space != 0]\n        writeLine = csv.writer(filelog, delimiter='\\n')\n        writeLine.writerow(flat_syn_space)\n\n# Input: sparse model object list, Output: human-readable form of model as x.dat\ndef print_sparse_weight_vars(obj_dict, weight_obj_list, fname_list):\n    for elem, fname in zip(weight_obj_list, fname_list):\n        weight_arr = obj_dict[elem].eval().values\n        ndim = weight_arr.size\n        flat_weight_space = weight_arr.reshape(ndim)\n        with open(fname, \"w\") as filelog:\n            writeLine = csv.writer(filelog, delimiter='\\n')\n            writeLine.writerow(flat_weight_space)\n\n# Input: n-d dense array, Output: pruned array, mask of kept elements, pruned count\ndef prune_dense(weight_arr, name=\"None\", thresh=0.005, **kwargs):\n    \"\"\"Apply weight pruning with threshold \"\"\"\n    under_threshold = abs(weight_arr) < thresh\n    weight_arr[under_threshold] = 0\n    count = np.sum(under_threshold)\n    print \"Non-zero count (%s): %s\" % (name, weight_arr.size - count)\n    # ~ is the boolean NOT; unary minus on boolean arrays is rejected by newer NumPy\n    return weight_arr, ~under_threshold, count\n\n# Input: anonymous dimension array and its pruning threshold,\n# Output: indices - index list of non-zero elements\n#         values  - value list of non-zero elements\n#         shape   - original shape of matrix\ndef prune_tf_sparse(weight_arr, name=\"None\", thresh=0.005):\n    assert isinstance(weight_arr, np.ndarray)\n\n    under_threshold = abs(weight_arr) < thresh\n    weight_arr[under_threshold] = 0\n    values = weight_arr[weight_arr != 0]\n    indices = np.transpose(np.nonzero(weight_arr))\n    shape = list(weight_arr.shape)\n\n    count = np.sum(under_threshold)\n    print \"Non-zero count (Sparse %s): %s\" % (name, weight_arr.size - count)\n    return [indices, 
values, shape]\n\n# Input: file name and text, Output: log file\ndef log(fname, log):\n    with open(fname, \"a\") as wobj:\n        wobj.write(str(log)+\"\\n\")\n\n# Input: Path to target image, Output: ndarray resized to fixed (28,28)\ndef imread(path):\n    # Pillow exposes Image under the PIL package; a bare \"import Image\" fails there\n    from PIL import Image\n    # resample=2 selects bilinear interpolation\n    return np.array(Image.open(path).resize((28,28), resample=2))\n"
  },
  {
    "path": "read_model.py",
    "content": "#!/usr/bin/python\n\nimport sys\nsys.dont_write_bytecode = True\n\nimport tensorflow as tf\nimport papl\nimport argparse\n\nargparser = argparse.ArgumentParser()\nargparser.add_argument(\"-m\", \"--model\", required=True, help=\"Specify serialized input model\")\nargparser.add_argument(\"-r\", \"--ratio\", help=\"Specify ratio\")\nargs = argparser.parse_args()\n\ndef read_model_obj_with_sorted_ratio(fname, ratio):\n    saver = tf.train.Saver()\n    saver.restore(sess, fname)\n\n    print str(ratio*100)+\" %\"\n    target_obj_list = [weights[elem] for elem in papl.config.target_all_layer]\n    for elem in target_obj_list:\n        arr = elem.eval()\n        arr = list(arr.reshape(arr.size))\n        arr.sort(cmp=lambda x,y:cmp(abs(x), abs(y)))\n\n        print \"\\\"\"+elem.name[:-2]+\"\\\": \", abs(arr[int(len(arr)*ratio)-1]), \",\"\n\ndef print_raw_matrix(fname):\n    saver = tf.train.Saver()\n    saver.restore(sess, fname)\n    import numpy as np\n    np.save(\"w_fc1.raw\", weights[\"w_fc1\"].eval())\n    np.save(\"w_fc2.raw\", weights[\"w_fc2\"].eval())\n\ndef read_model_obj(fname):\n    saver = tf.train.Saver()\n\n    import os.path\n    try:\n        assert os.path.isfile(fname)\n        saver.restore(sess, fname)\n        switcher = {\n            \"model_ckpt_dense\": papl.config.target_dat,\n            \"model_ckpt_dense_pruned\": papl.config.target_p_dat,\n            \"model_ckpt_dense_retrained\": papl.config.target_tp_dat\n        }\n        papl.print_weight_vars(weights, papl.config.target_all_layer, switcher.get(args.model))\n    except AssertionError:\n        print \"Warning: No such files or directory\\n\"\n        pass\n    except:\n        import sys\n        print \"Unexpected error:\", sys.exc_info()[0]\n\nweights = {\n    \"w_conv1\": tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1), name=\"w_conv1\"),\n    \"b_conv1\": tf.Variable(tf.constant(0.1, shape=[32]), name=\"b_conv1\"),\n    \"w_conv2\": 
tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1), name=\"w_conv2\"),\n    \"b_conv2\": tf.Variable(tf.constant(0.1, shape=[64]), name=\"b_conv2\"),\n    \"w_fc1\": tf.Variable(tf.truncated_normal([7*7*64, 1024], stddev=0.1), name=\"w_fc1\"),\n    \"b_fc1\": tf.Variable(tf.constant(0.1, shape=[1024]), name=\"b_fc1\"),\n    \"w_fc2\": tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1), name=\"w_fc2\"),\n    \"b_fc2\": tf.Variable(tf.constant(0.1, shape=[10]), name=\"b_fc2\")\n}\n\nsess = tf.InteractiveSession()\n\nif __name__ == \"__main__\":\n    if bool(args.ratio) == False:\n        read_model_obj(args.model)\n    else:\n        read_model_obj_with_sorted_ratio(args.model, float(args.ratio))\n    # print_raw_matrix(args.model)\n"
  },
  {
    "path": "thspace.py",
    "content": "th10 = {\n    \"fc1_nnz\":  2890138 ,\n    \"fc2_nnz\":  9216 ,\n    \"w_conv1\":  0.0143598 ,\n    \"b_conv1\":  0.0639633 ,\n    \"w_conv2\":  0.0122113 ,\n    \"b_conv2\":  0.0764835 ,\n    \"w_fc1\":  0.0121145 ,\n    \"b_fc1\":  0.0907546 ,\n    \"w_fc2\":  0.0132132 ,\n    \"b_fc2\":  0.0911222\n}\n\nth20 = {\n    \"fc1_nnz\":  2569011 ,\n    \"fc2_nnz\":  8192 ,\n    \"w_conv1\":  0.0303244 ,\n    \"b_conv1\":  0.0703646 ,\n    \"w_conv2\":  0.0243528 ,\n    \"b_conv2\":  0.0805169 ,\n    \"w_fc1\":  0.024393 ,\n    \"b_fc1\":  0.0929042 ,\n    \"w_fc2\":  0.0266463 ,\n    \"b_fc2\":  0.0943206\n}\n\nth30 = { \n    \"fc1_nnz\":  2247885 ,\n    \"fc2_nnz\":  7168 ,\n    \"w_conv1\":  0.0473049 ,\n    \"b_conv1\":  0.075084 ,\n    \"w_conv2\":  0.0371282 ,\n    \"b_conv2\":  0.0821582 ,\n    \"w_fc1\":  0.0370279 ,\n    \"b_fc1\":  0.0944582 ,\n    \"w_fc2\":  0.0407012 ,\n    \"b_fc2\":  0.0944\n}\n\nth40 = {\n    \"fc1_nnz\":  1926757 ,\n    \"fc2_nnz\":  6143 ,\n    \"w_conv1\":  0.0619981 ,\n    \"b_conv1\":  0.0783646 ,\n    \"w_conv2\":  0.0506691 ,\n    \"b_conv2\":  0.0849416 ,\n    \"w_fc1\":  0.0503098 ,\n    \"b_fc1\":  0.0957049 ,\n    \"w_fc2\":  0.0552152 ,\n    \"b_fc2\":  0.0960752\n}\n\nth50 = {\n    \"fc1_nnz\":  1605631 ,\n    \"fc2_nnz\":  5120 ,\n    \"w_conv1\":  0.0762394 ,\n    \"b_conv1\":  0.0791745 ,\n    \"w_conv2\":  0.0650136 ,\n    \"b_conv2\":  0.0858885 ,\n    \"w_fc1\":  0.0645222 ,\n    \"b_fc1\":  0.0967964 ,\n    \"w_fc2\":  0.0705915 ,\n    \"b_fc2\":  0.0978322\n}\n\nth60 = {\n    \"fc1_nnz\":  1284506 ,\n    \"fc2_nnz\":  4095 ,\n    \"w_conv1\":  0.0936658 ,\n    \"b_conv1\":  0.0817409 ,\n    \"w_conv2\":  0.0805099 ,\n    \"b_conv2\":  0.0873334 ,\n    \"w_fc1\":  0.0801966 ,\n    \"b_fc1\":  0.0979769 ,\n    \"w_fc2\":  0.0870296 ,\n    \"b_fc2\":  0.0996566\n}\n\nth70 = {\n    \"fc1_nnz\":  963379 ,\n    \"fc2_nnz\":  3071 ,\n    \"w_conv1\":  0.110689 ,\n    \"b_conv1\":  0.0830755 ,\n    
\"w_conv2\":  0.0988535 ,\n    \"b_conv2\":  0.088152 ,\n    \"w_fc1\":  0.0979934 ,\n    \"b_fc1\":  0.0991785 ,\n    \"w_fc2\":  0.105566 ,\n    \"b_fc2\":  0.100518\n}\n\nth80 = {\n    \"fc1_nnz\":  642247 ,\n    \"fc2_nnz\":  2048 ,\n    \"w_conv1\":  0.130691 ,\n    \"b_conv1\":  0.0912763 ,\n    \"w_conv2\":  0.120732 ,\n    \"b_conv2\":  0.0898623 ,\n    \"w_fc1\":  0.119522 ,\n    \"b_fc1\":  0.100626 ,\n    \"w_fc2\":  0.126458 ,\n    \"b_fc2\":  0.112008\n}\n\nth90 = {\n    \"fc1_nnz\":  320939 ,\n    \"fc2_nnz\":  1014 ,\n    \"w_conv1\":  0.162963 ,\n    \"b_conv1\":  0.0956728 ,\n    \"w_conv2\":  0.150202 ,\n    \"b_conv2\":  0.0928398 ,\n    \"w_fc1\":  0.148615 ,\n    \"b_fc1\":  0.102556 ,\n    \"w_fc2\":  0.15566 ,\n    \"b_fc2\":  0.112008\n}\n\nth95 = {\n    \"fc1_nnz\":  160566 ,\n    \"fc2_nnz\":  513 ,\n    \"w_fc1\":  0.169592 ,\n    \"b_fc1\":  0.103936 ,\n    \"w_fc2\":  0.17703 ,\n    \"b_fc2\":  0.112008\n}\n\nth99 = {\n    \"fc1_nnz\":  32111 ,\n    \"fc2_nnz\":  103 ,\n    \"w_fc1\":  0.1975 ,\n    \"b_fc1\":  0.10679 ,\n    \"w_fc2\":  0.21004 ,\n    \"b_fc2\":  0.112008\n}\n"
  },
  {
    "path": "train.py",
    "content": "#!/usr/bin/python\n\nfrom __future__ import absolute_import\nfrom __future__ import division\nfrom __future__ import print_function\n\nimport sys\nsys.dont_write_bytecode = True\n\nimport tensorflow as tf\nimport numpy as np\nimport argparse\nimport papl\n\nimport scipy.sparse as sp\n\nargparser = argparse.ArgumentParser()\nargparser.add_argument(\"-1\", \"--first_round\", action=\"store_true\",\n    help=\"Run 1st-round: train with 20000 iterations\")\nargparser.add_argument(\"-2\", \"--second_round\", action=\"store_true\",\n    help=\"Run 2nd-round: apply pruning and its additional training\")\nargparser.add_argument(\"-3\", \"--third_round\", action=\"store_true\",\n    help=\"Run 3rd-round: transform model to a sparse format and save it\")\nargparser.add_argument(\"-m\", \"--checkpoint\", default=\"./model_ckpt_dense\",\n    help=\"Target checkpoint model file for 2nd and 3rd round\")\nargs = argparser.parse_args()\n\nfrom tensorflow.examples.tutorials.mnist import input_data\nmnist = input_data.read_data_sets('/tmp/data/', one_hot=True)\nif (args.first_round or args.second_round or args.third_round) == False:\n    argparser.print_help()\n    sys.exit()\n\nsess = tf.InteractiveSession()\n\ndef apply_prune(weights):\n    total_fc_byte = 0\n    total_fc_csr_byte = 0\n    total_nnz_elem = 0\n    total_origin_elem = 0\n\n    dict_nzidx = {}\n\n    for target in papl.config.target_layer:\n        wl = \"w_\" + target\n        print(wl + \" threshold:\\t\" + str(papl.config.th[wl]))\n\n        # Get target layer's weights\n        weight_obj = weights[wl]\n        weight_arr = weight_obj.eval()\n\n        # Apply pruning\n        weight_arr, w_nzidx, w_nnz = papl.prune_dense(weight_arr, name=wl,\n                                            thresh=papl.config.th[wl])\n\n        # Store pruned weights as tensorflow objects\n        dict_nzidx[wl] = w_nzidx\n        sess.run(weight_obj.assign(weight_arr))\n\n    return dict_nzidx\n\ndef 
apply_prune_on_grads(grads_and_vars, dict_nzidx):\n    # Mask gradients with pruned elements\n    for key, nzidx in dict_nzidx.items():\n        count = 0\n        for grad, var in grads_and_vars:\n            if var.name == key+\":0\":\n                nzidx_obj = tf.cast(tf.constant(nzidx), tf.float32)\n                grads_and_vars[count] = (tf.mul(nzidx_obj, grad), var)\n            count += 1\n    return grads_and_vars\n\ndef gen_sparse_dict(dense_w):\n    sparse_w = dense_w\n    for target in papl.config.target_all_layer:\n        target_arr = np.transpose(dense_w[target].eval())\n        sparse_arr = papl.prune_tf_sparse(target_arr, name=target)\n        sparse_w[target+\"_idx\"]=tf.Variable(tf.constant(sparse_arr[0],dtype=tf.int32),\n            name=target+\"_idx\")\n        sparse_w[target]=tf.Variable(tf.constant(sparse_arr[1],dtype=tf.float32),\n            name=target)\n        sparse_w[target+\"_shape\"]=tf.Variable(tf.constant(sparse_arr[2],dtype=tf.int32),\n            name=target+\"_shape\")\n    return sparse_w\n\ndense_w={\n    \"w_conv1\": tf.Variable(tf.truncated_normal([5,5,1,32],stddev=0.1), name=\"w_conv1\"),\n    \"b_conv1\": tf.Variable(tf.constant(0.1,shape=[32]), name=\"b_conv1\"),\n    \"w_conv2\": tf.Variable(tf.truncated_normal([5,5,32,64],stddev=0.1), name=\"w_conv2\"),\n    \"b_conv2\": tf.Variable(tf.constant(0.1,shape=[64]), name=\"b_conv2\"),\n    \"w_fc1\": tf.Variable(tf.truncated_normal([7*7*64,1024],stddev=0.1), name=\"w_fc1\"),\n    \"b_fc1\": tf.Variable(tf.constant(0.1,shape=[1024]), name=\"b_fc1\"),\n    \"w_fc2\": tf.Variable(tf.truncated_normal([1024,10],stddev=0.1), name=\"w_fc2\"),\n    \"b_fc2\": tf.Variable(tf.constant(0.1,shape=[10]), name=\"b_fc2\")\n}\n\ndef dense_cnn_model(weights):\n    def conv2d(x, W):\n        return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')\n\n    def max_pool_2x2(x):\n        return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],\n                              strides=[1, 2, 2, 1], 
padding='SAME')\n\n    x_image = tf.reshape(x, [-1,28,28,1])\n    h_conv1 = tf.nn.relu(conv2d(x_image, weights[\"w_conv1\"]) + weights[\"b_conv1\"])\n    tf.add_to_collection(\"in_conv1\", x_image)\n    h_pool1 = max_pool_2x2(h_conv1)\n    tf.add_to_collection(\"in_conv2\", h_pool1)\n    h_conv2 = tf.nn.relu(conv2d(h_pool1, weights[\"w_conv2\"]) + weights[\"b_conv2\"])\n    h_pool2 = max_pool_2x2(h_conv2)\n    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])\n    tf.add_to_collection(\"in_fc1\", h_pool2_flat)\n    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, weights[\"w_fc1\"]) + weights[\"b_fc1\"])\n    h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)\n    tf.add_to_collection(\"in_fc2\", h_fc1_drop)\n    y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, weights[\"w_fc2\"]) + weights[\"b_fc2\"])\n    return y_conv\n\ndef test(y_infer, message=\"None.\"):\n    correct_prediction = tf.equal(tf.argmax(y_infer,1), tf.argmax(y_,1))\n    accuracy = tf.reduce_mean(tf.cast(correct_prediction, \"float\"))\n\n    # To avoid OOM, run validation with 500/10000 test dataset\n    result = 0\n    for i in range(20):\n        batch = mnist.test.next_batch(500)\n        result += accuracy.eval(feed_dict={x: batch[0],\n                                          y_: batch[1],\n                                          keep_prob: 1.0})\n    result /= 20\n\n    print(message+\" %g\\n\" % result)\n    return result\n\ndef check_file_exists(key):\n    import os\n    fileList = os.listdir(\".\")\n    count = 0\n    for elem in fileList:\n        if elem.find(key) >= 0:\n            count += 1\n    return key + (\"-\"+str(count) if count>0 else \"\")\n\n# Construct a dense model\nx = tf.placeholder(\"float\", shape=[None, 784], name=\"x\")\ny_ = tf.placeholder(\"float\", shape=[None, 10], name=\"y_\")\nkeep_prob = tf.placeholder(\"float\", name=\"keep_prob\")\n\ny_conv = dense_cnn_model(dense_w)\ntf.add_to_collection(\"y_conv\", y_conv)\n\nsaver = tf.train.Saver()\n\nif args.first_round == True:\n    # 
First round: Train baseline dense model\n    cross_entropy = -tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0)))\n    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)\n    correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))\n    accuracy = tf.reduce_mean(tf.cast(correct_prediction, \"float\"))\n    tf.add_to_collection(\"accuracy\", accuracy)\n\n    sess.run(tf.initialize_all_variables())\n\n    for i in range(20000):\n        batch = mnist.train.next_batch(50)\n        if i%100 == 0:\n            train_accuracy = accuracy.eval(feed_dict={\n                x:batch[0], y_: batch[1], keep_prob: 1.0})\n            print(\"step %d, training accuracy %g\"%(i, train_accuracy))\n        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})\n\n    # Test the baseline dense model (no pruning has been applied yet)\n    score = test(y_conv, message=\"First-round baseline test accuracy\")\n    papl.log(\"baseline_accuracy.log\", score)\n    \n    # Save model objects to readable format\n    papl.print_weight_vars(dense_w, papl.config.target_all_layer,\n                           papl.config.target_dat, show_zero=papl.config.show_zero)\n    # Save model objects to serialized format\n    saver.save(sess, \"./model_ckpt_dense\")\n\nif args.second_round == True:\n    # Second round: Retrain pruned model, start with default model: model_ckpt_dense\n    saver.restore(sess, args.checkpoint)\n\n    # Apply pruning on this context\n    dict_nzidx = apply_prune(dense_w)\n\n    # Save model objects to readable format\n    papl.print_weight_vars(dense_w, papl.config.target_all_layer,\n                           papl.config.target_p_dat, show_zero=papl.config.show_zero)\n\n    # Test prune-only networks\n    score = test(y_conv, message=\"Second-round prune-only test accuracy\")\n    papl.log(\"prune_accuracy.log\", score)\n\n    # Save model objects to serialized format\n    saver.save(sess, \"./model_ckpt_dense_pruned\")\n\n    # Retrain networks\n    cross_entropy = 
-tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0)))\n    trainer = tf.train.AdamOptimizer(1e-4)\n    grads_and_vars = trainer.compute_gradients(cross_entropy)\n    grads_and_vars = apply_prune_on_grads(grads_and_vars, dict_nzidx)\n    train_step = trainer.apply_gradients(grads_and_vars)\n\n    correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))\n    accuracy = tf.reduce_mean(tf.cast(correct_prediction, \"float\"))\n\n    # Initialize only the variables that are not yet initialized (e.g. those the optimizer created)\n    for var in tf.all_variables():\n        if tf.is_variable_initialized(var).eval() == False:\n            sess.run(tf.initialize_variables([var]))\n\n    # Retrain for papl.config.retrain_iterations additional steps\n    for i in range(papl.config.retrain_iterations):\n        batch = mnist.train.next_batch(50)\n        if i%100 == 0:\n            train_accuracy = accuracy.eval(feed_dict={\n                x:batch[0], y_: batch[1], keep_prob: 1.0})\n            print(\"step %d, training accuracy %g\"%(i, train_accuracy))\n        train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5})\n\n    # Save retrained variables in dense form\n    # key = check_file_exists(\"model_ckpt_dense_retrained\")\n    # saver.save(sess, key)\n    saver.save(sess, \"model_ckpt_dense_retrained\")\n\n    # Test the retrained model\n    score = test(y_conv, message=\"Second-round final test accuracy\")\n    papl.log(\"final_accuracy.log\", score)\n\nif args.third_round == True:\n    # Third round: Transform iteratively pruned model to a sparse format and save it\n    if args.second_round == False:\n        saver.restore(sess, \"./model_ckpt_dense_pruned\")\n\n    # Transform final weights to a sparse form\n    sparse_w = gen_sparse_dict(dense_w)\n\n    # Initialize new variables in a sparse form\n    for var in tf.all_variables():\n        if tf.is_variable_initialized(var).eval() == False:\n            sess.run(tf.initialize_variables([var]))\n\n    # Save model objects to readable format\n 
   papl.print_weight_vars(dense_w, papl.config.target_all_layer,\n                           papl.config.target_tp_dat, show_zero=papl.config.show_zero)\n    # Save model objects to serialized format\n    final_saver = tf.train.Saver(sparse_w)\n    final_saver.save(sess, \"./model_ckpt_sparse_retrained\") \n"
  }
]