Repository: garion9013/impl-pruning-TF
Branch: master
Commit: a185420c7bf8
Files: 16
Total size: 50.0 MB

Directory structure:
gitextract_z8dibw7e/
├── README.md
├── config.py
├── deploy_test.py
├── deploy_test_pruned.py
├── draw_histogram.py
├── model_ckpt_dense
├── model_ckpt_dense.meta
├── model_ckpt_dense_pruned
├── model_ckpt_sparse_retrained
├── papl.py
├── read_model.py
├── sparse_model_extreme/
│   ├── model_ckpt_dense_pruned
│   ├── model_ckpt_dense_retrained
│   └── model_ckpt_sparse_retrained
├── thspace.py
└── train.py

================================================
FILE CONTENTS
================================================

================================================
FILE: README.md
================================================
## TensorFlow implementation of "Iterative Pruning"

**CAUTION**: This repository is out of date. I have since checked that TF (>1.3) supports *sparse_matmul*, which seems to be a more correct way to implement iterative pruning. This work was done naively with a quite old version (0.8.0), so I do not recommend relying on this code for serious use. There will be no updates or maintenance either.

---

This work is based on "Learning both Weights and Connections for Efficient Neural Networks." [Han et al.](http://arxiv.org/pdf/1506.02626v3.pdf) @ NIPS '15. Note that this work only aims to quantify the effect of pruning on latency (within TensorFlow); it is not an optimal implementation. Thus, some details are simplified (e.g. number of iterations, adjusted dropout ratio, etc.).

I applied Iterative Pruning to a small MNIST CNN model (13MB, originally), which can be accessed from [TensorFlow Tutorials](https://www.tensorflow.org/versions/r0.8/tutorials/mnist/pros/index.html). After pruning off some percentage of the weights, I simply retrained for two epochs in each case and got compressed models (minimum 2.6MB with 90% off) with minor loss of accuracy.
(99.17% -> 98.99% with 90% off and retraining) Again, this is not optimal.

## Issues

Due to the lack of support for SparseTensor and its operations in TensorFlow (0.8.0), this implementation has some limitations. This work uses [*embedding_lookup_sparse*](https://www.tensorflow.org/versions/r0.8/api_docs/python/nn.html#embedding_lookup_sparse) to compute sparse matrix-vector multiplication. It is not designed solely for sparse matrix-vector multiplication, and thus its performance may be sub-optimal. (I'm not sure.) Also, TensorFlow stores a sparse matrix as \<index, value\> pairs rather than in the typical CSR format, which is more compact and performant. In summary, this implementation has the following limitations:

1. *embedding_lookup_sparse* doesn't support ```broadcasting```, which prevents users from running tests with the normal test datasets.
2. Performance may be somewhat sub-optimal.
3. Because "Sparse Variable" is not supported, manual dense-to-sparse and sparse-to-dense transformation is required.
4. 4D convolution tensors may also be handled this way, but it is a bit tricky.
5. The current *embedding_lookup_sparse* forces an additional matrix transpose, dimension squeeze and dimension reshape.

## File descriptions and usages

model_ckpt_dense: original model
model_ckpt_dense_pruned: 90% pruned-only model
model_ckpt_sparse_retrained: 90% pruned and retrained model
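
The pruned-only and retrained checkpoints above come from two steps that `train.py` applies to the fully connected layers: zero every weight whose magnitude is below a per-layer threshold, then retrain while masking the gradients of the zeroed positions. A minimal pure-Python sketch of that idea follows. The function names `prune_by_threshold` and `masked_sgd_step` are hypothetical, plain lists stand in for tensors, and plain SGD is swapped in for the Adam optimizer the repository uses.

```python
def prune_by_threshold(weights, thresh):
    """Zero weights with |w| < thresh; return the pruned list and a keep-mask."""
    mask = [abs(w) >= thresh for w in weights]
    pruned = [w if keep else 0.0 for w, keep in zip(weights, mask)]
    return pruned, mask


def masked_sgd_step(weights, grads, mask, lr=0.1):
    """One SGD step (assumed optimizer) with each gradient multiplied by its mask."""
    return [w - lr * g * (1.0 if keep else 0.0)
            for w, g, keep in zip(weights, grads, mask)]


# Toy weights; 0.148615 is the "w_fc1" entry of th90 in thspace.py.
weights = [0.20, -0.05, 0.10, -0.30]
pruned, mask = prune_by_threshold(weights, thresh=0.148615)
updated = masked_sgd_step(pruned, grads=[1.0, 1.0, 1.0, 1.0], mask=mask)

print(pruned)   # the two small weights become 0.0
print(mask)     # True where a weight survived
print(updated)  # pruned positions are still 0.0 after the step
```

Because the mask multiplies the gradient rather than the weight, a pruned connection stays at zero during retraining instead of being re-zeroed after every step.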
#### Python package requirements

```bash
sudo apt-get install python-scipy python-numpy python-matplotlib
```

To regenerate these sparse models, first edit ```config.py``` to set your threshold configuration, and then run training with the second (pruning and retraining) and third (generate sparse form of weight data) round options.

```bash
./train.py -2 -3
```

To run inference on a single image (seven.png) and measure its latency,

```bash
./deploy_test.py -d -m model_ckpt_dense
./deploy_test_pruned.py -d -m model_ckpt_sparse_retrained
```

To test a dense model,

```bash
./deploy_test.py -t -m model_ckpt_dense
./deploy_test.py -t -m model_ckpt_dense_pruned
./deploy_test.py -t -m model_ckpt_dense_retrained
```

To draw a histogram that shows the weight distribution,

```bash
# After running train.py (it generates .dat files)
./draw_histogram.py
```

## Performance

Results are currently somewhat mediocre or degraded due to the indirection and additional storage overhead that come with the sparse matrix form. It may also be because the model is too small (12.49MB).

#### Storage overhead

Baseline: 12.49 MB
10 % pruned: 21.86 MB
20 % pruned: 19.45 MB
30 % pruned: 17.05 MB
40 % pruned: 14.64 MB
50 % pruned: 12.23 MB
60 % pruned: 9.83 MB
70 % pruned: 7.42 MB
80 % pruned: 5.02 MB
90 % pruned: 2.61 MB
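
The table crosses the dense baseline between 40 % and 50 % pruning, which is what one would expect if each stored nonzero costs about twice as much as a dense element (a value plus index data). The sketch below works through that arithmetic for `w_fc1` alone. The 4-byte value and 4-byte index sizes are assumptions chosen for illustration, not a description of the actual checkpoint layout, and the helper names are hypothetical.

```python
FC1_ELEMENTS = 7 * 7 * 64 * 1024  # shape of w_fc1 in train.py


def dense_bytes(num_elements, value_bytes=4):
    """Size of a dense array: every element is stored."""
    return num_elements * value_bytes


def sparse_bytes(num_elements, pruned_percent, value_bytes=4, index_bytes=4):
    """Size of a sparse form: each nonzero stores a value plus index data."""
    nnz = num_elements * (100 - pruned_percent) // 100
    return nnz * (value_bytes + index_bytes)


def break_even_percent(value_bytes=4, index_bytes=4):
    """Pruning percentage above which the sparse form is smaller than dense."""
    return 100.0 * index_bytes / (value_bytes + index_bytes)


for percent in (10, 50, 90):
    ratio = sparse_bytes(FC1_ELEMENTS, percent) / float(dense_bytes(FC1_ELEMENTS))
    print("%d %% pruned: %.2fx the dense size" % (percent, ratio))
print("break-even at %.0f %% pruned" % break_even_percent())
```

Changing `index_bytes` shifts the break-even point: with 8 bytes of index data per nonzero it moves to about 67 % pruned.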
#### CPU performance (5 times averaged)

CPU: Intel Core i5-2500 @ 3.3 GHz, LLC size: 6 MB

http://younghwanoh.github.io/images/cpu-desktop.png

Baseline: 0.01118040085 s
10 % pruned: 1.919299984 s
20 % pruned: 0.2325239658 s
30 % pruned: 0.2111079693 s
40 % pruned: 0.1982570648 s
50 % pruned: 0.1691776752 s
60 % pruned: 0.1305227757 s
70 % pruned: 0.116039753 s
80 % pruned: 0.103564167 s
90 % pruned: 0.1058168888 s
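
Each figure above is the mean of five runs. `deploy_test.py` times a single `sess.run` call with `time.time()` and appends the result to `performance_ref.log`; how the five runs were combined is not shown in the repository. One possible way to average several measurements is sketched below. The helper `mean_latency` and the stand-in workload are hypothetical, and `time.perf_counter` (Python 3) is used here in place of `time.time`.

```python
import time


def mean_latency(fn, runs=5):
    """Call fn() `runs` times and return the mean wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)


def fake_inference():
    # Stand-in for the real sess.run(...) call (assumption for illustration).
    return sum(i * i for i in range(10000))


print("mean latency: %.6f s" % mean_latency(fake_inference))
```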
#### GPU performance (5 times averaged)

GPU: Nvidia Geforce GTX650 @ 1.058 GHz, LLC size: 256 KB

http://younghwanoh.github.io/images/gpu-desktop.png

Baseline: 0.1475181845 s
10 % pruned: 0.2954540253 s
20 % pruned: 0.2665398121 s
30 % pruned: 0.2585638046 s
40 % pruned: 0.2090051651 s
50 % pruned: 0.1995279789 s
60 % pruned: 0.1815193653 s
70 % pruned: 0.1436806202 s
80 % pruned: 0.135668993 s
90 % pruned: 0.1218701839 s
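
Dividing each latency by its dense baseline makes the two tables easier to compare. The short script below does that with the figures reported above; the names `CPU_SECONDS`, `GPU_SECONDS` and `relative_latency` are introduced here for illustration only.

```python
# Latencies in seconds, keyed by pruned percentage (0 = dense baseline).
CPU_SECONDS = {
    0: 0.01118040085, 10: 1.919299984, 20: 0.2325239658, 30: 0.2111079693,
    40: 0.1982570648, 50: 0.1691776752, 60: 0.1305227757, 70: 0.116039753,
    80: 0.103564167, 90: 0.1058168888,
}
GPU_SECONDS = {
    0: 0.1475181845, 10: 0.2954540253, 20: 0.2665398121, 30: 0.2585638046,
    40: 0.2090051651, 50: 0.1995279789, 60: 0.1815193653, 70: 0.1436806202,
    80: 0.135668993, 90: 0.1218701839,
}


def relative_latency(table):
    """Return latency / baseline for every pruned configuration."""
    baseline = table[0]
    return {pct: sec / baseline for pct, sec in table.items() if pct != 0}


for name, table in (("CPU", CPU_SECONDS), ("GPU", GPU_SECONDS)):
    for pct, ratio in sorted(relative_latency(table).items()):
        print("%s %2d %% pruned: %.2fx baseline" % (name, pct, ratio))
```

On these numbers every sparse CPU run is slower than the dense baseline, while the GPU runs drop below their baseline from 70 % pruning onward.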
================================================
FILE: config.py
================================================
#!/usr/bin/python

import thspace as ths

def _complex_concat(a, b):
    tmp = []
    for i in a:
        for j in b:
            tmp.append(i+j)
    return tmp

def _add_prefix(a):
    tmp = []
    for idx, val in enumerate(a):
        tmp.append("w_" + val)
        # tmp.append("b_" + val)
    return tmp

# Pruning threshold setting (90 % off)
th = ths.th90

# CNN settings for pruned training
target_layer = ["fc1", "fc2"]
retrain_iterations = 10

# Output data lists: do not change this
target_all_layer = _add_prefix(target_layer)
target_dat = _complex_concat(target_all_layer, [".dat"])
target_p_dat = _complex_concat(target_all_layer, ["_p.dat"])
target_tp_dat = _complex_concat(target_all_layer, ["_tp.dat"])
weight_all = target_dat + target_p_dat + target_tp_dat
syn_all = ["in_conv1.syn", "in_conv2.syn", "in_fc1.syn", "in_fc2.syn"]

# Data settings
show_zero = False

# Graph settings
alpha = 0.75
color = "green"
pdf_prefix = ""

================================================
FILE: deploy_test.py
================================================
#!/usr/bin/python

import sys
sys.dont_write_bytecode = True

import tensorflow as tf
import numpy as np
import argparse
import papl
import config

argparser = argparse.ArgumentParser()
argparser.add_argument("-t", "--test", action="store_true", help="Run test")
argparser.add_argument("-d", "--deploy", action="store_true", help="Run deploy with seven.png")
argparser.add_argument("-s", "--print_syn", action="store_true", help="Print synapses to .syn")
argparser.add_argument("-m", "--model", default="./model_ckpt_dense", help="Specify a target model file")
args = argparser.parse_args()

if (args.test or args.deploy or args.print_syn) == True:
    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets('/tmp/data/', one_hot=True)
else:
    argparser.print_help()
    sys.exit()

# sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True)))
sess = tf.InteractiveSession()
# sess = tf.Session()

def imgread(path):
    tmp = papl.imread(path)
    img = np.zeros((28,28,1))
    img[:,:,0]=tmp[:,:,0]
    img = np.reshape(img, img.size)
    return img

# Restore values of variables
saver = tf.train.import_meta_graph(args.model+'.meta')
saver.restore(sess, args.model)

# Calc results
if args.test == True:
    # Evaluate test sets
    import time
    accuracy = tf.get_collection("accuracy")[0]
    # To avoid OOM, run validation with 500/10000 test dataset
    b = time.time()
    result = 0
    for i in range(20):
        batch = mnist.test.next_batch(500)
        result += sess.run(accuracy, feed_dict={"x:0": batch[0], "y_:0": batch[1], "keep_prob:0": 1.0})
    result /= 20
    a = time.time()
    print("Test accuracy %g" % result)
    print "Time: %s s" % (a-b)
elif args.deploy == True:
    # Infer a single image & check its latency
    import time
    img = imgread('seven.png')
    y_conv = tf.get_collection("y_conv")[0]
    b = time.time()
    result = sess.run(tf.argmax(y_conv,1), feed_dict={"x:0":[img], "y_:0":mnist.test.labels, "keep_prob:0": 1.0})
    a = time.time()
    print "Output: %s" % result
    print "Time: %s s" % (a-b)
    papl.log("performance_ref.log", a-b)
elif args.print_syn == True:
    # Print synapses (Input data of each neuron)
    img = imgread('seven.png')
    target_syn = config.syn_all
    synapses = [ tf.get_collection(elem.split(".")[0])[0] for elem in target_syn ]
    for i,j in zip(synapses, config.syn_all):
        syn = sess.run(i, feed_dict={"x:0":[img], "y_:0":mnist.test.labels, "keep_prob:0": 1.0})
        papl.print_synapse_nps(syn, j)
    print "Done! 
Synapse data is printed to x.syn" ================================================ FILE: deploy_test_pruned.py ================================================ #!/usr/bin/python import sys sys.dont_write_bytecode = True import tensorflow as tf import numpy as np import argparse import config import papl argparser = argparse.ArgumentParser() argparser.add_argument("-t", "--test", action="store_true", help="Run test") argparser.add_argument("-d", "--deploy", action="store_true", help="Run deploy with seven.png") argparser.add_argument("-m", "--model", default="./model_ckpt_sparse_retrained", help="Specify a target model file") args = argparser.parse_args() if (args.test) == True: print "Error: TensorFlow 0.8 doesn't support broadcasts on sparse operations, cannot run test set now" sys.exit() elif (args.deploy) == True: from tensorflow.examples.tutorials.mnist import input_data mnist = input_data.read_data_sets('/tmp/data/', one_hot=True) else: argparser.print_help() sys.exit() # sess = tf.InteractiveSession(config=tf.ConfigProto(gpu_options=tf.GPUOptions(allow_growth=True))) sess = tf.InteractiveSession(config=tf.ConfigProto(device_count={'GPU':0})) # sess = tf.Session() def imgread(path): tmp = papl.imread(path) img = np.zeros((28,28,1)) img[:,:,0]=tmp[:,:,0] return img # Declare weight variables sparse_w={ "w_conv1": tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1), name="w_conv1"), "b_conv1": tf.Variable(tf.constant(0.1, shape=[32]), name="b_conv1"), "w_conv2": tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1), name="w_conv2"), "b_conv2": tf.Variable(tf.constant(0.1, shape=[64]), name="b_conv2"), "w_fc1": tf.Variable(tf.zeros([config.th["fc1_nnz"]], dtype=tf.float32),name="w_fc1"), "w_fc1_idx": tf.Variable(tf.zeros([config.th["fc1_nnz"],2],dtype=tf.int32), name="w_fc1_idx"), "w_fc1_shape":tf.Variable(tf.zeros([2], dtype=tf.int32), name="w_fc1_shape"), "b_fc1": tf.Variable(tf.zeros([1024], dtype=tf.float32), name="b_fc1"), "w_fc2": 
tf.Variable(tf.zeros([config.th["fc2_nnz"]], dtype=tf.float32),name="w_fc2"), "w_fc2_idx": tf.Variable(tf.zeros([config.th["fc2_nnz"],2],dtype=tf.int32), name="w_fc2_idx"), "w_fc2_shape":tf.Variable(tf.zeros([2], dtype=tf.int32), name="w_fc2_shape"), "b_fc2": tf.Variable(tf.zeros([10], dtype=tf.float32), name="b_fc2"), } def sparse_cnn_model(weights): def conv2d(x, W): return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') def max_pool_2x2(x): return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME') h_conv1 = tf.nn.relu(conv2d(x_image, weights["w_conv1"]) + weights["b_conv1"]) h_pool1 = max_pool_2x2(h_conv1) h_conv2 = tf.nn.relu(conv2d(h_pool1, weights["w_conv2"]) + weights["b_conv2"]) h_pool2 = max_pool_2x2(h_conv2) h_pool2_flat = tf.squeeze(tf.reshape(h_pool2, [-1, 7*7*64])) h_fc1 = tf.nn.relu(tf.nn.embedding_lookup_sparse(h_pool2_flat, weights["w_fc1_ids"], weights["w_fc1"], combiner="sum") + weights["b_fc1"]) h_fc1_drop = tf.squeeze(tf.nn.dropout(h_fc1, keep_prob)) y_conv = tf.nn.relu(tf.nn.embedding_lookup_sparse(h_fc1_drop, weights["w_fc2_ids"], weights["w_fc2"], combiner="sum") + weights["b_fc2"]) y_conv = tf.nn.softmax(tf.reshape(y_conv, [1, -1])) return y_conv # Restore values of variables saver = tf.train.Saver() saver.restore(sess, args.model) # Retrieve SparseTensor from serialized dense variables sparse_w["w_fc1"] = tf.SparseTensor(sparse_w["w_fc1_idx"].eval(), sparse_w["w_fc1"].eval(), sparse_w["w_fc1_shape"].eval()) sparse_w["w_fc2"] = tf.SparseTensor(sparse_w["w_fc2_idx"].eval(), sparse_w["w_fc2"].eval(), sparse_w["w_fc2_shape"].eval()) sparse_w["w_fc1_ids"] = tf.SparseTensor(sparse_w["w_fc1_idx"].eval(), sparse_w["w_fc1_idx"].eval()[:,1], sparse_w["w_fc1_shape"].eval()) sparse_w["w_fc2_ids"] = tf.SparseTensor(sparse_w["w_fc2_idx"].eval(), sparse_w["w_fc2_idx"].eval()[:,1], sparse_w["w_fc2_shape"].eval()) # Construct a sparse model with retrieved variables if args.test == True: x = tf.placeholder("float", 
shape=[None, 784]) x_image = tf.reshape(x, [-1,28,28,1]) elif args.deploy == True: img = imgread("./seven.png") x = tf.placeholder("float", shape=[None, 28, 28, 1]) x_image = x y_ = tf.placeholder("float", shape=[None, 10]) keep_prob = tf.placeholder("float") y_conv = sparse_cnn_model(sparse_w) # Calc results if args.test == True: # Evaluate test sets import time correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1)) accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) # To avoid OOM, run validation with 500/10000 test dataset b = time.time() result = 0 for i in range(20): batch = mnist.test.next_batch(500) result += accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0}) result /= 20 a = time.time() print("test accuracy %g" % result) print "time: %s s" % (a-b) elif args.deploy == True: # Infer a single image & check its latency import time b = time.time() result = sess.run(tf.argmax(y_conv,1), feed_dict={x:[img], y_:mnist.test.labels, keep_prob: 1.0}) a = time.time() print "output: %s" % result print "time: %s s" % (a-b) papl.log("performance_ref.log", a-b) ================================================ FILE: draw_histogram.py ================================================ #!/usr/bin/python import sys sys.dont_write_bytecode = True import papl import config papl.draw_histogram(config.weight_all, step=0.01) papl.draw_histogram(config.syn_all, step=1) ================================================ FILE: model_ckpt_dense ================================================ [File too large to display: 12.5 MB] ================================================ FILE: model_ckpt_dense_pruned ================================================ [File too large to display: 12.5 MB] ================================================ FILE: papl.py ================================================ import csv import matplotlib import matplotlib.pyplot as plt from matplotlib.ticker import FuncFormatter from matplotlib.backends.backend_pdf import 
PdfPages import numpy as np import sys sys.dont_write_bytecode = True import config # ===================================================================================== # Private methods # ===================================================================================== def _saveToPdf(output): pp = PdfPages(output) plt.savefig(pp, format='pdf') pp.close() plt.close() # Manipulate y-axis of histogarm def _to_percent(y, position): # tick locations calculated from fraction (global). s = str(y*100) # The percent symbol needs escaping in latex if matplotlib.rcParams['text.usetex'] == True: return s + r'$\%$' else: return s + '%' # Calc min, max position of each histogram bin def _minRuler(array): minimum = min(array) print " - min: ", minimum offset = minimum % step return minimum - offset def _maxRuler(array): maximum = max(array) print " - max: ", maximum offset = maximum % step return maximum - offset + step # ===================================================================================== # Start main methods (tools) # ===================================================================================== # Input: x.dat from global variables (config) or arguments # Output: histogram. 
x.pdf # Histogram settings are configurable through config.py def draw_histogram(*target, **kwargs): if len(target) == 1: target = target[0] assert type(target) == list file_list = target else: file_list = config.weight_all global step step = kwargs["step"] for target in file_list: print "Target: ", target try: with open(config.pdf_prefix+"%s" % target) as text: x = np.float32(text.read().rstrip("\n").split("\n")) # norm = np.ones_like(x) / float(len(x)) norm = np.ones_like(x) binspace = np.arange(_minRuler(x), _maxRuler(x), step) n, bins, patches = plt.hist(x, bins=binspace, weights=norm, alpha=config.alpha, facecolor=config.color) # formatter = FuncFormatter(_to_percent) # plt.gca().yaxis.set_major_formatter(formatter) plt.grid(True) _saveToPdf(config.pdf_prefix+"%s.pdf" % target.split(".")[0]) except IOError as e: print "Warning: I/O error({0}) - {1}".format(e.errno, e.strerror) pass except: print "Unexpected error:", sys.exc_info()[0] raise print "Graphs are drawned!" # Input: model object list, Output: human-readable form of model as x.dat def print_weight_vars(obj_dict, weight_obj_list, fname_list, show_zero=False): for elem, fname in zip(weight_obj_list, fname_list): weight_arr = obj_dict[elem].eval() ndim = weight_arr.size flat_weight_space = weight_arr.reshape(ndim) with open(fname, "w") as filelog: if show_zero == False: flat_weight_space = flat_weight_space[flat_weight_space != 0] writeLine = csv.writer(filelog, delimiter='\n') writeLine.writerow(flat_weight_space) # Input: synapse, Output: human-readable form of model as x.syn def print_synapse_nps(syn_arr, fname, show_zero=False): ndim = syn_arr.size flat_syn_space = syn_arr.reshape(ndim) with open(fname, "w") as filelog: if show_zero == False: flat_syn_space = flat_syn_space[flat_syn_space != 0] writeLine = csv.writer(filelog, delimiter='\n') writeLine.writerow(flat_syn_space) # Input: sparse model object list, Output: human-readable form of model as x.dat def print_sparse_weight_vars(obj_dict, 
weight_obj_list, fname_list):
    for elem, fname in zip(weight_obj_list, fname_list):
        weight_arr = obj_dict[elem].eval().values
        ndim = weight_arr.size
        flat_weight_space = weight_arr.reshape(ndim)
        with open(fname, "w") as filelog:
            writeLine = csv.writer(filelog, delimiter='\n')
            writeLine.writerow(flat_weight_space)

# Input: n-d dense array, Output: pruned array with threshold
def prune_dense(weight_arr, name="None", thresh=0.005, **kwargs):
    """Apply weight pruning with threshold """
    under_threshold = abs(weight_arr) < thresh
    weight_arr[under_threshold] = 0
    count = np.sum(under_threshold)
    print "Non-zero count (%s): %s" % (name, weight_arr.size - count)
    # Return the keep-mask (True where a weight survived). `~` is the boolean
    # NOT; unary minus on boolean arrays is rejected by newer NumPy releases.
    return weight_arr, ~under_threshold, count

# Input: anonymous dimension array and its pruning threshold,
# Output: indices - index list of non-zero elements
#         values - value list of non-zero elements
#         shape - original shape of matrix
def prune_tf_sparse(weight_arr, name="None", thresh=0.005):
    assert isinstance(weight_arr, np.ndarray)
    under_threshold = abs(weight_arr) < thresh
    weight_arr[under_threshold] = 0
    values = weight_arr[weight_arr != 0]
    indices = np.transpose(np.nonzero(weight_arr))
    shape = list(weight_arr.shape)
    count = np.sum(under_threshold)
    print "Non-zero count (Sparse %s): %s" % (name, weight_arr.size - count)
    return [indices, values, shape]

# Input: file name and text, Output: log file
def log(fname, log):
    with open(fname, "a") as wobj:
        wobj.write(str(log)+"\n")

# Input: Path to target image, Output: ndarray resized to fixed (28,28)
def imread(path):
    import numpy as np
    # Pillow only exposes Image under the PIL package
    from PIL import Image
    return np.array(Image.open(path).resize((28,28), resample=2))

================================================
FILE: read_model.py
================================================
#!/usr/bin/python

import sys
sys.dont_write_bytecode = True

import tensorflow as tf
import papl
import argparse

argparser = argparse.ArgumentParser()
argparser.add_argument("-m", "--model", required=True, help="Specify serialized input model")
argparser.add_argument("-r", "--ratio", help="Specify ratio") args = argparser.parse_args() def read_model_obj_with_sorted_ratio(fname, ratio): saver = tf.train.Saver() saver.restore(sess, fname) print str(ratio*100)+" %" target_obj_list = [weights[elem] for elem in papl.config.target_all_layer] for elem in target_obj_list: arr = elem.eval() arr = list(arr.reshape(arr.size)) arr.sort(cmp=lambda x,y:cmp(abs(x), abs(y))) print "\""+elem.name[:-2]+"\": ", abs(arr[int(len(arr)*ratio)-1]), "," def print_raw_matrix(fname): saver = tf.train.Saver() saver.restore(sess, fname) import numpy as np np.save("w_fc1.raw", weights["w_fc1"].eval()) np.save("w_fc2.raw", weights["w_fc2"].eval()) def read_model_obj(fname): saver = tf.train.Saver() import os.path try: assert os.path.isfile(fname) saver.restore(sess, fname) switcher = { "model_ckpt_dense": papl.config.target_dat, "model_ckpt_dense_pruned": papl.config.target_p_dat, "model_ckpt_dense_retrained": papl.config.target_tp_dat } papl.print_weight_vars(weights, papl.config.target_all_layer, switcher.get(args.model)) except AssertionError: print "Warning: No such files or directory\n" pass except: import sys print "Unexpected error:", sys.exc_info()[0] weights = { "w_conv1": tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1), name="w_conv1"), "b_conv1": tf.Variable(tf.constant(0.1, shape=[32]), name="b_conv1"), "w_conv2": tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1), name="w_conv2"), "b_conv2": tf.Variable(tf.constant(0.1, shape=[64]), name="b_conv2"), "w_fc1": tf.Variable(tf.truncated_normal([7*7*64, 1024], stddev=0.1), name="w_fc1"), "b_fc1": tf.Variable(tf.constant(0.1, shape=[1024]), name="b_fc1"), "w_fc2": tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1), name="w_fc2"), "b_fc2": tf.Variable(tf.constant(0.1, shape=[10]), name="b_fc2") } sess = tf.InteractiveSession() if __name__ == "__main__": if bool(args.ratio) == False: read_model_obj(args.model) else: 
read_model_obj_with_sorted_ratio(args.model, float(args.ratio)) # print_raw_matrix(args.model) ================================================ FILE: sparse_model_extreme/model_ckpt_dense_pruned ================================================ [File too large to display: 12.5 MB] ================================================ FILE: sparse_model_extreme/model_ckpt_dense_retrained ================================================ [File too large to display: 12.5 MB] ================================================ FILE: thspace.py ================================================ th10 = { "fc1_nnz": 2890138 , "fc2_nnz": 9216 , "w_conv1": 0.0143598 , "b_conv1": 0.0639633 , "w_conv2": 0.0122113 , "b_conv2": 0.0764835 , "w_fc1": 0.0121145 , "b_fc1": 0.0907546 , "w_fc2": 0.0132132 , "b_fc2": 0.0911222 } th20 = { "fc1_nnz": 2569011 , "fc2_nnz": 8192 , "w_conv1": 0.0303244 , "b_conv1": 0.0703646 , "w_conv2": 0.0243528 , "b_conv2": 0.0805169 , "w_fc1": 0.024393 , "b_fc1": 0.0929042 , "w_fc2": 0.0266463 , "b_fc2": 0.0943206 } th30 = { "fc1_nnz": 2247885 , "fc2_nnz": 7168 , "w_conv1": 0.0473049 , "b_conv1": 0.075084 , "w_conv2": 0.0371282 , "b_conv2": 0.0821582 , "w_fc1": 0.0370279 , "b_fc1": 0.0944582 , "w_fc2": 0.0407012 , "b_fc2": 0.0944 } th40 = { "fc1_nnz": 1926757 , "fc2_nnz": 6143 , "w_conv1": 0.0619981 , "b_conv1": 0.0783646 , "w_conv2": 0.0506691 , "b_conv2": 0.0849416 , "w_fc1": 0.0503098 , "b_fc1": 0.0957049 , "w_fc2": 0.0552152 , "b_fc2": 0.0960752 } th50 = { "fc1_nnz": 1605631 , "fc2_nnz": 5120 , "w_conv1": 0.0762394 , "b_conv1": 0.0791745 , "w_conv2": 0.0650136 , "b_conv2": 0.0858885 , "w_fc1": 0.0645222 , "b_fc1": 0.0967964 , "w_fc2": 0.0705915 , "b_fc2": 0.0978322 } th60 = { "fc1_nnz": 1284506 , "fc2_nnz": 4095 , "w_conv1": 0.0936658 , "b_conv1": 0.0817409 , "w_conv2": 0.0805099 , "b_conv2": 0.0873334 , "w_fc1": 0.0801966 , "b_fc1": 0.0979769 , "w_fc2": 0.0870296 , "b_fc2": 0.0996566 } th70 = { "fc1_nnz": 963379 , "fc2_nnz": 3071 , "w_conv1": 0.110689 , 
"b_conv1": 0.0830755 , "w_conv2": 0.0988535 , "b_conv2": 0.088152 , "w_fc1": 0.0979934 , "b_fc1": 0.0991785 , "w_fc2": 0.105566 , "b_fc2": 0.100518 } th80 = { "fc1_nnz": 642247 , "fc2_nnz": 2048 , "w_conv1": 0.130691 , "b_conv1": 0.0912763 , "w_conv2": 0.120732 , "b_conv2": 0.0898623 , "w_fc1": 0.119522 , "b_fc1": 0.100626 , "w_fc2": 0.126458 , "b_fc2": 0.112008 } th90 = { "fc1_nnz": 320939 , "fc2_nnz": 1014 , "w_conv1": 0.162963 , "b_conv1": 0.0956728 , "w_conv2": 0.150202 , "b_conv2": 0.0928398 , "w_fc1": 0.148615 , "b_fc1": 0.102556 , "w_fc2": 0.15566 , "b_fc2": 0.112008 } th95 = { "fc1_nnz": 160566 , "fc2_nnz": 513 , "w_fc1": 0.169592 , "b_fc1": 0.103936 , "w_fc2": 0.17703 , "b_fc2": 0.112008 } th99 = { "fc1_nnz": 32111 , "fc2_nnz": 103 , "w_fc1": 0.1975 , "b_fc1": 0.10679 , "w_fc2": 0.21004 , "b_fc2": 0.112008 } ================================================ FILE: train.py ================================================ #!/usr/bin/python from __future__ import absolute_import from __future__ import division from __future__ import print_function import sys sys.dont_write_bytecode = True import tensorflow as tf import numpy as np import argparse import papl import scipy.sparse as sp argparser = argparse.ArgumentParser() argparser.add_argument("-1", "--first_round", action="store_true", help="Run 1st-round: train with 20000 iterations") argparser.add_argument("-2", "--second_round", action="store_true", help="Run 2nd-round: apply pruning and its additional training") argparser.add_argument("-3", "--third_round", action="store_true", help="Run 3rd-round: transform model to a sparse format and save it") argparser.add_argument("-m", "--checkpoint", default="./model_ckpt_dense", help="Target checkpoint model file for 2nd and 3rd round") args = argparser.parse_args() from tensorflow.examples.tutorials.mnist import input_data mnist = input_data.read_data_sets('/tmp/data/', one_hot=True) if (args.first_round or args.second_round or args.third_round) == False: 
argparser.print_help() sys.exit() sess = tf.InteractiveSession() def apply_prune(weights): total_fc_byte = 0 total_fc_csr_byte = 0 total_nnz_elem = 0 total_origin_elem = 0 dict_nzidx = {} for target in papl.config.target_layer: wl = "w_" + target print(wl + " threshold:\t" + str(papl.config.th[wl])) # Get target layer's weights weight_obj = weights[wl] weight_arr = weight_obj.eval() # Apply pruning weight_arr, w_nzidx, w_nnz = papl.prune_dense(weight_arr, name=wl, thresh=papl.config.th[wl]) # Store pruned weights as tensorflow objects dict_nzidx[wl] = w_nzidx sess.run(weight_obj.assign(weight_arr)) return dict_nzidx def apply_prune_on_grads(grads_and_vars, dict_nzidx): # Mask gradients with pruned elements for key, nzidx in dict_nzidx.items(): count = 0 for grad, var in grads_and_vars: if var.name == key+":0": nzidx_obj = tf.cast(tf.constant(nzidx), tf.float32) grads_and_vars[count] = (tf.mul(nzidx_obj, grad), var) count += 1 return grads_and_vars def gen_sparse_dict(dense_w): sparse_w = dense_w for target in papl.config.target_all_layer: target_arr = np.transpose(dense_w[target].eval()) sparse_arr = papl.prune_tf_sparse(target_arr, name=target) sparse_w[target+"_idx"]=tf.Variable(tf.constant(sparse_arr[0],dtype=tf.int32), name=target+"_idx") sparse_w[target]=tf.Variable(tf.constant(sparse_arr[1],dtype=tf.float32), name=target) sparse_w[target+"_shape"]=tf.Variable(tf.constant(sparse_arr[2],dtype=tf.int32), name=target+"_shape") return sparse_w dense_w={ "w_conv1": tf.Variable(tf.truncated_normal([5,5,1,32],stddev=0.1), name="w_conv1"), "b_conv1": tf.Variable(tf.constant(0.1,shape=[32]), name="b_conv1"), "w_conv2": tf.Variable(tf.truncated_normal([5,5,32,64],stddev=0.1), name="w_conv2"), "b_conv2": tf.Variable(tf.constant(0.1,shape=[64]), name="b_conv2"), "w_fc1": tf.Variable(tf.truncated_normal([7*7*64,1024],stddev=0.1), name="w_fc1"), "b_fc1": tf.Variable(tf.constant(0.1,shape=[1024]), name="b_fc1"), "w_fc2": tf.Variable(tf.truncated_normal([1024,10],stddev=0.1), 
name="w_fc2"), "b_fc2": tf.Variable(tf.constant(0.1,shape=[10]), name="b_fc2") } def dense_cnn_model(weights): def conv2d(x, W): return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME') def max_pool_2x2(x): return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME') x_image = tf.reshape(x, [-1,28,28,1]) h_conv1 = tf.nn.relu(conv2d(x_image, weights["w_conv1"]) + weights["b_conv1"]) tf.add_to_collection("in_conv1", x_image) h_pool1 = max_pool_2x2(h_conv1) tf.add_to_collection("in_conv2", h_pool1) h_conv2 = tf.nn.relu(conv2d(h_pool1, weights["w_conv2"]) + weights["b_conv2"]) h_pool2 = max_pool_2x2(h_conv2) h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64]) tf.add_to_collection("in_fc1", h_pool2_flat) h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, weights["w_fc1"]) + weights["b_fc1"]) h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob) tf.add_to_collection("in_fc2", h_fc1_drop) y_conv=tf.nn.softmax(tf.matmul(h_fc1_drop, weights["w_fc2"]) + weights["b_fc2"]) return y_conv def test(y_infer, message="None."): correct_prediction = tf.equal(tf.argmax(y_infer,1), tf.argmax(y_,1)) accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) # To avoid OOM, run validation with 500/10000 test dataset result = 0 for i in range(20): batch = mnist.test.next_batch(500) result += accuracy.eval(feed_dict={x: batch[0], y_: batch[1], keep_prob: 1.0}) result /= 20 print(message+" %g\n" % result) return result def check_file_exists(key): import os fileList = os.listdir(".") count = 0 for elem in fileList: if elem.find(key) >= 0: count += 1 return key + ("-"+str(count) if count>0 else "") # Construct a dense model x = tf.placeholder("float", shape=[None, 784], name="x") y_ = tf.placeholder("float", shape=[None, 10], name="y_") keep_prob = tf.placeholder("float", name="keep_prob") y_conv = dense_cnn_model(dense_w) tf.add_to_collection("y_conv", y_conv) saver = tf.train.Saver() if args.first_round == True: # First round: Train baseline dense model cross_entropy = 
-tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0))) train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy) correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1)) accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float")) tf.add_to_collection("accuracy", accuracy) sess.run(tf.initialize_all_variables()) for i in range(20000): batch = mnist.train.next_batch(50) if i%100 == 0: train_accuracy = accuracy.eval(feed_dict={ x:batch[0], y_: batch[1], keep_prob: 1.0}) print("step %d, training accuracy %g"%(i, train_accuracy)) train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}) # Test score = test(y_conv, message="First-round prune-only test accuracy") papl.log("baseline_accuracy.log", score) # Save model objects to readable format papl.print_weight_vars(dense_w, papl.config.target_all_layer, papl.config.target_dat, show_zero=papl.config.show_zero) # Save model objects to serialized format saver.save(sess, "./model_ckpt_dense") if args.second_round == True: # Second round: Retrain pruned model, start with default model: model_ckpt_dense saver.restore(sess, args.checkpoint) # Apply pruning on this context dict_nzidx = apply_prune(dense_w) # save model objects to readable format papl.print_weight_vars(dense_w, papl.config.target_all_layer, papl.config.target_p_dat, show_zero=papl.config.show_zero) # Test prune-only networks score = test(y_conv, message="Second-round prune-only test accuracy") papl.log("prune_accuracy.log", score) # save model objects to serialized format saver.save(sess, "./model_ckpt_dense_pruned") # Retrain networks cross_entropy = -tf.reduce_sum(y_*tf.log(tf.clip_by_value(y_conv,1e-10,1.0))) trainer = tf.train.AdamOptimizer(1e-4) grads_and_vars = trainer.compute_gradients(cross_entropy) grads_and_vars = apply_prune_on_grads(grads_and_vars, dict_nzidx) train_step = trainer.apply_gradients(grads_and_vars) correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1)) accuracy = 
tf.reduce_mean(tf.cast(correct_prediction, "float")) # Initialize firstly touched variables (mostly from accuracy calc.) for var in tf.all_variables(): if tf.is_variable_initialized(var).eval() == False: sess.run(tf.initialize_variables([var])) # Train x epochs additionally for i in range(papl.config.retrain_iterations): batch = mnist.train.next_batch(50) if i%100 == 0: train_accuracy = accuracy.eval(feed_dict={ x:batch[0], y_: batch[1], keep_prob: 1.0}) print("step %d, training accuracy %g"%(i, train_accuracy)) train_step.run(feed_dict={x: batch[0], y_: batch[1], keep_prob: 0.5}) # Save retrained variables to a desne form # key = check_file_exists("model_ckpt_dense_retrained") # saver.save(sess, key) saver.save(sess, "model_ckpt_dense_retrained") # Test the retrained model score = test(y_conv, message="Second-round final test accuracy") papl.log("final_accuracy.log", score) if args.third_round == True: # Third round: Transform iteratively pruned model to a sparse format and save it if args.second_round == False: saver.restore(sess, "./model_ckpt_dense_pruned") # Transform final weights to a sparse form sparse_w = gen_sparse_dict(dense_w) # Initialize new variables in a sparse form for var in tf.all_variables(): if tf.is_variable_initialized(var).eval() == False: sess.run(tf.initialize_variables([var])) # Save model objects to readable format papl.print_weight_vars(dense_w, papl.config.target_all_layer, papl.config.target_tp_dat, show_zero=papl.config.show_zero) # Save model objects to serialized format final_saver = tf.train.Saver(sparse_w) final_saver.save(sess, "./model_ckpt_sparse_retrained")