[
  {
    "path": ".gitignore",
    "content": ".DS_Store\nnote/"
  },
  {
    "path": "README.md",
    "content": "# Index\n1. Application and interpret\n    * [Interpretable Policies for Reinforcement Learning by Genetic Programming](#interpretable-policies-for-reinforcement-learning-by-genetic-programming)\n    * [Discovery Radiomics with CLEAR-DR: Interpretable Computer Aided  Diagnosis of Diabetic Retinopathy](#discovery-radiomics-with-clear-dr-interpretable-computer-aided--diagnosis-of-diabetic-retinopathy)\n    * [Building Data-driven Models with Microstructural Images: Generalization  and Interpretability](#building-data-driven-models-with-microstructural-images-generalization--and-interpretability)\n    * [Interpretable Feature Recommendation for Signal Analytics](#interpretable-feature-recommendation-for-signal-analytics)\n    * [Interpretable and Pedagogical Examples](#interpretable-and-pedagogical-examples)\n    * [Unsupervised patient representations from clinical notes with  interpretable classification decisions](#unsupervised-patient-representations-from-clinical-notes-with--interpretable-classification-decisions)\n    * [Interpretable probabilistic embeddings: bridging the gap between topic  models and neural networks](#interpretable-probabilistic-embeddings-bridging-the-gap-between-topic--models-and-neural-networks)\n    * [Arrhythmia Classification from the Abductive Interpretation of Short  Single-Lead ECG Records](#arrhythmia-classification-from-the-abductive-interpretation-of-short--single-lead-ecg-records)\n    * [Interpretable classifiers using rules and Bayesian analysis: Building a  better stroke prediction model](#interpretable-classifiers-using-rules-and-bayesian-analysis-building-a--better-stroke-prediction-model)\n    * [Interpretable Deep Neural Networks for Single-Trial EEG Classification](#interpretable-deep-neural-networks-for-single-trial-eeg-classification)\n    * [Learning Interpretable Musical Compositional Rules and Traces](#learning-interpretable-musical-compositional-rules-and-traces)\n    * [Building an Interpretable 
Recommender via Loss-Preserving Transformation](#building-an-interpretable-recommender-via-loss-preserving-transformation)\n    * [Interpretable Machine Learning Models for the Digital Clock Drawing Test](#interpretable-machine-learning-models-for-the-digital-clock-drawing-test)\n    * [RETAIN: An Interpretable Predictive Model for Healthcare using Reverse  Time Attention Mechanism](#retain-an-interpretable-predictive-model-for-healthcare-using-reverse--time-attention-mechanism)\n    * [Real Time Fine-Grained Categorization with Accuracy and Interpretability](#real-time-fine-grained-categorization-with-accuracy-and-interpretability)\n    * [Interpreting Neural Networks to Improve Politeness Comprehension](#interpreting-neural-networks-to-improve-politeness-comprehension)\n    * [Interpretable Semantic Textual Similarity: Finding and explaining  differences between sentences](#interpretable-semantic-textual-similarity-finding-and-explaining--differences-between-sentences)\n    * [Streaming Weak Submodularity: Interpreting Neural Networks on the Fly](#streaming-weak-submodularity-interpreting-neural-networks-on-the-fly)\n    * [Interpretable Learning for Self-Driving Cars by Visualizing Causal  Attention](#interpretable-learning-for-self-driving-cars-by-visualizing-causal--attention)\n    * [Interpretable 3D Human Action Analysis with Temporal Convolutional  Networks](#interpretable-3d-human-action-analysis-with-temporal-convolutional--networks)\n    * [An Interpretable Knowledge Transfer Model for Knowledge Base Completion](#an-interpretable-knowledge-transfer-model-for-knowledge-base-completion)\n    * [MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis  Network](#mdnet-a-semantically-and-visually-interpretable-medical-image-diagnosis--network)\n    * [Interpretable Active Learning](#interpretable-active-learning)\n    * [DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation  of Self-Reported 
Pain](#deepfacelift-interpretable-personalized-models-for-automatic-estimation--of-self-reported-pain)\n    * [More cat than cute? Interpretable Prediction of Adjective-Noun Pairs](#more-cat-than-cute-interpretable-prediction-of-adjective-noun-pairs)\n    * [Interpretable Categorization of Heterogeneous Time Series Data](#interpretable-categorization-of-heterogeneous-time-series-data)\n    * [Interpretable Graph-Based Semi-Supervised Learning via Flows](#interpretable-graph-based-semi-supervised-learning-via-flows)\n    * [CTD: Fast, Accurate, and Interpretable Method for Static and Dynamic  Tensor Decompositions](#ctd-fast-accurate-and-interpretable-method-for-static-and-dynamic--tensor-decompositions)\n    * [Interpretable Machine Learning for Privacy-Preserving Pervasive Systems](#interpretable-machine-learning-for-privacy-preserving-pervasive-systems)\n    * [Interpretable Convolutional Neural Networks for Effective Translation  Initiation Site Prediction](#interpretable-convolutional-neural-networks-for-effective-translation--initiation-site-prediction)\n    * [Interpretable Facial Relational Network Using Relational Importance](#interpretable-facial-relational-network-using-relational-importance)\n    * [SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for  Predicting Chemical Properties](#smiles2vec-an-interpretable-general-purpose-deep-neural-network-for--predicting-chemical-properties)\n    \n1. Determine interpretability of CNN\n    * [Network Dissection: Quantifying Interpretability of Deep Visual  Representations](#network-dissection-quantifying-interpretability-of-deep-visual--representations)\n    * [A Formal Framework to Characterize Interpretability of Procedures](#a-formal-framework-to-characterize-interpretability-of-procedures)\n    \n1. 
Criticize\n    * [Interpretation of Neural Networks is Fragile](#interpretation-of-neural-networks-is-fragile)\n    * [The Promise and Peril of Human Evaluation for Model Interpretability](#the-promise-and-peril-of-human-evaluation-for-model-interpretability)\n    \n1. Interpret existing model\n    * [Artificial Intelligence as Structural Estimation: Economic  Interpretations of Deep Blue, Bonanza, and AlphaGo](#artificial-intelligence-as-structural-estimation-economic--interpretations-of-deep-blue-bonanza-and-alphago)\n    * [Semantic Structure and Interpretability of Word Embeddings](#semantic-structure-and-interpretability-of-word-embeddings)\n    * [Interpreting Convolutional Neural Networks Through Compression](#interpreting-convolutional-neural-networks-through-compression)\n    * [Interpreting Deep Visual Representations via Network Dissection](#interpreting-deep-visual-representations-via-network-dissection)\n    * [The Mythos of Model Interpretability](#the-mythos-of-model-interpretability)\n    * [Model-Agnostic Interpretability of Machine Learning](#model-agnostic-interpretability-of-machine-learning)\n    * [Using Visual Analytics to Interpret Predictive Machine Learning Models](#using-visual-analytics-to-interpret-predictive-machine-learning-models)\n    * [Towards Transparent AI Systems: Interpreting Visual Question Answering  Models](#towards-transparent-ai-systems-interpreting-visual-question-answering--models)\n    * [Embedding Projector: Interactive Visualization and Interpretation of  Embeddings](#embedding-projector-interactive-visualization-and-interpretation-of--embeddings)\n    * [Learning Interpretability for Visualizations using Adapted Cox Models  through a User Experiment](#learning-interpretability-for-visualizations-using-adapted-cox-models--through-a-user-experiment)\n    * [Interpreting Finite Automata for Sequential Data](#interpreting-finite-automata-for-sequential-data)\n    * [Interpretation of Prediction Models Using the Input 
Gradient](#interpretation-of-prediction-models-using-the-input-gradient)\n    * [Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery](#interpretable-recurrent-neural-networks-using-sequential-sparse-recovery)\n    * [An unexpected unity among methods for interpreting model predictions](#an-unexpected-unity-among-methods-for-interpreting-model-predictions)\n    * [Towards A Rigorous Science of Interpretable Machine Learning](#towards-a-rigorous-science-of-interpretable-machine-learning)\n    * [Softmax Q-Distribution Estimation for Structured Prediction: A  Theoretical Interpretation for RAML](#softmax-q-distribution-estimation-for-structured-prediction-a--theoretical-interpretation-for-raml)\n    * [A Unified Approach to Interpreting Model Predictions](#a-unified-approach-to-interpreting-model-predictions)\n    * [Interpreting Blackbox Models via Model Extraction](#interpreting-blackbox-models-via-model-extraction)\n    * [Interpretable &amp; Explorable Approximations of Black Box Models](#interpretable--explorable-approximations-of-black-box-models)\n    * [Interpretability via Model Extraction](#interpretability-via-model-extraction)\n    * [Methods for Interpreting and Understanding Deep Neural Networks](#methods-for-interpreting-and-understanding-deep-neural-networks)\n    * [Interpreting Classifiers through Attribute Interactions in Datasets](#interpreting-classifiers-through-attribute-interactions-in-datasets)\n    * [Using Program Induction to Interpret Transition System Dynamics](#using-program-induction-to-interpret-transition-system-dynamics)\n    * [Warp: a method for neural network interpretability applied to gene  expression profiles](#warp-a-method-for-neural-network-interpretability-applied-to-gene--expression-profiles)\n    * [Towards Interpretable Deep Neural Networks by Leveraging Adversarial  Examples](#towards-interpretable-deep-neural-networks-by-leveraging-adversarial--examples)\n    * [Explainable Artificial Intelligence: 
Understanding, Visualizing and  Interpreting Deep Learning Models](#explainable-artificial-intelligence-understanding-visualizing-and--interpreting-deep-learning-models)\n    * [Interpreting Shared Deep Learning Models via Explicable Boundary Trees](#interpreting-shared-deep-learning-models-via-explicable-boundary-trees)\n    * [Unleashing the Potential of CNNs for Interpretable Few-Shot Learning](#unleashing-the-potential-of-cnns-for-interpretable-few-shot-learning)\n    * [Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action  Recognition](#train-diagnose-and-fix-interpretable-approach-for-fine-grained-action--recognition)\n    * [An interpretable latent variable model for attribute applicability in  the Amazon catalogue](#an-interpretable-latent-variable-model-for-attribute-applicability-in--the-amazon-catalogue)\n    * [Where Classification Fails, Interpretation Rises](#where-classification-fails-interpretation-rises)\n    \n1. Attempt to improve interpretability\n    * [Contextual Regression: An Accurate and Conveniently Interpretable  Nonlinear Model for Mining Discovery from Scientific Data](#contextual-regression-an-accurate-and-conveniently-interpretable--nonlinear-model-for-mining-discovery-from-scientific-data)\n    * [Interpretable R-CNN](#interpretable-r-cnn)\n    * [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](#increasing-the-interpretability-of-recurrent-neural-networks-using--hidden-markov-models)\n    * [SnapToGrid: From Statistical to Interpretable Models for Biomedical  Information Extraction](#snaptogrid-from-statistical-to-interpretable-models-for-biomedical--information-extraction)\n    * [Meaningful Models: Utilizing Conceptual Structure to Improve Machine  Learning Interpretability](#meaningful-models-utilizing-conceptual-structure-to-improve-machine--learning-interpretability)\n    * [Particle Swarm Optimization for Generating Interpretable Fuzzy  Reinforcement Learning 
Policies](#particle-swarm-optimization-for-generating-interpretable-fuzzy--reinforcement-learning-policies)\n    * [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](#increasing-the-interpretability-of-recurrent-neural-networks-using--hidden-markov-models-1)\n    * [Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning](#growing-interpretable-part-graphs-on-convnets-via-multi-shot-learning)\n    * [GENESIM: genetic extraction of a single, interpretable model](#genesim-genetic-extraction-of-a-single-interpretable-model)\n    * [Stratified Knowledge Bases as Interpretable Probabilistic Models  (Extended Abstract)](#stratified-knowledge-bases-as-interpretable-probabilistic-models--extended-abstract)\n    * [Tree Space Prototypes: Another Look at Making Tree Ensembles  Interpretable](#tree-space-prototypes-another-look-at-making-tree-ensembles--interpretable)\n    * [Inducing Interpretable Representations with Variational Autoencoders](#inducing-interpretable-representations-with-variational-autoencoders)\n    * [Input Switched Affine Networks: An RNN Architecture Designed for  Interpretability](#input-switched-affine-networks-an-rnn-architecture-designed-for--interpretability)\n    * [Large scale modeling of antimicrobial resistance with interpretable  classifiers](#large-scale-modeling-of-antimicrobial-resistance-with-interpretable--classifiers)\n    * [Towards a New Interpretation of Separable Convolutions](#towards-a-new-interpretation-of-separable-convolutions)\n    * [Interpretable Structure-Evolving LSTM](#interpretable-structure-evolving-lstm)\n    * [Improving Interpretability of Deep Neural Networks with Semantic  Information](#improving-interpretability-of-deep-neural-networks-with-semantic--information)\n    * [InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations](#infogail-interpretable-imitation-learning-from-visual-demonstrations)\n    * [Patchnet: Interpretable Neural Networks for 
Image Classification](#patchnet-interpretable-neural-networks-for-image-classification)\n    * [Unsupervised Learning of Disentangled and Interpretable Representations  from Sequential Data](#unsupervised-learning-of-disentangled-and-interpretable-representations--from-sequential-data)\n    * [Interpretable Convolutional Neural Networks](#interpretable-convolutional-neural-networks)\n    * [InterpNET: Neural Introspection for Interpretable Deep Learning](#interpnet-neural-introspection-for-interpretable-deep-learning)\n    * [MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural  Networks](#minimalrnn-toward-more-interpretable-and-trainable-recurrent-neural--networks)\n    * [Beyond Sparsity: Tree Regularization of Deep Models for Interpretability](#beyond-sparsity-tree-regularization-of-deep-models-for-interpretability)\n    * [SPINE: SParse Interpretable Neural Embeddings](#spine-sparse-interpretable-neural-embeddings)\n    * [Improving the Adversarial Robustness and Interpretability of Deep Neural  Networks by Regularizing their Input Gradients](#improving-the-adversarial-robustness-and-interpretability-of-deep-neural--networks-by-regularizing-their-input-gradients)\n\n\n# Papers\n\n## [Interpretable Policies for Reinforcement Learning by Genetic Programming](https://arxiv.org/abs/1712.04170)\n[(PDF)](https://arxiv.org/pdf/1712.04170)\n\n`Authors:Daniel Hein, Steffen Udluft, Thomas A. Runkler`\n\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)\n\n\nCite as:\n\narXiv:1712.04170 [cs.AI]\n\n \n(or arXiv:1712.04170v1 [cs.AI] for this version)\n\n\n> Abstract: The search for interpretable reinforcement learning policies is of high\nacademic and industrial interest. Especially for industrial systems, domain\nexperts are more likely to deploy autonomously learned controllers if they are\nunderstandable and convenient to evaluate. 
Basic algebraic equations are\nsupposed to meet these requirements, as long as they are restricted to an\nadequate complexity. Here we introduce the genetic programming for\nreinforcement learning (GPRL) approach based on model-based batch reinforcement\nlearning and genetic programming, which autonomously learns policy equations\nfrom pre-existing default state-action trajectory samples. GPRL is compared to\na straight-forward method which utilizes genetic programming for symbolic\nregression, yielding policies imitating an existing well-performing, but\nnon-interpretable policy. Experiments on three reinforcement learning\nbenchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark,\ndemonstrate the superiority of our GPRL approach compared to the symbolic\nregression method. GPRL is capable of producing well-performing interpretable\nreinforcement learning policies from pre-existing default trajectory data.\n\n\n## [Discovery Radiomics with CLEAR-DR: Interpretable Computer Aided  Diagnosis of Diabetic Retinopathy](https://arxiv.org/abs/1710.10675)\n[(PDF)](https://arxiv.org/pdf/1710.10675)\n\n`Authors:Devinder Kumar, Graham W. Taylor, Alexander Wong`\n\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)\n\n\nCite as:\n\narXiv:1710.10675 [cs.AI]\n\n \n(or arXiv:1710.10675v1 [cs.AI] for this version)\n\n\n> Abstract: Objective: Radiomics-driven Computer Aided Diagnosis (CAD) has shown\nconsiderable promise in recent years as a potential tool for improving clinical\ndecision support in medical oncology, particularly those based around the\nconcept of Discovery Radiomics, where radiomic sequencers are discovered\nthrough the analysis of medical imaging data. One of the main limitations with\ncurrent CAD approaches is that it is very difficult to gain insight or\nrationale as to how decisions are made, thus limiting their utility to\nclinicians. 
Methods: In this study, we propose CLEAR-DR, a novel interpretable\nCAD system based on the notion of CLass-Enhanced Attentive Response Discovery\nRadiomics for the purpose of clinical decision support for diabetic\nretinopathy. Results: In addition to disease grading via the discovered deep\nradiomic sequencer, the CLEAR-DR system also produces a visual interpretation\nof the decision-making process to provide better insight and understanding into\nthe decision-making process of the system. Conclusion: We demonstrate the\neffectiveness and utility of the proposed CLEAR-DR system of enhancing the\ninterpretability of diagnostic grading results for the application of diabetic\nretinopathy grading. Significance: CLEAR-DR can act as a potential powerful\ntool to address the uninterpretability issue of current CAD systems, thus\nimproving their utility to clinicians.\n\n\n## [Interpretation of Neural Networks is Fragile](https://arxiv.org/abs/1710.10547)\n[(PDF)](https://arxiv.org/pdf/1710.10547)\n\n`Authors:Amirata Ghorbani, Abubakar Abid, James Zou`\n\n\nComments:\n\nSubmitted for review at ICLR 2018\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1710.10547 [stat.ML]\n\n \n(or arXiv:1710.10547v1 [stat.ML] for this version)\n\n\n> Abstract: In order for machine learning to be deployed and trusted in many\napplications, it is crucial to be able to reliably explain why the machine\nlearning algorithm makes certain predictions. For example, if an algorithm\nclassifies a given pathology image to be a malignant tumor, then the doctor may\nneed to know which parts of the image led the algorithm to this classification.\nHow to interpret black-box predictors is thus an important and active area of\nresearch. A fundamental question is: how much can we trust the interpretation\nitself? 
In this paper, we show that interpretation of deep learning predictions\nis extremely fragile in the following sense: two perceptively indistinguishable\ninputs with the same predicted label can be assigned very different\ninterpretations. We systematically characterize the fragility of several\nwidely-used feature-importance interpretation methods (saliency maps, relevance\npropagation, and DeepLIFT) on ImageNet and CIFAR-10. Our experiments show that\neven small random perturbation can change the feature importance and new\nsystematic perturbations can lead to dramatically different interpretations\nwithout changing the label. We extend these results to show that\ninterpretations based on exemplars (e.g. influence functions) are similarly\nfragile. Our analysis of the geometry of the Hessian matrix gives insight on\nwhy fragility could be a fundamental challenge to the current interpretation\napproaches.\n\n\n## [Artificial Intelligence as Structural Estimation: Economic  Interpretations of Deep Blue, Bonanza, and AlphaGo](https://arxiv.org/abs/1710.10967)\n[(PDF)](https://arxiv.org/pdf/1710.10967)\n\n`Authors:Mitsuru Igami`\n\n\nSubjects:\n\nEconometrics (econ.EM); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1710.10967 [econ.EM]\n\n \n(or arXiv:1710.10967v2 [econ.EM] for this version)\n\n\n> Abstract: Artificial intelligence (AI) has achieved superhuman performance in a growing\nnumber of tasks, including the classical games of chess, shogi, and Go, but\nunderstanding and explaining AI remain challenging. This paper studies the\nmachine-learning algorithms for developing the game AIs, and provides their\nstructural interpretations. Specifically, chess-playing Deep Blue is a\ncalibrated value function, whereas shogi-playing Bonanza represents an\nestimated value function via Rust's (1987) nested fixed-point method. 
AlphaGo's\n\"supervised-learning policy network\" is a deep neural network (DNN) version of\nHotz and Miller's (1993) conditional choice probability estimates; its\n\"reinforcement-learning value network\" is equivalent to Hotz, Miller, Sanders,\nand Smith's (1994) simulation method for estimating the value function. Their\nperformances suggest DNNs are a useful functional form when the state space is\nlarge and data are sparse. Explicitly incorporating strategic interactions and\nunobserved heterogeneity in the data-generating process would further improve\nAIs' explicability.\n\n\n## [Contextual Regression: An Accurate and Conveniently Interpretable  Nonlinear Model for Mining Discovery from Scientific Data](https://arxiv.org/abs/1710.10728)\n[(PDF)](https://arxiv.org/pdf/1710.10728)\n\n`Authors:Chengyu Liu, Wei Wang`\n\n\nComments:\n\n18 pages of Main Article, 30 pages of Supplementary Material\n\nSubjects:\n\nQuantitative Methods (q-bio.QM); Learning (cs.LG); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1710.10728 [q-bio.QM]\n\n \n(or arXiv:1710.10728v1 [q-bio.QM] for this version)\n\n\n> Abstract: Machine learning algorithms such as linear regression, SVM and neural network\nhave played an increasingly important role in the process of scientific\ndiscovery. However, none of them is both interpretable and accurate on\nnonlinear datasets. Here we present contextual regression, a method that joins\nthese two desirable properties together using a hybrid architecture of neural\nnetwork embedding and dot product layer. We demonstrate its high prediction\naccuracy and sensitivity through the task of predictive feature selection on a\nsimulated dataset and the application of predicting open chromatin sites in the\nhuman genome. On the simulated data, our method achieved high fidelity recovery\nof feature contributions under random noise levels up to 200%. 
On the open\nchromatin dataset, the application of our method not only outperformed the\nstate of the art method in terms of accuracy, but also unveiled two previously\nunfound open chromatin related histone marks. Our method can fill the blank of\naccurate and interpretable nonlinear modeling in scientific data mining tasks.\n\n\n## [Building Data-driven Models with Microstructural Images: Generalization  and Interpretability](https://arxiv.org/abs/1711.00404)\n[(PDF)](https://arxiv.org/pdf/1711.00404)\n\n`Authors:Julia Ling, Maxwell Hutchinson, Erin Antono, Brian DeCost, Elizabeth A. Holm, Bryce Meredig`\n\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Materials Science (cond-mat.mtrl-sci)\n\n\nCite as:\n\narXiv:1711.00404 [cs.AI]\n\n \n(or arXiv:1711.00404v1 [cs.AI] for this version)\n\n\n> Abstract: As data-driven methods rise in popularity in materials science applications,\na key question is how these machine learning models can be used to understand\nmicrostructure. Given the importance of process-structure-property relations\nthroughout materials science, it seems logical that models that can leverage\nmicrostructural data would be more capable of predicting property information.\nWhile there have been some recent attempts to use convolutional neural networks\nto understand microstructural images, these early studies have focused only on\nwhich featurizations yield the highest machine learning model accuracy for a\nsingle data set. 
This paper explores the use of convolutional neural networks\nfor classifying microstructure with a more holistic set of objectives in mind:\ngeneralization between data sets, number of features required, and\ninterpretability.\n\n\n## [Interpretable Feature Recommendation for Signal Analytics](https://arxiv.org/abs/1711.01870)\n[(PDF)](https://arxiv.org/pdf/1711.01870)\n\n`Authors:Snehasis Banerjee, Tanushyam Chattopadhyay, Ayan Mukherjee`\n\n\nComments:\n\n4 pages, Interpretable Data Mining Workshop, CIKM 2017\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1711.01870 [stat.ML]\n\n \n(or arXiv:1711.01870v1 [stat.ML] for this version)\n\n\n> Abstract: This paper presents an automated approach for interpretable feature\nrecommendation for solving signal data analytics problems. The method has been\ntested by performing experiments on datasets in the domain of prognostics where\ninterpretation of features is considered very important. The proposed approach\nis based on Wide Learning architecture and provides means for interpretation of\nthe recommended features. It is to be noted that such an interpretation is not\navailable with feature learning approaches like Deep Learning (such as\nConvolutional Neural Network) or feature transformation approaches like\nPrincipal Component Analysis. Results show that the feature recommendation and\ninterpretation techniques are quite effective for the problems at hand in terms\nof performance and drastic reduction in time to develop a solution. 
It is\nfurther shown by an example, how this human-in-loop interpretation system can\nbe used as a prescriptive system.\n\n\n## [Semantic Structure and Interpretability of Word Embeddings](https://arxiv.org/abs/1711.00331)\n[(PDF)](https://arxiv.org/pdf/1711.00331)\n\n`Authors:Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, Tolga Cukur`\n\n\nComments:\n\n10 Pages, 7 Figures\n\nSubjects:\n\nComputation and Language (cs.CL)\n\n\nCite as:\n\narXiv:1711.00331 [cs.CL]\n\n \n(or arXiv:1711.00331v2 [cs.CL] for this version)\n\n\n> Abstract: Dense word embeddings, which encode semantic meanings of words to low\ndimensional vector spaces have become very popular in natural language\nprocessing (NLP) research due to their state-of-the-art performances in many\nNLP tasks. Word embeddings are substantially successful in capturing semantic\nrelations among words, so a meaningful semantic structure must be present in\nthe respective vector spaces. However, in many cases, this semantic structure\nis broadly and heterogeneously distributed across the embedding dimensions,\nwhich makes interpretation a big challenge. In this study, we propose a\nstatistical method to uncover the latent semantic structure in the dense word\nembeddings. To perform our analysis we introduce a new dataset (SEMCAT) that\ncontains more than 6500 words semantically grouped under 110 categories. 
We\nfurther propose a method to quantify the interpretability of the word\nembeddings; the proposed method is a practical alternative to the classical\nword intrusion test that requires human intervention.\n\n\n## [Interpretable and Pedagogical Examples](https://arxiv.org/abs/1711.00694)\n[(PDF)](https://arxiv.org/pdf/1711.00694)\n\n`Authors:Smitha Milli, Pieter Abbeel, Igor Mordatch`\n\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1711.00694 [cs.AI]\n\n \n(or arXiv:1711.00694v1 [cs.AI] for this version)\n\n\n> Abstract: Teachers intentionally pick the most informative examples to show their\nstudents. However, if the teacher and student are neural networks, the examples\nthat the teacher network learns to give, although effective at teaching the\nstudent, are typically uninterpretable. We show that training the student and\nteacher iteratively, rather than jointly, can produce interpretable teaching\nstrategies. We evaluate interpretability by (1) measuring the similarity of the\nteacher's emergent strategies to intuitive strategies in each domain and (2)\nconducting human experiments to evaluate how effective the teacher's strategies\nare at teaching humans. We show that the teacher network learns to select or\ngenerate interpretable, pedagogical examples to teach rule-based,\nprobabilistic, boolean, and hierarchical concepts.\n\n\n## [Unsupervised patient representations from clinical notes with  interpretable classification decisions](https://arxiv.org/abs/1711.05198)\n[(PDF)](https://arxiv.org/pdf/1711.05198)\n\n`Authors:Madhumita Sushil, Simon Šuster, Kim Luyckx, Walter Daelemans`\n\n\nComments:\n\nAccepted poster at NIPS 2017 Workshop on Machine Learning for Health\n\nSubjects:\n\nComputation and Language (cs.CL)\n\n\nCite as:\n\narXiv:1711.05198 [cs.CL]\n\n \n(or arXiv:1711.05198v1 [cs.CL] for this version)\n\n\n> Abstract: We have two main contributions in this work: 1. 
We explore the usage of a\nstacked denoising autoencoder, and a paragraph vector model to learn\ntask-independent dense patient representations directly from clinical notes. We\nevaluate these representations by using them as features in multiple supervised\nsetups, and compare their performance with those of sparse representations. 2.\nTo understand and interpret the representations, we explore the best encoded\nfeatures within the patient representations obtained from the autoencoder\nmodel. Further, we calculate the significance of the input features of the\ntrained classifiers when we use these pretrained representations as input.\n\n\n## [Interpreting Convolutional Neural Networks Through Compression](https://arxiv.org/abs/1711.02329)\n[(PDF)](https://arxiv.org/pdf/1711.02329)\n\n`Authors:Reza Abbasi-Asl, Bin Yu`\n\n\nComments:\n\nPresented at NIPS 2017 Symposium on Interpretable Machine Learning\n\nSubjects:\n\nMachine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1711.02329 [stat.ML]\n\n \n(or arXiv:1711.02329v1 [stat.ML] for this version)\n\n\n> Abstract: Convolutional neural networks (CNNs) achieve state-of-the-art performance in\na wide variety of tasks in computer vision. However, interpreting CNNs still\nremains a challenge. This is mainly due to the large number of parameters in\nthese networks. Here, we investigate the role of compression and particularly\npruning filters in the interpretation of CNNs. We exploit our recently-proposed\ngreedy structural compression scheme that prunes filters in a trained CNN. In\nour compression, the filter importance index is defined as the classification\naccuracy reduction (CAR) of the network after pruning that filter. The filters\nare then iteratively pruned based on the CAR index. We demonstrate the\ninterpretability of CAR-compressed CNNs by showing that our algorithm prunes\nfilters with visually redundant pattern selectivity. 
Specifically, we show the\nimportance of shape-selective filters for object recognition, as opposed to\ncolor-selective filters. Out of top 20 CAR-pruned filters in AlexNet, 17 of\nthem in the first layer and 14 of them in the second layer are color-selective\nfilters. Finally, we introduce a variant of our CAR importance index that\nquantifies the importance of each image class to each CNN filter. We show that\nthe most and the least important class labels present a meaningful\ninterpretation of each filter that is consistent with the visualized pattern\nselectivity of that filter.\n\n\n## [Interpretable probabilistic embeddings: bridging the gap between topic  models and neural networks](https://arxiv.org/abs/1711.04154)\n[(PDF)](https://arxiv.org/pdf/1711.04154)\n\n`Authors:Anna Potapenko, Artem Popov, Konstantin Vorontsov`\n\n\nComments:\n\nAppeared in AINL-2017\n\nSubjects:\n\nComputation and Language (cs.CL)\n\n\nCite as:\n\narXiv:1711.04154 [cs.CL]\n\n \n(or arXiv:1711.04154v1 [cs.CL] for this version)\n\n\n> Abstract: We consider probabilistic topic models and more recent word embedding\ntechniques from a perspective of learning hidden semantic representations.\nInspired by a striking similarity of the two approaches, we merge them and\nlearn probabilistic embeddings with online EM-algorithm on word co-occurrence\ndata. The resulting embeddings perform on par with Skip-Gram Negative Sampling\n(SGNS) on word similarity tasks and benefit in the interpretability of the\ncomponents. Next, we learn probabilistic document embeddings that outperform\nparagraph2vec on a document similarity task and require less memory and time\nfor training. Finally, we employ multimodal Additive Regularization of Topic\nModels (ARTM) to obtain a high sparsity and learn embeddings for other\nmodalities, such as timestamps and categories. 
We observe further improvement\nof word similarity performance and meaningful inter-modality similarities.\n\n\n## [Arrhythmia Classification from the Abductive Interpretation of Short  Single-Lead ECG Records](https://arxiv.org/abs/1711.03892)\n[(PDF)](https://arxiv.org/pdf/1711.03892)\n\n`Authors:Tomás Teijeiro, Constantino A. García, Daniel Castro, Paulo Félix`\n\n\nComments:\n\n4 pages, 3 figures. Presented in the Computing in Cardiology 2017 conference\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)\n\n\nMSC classes:\n\n68T10\n\n\nCite as:\n\narXiv:1711.03892 [cs.AI]\n\n \n(or arXiv:1711.03892v1 [cs.AI] for this version)\n\n\n> Abstract: In this work we propose a new method for the rhythm classification of short\nsingle-lead ECG records, using a set of high-level and clinically meaningful\nfeatures provided by the abductive interpretation of the records. These\nfeatures include morphological and rhythm-related features that are used to\nbuild two classifiers: one that evaluates the record globally, using aggregated\nvalues for each feature; and another one that evaluates the record as a\nsequence, using a Recurrent Neural Network fed with the individual features for\neach detected heartbeat. The two classifiers are finally combined using the\nstacking technique, providing an answer by means of four target classes: Normal\nsinus rhythm, Atrial fibrillation, Other anomaly, and Noisy. 
The approach has\nbeen validated against the 2017 Physionet/CinC Challenge dataset, obtaining a\nfinal score of 0.83 and ranking first in the competition.\n\n\n## [Interpretable R-CNN](https://arxiv.org/abs/1711.05226)\n[(PDF)](https://arxiv.org/pdf/1711.05226)\n\n`Authors:Tianfu Wu, Xilai Li, Xi Song, Wei Sun, Liang Dong, Bo Li`\n\n\nComments:\n\n13 pages\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1711.05226 [cs.CV]\n\n \n(or arXiv:1711.05226v1 [cs.CV] for this version)\n\n\n> Abstract: This paper presents a method of learning qualitatively interpretable models\nin object detection using popular two-stage region-based ConvNet detection\nsystems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI\n(Region-of-Interest) prediction network. By interpretable models, we focus on\nweakly-supervised extractive rationale generation, that is, learning to unfold\nlatent discriminative part configurations of object instances automatically and\nsimultaneously in detection without using any supervision for part\nconfigurations. We utilize a top-down hierarchical and compositional grammar\nmodel embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold\nthe space of latent part configurations of RoIs. We propose an AOGParsing\noperator to replace the RoIPooling operator widely used in R-CNN, so the\nproposed method is applicable to many state-of-the-art ConvNet based detection\nsystems. The AOGParsing operator aims to harness both the explainable rigor of\ntop-down hierarchical and compositional grammar models and the discriminative\npower of bottom-up deep neural networks through end-to-end training. In\ndetection, a bounding box is interpreted by the best parse tree derived from\nthe AOG on-the-fly, which is treated as the extractive rationale generated for\ninterpreting detection. In learning, we propose a folding-unfolding method to\ntrain the AOG and ConvNet end-to-end. 
In experiments, we build on top of the\nR-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets\nwith performance comparable to state-of-the-art methods.\n\n\n## [Interpreting Deep Visual Representations via Network Dissection](https://arxiv.org/abs/1711.05611)\n[(PDF)](https://arxiv.org/pdf/1711.05611)\n\n`Authors:Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba`\n\n\nComments:\n\n*B. Zhou and D. Bau contributed equally to this work. 15 pages, 27 figures\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nACM classes:\n\nI.2.10\n\n\nCite as:\n\narXiv:1711.05611 [cs.CV]\n\n \n(or arXiv:1711.05611v1 [cs.CV] for this version)\n\n\n> Abstract: The success of recent deep convolutional neural networks (CNNs) depends on\nlearning hidden representations that can summarize the important factors of\nvariation behind the data. However, CNNs are often criticized as being black boxes\nthat lack interpretability, since they have millions of unexplained model\nparameters. In this work, we describe Network Dissection, a method that\ninterprets networks by providing labels for the units of their deep visual\nrepresentations. The proposed method quantifies the interpretability of CNN\nrepresentations by evaluating the alignment between individual hidden units and\na set of visual semantic concepts. By identifying the best alignments, units\nare given human-interpretable labels across a range of objects, parts, scenes,\ntextures, materials, and colors. The method reveals that deep representations\nare more transparent and interpretable than expected: we find that\nrepresentations are significantly more interpretable than they would be under a\nrandom equivalently powerful basis. We apply the method to interpret and\ncompare the latent representations of various network architectures trained to\nsolve different supervised and self-supervised training tasks. 
We then examine\nfactors affecting the network interpretability such as the number of the\ntraining iterations, regularizations, different initializations, and the\nnetwork depth and width. Finally, we show that the interpreted units can be used\nto provide explicit explanations of a prediction given by a CNN for an image.\nOur results highlight that interpretability is an important property of deep\nneural networks that provides new insights into their hierarchical structure.\n\n\n## [Interpretable classifiers using rules and Bayesian analysis: Building a  better stroke prediction model](https://arxiv.org/abs/1511.01644)\n[(PDF)](https://arxiv.org/pdf/1511.01644)\n\n`Authors:Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, David Madigan`\n\n\nComments:\n\nPublished in the Annals of Applied Statistics by the Institute of Mathematical Statistics\n\nSubjects:\n\nApplications (stat.AP); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nJournal reference:\n\nAnnals of Applied Statistics 2015, Vol. 9, No. 3, 1350-1371\n\n\nDOI:\n\n10.1214/15-AOAS848\n\n\nReport number:\n\nIMS-AOAS-AOAS848\n\n\nCite as:\n\narXiv:1511.01644 [stat.AP]\n\n \n(or arXiv:1511.01644v1 [stat.AP] for this version)\n\n\n> Abstract: We aim to produce predictive models that are not only accurate, but are also\ninterpretable to human experts. Our models are decision lists, which consist of\na series of if...then... statements (e.g., if high blood pressure, then stroke)\nthat discretize a high-dimensional, multivariate feature space into a series of\nsimple, readily interpretable decision statements. We introduce a generative\nmodel called Bayesian Rule Lists that yields a posterior distribution over\npossible decision lists. It employs a novel prior structure to encourage\nsparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy\non par with the current top algorithms for prediction in machine learning. 
Our\nmethod is motivated by recent developments in personalized medicine, and can be\nused to produce highly accurate and interpretable medical scoring systems. We\ndemonstrate this by producing an alternative to the CHADS$_2$ score, actively\nused in clinical practice for estimating the risk of stroke in patients who\nhave atrial fibrillation. Our model is as interpretable as CHADS$_2$, but more\naccurate.\n\n\n## [Interpretable Deep Neural Networks for Single-Trial EEG Classification](https://arxiv.org/abs/1604.08201)\n[(PDF)](https://arxiv.org/pdf/1604.08201)\n\n`Authors:Irene Sturm, Sebastian Bach, Wojciech Samek, Klaus-Robert Müller`\n\n\nComments:\n\n5 pages, 1 figure\n\nSubjects:\n\nNeural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1604.08201 [cs.NE]\n\n \n(or arXiv:1604.08201v1 [cs.NE] for this version)\n\n\n> Abstract: Background: In cognitive neuroscience the potential of Deep Neural Networks\n(DNNs) for solving complex classification tasks is yet to be fully exploited.\nThe most limiting factor is that DNNs, as notorious 'black boxes', do not provide\ninsight into neurophysiological phenomena underlying a decision. Layer-wise\nRelevance Propagation (LRP) has been introduced as a novel method to explain\nindividual network decisions. New Method: We propose the application of DNNs\nwith LRP for the first time for EEG data analysis. Through LRP, the single-trial\nDNN decisions are transformed into heatmaps indicating each data point's\nrelevance for the outcome of the decision. Results: DNN achieves classification\naccuracies comparable to those of CSP-LDA. In subjects with low performance,\nsubject-to-subject transfer of trained DNNs can improve the results. The\nsingle-trial LRP heatmaps reveal neurophysiologically plausible patterns,\nresembling CSP-derived scalp maps. 
Critically, while CSP patterns represent\nclass-wise aggregated information, LRP heatmaps pinpoint neural patterns to\nsingle time points in single trials. Comparison with Existing Method(s): We\ncompare the classification performance of DNNs to that of linear CSP-LDA on two\ndata sets related to motor-imagery BCI. Conclusion: We have demonstrated that\nDNN is a powerful non-linear tool for EEG analysis. With LRP, a new quality of\nhigh-resolution assessment of neural activity can be reached. LRP is a\npotential remedy for the lack of interpretability of DNNs that has limited\ntheir utility in neuroscientific applications. The extreme specificity of the\nLRP-derived heatmaps opens up new avenues for investigating neural activity\nunderlying complex perception or decision-related processes.\n\n\n## [InfoGAN: Interpretable Representation Learning by Information Maximizing  Generative Adversarial Nets](https://arxiv.org/abs/1606.03657)\n[(PDF)](https://arxiv.org/pdf/1606.03657)\n\n`Authors:Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel`\n\n\nSubjects:\n\nLearning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1606.03657 [cs.LG]\n\n \n(or arXiv:1606.03657v1 [cs.LG] for this version)\n\n\n> Abstract: This paper describes InfoGAN, an information-theoretic extension to the\nGenerative Adversarial Network that is able to learn disentangled\nrepresentations in a completely unsupervised manner. InfoGAN is a generative\nadversarial network that also maximizes the mutual information between a small\nsubset of the latent variables and the observation. We derive a lower bound to\nthe mutual information objective that can be optimized efficiently, and show\nthat our training procedure can be interpreted as a variation of the Wake-Sleep\nalgorithm. 
Specifically, InfoGAN successfully disentangles writing styles from\ndigit shapes on the MNIST dataset, pose from lighting of 3D rendered images,\nand background digits from the central digit on the SVHN dataset. It also\ndiscovers visual concepts that include hair styles, presence/absence of\neyeglasses, and emotions on the CelebA face dataset. Experiments show that\nInfoGAN learns interpretable representations that are competitive with\nrepresentations learned by existing fully supervised methods.\n\n\n## [The Mythos of Model Interpretability](https://arxiv.org/abs/1606.03490)\n[(PDF)](https://arxiv.org/pdf/1606.03490)\n\n`Authors:Zachary C. Lipton`\n\n\nComments:\n\npresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nLearning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1606.03490 [cs.LG]\n\n \n(or arXiv:1606.03490v3 [cs.LG] for this version)\n\n\n> Abstract: Supervised machine learning models boast remarkable predictive capabilities.\nBut can you trust your model? Will it work in deployment? What else can it tell\nyou about the world? We want models to be not only good, but interpretable. And\nyet the task of interpretation appears underspecified. Papers provide diverse\nand sometimes non-overlapping motivations for interpretability, and offer\nmyriad notions of what attributes render models interpretable. Despite this\nambiguity, many papers proclaim interpretability axiomatically, absent further\nexplanation. In this paper, we seek to refine the discourse on\ninterpretability. First, we examine the motivations underlying interest in\ninterpretability, finding them to be diverse and occasionally discordant. 
Then,\nwe address model properties and techniques thought to confer interpretability,\nidentifying transparency to humans and post-hoc explanations as competing\nnotions. Throughout, we discuss the feasibility and desirability of different\nnotions, and question the oft-made assertions that linear models are\ninterpretable and that deep neural networks are not.\n\n\n## [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](https://arxiv.org/abs/1606.05320)\n[(PDF)](https://arxiv.org/pdf/1606.05320)\n\n`Authors:Viktoriya Krakovna, Finale Doshi-Velez`\n\n\nComments:\n\npresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.05320 [stat.ML]\n\n \n(or arXiv:1606.05320v2 [stat.ML] for this version)\n\n\n> Abstract: As deep neural networks continue to revolutionize various application\ndomains, there is increasing interest in making these powerful models more\nunderstandable and interpretable, and narrowing down the causes of good and bad\npredictions. We focus on recurrent neural networks (RNNs), state-of-the-art\nmodels in speech recognition and translation. Our approach to increasing\ninterpretability is to combine an RNN with a hidden Markov model (HMM), a\nsimpler and more transparent model. We explore various combinations of RNNs and\nHMMs: an HMM trained on LSTM states; a hybrid model where an HMM is trained\nfirst, then a small LSTM is given HMM state distributions and trained to fill\nin gaps in the HMM's performance; and a jointly trained hybrid model. 
We find\nthat the LSTM and HMM learn complementary information about the features in the\ntext.\n\n\n## [Model-Agnostic Interpretability of Machine Learning](https://arxiv.org/abs/1606.05386)\n[(PDF)](https://arxiv.org/pdf/1606.05386)\n\n`Authors:Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin`\n\n\nComments:\n\npresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.05386 [stat.ML]\n\n \n(or arXiv:1606.05386v1 [stat.ML] for this version)\n\n\n> Abstract: Understanding why machine learning models behave the way they do empowers\nboth system designers and end-users in many ways: in model selection, feature\nengineering, in order to trust and act upon the predictions, and in more\nintuitive user interfaces. Thus, interpretability has become a vital concern in\nmachine learning, and work in the area of interpretable models has found\nrenewed interest. In some applications, such models are as accurate as\nnon-interpretable ones, and thus are preferred for their transparency. Even\nwhen they are not accurate, they may still be preferred when interpretability\nis of paramount importance. However, restricting machine learning to\ninterpretable models is often a severe limitation. In this paper we argue for\nexplaining machine learning predictions using model-agnostic approaches. By\ntreating the machine learning models as black-box functions, these approaches\nprovide crucial flexibility in the choice of models, explanations, and\nrepresentations, improving debugging, comparison, and interfaces for a variety\nof users and models. 
We also outline the main challenges for such methods, and\nreview a recently-introduced model-agnostic explanation approach (LIME) that\naddresses these challenges.\n\n\n## [Learning Interpretable Musical Compositional Rules and Traces](https://arxiv.org/abs/1606.05572)\n[(PDF)](https://arxiv.org/pdf/1606.05572)\n\n`Authors:Haizi Yu, Lav R. Varshney, Guy E. Garnett, Ranjitha Kumar`\n\n\nComments:\n\npresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.05572 [stat.ML]\n\n \n(or arXiv:1606.05572v1 [stat.ML] for this version)\n\n\n> Abstract: Throughout music history, theorists have identified and documented\ninterpretable rules that capture the decisions of composers. This paper asks,\n\"Can a machine behave like a music theorist?\" It presents MUS-ROVER, a\nself-learning system for automatically discovering rules from symbolic music.\nMUS-ROVER performs feature learning via $n$-gram models to extract\ncompositional rules --- statistical patterns over the resulting features. We\nevaluate MUS-ROVER on Bach's (SATB) chorales, demonstrating that it can recover\nknown rules, as well as identify new, characteristic patterns for further\nstudy. 
We discuss how the extracted rules can be used in both machine and human\ncomposition.\n\n\n## [Building an Interpretable Recommender via Loss-Preserving Transformation](https://arxiv.org/abs/1606.05819)\n[(PDF)](https://arxiv.org/pdf/1606.05819)\n\n`Authors:Amit Dhurandhar, Sechan Oh, Marek Petrik`\n\n\nComments:\n\nPresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.05819 [stat.ML]\n\n \n(or arXiv:1606.05819v1 [stat.ML] for this version)\n\n\n> Abstract: We propose a method for building an interpretable recommender system for\npersonalizing online content and promotions. Historical data available for the\nsystem consists of customer features, provided content (promotions), and user\nresponses. Unlike in a standard multi-class classification setting,\nmisclassification costs depend on both recommended actions and customers. Our\nmethod transforms such a data set to a new set which can be used with standard\ninterpretable multi-class classification algorithms. The transformation has the\ndesirable property that minimizing the standard misclassification penalty in\nthis new space is equivalent to minimizing the custom cost function.\n\n\n## [Using Visual Analytics to Interpret Predictive Machine Learning Models](https://arxiv.org/abs/1606.05685)\n[(PDF)](https://arxiv.org/pdf/1606.05685)\n\n`Authors:Josua Krause, Adam Perer, Enrico Bertini`\n\n\nComments:\n\npresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.05685 [stat.ML]\n\n \n(or arXiv:1606.05685v2 [stat.ML] for this version)\n\n\n> Abstract: It is commonly believed that increasing the interpretability of a machine\nlearning model may decrease its predictive power. 
However, inspecting\ninput-output relationships of those models using visual analytics, while\ntreating them as black boxes, can help us understand the reasoning behind\noutcomes without sacrificing predictive quality. We identify a space of\npossible solutions and provide two examples of where such techniques have been\nsuccessfully used in practice.\n\n\n## [Interpretable Machine Learning Models for the Digital Clock Drawing Test](https://arxiv.org/abs/1606.07163)\n[(PDF)](https://arxiv.org/pdf/1606.07163)\n\n`Authors:William Souillard-Mandar, Randall Davis, Cynthia Rudin, Rhoda Au, Dana Penney`\n\n\nComments:\n\nPresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.07163 [stat.ML]\n\n \n(or arXiv:1606.07163v1 [stat.ML] for this version)\n\n\n> Abstract: The Clock Drawing Test (CDT) is a rapid, inexpensive, and popular\nneuropsychological screening tool for cognitive conditions. The Digital Clock\nDrawing Test (dCDT) uses novel software to analyze data from a digitizing\nballpoint pen that reports its position with considerable spatial and temporal\nprecision, making possible the analysis of both the drawing process and final\nproduct. We developed a methodology to analyze pen stroke data from these\ndrawings, and computed a large collection of features which were then analyzed\nwith a variety of machine learning techniques. The resulting scoring systems\nwere designed to be more accurate than the systems currently used by\nclinicians, but just as interpretable and easy to use. The systems also allow\nus to quantify the tradeoff between accuracy and interpretability. 
We created\nautomated versions of the CDT scoring systems currently used by clinicians,\nallowing us to benchmark our models, which indicated that our machine learning\nmodels substantially outperformed the existing scoring systems.\n\n\n## [SnapToGrid: From Statistical to Interpretable Models for Biomedical  Information Extraction](https://arxiv.org/abs/1606.09604)\n[(PDF)](https://arxiv.org/pdf/1606.09604)\n\n`Authors:Marco A. Valenzuela-Escarcega, Gus Hahn-Powell, Dane Bell, Mihai Surdeanu`\n\n\nSubjects:\n\nComputation and Language (cs.CL)\n\n\nCite as:\n\narXiv:1606.09604 [cs.CL]\n\n \n(or arXiv:1606.09604v1 [cs.CL] for this version)\n\n\n> Abstract: We propose an approach for biomedical information extraction that marries the\nadvantages of machine learning models, e.g., learning directly from data, with\nthe benefits of rule-based approaches, e.g., interpretability. Our approach\nstarts by training a feature-based statistical model, then converts this model\nto a rule-based variant by converting its features to rules, and \"snapping to\ngrid\" the feature weights to discrete votes. In doing so, our proposal takes\nadvantage of the large body of work in machine learning, but it produces an\ninterpretable model, which can be directly edited by experts. We evaluate our\napproach on the BioNLP 2009 event extraction task. 
Our results show that there\nis a small performance penalty when converting the statistical model to rules,\nbut the gain in interpretability compensates for that: with minimal effort,\nhuman experts improve this model to have similar performance to the statistical\nmodel that served as starting point.\n\n\n## [Meaningful Models: Utilizing Conceptual Structure to Improve Machine  Learning Interpretability](https://arxiv.org/abs/1607.00279)\n[(PDF)](https://arxiv.org/pdf/1607.00279)\n\n`Authors:Nick Condry`\n\n\nComments:\n\n5 pages, 3 figures, presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Artificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1607.00279 [stat.ML]\n\n \n(or arXiv:1607.00279v1 [stat.ML] for this version)\n\n\n> Abstract: The last decade has seen huge progress in the development of advanced machine\nlearning models; however, those models are powerless unless human users can\ninterpret them. Here we show how the mind's construction of concepts and\nmeaning can be used to create more interpretable machine learning models. By\nproposing a novel method of classifying concepts, in terms of 'form' and\n'function', we elucidate the nature of meaning and offer proposals to improve\nmodel understandability. As machine learning begins to permeate daily life,\ninterpretable models may serve as a bridge between domain-expert authors and\nnon-expert users.\n\n\n## [RETAIN: An Interpretable Predictive Model for Healthcare using Reverse  Time Attention Mechanism](https://arxiv.org/abs/1608.05745)\n[(PDF)](https://arxiv.org/pdf/1608.05745)\n\n`Authors:Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. 
Stewart, Jimeng Sun`\n\n\nComments:\n\nAccepted at Neural Information Processing Systems (NIPS) 2016\n\nSubjects:\n\nLearning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)\n\n\nCite as:\n\narXiv:1608.05745 [cs.LG]\n\n \n(or arXiv:1608.05745v4 [cs.LG] for this version)\n\n\n> Abstract: Accuracy and interpretability are two dominant features of successful\npredictive models. Typically, a choice must be made in favor of complex black-box\nmodels such as recurrent neural networks (RNN) for accuracy versus less\naccurate but more interpretable traditional models such as logistic regression.\nThis tradeoff poses challenges in medicine where both accuracy and\ninterpretability are important. We addressed this challenge by developing the\nREverse Time AttentIoN model (RETAIN) for application to Electronic Health\nRecords (EHR) data. RETAIN achieves high accuracy while remaining clinically\ninterpretable and is based on a two-level neural attention model that detects\ninfluential past visits and significant clinical variables within those visits\n(e.g., key diagnoses). RETAIN mimics physician practice by attending to the EHR\ndata in reverse time order so that recent clinical visits are likely to\nreceive higher attention. 
RETAIN was tested on a large health system EHR\ndataset with 14 million visits completed by 263K patients over an 8-year period\nand demonstrated predictive accuracy and computational scalability comparable\nto state-of-the-art methods such as RNN, and ease of interpretability\ncomparable to traditional models.\n\n\n## [Towards Transparent AI Systems: Interpreting Visual Question Answering  Models](https://arxiv.org/abs/1608.08974)\n[(PDF)](https://arxiv.org/pdf/1608.08974)\n\n`Authors:Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1608.08974 [cs.CV]\n\n \n(or arXiv:1608.08974v2 [cs.CV] for this version)\n\n\n> Abstract: Deep neural networks have shown striking progress and obtained\nstate-of-the-art results in many AI research fields in recent years.\nHowever, it is often unsatisfying to not know why they predict what they do. In\nthis paper, we address the problem of interpreting Visual Question Answering\n(VQA) models. Specifically, we are interested in finding what part of the input\n(pixels in images or words in questions) the VQA model focuses on while\nanswering the question. To tackle this problem, we use two visualization\ntechniques -- guided backpropagation and occlusion -- to find important words\nin the question and important regions in the image. We then present qualitative\nand quantitative analyses of these importance maps. 
We found that even without\nexplicit attention mechanisms, VQA models may sometimes be implicitly attending\nto relevant regions in the image, and often to appropriate words in the\nquestion.\n\n\n## [Real Time Fine-Grained Categorization with Accuracy and Interpretability](https://arxiv.org/abs/1610.00824)\n[(PDF)](https://arxiv.org/pdf/1610.00824)\n\n`Authors:Shaoli Huang, Dacheng Tao`\n\n\nComments:\n\narXiv admin note: text overlap with arXiv:1512.08086\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1610.00824 [cs.CV]\n\n \n(or arXiv:1610.00824v1 [cs.CV] for this version)\n\n\n> Abstract: A well-designed fine-grained categorization system usually has three\ncontradictory requirements: accuracy (the ability to identify objects among\nsubordinate categories); interpretability (the ability to provide\nhuman-understandable explanations of recognition system behavior); and\nefficiency (the speed of the system). To handle the trade-off between accuracy\nand interpretability, we propose a novel \"Deeper Part-Stacked CNN\" architecture\narmed with interpretability by modeling subtle differences between object\nparts. The proposed architecture consists of a part localization network, a\ntwo-stream classification network that simultaneously encodes object-level and\npart-level cues, and a feature vector fusion component. Specifically, the part\nlocalization network is implemented by exploring a new paradigm for key point\nlocalization that first samples a small number of representative pixels and then\ndetermines their labels via a convolutional layer followed by a softmax layer.\nWe also use a cropping layer to extract part features and propose a scale\nmean-max layer for feature fusion learning. Experimentally, our proposed method\noutperforms state-of-the-art approaches in both the part localization task and\nthe classification task on Caltech-UCSD Birds-200-2011. 
Moreover, by adopting a set\nof sharing strategies between the computation of multiple object parts, our\nsingle model is fairly efficient, running at 32 frames/sec.\n\n\n## [Interpreting Neural Networks to Improve Politeness Comprehension](https://arxiv.org/abs/1610.02683)\n[(PDF)](https://arxiv.org/pdf/1610.02683)\n\n`Authors:Malika Aubakirova, Mohit Bansal`\n\n\nComments:\n\nTo appear at EMNLP 2016\n\nSubjects:\n\nComputation and Language (cs.CL); Artificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1610.02683 [cs.CL]\n\n \n(or arXiv:1610.02683v1 [cs.CL] for this version)\n\n\n> Abstract: We present an interpretable neural network approach to predicting and\nunderstanding politeness in natural language requests. Our models are based on\nsimple convolutional neural networks applied directly to raw text, avoiding any manual\nidentification of complex sentiment or syntactic features, while performing\nbetter than such feature-based models from previous work. More importantly, we\nuse the challenging task of politeness prediction as a testbed to present\na much-needed understanding of what these successful networks are actually\nlearning. For this, we present several network visualizations based on\nactivation clusters, first-derivative saliency, and embedding space\ntransformations, helping us automatically identify several subtle linguistic\nmarkers of politeness theories. 
Further, this analysis reveals multiple novel,\nhigh-scoring politeness strategies which, when added back as new features,\nreduce the accuracy gap between the original featurized system and the neural\nmodel, thus providing a clear quantitative interpretation of the success of\nthese neural networks.\n\n\n## [Particle Swarm Optimization for Generating Interpretable Fuzzy  Reinforcement Learning Policies](https://arxiv.org/abs/1610.05984)\n[(PDF)](https://arxiv.org/pdf/1610.05984)\n\n`Authors:Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft`\n\n\nSubjects:\n\nNeural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)\n\n\nJournal reference:\n\nEngineering Applications of Artificial Intelligence, Volume 65C,\n  October 2017, Pages 87-98\n\n\nDOI:\n\n10.1016/j.engappai.2017.07.005\n\n\nCite as:\n\narXiv:1610.05984 [cs.NE]\n\n \n(or arXiv:1610.05984v5 [cs.NE] for this version)\n\n\n> Abstract: Fuzzy controllers are efficient and interpretable system controllers for\ncontinuous state and action spaces. To date, such controllers have been\nconstructed manually or trained automatically either using expert-generated\nproblem-specific cost functions or incorporating detailed knowledge about the\noptimal control strategy. Neither requirement for automatic training processes\nis met in most real-world reinforcement learning (RL) problems. In such\napplications, online learning is often prohibited for safety reasons because\nonline learning requires exploration of the problem's dynamics during policy\ntraining. We introduce a fuzzy particle swarm reinforcement learning (FPSRL)\napproach that can construct fuzzy RL policies solely by training parameters on\nworld models that simulate real system dynamics. These world models are created\nby employing an autonomous machine learning technique that uses previously\ngenerated transition samples of a real system. 
To the best of our knowledge,\nthis approach is the first to relate self-organizing fuzzy controllers to\nmodel-based batch RL. Therefore, FPSRL is intended to solve problems in domains\nwhere online learning is prohibited, system dynamics are relatively easy to\nmodel from previously generated default policy transition samples, and it is\nexpected that a relatively easily interpretable control policy exists. The\nefficiency of the proposed approach with problems from such domains is\ndemonstrated using three standard RL benchmarks, i.e., mountain car, cart-pole\nbalancing, and cart-pole swing-up. Our experimental results demonstrate\nhigh-performing, interpretable fuzzy policies.\n\n\n## [Embedding Projector: Interactive Visualization and Interpretation of  Embeddings](https://arxiv.org/abs/1611.05469)\n[(PDF)](https://arxiv.org/pdf/1611.05469)\n\n`Authors:Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda B. Viégas, Martin Wattenberg`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Human-Computer Interaction (cs.HC)\n\n\nCite as:\n\narXiv:1611.05469 [stat.ML]\n\n \n(or arXiv:1611.05469v1 [stat.ML] for this version)\n\n\n> Abstract: Embeddings are ubiquitous in machine learning, appearing in recommender\nsystems, NLP, and many other applications. Researchers and developers often\nneed to explore the properties of a specific embedding, and one way to analyze\nembeddings is to visualize them. 
We present the Embedding Projector, a tool for\ninteractive visualization and interpretation of embeddings.\n\n\n## [Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning](https://arxiv.org/abs/1611.04246)\n[(PDF)](https://arxiv.org/pdf/1611.04246)\n\n`Authors:Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu`\n\n\nComments:\n\nin the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1611.04246 [cs.CV]\n\n \n(or arXiv:1611.04246v2 [cs.CV] for this version)\n\n\n> Abstract: This paper proposes a learning strategy that extracts object-part concepts\nfrom a pre-trained convolutional neural network (CNN), in an attempt to 1)\nexplore explicit semantics hidden in CNN units and 2) gradually grow a\nsemantically interpretable graphical model on the pre-trained CNN for\nhierarchical object understanding. Given part annotations on very few (e.g.,\n3-12) objects, our method mines certain latent patterns from the pre-trained\nCNN and associates them with different semantic parts. We use a four-layer\nAnd-Or graph to organize the mined latent patterns, so as to clarify their\ninternal semantic hierarchy. Our method is guided by a small number of part\nannotations, and it achieves superior performance (about 13%-107% improvement)\nin part center prediction on the PASCAL VOC and ImageNet datasets.\n\n\n## [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](https://arxiv.org/abs/1611.05934)\n[(PDF)](https://arxiv.org/pdf/1611.05934)\n\n`Authors:Viktoriya Krakovna, Finale Doshi-Velez`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems. 
arXiv admin note: substantial text overlap with arXiv:1606.05320\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.05934 [stat.ML]\n\n \n(or arXiv:1611.05934v1 [stat.ML] for this version)\n\n\n> Abstract: As deep neural networks continue to revolutionize various application\ndomains, there is increasing interest in making these powerful models more\nunderstandable and interpretable, and narrowing down the causes of good and bad\npredictions. We focus on recurrent neural networks, state of the art models in\nspeech recognition and translation. Our approach to increasing interpretability\nis by combining a long short-term memory (LSTM) model with a hidden Markov\nmodel (HMM), a simpler and more transparent model. We add the HMM state\nprobabilities to the output layer of the LSTM, and then train the HMM and LSTM\neither sequentially or jointly. The LSTM can make use of the information from\nthe HMM, and fill in the gaps when the HMM is not performing well. A small\nhybrid model usually performs better than a standalone LSTM of the same size,\nespecially on smaller data sets. We test the algorithms on text data and\nmedical time series data, and find that the LSTM and HMM learn complementary\ninformation about the features in the text.\n\n\n## [GENESIM: genetic extraction of a single, interpretable model](https://arxiv.org/abs/1611.05722)\n[(PDF)](https://arxiv.org/pdf/1611.05722)\n\n`Authors:Gilles Vandewiele, Olivier Janssens, Femke Ongenae, Filip De Turck, Sofie Van Hoecke`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.05722 [stat.ML]\n\n \n(or arXiv:1611.05722v1 [stat.ML] for this version)\n\n\n> Abstract: Models obtained by decision tree induction techniques excel in being\ninterpretable. However, they can be prone to overfitting, which results in a low\npredictive performance. 
Ensemble techniques are able to achieve a higher\naccuracy. However, this comes at a cost of losing interpretability of the\nresulting model. This makes ensemble techniques impractical in applications\nwhere decision support, instead of decision making, is crucial.\nTo bridge this gap, we present the GENESIM algorithm that transforms an\nensemble of decision trees to a single decision tree with an enhanced\npredictive performance by using a genetic algorithm. We compared GENESIM to\nprevalent decision tree induction and ensemble techniques using twelve publicly\navailable data sets. The results show that GENESIM achieves a better predictive\nperformance on most of these data sets than decision tree induction techniques\nand a predictive performance in the same order of magnitude as the ensemble\ntechniques. Moreover, the resulting model of GENESIM has a very low complexity,\nmaking it very interpretable, in contrast to ensemble techniques.\n\n\n## [Stratified Knowledge Bases as Interpretable Probabilistic Models  (Extended Abstract)](https://arxiv.org/abs/1611.06174)\n[(PDF)](https://arxiv.org/pdf/1611.06174)\n\n`Authors:Ondrej Kuzelka, Jesse Davis, Steven Schockaert`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1611.06174 [cs.AI]\n\n \n(or arXiv:1611.06174v1 [cs.AI] for this version)\n\n\n> Abstract: In this paper, we advocate the use of stratified logical theories for\nrepresenting probabilistic models. We argue that such encodings can be more\ninterpretable than those obtained in existing frameworks such as Markov logic\nnetworks. 
Among others, this allows for the use of domain experts to improve\nlearned models by directly removing, adding, or modifying logical formulas.\n\n\n## [Learning Interpretability for Visualizations using Adapted Cox Models  through a User Experiment](https://arxiv.org/abs/1611.06175)\n[(PDF)](https://arxiv.org/pdf/1611.06175)\n\n`Authors:Adrien Bibal, Benoit Frénay`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.06175 [stat.ML]\n\n \n(or arXiv:1611.06175v1 [stat.ML] for this version)\n\n\n> Abstract: In order to be useful, visualizations need to be interpretable. This paper\nuses a user-based approach to combine and assess quality measures in order to\nbetter model user preferences. Results show that cluster separability measures\nare outperformed by a neighborhood conservation measure, even though the former\nare usually considered as intuitively representative of user motives. Moreover,\ncombining measures, as opposed to using a single measure, further improves\nprediction performances.\n\n\n## [Tree Space Prototypes: Another Look at Making Tree Ensembles  Interpretable](https://arxiv.org/abs/1611.07115)\n[(PDF)](https://arxiv.org/pdf/1611.07115)\n\n`Authors:Hui Fen Tan, Giles Hooker, Martin T. Wells`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.07115 [stat.ML]\n\n \n(or arXiv:1611.07115v1 [stat.ML] for this version)\n\n\n> Abstract: Ensembles of decision trees have good prediction accuracy but suffer from a\nlack of interpretability. We propose a new approach for interpreting tree\nensembles by finding prototypes in tree space, utilizing the naturally-learned\nsimilarity measure from the tree ensemble. 
Demonstrating the method on random\nforests, we show that the method benefits from unique aspects of tree ensembles\nby leveraging tree structure to sequentially find prototypes. The method\nprovides good prediction accuracy when found prototypes are used in\nnearest-prototype classifiers, while using fewer prototypes than competitor\nmethods. We are investigating the sensitivity of the method to different\nprototype-finding procedures and demonstrating it on higher-dimensional data.\n\n\n## [Interpreting Finite Automata for Sequential Data](https://arxiv.org/abs/1611.07100)\n[(PDF)](https://arxiv.org/pdf/1611.07100)\n\n`Authors:Christian Albert Hammerschmidt, Sicco Verwer, Qin Lin, Radu State`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Artificial Intelligence (cs.AI)\n\n\nACM classes:\n\nI.2.6\n\n\nCite as:\n\narXiv:1611.07100 [stat.ML]\n\n \n(or arXiv:1611.07100v2 [stat.ML] for this version)\n\n\n> Abstract: Automaton models are often seen as interpretable models. Interpretability\nitself is not well defined: it remains unclear what interpretability means\nwithout first explicitly specifying objectives or desired attributes. In this\npaper, we identify the key properties used to interpret automata and propose a\nmodification of a state-merging approach to learn variants of finite state\nautomata. We apply the approach to problems beyond typical grammar inference\ntasks. Additionally, we cover several use-cases for prediction, classification,\nand clustering on sequential data in both supervised and unsupervised scenarios\nto show how the identified key properties are applicable in a wide range of\ncontexts.\n\n\n## [Inducing Interpretable Representations with Variational Autoencoders](https://arxiv.org/abs/1611.07492)\n[(PDF)](https://arxiv.org/pdf/1611.07492)\n\n`Authors:N. Siddharth, Brooks Paige, Alban Desmaison, Jan-Willem Van de Meent, Frank Wood, Noah D. 
Goodman, Pushmeet Kohli, Philip H.S. Torr`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.07492 [stat.ML]\n\n \n(or arXiv:1611.07492v1 [stat.ML] for this version)\n\n\n> Abstract: We develop a framework for incorporating structured graphical models in the\n*encoders* of variational autoencoders (VAEs) that allows us to induce\ninterpretable representations through approximate variational inference. This\nallows us to both perform reasoning (e.g. classification) under the structural\nconstraints of a given graphical model, and use deep generative models to deal\nwith messy, high-dimensional domains where it is often difficult to model all\nthe variation. Learning in this framework is carried out end-to-end with a\nvariational objective, applying to both unsupervised and semi-supervised\nschemes.\n\n\n## [Interpretation of Prediction Models Using the Input Gradient](https://arxiv.org/abs/1611.07634)\n[(PDF)](https://arxiv.org/pdf/1611.07634)\n\n`Authors:Yotam Hechtlinger`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.07634 [stat.ML]\n\n \n(or arXiv:1611.07634v1 [stat.ML] for this version)\n\n\n> Abstract: State of the art machine learning algorithms are highly optimized to provide\nthe optimal prediction possible, naturally resulting in complex models. While\nthese models often outperform simpler more interpretable models by orders of\nmagnitude, in terms of understanding the way the model functions, we are often\nfacing a \"black box\".\nIn this paper we suggest a simple method to interpret the behavior of any\npredictive model, both for regression and classification. 
Given a particular\nmodel, the information required to interpret it can be obtained by studying the\npartial derivatives of the model with respect to the input. We exemplify this\ninsight by interpreting convolutional and multi-layer neural networks in the\nfield of natural language processing.\n\n\n## [Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery](https://arxiv.org/abs/1611.07252)\n[(PDF)](https://arxiv.org/pdf/1611.07252)\n\n`Authors:Scott Wisdom, Thomas Powers, James Pitton, Les Atlas`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.07252 [stat.ML]\n\n \n(or arXiv:1611.07252v1 [stat.ML] for this version)\n\n\n> Abstract: Recurrent neural networks (RNNs) are powerful and effective for processing\nsequential data. However, RNNs are usually considered \"black box\" models whose\ninternal structure and learned parameters are not interpretable. In this paper,\nwe propose an interpretable RNN based on the sequential iterative\nsoft-thresholding algorithm (SISTA) for solving the sequential sparse recovery\nproblem, which models a sequence of correlated observations with a sequence of\nsparse latent vectors. The architecture of the resulting SISTA-RNN is\nimplicitly defined by the computational structure of SISTA, which results in a\nnovel stacked RNN architecture. Furthermore, the weights of the SISTA-RNN are\nperfectly interpretable as the parameters of a principled statistical model,\nwhich in this case include a sparsifying dictionary, iterative step size, and\nregularization parameters. 
In addition, on a particular sequential compressive\nsensing task, the SISTA-RNN trains faster and achieves better performance than\nconventional state-of-the-art black box RNNs, including long short-term memory\n(LSTM) RNNs.\n\n\n## [An unexpected unity among methods for interpreting model predictions](https://arxiv.org/abs/1611.07478)\n[(PDF)](https://arxiv.org/pdf/1611.07478)\n\n`Authors:Scott Lundberg, Su-In Lee`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1611.07478 [cs.AI]\n\n \n(or arXiv:1611.07478v3 [cs.AI] for this version)\n\n\n> Abstract: Understanding why a model made a certain prediction is crucial in many data\nscience fields. Interpretable predictions engender appropriate trust and\nprovide insight into how the model may be improved. However, with large modern\ndatasets the best accuracy is often achieved by complex models even experts\nstruggle to interpret, which creates a tension between accuracy and\ninterpretability. Recently, several methods have been proposed for interpreting\npredictions from complex models by estimating the importance of input features.\nHere, we present how a model-agnostic additive representation of the importance\nof input features unifies current methods. This representation is optimal, in\nthe sense that it is the only set of additive values that satisfies important\nproperties. We show how we can leverage these properties to create novel visual\nexplanations of model predictions. The thread of unity that this representation\nweaves through the literature indicates that there are common principles to be\nlearned about the interpretation of model predictions that apply in many\nscenarios.\n\n\n## [Input Switched Affine Networks: An RNN Architecture Designed for  Interpretability](https://arxiv.org/abs/1611.09434)\n[(PDF)](https://arxiv.org/pdf/1611.09434)\n\n`Authors:Jakob N. 
Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo`\n\n\nComments:\n\nICLR 2017 submission\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)\n\n\nCite as:\n\narXiv:1611.09434 [cs.AI]\n\n \n(or arXiv:1611.09434v2 [cs.AI] for this version)\n\n\n> Abstract: There exist many problem domains where the interpretability of neural network\nmodels is essential for deployment. Here we introduce a recurrent architecture\ncomposed of input-switched affine transformations - in other words an RNN\nwithout any explicit nonlinearities, but with input-dependent recurrent\nweights. This simple form allows the RNN to be analyzed via straightforward\nlinear methods: we can exactly characterize the linear contribution of each\ninput to the model predictions; we can use a change-of-basis to disentangle\ninput, output, and computational hidden unit subspaces; we can fully\nreverse-engineer the architecture's solution to a simple task. 
Despite this\nease of interpretation, the input switched affine network achieves reasonable\nperformance on a text modeling task, and allows greater computational\nefficiency than networks with standard nonlinearities.\n\n\n## [Large scale modeling of antimicrobial resistance with interpretable  classifiers](https://arxiv.org/abs/1612.01030)\n[(PDF)](https://arxiv.org/pdf/1612.01030)\n\n`Authors:Alexandre Drouin, Frédéric Raymond, Gaël Letarte St-Pierre, Mario Marchand, Jacques Corbeil, François Laviolette`\n\n\nComments:\n\nPeer-reviewed and accepted for presentation at the Machine Learning for Health Workshop, NIPS 2016, Barcelona, Spain\n\nSubjects:\n\nGenomics (q-bio.GN); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1612.01030 [q-bio.GN]\n\n \n(or arXiv:1612.01030v1 [q-bio.GN] for this version)\n\n\n> Abstract: Antimicrobial resistance is an important public health concern that has\nimplications in the practice of medicine worldwide. Accurately predicting\nresistance phenotypes from genome sequences shows great promise in promoting\nbetter use of antimicrobial agents, by determining which antibiotics are likely\nto be effective in specific clinical cases. In healthcare, this would allow for\nthe design of treatment plans tailored for specific individuals, likely\nresulting in better clinical outcomes for patients with bacterial infections.\nIn this work, we present the recent work of Drouin et al. (2016) on using Set\nCovering Machines to learn highly interpretable models of antibiotic resistance\nand complement it by providing a large scale application of their method to the\nentire PATRIC database. 
We report prediction results for 36 new datasets and\npresent the Kover AMR platform, a new web-based tool allowing the visualization\nand interpretation of the generated models.\n\n\n## [Interpretable Semantic Textual Similarity: Finding and explaining  differences between sentences](https://arxiv.org/abs/1612.04868)\n[(PDF)](https://arxiv.org/pdf/1612.04868)\n\n`Authors:I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre`\n\n\nComments:\n\nPreprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)\n\nSubjects:\n\nComputation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nDOI:\n\n10.1016/j.knosys.2016.12.013\n\n\nCite as:\n\narXiv:1612.04868 [cs.CL]\n\n \n(or arXiv:1612.04868v1 [cs.CL] for this version)\n\n\n> Abstract: User acceptance of artificial intelligence agents might depend on their\nability to explain their reasoning, which requires adding an interpretability\nlayer that helps users understand their behavior. This paper focuses\non adding an interpretable layer on top of Semantic Textual Similarity (STS),\nwhich measures the degree of semantic equivalence between two sentences. The\ninterpretability layer is formalized as the alignment between pairs of segments\nacross the two sentences, where the relation between the segments is labeled\nwith a relation type and a similarity score. We present a publicly available\ndataset of sentence pairs annotated following the formalization. We then\ndevelop a system trained on this dataset which, given a sentence pair, explains\nwhat is similar and different, in the form of graded and typed segment\nalignments. When evaluated on the dataset, the system performs better than an\ninformed baseline, showing that the dataset and task are well-defined and\nfeasible. Most importantly, two user studies show how the system output can be\nused to automatically produce explanations in natural language. 
Users performed\nbetter when having access to the explanations, providing preliminary evidence\nthat our dataset and method to automatically produce explanations are useful in\nreal applications.\n\n\n## [Towards a New Interpretation of Separable Convolutions](https://arxiv.org/abs/1701.04489)\n[(PDF)](https://arxiv.org/pdf/1701.04489)\n\n`Authors:Tapabrata Ghosh`\n\n\nSubjects:\n\nLearning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1701.04489 [cs.LG]\n\n \n(or arXiv:1701.04489v1 [cs.LG] for this version)\n\n\n> Abstract: In recent times, the use of separable convolutions in deep convolutional\nneural network architectures has been explored. Several researchers, most\nnotably (Chollet, 2016) and (Ghosh, 2017) have used separable convolutions in\ntheir deep architectures and have demonstrated state of the art or close to\nstate of the art performance. However, the underlying mechanism of action of\nseparable convolutions is still not fully understood. Although their\nmathematical definition is well understood as a depthwise convolution followed\nby a pointwise convolution, deeper interpretations such as the extreme\nInception hypothesis (Chollet, 2016) have failed to provide a thorough\nexplanation of their efficacy. In this paper, we propose a hybrid\ninterpretation that we believe is a better model for explaining the efficacy of\nseparable convolutions.\n\n\n## [Towards A Rigorous Science of Interpretable Machine Learning](https://arxiv.org/abs/1702.08608)\n[(PDF)](https://arxiv.org/pdf/1702.08608)\n\n`Authors:Finale Doshi-Velez, Been Kim`\n\n\nSubjects:\n\nMachine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1702.08608 [stat.ML]\n\n \n(or arXiv:1702.08608v2 [stat.ML] for this version)\n\n\n> Abstract: As machine learning systems become ubiquitous, there has been a surge of\ninterest in interpretable machine learning: systems that provide explanation\nfor their outputs. 
These explanations are often used to qualitatively assess\nother criteria such as safety or non-discrimination. However, despite the\ninterest in interpretability, there is very little consensus on what\ninterpretable machine learning is and how it should be measured. In this\nposition paper, we first define interpretability and describe when\ninterpretability is needed (and when it is not). Next, we suggest a taxonomy\nfor rigorous evaluation and expose open questions towards a more rigorous\nscience of interpretable machine learning.\n\n\n## [Streaming Weak Submodularity: Interpreting Neural Networks on the Fly](https://arxiv.org/abs/1703.02647)\n[(PDF)](https://arxiv.org/pdf/1703.02647)\n\n`Authors:Ethan R. Elenberg, Alexandros G. Dimakis, Moran Feldman, Amin Karbasi`\n\n\nComments:\n\nTo appear in NIPS 2017\n\nSubjects:\n\nMachine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1703.02647 [stat.ML]\n\n \n(or arXiv:1703.02647v3 [stat.ML] for this version)\n\n\n> Abstract: In many machine learning applications, it is important to explain the\npredictions of a black-box classifier. For example, why does a deep neural\nnetwork assign an image to a particular class? We cast interpretability of\nblack-box classifiers as a combinatorial maximization problem and propose an\nefficient streaming algorithm to solve it subject to cardinality constraints.\nBy extending ideas from Badanidiyuru et al. [2014], we provide a constant\nfactor approximation guarantee for our algorithm in the case of random stream\norder and a weakly submodular objective function. This is the first such\ntheoretical guarantee for this general class of functions, and we also show\nthat no such algorithm exists for a worst case stream order. Our algorithm\nobtains similar explanations of Inception V3 predictions $10$ times faster than\nthe state-of-the-art LIME framework of Ribeiro et al. 
[2016].\n\n\n## [Interpretable Structure-Evolving LSTM](https://arxiv.org/abs/1703.03055)\n[(PDF)](https://arxiv.org/pdf/1703.03055)\n\n`Authors:Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing`\n\n\nComments:\n\nTo appear in CVPR 2017 as a spotlight paper\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1703.03055 [cs.CV]\n\n \n(or arXiv:1703.03055v1 [cs.CV] for this version)\n\n\n> Abstract: This paper develops a general framework for learning interpretable data\nrepresentation via Long Short-Term Memory (LSTM) recurrent neural networks over\nhierarchical graph structures. Instead of learning LSTM models over the pre-fixed\nstructures, we propose to further learn the intermediate interpretable\nmulti-level graph structures in a progressive and stochastic way from data\nduring the LSTM network optimization. We thus call this model the\nstructure-evolving LSTM. In particular, starting with an initial element-level\ngraph representation where each node is a small data element, the\nstructure-evolving LSTM gradually evolves the multi-level graph representations\nby stochastically merging the graph nodes with high compatibilities along the\nstacked LSTM layers. In each LSTM layer, we estimate the compatibility of two\nconnected nodes from their corresponding LSTM gate outputs, which is used to\ngenerate a merging probability. The candidate graph structures are accordingly\ngenerated where the nodes are grouped into cliques with their merging\nprobabilities. We then produce the new graph structure with a\nMetropolis-Hastings algorithm, which alleviates the risk of getting stuck in\nlocal optima by stochastic sampling with an acceptance probability. Once a\ngraph structure is accepted, a higher-level graph is then constructed by taking\nthe partitioned cliques as its nodes. 
During the evolving process,\nrepresentation becomes more abstracted in higher levels where redundant\ninformation is filtered out, allowing more efficient propagation of long-range\ndata dependencies. We evaluate the effectiveness of structure-evolving LSTM in\nthe application of semantic object parsing and demonstrate its advantage over\nstate-of-the-art LSTM models on standard benchmarks.\n\n\n## [Improving Interpretability of Deep Neural Networks with Semantic  Information](https://arxiv.org/abs/1703.04096)\n[(PDF)](https://arxiv.org/pdf/1703.04096)\n\n`Authors:Yinpeng Dong, Hang Su, Jun Zhu, Bo Zhang`\n\n\nComments:\n\nTo appear in CVPR 2017\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1703.04096 [cs.CV]\n\n \n(or arXiv:1703.04096v2 [cs.CV] for this version)\n\n\n> Abstract: Interpretability of deep neural networks (DNNs) is essential since it enables\nusers to understand the overall strengths and weaknesses of the models, conveys\nan understanding of how the models will behave in the future, and how to\ndiagnose and correct potential problems. However, it is challenging to reason\nabout what a DNN actually does due to its opaque or black-box nature. To\naddress this issue, we propose a novel technique to improve the\ninterpretability of DNNs by leveraging the rich semantic information embedded\nin human descriptions. By concentrating on the video captioning task, we first\nextract a set of semantically meaningful topics from the human descriptions\nthat cover a wide range of visual concepts, and integrate them into the model\nwith an interpretive loss. We then propose a prediction difference maximization\nalgorithm to interpret the learned features of each neuron. Experimental\nresults demonstrate its effectiveness in video captioning using the\ninterpretable features, which can also be transferred to video action\nrecognition. 
By clearly understanding the learned features, users can easily\nrevise false predictions via a human-in-the-loop procedure.\n\n\n## [InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations](https://arxiv.org/abs/1703.08840)\n[(PDF)](https://arxiv.org/pdf/1703.08840)\n\n`Authors:Yunzhu Li, Jiaming Song, Stefano Ermon`\n\n\nComments:\n\n14 pages, NIPS 2017\n\nSubjects:\n\nLearning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1703.08840 [cs.LG]\n\n \n(or arXiv:1703.08840v2 [cs.LG] for this version)\n\n\n> Abstract: The goal of imitation learning is to mimic expert behavior without access to\nan explicit reward signal. Expert demonstrations provided by humans, however,\noften show significant variability due to latent factors that are typically not\nexplicitly modeled. In this paper, we propose a new algorithm that can infer\nthe latent structure of expert demonstrations in an unsupervised way. Our\nmethod, built on top of Generative Adversarial Imitation Learning, can not only\nimitate complex behaviors, but also learn interpretable and meaningful\nrepresentations of complex behavioral data, including visual demonstrations. In\nthe driving domain, we show that a model learned from human demonstrations is\nable to both accurately reproduce a variety of behaviors and accurately\nanticipate human actions using raw visual inputs. 
Compared with various\nbaselines, our method can better capture the latent structure underlying expert\ndemonstrations, often recovering semantically meaningful factors of variation\nin the data.\n\n\n## [Interpretable Learning for Self-Driving Cars by Visualizing Causal  Attention](https://arxiv.org/abs/1703.10631)\n[(PDF)](https://arxiv.org/pdf/1703.10631)\n\n`Authors:Jinkyu Kim, John Canny`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1703.10631 [cs.CV]\n\n \n(or arXiv:1703.10631v1 [cs.CV] for this version)\n\n\n> Abstract: Deep neural perception and control networks are likely to be a key component\nof self-driving vehicles. These models need to be explainable - they should\nprovide easy-to-interpret rationales for their behavior - so that passengers,\ninsurance companies, law enforcement, developers etc., can understand what\ntriggered a particular behavior. Here we explore the use of visual\nexplanations. These explanations take the form of real-time highlighted regions\nof an image that causally influence the network's output (steering control).\nOur approach is two-stage. In the first stage, we use a visual attention model\nto train a convolution network end-to-end from images to steering angle. The\nattention model highlights image regions that potentially influence the\nnetwork's output. Some of these are true influences, but some are spurious. We\nthen apply a causal filtering step to determine which input regions actually\ninfluence the output. This produces more succinct visual explanations and more\naccurately exposes the network's behavior. We demonstrate the effectiveness of\nour model on three datasets totaling 16 hours of driving. We first show that\ntraining with attention does not degrade the performance of the end-to-end\nnetwork. 
Then we show that the network causally cues on a variety of features\nthat are used by humans while driving.\n\n\n## [Interpretable 3D Human Action Analysis with Temporal Convolutional  Networks](https://arxiv.org/abs/1704.04516)\n[(PDF)](https://arxiv.org/pdf/1704.04516)\n\n`Authors:Tae Soo Kim, Austin Reiter`\n\n\nComments:\n\n8 pages, 5 figures, BNMW CVPR 2017 Submission\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nMSC classes:\n\n68T45, 68T10 (Primary)\n\n\nACM classes:\n\nI.2.10; I.5.4\n\n\nCite as:\n\narXiv:1704.04516 [cs.CV]\n\n \n(or arXiv:1704.04516v1 [cs.CV] for this version)\n\n\n> Abstract: The discriminative power of modern deep learning models for 3D human action\nrecognition is growing ever so potent. In conjunction with the recent\nresurgence of 3D human action representation with 3D skeletons, the quality and\nthe pace of recent progress have been significant. However, the inner workings\nof state-of-the-art learning based methods in 3D human action recognition still\nremain mostly black-box. In this work, we propose to use a new class of models\nknown as Temporal Convolutional Neural Networks (TCN) for 3D human action\nrecognition. Compared to popular LSTM-based Recurrent Neural Network models,\ngiven interpretable input such as 3D skeletons, TCN provides us a way to\nexplicitly learn readily interpretable spatio-temporal representations for 3D\nhuman action recognition. We provide our strategy in re-designing the TCN with\ninterpretability in mind and how such characteristics of the model are leveraged\nto construct a powerful 3D activity recognition method. Through this work, we\nwish to take a step towards a spatio-temporal model that is easier to\nunderstand, explain and interpret. 
The resulting model, Res-TCN, achieves\nstate-of-the-art results on the largest 3D human action recognition dataset,\nNTU-RGBD.\n\n\n## [An Interpretable Knowledge Transfer Model for Knowledge Base Completion](https://arxiv.org/abs/1704.05908)\n[(PDF)](https://arxiv.org/pdf/1704.05908)\n\n`Authors:Qizhe Xie, Xuezhe Ma, Zihang Dai, Eduard Hovy`\n\n\nComments:\n\nAccepted by ACL 2017. Minor update\n\nSubjects:\n\nComputation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1704.05908 [cs.CL]\n\n \n(or arXiv:1704.05908v2 [cs.CL] for this version)\n\n\n> Abstract: Knowledge bases are important resources for a variety of natural language\nprocessing tasks but suffer from incompleteness. We propose a novel embedding\nmodel, *ITransF*, to perform knowledge base completion. Equipped with a\nsparse attention mechanism, ITransF discovers hidden concepts of relations and\ntransfers statistical strength through the sharing of concepts. Moreover, the\nlearned associations between relations and concepts, which are represented by\nsparse attention vectors, can be interpreted easily. We evaluate ITransF on two\nbenchmark datasets for knowledge base completion, WN18 and FB15k, and obtain\nimprovements on both the mean rank and Hits@10 metrics over all baselines that\ndo not use additional information.\n\n\n## [Network Dissection: Quantifying Interpretability of Deep Visual  Representations](https://arxiv.org/abs/1704.05796)\n[(PDF)](https://arxiv.org/pdf/1704.05796)\n\n`Authors:David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba`\n\n\nComments:\n\nFirst two authors contributed equally. 
Oral presentation at CVPR 2017\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)\n\n\nACM classes:\n\nI.2.10\n\n\nCite as:\n\narXiv:1704.05796 [cs.CV]\n\n \n(or arXiv:1704.05796v1 [cs.CV] for this version)\n\n\n> Abstract: We propose a general framework called Network Dissection for quantifying the\ninterpretability of latent representations of CNNs by evaluating the alignment\nbetween individual hidden units and a set of semantic concepts. Given any CNN\nmodel, the proposed method draws on a broad data set of visual concepts to\nscore the semantics of hidden units at each intermediate convolutional layer.\nThe units with semantics are given labels across a range of objects, parts,\nscenes, textures, materials, and colors. We use the proposed method to test the\nhypothesis that interpretability of units is equivalent to random linear\ncombinations of units, then we apply our method to compare the latent\nrepresentations of various networks when trained to solve different supervised\nand self-supervised training tasks. We further analyze the effect of training\niterations, compare networks trained with different initializations, examine\nthe impact of network depth and width, and measure the effect of dropout and\nbatch normalization on the interpretability of deep visual representations. 
We\ndemonstrate that the proposed method can shed light on characteristics of CNN\nmodels and training methods that go beyond measurements of their discriminative\npower.\n\n\n## [Softmax Q-Distribution Estimation for Structured Prediction: A  Theoretical Interpretation for RAML](https://arxiv.org/abs/1705.07136)\n[(PDF)](https://arxiv.org/pdf/1705.07136)\n\n`Authors:Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy`\n\n\nComments:\n\nUnder Review of ICLR 2018\n\nSubjects:\n\nLearning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1705.07136 [cs.LG]\n\n \n(or arXiv:1705.07136v3 [cs.LG] for this version)\n\n\n> Abstract: Reward augmented maximum likelihood (RAML), a simple and effective learning\nframework to directly optimize towards the reward function in structured\nprediction tasks, has led to a number of impressive empirical successes. RAML\nincorporates task-specific reward by performing maximum-likelihood updates on\ncandidate outputs sampled according to an exponentiated payoff distribution,\nwhich gives higher probabilities to candidates that are close to the reference\noutput. While RAML is notable for its simplicity, efficiency, and its\nimpressive empirical successes, the theoretical properties of RAML, especially\nthe behavior of the exponentiated payoff distribution, have not been examined\nthoroughly. In this work, we introduce softmax Q-distribution estimation, a\nnovel theoretical interpretation of RAML, which reveals the relation between\nRAML and Bayesian decision theory. The softmax Q-distribution can be regarded\nas a smooth approximation of the Bayes decision boundary, and the Bayes\ndecision rule is achieved by decoding with this Q-distribution. We further show\nthat RAML is equivalent to approximately estimating the softmax Q-distribution,\nwith the temperature $\\tau$ controlling approximation error. 
We perform two\nexperiments, one on synthetic data of multi-class classification and one on\nreal data of image captioning, to demonstrate the relationship between RAML and\nthe proposed softmax Q-distribution estimation method, verifying our\ntheoretical analysis. Additional experiments on three structured prediction\ntasks with rewards defined on sequential (named entity recognition), tree-based\n(dependency parsing) and irregular (machine translation) structures show\nnotable improvements over maximum likelihood baselines.\n\n\n## [Logic Tensor Networks for Semantic Image Interpretation](https://arxiv.org/abs/1705.08968)\n[(PDF)](https://arxiv.org/pdf/1705.08968)\n\n`Authors:Ivan Donadello, Luciano Serafini, Artur d'Avila Garcez`\n\n\nComments:\n\n14 pages, 2 figures, IJCAI 2017\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1705.08968 [cs.AI]\n\n \n(or arXiv:1705.08968v1 [cs.AI] for this version)\n\n\n> Abstract: Semantic Image Interpretation (SII) is the task of extracting structured\nsemantic descriptions from images. It is widely agreed that the combined use of\nvisual data and background knowledge is of great importance for SII. Recently,\nStatistical Relational Learning (SRL) approaches have been developed for\nreasoning under uncertainty and learning in the presence of data and rich\nknowledge. Logic Tensor Networks (LTNs) are an SRL framework which integrates\nneural networks with first-order fuzzy logic to allow (i) efficient learning\nfrom noisy data in the presence of logical constraints, and (ii) reasoning with\nlogical formulas describing general properties of the data. In this paper, we\ndevelop and apply LTNs to two of the main tasks of SII, namely, the\nclassification of an image's bounding boxes and the detection of the relevant\npart-of relations between objects. To the best of our knowledge, this is the\nfirst successful application of SRL to such SII tasks. 
The proposed approach is\nevaluated on a standard image processing benchmark. Experiments show that the\nuse of background knowledge in the form of logical constraints can improve the\nperformance of purely data-driven approaches, including the state-of-the-art\nFast Region-based Convolutional Neural Networks (Fast R-CNN). Moreover, we show\nthat the use of logical background knowledge adds robustness to the learning\nsystem when errors are present in the labels of the training data.\n\n\n## [Patchnet: Interpretable Neural Networks for Image Classification](https://arxiv.org/abs/1705.08078)\n[(PDF)](https://arxiv.org/pdf/1705.08078)\n\n`Authors:Adityanarayanan Radhakrishnan, Charles Durham, Ali Soylemezoglu, Caroline Uhler`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1705.08078 [cs.CV]\n\n \n(or arXiv:1705.08078v1 [cs.CV] for this version)\n\n\n> Abstract: The ability to visually understand and interpret learned features from\ncomplex predictive models is crucial for their acceptance in sensitive areas\nsuch as health care. To move closer to this goal of truly interpretable complex\nmodels, we present PatchNet, a network that restricts global context for image\nclassification tasks in order to easily provide visual representations of\nlearned texture features on a predetermined local scale. We demonstrate how\nPatchNet provides visual heatmap representations of the learned features, and\nwe mathematically analyze the behavior of the network during convergence. We\nalso present a version of PatchNet that is particularly well suited for\nlowering false positive rates in image classification tasks. 
We apply PatchNet\nto the classification of textures from the Describable Textures Dataset and to\nthe ISBI-ISIC 2016 melanoma classification challenge.\n\n\n## [A Unified Approach to Interpreting Model Predictions](https://arxiv.org/abs/1705.07874)\n[(PDF)](https://arxiv.org/pdf/1705.07874)\n\n`Authors:Scott Lundberg, Su-In Lee`\n\n\nComments:\n\nTo appear in NIPS 2017\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1705.07874 [cs.AI]\n\n \n(or arXiv:1705.07874v2 [cs.AI] for this version)\n\n\n> Abstract: Understanding why a model makes a certain prediction can be as crucial as the\nprediction's accuracy in many applications. However, the highest accuracy for\nlarge modern datasets is often achieved by complex models that even experts\nstruggle to interpret, such as ensemble or deep learning models, creating a\ntension between accuracy and interpretability. In response, various methods\nhave recently been proposed to help users interpret the predictions of complex\nmodels, but it is often unclear how these methods are related and when one\nmethod is preferable over another. To address this problem, we present a\nunified framework for interpreting predictions, SHAP (SHapley Additive\nexPlanations). SHAP assigns each feature an importance value for a particular\nprediction. Its novel components include: (1) the identification of a new class\nof additive feature importance measures, and (2) theoretical results showing\nthere is a unique solution in this class with a set of desirable properties.\nThe new class unifies six existing methods, notable because several recent\nmethods in the class lack the proposed desirable properties. 
Based on insights\nfrom this unification, we present new methods that show improved computational\nperformance and/or better consistency with human intuition than previous\napproaches.\n\n\n## [Interpreting Blackbox Models via Model Extraction](https://arxiv.org/abs/1705.08504)\n[(PDF)](https://arxiv.org/pdf/1705.08504)\n\n`Authors:Osbert Bastani, Carolyn Kim, Hamsa Bastani`\n\n\nSubjects:\n\nLearning (cs.LG)\n\n\nCite as:\n\narXiv:1705.08504 [cs.LG]\n\n \n(or arXiv:1705.08504v1 [cs.LG] for this version)\n\n\n> Abstract: Interpretability has become an important issue as machine learning is\nincreasingly used to inform consequential decisions. We propose an approach for\ninterpreting a blackbox model by extracting a decision tree that approximates\nthe model. Our model extraction algorithm avoids overfitting by leveraging\nblackbox model access to actively sample new training points. We prove that as\nthe number of samples goes to infinity, the decision tree learned using our\nalgorithm converges to the exact greedy decision tree. In our evaluation, we\nuse our algorithm to interpret random forests and neural nets trained on\nseveral datasets from the UCI Machine Learning Repository, as well as control\npolicies learned for three classical reinforcement learning problems. We show\nthat our algorithm improves over a baseline based on CART on every problem\ninstance. 
Furthermore, we show how an interpretation generated by our approach\ncan be used to understand and debug these models.\n\n\n## [Interpretable & Explorable Approximations of Black Box Models](https://arxiv.org/abs/1707.01154)\n[(PDF)](https://arxiv.org/pdf/1707.01154)\n\n`Authors:Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec`\n\n\nComments:\n\nPresented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1707.01154 [cs.AI]\n\n \n(or arXiv:1707.01154v1 [cs.AI] for this version)\n\n\n> Abstract: We propose Black Box Explanations through Transparent Approximations (BETA),\na novel model agnostic framework for explaining the behavior of any black-box\nclassifier by simultaneously optimizing for fidelity to the original model and\ninterpretability of the explanation. To this end, we develop a novel objective\nfunction which allows us to learn (with optimality guarantees), a small number\nof compact decision sets each of which explains the behavior of the black box\nmodel in unambiguous, well-defined regions of feature space. Furthermore, our\nframework also is capable of accepting user input when generating these\napproximations, thus allowing users to interactively explore how the black-box\nmodel behaves in different subspaces that are of interest to the user. 
To the\nbest of our knowledge, this is the first approach which can produce global\nexplanations of the behavior of any given black box model through joint\noptimization of unambiguity, fidelity, and interpretability, while also\nallowing users to explore model behavior based on their preferences.\nExperimental evaluation with real-world datasets and user studies demonstrates\nthat our approach can generate highly compact, easy-to-understand, yet accurate\napproximations of various kinds of predictive models compared to\nstate-of-the-art baselines.\n\n\n## [Interpretability via Model Extraction](https://arxiv.org/abs/1706.09773)\n[(PDF)](https://arxiv.org/pdf/1706.09773)\n\n`Authors:Osbert Bastani, Carolyn Kim, Hamsa Bastani`\n\n\nComments:\n\nPresented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)\n\nSubjects:\n\nLearning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1706.09773 [cs.LG]\n\n \n(or arXiv:1706.09773v2 [cs.LG] for this version)\n\n\n> Abstract: The ability to interpret machine learning models has become increasingly\nimportant now that machine learning is used to inform consequential decisions.\nWe propose an approach called model extraction for interpreting complex,\nblackbox models. Our approach approximates the complex model using a much more\ninterpretable model; as long as the approximation quality is good, then\nstatistical properties of the complex model are reflected in the interpretable\nmodel. 
We show how model extraction can be used to understand and debug random\nforests and neural nets trained on several datasets from the UCI Machine\nLearning Repository, as well as control policies learned for several classical\nreinforcement learning problems.\n\n\n## [Methods for Interpreting and Understanding Deep Neural Networks](https://arxiv.org/abs/1706.07979)\n[(PDF)](https://arxiv.org/pdf/1706.07979)\n\n`Authors:Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller`\n\n\nComments:\n\n14 pages, 10 figures\n\nSubjects:\n\nLearning (cs.LG); Machine Learning (stat.ML)\n\n\nDOI:\n\n10.1016/j.dsp.2017.10.011\n\n\nCite as:\n\narXiv:1706.07979 [cs.LG]\n\n \n(or arXiv:1706.07979v1 [cs.LG] for this version)\n\n\n> Abstract: This paper provides an entry point to the problem of interpreting a deep\nneural network model and explaining its predictions. It is based on a tutorial\ngiven at ICASSP 2017. It introduces some recently proposed techniques of\ninterpretation, along with theory, tricks and recommendations, to make the most\nefficient use of these techniques on real data. It also discusses a number of\npractical applications.\n\n\n## [MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis  Network](https://arxiv.org/abs/1707.02485)\n[(PDF)](https://arxiv.org/pdf/1707.02485)\n\n`Authors:Zizhao Zhang, Yuanpu Xie, Fuyong Xing, Mason McGough, Lin Yang`\n\n\nComments:\n\nCVPR2017 Oral\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1707.02485 [cs.CV]\n\n \n(or arXiv:1707.02485v1 [cs.CV] for this version)\n\n\n> Abstract: The inability to interpret the model prediction in semantically and visually\nmeaningful ways is a well-known shortcoming of most existing computer-aided\ndiagnosis methods. 
In this paper, we propose MDNet to establish a direct\nmultimodal mapping between medical images and diagnostic reports that can read\nimages, generate diagnostic reports, retrieve images by symptom descriptions,\nand visualize attention, to provide justifications of the network diagnosis\nprocess. MDNet includes an image model and a language model. The image model is\nproposed to enhance multi-scale feature ensembles and utilization efficiency.\nThe language model, integrated with our improved attention mechanism, aims to\nread and explore discriminative image feature descriptions from reports to\nlearn a direct mapping from sentence words to image pixels. The overall network\nis trained end-to-end by using our developed optimization strategy. Based on a\ndataset of pathology bladder cancer images and their diagnostic reports (BCIDR), we\nconduct sufficient experiments to demonstrate that MDNet outperforms\ncomparative baselines. The proposed image model obtains state-of-the-art\nperformance on two CIFAR datasets as well.\n\n\n## [A Formal Framework to Characterize Interpretability of Procedures](https://arxiv.org/abs/1707.03886)\n[(PDF)](https://arxiv.org/pdf/1707.03886)\n\n`Authors:Amit Dhurandhar, Vijay Iyengar, Ronny Luss, Karthikeyan Shanmugam`\n\n\nComments:\n\npresented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1707.03886 [cs.AI]\n\n \n(or arXiv:1707.03886v1 [cs.AI] for this version)\n\n\n> Abstract: We provide a novel notion of what it means to be interpretable, looking past\nthe usual association with human understanding. Our key insight is that\ninterpretability is not an absolute concept and so we define it relative to a\ntarget model, which may or may not be a human. We define a framework that\nallows for comparing interpretable procedures by linking it to important\npractical aspects such as accuracy and robustness. 
We characterize many of the\ncurrent state-of-the-art interpretable methods in our framework portraying its\ngeneral applicability.\n\n\n## [Interpreting Classifiers through Attribute Interactions in Datasets](https://arxiv.org/abs/1707.07576)\n[(PDF)](https://arxiv.org/pdf/1707.07576)\n\n`Authors:Andreas Henelius, Kai Puolamäki, Antti Ukkonen`\n\n\nComments:\n\npresented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1707.07576 [stat.ML]\n\n \n(or arXiv:1707.07576v1 [stat.ML] for this version)\n\n\n> Abstract: In this work we present the novel ASTRID method for investigating which\nattribute interactions classifiers exploit when making predictions. Attribute\ninteractions in classification tasks mean that two or more attributes together\nprovide stronger evidence for a particular class label. Knowledge of such\ninteractions makes models more interpretable by revealing associations between\nattributes. This has applications, e.g., in pharmacovigilance to identify\ninteractions between drugs or in bioinformatics to investigate associations\nbetween single nucleotide polymorphisms. We also show how the found attribute\npartitioning is related to a factorisation of the data generating distribution\nand empirically demonstrate the utility of the proposed method.\n\n\n## [Interpretable Active Learning](https://arxiv.org/abs/1708.00049)\n[(PDF)](https://arxiv.org/pdf/1708.00049)\n\n`Authors:Richard L. Phillips, Kyu Hyun Chang, Sorelle A. 
Friedler`\n\n\nComments:\n\n6 pages, 5 figures, presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1708.00049 [stat.ML]\n\n \n(or arXiv:1708.00049v1 [stat.ML] for this version)\n\n\n> Abstract: Active learning has long been a topic of study in machine learning. However,\nas increasingly complex and opaque models have become standard practice, the\nprocess of active learning, too, has become more opaque. There has been little\ninvestigation into interpreting what specific trends and patterns an active\nlearning strategy may be exploring. This work expands on the Local\nInterpretable Model-agnostic Explanations framework (LIME) to provide\nexplanations for active learning recommendations. We demonstrate how LIME can\nbe used to generate locally faithful explanations for an active learning\nstrategy, and how these explanations can be used to understand how different\nmodels and datasets explore a problem space over time. In order to quantify the\nper-subgroup differences in how an active learning strategy queries spatial\nregions, we introduce a notion of uncertainty bias (based on disparate impact)\nto measure the discrepancy in the confidence for a model's predictions between\none subgroup and another. Using the uncertainty bias measure, we show that our\nquery explanations accurately reflect the subgroup focus of the active learning\nqueries, allowing for an interpretable explanation of what is being learned as\npoints with similar sources of uncertainty have their uncertainty bias\nresolved. 
We demonstrate that this technique can be applied to track\nuncertainty bias over user-defined clusters or automatically generated clusters\nbased on the source of uncertainty.\n\n\n## [Using Program Induction to Interpret Transition System Dynamics](https://arxiv.org/abs/1708.00376)\n[(PDF)](https://arxiv.org/pdf/1708.00376)\n\n`Authors:Svetlin Penkov, Subramanian Ramamoorthy`\n\n\nComments:\n\nPresented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia. arXiv admin note: substantial text overlap with arXiv:1705.08320\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1708.00376 [cs.AI]\n\n \n(or arXiv:1708.00376v1 [cs.AI] for this version)\n\n\n> Abstract: Explaining and reasoning about processes which underlie observed black-box\nphenomena enables the discovery of causal mechanisms, derivation of suitable\nabstract representations and the formulation of more robust predictions. We\npropose to learn high level functional programs in order to represent abstract\nmodels which capture the invariant structure in the observed data. We introduce\nthe $\\pi$-machine (program-induction machine) -- an architecture able to induce\ninterpretable LISP-like programs from observed data traces. We propose an\noptimisation procedure for program learning based on backpropagation, gradient\ndescent and A* search. We apply the proposed method to two problems: system\nidentification of dynamical systems and explaining the behaviour of a DQN\nagent. 
Our results show that the $\\pi$-machine can efficiently induce\ninterpretable programs from individual data traces.\n\n\n## [Warp: a method for neural network interpretability applied to gene  expression profiles](https://arxiv.org/abs/1708.04988)\n[(PDF)](https://arxiv.org/pdf/1708.04988)\n\n`Authors:Trofimov Assya, Lemieux Sebastien, Perreault Claude`\n\n\nComments:\n\n5 pages, 3 figures, NIPS2016, Machine Learning in Computational Biology workshop\n\nSubjects:\n\nGenomics (q-bio.GN); Artificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1708.04988 [q-bio.GN]\n\n \n(or arXiv:1708.04988v1 [q-bio.GN] for this version)\n\n\n> Abstract: We show a proof of principle for warping, a method to interpret the inner\nworkings of neural networks in the context of gene expression analysis. Warping\nis an efficient way to gain insight into the inner workings of neural nets and\nmake them more interpretable. We demonstrate the ability of warping to recover\nmeaningful information for a given class on a sample-specific individual basis.\nWe found that warping works well in both linearly and nonlinearly separable\ndatasets. These encouraging results show that warping has the potential to be the\nanswer to neural network interpretability in computational biology.\n\n\n## [DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation  of Self-Reported Pain](https://arxiv.org/abs/1708.04670)\n[(PDF)](https://arxiv.org/pdf/1708.04670)\n\n`Authors:Dianbo Liu, Fengjiao Peng, Andrew Shea, Ognjen (Oggi) Rudovic, Rosalind Picard`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1708.04670 [cs.CV]\n\n \n(or arXiv:1708.04670v1 [cs.CV] for this version)\n\n\n> Abstract: Previous research on automatic pain estimation from facial expressions has\nfocused primarily on \"one-size-fits-all\" metrics (such as PSPI). 
In this work,\nwe focus on directly estimating each individual's self-reported visual-analog\nscale (VAS) pain metric, as this is considered the gold standard for pain\nmeasurement. The VAS pain score is highly subjective and context-dependent, and\nits range can vary significantly among different persons. To tackle these\nissues, we propose a novel two-stage personalized model, named DeepFaceLIFT,\nfor automatic estimation of VAS. This model is based on (1) Neural Network and\n(2) Gaussian process regression models, and is used to personalize the\nestimation of self-reported pain via a set of hand-crafted personal features\nand multi-task learning. We show on the benchmark dataset for pain analysis\n(The UNBC-McMaster Shoulder Pain Expression Archive) that the proposed\npersonalized model largely outperforms the traditional, unpersonalized models:\nthe intra-class correlation improves from a baseline performance of 19% to a\npersonalized performance of 35% while also providing confidence in the\nmodel's estimates, in contrast to existing models for the\ntarget task. Additionally, DeepFaceLIFT automatically discovers the\npain-relevant facial regions for each person, allowing for an easy\ninterpretation of the pain-related facial cues.\n\n\n## [Towards Interpretable Deep Neural Networks by Leveraging Adversarial  Examples](https://arxiv.org/abs/1708.05493)\n[(PDF)](https://arxiv.org/pdf/1708.05493)\n\n`Authors:Yinpeng Dong, Hang Su, Jun Zhu, Fan Bao`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1708.05493 [cs.CV]\n\n \n(or arXiv:1708.05493v1 [cs.CV] for this version)\n\n\n> Abstract: Deep neural networks (DNNs) have demonstrated impressive performance on a\nwide array of tasks, but they are usually considered opaque since internal\nstructure and learned parameters are not interpretable. 
In this paper, we\nre-examine the internal representations of DNNs using adversarial images, which\nare generated by an ensemble-optimization algorithm. We find that: (1) the\nneurons in DNNs do not truly detect semantic objects/parts, but respond to\nobjects/parts only as recurrent discriminative patches; (2) deep visual\nrepresentations are not robust distributed codes of visual concepts because the\nrepresentations of adversarial images are largely not consistent with those of\nreal images, although they have similar visual appearance, both of which are\ndifferent from previous findings. To further improve the interpretability of\nDNNs, we propose an adversarial training scheme with a consistent loss such\nthat the neurons are endowed with human-interpretable concepts. The induced\ninterpretable representations enable us to trace eventual outcomes back to\ninfluential neurons. Therefore, human users can know how the models make\npredictions, as well as when and why they make errors.\n\n\n## [More cat than cute? 
Interpretable Prediction of Adjective-Noun Pairs](https://arxiv.org/abs/1708.06039)\n[(PDF)](https://arxiv.org/pdf/1708.06039)\n\n`Authors:Delia Fernandez, Alejandro Woodward, Victor Campos, Xavier Giro-i-Nieto, Brendan Jou, Shih-Fu Chang`\n\n\nComments:\n\nOral paper at ACM Multimedia 2017 Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes (MUSA2)\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)\n\n\nDOI:\n\n10.1145/3132515.3132520\n\n\nCite as:\n\narXiv:1708.06039 [cs.CV]\n\n \n(or arXiv:1708.06039v1 [cs.CV] for this version)\n\n\n> Abstract: The increasing availability of affect-rich multimedia resources has bolstered\ninterest in understanding sentiment and emotions in and from visual content.\nAdjective-noun pairs (ANP) are a popular mid-level semantic construct for\ncapturing affect via visually detectable concepts such as \"cute dog\" or\n\"beautiful landscape\". Current state-of-the-art methods approach ANP prediction\nby considering each of these compound concepts as individual tokens, ignoring\nthe underlying relationships in ANPs. This work aims at disentangling the\ncontributions of the 'adjectives' and 'nouns' in the visual prediction of ANPs.\nTwo specialised classifiers, one trained for detecting adjectives and another\nfor nouns, are fused to predict 553 different ANPs. The resulting ANP\nprediction model is more interpretable as it allows us to study contributions\nof the adjective and noun components. Source code and models are available at\nthis https URL .\n\n\n## [Interpretable Categorization of Heterogeneous Time Series Data](https://arxiv.org/abs/1708.09121)\n[(PDF)](https://arxiv.org/pdf/1708.09121)\n\n`Authors:Ritchie Lee, Mykel J. Kochenderfer, Ole J. 
Mengshoel, Joshua Silbermann`\n\n\nComments:\n\n10 pages, 7 figures\n\nSubjects:\n\nLearning (cs.LG)\n\n\nCite as:\n\narXiv:1708.09121 [cs.LG]\n\n \n(or arXiv:1708.09121v1 [cs.LG] for this version)\n\n\n> Abstract: The explanation of heterogeneous multivariate time series data is a central\nproblem in many applications. The problem requires two major data mining\nchallenges to be addressed simultaneously: learning models that are\nhuman-interpretable and mining of heterogeneous multivariate time series data.\nThe intersection of these two areas is not adequately explored in the existing\nliterature. To address this gap, we propose grammar-based decision trees and an\nalgorithm for learning them. Grammar-based decision trees extend decision trees\nwith a grammar framework. Logical expressions, derived from context-free\ngrammar, are used for branching in place of simple thresholds on attributes.\nThe added expressivity enables support for a wide range of data types while\nretaining the interpretability of decision trees. By choosing a grammar based\non temporal logic, we show that grammar-based decision trees can be used for\nthe interpretable classification of high-dimensional and heterogeneous time\nseries data. In addition to classification, we show how grammar-based decision\ntrees can also be used for categorization, which is a combination of clustering\nand generating interpretable explanations for each cluster. 
We apply\ngrammar-based decision trees to analyze the classic Australian Sign Language\ndataset as well as categorize and explain near mid-air collisions to support\nthe development of a prototype aircraft collision avoidance system.\n\n\n## [Explainable Artificial Intelligence: Understanding, Visualizing and  Interpreting Deep Learning Models](https://arxiv.org/abs/1708.08296)\n[(PDF)](https://arxiv.org/pdf/1708.08296)\n\n`Authors:Wojciech Samek, Thomas Wiegand, Klaus-Robert Müller`\n\n\nComments:\n\n8 pages, 2 figures\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Computers and Society (cs.CY); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1708.08296 [cs.AI]\n\n \n(or arXiv:1708.08296v1 [cs.AI] for this version)\n\n\n> Abstract: With the availability of large databases and recent improvements in deep\nlearning methodology, the performance of AI systems is reaching or even\nexceeding the human level on an increasing number of complex tasks. Impressive\nexamples of this development can be found in domains such as image\nclassification, sentiment analysis, speech understanding or strategic game\nplaying. However, because of their nested non-linear structure, these highly\nsuccessful machine learning and artificial intelligence models are usually\napplied in a black box manner, i.e., no information is provided about what\nexactly makes them arrive at their predictions. Since this lack of transparency\ncan be a major drawback, e.g., in medical applications, the development of\nmethods for visualizing, explaining and interpreting deep learning models has\nrecently attracted increasing attention. This paper summarizes recent\ndevelopments in this field and makes a plea for more interpretability in\nartificial intelligence. 
Furthermore, it presents two approaches to explaining\npredictions of deep learning models, one method which computes the sensitivity\nof the prediction with respect to changes in the input and one approach which\nmeaningfully decomposes the decision in terms of the input variables. These\nmethods are evaluated on three classification tasks.\n\n\n## [Interpreting Shared Deep Learning Models via Explicable Boundary Trees](https://arxiv.org/abs/1709.03730)\n[(PDF)](https://arxiv.org/pdf/1709.03730)\n\n`Authors:Huijun Wu, Chen Wang, Jie Yin, Kai Lu, Liming Zhu`\n\n\nComments:\n\n9 pages, 10 figures\n\nSubjects:\n\nLearning (cs.LG); Human-Computer Interaction (cs.HC)\n\n\nCite as:\n\narXiv:1709.03730 [cs.LG]\n\n \n(or arXiv:1709.03730v1 [cs.LG] for this version)\n\n\n> Abstract: Despite outperforming the human in many tasks, deep neural network models are\nalso criticized for the lack of transparency and interpretability in decision\nmaking. The opaqueness results in uncertainty and low confidence when deploying\nsuch a model in model sharing scenarios, when the model is developed by a third\nparty. For a supervised machine learning model, sharing training process\nincluding training data provides an effective way to gain trust and to better\nunderstand model predictions. However, it is not always possible to share all\ntraining data due to privacy and policy constraints. In this paper, we propose\na method to disclose a small set of training data that is just sufficient for\nusers to get the insight of a complicated model. The method constructs a\nboundary tree using selected training data and the tree is able to approximate\nthe complicated model with high fidelity. 
We show that traversing data points\nin the tree gives users significantly better understanding of the model and\npaves the way for trustworthy model sharing.\n\n\n## [Interpretable Graph-Based Semi-Supervised Learning via Flows](https://arxiv.org/abs/1709.04764)\n[(PDF)](https://arxiv.org/pdf/1709.04764)\n\n`Authors:Raif M. Rustamov, James T. Klosowski`\n\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1709.04764 [stat.ML]\n\n \n(or arXiv:1709.04764v1 [stat.ML] for this version)\n\n\n> Abstract: In this paper, we consider the interpretability of the foundational\nLaplacian-based semi-supervised learning approaches on graphs. We introduce a\nnovel flow-based learning framework that subsumes the foundational approaches\nand additionally provides a detailed, transparent, and easily understood\nexpression of the learning process in terms of graph flows. As a result, one\ncan visualize and interactively explore the precise subgraph along which the\ninformation from labeled nodes flows to an unlabeled node of interest.\nSurprisingly, the proposed framework avoids trading accuracy for\ninterpretability, but in fact leads to improved prediction accuracy, which is\nsupported both by theoretical considerations and empirical results. 
The\nflow-based framework guarantees the maximum principle by construction and can\nhandle directed graphs in an out-of-the-box manner.\n\n\n## [Unsupervised Learning of Disentangled and Interpretable Representations  from Sequential Data](https://arxiv.org/abs/1709.07902)\n[(PDF)](https://arxiv.org/pdf/1709.07902)\n\n`Authors:Wei-Ning Hsu, Yu Zhang, James Glass`\n\n\nComments:\n\nAccepted to NIPS 2017\n\nSubjects:\n\nLearning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1709.07902 [cs.LG]\n\n \n(or arXiv:1709.07902v1 [cs.LG] for this version)\n\n\n> Abstract: We present a factorized hierarchical variational autoencoder, which learns\ndisentangled and interpretable representations from sequential data without\nsupervision. Specifically, we exploit the multi-scale nature of information in\nsequential data by formulating it explicitly within a factorized hierarchical\ngraphical model that imposes sequence-dependent priors and sequence-independent\npriors to different sets of latent variables. 
The model is evaluated on two\nspeech corpora to demonstrate, qualitatively, its ability to transform speakers\nor linguistic content by manipulating different sets of latent variables; and\nquantitatively, its ability to outperform an i-vector baseline for speaker\nverification and reduce the word error rate by as much as 35% in mismatched\ntrain/test scenarios for automatic speech recognition tasks.\n\n\n## [CTD: Fast, Accurate, and Interpretable Method for Static and Dynamic  Tensor Decompositions](https://arxiv.org/abs/1710.03608)\n[(PDF)](https://arxiv.org/pdf/1710.03608)\n\n`Authors:Jungwoo Lee, Dongjin Choi, Lee Sael`\n\n\nSubjects:\n\nNumerical Analysis (cs.NA); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1710.03608 [cs.NA]\n\n \n(or arXiv:1710.03608v1 [cs.NA] for this version)\n\n\n> Abstract: How can we find patterns and anomalies in a tensor, or multi-dimensional\narray, in an efficient and directly interpretable way? How can we do this in an\nonline environment, where a new tensor arrives each time step? Finding patterns\nand anomalies in a tensor is a crucial problem with many applications,\nincluding building safety monitoring, patient health monitoring, cyber\nsecurity, terrorist detection, and fake user detection in social networks.\nStandard PARAFAC and Tucker decomposition results are not directly\ninterpretable. Although a few sampling-based methods have previously been\nproposed towards better interpretability, they need to be made faster, more\nmemory efficient, and more accurate.\nIn this paper, we propose CTD, a fast, accurate, and directly interpretable\ntensor decomposition method based on sampling. CTD-S, the static version of\nCTD, provably guarantees a high accuracy that is 17 ~ 83x more accurate than\nthat of the state-of-the-art method. Also, CTD-S is made 5 ~ 86x faster, and 7\n~ 12x more memory-efficient than the state-of-the-art method by removing\nredundancy. 
CTD-D, the dynamic version of CTD, is the first interpretable\ndynamic tensor decomposition method ever proposed. Also, it is made 2 ~ 3x\nfaster than already fast CTD-S by exploiting factors at previous time step and\nby reordering operations. With CTD, we demonstrate how the results can be\neffectively interpreted in the online distributed denial of service (DDoS)\nattack detection.\n\n\n## [Interpretable Convolutional Neural Networks](https://arxiv.org/abs/1710.00935)\n[(PDF)](https://arxiv.org/pdf/1710.00935)\n\n`Authors:Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1710.00935 [cs.CV]\n\n \n(or arXiv:1710.00935v3 [cs.CV] for this version)\n\n\n> Abstract: This paper proposes a method to modify traditional convolutional neural\nnetworks (CNNs) into interpretable CNNs, in order to clarify knowledge\nrepresentations in high conv-layers of CNNs. In an interpretable CNN, each\nfilter in a high conv-layer represents a certain object part. We do not need\nany annotations of object parts or textures to supervise the learning process.\nInstead, the interpretable CNN automatically assigns each filter in a high\nconv-layer with an object part during the learning process. Our method can be\napplied to different types of CNNs with different structures. 
The clear\nknowledge representation in an interpretable CNN can help people understand the\nlogics inside a CNN, i.e., based on which patterns the CNN makes the decision.\nExperiments showed that filters in an interpretable CNN were more semantically\nmeaningful than those in traditional CNNs.\n\n\n## [Interpretable Machine Learning for Privacy-Preserving Pervasive Systems](https://arxiv.org/abs/1710.08464)\n[(PDF)](https://arxiv.org/pdf/1710.08464)\n\n`Authors:Benjamin Baron, Mirco Musolesi`\n\n\nSubjects:\n\nMachine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1710.08464 [stat.ML]\n\n \n(or arXiv:1710.08464v3 [stat.ML] for this version)\n\n\n> Abstract: The presence of pervasive systems in our everyday lives and the interaction\nof users with connected devices such as smartphones or home appliances generate\nincreasing amounts of traces that reflect users' behavior. A plethora of\nmachine learning techniques enable service providers to process these traces to\nextract latent information about the users. While most of the existing projects\nhave focused on the accuracy of these techniques, little work has been done on\nthe interpretation of the inference and identification algorithms based on\nthem. In this paper, we propose a machine learning interpretability framework\nfor inference algorithms based on data collected through pervasive systems and\nwe outline the open challenges in this research area. 
Our interpretability\nframework enables users to understand how the traces they generate could expose\ntheir privacy, while allowing for usable and personalized services at the same\ntime.\n\n\n## [InterpNET: Neural Introspection for Interpretable Deep Learning](https://arxiv.org/abs/1710.09511)\n[(PDF)](https://arxiv.org/pdf/1710.09511)\n\n`Authors:Shane Barratt`\n\n\nComments:\n\nPresented at NIPS 2017 Symposium on Interpretable Machine Learning\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1710.09511 [stat.ML]\n\n \n(or arXiv:1710.09511v2 [stat.ML] for this version)\n\n\n> Abstract: Humans are able to explain their reasoning. On the contrary, deep neural\nnetworks are not. This paper attempts to bridge this gap by introducing a new\nway to design interpretable neural networks for classification, inspired by\nphysiological evidence of the human visual system's inner-workings. This paper\nproposes a neural network design paradigm, termed InterpNET, which can be\ncombined with any existing classification architecture to generate natural\nlanguage explanations of the classifications. The success of the module relies\non the assumption that the network's computation and reasoning is represented\nin its internal layer activations. While in principle InterpNET could be\napplied to any existing classification architecture, it is evaluated via an\nimage classification and explanation task. Experiments on a CUB bird\nclassification and explanation dataset show qualitatively and quantitatively\nthat the model is able to generate high-quality explanations. 
While the current\nstate-of-the-art METEOR score on this dataset is 29.2, InterpNET achieves a\nmuch higher METEOR score of 37.9.\n\n\n## [MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural  Networks](https://arxiv.org/abs/1711.06788)\n[(PDF)](https://arxiv.org/pdf/1711.06788)\n\n`Authors:Minmin Chen`\n\n\nComments:\n\nPresented at NIPS 2017 Symposium on Interpretable Machine Learning\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1711.06788 [stat.ML]\n\n \n(or arXiv:1711.06788v1 [stat.ML] for this version)\n\n\n> Abstract: We introduce MinimalRNN, a new recurrent neural network architecture that\nachieves comparable performance as the popular gated RNNs with a simplified\nstructure. It employs minimal updates within RNN, which not only leads to\nefficient learning and testing but more importantly better interpretability and\ntrainability. We demonstrate that by endorsing the more restrictive update\nrule, MinimalRNN learns disentangled RNN states. We further examine the\nlearning dynamics of different RNN structures using input-output Jacobians, and\nshow that MinimalRNN is able to capture longer range dependencies than existing\nRNN architectures.\n\n\n## [Beyond Sparsity: Tree Regularization of Deep Models for Interpretability](https://arxiv.org/abs/1711.06178)\n[(PDF)](https://arxiv.org/pdf/1711.06178)\n\n`Authors:Mike Wu, Michael C. Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez`\n\n\nComments:\n\nTo appear in AAAI 2018. Contains 9-page main paper and appendix with supplementary material\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1711.06178 [stat.ML]\n\n \n(or arXiv:1711.06178v1 [stat.ML] for this version)\n\n\n> Abstract: The lack of interpretability remains a key barrier to the adoption of deep\nmodels in many applications. 
In this work, we explicitly regularize deep models\nso human users might step through the process behind their predictions in\nlittle time. Specifically, we train deep time-series models so their\nclass-probability predictions have high accuracy while being closely modeled by\ndecision trees with few nodes. Using intuitive toy examples as well as medical\ntasks for treating sepsis and HIV, we demonstrate that this new tree\nregularization yields models that are easier for humans to simulate than\nsimpler L1 or L2 penalties without sacrificing predictive power.\n\n\n## [The Promise and Peril of Human Evaluation for Model Interpretability](https://arxiv.org/abs/1711.07414)\n[(PDF)](https://arxiv.org/pdf/1711.07414)\n\n`Authors:Bernease Herman`\n\n\nComments:\n\nPresented at NIPS 2017 Symposium on Interpretable Machine Learning\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1711.07414 [cs.AI]\n\n \n(or arXiv:1711.07414v1 [cs.AI] for this version)\n\n\n> Abstract: Transparency, user trust, and human comprehension are popular ethical\nmotivations for interpretable machine learning. In support of these goals,\nresearchers evaluate model explanation performance using humans and real world\napplications. This alone presents a challenge in many areas of artificial\nintelligence. In this position paper, we propose a distinction between\ndescriptive and persuasive explanations. We discuss reasoning suggesting that\nfunctional interpretability may be correlated with cognitive function and user\npreferences. If this is indeed the case, evaluation and optimization using\nfunctional metrics could perpetuate implicit cognitive bias in explanations\nthat threaten transparency. 
Finally, we propose two potential research\ndirections to disambiguate cognitive function and explanation models, retaining\ncontrol over the tradeoff between accuracy and interpretability.\n\n\n## [Unleashing the Potential of CNNs for Interpretable Few-Shot Learning](https://arxiv.org/abs/1711.08277)\n[(PDF)](https://arxiv.org/pdf/1711.08277)\n\n`Authors:Boyang Deng, Qing Liu, Siyuan Qiao, Alan Yuille`\n\n\nComments:\n\nUnder review as a conference paper at ICLR 2018\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1711.08277 [cs.CV]\n\n \n(or arXiv:1711.08277v1 [cs.CV] for this version)\n\n\n> Abstract: Convolutional neural networks (CNNs) have been generally acknowledged as one\nof the driving forces for the advancement of computer vision. Despite their\npromising performances on many tasks, CNNs still face major obstacles on the\nroad to achieving ideal machine intelligence. One is the difficulty of\ninterpreting them and understanding their inner workings, which is important\nfor diagnosing their failures and correcting them. Another is that standard\nCNNs require large amounts of annotated data, which is sometimes very hard to\nobtain. Hence, it is desirable to enable them to learn from few examples. In\nthis work, we address these two limitations of CNNs by developing novel and\ninterpretable models for few-shot learning. Our models are based on the idea of\nencoding objects in terms of visual concepts, which are interpretable visual\ncues represented within CNNs. We first use qualitative visualizations and\nquantitative statistics, to uncover several key properties of feature encoding\nusing visual concepts. Motivated by these properties, we present two intuitive\nmodels for the problem of few-shot learning. 
Experiments show that our models\nachieve competitive performances, while being much more flexible and\ninterpretable than previous state-of-the-art few-shot learning methods. We\nconclude that visual concepts expose the natural capability of CNNs for\nfew-shot learning.\n\n\n## [Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action  Recognition](https://arxiv.org/abs/1711.08502)\n[(PDF)](https://arxiv.org/pdf/1711.08502)\n\n`Authors:Jingxuan Hou, Tae Soo Kim, Austin Reiter`\n\n\nComments:\n\n8 pages, 8 figures, CVPR18 submission\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1711.08502 [cs.CV]\n\n \n(or arXiv:1711.08502v1 [cs.CV] for this version)\n\n\n> Abstract: Despite the growing discriminative capabilities of modern deep learning\nmethods for recognition tasks, the inner workings of the state-of-art models\nstill remain mostly black-boxes. In this paper, we propose a systematic\ninterpretation of model parameters and hidden representations of Residual\nTemporal Convolutional Networks (Res-TCN) for action recognition in time-series\ndata. We also propose a Feature Map Decoder as part of the interpretation\nanalysis, which outputs a representation of model's hidden variables in the\nsame domain as the input. Such analysis empowers us to expose model's\ncharacteristic learning patterns in an interpretable way. For example, through\nthe diagnosis analysis, we discovered that our model has learned to achieve\nview-point invariance by implicitly learning to perform rotational\nnormalization of the input to a more discriminative view. Based on the findings\nfrom the model interpretation analysis, we propose a targeted refinement\ntechnique, which can generalize to various other recognition models. The\nproposed work introduces a three-stage paradigm for model learning: training,\ninterpretable diagnosis and targeted refinement. 
We validate our approach on\nskeleton based 3D human action recognition benchmark of NTU RGB+D. We show that\nthe proposed workflow is an effective model learning strategy and the resulting\nMulti-stream Residual Temporal Convolutional Network (MS-Res-TCN) achieves the\nstate-of-the-art performance on NTU RGB+D.\n\n\n## [SPINE: SParse Interpretable Neural Embeddings](https://arxiv.org/abs/1711.08792)\n[(PDF)](https://arxiv.org/pdf/1711.08792)\n\n`Authors:Anant Subramanian, Danish Pruthi, Harsh Jhamtani, Taylor Berg-Kirkpatrick, Eduard Hovy`\n\n\nComments:\n\nAAAI 2018\n\nSubjects:\n\nComputation and Language (cs.CL)\n\n\nCite as:\n\narXiv:1711.08792 [cs.CL]\n\n \n(or arXiv:1711.08792v1 [cs.CL] for this version)\n\n\n> Abstract: Prediction without justification has limited utility. Much of the success of\nneural models can be attributed to their ability to learn rich, dense and\nexpressive representations. While these representations capture the underlying\ncomplexity and latent trends in the data, they are far from being\ninterpretable. We propose a novel variant of denoising k-sparse autoencoders\nthat generates highly efficient and interpretable distributed word\nrepresentations (word embeddings), beginning with existing word representations\nfrom state-of-the-art methods like GloVe and word2vec. Through large scale\nhuman evaluation, we report that our resulting word embeddings are much more\ninterpretable than the original GloVe and word2vec embeddings. 
Moreover, our\nembeddings outperform existing popular word embeddings on a diverse suite of\nbenchmark downstream tasks.\n\n\n## [Improving the Adversarial Robustness and Interpretability of Deep Neural  Networks by Regularizing their Input Gradients](https://arxiv.org/abs/1711.09404)\n[(PDF)](https://arxiv.org/pdf/1711.09404)\n\n`Authors:Andrew Slavin Ross, Finale Doshi-Velez`\n\n\nComments:\n\nTo appear in AAAI 2018\n\nSubjects:\n\nLearning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1711.09404 [cs.LG]\n\n \n(or arXiv:1711.09404v1 [cs.LG] for this version)\n\n\n> Abstract: Deep neural networks have proven remarkably effective at solving many\nclassification problems, but have been criticized recently for two major\nweaknesses: the reasons behind their predictions are uninterpretable, and the\npredictions themselves can often be fooled by small adversarial perturbations.\nThese problems pose major obstacles for the adoption of neural networks in\ndomains that require security or transparency. In this work, we evaluate the\neffectiveness of defenses that differentiably penalize the degree to which\nsmall changes in inputs can alter model predictions. Across multiple attacks,\narchitectures, defenses, and datasets, we find that neural networks trained\nwith this input gradient regularization exhibit robustness to transferred\nadversarial examples generated to fool all of the other models. We also find\nthat adversarial examples generated to fool gradient-regularized models fool\nall other models equally well, and actually lead to more \"legitimate,\"\ninterpretable misclassifications as rated by people (which we confirm in a\nhuman subject experiment). Finally, we demonstrate that regularizing input\ngradients makes them more naturally interpretable as rationales for model\npredictions. 
We conclude by discussing this relationship between\ninterpretability and robustness in deep neural networks.\n\n\n## [Interpretable Convolutional Neural Networks for Effective Translation  Initiation Site Prediction](https://arxiv.org/abs/1711.09558)\n[(PDF)](https://arxiv.org/pdf/1711.09558)\n\n`Authors:Jasper Zuallaert, Mijung Kim, Yvan Saeys, Wesley De Neve`\n\n\nComments:\n\nPresented at International Workshop on Deep Learning in Bioinformatics, Biomedicine, and Healthcare Informatics (DLB2H 2017) --- in conjunction with the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2017)\n\nSubjects:\n\nGenomics (q-bio.GN); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1711.09558 [q-bio.GN]\n\n \n(or arXiv:1711.09558v1 [q-bio.GN] for this version)\n\n\n> Abstract: Thanks to rapidly evolving sequencing techniques, the amount of genomic data\nat our disposal is growing increasingly large. Determining the gene structure\nis a fundamental requirement to effectively interpret gene function and\nregulation. An important part in that determination process is the\nidentification of translation initiation sites. In this paper, we propose a\nnovel approach for automatic prediction of translation initiation sites,\nleveraging convolutional neural networks that allow for automatic feature\nextraction. Our experimental results demonstrate that we are able to improve\nthe state-of-the-art approaches with a decrease of 75.2% in false positive rate\nand with a decrease of 24.5% in error rate on chosen datasets. Furthermore, an\nin-depth analysis of the decision-making process used by our predictive model\nshows that our neural network implicitly learns biologically relevant features\nfrom scratch, without any prior knowledge about the problem at hand, such as\nthe Kozak consensus sequence, the influence of stop and start codons in the\nsequence and the presence of donor splice site patterns. 
In summary, our\nfindings yield a better understanding of the internal reasoning of a\nconvolutional neural network when applying such a neural network to genomic\ndata.\n\n\n## [Interpretable Facial Relational Network Using Relational Importance](https://arxiv.org/abs/1711.10688)\n[(PDF)](https://arxiv.org/pdf/1711.10688)\n\n`Authors:Seong Tae Kim, Yong Man Ro`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1711.10688 [cs.CV]\n\n \n(or arXiv:1711.10688v1 [cs.CV] for this version)\n\n\n> Abstract: Human face analysis is an important task in computer vision. According to\ncognitive-psychological studies, facial dynamics could provide crucial cues for\nface analysis. In particular, the motion of facial local regions in facial\nexpression is related to the motion of other facial regions. In this paper, a\nnovel deep learning approach which exploits the relations of facial local\ndynamics has been proposed to estimate facial traits from expression sequence.\nIn order to exploit the relations of facial dynamics in local regions, the\nproposed network consists of a facial local dynamic feature encoding network\nand a facial relational network. The facial relational network is designed to\nbe interpretable. Relational importance is automatically encoded and facial\ntraits are estimated by combining relational features based on the relational\nimportance. The relations of facial dynamics for facial trait estimation could\nbe interpreted by using the relational importance. By comparative experiments,\nthe effectiveness of the proposed method has been validated. 
Experimental\nresults show that the proposed method outperforms the state-of-the-art methods\nin gender and age estimation.\n\n\n## [An interpretable latent variable model for attribute applicability in  the Amazon catalogue](https://arxiv.org/abs/1712.00126)\n[(PDF)](https://arxiv.org/pdf/1712.00126)\n\n`Authors:Tammo Rukat, Dustin Lange, Cédric Archambeau`\n\n\nComments:\n\nPresented at NIPS 2017 Symposium on Interpretable Machine Learning\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1712.00126 [stat.ML]\n\n \n(or arXiv:1712.00126v2 [stat.ML] for this version)\n\n\n> Abstract: Learning attribute applicability of products in the Amazon catalog (e.g.,\npredicting that a shoe should have a value for size, but not for battery-type)\nat scale is a challenge. The need for an interpretable model is contingent on\n(1) the lack of ground truth training data, (2) the need to utilise prior\ninformation about the underlying latent space and (3) the ability to understand\nthe quality of predictions on new, unseen data. To this end, we develop the\nMaxMachine, a probabilistic latent variable model that learns distributed\nbinary representations, associated to sets of features that are likely to\nco-occur in the data. Layers of MaxMachines can be stacked such that higher\nlayers encode more abstract information. Any set of variables can be clamped to\nencode prior information. 
We develop fast sampling-based posterior inference.\nPreliminary results show that the model improves over the baseline in 17 out of\n19 product groups and provides qualitatively reasonable predictions.\n\n\n## [Where Classification Fails, Interpretation Rises](https://arxiv.org/abs/1712.00558)\n[(PDF)](https://arxiv.org/pdf/1712.00558)\n\n`Authors:Chanh Nguyen, Georgi Georgiev, Yujie Ji, Ting Wang`\n\n\nComments:\n\n6 pages, 6 figures\n\nSubjects:\n\nLearning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1712.00558 [cs.LG]\n\n \n(or arXiv:1712.00558v1 [cs.LG] for this version)\n\n\n> Abstract: An intriguing property of deep neural networks is their inherent vulnerability to adversarial inputs, which significantly hinders their\napplication in security-critical domains. Most existing detection methods\nattempt to use carefully engineered patterns to distinguish adversarial inputs\nfrom their genuine counterparts, which however can often be circumvented by\nadaptive adversaries. In this work, we take a completely different route by\nleveraging the definition of adversarial inputs: while deceptive to deep\nneural networks, they are barely discernible to human vision. Building upon\nrecent advances in interpretable models, we construct a new detection framework\nthat contrasts an input's interpretation against its classification. We\nvalidate the efficacy of this framework through extensive experiments using\nbenchmark datasets and attacks. We believe that this work opens a new direction\nfor designing adversarial input detection methods.\n\n\n## [SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for  Predicting Chemical Properties](https://arxiv.org/abs/1712.02034)\n[(PDF)](https://arxiv.org/pdf/1712.02034)\n\n`Authors:Garrett B. Goh, Nathan O. 
Hodas, Charles Siegel, Abhinav Vishnu`\n\n\nSubjects:\n\nMachine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1712.02034 [stat.ML]\n\n \n(or arXiv:1712.02034v1 [stat.ML] for this version)\n\n\n> Abstract: Chemical databases store information in text representations, and the SMILES\nformat is a universal standard used in many cheminformatics software packages. Encoded\nin each SMILES string is structural information that can be used to predict\ncomplex chemical properties. In this work, we develop SMILES2Vec, a deep RNN\nthat automatically learns features from SMILES strings to predict chemical\nproperties, without the need for additional explicit chemical information, or\nthe \"grammar\" of how SMILES encode structural data. Using Bayesian optimization\nmethods to tune the network architecture, we show that an optimized SMILES2Vec\nmodel can serve as a general-purpose neural network for learning a range of\ndistinct chemical properties including toxicity, activity, solubility and\nsolvation energy, while outperforming contemporary MLP networks that use\nengineered features. Furthermore, we demonstrate proof-of-concept of\ninterpretability by developing an explanation mask that localizes on the most\nimportant characters used in making a prediction. When tested on the solubility\ndataset, this localization identifies specific parts of a chemical that are\nconsistent with established first-principles knowledge of solubility with an\naccuracy of 88%, demonstrating that neural networks can learn technically\naccurate chemical concepts. The fact that SMILES2Vec validates established\nchemical facts, while providing state-of-the-art accuracy, makes it a potential\ntool for widespread adoption of interpretable deep learning by the chemistry\ncommunity.\n"
  },
  {
    "path": "arxiv.md",
    "content": "# Index\n1. Application and interpret\n    * [Interpretable Policies for Reinforcement Learning by Genetic Programming](#interpretable-policies-for-reinforcement-learning-by-genetic-programming)\n    * [Discovery Radiomics with CLEAR-DR: Interpretable Computer Aided  Diagnosis of Diabetic Retinopathy](#discovery-radiomics-with-clear-dr-interpretable-computer-aided--diagnosis-of-diabetic-retinopathy)\n    * [Building Data-driven Models with Microstructural Images: Generalization  and Interpretability](#building-data-driven-models-with-microstructural-images-generalization--and-interpretability)\n    * [Interpretable Feature Recommendation for Signal Analytics](#interpretable-feature-recommendation-for-signal-analytics)\n    * [Interpretable and Pedagogical Examples](#interpretable-and-pedagogical-examples)\n    * [Unsupervised patient representations from clinical notes with  interpretable classification decisions](#unsupervised-patient-representations-from-clinical-notes-with--interpretable-classification-decisions)\n    * [Interpretable probabilistic embeddings: bridging the gap between topic  models and neural networks](#interpretable-probabilistic-embeddings-bridging-the-gap-between-topic--models-and-neural-networks)\n    * [Arrhythmia Classification from the Abductive Interpretation of Short  Single-Lead ECG Records](#arrhythmia-classification-from-the-abductive-interpretation-of-short--single-lead-ecg-records)\n    * [Interpretable classifiers using rules and Bayesian analysis: Building a  better stroke prediction model](#interpretable-classifiers-using-rules-and-bayesian-analysis-building-a--better-stroke-prediction-model)\n    * [Interpretable Deep Neural Networks for Single-Trial EEG Classification](#interpretable-deep-neural-networks-for-single-trial-eeg-classification)\n    * [Learning Interpretable Musical Compositional Rules and Traces](#learning-interpretable-musical-compositional-rules-and-traces)\n    * [Building an Interpretable 
Recommender via Loss-Preserving Transformation](#building-an-interpretable-recommender-via-loss-preserving-transformation)\n    * [Interpretable Machine Learning Models for the Digital Clock Drawing Test](#interpretable-machine-learning-models-for-the-digital-clock-drawing-test)\n    * [RETAIN: An Interpretable Predictive Model for Healthcare using Reverse  Time Attention Mechanism](#retain-an-interpretable-predictive-model-for-healthcare-using-reverse--time-attention-mechanism)\n    * [Real Time Fine-Grained Categorization with Accuracy and Interpretability](#real-time-fine-grained-categorization-with-accuracy-and-interpretability)\n    * [Interpreting Neural Networks to Improve Politeness Comprehension](#interpreting-neural-networks-to-improve-politeness-comprehension)\n    * [Interpretable Semantic Textual Similarity: Finding and explaining  differences between sentences](#interpretable-semantic-textual-similarity-finding-and-explaining--differences-between-sentences)\n    * [Streaming Weak Submodularity: Interpreting Neural Networks on the Fly](#streaming-weak-submodularity-interpreting-neural-networks-on-the-fly)\n    * [Interpretable Learning for Self-Driving Cars by Visualizing Causal  Attention](#interpretable-learning-for-self-driving-cars-by-visualizing-causal--attention)\n    * [Interpretable 3D Human Action Analysis with Temporal Convolutional  Networks](#interpretable-3d-human-action-analysis-with-temporal-convolutional--networks)\n    * [An Interpretable Knowledge Transfer Model for Knowledge Base Completion](#an-interpretable-knowledge-transfer-model-for-knowledge-base-completion)\n    * [MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis  Network](#mdnet-a-semantically-and-visually-interpretable-medical-image-diagnosis--network)\n    * [Interpretable Active Learning](#interpretable-active-learning)\n    * [DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation  of Self-Reported 
Pain](#deepfacelift-interpretable-personalized-models-for-automatic-estimation--of-self-reported-pain)\n    * [More cat than cute? Interpretable Prediction of Adjective-Noun Pairs](#more-cat-than-cute-interpretable-prediction-of-adjective-noun-pairs)\n    * [Interpretable Categorization of Heterogeneous Time Series Data](#interpretable-categorization-of-heterogeneous-time-series-data)\n    * [Interpretable Graph-Based Semi-Supervised Learning via Flows](#interpretable-graph-based-semi-supervised-learning-via-flows)\n    * [CTD: Fast, Accurate, and Interpretable Method for Static and Dynamic  Tensor Decompositions](#ctd-fast-accurate-and-interpretable-method-for-static-and-dynamic--tensor-decompositions)\n    * [Interpretable Machine Learning for Privacy-Preserving Pervasive Systems](#interpretable-machine-learning-for-privacy-preserving-pervasive-systems)\n    * [Interpretable Convolutional Neural Networks for Effective Translation  Initiation Site Prediction](#interpretable-convolutional-neural-networks-for-effective-translation--initiation-site-prediction)\n    * [Interpretable Facial Relational Network Using Relational Importance](#interpretable-facial-relational-network-using-relational-importance)\n    * [SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for  Predicting Chemical Properties](#smiles2vec-an-interpretable-general-purpose-deep-neural-network-for--predicting-chemical-properties)\n    \n1. Determine interpretability of CNN\n    * [Network Dissection: Quantifying Interpretability of Deep Visual  Representations](#network-dissection-quantifying-interpretability-of-deep-visual--representations)\n    * [A Formal Framework to Characterize Interpretability of Procedures](#a-formal-framework-to-characterize-interpretability-of-procedures)\n    \n1. 
Criticize\n    * [Interpretation of Neural Networks is Fragile](#interpretation-of-neural-networks-is-fragile)\n    * [The Promise and Peril of Human Evaluation for Model Interpretability](#the-promise-and-peril-of-human-evaluation-for-model-interpretability)\n    \n1. Interpret existing model\n    * [Artificial Intelligence as Structural Estimation: Economic  Interpretations of Deep Blue, Bonanza, and AlphaGo](#artificial-intelligence-as-structural-estimation-economic--interpretations-of-deep-blue-bonanza-and-alphago)\n    * [Semantic Structure and Interpretability of Word Embeddings](#semantic-structure-and-interpretability-of-word-embeddings)\n    * [Interpreting Convolutional Neural Networks Through Compression](#interpreting-convolutional-neural-networks-through-compression)\n    * [Interpreting Deep Visual Representations via Network Dissection](#interpreting-deep-visual-representations-via-network-dissection)\n    * [The Mythos of Model Interpretability](#the-mythos-of-model-interpretability)\n    * [Model-Agnostic Interpretability of Machine Learning](#model-agnostic-interpretability-of-machine-learning)\n    * [Using Visual Analytics to Interpret Predictive Machine Learning Models](#using-visual-analytics-to-interpret-predictive-machine-learning-models)\n    * [Towards Transparent AI Systems: Interpreting Visual Question Answering  Models](#towards-transparent-ai-systems-interpreting-visual-question-answering--models)\n    * [Embedding Projector: Interactive Visualization and Interpretation of  Embeddings](#embedding-projector-interactive-visualization-and-interpretation-of--embeddings)\n    * [Learning Interpretability for Visualizations using Adapted Cox Models  through a User Experiment](#learning-interpretability-for-visualizations-using-adapted-cox-models--through-a-user-experiment)\n    * [Interpreting Finite Automata for Sequential Data](#interpreting-finite-automata-for-sequential-data)\n    * [Interpretation of Prediction Models Using the Input 
Gradient](#interpretation-of-prediction-models-using-the-input-gradient)\n    * [Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery](#interpretable-recurrent-neural-networks-using-sequential-sparse-recovery)\n    * [An unexpected unity among methods for interpreting model predictions](#an-unexpected-unity-among-methods-for-interpreting-model-predictions)\n    * [Towards A Rigorous Science of Interpretable Machine Learning](#towards-a-rigorous-science-of-interpretable-machine-learning)\n    * [Softmax Q-Distribution Estimation for Structured Prediction: A  Theoretical Interpretation for RAML](#softmax-q-distribution-estimation-for-structured-prediction-a--theoretical-interpretation-for-raml)\n    * [A Unified Approach to Interpreting Model Predictions](#a-unified-approach-to-interpreting-model-predictions)\n    * [Interpreting Blackbox Models via Model Extraction](#interpreting-blackbox-models-via-model-extraction)\n    * [Interpretable &amp; Explorable Approximations of Black Box Models](#interpretable--explorable-approximations-of-black-box-models)\n    * [Interpretability via Model Extraction](#interpretability-via-model-extraction)\n    * [Methods for Interpreting and Understanding Deep Neural Networks](#methods-for-interpreting-and-understanding-deep-neural-networks)\n    * [Interpreting Classifiers through Attribute Interactions in Datasets](#interpreting-classifiers-through-attribute-interactions-in-datasets)\n    * [Using Program Induction to Interpret Transition System Dynamics](#using-program-induction-to-interpret-transition-system-dynamics)\n    * [Warp: a method for neural network interpretability applied to gene  expression profiles](#warp-a-method-for-neural-network-interpretability-applied-to-gene--expression-profiles)\n    * [Towards Interpretable Deep Neural Networks by Leveraging Adversarial  Examples](#towards-interpretable-deep-neural-networks-by-leveraging-adversarial--examples)\n    * [Explainable Artificial Intelligence: 
Understanding, Visualizing and  Interpreting Deep Learning Models](#explainable-artificial-intelligence-understanding-visualizing-and--interpreting-deep-learning-models)\n    * [Interpreting Shared Deep Learning Models via Explicable Boundary Trees](#interpreting-shared-deep-learning-models-via-explicable-boundary-trees)\n    * [Unleashing the Potential of CNNs for Interpretable Few-Shot Learning](#unleashing-the-potential-of-cnns-for-interpretable-few-shot-learning)\n    * [Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action  Recognition](#train-diagnose-and-fix-interpretable-approach-for-fine-grained-action--recognition)\n    * [An interpretable latent variable model for attribute applicability in  the Amazon catalogue](#an-interpretable-latent-variable-model-for-attribute-applicability-in--the-amazon-catalogue)\n    * [Where Classification Fails, Interpretation Rises](#where-classification-fails-interpretation-rises)\n    \n1. Attempt to improve interpretability\n    * [Contextual Regression: An Accurate and Conveniently Interpretable  Nonlinear Model for Mining Discovery from Scientific Data](#contextual-regression-an-accurate-and-conveniently-interpretable--nonlinear-model-for-mining-discovery-from-scientific-data)\n    * [Interpretable R-CNN](#interpretable-r-cnn)\n    * [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](#increasing-the-interpretability-of-recurrent-neural-networks-using--hidden-markov-models)\n    * [SnapToGrid: From Statistical to Interpretable Models for Biomedical  Information Extraction](#snaptogrid-from-statistical-to-interpretable-models-for-biomedical--information-extraction)\n    * [Meaningful Models: Utilizing Conceptual Structure to Improve Machine  Learning Interpretability](#meaningful-models-utilizing-conceptual-structure-to-improve-machine--learning-interpretability)\n    * [Particle Swarm Optimization for Generating Interpretable Fuzzy  Reinforcement Learning 
Policies](#particle-swarm-optimization-for-generating-interpretable-fuzzy--reinforcement-learning-policies)\n    * [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](#increasing-the-interpretability-of-recurrent-neural-networks-using--hidden-markov-models-1)\n    * [Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning](#growing-interpretable-part-graphs-on-convnets-via-multi-shot-learning)\n    * [GENESIM: genetic extraction of a single, interpretable model](#genesim-genetic-extraction-of-a-single-interpretable-model)\n    * [Stratified Knowledge Bases as Interpretable Probabilistic Models  (Extended Abstract)](#stratified-knowledge-bases-as-interpretable-probabilistic-models--extended-abstract)\n    * [Tree Space Prototypes: Another Look at Making Tree Ensembles  Interpretable](#tree-space-prototypes-another-look-at-making-tree-ensembles--interpretable)\n    * [Inducing Interpretable Representations with Variational Autoencoders](#inducing-interpretable-representations-with-variational-autoencoders)\n    * [Input Switched Affine Networks: An RNN Architecture Designed for  Interpretability](#input-switched-affine-networks-an-rnn-architecture-designed-for--interpretability)\n    * [Large scale modeling of antimicrobial resistance with interpretable  classifiers](#large-scale-modeling-of-antimicrobial-resistance-with-interpretable--classifiers)\n    * [Towards a New Interpretation of Separable Convolutions](#towards-a-new-interpretation-of-separable-convolutions)\n    * [Interpretable Structure-Evolving LSTM](#interpretable-structure-evolving-lstm)\n    * [Improving Interpretability of Deep Neural Networks with Semantic  Information](#improving-interpretability-of-deep-neural-networks-with-semantic--information)\n    * [InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations](#infogail-interpretable-imitation-learning-from-visual-demonstrations)\n    * [Patchnet: Interpretable Neural Networks for 
Image Classification](#patchnet-interpretable-neural-networks-for-image-classification)\n    * [Unsupervised Learning of Disentangled and Interpretable Representations  from Sequential Data](#unsupervised-learning-of-disentangled-and-interpretable-representations--from-sequential-data)\n    * [Interpretable Convolutional Neural Networks](#interpretable-convolutional-neural-networks)\n    * [InterpNET: Neural Introspection for Interpretable Deep Learning](#interpnet-neural-introspection-for-interpretable-deep-learning)\n    * [MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural  Networks](#minimalrnn-toward-more-interpretable-and-trainable-recurrent-neural--networks)\n    * [Beyond Sparsity: Tree Regularization of Deep Models for Interpretability](#beyond-sparsity-tree-regularization-of-deep-models-for-interpretability)\n    * [SPINE: SParse Interpretable Neural Embeddings](#spine-sparse-interpretable-neural-embeddings)\n    * [Improving the Adversarial Robustness and Interpretability of Deep Neural  Networks by Regularizing their Input Gradients](#improving-the-adversarial-robustness-and-interpretability-of-deep-neural--networks-by-regularizing-their-input-gradients)\n\n\n# Papers\n\n## [Interpretable Policies for Reinforcement Learning by Genetic Programming](https://arxiv.org/abs/1712.04170)\n[(PDF)](https://arxiv.org/pdf/1712.04170)\n\n`Authors:Daniel Hein, Steffen Udluft, Thomas A. Runkler`\n\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)\n\n\nCite as:\n\narXiv:1712.04170 [cs.AI]\n\n \n(or arXiv:1712.04170v1 [cs.AI] for this version)\n\n\n> Abstract: The search for interpretable reinforcement learning policies is of high\nacademic and industrial interest. Especially for industrial systems, domain\nexperts are more likely to deploy autonomously learned controllers if they are\nunderstandable and convenient to evaluate. 
Basic algebraic equations are\nsupposed to meet these requirements, as long as they are restricted to an\nadequate complexity. Here we introduce the genetic programming for\nreinforcement learning (GPRL) approach based on model-based batch reinforcement\nlearning and genetic programming, which autonomously learns policy equations\nfrom pre-existing default state-action trajectory samples. GPRL is compared to\na straight-forward method which utilizes genetic programming for symbolic\nregression, yielding policies imitating an existing well-performing, but\nnon-interpretable policy. Experiments on three reinforcement learning\nbenchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark,\ndemonstrate the superiority of our GPRL approach compared to the symbolic\nregression method. GPRL is capable of producing well-performing interpretable\nreinforcement learning policies from pre-existing default trajectory data.\n\n\n## [Discovery Radiomics with CLEAR-DR: Interpretable Computer Aided  Diagnosis of Diabetic Retinopathy](https://arxiv.org/abs/1710.10675)\n[(PDF)](https://arxiv.org/pdf/1710.10675)\n\n`Authors:Devinder Kumar, Graham W. Taylor, Alexander Wong`\n\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)\n\n\nCite as:\n\narXiv:1710.10675 [cs.AI]\n\n \n(or arXiv:1710.10675v1 [cs.AI] for this version)\n\n\n> Abstract: Objective: Radiomics-driven Computer Aided Diagnosis (CAD) has shown\nconsiderable promise in recent years as a potential tool for improving clinical\ndecision support in medical oncology, particularly those based around the\nconcept of Discovery Radiomics, where radiomic sequencers are discovered\nthrough the analysis of medical imaging data. One of the main limitations with\ncurrent CAD approaches is that it is very difficult to gain insight or\nrationale as to how decisions are made, thus limiting their utility to\nclinicians. 
Methods: In this study, we propose CLEAR-DR, a novel interpretable\nCAD system based on the notion of CLass-Enhanced Attentive Response Discovery\nRadiomics for the purpose of clinical decision support for diabetic\nretinopathy. Results: In addition to disease grading via the discovered deep\nradiomic sequencer, the CLEAR-DR system also produces a visual interpretation\nof the decision-making process to provide better insight and understanding into\nthe decision-making process of the system. Conclusion: We demonstrate the\neffectiveness and utility of the proposed CLEAR-DR system of enhancing the\ninterpretability of diagnostic grading results for the application of diabetic\nretinopathy grading. Significance: CLEAR-DR can act as a potential powerful\ntool to address the uninterpretability issue of current CAD systems, thus\nimproving their utility to clinicians.\n\n\n## [Interpretation of Neural Networks is Fragile](https://arxiv.org/abs/1710.10547)\n[(PDF)](https://arxiv.org/pdf/1710.10547)\n\n`Authors:Amirata Ghorbani, Abubakar Abid, James Zou`\n\n\nComments:\n\nSubmitted for review at ICLR 2018\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1710.10547 [stat.ML]\n\n \n(or arXiv:1710.10547v1 [stat.ML] for this version)\n\n\n> Abstract: In order for machine learning to be deployed and trusted in many\napplications, it is crucial to be able to reliably explain why the machine\nlearning algorithm makes certain predictions. For example, if an algorithm\nclassifies a given pathology image to be a malignant tumor, then the doctor may\nneed to know which parts of the image led the algorithm to this classification.\nHow to interpret black-box predictors is thus an important and active area of\nresearch. A fundamental question is: how much can we trust the interpretation\nitself? 
In this paper, we show that interpretation of deep learning predictions\nis extremely fragile in the following sense: two perceptively indistinguishable\ninputs with the same predicted label can be assigned very different\ninterpretations. We systematically characterize the fragility of several\nwidely-used feature-importance interpretation methods (saliency maps, relevance\npropagation, and DeepLIFT) on ImageNet and CIFAR-10. Our experiments show that\neven small random perturbation can change the feature importance and new\nsystematic perturbations can lead to dramatically different interpretations\nwithout changing the label. We extend these results to show that\ninterpretations based on exemplars (e.g. influence functions) are similarly\nfragile. Our analysis of the geometry of the Hessian matrix gives insight on\nwhy fragility could be a fundamental challenge to the current interpretation\napproaches.\n\n\n## [Artificial Intelligence as Structural Estimation: Economic  Interpretations of Deep Blue, Bonanza, and AlphaGo](https://arxiv.org/abs/1710.10967)\n[(PDF)](https://arxiv.org/pdf/1710.10967)\n\n`Authors:Mitsuru Igami`\n\n\nSubjects:\n\nEconometrics (econ.EM); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1710.10967 [econ.EM]\n\n \n(or arXiv:1710.10967v2 [econ.EM] for this version)\n\n\n> Abstract: Artificial intelligence (AI) has achieved superhuman performance in a growing\nnumber of tasks, including the classical games of chess, shogi, and Go, but\nunderstanding and explaining AI remain challenging. This paper studies the\nmachine-learning algorithms for developing the game AIs, and provides their\nstructural interpretations. Specifically, chess-playing Deep Blue is a\ncalibrated value function, whereas shogi-playing Bonanza represents an\nestimated value function via Rust's (1987) nested fixed-point method. 
AlphaGo's\n\"supervised-learning policy network\" is a deep neural network (DNN) version of\nHotz and Miller's (1993) conditional choice probability estimates; its\n\"reinforcement-learning value network\" is equivalent to Hotz, Miller, Sanders,\nand Smith's (1994) simulation method for estimating the value function. Their\nperformances suggest DNNs are a useful functional form when the state space is\nlarge and data are sparse. Explicitly incorporating strategic interactions and\nunobserved heterogeneity in the data-generating process would further improve\nAIs' explicability.\n\n\n## [Contextual Regression: An Accurate and Conveniently Interpretable  Nonlinear Model for Mining Discovery from Scientific Data](https://arxiv.org/abs/1710.10728)\n[(PDF)](https://arxiv.org/pdf/1710.10728)\n\n`Authors:Chengyu Liu, Wei Wang`\n\n\nComments:\n\n18 pages of Main Article, 30 pages of Supplementary Material\n\nSubjects:\n\nQuantitative Methods (q-bio.QM); Learning (cs.LG); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1710.10728 [q-bio.QM]\n\n \n(or arXiv:1710.10728v1 [q-bio.QM] for this version)\n\n\n> Abstract: Machine learning algorithms such as linear regression, SVM and neural network\nhave played an increasingly important role in the process of scientific\ndiscovery. However, none of them is both interpretable and accurate on\nnonlinear datasets. Here we present contextual regression, a method that joins\nthese two desirable properties together using a hybrid architecture of neural\nnetwork embedding and dot product layer. We demonstrate its high prediction\naccuracy and sensitivity through the task of predictive feature selection on a\nsimulated dataset and the application of predicting open chromatin sites in the\nhuman genome. On the simulated data, our method achieved high fidelity recovery\nof feature contributions under random noise levels up to 200%. 
On the open\nchromatin dataset, the application of our method not only outperformed the\nstate of the art method in terms of accuracy, but also unveiled two previously\nunfound open chromatin related histone marks. Our method can fill the blank of\naccurate and interpretable nonlinear modeling in scientific data mining tasks.\n\n\n## [Building Data-driven Models with Microstructural Images: Generalization  and Interpretability](https://arxiv.org/abs/1711.00404)\n[(PDF)](https://arxiv.org/pdf/1711.00404)\n\n`Authors:Julia Ling, Maxwell Hutchinson, Erin Antono, Brian DeCost, Elizabeth A. Holm, Bryce Meredig`\n\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Materials Science (cond-mat.mtrl-sci)\n\n\nCite as:\n\narXiv:1711.00404 [cs.AI]\n\n \n(or arXiv:1711.00404v1 [cs.AI] for this version)\n\n\n> Abstract: As data-driven methods rise in popularity in materials science applications,\na key question is how these machine learning models can be used to understand\nmicrostructure. Given the importance of process-structure-property relations\nthroughout materials science, it seems logical that models that can leverage\nmicrostructural data would be more capable of predicting property information.\nWhile there have been some recent attempts to use convolutional neural networks\nto understand microstructural images, these early studies have focused only on\nwhich featurizations yield the highest machine learning model accuracy for a\nsingle data set. 
This paper explores the use of convolutional neural networks\nfor classifying microstructure with a more holistic set of objectives in mind:\ngeneralization between data sets, number of features required, and\ninterpretability.\n\n\n## [Interpretable Feature Recommendation for Signal Analytics](https://arxiv.org/abs/1711.01870)\n[(PDF)](https://arxiv.org/pdf/1711.01870)\n\n`Authors:Snehasis Banerjee, Tanushyam Chattopadhyay, Ayan Mukherjee`\n\n\nComments:\n\n4 pages, Interpretable Data Mining Workshop, CIKM 2017\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1711.01870 [stat.ML]\n\n \n(or arXiv:1711.01870v1 [stat.ML] for this version)\n\n\n> Abstract: This paper presents an automated approach for interpretable feature\nrecommendation for solving signal data analytics problems. The method has been\ntested by performing experiments on datasets in the domain of prognostics where\ninterpretation of features is considered very important. The proposed approach\nis based on Wide Learning architecture and provides means for interpretation of\nthe recommended features. It is to be noted that such an interpretation is not\navailable with feature learning approaches like Deep Learning (such as\nConvolutional Neural Network) or feature transformation approaches like\nPrincipal Component Analysis. Results show that the feature recommendation and\ninterpretation techniques are quite effective for the problems at hand in terms\nof performance and drastic reduction in time to develop a solution. 
It is\nfurther shown by an example, how this human-in-loop interpretation system can\nbe used as a prescriptive system.\n\n\n## [Semantic Structure and Interpretability of Word Embeddings](https://arxiv.org/abs/1711.00331)\n[(PDF)](https://arxiv.org/pdf/1711.00331)\n\n`Authors:Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, Tolga Cukur`\n\n\nComments:\n\n10 Pages, 7 Figures\n\nSubjects:\n\nComputation and Language (cs.CL)\n\n\nCite as:\n\narXiv:1711.00331 [cs.CL]\n\n \n(or arXiv:1711.00331v2 [cs.CL] for this version)\n\n\n> Abstract: Dense word embeddings, which encode semantic meanings of words to low\ndimensional vector spaces have become very popular in natural language\nprocessing (NLP) research due to their state-of-the-art performances in many\nNLP tasks. Word embeddings are substantially successful in capturing semantic\nrelations among words, so a meaningful semantic structure must be present in\nthe respective vector spaces. However, in many cases, this semantic structure\nis broadly and heterogeneously distributed across the embedding dimensions,\nwhich makes interpretation a big challenge. In this study, we propose a\nstatistical method to uncover the latent semantic structure in the dense word\nembeddings. To perform our analysis we introduce a new dataset (SEMCAT) that\ncontains more than 6500 words semantically grouped under 110 categories. 
We\nfurther propose a method to quantify the interpretability of the word\nembeddings; the proposed method is a practical alternative to the classical\nword intrusion test that requires human intervention.\n\n\n## [Interpretable and Pedagogical Examples](https://arxiv.org/abs/1711.00694)\n[(PDF)](https://arxiv.org/pdf/1711.00694)\n\n`Authors:Smitha Milli, Pieter Abbeel, Igor Mordatch`\n\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1711.00694 [cs.AI]\n\n \n(or arXiv:1711.00694v1 [cs.AI] for this version)\n\n\n> Abstract: Teachers intentionally pick the most informative examples to show their\nstudents. However, if the teacher and student are neural networks, the examples\nthat the teacher network learns to give, although effective at teaching the\nstudent, are typically uninterpretable. We show that training the student and\nteacher iteratively, rather than jointly, can produce interpretable teaching\nstrategies. We evaluate interpretability by (1) measuring the similarity of the\nteacher's emergent strategies to intuitive strategies in each domain and (2)\nconducting human experiments to evaluate how effective the teacher's strategies\nare at teaching humans. We show that the teacher network learns to select or\ngenerate interpretable, pedagogical examples to teach rule-based,\nprobabilistic, boolean, and hierarchical concepts.\n\n\n## [Unsupervised patient representations from clinical notes with  interpretable classification decisions](https://arxiv.org/abs/1711.05198)\n[(PDF)](https://arxiv.org/pdf/1711.05198)\n\n`Authors:Madhumita Sushil, Simon Šuster, Kim Luyckx, Walter Daelemans`\n\n\nComments:\n\nAccepted poster at NIPS 2017 Workshop on Machine Learning for Health\n\nSubjects:\n\nComputation and Language (cs.CL)\n\n\nCite as:\n\narXiv:1711.05198 [cs.CL]\n\n \n(or arXiv:1711.05198v1 [cs.CL] for this version)\n\n\n> Abstract: We have two main contributions in this work: 1. 
We explore the usage of a\nstacked denoising autoencoder, and a paragraph vector model to learn\ntask-independent dense patient representations directly from clinical notes. We\nevaluate these representations by using them as features in multiple supervised\nsetups, and compare their performance with those of sparse representations. 2.\nTo understand and interpret the representations, we explore the best encoded\nfeatures within the patient representations obtained from the autoencoder\nmodel. Further, we calculate the significance of the input features of the\ntrained classifiers when we use these pretrained representations as input.\n\n\n## [Interpreting Convolutional Neural Networks Through Compression](https://arxiv.org/abs/1711.02329)\n[(PDF)](https://arxiv.org/pdf/1711.02329)\n\n`Authors:Reza Abbasi-Asl, Bin Yu`\n\n\nComments:\n\nPresented at NIPS 2017 Symposium on Interpretable Machine Learning\n\nSubjects:\n\nMachine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1711.02329 [stat.ML]\n\n \n(or arXiv:1711.02329v1 [stat.ML] for this version)\n\n\n> Abstract: Convolutional neural networks (CNNs) achieve state-of-the-art performance in\na wide variety of tasks in computer vision. However, interpreting CNNs still\nremains a challenge. This is mainly due to the large number of parameters in\nthese networks. Here, we investigate the role of compression and particularly\npruning filters in the interpretation of CNNs. We exploit our recently-proposed\ngreedy structural compression scheme that prunes filters in a trained CNN. In\nour compression, the filter importance index is defined as the classification\naccuracy reduction (CAR) of the network after pruning that filter. The filters\nare then iteratively pruned based on the CAR index. We demonstrate the\ninterpretability of CAR-compressed CNNs by showing that our algorithm prunes\nfilters with visually redundant pattern selectivity. 
Specifically, we show the\nimportance of shape-selective filters for object recognition, as opposed to\ncolor-selective filters. Out of top 20 CAR-pruned filters in AlexNet, 17 of\nthem in the first layer and 14 of them in the second layer are color-selective\nfilters. Finally, we introduce a variant of our CAR importance index that\nquantifies the importance of each image class to each CNN filter. We show that\nthe most and the least important class labels present a meaningful\ninterpretation of each filter that is consistent with the visualized pattern\nselectivity of that filter.\n\n\n## [Interpretable probabilistic embeddings: bridging the gap between topic  models and neural networks](https://arxiv.org/abs/1711.04154)\n[(PDF)](https://arxiv.org/pdf/1711.04154)\n\n`Authors:Anna Potapenko, Artem Popov, Konstantin Vorontsov`\n\n\nComments:\n\nAppeared in AINL-2017\n\nSubjects:\n\nComputation and Language (cs.CL)\n\n\nCite as:\n\narXiv:1711.04154 [cs.CL]\n\n \n(or arXiv:1711.04154v1 [cs.CL] for this version)\n\n\n> Abstract: We consider probabilistic topic models and more recent word embedding\ntechniques from a perspective of learning hidden semantic representations.\nInspired by a striking similarity of the two approaches, we merge them and\nlearn probabilistic embeddings with online EM-algorithm on word co-occurrence\ndata. The resulting embeddings perform on par with Skip-Gram Negative Sampling\n(SGNS) on word similarity tasks and benefit in the interpretability of the\ncomponents. Next, we learn probabilistic document embeddings that outperform\nparagraph2vec on a document similarity task and require less memory and time\nfor training. Finally, we employ multimodal Additive Regularization of Topic\nModels (ARTM) to obtain a high sparsity and learn embeddings for other\nmodalities, such as timestamps and categories. 
We observe further improvement\nof word similarity performance and meaningful inter-modality similarities.\n\n\n## [Arrhythmia Classification from the Abductive Interpretation of Short  Single-Lead ECG Records](https://arxiv.org/abs/1711.03892)\n[(PDF)](https://arxiv.org/pdf/1711.03892)\n\n`Authors:Tomás Teijeiro, Constantino A. García, Daniel Castro, Paulo Félix`\n\n\nComments:\n\n4 pages, 3 figures. Presented in the Computing in Cardiology 2017 conference\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)\n\n\nMSC classes:\n\n68T10\n\n\nCite as:\n\narXiv:1711.03892 [cs.AI]\n\n \n(or arXiv:1711.03892v1 [cs.AI] for this version)\n\n\n> Abstract: In this work we propose a new method for the rhythm classification of short\nsingle-lead ECG records, using a set of high-level and clinically meaningful\nfeatures provided by the abductive interpretation of the records. These\nfeatures include morphological and rhythm-related features that are used to\nbuild two classifiers: one that evaluates the record globally, using aggregated\nvalues for each feature; and another one that evaluates the record as a\nsequence, using a Recurrent Neural Network fed with the individual features for\neach detected heartbeat. The two classifiers are finally combined using the\nstacking technique, providing an answer by means of four target classes: Normal\nsinus rhythm, Atrial fibrillation, Other anomaly, and Noisy. 
The approach has\nbeen validated against the 2017 PhysioNet/CinC Challenge dataset, obtaining a\nfinal score of 0.83 and ranking first in the competition.\n\n\n## [Interpretable R-CNN](https://arxiv.org/abs/1711.05226)\n[(PDF)](https://arxiv.org/pdf/1711.05226)\n\n`Authors:Tianfu Wu, Xilai Li, Xi Song, Wei Sun, Liang Dong, Bo Li`\n\n\nComments:\n\n13 pages\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1711.05226 [cs.CV]\n\n \n(or arXiv:1711.05226v1 [cs.CV] for this version)\n\n\n> Abstract: This paper presents a method of learning qualitatively interpretable models\nin object detection using popular two-stage region-based ConvNet detection\nsystems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI\n(Region-of-Interest) prediction network. By interpretable models, we focus on\nweakly-supervised extractive rationale generation, that is, learning to unfold\nlatent discriminative part configurations of object instances automatically and\nsimultaneously in detection without using any supervision for part\nconfigurations. We utilize a top-down hierarchical and compositional grammar\nmodel embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold\nthe space of latent part configurations of RoIs. We propose an AOGParsing\noperator to replace the RoIPooling operator widely used in R-CNN, so the\nproposed method is applicable to many state-of-the-art ConvNet-based detection\nsystems. The AOGParsing operator aims to harness both the explainable rigor of\ntop-down hierarchical and compositional grammar models and the discriminative\npower of bottom-up deep neural networks through end-to-end training. In\ndetection, a bounding box is interpreted by the best parse tree derived from\nthe AOG on the fly, which is treated as the extractive rationale generated for\ninterpreting detection. In learning, we propose a folding-unfolding method to\ntrain the AOG and ConvNet end-to-end.
In experiments, we build on top of the\nR-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets\nwith performance comparable to state-of-the-art methods.\n\n\n## [Interpreting Deep Visual Representations via Network Dissection](https://arxiv.org/abs/1711.05611)\n[(PDF)](https://arxiv.org/pdf/1711.05611)\n\n`Authors:Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba`\n\n\nComments:\n\n*B. Zhou and D. Bau contributed equally to this work. 15 pages, 27 figures\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nACM classes:\n\nI.2.10\n\n\nCite as:\n\narXiv:1711.05611 [cs.CV]\n\n \n(or arXiv:1711.05611v1 [cs.CV] for this version)\n\n\n> Abstract: The success of recent deep convolutional neural networks (CNNs) depends on\nlearning hidden representations that can summarize the important factors of\nvariation behind the data. However, CNNs are often criticized as being black boxes\nthat lack interpretability, since they have millions of unexplained model\nparameters. In this work, we describe Network Dissection, a method that\ninterprets networks by providing labels for the units of their deep visual\nrepresentations. The proposed method quantifies the interpretability of CNN\nrepresentations by evaluating the alignment between individual hidden units and\na set of visual semantic concepts. By identifying the best alignments, units\nare given human-interpretable labels across a range of objects, parts, scenes,\ntextures, materials, and colors. The method reveals that deep representations\nare more transparent and interpretable than expected: we find that\nrepresentations are significantly more interpretable than they would be under a\nrandom, equivalently powerful basis. We apply the method to interpret and\ncompare the latent representations of various network architectures trained to\nsolve different supervised and self-supervised training tasks.
We then examine\nfactors affecting the network interpretability such as the number of\ntraining iterations, regularizations, different initializations, and the\nnetwork depth and width. Finally, we show that the interpreted units can be used\nto provide explicit explanations of a prediction given by a CNN for an image.\nOur results highlight that interpretability is an important property of deep\nneural networks that provides new insights into their hierarchical structure.\n\n\n## [Interpretable classifiers using rules and Bayesian analysis: Building a  better stroke prediction model](https://arxiv.org/abs/1511.01644)\n[(PDF)](https://arxiv.org/pdf/1511.01644)\n\n`Authors:Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, David Madigan`\n\n\nComments:\n\nPublished in the Annals of Applied Statistics by the Institute of Mathematical Statistics\n\nSubjects:\n\nApplications (stat.AP); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nJournal reference:\n\nAnnals of Applied Statistics 2015, Vol. 9, No. 3, 1350-1371\n\n\nDOI:\n\n10.1214/15-AOAS848\n\n\nReport number:\n\nIMS-AOAS-AOAS848\n\n\nCite as:\n\narXiv:1511.01644 [stat.AP]\n\n \n(or arXiv:1511.01644v1 [stat.AP] for this version)\n\n\n> Abstract: We aim to produce predictive models that are not only accurate, but are also\ninterpretable to human experts. Our models are decision lists, which consist of\na series of if...then... statements (e.g., if high blood pressure, then stroke)\nthat discretize a high-dimensional, multivariate feature space into a series of\nsimple, readily interpretable decision statements. We introduce a generative\nmodel called Bayesian Rule Lists that yields a posterior distribution over\npossible decision lists. It employs a novel prior structure to encourage\nsparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy\non par with the current top algorithms for prediction in machine learning.
Our\nmethod is motivated by recent developments in personalized medicine, and can be\nused to produce highly accurate and interpretable medical scoring systems. We\ndemonstrate this by producing an alternative to the CHADS$_2$ score, actively\nused in clinical practice for estimating the risk of stroke in patients who\nhave atrial fibrillation. Our model is as interpretable as CHADS$_2$, but more\naccurate.\n\n\n## [Interpretable Deep Neural Networks for Single-Trial EEG Classification](https://arxiv.org/abs/1604.08201)\n[(PDF)](https://arxiv.org/pdf/1604.08201)\n\n`Authors:Irene Sturm, Sebastian Bach, Wojciech Samek, Klaus-Robert Müller`\n\n\nComments:\n\n5 pages, 1 figure\n\nSubjects:\n\nNeural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1604.08201 [cs.NE]\n\n \n(or arXiv:1604.08201v1 [cs.NE] for this version)\n\n\n> Abstract: Background: In cognitive neuroscience, the potential of Deep Neural Networks\n(DNNs) for solving complex classification tasks is yet to be fully exploited.\nThe most limiting factor is that DNNs, as notorious 'black boxes', do not provide\ninsight into neurophysiological phenomena underlying a decision. Layer-wise\nRelevance Propagation (LRP) has been introduced as a novel method to explain\nindividual network decisions. New Method: We propose the application of DNNs\nwith LRP for the first time for EEG data analysis. Through LRP, the single-trial\nDNN decisions are transformed into heatmaps indicating each data point's\nrelevance for the outcome of the decision. Results: DNN achieves classification\naccuracies comparable to those of CSP-LDA. In subjects with low performance,\nsubject-to-subject transfer of trained DNNs can improve the results. The\nsingle-trial LRP heatmaps reveal neurophysiologically plausible patterns,\nresembling CSP-derived scalp maps.
Critically, while CSP patterns represent\nclass-wise aggregated information, LRP heatmaps pinpoint neural patterns to\nsingle time points in single trials. Comparison with Existing Method(s): We\ncompare the classification performance of DNNs to that of linear CSP-LDA on two\ndata sets related to motor-imagery BCI. Conclusion: We have demonstrated that\nDNN is a powerful non-linear tool for EEG analysis. With LRP, a new quality of\nhigh-resolution assessment of neural activity can be reached. LRP is a\npotential remedy for the lack of interpretability of DNNs that has limited\ntheir utility in neuroscientific applications. The extreme specificity of the\nLRP-derived heatmaps opens up new avenues for investigating neural activity\nunderlying complex perception or decision-related processes.\n\n\n## [InfoGAN: Interpretable Representation Learning by Information Maximizing  Generative Adversarial Nets](https://arxiv.org/abs/1606.03657)\n[(PDF)](https://arxiv.org/pdf/1606.03657)\n\n`Authors:Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel`\n\n\nSubjects:\n\nLearning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1606.03657 [cs.LG]\n\n \n(or arXiv:1606.03657v1 [cs.LG] for this version)\n\n\n> Abstract: This paper describes InfoGAN, an information-theoretic extension to the\nGenerative Adversarial Network that is able to learn disentangled\nrepresentations in a completely unsupervised manner. InfoGAN is a generative\nadversarial network that also maximizes the mutual information between a small\nsubset of the latent variables and the observation. We derive a lower bound to\nthe mutual information objective that can be optimized efficiently, and show\nthat our training procedure can be interpreted as a variation of the Wake-Sleep\nalgorithm.
Specifically, InfoGAN successfully disentangles writing styles from\ndigit shapes on the MNIST dataset, pose from lighting of 3D rendered images,\nand background digits from the central digit on the SVHN dataset. It also\ndiscovers visual concepts that include hair styles, presence/absence of\neyeglasses, and emotions on the CelebA face dataset. Experiments show that\nInfoGAN learns interpretable representations that are competitive with\nrepresentations learned by existing fully supervised methods.\n\n\n## [The Mythos of Model Interpretability](https://arxiv.org/abs/1606.03490)\n[(PDF)](https://arxiv.org/pdf/1606.03490)\n\n`Authors:Zachary C. Lipton`\n\n\nComments:\n\npresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nLearning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1606.03490 [cs.LG]\n\n \n(or arXiv:1606.03490v3 [cs.LG] for this version)\n\n\n> Abstract: Supervised machine learning models boast remarkable predictive capabilities.\nBut can you trust your model? Will it work in deployment? What else can it tell\nyou about the world? We want models to be not only good, but interpretable. And\nyet the task of interpretation appears underspecified. Papers provide diverse\nand sometimes non-overlapping motivations for interpretability, and offer\nmyriad notions of what attributes render models interpretable. Despite this\nambiguity, many papers proclaim interpretability axiomatically, absent further\nexplanation. In this paper, we seek to refine the discourse on\ninterpretability. First, we examine the motivations underlying interest in\ninterpretability, finding them to be diverse and occasionally discordant. 
Then,\nwe address model properties and techniques thought to confer interpretability,\nidentifying transparency to humans and post-hoc explanations as competing\nnotions. Throughout, we discuss the feasibility and desirability of different\nnotions, and question the oft-made assertions that linear models are\ninterpretable and that deep neural networks are not.\n\n\n## [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](https://arxiv.org/abs/1606.05320)\n[(PDF)](https://arxiv.org/pdf/1606.05320)\n\n`Authors:Viktoriya Krakovna, Finale Doshi-Velez`\n\n\nComments:\n\npresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.05320 [stat.ML]\n\n \n(or arXiv:1606.05320v2 [stat.ML] for this version)\n\n\n> Abstract: As deep neural networks continue to revolutionize various application\ndomains, there is increasing interest in making these powerful models more\nunderstandable and interpretable, and narrowing down the causes of good and bad\npredictions. We focus on recurrent neural networks (RNNs), state-of-the-art\nmodels in speech recognition and translation. Our approach to increasing\ninterpretability is to combine an RNN with a hidden Markov model (HMM), a\nsimpler and more transparent model. We explore various combinations of RNNs and\nHMMs: an HMM trained on LSTM states; a hybrid model where an HMM is trained\nfirst, then a small LSTM is given HMM state distributions and trained to fill\nin gaps in the HMM's performance; and a jointly trained hybrid model.
We find\nthat the LSTM and HMM learn complementary information about the features in the\ntext.\n\n\n## [Model-Agnostic Interpretability of Machine Learning](https://arxiv.org/abs/1606.05386)\n[(PDF)](https://arxiv.org/pdf/1606.05386)\n\n`Authors:Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin`\n\n\nComments:\n\npresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.05386 [stat.ML]\n\n \n(or arXiv:1606.05386v1 [stat.ML] for this version)\n\n\n> Abstract: Understanding why machine learning models behave the way they do empowers\nboth system designers and end-users in many ways: in model selection, feature\nengineering, in order to trust and act upon the predictions, and in more\nintuitive user interfaces. Thus, interpretability has become a vital concern in\nmachine learning, and work in the area of interpretable models has found\nrenewed interest. In some applications, such models are as accurate as\nnon-interpretable ones, and thus are preferred for their transparency. Even\nwhen they are not accurate, they may still be preferred when interpretability\nis of paramount importance. However, restricting machine learning to\ninterpretable models is often a severe limitation. In this paper we argue for\nexplaining machine learning predictions using model-agnostic approaches. By\ntreating the machine learning models as black-box functions, these approaches\nprovide crucial flexibility in the choice of models, explanations, and\nrepresentations, improving debugging, comparison, and interfaces for a variety\nof users and models. 
We also outline the main challenges for such methods, and\nreview a recently-introduced model-agnostic explanation approach (LIME) that\naddresses these challenges.\n\n\n## [Learning Interpretable Musical Compositional Rules and Traces](https://arxiv.org/abs/1606.05572)\n[(PDF)](https://arxiv.org/pdf/1606.05572)\n\n`Authors:Haizi Yu, Lav R. Varshney, Guy E. Garnett, Ranjitha Kumar`\n\n\nComments:\n\npresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.05572 [stat.ML]\n\n \n(or arXiv:1606.05572v1 [stat.ML] for this version)\n\n\n> Abstract: Throughout music history, theorists have identified and documented\ninterpretable rules that capture the decisions of composers. This paper asks,\n\"Can a machine behave like a music theorist?\" It presents MUS-ROVER, a\nself-learning system for automatically discovering rules from symbolic music.\nMUS-ROVER performs feature learning via $n$-gram models to extract\ncompositional rules --- statistical patterns over the resulting features. We\nevaluate MUS-ROVER on Bach's (SATB) chorales, demonstrating that it can recover\nknown rules, as well as identify new, characteristic patterns for further\nstudy. 
We discuss how the extracted rules can be used in both machine and human\ncomposition.\n\n\n## [Building an Interpretable Recommender via Loss-Preserving Transformation](https://arxiv.org/abs/1606.05819)\n[(PDF)](https://arxiv.org/pdf/1606.05819)\n\n`Authors:Amit Dhurandhar, Sechan Oh, Marek Petrik`\n\n\nComments:\n\nPresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.05819 [stat.ML]\n\n \n(or arXiv:1606.05819v1 [stat.ML] for this version)\n\n\n> Abstract: We propose a method for building an interpretable recommender system for\npersonalizing online content and promotions. Historical data available for the\nsystem consists of customer features, provided content (promotions), and user\nresponses. Unlike in a standard multi-class classification setting,\nmisclassification costs depend on both recommended actions and customers. Our\nmethod transforms such a data set to a new set which can be used with standard\ninterpretable multi-class classification algorithms. The transformation has the\ndesirable property that minimizing the standard misclassification penalty in\nthis new space is equivalent to minimizing the custom cost function.\n\n\n## [Using Visual Analytics to Interpret Predictive Machine Learning Models](https://arxiv.org/abs/1606.05685)\n[(PDF)](https://arxiv.org/pdf/1606.05685)\n\n`Authors:Josua Krause, Adam Perer, Enrico Bertini`\n\n\nComments:\n\npresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.05685 [stat.ML]\n\n \n(or arXiv:1606.05685v2 [stat.ML] for this version)\n\n\n> Abstract: It is commonly believed that increasing the interpretability of a machine\nlearning model may decrease its predictive power. 
However, inspecting\ninput-output relationships of those models using visual analytics, while\ntreating them as black boxes, can help to understand the reasoning behind\noutcomes without sacrificing predictive quality. We identify a space of\npossible solutions and provide two examples of where such techniques have been\nsuccessfully used in practice.\n\n\n## [Interpretable Machine Learning Models for the Digital Clock Drawing Test](https://arxiv.org/abs/1606.07163)\n[(PDF)](https://arxiv.org/pdf/1606.07163)\n\n`Authors:William Souillard-Mandar, Randall Davis, Cynthia Rudin, Rhoda Au, Dana Penney`\n\n\nComments:\n\nPresented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1606.07163 [stat.ML]\n\n \n(or arXiv:1606.07163v1 [stat.ML] for this version)\n\n\n> Abstract: The Clock Drawing Test (CDT) is a rapid, inexpensive, and popular\nneuropsychological screening tool for cognitive conditions. The Digital Clock\nDrawing Test (dCDT) uses novel software to analyze data from a digitizing\nballpoint pen that reports its position with considerable spatial and temporal\nprecision, making possible the analysis of both the drawing process and final\nproduct. We developed a methodology to analyze pen stroke data from these\ndrawings, and computed a large collection of features, which were then analyzed\nwith a variety of machine learning techniques. The resulting scoring systems\nwere designed to be more accurate than the systems currently used by\nclinicians, but just as interpretable and easy to use. The systems also allow\nus to quantify the tradeoff between accuracy and interpretability.
We created\nautomated versions of the CDT scoring systems currently used by clinicians,\nallowing us to benchmark our models, which indicated that our machine learning\nmodels substantially outperformed the existing scoring systems.\n\n\n## [SnapToGrid: From Statistical to Interpretable Models for Biomedical  Information Extraction](https://arxiv.org/abs/1606.09604)\n[(PDF)](https://arxiv.org/pdf/1606.09604)\n\n`Authors:Marco A. Valenzuela-Escarcega, Gus Hahn-Powell, Dane Bell, Mihai Surdeanu`\n\n\nSubjects:\n\nComputation and Language (cs.CL)\n\n\nCite as:\n\narXiv:1606.09604 [cs.CL]\n\n \n(or arXiv:1606.09604v1 [cs.CL] for this version)\n\n\n> Abstract: We propose an approach for biomedical information extraction that marries the\nadvantages of machine learning models, e.g., learning directly from data, with\nthe benefits of rule-based approaches, e.g., interpretability. Our approach\nstarts by training a feature-based statistical model, then converts this model\nto a rule-based variant by converting its features to rules, and \"snapping to\ngrid\" the feature weights to discrete votes. In doing so, our proposal takes\nadvantage of the large body of work in machine learning, but it produces an\ninterpretable model, which can be directly edited by experts. We evaluate our\napproach on the BioNLP 2009 event extraction task. 
Our results show that there\nis a small performance penalty when converting the statistical model to rules,\nbut the gain in interpretability compensates for that: with minimal effort,\nhuman experts improve this model to have similar performance to the statistical\nmodel that served as starting point.\n\n\n## [Meaningful Models: Utilizing Conceptual Structure to Improve Machine  Learning Interpretability](https://arxiv.org/abs/1607.00279)\n[(PDF)](https://arxiv.org/pdf/1607.00279)\n\n`Authors:Nick Condry`\n\n\nComments:\n\n5 pages, 3 figures, presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY\n\nSubjects:\n\nMachine Learning (stat.ML); Artificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1607.00279 [stat.ML]\n\n \n(or arXiv:1607.00279v1 [stat.ML] for this version)\n\n\n> Abstract: The last decade has seen huge progress in the development of advanced machine\nlearning models; however, those models are powerless unless human users can\ninterpret them. Here we show how the mind's construction of concepts and\nmeaning can be used to create more interpretable machine learning models. By\nproposing a novel method of classifying concepts, in terms of 'form' and\n'function', we elucidate the nature of meaning and offer proposals to improve\nmodel understandability. As machine learning begins to permeate daily life,\ninterpretable models may serve as a bridge between domain-expert authors and\nnon-expert users.\n\n\n## [RETAIN: An Interpretable Predictive Model for Healthcare using Reverse  Time Attention Mechanism](https://arxiv.org/abs/1608.05745)\n[(PDF)](https://arxiv.org/pdf/1608.05745)\n\n`Authors:Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. 
Stewart, Jimeng Sun`\n\n\nComments:\n\nAccepted at Neural Information Processing Systems (NIPS) 2016\n\nSubjects:\n\nLearning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)\n\n\nCite as:\n\narXiv:1608.05745 [cs.LG]\n\n \n(or arXiv:1608.05745v4 [cs.LG] for this version)\n\n\n> Abstract: Accuracy and interpretability are two dominant features of successful\npredictive models. Typically, a choice must be made in favor of complex black\nbox models such as recurrent neural networks (RNN) for accuracy versus less\naccurate but more interpretable traditional models such as logistic regression.\nThis tradeoff poses challenges in medicine where both accuracy and\ninterpretability are important. We addressed this challenge by developing the\nREverse Time AttentIoN model (RETAIN) for application to Electronic Health\nRecords (EHR) data. RETAIN achieves high accuracy while remaining clinically\ninterpretable and is based on a two-level neural attention model that detects\ninfluential past visits and significant clinical variables within those visits\n(e.g., key diagnoses). RETAIN mimics physician practice by attending to the EHR\ndata in reverse time order so that recent clinical visits are likely to\nreceive higher attention.
RETAIN was tested on a large health system EHR\ndataset with 14 million visits completed by 263K patients over an 8-year period\nand demonstrated predictive accuracy and computational scalability comparable\nto state-of-the-art methods such as RNN, and ease of interpretability\ncomparable to traditional models.\n\n\n## [Towards Transparent AI Systems: Interpreting Visual Question Answering  Models](https://arxiv.org/abs/1608.08974)\n[(PDF)](https://arxiv.org/pdf/1608.08974)\n\n`Authors:Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1608.08974 [cs.CV]\n\n \n(or arXiv:1608.08974v2 [cs.CV] for this version)\n\n\n> Abstract: Deep neural networks have shown striking progress and obtained\nstate-of-the-art results in many AI research fields in recent years.\nHowever, it is often unsatisfying to not know why they predict what they do. In\nthis paper, we address the problem of interpreting Visual Question Answering\n(VQA) models. Specifically, we are interested in finding what part of the input\n(pixels in images or words in questions) the VQA model focuses on while\nanswering the question. To tackle this problem, we use two visualization\ntechniques -- guided backpropagation and occlusion -- to find important words\nin the question and important regions in the image. We then present qualitative\nand quantitative analyses of these importance maps.
We found that even without\nexplicit attention mechanisms, VQA models may sometimes be implicitly attending\nto relevant regions in the image, and often to appropriate words in the\nquestion.\n\n\n## [Real Time Fine-Grained Categorization with Accuracy and Interpretability](https://arxiv.org/abs/1610.00824)\n[(PDF)](https://arxiv.org/pdf/1610.00824)\n\n`Authors:Shaoli Huang, Dacheng Tao`\n\n\nComments:\n\narXiv admin note: text overlap with arXiv:1512.08086\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1610.00824 [cs.CV]\n\n \n(or arXiv:1610.00824v1 [cs.CV] for this version)\n\n\n> Abstract: A well-designed fine-grained categorization system usually has three\ncontradictory requirements: accuracy (the ability to identify objects among\nsubordinate categories); interpretability (the ability to provide\nhuman-understandable explanation of recognition system behavior); and\nefficiency (the speed of the system). To handle the trade-off between accuracy\nand interpretability, we propose a novel \"Deeper Part-Stacked CNN\" architecture\narmed with interpretability by modeling subtle differences between object\nparts. The proposed architecture consists of a part localization network, a\ntwo-stream classification network that simultaneously encodes object-level and\npart-level cues, and a feature vector fusion component. Specifically, the part\nlocalization network is implemented by exploring a new paradigm for key point\nlocalization that first samples a small number of representable pixels and then\ndetermines their labels via a convolutional layer followed by a softmax layer.\nWe also use a cropping layer to extract part features and propose a scale\nmean-max layer for feature fusion learning. Experimentally, our proposed method\noutperforms state-of-the-art approaches in both the part localization and\nclassification tasks on Caltech-UCSD Birds-200-2011.
Moreover, by adopting a set\nof sharing strategies between the computation of multiple object parts, our\nsingle model is fairly efficient, running at 32 frames/sec.\n\n\n## [Interpreting Neural Networks to Improve Politeness Comprehension](https://arxiv.org/abs/1610.02683)\n[(PDF)](https://arxiv.org/pdf/1610.02683)\n\n`Authors:Malika Aubakirova, Mohit Bansal`\n\n\nComments:\n\nTo appear at EMNLP 2016\n\nSubjects:\n\nComputation and Language (cs.CL); Artificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1610.02683 [cs.CL]\n\n \n(or arXiv:1610.02683v1 [cs.CL] for this version)\n\n\n> Abstract: We present an interpretable neural network approach to predicting and\nunderstanding politeness in natural language requests. Our models are based on\nsimple convolutional neural networks applied directly to raw text, avoiding any manual\nidentification of complex sentiment or syntactic features, while performing\nbetter than such feature-based models from previous work. More importantly, we\nuse the challenging task of politeness prediction as a testbed to next present\na much-needed understanding of what these successful networks are actually\nlearning. For this, we present several network visualizations based on\nactivation clusters, first-derivative saliency, and embedding space\ntransformations, helping us automatically identify several subtle linguistic\nmarkers of politeness theories.
Further, this analysis reveals multiple novel,\nhigh-scoring politeness strategies which, when added back as new features,\nreduce the accuracy gap between the original featurized system and the neural\nmodel, thus providing a clear quantitative interpretation of the success of\nthese neural networks.\n\n\n## [Particle Swarm Optimization for Generating Interpretable Fuzzy  Reinforcement Learning Policies](https://arxiv.org/abs/1610.05984)\n[(PDF)](https://arxiv.org/pdf/1610.05984)\n\n`Authors:Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft`\n\n\nSubjects:\n\nNeural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)\n\n\nJournal reference:\n\nEngineering Applications of Artificial Intelligence, Volume 65C,\n  October 2017, Pages 87-98\n\n\nDOI:\n\n10.1016/j.engappai.2017.07.005\n\n\nCite as:\n\narXiv:1610.05984 [cs.NE]\n\n \n(or arXiv:1610.05984v5 [cs.NE] for this version)\n\n\n> Abstract: Fuzzy controllers are efficient and interpretable system controllers for\ncontinuous state and action spaces. To date, such controllers have been\nconstructed manually or trained automatically either using expert-generated\nproblem-specific cost functions or by incorporating detailed knowledge about the\noptimal control strategy. Neither of these requirements for automatic training\nis met in most real-world reinforcement learning (RL) problems. In such\napplications, online learning is often prohibited for safety reasons because\nonline learning requires exploration of the problem's dynamics during policy\ntraining. We introduce a fuzzy particle swarm reinforcement learning (FPSRL)\napproach that can construct fuzzy RL policies solely by training parameters on\nworld models that simulate real system dynamics. These world models are created\nby employing an autonomous machine learning technique that uses previously\ngenerated transition samples of a real system.
To the best of our knowledge,\nthis approach is the first to relate self-organizing fuzzy controllers to\nmodel-based batch RL. Therefore, FPSRL is intended to solve problems in domains\nwhere online learning is prohibited, system dynamics are relatively easy to\nmodel from previously generated default policy transition samples, and it is\nexpected that a relatively easily interpretable control policy exists. The\nefficiency of the proposed approach with problems from such domains is\ndemonstrated using three standard RL benchmarks, i.e., mountain car, cart-pole\nbalancing, and cart-pole swing-up. Our experimental results demonstrate\nhigh-performing, interpretable fuzzy policies.\n\n\n## [Embedding Projector: Interactive Visualization and Interpretation of  Embeddings](https://arxiv.org/abs/1611.05469)\n[(PDF)](https://arxiv.org/pdf/1611.05469)\n\n`Authors:Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda B. Viégas, Martin Wattenberg`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Human-Computer Interaction (cs.HC)\n\n\nCite as:\n\narXiv:1611.05469 [stat.ML]\n\n \n(or arXiv:1611.05469v1 [stat.ML] for this version)\n\n\n> Abstract: Embeddings are ubiquitous in machine learning, appearing in recommender\nsystems, NLP, and many other applications. Researchers and developers often\nneed to explore the properties of a specific embedding, and one way to analyze\nembeddings is to visualize them. 
We present the Embedding Projector, a tool for\ninteractive visualization and interpretation of embeddings.\n\n\n## [Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning](https://arxiv.org/abs/1611.04246)\n[(PDF)](https://arxiv.org/pdf/1611.04246)\n\n`Authors:Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu`\n\n\nComments:\n\nin the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1611.04246 [cs.CV]\n\n \n(or arXiv:1611.04246v2 [cs.CV] for this version)\n\n\n> Abstract: This paper proposes a learning strategy that extracts object-part concepts\nfrom a pre-trained convolutional neural network (CNN), in an attempt to 1)\nexplore explicit semantics hidden in CNN units and 2) gradually grow a\nsemantically interpretable graphical model on the pre-trained CNN for\nhierarchical object understanding. Given part annotations on very few (e.g.,\n3-12) objects, our method mines certain latent patterns from the pre-trained\nCNN and associates them with different semantic parts. We use a four-layer\nAnd-Or graph to organize the mined latent patterns, so as to clarify their\ninternal semantic hierarchy. Our method is guided by a small number of part\nannotations, and it achieves superior performance (about 13%-107% improvement)\nin part center prediction on the PASCAL VOC and ImageNet datasets.\n\n\n## [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](https://arxiv.org/abs/1611.05934)\n[(PDF)](https://arxiv.org/pdf/1611.05934)\n\n`Authors:Viktoriya Krakovna, Finale Doshi-Velez`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems. 
arXiv admin note: substantial text overlap with arXiv:1606.05320\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.05934 [stat.ML]\n\n \n(or arXiv:1611.05934v1 [stat.ML] for this version)\n\n\n> Abstract: As deep neural networks continue to revolutionize various application\ndomains, there is increasing interest in making these powerful models more\nunderstandable and interpretable, and narrowing down the causes of good and bad\npredictions. We focus on recurrent neural networks, state-of-the-art models in\nspeech recognition and translation. Our approach to increasing interpretability\nis to combine a long short-term memory (LSTM) model with a hidden Markov\nmodel (HMM), a simpler and more transparent model. We add the HMM state\nprobabilities to the output layer of the LSTM, and then train the HMM and LSTM\neither sequentially or jointly. The LSTM can make use of the information from\nthe HMM, and fill in the gaps when the HMM is not performing well. A small\nhybrid model usually performs better than a standalone LSTM of the same size,\nespecially on smaller data sets. We test the algorithms on text data and\nmedical time series data, and find that the LSTM and HMM learn complementary\ninformation about the features in the text.\n\n\n## [GENESIM: genetic extraction of a single, interpretable model](https://arxiv.org/abs/1611.05722)\n[(PDF)](https://arxiv.org/pdf/1611.05722)\n\n`Authors:Gilles Vandewiele, Olivier Janssens, Femke Ongenae, Filip De Turck, Sofie Van Hoecke`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.05722 [stat.ML]\n\n \n(or arXiv:1611.05722v1 [stat.ML] for this version)\n\n\n> Abstract: Models obtained by decision tree induction techniques excel in being\ninterpretable. However, they can be prone to overfitting, which results in low\npredictive performance.
Ensemble techniques are able to achieve a higher\naccuracy. However, this comes at a cost of losing interpretability of the\nresulting model. This makes ensemble techniques impractical in applications\nwhere decision support, instead of decision making, is crucial.\nTo bridge this gap, we present the GENESIM algorithm that transforms an\nensemble of decision trees to a single decision tree with an enhanced\npredictive performance by using a genetic algorithm. We compared GENESIM to\nprevalent decision tree induction and ensemble techniques using twelve publicly\navailable data sets. The results show that GENESIM achieves a better predictive\nperformance on most of these data sets than decision tree induction techniques\nand a predictive performance in the same order of magnitude as the ensemble\ntechniques. Moreover, the resulting model of GENESIM has a very low complexity,\nmaking it very interpretable, in contrast to ensemble techniques.\n\n\n## [Stratified Knowledge Bases as Interpretable Probabilistic Models  (Extended Abstract)](https://arxiv.org/abs/1611.06174)\n[(PDF)](https://arxiv.org/pdf/1611.06174)\n\n`Authors:Ondrej Kuzelka, Jesse Davis, Steven Schockaert`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1611.06174 [cs.AI]\n\n \n(or arXiv:1611.06174v1 [cs.AI] for this version)\n\n\n> Abstract: In this paper, we advocate the use of stratified logical theories for\nrepresenting probabilistic models. We argue that such encodings can be more\ninterpretable than those obtained in existing frameworks such as Markov logic\nnetworks. 
Among other things, this allows domain experts to improve\nlearned models by directly removing, adding, or modifying logical formulas.\n\n\n## [Learning Interpretability for Visualizations using Adapted Cox Models  through a User Experiment](https://arxiv.org/abs/1611.06175)\n[(PDF)](https://arxiv.org/pdf/1611.06175)\n\n`Authors:Adrien Bibal, Benoit Frénay`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.06175 [stat.ML]\n\n \n(or arXiv:1611.06175v1 [stat.ML] for this version)\n\n\n> Abstract: In order to be useful, visualizations need to be interpretable. This paper\nuses a user-based approach to combine and assess quality measures in order to\nbetter model user preferences. Results show that cluster separability measures\nare outperformed by a neighborhood conservation measure, even though the former\nare usually considered intuitively representative of user motives. Moreover,\ncombining measures, as opposed to using a single measure, further improves\nprediction performance.\n\n\n## [Tree Space Prototypes: Another Look at Making Tree Ensembles  Interpretable](https://arxiv.org/abs/1611.07115)\n[(PDF)](https://arxiv.org/pdf/1611.07115)\n\n`Authors:Hui Fen Tan, Giles Hooker, Martin T. Wells`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.07115 [stat.ML]\n\n \n(or arXiv:1611.07115v1 [stat.ML] for this version)\n\n\n> Abstract: Ensembles of decision trees have good prediction accuracy but suffer from a\nlack of interpretability. We propose a new approach for interpreting tree\nensembles by finding prototypes in tree space, utilizing the naturally-learned\nsimilarity measure from the tree ensemble.
Demonstrating the method on random\nforests, we show that the method benefits from unique aspects of tree ensembles\nby leveraging tree structure to sequentially find prototypes. The method\nprovides good prediction accuracy when found prototypes are used in\nnearest-prototype classifiers, while using fewer prototypes than competitor\nmethods. We are investigating the sensitivity of the method to different\nprototype-finding procedures and demonstrating it on higher-dimensional data.\n\n\n## [Interpreting Finite Automata for Sequential Data](https://arxiv.org/abs/1611.07100)\n[(PDF)](https://arxiv.org/pdf/1611.07100)\n\n`Authors:Christian Albert Hammerschmidt, Sicco Verwer, Qin Lin, Radu State`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Artificial Intelligence (cs.AI)\n\n\nACM classes:\n\nI.2.6\n\n\nCite as:\n\narXiv:1611.07100 [stat.ML]\n\n \n(or arXiv:1611.07100v2 [stat.ML] for this version)\n\n\n> Abstract: Automaton models are often seen as interpretable models. Interpretability\nitself is not well defined: it remains unclear what interpretability means\nwithout first explicitly specifying objectives or desired attributes. In this\npaper, we identify the key properties used to interpret automata and propose a\nmodification of a state-merging approach to learn variants of finite state\nautomata. We apply the approach to problems beyond typical grammar inference\ntasks. Additionally, we cover several use-cases for prediction, classification,\nand clustering on sequential data in both supervised and unsupervised scenarios\nto show how the identified key properties are applicable in a wide range of\ncontexts.\n\n\n## [Inducing Interpretable Representations with Variational Autoencoders](https://arxiv.org/abs/1611.07492)\n[(PDF)](https://arxiv.org/pdf/1611.07492)\n\n`Authors:N. Siddharth, Brooks Paige, Alban Desmaison, Jan-Willem Van de Meent, Frank Wood, Noah D. 
Goodman, Pushmeet Kohli, Philip H.S. Torr`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.07492 [stat.ML]\n\n \n(or arXiv:1611.07492v1 [stat.ML] for this version)\n\n\n> Abstract: We develop a framework for incorporating structured graphical models in the\n*encoders* of variational autoencoders (VAEs) that allows us to induce\ninterpretable representations through approximate variational inference. This\nallows us to both perform reasoning (e.g. classification) under the structural\nconstraints of a given graphical model, and use deep generative models to deal\nwith messy, high-dimensional domains where it is often difficult to model all\nthe variation. Learning in this framework is carried out end-to-end with a\nvariational objective, applying to both unsupervised and semi-supervised\nschemes.\n\n\n## [Interpretation of Prediction Models Using the Input Gradient](https://arxiv.org/abs/1611.07634)\n[(PDF)](https://arxiv.org/pdf/1611.07634)\n\n`Authors:Yotam Hechtlinger`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.07634 [stat.ML]\n\n \n(or arXiv:1611.07634v1 [stat.ML] for this version)\n\n\n> Abstract: State-of-the-art machine learning algorithms are highly optimized to provide\nthe best possible predictions, naturally resulting in complex models. While\nthese models often outperform simpler, more interpretable models by orders of\nmagnitude, when it comes to understanding how the model works, we are often\nfacing a \"black box\".\nIn this paper we suggest a simple method to interpret the behavior of any\npredictive model, both for regression and classification.
Given a particular\nmodel, the information required to interpret it can be obtained by studying the\npartial derivatives of the model with respect to the input. We exemplify this\ninsight by interpreting convolutional and multi-layer neural networks in the\nfield of natural language processing.\n\n\n## [Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery](https://arxiv.org/abs/1611.07252)\n[(PDF)](https://arxiv.org/pdf/1611.07252)\n\n`Authors:Scott Wisdom, Thomas Powers, James Pitton, Les Atlas`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1611.07252 [stat.ML]\n\n \n(or arXiv:1611.07252v1 [stat.ML] for this version)\n\n\n> Abstract: Recurrent neural networks (RNNs) are powerful and effective for processing\nsequential data. However, RNNs are usually considered \"black box\" models whose\ninternal structure and learned parameters are not interpretable. In this paper,\nwe propose an interpretable RNN based on the sequential iterative\nsoft-thresholding algorithm (SISTA) for solving the sequential sparse recovery\nproblem, which models a sequence of correlated observations with a sequence of\nsparse latent vectors. The architecture of the resulting SISTA-RNN is\nimplicitly defined by the computational structure of SISTA, which results in a\nnovel stacked RNN architecture. Furthermore, the weights of the SISTA-RNN are\nperfectly interpretable as the parameters of a principled statistical model,\nwhich in this case include a sparsifying dictionary, iterative step size, and\nregularization parameters. 
In addition, on a particular sequential compressive\nsensing task, the SISTA-RNN trains faster and achieves better performance than\nconventional state-of-the-art black-box RNNs, including long short-term memory\n(LSTM) RNNs.\n\n\n## [An unexpected unity among methods for interpreting model predictions](https://arxiv.org/abs/1611.07478)\n[(PDF)](https://arxiv.org/pdf/1611.07478)\n\n`Authors:Scott Lundberg, Su-In Lee`\n\n\nComments:\n\nPresented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1611.07478 [cs.AI]\n\n \n(or arXiv:1611.07478v3 [cs.AI] for this version)\n\n\n> Abstract: Understanding why a model made a certain prediction is crucial in many data\nscience fields. Interpretable predictions engender appropriate trust and\nprovide insight into how the model may be improved. However, with large modern\ndatasets the best accuracy is often achieved by complex models that even experts\nstruggle to interpret, which creates a tension between accuracy and\ninterpretability. Recently, several methods have been proposed for interpreting\npredictions from complex models by estimating the importance of input features.\nHere, we show how a model-agnostic additive representation of the importance\nof input features unifies current methods. This representation is optimal, in\nthe sense that it is the only set of additive values that satisfies important\nproperties. We show how we can leverage these properties to create novel visual\nexplanations of model predictions. The thread of unity that this representation\nweaves through the literature indicates that there are common principles to be\nlearned about the interpretation of model predictions that apply in many\nscenarios.\n\n\n## [Input Switched Affine Networks: An RNN Architecture Designed for  Interpretability](https://arxiv.org/abs/1611.09434)\n[(PDF)](https://arxiv.org/pdf/1611.09434)\n\n`Authors:Jakob N.
Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo`\n\n\nComments:\n\nICLR 2017 submission\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)\n\n\nCite as:\n\narXiv:1611.09434 [cs.AI]\n\n \n(or arXiv:1611.09434v2 [cs.AI] for this version)\n\n\n> Abstract: There exist many problem domains where the interpretability of neural network\nmodels is essential for deployment. Here we introduce a recurrent architecture\ncomposed of input-switched affine transformations, in other words, an RNN\nwithout any explicit nonlinearities, but with input-dependent recurrent\nweights. This simple form allows the RNN to be analyzed via straightforward\nlinear methods: we can exactly characterize the linear contribution of each\ninput to the model predictions; we can use a change-of-basis to disentangle\ninput, output, and computational hidden unit subspaces; we can fully\nreverse-engineer the architecture's solution to a simple task.
Despite this\nease of interpretation, the input-switched affine network achieves reasonable\nperformance on a text modeling task, and allows greater computational\nefficiency than networks with standard nonlinearities.\n\n\n## [Large scale modeling of antimicrobial resistance with interpretable  classifiers](https://arxiv.org/abs/1612.01030)\n[(PDF)](https://arxiv.org/pdf/1612.01030)\n\n`Authors:Alexandre Drouin, Frédéric Raymond, Gaël Letarte St-Pierre, Mario Marchand, Jacques Corbeil, François Laviolette`\n\n\nComments:\n\nPeer-reviewed and accepted for presentation at the Machine Learning for Health Workshop, NIPS 2016, Barcelona, Spain\n\nSubjects:\n\nGenomics (q-bio.GN); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1612.01030 [q-bio.GN]\n\n \n(or arXiv:1612.01030v1 [q-bio.GN] for this version)\n\n\n> Abstract: Antimicrobial resistance is an important public health concern that has\nimplications in the practice of medicine worldwide. Accurately predicting\nresistance phenotypes from genome sequences shows great promise in promoting\nbetter use of antimicrobial agents, by determining which antibiotics are likely\nto be effective in specific clinical cases. In healthcare, this would allow for\nthe design of treatment plans tailored for specific individuals, likely\nresulting in better clinical outcomes for patients with bacterial infections.\nHere, we present the recent work of Drouin et al. (2016) on using Set\nCovering Machines to learn highly interpretable models of antibiotic resistance\nand complement it by providing a large-scale application of their method to the\nentire PATRIC database.
We report prediction results for 36 new datasets and\npresent the Kover AMR platform, a new web-based tool for the visualization\nand interpretation of the generated models.\n\n\n## [Interpretable Semantic Textual Similarity: Finding and explaining  differences between sentences](https://arxiv.org/abs/1612.04868)\n[(PDF)](https://arxiv.org/pdf/1612.04868)\n\n`Authors:I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre`\n\n\nComments:\n\nPreprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)\n\nSubjects:\n\nComputation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nDOI:\n\n10.1016/j.knosys.2016.12.013\n\n\nCite as:\n\narXiv:1612.04868 [cs.CL]\n\n \n(or arXiv:1612.04868v1 [cs.CL] for this version)\n\n\n> Abstract: User acceptance of artificial intelligence agents might depend on their\nability to explain their reasoning, which requires adding an interpretability\nlayer that helps users understand their behavior. This paper focuses\non adding an interpretable layer on top of Semantic Textual Similarity (STS),\nwhich measures the degree of semantic equivalence between two sentences. The\ninterpretability layer is formalized as the alignment between pairs of segments\nacross the two sentences, where the relation between the segments is labeled\nwith a relation type and a similarity score. We present a publicly available\ndataset of sentence pairs annotated according to this formalization. We then\ndevelop a system trained on this dataset which, given a sentence pair, explains\nwhat is similar and different, in the form of graded and typed segment\nalignments. When evaluated on the dataset, the system performs better than an\ninformed baseline, showing that the dataset and task are well-defined and\nfeasible. Most importantly, two user studies show how the system output can be\nused to automatically produce explanations in natural language.
Users performed\nbetter when having access to the explanations, providing preliminary evidence\nthat our dataset and method to automatically produce explanations are useful in\nreal applications.\n\n\n## [Towards a New Interpretation of Separable Convolutions](https://arxiv.org/abs/1701.04489)\n[(PDF)](https://arxiv.org/pdf/1701.04489)\n\n`Authors:Tapabrata Ghosh`\n\n\nSubjects:\n\nLearning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1701.04489 [cs.LG]\n\n \n(or arXiv:1701.04489v1 [cs.LG] for this version)\n\n\n> Abstract: In recent times, the use of separable convolutions in deep convolutional\nneural network architectures has been explored. Several researchers, most\nnotably Chollet (2016) and Ghosh (2017), have used separable convolutions in\ntheir deep architectures and have demonstrated state-of-the-art or close to\nstate-of-the-art performance. However, the underlying mechanism of action of\nseparable convolutions is still not fully understood. Although their\nmathematical definition is well understood as a depthwise convolution followed\nby a pointwise convolution, deeper interpretations such as the extreme\nInception hypothesis (Chollet, 2016) have failed to provide a thorough\nexplanation of their efficacy. In this paper, we propose a hybrid\ninterpretation that we believe is a better model for explaining the efficacy of\nseparable convolutions.\n\n\n## [Towards A Rigorous Science of Interpretable Machine Learning](https://arxiv.org/abs/1702.08608)\n[(PDF)](https://arxiv.org/pdf/1702.08608)\n\n`Authors:Finale Doshi-Velez, Been Kim`\n\n\nSubjects:\n\nMachine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1702.08608 [stat.ML]\n\n \n(or arXiv:1702.08608v2 [stat.ML] for this version)\n\n\n> Abstract: As machine learning systems become ubiquitous, there has been a surge of\ninterest in interpretable machine learning: systems that provide explanations\nfor their outputs.
These explanations are often used to qualitatively assess\nother criteria such as safety or non-discrimination. However, despite the\ninterest in interpretability, there is very little consensus on what\ninterpretable machine learning is and how it should be measured. In this\nposition paper, we first define interpretability and describe when\ninterpretability is needed (and when it is not). Next, we suggest a taxonomy\nfor rigorous evaluation and expose open questions towards a more rigorous\nscience of interpretable machine learning.\n\n\n## [Streaming Weak Submodularity: Interpreting Neural Networks on the Fly](https://arxiv.org/abs/1703.02647)\n[(PDF)](https://arxiv.org/pdf/1703.02647)\n\n`Authors:Ethan R. Elenberg, Alexandros G. Dimakis, Moran Feldman, Amin Karbasi`\n\n\nComments:\n\nTo appear in NIPS 2017\n\nSubjects:\n\nMachine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1703.02647 [stat.ML]\n\n \n(or arXiv:1703.02647v3 [stat.ML] for this version)\n\n\n> Abstract: In many machine learning applications, it is important to explain the\npredictions of a black-box classifier. For example, why does a deep neural\nnetwork assign an image to a particular class? We cast interpretability of\nblack-box classifiers as a combinatorial maximization problem and propose an\nefficient streaming algorithm to solve it subject to cardinality constraints.\nBy extending ideas from Badanidiyuru et al. [2014], we provide a constant-factor\napproximation guarantee for our algorithm in the case of random stream\norder and a weakly submodular objective function. This is the first such\ntheoretical guarantee for this general class of functions, and we also show\nthat no such algorithm exists for a worst-case stream order. Our algorithm\nobtains similar explanations of Inception V3 predictions 10 times faster than\nthe state-of-the-art LIME framework of Ribeiro et al.
[2016].\n\n\n## [Interpretable Structure-Evolving LSTM](https://arxiv.org/abs/1703.03055)\n[(PDF)](https://arxiv.org/pdf/1703.03055)\n\n`Authors:Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing`\n\n\nComments:\n\nTo appear in CVPR 2017 as a spotlight paper\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1703.03055 [cs.CV]\n\n \n(or arXiv:1703.03055v1 [cs.CV] for this version)\n\n\n> Abstract: This paper develops a general framework for learning interpretable data\nrepresentations via Long Short-Term Memory (LSTM) recurrent neural networks over\nhierarchical graph structures. Instead of learning LSTM models over predefined\nstructures, we propose to further learn the intermediate interpretable\nmulti-level graph structures in a progressive and stochastic way from data\nduring the LSTM network optimization. We thus call this model the\nstructure-evolving LSTM. In particular, starting with an initial element-level\ngraph representation where each node is a small data element, the\nstructure-evolving LSTM gradually evolves the multi-level graph representations\nby stochastically merging the graph nodes with high compatibilities along the\nstacked LSTM layers. In each LSTM layer, we estimate the compatibility of two\nconnected nodes from their corresponding LSTM gate outputs, which is used to\ngenerate a merging probability. Candidate graph structures are then\ngenerated by grouping nodes into cliques according to their merging\nprobabilities. We then produce the new graph structure with a\nMetropolis-Hastings algorithm, which alleviates the risk of getting stuck in\nlocal optima by stochastic sampling with an acceptance probability. Once a\ngraph structure is accepted, a higher-level graph is then constructed by taking\nthe partitioned cliques as its nodes.
During the evolving process,\nthe representation becomes more abstract at higher levels, where redundant\ninformation is filtered out, allowing more efficient propagation of long-range\ndata dependencies. We evaluate the effectiveness of the structure-evolving LSTM on\nsemantic object parsing and demonstrate its advantage over\nstate-of-the-art LSTM models on standard benchmarks.\n\n\n## [Improving Interpretability of Deep Neural Networks with Semantic  Information](https://arxiv.org/abs/1703.04096)\n[(PDF)](https://arxiv.org/pdf/1703.04096)\n\n`Authors:Yinpeng Dong, Hang Su, Jun Zhu, Bo Zhang`\n\n\nComments:\n\nTo appear in CVPR 2017\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1703.04096 [cs.CV]\n\n \n(or arXiv:1703.04096v2 [cs.CV] for this version)\n\n\n> Abstract: Interpretability of deep neural networks (DNNs) is essential since it enables\nusers to understand the overall strengths and weaknesses of the models, conveys\nan understanding of how the models will behave in the future, and shows how to\ndiagnose and correct potential problems. However, it is challenging to reason\nabout what a DNN actually does due to its opaque or black-box nature. To\naddress this issue, we propose a novel technique to improve the\ninterpretability of DNNs by leveraging the rich semantic information embedded\nin human descriptions. Focusing on the video captioning task, we first\nextract a set of semantically meaningful topics from the human descriptions\nthat cover a wide range of visual concepts, and integrate them into the model\nwith an interpretive loss. We then propose a prediction difference maximization\nalgorithm to interpret the learned features of each neuron. Experimental\nresults demonstrate its effectiveness in video captioning using the\ninterpretable features, which can also be transferred to video action\nrecognition.
By clearly understanding the learned features, users can easily\nrevise false predictions via a human-in-the-loop procedure.\n\n\n## [InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations](https://arxiv.org/abs/1703.08840)\n[(PDF)](https://arxiv.org/pdf/1703.08840)\n\n`Authors:Yunzhu Li, Jiaming Song, Stefano Ermon`\n\n\nComments:\n\n14 pages, NIPS 2017\n\nSubjects:\n\nLearning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1703.08840 [cs.LG]\n\n \n(or arXiv:1703.08840v2 [cs.LG] for this version)\n\n\n> Abstract: The goal of imitation learning is to mimic expert behavior without access to\nan explicit reward signal. Expert demonstrations provided by humans, however,\noften show significant variability due to latent factors that are typically not\nexplicitly modeled. In this paper, we propose a new algorithm that can infer\nthe latent structure of expert demonstrations in an unsupervised way. Our\nmethod, built on top of Generative Adversarial Imitation Learning, can not only\nimitate complex behaviors, but also learn interpretable and meaningful\nrepresentations of complex behavioral data, including visual demonstrations. In\nthe driving domain, we show that a model learned from human demonstrations is\nable to both accurately reproduce a variety of behaviors and accurately\nanticipate human actions using raw visual inputs. 
Compared with various\nbaselines, our method can better capture the latent structure underlying expert\ndemonstrations, often recovering semantically meaningful factors of variation\nin the data.\n\n\n## [Interpretable Learning for Self-Driving Cars by Visualizing Causal  Attention](https://arxiv.org/abs/1703.10631)\n[(PDF)](https://arxiv.org/pdf/1703.10631)\n\n`Authors:Jinkyu Kim, John Canny`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1703.10631 [cs.CV]\n\n \n(or arXiv:1703.10631v1 [cs.CV] for this version)\n\n\n> Abstract: Deep neural perception and control networks are likely to be a key component\nof self-driving vehicles. These models need to be explainable (they should\nprovide easy-to-interpret rationales for their behavior) so that passengers,\ninsurance companies, law enforcement, developers, etc. can understand what\ntriggered a particular behavior. Here we explore the use of visual\nexplanations. These explanations take the form of real-time highlighted regions\nof an image that causally influence the network's output (steering control).\nOur approach is two-stage. In the first stage, we use a visual attention model\nto train a convolutional network end-to-end from images to steering angle. The\nattention model highlights image regions that potentially influence the\nnetwork's output. Some of these are true influences, but some are spurious. We\nthen apply a causal filtering step to determine which input regions actually\ninfluence the output. This produces more succinct visual explanations and more\naccurately exposes the network's behavior. We demonstrate the effectiveness of\nour model on three datasets totaling 16 hours of driving. We first show that\ntraining with attention does not degrade the performance of the end-to-end\nnetwork.
Then we show that the network causally cues on a variety of features\nthat are used by humans while driving.\n\n\n## [Interpretable 3D Human Action Analysis with Temporal Convolutional  Networks](https://arxiv.org/abs/1704.04516)\n[(PDF)](https://arxiv.org/pdf/1704.04516)\n\n`Authors:Tae Soo Kim, Austin Reiter`\n\n\nComments:\n\n8 pages, 5 figures, BNMW CVPR 2017 Submission\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nMSC classes:\n\n68T45, 68T10 (Primary)\n\n\nACM classes:\n\nI.2.10; I.5.4\n\n\nCite as:\n\narXiv:1704.04516 [cs.CV]\n\n \n(or arXiv:1704.04516v1 [cs.CV] for this version)\n\n\n> Abstract: The discriminative power of modern deep learning models for 3D human action\nrecognition continues to grow. In conjunction with the recent\nresurgence of 3D human action representation with 3D skeletons, the quality and\nthe pace of recent progress have been significant. However, the inner workings\nof state-of-the-art learning-based methods in 3D human action recognition still\nremain largely a black box. In this work, we propose to use a new class of models\nknown as Temporal Convolutional Neural Networks (TCN) for 3D human action\nrecognition. Compared to popular LSTM-based Recurrent Neural Network models,\ngiven interpretable input such as 3D skeletons, TCN provides a way to\nexplicitly learn readily interpretable spatio-temporal representations for 3D\nhuman action recognition. We describe our strategy for re-designing the TCN with\ninterpretability in mind and how such characteristics of the model are leveraged\nto construct a powerful 3D activity recognition method. Through this work, we\nwish to take a step towards a spatio-temporal model that is easier to\nunderstand, explain and interpret.
The resulting model, Res-TCN, achieves\nstate-of-the-art results on the largest 3D human action recognition dataset,\nNTU-RGBD.\n\n\n## [An Interpretable Knowledge Transfer Model for Knowledge Base Completion](https://arxiv.org/abs/1704.05908)\n[(PDF)](https://arxiv.org/pdf/1704.05908)\n\n`Authors:Qizhe Xie, Xuezhe Ma, Zihang Dai, Eduard Hovy`\n\n\nComments:\n\nAccepted by ACL 2017. Minor update\n\nSubjects:\n\nComputation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1704.05908 [cs.CL]\n\n \n(or arXiv:1704.05908v2 [cs.CL] for this version)\n\n\n> Abstract: Knowledge bases are important resources for a variety of natural language\nprocessing tasks but suffer from incompleteness. We propose a novel embedding\nmodel, *ITransF*, to perform knowledge base completion. Equipped with a\nsparse attention mechanism, ITransF discovers hidden concepts of relations and\ntransfers statistical strength through the sharing of concepts. Moreover, the\nlearned associations between relations and concepts, which are represented by\nsparse attention vectors, can be interpreted easily. We evaluate ITransF on two\nknowledge base completion benchmarks, WN18 and FB15k, and obtain\nimprovements on both the mean rank and Hits@10 metrics over all baselines that\ndo not use additional information.\n\n\n## [Network Dissection: Quantifying Interpretability of Deep Visual  Representations](https://arxiv.org/abs/1704.05796)\n[(PDF)](https://arxiv.org/pdf/1704.05796)\n\n`Authors:David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba`\n\n\nComments:\n\nFirst two authors contributed equally. 
Oral presentation at CVPR 2017\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)\n\n\nACM classes:\n\nI.2.10\n\n\nCite as:\n\narXiv:1704.05796 [cs.CV]\n\n \n(or arXiv:1704.05796v1 [cs.CV] for this version)\n\n\n> Abstract: We propose a general framework called Network Dissection for quantifying the\ninterpretability of latent representations of CNNs by evaluating the alignment\nbetween individual hidden units and a set of semantic concepts. Given any CNN\nmodel, the proposed method draws on a broad data set of visual concepts to\nscore the semantics of hidden units at each intermediate convolutional layer.\nThe units with semantics are given labels across a range of objects, parts,\nscenes, textures, materials, and colors. We use the proposed method to test the\nhypothesis that interpretability of units is equivalent to random linear\ncombinations of units, then we apply our method to compare the latent\nrepresentations of various networks when trained to solve different supervised\nand self-supervised training tasks. We further analyze the effect of training\niterations, compare networks trained with different initializations, examine\nthe impact of network depth and width, and measure the effect of dropout and\nbatch normalization on the interpretability of deep visual representations. 
We\ndemonstrate that the proposed method can shed light on characteristics of CNN\nmodels and training methods that go beyond measurements of their discriminative\npower.\n\n\n## [Softmax Q-Distribution Estimation for Structured Prediction: A  Theoretical Interpretation for RAML](https://arxiv.org/abs/1705.07136)\n[(PDF)](https://arxiv.org/pdf/1705.07136)\n\n`Authors:Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy`\n\n\nComments:\n\nUnder review at ICLR 2018\n\nSubjects:\n\nLearning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1705.07136 [cs.LG]\n\n \n(or arXiv:1705.07136v3 [cs.LG] for this version)\n\n\n> Abstract: Reward augmented maximum likelihood (RAML), a simple and effective learning\nframework to directly optimize towards the reward function in structured\nprediction tasks, has led to a number of impressive empirical successes. RAML\nincorporates task-specific reward by performing maximum-likelihood updates on\ncandidate outputs sampled according to an exponentiated payoff distribution,\nwhich gives higher probabilities to candidates that are close to the reference\noutput. While RAML is notable for its simplicity, efficiency, and its\nimpressive empirical successes, the theoretical properties of RAML, especially\nthe behavior of the exponentiated payoff distribution, have not been examined\nthoroughly. In this work, we introduce softmax Q-distribution estimation, a\nnovel theoretical interpretation of RAML, which reveals the relation between\nRAML and Bayesian decision theory. The softmax Q-distribution can be regarded\nas a smooth approximation of the Bayes decision boundary, and the Bayes\ndecision rule is achieved by decoding with this Q-distribution. We further show\nthat RAML is equivalent to approximately estimating the softmax Q-distribution,\nwith the temperature $\\tau$ controlling the approximation error. 
We perform two\nexperiments, one on synthetic data of multi-class classification and one on\nreal data of image captioning, to demonstrate the relationship between RAML and\nthe proposed softmax Q-distribution estimation method, verifying our\ntheoretical analysis. Additional experiments on three structured prediction\ntasks with rewards defined on sequential (named entity recognition), tree-based\n(dependency parsing) and irregular (machine translation) structures show\nnotable improvements over maximum likelihood baselines.\n\n\n## [Logic Tensor Networks for Semantic Image Interpretation](https://arxiv.org/abs/1705.08968)\n[(PDF)](https://arxiv.org/pdf/1705.08968)\n\n`Authors:Ivan Donadello, Luciano Serafini, Artur d'Avila Garcez`\n\n\nComments:\n\n14 pages, 2 figures, IJCAI 2017\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1705.08968 [cs.AI]\n\n \n(or arXiv:1705.08968v1 [cs.AI] for this version)\n\n\n> Abstract: Semantic Image Interpretation (SII) is the task of extracting structured\nsemantic descriptions from images. It is widely agreed that the combined use of\nvisual data and background knowledge is of great importance for SII. Recently,\nStatistical Relational Learning (SRL) approaches have been developed for\nreasoning under uncertainty and learning in the presence of data and rich\nknowledge. Logic Tensor Networks (LTNs) are an SRL framework which integrates\nneural networks with first-order fuzzy logic to allow (i) efficient learning\nfrom noisy data in the presence of logical constraints, and (ii) reasoning with\nlogical formulas describing general properties of the data. In this paper, we\ndevelop and apply LTNs to two of the main tasks of SII, namely, the\nclassification of an image's bounding boxes and the detection of the relevant\npart-of relations between objects. To the best of our knowledge, this is the\nfirst successful application of SRL to such SII tasks. 
The proposed approach is\nevaluated on a standard image processing benchmark. Experiments show that the\nuse of background knowledge in the form of logical constraints can improve the\nperformance of purely data-driven approaches, including the state-of-the-art\nFast Region-based Convolutional Neural Networks (Fast R-CNN). Moreover, we show\nthat the use of logical background knowledge adds robustness to the learning\nsystem when errors are present in the labels of the training data.\n\n\n## [Patchnet: Interpretable Neural Networks for Image Classification](https://arxiv.org/abs/1705.08078)\n[(PDF)](https://arxiv.org/pdf/1705.08078)\n\n`Authors:Adityanarayanan Radhakrishnan, Charles Durham, Ali Soylemezoglu, Caroline Uhler`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1705.08078 [cs.CV]\n\n \n(or arXiv:1705.08078v1 [cs.CV] for this version)\n\n\n> Abstract: The ability to visually understand and interpret learned features from\ncomplex predictive models is crucial for their acceptance in sensitive areas\nsuch as health care. To move closer to this goal of truly interpretable complex\nmodels, we present PatchNet, a network that restricts global context for image\nclassification tasks in order to easily provide visual representations of\nlearned texture features on a predetermined local scale. We demonstrate how\nPatchNet provides visual heatmap representations of the learned features, and\nwe mathematically analyze the behavior of the network during convergence. We\nalso present a version of PatchNet that is particularly well suited for\nlowering false positive rates in image classification tasks. 
We apply PatchNet\nto the classification of textures from the Describable Textures Dataset and to\nthe ISBI-ISIC 2016 melanoma classification challenge.\n\n\n## [A Unified Approach to Interpreting Model Predictions](https://arxiv.org/abs/1705.07874)\n[(PDF)](https://arxiv.org/pdf/1705.07874)\n\n`Authors:Scott Lundberg, Su-In Lee`\n\n\nComments:\n\nTo appear in NIPS 2017\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1705.07874 [cs.AI]\n\n \n(or arXiv:1705.07874v2 [cs.AI] for this version)\n\n\n> Abstract: Understanding why a model makes a certain prediction can be as crucial as the\nprediction's accuracy in many applications. However, the highest accuracy for\nlarge modern datasets is often achieved by complex models that even experts\nstruggle to interpret, such as ensemble or deep learning models, creating a\ntension between accuracy and interpretability. In response, various methods\nhave recently been proposed to help users interpret the predictions of complex\nmodels, but it is often unclear how these methods are related and when one\nmethod is preferable over another. To address this problem, we present a\nunified framework for interpreting predictions, SHAP (SHapley Additive\nexPlanations). SHAP assigns each feature an importance value for a particular\nprediction. Its novel components include: (1) the identification of a new class\nof additive feature importance measures, and (2) theoretical results showing\nthere is a unique solution in this class with a set of desirable properties.\nThe new class unifies six existing methods, notable because several recent\nmethods in the class lack the proposed desirable properties. 
Based on insights\nfrom this unification, we present new methods that show improved computational\nperformance and/or better consistency with human intuition than previous\napproaches.\n\n\n## [Interpreting Blackbox Models via Model Extraction](https://arxiv.org/abs/1705.08504)\n[(PDF)](https://arxiv.org/pdf/1705.08504)\n\n`Authors:Osbert Bastani, Carolyn Kim, Hamsa Bastani`\n\n\nSubjects:\n\nLearning (cs.LG)\n\n\nCite as:\n\narXiv:1705.08504 [cs.LG]\n\n \n(or arXiv:1705.08504v1 [cs.LG] for this version)\n\n\n> Abstract: Interpretability has become an important issue as machine learning is\nincreasingly used to inform consequential decisions. We propose an approach for\ninterpreting a blackbox model by extracting a decision tree that approximates\nthe model. Our model extraction algorithm avoids overfitting by leveraging\nblackbox model access to actively sample new training points. We prove that as\nthe number of samples goes to infinity, the decision tree learned using our\nalgorithm converges to the exact greedy decision tree. In our evaluation, we\nuse our algorithm to interpret random forests and neural nets trained on\nseveral datasets from the UCI Machine Learning Repository, as well as control\npolicies learned for three classical reinforcement learning problems. We show\nthat our algorithm improves over a baseline based on CART on every problem\ninstance. 
Furthermore, we show how an interpretation generated by our approach\ncan be used to understand and debug these models.\n\n\n## [Interpretable & Explorable Approximations of Black Box Models](https://arxiv.org/abs/1707.01154)\n[(PDF)](https://arxiv.org/pdf/1707.01154)\n\n`Authors:Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec`\n\n\nComments:\n\nPresented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1707.01154 [cs.AI]\n\n \n(or arXiv:1707.01154v1 [cs.AI] for this version)\n\n\n> Abstract: We propose Black Box Explanations through Transparent Approximations (BETA),\na novel model-agnostic framework for explaining the behavior of any black-box\nclassifier by simultaneously optimizing for fidelity to the original model and\ninterpretability of the explanation. To this end, we develop a novel objective\nfunction which allows us to learn (with optimality guarantees) a small number\nof compact decision sets, each of which explains the behavior of the black box\nmodel in unambiguous, well-defined regions of feature space. Furthermore, our\nframework is also capable of accepting user input when generating these\napproximations, thus allowing users to interactively explore how the black-box\nmodel behaves in different subspaces that are of interest to the user. 
To the\nbest of our knowledge, this is the first approach which can produce global\nexplanations of the behavior of any given black box model through joint\noptimization of unambiguity, fidelity, and interpretability, while also\nallowing users to explore model behavior based on their preferences.\nExperimental evaluation with real-world datasets and user studies demonstrates\nthat our approach can generate highly compact, easy-to-understand, yet accurate\napproximations of various kinds of predictive models compared to\nstate-of-the-art baselines.\n\n\n## [Interpretability via Model Extraction](https://arxiv.org/abs/1706.09773)\n[(PDF)](https://arxiv.org/pdf/1706.09773)\n\n`Authors:Osbert Bastani, Carolyn Kim, Hamsa Bastani`\n\n\nComments:\n\nPresented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)\n\nSubjects:\n\nLearning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1706.09773 [cs.LG]\n\n \n(or arXiv:1706.09773v2 [cs.LG] for this version)\n\n\n> Abstract: The ability to interpret machine learning models has become increasingly\nimportant now that machine learning is used to inform consequential decisions.\nWe propose an approach called model extraction for interpreting complex,\nblackbox models. Our approach approximates the complex model using a much more\ninterpretable model; as long as the approximation quality is good,\nstatistical properties of the complex model are reflected in the interpretable\nmodel. 
We show how model extraction can be used to understand and debug random\nforests and neural nets trained on several datasets from the UCI Machine\nLearning Repository, as well as control policies learned for several classical\nreinforcement learning problems.\n\n\n## [Methods for Interpreting and Understanding Deep Neural Networks](https://arxiv.org/abs/1706.07979)\n[(PDF)](https://arxiv.org/pdf/1706.07979)\n\n`Authors:Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller`\n\n\nComments:\n\n14 pages, 10 figures\n\nSubjects:\n\nLearning (cs.LG); Machine Learning (stat.ML)\n\n\nDOI:\n\n10.1016/j.dsp.2017.10.011\n\n\nCite as:\n\narXiv:1706.07979 [cs.LG]\n\n \n(or arXiv:1706.07979v1 [cs.LG] for this version)\n\n\n> Abstract: This paper provides an entry point to the problem of interpreting a deep\nneural network model and explaining its predictions. It is based on a tutorial\ngiven at ICASSP 2017. It introduces some recently proposed techniques of\ninterpretation, along with theory, tricks and recommendations, to make the most\nefficient use of these techniques on real data. It also discusses a number of\npractical applications.\n\n\n## [MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis  Network](https://arxiv.org/abs/1707.02485)\n[(PDF)](https://arxiv.org/pdf/1707.02485)\n\n`Authors:Zizhao Zhang, Yuanpu Xie, Fuyong Xing, Mason McGough, Lin Yang`\n\n\nComments:\n\nCVPR 2017 Oral\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1707.02485 [cs.CV]\n\n \n(or arXiv:1707.02485v1 [cs.CV] for this version)\n\n\n> Abstract: The inability to interpret the model prediction in semantically and visually\nmeaningful ways is a well-known shortcoming of most existing computer-aided\ndiagnosis methods. 
In this paper, we propose MDNet to establish a direct\nmultimodal mapping between medical images and diagnostic reports that can read\nimages, generate diagnostic reports, retrieve images by symptom descriptions,\nand visualize attention, to provide justifications of the network diagnosis\nprocess. MDNet includes an image model and a language model. The image model is\nproposed to enhance multi-scale feature ensembles and utilization efficiency.\nThe language model, integrated with our improved attention mechanism, aims to\nread and explore discriminative image feature descriptions from reports to\nlearn a direct mapping from sentence words to image pixels. The overall network\nis trained end-to-end by using our developed optimization strategy. Using a dataset of\npathology bladder cancer images and their diagnostic reports (BCIDR), we\nconduct sufficient experiments to demonstrate that MDNet outperforms\ncomparative baselines. The proposed image model obtains state-of-the-art\nperformance on two CIFAR datasets as well.\n\n\n## [A Formal Framework to Characterize Interpretability of Procedures](https://arxiv.org/abs/1707.03886)\n[(PDF)](https://arxiv.org/pdf/1707.03886)\n\n`Authors:Amit Dhurandhar, Vijay Iyengar, Ronny Luss, Karthikeyan Shanmugam`\n\n\nComments:\n\nPresented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1707.03886 [cs.AI]\n\n \n(or arXiv:1707.03886v1 [cs.AI] for this version)\n\n\n> Abstract: We provide a novel notion of what it means to be interpretable, looking past\nthe usual association with human understanding. Our key insight is that\ninterpretability is not an absolute concept and so we define it relative to a\ntarget model, which may or may not be a human. We define a framework that\nallows for comparing interpretable procedures by linking it to important\npractical aspects such as accuracy and robustness. 
We characterize many of the\ncurrent state-of-the-art interpretable methods in our framework, demonstrating its\ngeneral applicability.\n\n\n## [Interpreting Classifiers through Attribute Interactions in Datasets](https://arxiv.org/abs/1707.07576)\n[(PDF)](https://arxiv.org/pdf/1707.07576)\n\n`Authors:Andreas Henelius, Kai Puolamäki, Antti Ukkonen`\n\n\nComments:\n\nPresented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1707.07576 [stat.ML]\n\n \n(or arXiv:1707.07576v1 [stat.ML] for this version)\n\n\n> Abstract: In this work we present the novel ASTRID method for investigating which\nattribute interactions classifiers exploit when making predictions. Attribute\ninteractions in classification tasks mean that two or more attributes together\nprovide stronger evidence for a particular class label. Knowledge of such\ninteractions makes models more interpretable by revealing associations between\nattributes. This has applications, e.g., in pharmacovigilance to identify\ninteractions between drugs or in bioinformatics to investigate associations\nbetween single nucleotide polymorphisms. We also show how the discovered attribute\npartitioning is related to a factorisation of the data-generating distribution\nand empirically demonstrate the utility of the proposed method.\n\n\n## [Interpretable Active Learning](https://arxiv.org/abs/1708.00049)\n[(PDF)](https://arxiv.org/pdf/1708.00049)\n\n`Authors:Richard L. Phillips, Kyu Hyun Chang, Sorelle A. 
Friedler`\n\n\nComments:\n\n6 pages, 5 figures, presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1708.00049 [stat.ML]\n\n \n(or arXiv:1708.00049v1 [stat.ML] for this version)\n\n\n> Abstract: Active learning has long been a topic of study in machine learning. However,\nas increasingly complex and opaque models have become standard practice, the\nprocess of active learning, too, has become more opaque. There has been little\ninvestigation into interpreting what specific trends and patterns an active\nlearning strategy may be exploring. This work expands on the Local\nInterpretable Model-agnostic Explanations framework (LIME) to provide\nexplanations for active learning recommendations. We demonstrate how LIME can\nbe used to generate locally faithful explanations for an active learning\nstrategy, and how these explanations can be used to understand how different\nmodels and datasets explore a problem space over time. In order to quantify the\nper-subgroup differences in how an active learning strategy queries spatial\nregions, we introduce a notion of uncertainty bias (based on disparate impact)\nto measure the discrepancy in the confidence for a model's predictions between\none subgroup and another. Using the uncertainty bias measure, we show that our\nquery explanations accurately reflect the subgroup focus of the active learning\nqueries, allowing for an interpretable explanation of what is being learned as\npoints with similar sources of uncertainty have their uncertainty bias\nresolved. 
We demonstrate that this technique can be applied to track\nuncertainty bias over user-defined clusters or automatically generated clusters\nbased on the source of uncertainty.\n\n\n## [Using Program Induction to Interpret Transition System Dynamics](https://arxiv.org/abs/1708.00376)\n[(PDF)](https://arxiv.org/pdf/1708.00376)\n\n`Authors:Svetlin Penkov, Subramanian Ramamoorthy`\n\n\nComments:\n\nPresented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia. arXiv admin note: substantial text overlap with arXiv:1705.08320\n\nSubjects:\n\nArtificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1708.00376 [cs.AI]\n\n \n(or arXiv:1708.00376v1 [cs.AI] for this version)\n\n\n> Abstract: Explaining and reasoning about processes which underlie observed black-box\nphenomena enables the discovery of causal mechanisms, derivation of suitable\nabstract representations and the formulation of more robust predictions. We\npropose to learn high level functional programs in order to represent abstract\nmodels which capture the invariant structure in the observed data. We introduce\nthe $\\pi$-machine (program-induction machine) -- an architecture able to induce\ninterpretable LISP-like programs from observed data traces. We propose an\noptimisation procedure for program learning based on backpropagation, gradient\ndescent and A* search. We apply the proposed method to two problems: system\nidentification of dynamical systems and explaining the behaviour of a DQN\nagent. 
Our results show that the $\\pi$-machine can efficiently induce\ninterpretable programs from individual data traces.\n\n\n## [Warp: a method for neural network interpretability applied to gene  expression profiles](https://arxiv.org/abs/1708.04988)\n[(PDF)](https://arxiv.org/pdf/1708.04988)\n\n`Authors:Trofimov Assya, Lemieux Sebastien, Perreault Claude`\n\n\nComments:\n\n5 pages, 3 figures, NIPS 2016, Machine Learning in Computational Biology workshop\n\nSubjects:\n\nGenomics (q-bio.GN); Artificial Intelligence (cs.AI)\n\n\nCite as:\n\narXiv:1708.04988 [q-bio.GN]\n\n \n(or arXiv:1708.04988v1 [q-bio.GN] for this version)\n\n\n> Abstract: We show a proof of principle for warping, a method to interpret the inner\nworkings of neural networks in the context of gene expression analysis. Warping\nis an efficient way to gain insight into the inner workings of neural nets and\nmake them more interpretable. We demonstrate the ability of warping to recover\nmeaningful information for a given class on a sample-specific individual basis.\nWe found that warping works well on both linearly and nonlinearly separable\ndatasets. These encouraging results show that warping has the potential to be the\nanswer to neural network interpretability in computational biology.\n\n\n## [DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation  of Self-Reported Pain](https://arxiv.org/abs/1708.04670)\n[(PDF)](https://arxiv.org/pdf/1708.04670)\n\n`Authors:Dianbo Liu, Fengjiao Peng, Andrew Shea, Ognjen (Oggi) Rudovic, Rosalind Picard`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1708.04670 [cs.CV]\n\n \n(or arXiv:1708.04670v1 [cs.CV] for this version)\n\n\n> Abstract: Previous research on automatic pain estimation from facial expressions has\nfocused primarily on \"one-size-fits-all\" metrics (such as PSPI). 
In this work,\nwe focus on directly estimating each individual's self-reported visual-analog\nscale (VAS) pain metric, as this is considered the gold standard for pain\nmeasurement. The VAS pain score is highly subjective and context-dependent, and\nits range can vary significantly among different persons. To tackle these\nissues, we propose a novel two-stage personalized model, named DeepFaceLIFT,\nfor automatic estimation of VAS. This model is based on (1) neural network and\n(2) Gaussian process regression models, and is used to personalize the\nestimation of self-reported pain via a set of hand-crafted personal features\nand multi-task learning. We show on the benchmark dataset for pain analysis\n(the UNBC-McMaster Shoulder Pain Expression Archive) that the proposed\npersonalized model largely outperforms the traditional, unpersonalized models:\nthe intra-class correlation improves from a baseline performance of 19% to a\npersonalized performance of 35% while also providing confidence in the\nmodel's estimates, in contrast to existing models for the\ntarget task. Additionally, DeepFaceLIFT automatically discovers the\npain-relevant facial regions for each person, allowing for an easy\ninterpretation of the pain-related facial cues.\n\n\n## [Towards Interpretable Deep Neural Networks by Leveraging Adversarial  Examples](https://arxiv.org/abs/1708.05493)\n[(PDF)](https://arxiv.org/pdf/1708.05493)\n\n`Authors:Yinpeng Dong, Hang Su, Jun Zhu, Fan Bao`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1708.05493 [cs.CV]\n\n \n(or arXiv:1708.05493v1 [cs.CV] for this version)\n\n\n> Abstract: Deep neural networks (DNNs) have demonstrated impressive performance on a\nwide array of tasks, but they are usually considered opaque since internal\nstructure and learned parameters are not interpretable. 
In this paper, we\nre-examine the internal representations of DNNs using adversarial images, which\nare generated by an ensemble-optimization algorithm. We find that: (1) the\nneurons in DNNs do not truly detect semantic objects/parts, but respond to\nobjects/parts only as recurrent discriminative patches; (2) deep visual\nrepresentations are not robust distributed codes of visual concepts because the\nrepresentations of adversarial images are largely not consistent with those of\nreal images, although they have similar visual appearance, both of which are\ndifferent from previous findings. To further improve the interpretability of\nDNNs, we propose an adversarial training scheme with a consistent loss such\nthat the neurons are endowed with human-interpretable concepts. The induced\ninterpretable representations enable us to trace eventual outcomes back to\ninfluential neurons. Therefore, human users can know how the models make\npredictions, as well as when and why they make errors.\n\n\n## [More cat than cute? 
Interpretable Prediction of Adjective-Noun Pairs](https://arxiv.org/abs/1708.06039)\n[(PDF)](https://arxiv.org/pdf/1708.06039)\n\n`Authors:Delia Fernandez, Alejandro Woodward, Victor Campos, Xavier Giro-i-Nieto, Brendan Jou, Shih-Fu Chang`\n\n\nComments:\n\nOral paper at ACM Multimedia 2017 Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes (MUSA2)\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)\n\n\nDOI:\n\n10.1145/3132515.3132520\n\n\nCite as:\n\narXiv:1708.06039 [cs.CV]\n\n \n(or arXiv:1708.06039v1 [cs.CV] for this version)\n\n\n> Abstract: The increasing availability of affect-rich multimedia resources has bolstered\ninterest in understanding sentiment and emotions in and from visual content.\nAdjective-noun pairs (ANP) are a popular mid-level semantic construct for\ncapturing affect via visually detectable concepts such as \"cute dog\" or\n\"beautiful landscape\". Current state-of-the-art methods approach ANP prediction\nby considering each of these compound concepts as individual tokens, ignoring\nthe underlying relationships in ANPs. This work aims at disentangling the\ncontributions of the 'adjectives' and 'nouns' in the visual prediction of ANPs.\nTwo specialised classifiers, one trained for detecting adjectives and another\nfor nouns, are fused to predict 553 different ANPs. The resulting ANP\nprediction model is more interpretable as it allows us to study contributions\nof the adjective and noun components. Source code and models are available\nonline.\n\n\n## [Interpretable Categorization of Heterogeneous Time Series Data](https://arxiv.org/abs/1708.09121)\n[(PDF)](https://arxiv.org/pdf/1708.09121)\n\n`Authors:Ritchie Lee, Mykel J. Kochenderfer, Ole J. 
Mengshoel, Joshua Silbermann`\n\n\nComments:\n\n10 pages, 7 figures\n\nSubjects:\n\nLearning (cs.LG)\n\n\nCite as:\n\narXiv:1708.09121 [cs.LG]\n\n \n(or arXiv:1708.09121v1 [cs.LG] for this version)\n\n\n> Abstract: The explanation of heterogeneous multivariate time series data is a central\nproblem in many applications. The problem requires two major data mining\nchallenges to be addressed simultaneously: learning models that are\nhuman-interpretable and mining heterogeneous multivariate time series data.\nThe intersection of these two areas is not adequately explored in the existing\nliterature. To address this gap, we propose grammar-based decision trees and an\nalgorithm for learning them. Grammar-based decision trees extend decision trees\nwith a grammar framework. Logical expressions, derived from a context-free\ngrammar, are used for branching in place of simple thresholds on attributes.\nThe added expressivity enables support for a wide range of data types while\nretaining the interpretability of decision trees. By choosing a grammar based\non temporal logic, we show that grammar-based decision trees can be used for\nthe interpretable classification of high-dimensional and heterogeneous time\nseries data. In addition to classification, we show how grammar-based decision\ntrees can also be used for categorization, which is a combination of clustering\nand generating interpretable explanations for each cluster. 
We apply\ngrammar-based decision trees to analyze the classic Australian Sign Language\ndataset as well as categorize and explain near mid-air collisions to support\nthe development of a prototype aircraft collision avoidance system.\n\n\n## [Explainable Artificial Intelligence: Understanding, Visualizing and  Interpreting Deep Learning Models](https://arxiv.org/abs/1708.08296)\n[(PDF)](https://arxiv.org/pdf/1708.08296)\n\n`Authors:Wojciech Samek, Thomas Wiegand, Klaus-Robert Müller`\n\n\nComments:\n\n8 pages, 2 figures\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Computers and Society (cs.CY); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1708.08296 [cs.AI]\n\n \n(or arXiv:1708.08296v1 [cs.AI] for this version)\n\n\n> Abstract: With the availability of large databases and recent improvements in deep\nlearning methodology, the performance of AI systems is reaching or even\nexceeding the human level on an increasing number of complex tasks. Impressive\nexamples of this development can be found in domains such as image\nclassification, sentiment analysis, speech understanding or strategic game\nplaying. However, because of their nested non-linear structure, these highly\nsuccessful machine learning and artificial intelligence models are usually\napplied in a black box manner, i.e., no information is provided about what\nexactly makes them arrive at their predictions. Since this lack of transparency\ncan be a major drawback, e.g., in medical applications, the development of\nmethods for visualizing, explaining and interpreting deep learning models has\nrecently attracted increasing attention. This paper summarizes recent\ndevelopments in this field and makes a plea for more interpretability in\nartificial intelligence. 
Furthermore, it presents two approaches to explaining\npredictions of deep learning models, one method which computes the sensitivity\nof the prediction with respect to changes in the input and one approach which\nmeaningfully decomposes the decision in terms of the input variables. These\nmethods are evaluated on three classification tasks.\n\n\n## [Interpreting Shared Deep Learning Models via Explicable Boundary Trees](https://arxiv.org/abs/1709.03730)\n[(PDF)](https://arxiv.org/pdf/1709.03730)\n\n`Authors:Huijun Wu, Chen Wang, Jie Yin, Kai Lu, Liming Zhu`\n\n\nComments:\n\n9 pages, 10 figures\n\nSubjects:\n\nLearning (cs.LG); Human-Computer Interaction (cs.HC)\n\n\nCite as:\n\narXiv:1709.03730 [cs.LG]\n\n \n(or arXiv:1709.03730v1 [cs.LG] for this version)\n\n\n> Abstract: Despite outperforming humans in many tasks, deep neural network models are\nalso criticized for the lack of transparency and interpretability in decision\nmaking. The opaqueness results in uncertainty and low confidence when deploying\nsuch a model in model sharing scenarios, when the model is developed by a third\nparty. For a supervised machine learning model, sharing the training process,\nincluding the training data, provides an effective way to gain trust and to better\nunderstand model predictions. However, it is not always possible to share all\ntraining data due to privacy and policy constraints. In this paper, we propose\na method to disclose a small set of training data that is just sufficient for\nusers to gain insight into a complicated model. The method constructs a\nboundary tree using selected training data and the tree is able to approximate\nthe complicated model with high fidelity. 
We show that traversing data points\nin the tree gives users significantly better understanding of the model and\npaves the way for trustworthy model sharing.\n\n\n## [Interpretable Graph-Based Semi-Supervised Learning via Flows](https://arxiv.org/abs/1709.04764)\n[(PDF)](https://arxiv.org/pdf/1709.04764)\n\n`Authors:Raif M. Rustamov, James T. Klosowski`\n\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1709.04764 [stat.ML]\n\n \n(or arXiv:1709.04764v1 [stat.ML] for this version)\n\n\n> Abstract: In this paper, we consider the interpretability of the foundational\nLaplacian-based semi-supervised learning approaches on graphs. We introduce a\nnovel flow-based learning framework that subsumes the foundational approaches\nand additionally provides a detailed, transparent, and easily understood\nexpression of the learning process in terms of graph flows. As a result, one\ncan visualize and interactively explore the precise subgraph along which the\ninformation from labeled nodes flows to an unlabeled node of interest.\nSurprisingly, the proposed framework avoids trading accuracy for\ninterpretability, but in fact leads to improved prediction accuracy, which is\nsupported both by theoretical considerations and empirical results. 
The\nflow-based framework guarantees the maximum principle by construction and can\nhandle directed graphs in an out-of-the-box manner.\n\n\n## [Unsupervised Learning of Disentangled and Interpretable Representations  from Sequential Data](https://arxiv.org/abs/1709.07902)\n[(PDF)](https://arxiv.org/pdf/1709.07902)\n\n`Authors:Wei-Ning Hsu, Yu Zhang, James Glass`\n\n\nComments:\n\nAccepted to NIPS 2017\n\nSubjects:\n\nLearning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1709.07902 [cs.LG]\n\n \n(or arXiv:1709.07902v1 [cs.LG] for this version)\n\n\n> Abstract: We present a factorized hierarchical variational autoencoder, which learns\ndisentangled and interpretable representations from sequential data without\nsupervision. Specifically, we exploit the multi-scale nature of information in\nsequential data by formulating it explicitly within a factorized hierarchical\ngraphical model that imposes sequence-dependent priors and sequence-independent\npriors to different sets of latent variables. 
The model is evaluated on two\nspeech corpora to demonstrate, qualitatively, its ability to transform speakers\nor linguistic content by manipulating different sets of latent variables; and\nquantitatively, its ability to outperform an i-vector baseline for speaker\nverification and reduce the word error rate by as much as 35% in mismatched\ntrain/test scenarios for automatic speech recognition tasks.\n\n\n## [CTD: Fast, Accurate, and Interpretable Method for Static and Dynamic  Tensor Decompositions](https://arxiv.org/abs/1710.03608)\n[(PDF)](https://arxiv.org/pdf/1710.03608)\n\n`Authors:Jungwoo Lee, Dongjin Choi, Lee Sael`\n\n\nSubjects:\n\nNumerical Analysis (cs.NA); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1710.03608 [cs.NA]\n\n \n(or arXiv:1710.03608v1 [cs.NA] for this version)\n\n\n> Abstract: How can we find patterns and anomalies in a tensor, or multi-dimensional\narray, in an efficient and directly interpretable way? How can we do this in an\nonline environment, where a new tensor arrives each time step? Finding patterns\nand anomalies in a tensor is a crucial problem with many applications,\nincluding building safety monitoring, patient health monitoring, cyber\nsecurity, terrorist detection, and fake user detection in social networks.\nStandard PARAFAC and Tucker decomposition results are not directly\ninterpretable. Although a few sampling-based methods have previously been\nproposed towards better interpretability, they need to be made faster, more\nmemory efficient, and more accurate.\nIn this paper, we propose CTD, a fast, accurate, and directly interpretable\ntensor decomposition method based on sampling. CTD-S, the static version of\nCTD, provably guarantees a high accuracy that is 17 ~ 83x more accurate than\nthat of the state-of-the-art method. Also, CTD-S is made 5 ~ 86x faster, and 7\n~ 12x more memory-efficient than the state-of-the-art method by removing\nredundancy. 
CTD-D, the dynamic version of CTD, is the first interpretable\ndynamic tensor decomposition method ever proposed. Also, it is made 2 ~ 3x\nfaster than already fast CTD-S by exploiting factors at previous time step and\nby reordering operations. With CTD, we demonstrate how the results can be\neffectively interpreted in the online distributed denial of service (DDoS)\nattack detection.\n\n\n## [Interpretable Convolutional Neural Networks](https://arxiv.org/abs/1710.00935)\n[(PDF)](https://arxiv.org/pdf/1710.00935)\n\n`Authors:Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1710.00935 [cs.CV]\n\n \n(or arXiv:1710.00935v3 [cs.CV] for this version)\n\n\n> Abstract: This paper proposes a method to modify traditional convolutional neural\nnetworks (CNNs) into interpretable CNNs, in order to clarify knowledge\nrepresentations in high conv-layers of CNNs. In an interpretable CNN, each\nfilter in a high conv-layer represents a certain object part. We do not need\nany annotations of object parts or textures to supervise the learning process.\nInstead, the interpretable CNN automatically assigns each filter in a high\nconv-layer with an object part during the learning process. Our method can be\napplied to different types of CNNs with different structures. 
The clear\nknowledge representation in an interpretable CNN can help people understand the\nlogics inside a CNN, i.e., based on which patterns the CNN makes the decision.\nExperiments showed that filters in an interpretable CNN were more semantically\nmeaningful than those in traditional CNNs.\n\n\n## [Interpretable Machine Learning for Privacy-Preserving Pervasive Systems](https://arxiv.org/abs/1710.08464)\n[(PDF)](https://arxiv.org/pdf/1710.08464)\n\n`Authors:Benjamin Baron, Mirco Musolesi`\n\n\nSubjects:\n\nMachine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1710.08464 [stat.ML]\n\n \n(or arXiv:1710.08464v3 [stat.ML] for this version)\n\n\n> Abstract: The presence of pervasive systems in our everyday lives and the interaction\nof users with connected devices such as smartphones or home appliances generate\nincreasing amounts of traces that reflect users' behavior. A plethora of\nmachine learning techniques enable service providers to process these traces to\nextract latent information about the users. While most of the existing projects\nhave focused on the accuracy of these techniques, little work has been done on\nthe interpretation of the inference and identification algorithms based on\nthem. In this paper, we propose a machine learning interpretability framework\nfor inference algorithms based on data collected through pervasive systems and\nwe outline the open challenges in this research area. 
Our interpretability\nframework enables users to understand how the traces they generate could expose\ntheir privacy, while allowing for usable and personalized services at the same\ntime.\n\n\n## [InterpNET: Neural Introspection for Interpretable Deep Learning](https://arxiv.org/abs/1710.09511)\n[(PDF)](https://arxiv.org/pdf/1710.09511)\n\n`Authors:Shane Barratt`\n\n\nComments:\n\nPresented at NIPS 2017 Symposium on Interpretable Machine Learning\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1710.09511 [stat.ML]\n\n \n(or arXiv:1710.09511v2 [stat.ML] for this version)\n\n\n> Abstract: Humans are able to explain their reasoning. On the contrary, deep neural\nnetworks are not. This paper attempts to bridge this gap by introducing a new\nway to design interpretable neural networks for classification, inspired by\nphysiological evidence of the human visual system's inner workings. This paper\nproposes a neural network design paradigm, termed InterpNET, which can be\ncombined with any existing classification architecture to generate natural\nlanguage explanations of the classifications. The success of the module relies\non the assumption that the network's computation and reasoning is represented\nin its internal layer activations. While in principle InterpNET could be\napplied to any existing classification architecture, it is evaluated via an\nimage classification and explanation task. Experiments on a CUB bird\nclassification and explanation dataset show qualitatively and quantitatively\nthat the model is able to generate high-quality explanations. 
While the current\nstate-of-the-art METEOR score on this dataset is 29.2, InterpNET achieves a\nmuch higher METEOR score of 37.9.\n\n\n## [MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural  Networks](https://arxiv.org/abs/1711.06788)\n[(PDF)](https://arxiv.org/pdf/1711.06788)\n\n`Authors:Minmin Chen`\n\n\nComments:\n\nPresented at NIPS 2017 Symposium on Interpretable Machine Learning\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1711.06788 [stat.ML]\n\n \n(or arXiv:1711.06788v1 [stat.ML] for this version)\n\n\n> Abstract: We introduce MinimalRNN, a new recurrent neural network architecture that\nachieves performance comparable to the popular gated RNNs with a simplified\nstructure. It employs minimal updates within the RNN, which not only leads to\nefficient learning and testing but, more importantly, to better interpretability and\ntrainability. We demonstrate that by adopting the more restrictive update\nrule, MinimalRNN learns disentangled RNN states. We further examine the\nlearning dynamics of different RNN structures using input-output Jacobians, and\nshow that MinimalRNN is able to capture longer-range dependencies than existing\nRNN architectures.\n\n\n## [Beyond Sparsity: Tree Regularization of Deep Models for Interpretability](https://arxiv.org/abs/1711.06178)\n[(PDF)](https://arxiv.org/pdf/1711.06178)\n\n`Authors:Mike Wu, Michael C. Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez`\n\n\nComments:\n\nTo appear in AAAI 2018. Contains 9-page main paper and appendix with supplementary material\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1711.06178 [stat.ML]\n\n \n(or arXiv:1711.06178v1 [stat.ML] for this version)\n\n\n> Abstract: The lack of interpretability remains a key barrier to the adoption of deep\nmodels in many applications. 
In this work, we explicitly regularize deep models\nso human users might step through the process behind their predictions in\nlittle time. Specifically, we train deep time-series models so their\nclass-probability predictions have high accuracy while being closely modeled by\ndecision trees with few nodes. Using intuitive toy examples as well as medical\ntasks for treating sepsis and HIV, we demonstrate that this new tree\nregularization yields models that are easier for humans to simulate than\nsimpler L1 or L2 penalties without sacrificing predictive power.\n\n\n## [The Promise and Peril of Human Evaluation for Model Interpretability](https://arxiv.org/abs/1711.07414)\n[(PDF)](https://arxiv.org/pdf/1711.07414)\n\n`Authors:Bernease Herman`\n\n\nComments:\n\nPresented at NIPS 2017 Symposium on Interpretable Machine Learning\n\nSubjects:\n\nArtificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1711.07414 [cs.AI]\n\n \n(or arXiv:1711.07414v1 [cs.AI] for this version)\n\n\n> Abstract: Transparency, user trust, and human comprehension are popular ethical\nmotivations for interpretable machine learning. In support of these goals,\nresearchers evaluate model explanation performance using humans and real world\napplications. This alone presents a challenge in many areas of artificial\nintelligence. In this position paper, we propose a distinction between\ndescriptive and persuasive explanations. We discuss reasoning suggesting that\nfunctional interpretability may be correlated with cognitive function and user\npreferences. If this is indeed the case, evaluation and optimization using\nfunctional metrics could perpetuate implicit cognitive bias in explanations\nthat threaten transparency. 
Finally, we propose two potential research\ndirections to disambiguate cognitive function and explanation models, retaining\ncontrol over the tradeoff between accuracy and interpretability.\n\n\n## [Unleashing the Potential of CNNs for Interpretable Few-Shot Learning](https://arxiv.org/abs/1711.08277)\n[(PDF)](https://arxiv.org/pdf/1711.08277)\n\n`Authors:Boyang Deng, Qing Liu, Siyuan Qiao, Alan Yuille`\n\n\nComments:\n\nUnder review as a conference paper at ICLR 2018\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1711.08277 [cs.CV]\n\n \n(or arXiv:1711.08277v1 [cs.CV] for this version)\n\n\n> Abstract: Convolutional neural networks (CNNs) have been generally acknowledged as one\nof the driving forces for the advancement of computer vision. Despite their\npromising performances on many tasks, CNNs still face major obstacles on the\nroad to achieving ideal machine intelligence. One is the difficulty of\ninterpreting them and understanding their inner workings, which is important\nfor diagnosing their failures and correcting them. Another is that standard\nCNNs require large amounts of annotated data, which is sometimes very hard to\nobtain. Hence, it is desirable to enable them to learn from few examples. In\nthis work, we address these two limitations of CNNs by developing novel and\ninterpretable models for few-shot learning. Our models are based on the idea of\nencoding objects in terms of visual concepts, which are interpretable visual\ncues represented within CNNs. We first use qualitative visualizations and\nquantitative statistics, to uncover several key properties of feature encoding\nusing visual concepts. Motivated by these properties, we present two intuitive\nmodels for the problem of few-shot learning. 
Experiments show that our models\nachieve competitive performance, while being much more flexible and\ninterpretable than previous state-of-the-art few-shot learning methods. We\nconclude that visual concepts expose the natural capability of CNNs for\nfew-shot learning.\n\n\n## [Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action  Recognition](https://arxiv.org/abs/1711.08502)\n[(PDF)](https://arxiv.org/pdf/1711.08502)\n\n`Authors:Jingxuan Hou, Tae Soo Kim, Austin Reiter`\n\n\nComments:\n\n8 pages, 8 figures, CVPR18 submission\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1711.08502 [cs.CV]\n\n \n(or arXiv:1711.08502v1 [cs.CV] for this version)\n\n\n> Abstract: Despite the growing discriminative capabilities of modern deep learning\nmethods for recognition tasks, the inner workings of the state-of-the-art models\nstill remain mostly black boxes. In this paper, we propose a systematic\ninterpretation of model parameters and hidden representations of Residual\nTemporal Convolutional Networks (Res-TCN) for action recognition in time-series\ndata. We also propose a Feature Map Decoder as part of the interpretation\nanalysis, which outputs a representation of the model's hidden variables in the\nsame domain as the input. Such analysis empowers us to expose the model's\ncharacteristic learning patterns in an interpretable way. For example, through\nthe diagnosis analysis, we discovered that our model has learned to achieve\nviewpoint invariance by implicitly learning to perform rotational\nnormalization of the input to a more discriminative view. Based on the findings\nfrom the model interpretation analysis, we propose a targeted refinement\ntechnique, which can generalize to various other recognition models. The\nproposed work introduces a three-stage paradigm for model learning: training,\ninterpretable diagnosis and targeted refinement. 
We validate our approach on\nthe skeleton-based 3D human action recognition benchmark NTU RGB+D. We show that\nthe proposed workflow is an effective model learning strategy and the resulting\nMulti-stream Residual Temporal Convolutional Network (MS-Res-TCN) achieves\nstate-of-the-art performance on NTU RGB+D.\n\n\n## [SPINE: SParse Interpretable Neural Embeddings](https://arxiv.org/abs/1711.08792)\n[(PDF)](https://arxiv.org/pdf/1711.08792)\n\n`Authors:Anant Subramanian, Danish Pruthi, Harsh Jhamtani, Taylor Berg-Kirkpatrick, Eduard Hovy`\n\n\nComments:\n\nAAAI 2018\n\nSubjects:\n\nComputation and Language (cs.CL)\n\n\nCite as:\n\narXiv:1711.08792 [cs.CL]\n\n \n(or arXiv:1711.08792v1 [cs.CL] for this version)\n\n\n> Abstract: Prediction without justification has limited utility. Much of the success of\nneural models can be attributed to their ability to learn rich, dense and\nexpressive representations. While these representations capture the underlying\ncomplexity and latent trends in the data, they are far from being\ninterpretable. We propose a novel variant of denoising k-sparse autoencoders\nthat generates highly efficient and interpretable distributed word\nrepresentations (word embeddings), beginning with existing word representations\nfrom state-of-the-art methods like GloVe and word2vec. Through large-scale\nhuman evaluation, we report that our resulting word embeddings are much more\ninterpretable than the original GloVe and word2vec embeddings. 
Moreover, our\nembeddings outperform existing popular word embeddings on a diverse suite of\nbenchmark downstream tasks.\n\n\n## [Improving the Adversarial Robustness and Interpretability of Deep Neural  Networks by Regularizing their Input Gradients](https://arxiv.org/abs/1711.09404)\n[(PDF)](https://arxiv.org/pdf/1711.09404)\n\n`Authors:Andrew Slavin Ross, Finale Doshi-Velez`\n\n\nComments:\n\nTo appear in AAAI 2018\n\nSubjects:\n\nLearning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1711.09404 [cs.LG]\n\n \n(or arXiv:1711.09404v1 [cs.LG] for this version)\n\n\n> Abstract: Deep neural networks have proven remarkably effective at solving many\nclassification problems, but have been criticized recently for two major\nweaknesses: the reasons behind their predictions are uninterpretable, and the\npredictions themselves can often be fooled by small adversarial perturbations.\nThese problems pose major obstacles for the adoption of neural networks in\ndomains that require security or transparency. In this work, we evaluate the\neffectiveness of defenses that differentiably penalize the degree to which\nsmall changes in inputs can alter model predictions. Across multiple attacks,\narchitectures, defenses, and datasets, we find that neural networks trained\nwith this input gradient regularization exhibit robustness to transferred\nadversarial examples generated to fool all of the other models. We also find\nthat adversarial examples generated to fool gradient-regularized models fool\nall other models equally well, and actually lead to more \"legitimate,\"\ninterpretable misclassifications as rated by people (which we confirm in a\nhuman subject experiment). Finally, we demonstrate that regularizing input\ngradients makes them more naturally interpretable as rationales for model\npredictions. 
We conclude by discussing this relationship between\ninterpretability and robustness in deep neural networks.\n\n\n## [Interpretable Convolutional Neural Networks for Effective Translation  Initiation Site Prediction](https://arxiv.org/abs/1711.09558)\n[(PDF)](https://arxiv.org/pdf/1711.09558)\n\n`Authors:Jasper Zuallaert, Mijung Kim, Yvan Saeys, Wesley De Neve`\n\n\nComments:\n\nPresented at International Workshop on Deep Learning in Bioinformatics, Biomedicine, and Healthcare Informatics (DLB2H 2017) --- in conjunction with the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2017)\n\nSubjects:\n\nGenomics (q-bio.GN); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1711.09558 [q-bio.GN]\n\n \n(or arXiv:1711.09558v1 [q-bio.GN] for this version)\n\n\n> Abstract: Thanks to rapidly evolving sequencing techniques, the amount of genomic data\nat our disposal is growing increasingly large. Determining the gene structure\nis a fundamental requirement to effectively interpret gene function and\nregulation. An important part in that determination process is the\nidentification of translation initiation sites. In this paper, we propose a\nnovel approach for automatic prediction of translation initiation sites,\nleveraging convolutional neural networks that allow for automatic feature\nextraction. Our experimental results demonstrate that we are able to improve\nthe state-of-the-art approaches with a decrease of 75.2% in false positive rate\nand with a decrease of 24.5% in error rate on chosen datasets. Furthermore, an\nin-depth analysis of the decision-making process used by our predictive model\nshows that our neural network implicitly learns biologically relevant features\nfrom scratch, without any prior knowledge about the problem at hand, such as\nthe Kozak consensus sequence, the influence of stop and start codons in the\nsequence and the presence of donor splice site patterns. 
In summary, our\nfindings yield a better understanding of the internal reasoning of a\nconvolutional neural network when applying such a neural network to genomic\ndata.\n\n\n## [Interpretable Facial Relational Network Using Relational Importance](https://arxiv.org/abs/1711.10688)\n[(PDF)](https://arxiv.org/pdf/1711.10688)\n\n`Authors:Seong Tae Kim, Yong Man Ro`\n\n\nSubjects:\n\nComputer Vision and Pattern Recognition (cs.CV)\n\n\nCite as:\n\narXiv:1711.10688 [cs.CV]\n\n \n(or arXiv:1711.10688v1 [cs.CV] for this version)\n\n\n> Abstract: Human face analysis is an important task in computer vision. According to\ncognitive-psychological studies, facial dynamics could provide crucial cues for\nface analysis. In particular, the motion of facial local regions in facial\nexpression is related to the motion of other facial regions. In this paper, a\nnovel deep learning approach which exploits the relations of facial local\ndynamics has been proposed to estimate facial traits from expression sequence.\nIn order to exploit the relations of facial dynamics in local regions, the\nproposed network consists of a facial local dynamic feature encoding network\nand a facial relational network. The facial relational network is designed to\nbe interpretable. Relational importance is automatically encoded and facial\ntraits are estimated by combining relational features based on the relational\nimportance. The relations of facial dynamics for facial trait estimation could\nbe interpreted by using the relational importance. By comparative experiments,\nthe effectiveness of the proposed method has been validated. 
Experimental\nresults show that the proposed method outperforms the state-of-the-art methods\nin gender and age estimation.\n\n\n## [An interpretable latent variable model for attribute applicability in  the Amazon catalogue](https://arxiv.org/abs/1712.00126)\n[(PDF)](https://arxiv.org/pdf/1712.00126)\n\n`Authors:Tammo Rukat, Dustin Lange, Cédric Archambeau`\n\n\nComments:\n\nPresented at NIPS 2017 Symposium on Interpretable Machine Learning\n\nSubjects:\n\nMachine Learning (stat.ML); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1712.00126 [stat.ML]\n\n \n(or arXiv:1712.00126v2 [stat.ML] for this version)\n\n\n> Abstract: Learning attribute applicability of products in the Amazon catalog (e.g.,\npredicting that a shoe should have a value for size, but not for battery-type)\nat scale is a challenge. The need for an interpretable model is contingent on\n(1) the lack of ground truth training data, (2) the need to utilise prior\ninformation about the underlying latent space and (3) the ability to understand\nthe quality of predictions on new, unseen data. To this end, we develop the\nMaxMachine, a probabilistic latent variable model that learns distributed\nbinary representations, associated with sets of features that are likely to\nco-occur in the data. Layers of MaxMachines can be stacked such that higher\nlayers encode more abstract information. Any set of variables can be clamped to\nencode prior information. 
We develop fast sampling-based posterior inference.\nPreliminary results show that the model improves over the baseline in 17 out of\n19 product groups and provides qualitatively reasonable predictions.\n\n\n## [Where Classification Fails, Interpretation Rises](https://arxiv.org/abs/1712.00558)\n[(PDF)](https://arxiv.org/pdf/1712.00558)\n\n`Authors:Chanh Nguyen, Georgi Georgiev, Yujie Ji, Ting Wang`\n\n\nComments:\n\n6 pages, 6 figures\n\nSubjects:\n\nLearning (cs.LG); Machine Learning (stat.ML)\n\n\nCite as:\n\narXiv:1712.00558 [cs.LG]\n\n \n(or arXiv:1712.00558v1 [cs.LG] for this version)\n\n\n> Abstract: An intriguing property of deep neural networks is their inherent vulnerability to adversarial inputs, which significantly hinders their\napplication in security-critical domains. Most existing detection methods\nattempt to use carefully engineered patterns to distinguish adversarial inputs\nfrom their genuine counterparts, which however can often be circumvented by\nadaptive adversaries. In this work, we take a completely different route by\nleveraging the definition of adversarial inputs: while deceptive to deep\nneural networks, they are barely discernible to human vision. Building upon\nrecent advances in interpretable models, we construct a new detection framework\nthat contrasts an input's interpretation against its classification. We\nvalidate the efficacy of this framework through extensive experiments using\nbenchmark datasets and attacks. We believe that this work opens a new direction\nfor designing adversarial input detection methods.\n\n\n## [SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for  Predicting Chemical Properties](https://arxiv.org/abs/1712.02034)\n[(PDF)](https://arxiv.org/pdf/1712.02034)\n\n`Authors:Garrett B. Goh, Nathan O. 
Hodas, Charles Siegel, Abhinav Vishnu`\n\n\nSubjects:\n\nMachine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)\n\n\nCite as:\n\narXiv:1712.02034 [stat.ML]\n\n \n(or arXiv:1712.02034v1 [stat.ML] for this version)\n\n\n> Abstract: Chemical databases store information in text representations, and the SMILES\nformat is a universal standard used in many cheminformatics software packages. Encoded\nin each SMILES string is structural information that can be used to predict\ncomplex chemical properties. In this work, we develop SMILES2Vec, a deep RNN\nthat automatically learns features from SMILES strings to predict chemical\nproperties, without the need for additional explicit chemical information, or\nthe \"grammar\" of how SMILES encode structural data. Using Bayesian optimization\nmethods to tune the network architecture, we show that an optimized SMILES2Vec\nmodel can serve as a general-purpose neural network for learning a range of\ndistinct chemical properties including toxicity, activity, solubility and\nsolvation energy, while outperforming contemporary MLP networks that use\nengineered features. Furthermore, we demonstrate proof-of-concept of\ninterpretability by developing an explanation mask that localizes on the most\nimportant characters used in making a prediction. When tested on the solubility\ndataset, this localization identifies specific parts of a chemical that are\nconsistent with established first-principles knowledge of solubility with an\naccuracy of 88%, demonstrating that neural networks can learn technically\naccurate chemical concepts. The fact that SMILES2Vec validates established\nchemical facts, while providing state-of-the-art accuracy, makes it a potential\ntool for widespread adoption of interpretable deep learning by the chemistry\ncommunity.\n"
  }
]