Repository: liupeng3425/interpretable-research
Branch: master
Commit: dafdf4ea91c7
Files: 3
Total size: 310.3 KB

Directory structure:
gitextract_ktd4iga1/

├── .gitignore
├── README.md
└── arxiv.md

================================================
FILE CONTENTS
================================================

================================================
FILE: .gitignore
================================================
.DS_Store
note/

================================================
FILE: README.md
================================================
# Index
1. Applications and interpretation
    * [Interpretable Policies for Reinforcement Learning by Genetic Programming](#interpretable-policies-for-reinforcement-learning-by-genetic-programming)
    * [Discovery Radiomics with CLEAR-DR: Interpretable Computer Aided  Diagnosis of Diabetic Retinopathy](#discovery-radiomics-with-clear-dr-interpretable-computer-aided--diagnosis-of-diabetic-retinopathy)
    * [Building Data-driven Models with Microstructural Images: Generalization  and Interpretability](#building-data-driven-models-with-microstructural-images-generalization--and-interpretability)
    * [Interpretable Feature Recommendation for Signal Analytics](#interpretable-feature-recommendation-for-signal-analytics)
    * [Interpretable and Pedagogical Examples](#interpretable-and-pedagogical-examples)
    * [Unsupervised patient representations from clinical notes with  interpretable classification decisions](#unsupervised-patient-representations-from-clinical-notes-with--interpretable-classification-decisions)
    * [Interpretable probabilistic embeddings: bridging the gap between topic  models and neural networks](#interpretable-probabilistic-embeddings-bridging-the-gap-between-topic--models-and-neural-networks)
    * [Arrhythmia Classification from the Abductive Interpretation of Short  Single-Lead ECG Records](#arrhythmia-classification-from-the-abductive-interpretation-of-short--single-lead-ecg-records)
    * [Interpretable classifiers using rules and Bayesian analysis: Building a  better stroke prediction model](#interpretable-classifiers-using-rules-and-bayesian-analysis-building-a--better-stroke-prediction-model)
    * [Interpretable Deep Neural Networks for Single-Trial EEG Classification](#interpretable-deep-neural-networks-for-single-trial-eeg-classification)
    * [Learning Interpretable Musical Compositional Rules and Traces](#learning-interpretable-musical-compositional-rules-and-traces)
    * [Building an Interpretable Recommender via Loss-Preserving Transformation](#building-an-interpretable-recommender-via-loss-preserving-transformation)
    * [Interpretable Machine Learning Models for the Digital Clock Drawing Test](#interpretable-machine-learning-models-for-the-digital-clock-drawing-test)
    * [RETAIN: An Interpretable Predictive Model for Healthcare using Reverse  Time Attention Mechanism](#retain-an-interpretable-predictive-model-for-healthcare-using-reverse--time-attention-mechanism)
    * [Real Time Fine-Grained Categorization with Accuracy and Interpretability](#real-time-fine-grained-categorization-with-accuracy-and-interpretability)
    * [Interpreting Neural Networks to Improve Politeness Comprehension](#interpreting-neural-networks-to-improve-politeness-comprehension)
    * [Interpretable Semantic Textual Similarity: Finding and explaining  differences between sentences](#interpretable-semantic-textual-similarity-finding-and-explaining--differences-between-sentences)
    * [Streaming Weak Submodularity: Interpreting Neural Networks on the Fly](#streaming-weak-submodularity-interpreting-neural-networks-on-the-fly)
    * [Interpretable Learning for Self-Driving Cars by Visualizing Causal  Attention](#interpretable-learning-for-self-driving-cars-by-visualizing-causal--attention)
    * [Interpretable 3D Human Action Analysis with Temporal Convolutional  Networks](#interpretable-3d-human-action-analysis-with-temporal-convolutional--networks)
    * [An Interpretable Knowledge Transfer Model for Knowledge Base Completion](#an-interpretable-knowledge-transfer-model-for-knowledge-base-completion)
    * [MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis  Network](#mdnet-a-semantically-and-visually-interpretable-medical-image-diagnosis--network)
    * [Interpretable Active Learning](#interpretable-active-learning)
    * [DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation  of Self-Reported Pain](#deepfacelift-interpretable-personalized-models-for-automatic-estimation--of-self-reported-pain)
    * [More cat than cute? Interpretable Prediction of Adjective-Noun Pairs](#more-cat-than-cute-interpretable-prediction-of-adjective-noun-pairs)
    * [Interpretable Categorization of Heterogeneous Time Series Data](#interpretable-categorization-of-heterogeneous-time-series-data)
    * [Interpretable Graph-Based Semi-Supervised Learning via Flows](#interpretable-graph-based-semi-supervised-learning-via-flows)
    * [CTD: Fast, Accurate, and Interpretable Method for Static and Dynamic  Tensor Decompositions](#ctd-fast-accurate-and-interpretable-method-for-static-and-dynamic--tensor-decompositions)
    * [Interpretable Machine Learning for Privacy-Preserving Pervasive Systems](#interpretable-machine-learning-for-privacy-preserving-pervasive-systems)
    * [Interpretable Convolutional Neural Networks for Effective Translation  Initiation Site Prediction](#interpretable-convolutional-neural-networks-for-effective-translation--initiation-site-prediction)
    * [Interpretable Facial Relational Network Using Relational Importance](#interpretable-facial-relational-network-using-relational-importance)
    * [SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for  Predicting Chemical Properties](#smiles2vec-an-interpretable-general-purpose-deep-neural-network-for--predicting-chemical-properties)
    
1. Determining the interpretability of CNNs
    * [Network Dissection: Quantifying Interpretability of Deep Visual  Representations](#network-dissection-quantifying-interpretability-of-deep-visual--representations)
    * [A Formal Framework to Characterize Interpretability of Procedures](#a-formal-framework-to-characterize-interpretability-of-procedures)
    
1. Criticism
    * [Interpretation of Neural Networks is Fragile](#interpretation-of-neural-networks-is-fragile)
    * [The Promise and Peril of Human Evaluation for Model Interpretability](#the-promise-and-peril-of-human-evaluation-for-model-interpretability)
    
1. Interpreting existing models
    * [Artificial Intelligence as Structural Estimation: Economic  Interpretations of Deep Blue, Bonanza, and AlphaGo](#artificial-intelligence-as-structural-estimation-economic--interpretations-of-deep-blue-bonanza-and-alphago)
    * [Semantic Structure and Interpretability of Word Embeddings](#semantic-structure-and-interpretability-of-word-embeddings)
    * [Interpreting Convolutional Neural Networks Through Compression](#interpreting-convolutional-neural-networks-through-compression)
    * [Interpreting Deep Visual Representations via Network Dissection](#interpreting-deep-visual-representations-via-network-dissection)
    * [The Mythos of Model Interpretability](#the-mythos-of-model-interpretability)
    * [Model-Agnostic Interpretability of Machine Learning](#model-agnostic-interpretability-of-machine-learning)
    * [Using Visual Analytics to Interpret Predictive Machine Learning Models](#using-visual-analytics-to-interpret-predictive-machine-learning-models)
    * [Towards Transparent AI Systems: Interpreting Visual Question Answering  Models](#towards-transparent-ai-systems-interpreting-visual-question-answering--models)
    * [Embedding Projector: Interactive Visualization and Interpretation of  Embeddings](#embedding-projector-interactive-visualization-and-interpretation-of--embeddings)
    * [Learning Interpretability for Visualizations using Adapted Cox Models  through a User Experiment](#learning-interpretability-for-visualizations-using-adapted-cox-models--through-a-user-experiment)
    * [Interpreting Finite Automata for Sequential Data](#interpreting-finite-automata-for-sequential-data)
    * [Interpretation of Prediction Models Using the Input Gradient](#interpretation-of-prediction-models-using-the-input-gradient)
    * [Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery](#interpretable-recurrent-neural-networks-using-sequential-sparse-recovery)
    * [An unexpected unity among methods for interpreting model predictions](#an-unexpected-unity-among-methods-for-interpreting-model-predictions)
    * [Towards A Rigorous Science of Interpretable Machine Learning](#towards-a-rigorous-science-of-interpretable-machine-learning)
    * [Softmax Q-Distribution Estimation for Structured Prediction: A  Theoretical Interpretation for RAML](#softmax-q-distribution-estimation-for-structured-prediction-a--theoretical-interpretation-for-raml)
    * [A Unified Approach to Interpreting Model Predictions](#a-unified-approach-to-interpreting-model-predictions)
    * [Interpreting Blackbox Models via Model Extraction](#interpreting-blackbox-models-via-model-extraction)
    * [Interpretable & Explorable Approximations of Black Box Models](#interpretable--explorable-approximations-of-black-box-models)
    * [Interpretability via Model Extraction](#interpretability-via-model-extraction)
    * [Methods for Interpreting and Understanding Deep Neural Networks](#methods-for-interpreting-and-understanding-deep-neural-networks)
    * [Interpreting Classifiers through Attribute Interactions in Datasets](#interpreting-classifiers-through-attribute-interactions-in-datasets)
    * [Using Program Induction to Interpret Transition System Dynamics](#using-program-induction-to-interpret-transition-system-dynamics)
    * [Warp: a method for neural network interpretability applied to gene  expression profiles](#warp-a-method-for-neural-network-interpretability-applied-to-gene--expression-profiles)
    * [Towards Interpretable Deep Neural Networks by Leveraging Adversarial  Examples](#towards-interpretable-deep-neural-networks-by-leveraging-adversarial--examples)
    * [Explainable Artificial Intelligence: Understanding, Visualizing and  Interpreting Deep Learning Models](#explainable-artificial-intelligence-understanding-visualizing-and--interpreting-deep-learning-models)
    * [Interpreting Shared Deep Learning Models via Explicable Boundary Trees](#interpreting-shared-deep-learning-models-via-explicable-boundary-trees)
    * [Unleashing the Potential of CNNs for Interpretable Few-Shot Learning](#unleashing-the-potential-of-cnns-for-interpretable-few-shot-learning)
    * [Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action  Recognition](#train-diagnose-and-fix-interpretable-approach-for-fine-grained-action--recognition)
    * [An interpretable latent variable model for attribute applicability in  the Amazon catalogue](#an-interpretable-latent-variable-model-for-attribute-applicability-in--the-amazon-catalogue)
    * [Where Classification Fails, Interpretation Rises](#where-classification-fails-interpretation-rises)
    
1. Attempts to improve interpretability
    * [Contextual Regression: An Accurate and Conveniently Interpretable  Nonlinear Model for Mining Discovery from Scientific Data](#contextual-regression-an-accurate-and-conveniently-interpretable--nonlinear-model-for-mining-discovery-from-scientific-data)
    * [Interpretable R-CNN](#interpretable-r-cnn)
    * [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](#increasing-the-interpretability-of-recurrent-neural-networks-using--hidden-markov-models)
    * [SnapToGrid: From Statistical to Interpretable Models for Biomedical  Information Extraction](#snaptogrid-from-statistical-to-interpretable-models-for-biomedical--information-extraction)
    * [Meaningful Models: Utilizing Conceptual Structure to Improve Machine  Learning Interpretability](#meaningful-models-utilizing-conceptual-structure-to-improve-machine--learning-interpretability)
    * [Particle Swarm Optimization for Generating Interpretable Fuzzy  Reinforcement Learning Policies](#particle-swarm-optimization-for-generating-interpretable-fuzzy--reinforcement-learning-policies)
    * [Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning](#growing-interpretable-part-graphs-on-convnets-via-multi-shot-learning)
    * [GENESIM: genetic extraction of a single, interpretable model](#genesim-genetic-extraction-of-a-single-interpretable-model)
    * [Stratified Knowledge Bases as Interpretable Probabilistic Models  (Extended Abstract)](#stratified-knowledge-bases-as-interpretable-probabilistic-models--extended-abstract)
    * [Tree Space Prototypes: Another Look at Making Tree Ensembles  Interpretable](#tree-space-prototypes-another-look-at-making-tree-ensembles--interpretable)
    * [Inducing Interpretable Representations with Variational Autoencoders](#inducing-interpretable-representations-with-variational-autoencoders)
    * [Input Switched Affine Networks: An RNN Architecture Designed for  Interpretability](#input-switched-affine-networks-an-rnn-architecture-designed-for--interpretability)
    * [Large scale modeling of antimicrobial resistance with interpretable  classifiers](#large-scale-modeling-of-antimicrobial-resistance-with-interpretable--classifiers)
    * [Towards a New Interpretation of Separable Convolutions](#towards-a-new-interpretation-of-separable-convolutions)
    * [Interpretable Structure-Evolving LSTM](#interpretable-structure-evolving-lstm)
    * [Improving Interpretability of Deep Neural Networks with Semantic  Information](#improving-interpretability-of-deep-neural-networks-with-semantic--information)
    * [InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations](#infogail-interpretable-imitation-learning-from-visual-demonstrations)
    * [Patchnet: Interpretable Neural Networks for Image Classification](#patchnet-interpretable-neural-networks-for-image-classification)
    * [Unsupervised Learning of Disentangled and Interpretable Representations  from Sequential Data](#unsupervised-learning-of-disentangled-and-interpretable-representations--from-sequential-data)
    * [Interpretable Convolutional Neural Networks](#interpretable-convolutional-neural-networks)
    * [InterpNET: Neural Introspection for Interpretable Deep Learning](#interpnet-neural-introspection-for-interpretable-deep-learning)
    * [MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural  Networks](#minimalrnn-toward-more-interpretable-and-trainable-recurrent-neural--networks)
    * [Beyond Sparsity: Tree Regularization of Deep Models for Interpretability](#beyond-sparsity-tree-regularization-of-deep-models-for-interpretability)
    * [SPINE: SParse Interpretable Neural Embeddings](#spine-sparse-interpretable-neural-embeddings)
    * [Improving the Adversarial Robustness and Interpretability of Deep Neural  Networks by Regularizing their Input Gradients](#improving-the-adversarial-robustness-and-interpretability-of-deep-neural--networks-by-regularizing-their-input-gradients)


# Papers

## [Interpretable Policies for Reinforcement Learning by Genetic Programming](https://arxiv.org/abs/1712.04170)
[(PDF)](https://arxiv.org/pdf/1712.04170)

`Authors:Daniel Hein, Steffen Udluft, Thomas A. Runkler`


Subjects:

Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)


Cite as:

arXiv:1712.04170 [cs.AI]

(or arXiv:1712.04170v1 [cs.AI] for this version)


> Abstract: The search for interpretable reinforcement learning policies is of high
academic and industrial interest. Especially for industrial systems, domain
experts are more likely to deploy autonomously learned controllers if they are
understandable and convenient to evaluate. Basic algebraic equations are
supposed to meet these requirements, as long as they are restricted to an
adequate complexity. Here we introduce the genetic programming for
reinforcement learning (GPRL) approach based on model-based batch reinforcement
learning and genetic programming, which autonomously learns policy equations
from pre-existing default state-action trajectory samples. GPRL is compared to
a straight-forward method which utilizes genetic programming for symbolic
regression, yielding policies imitating an existing well-performing, but
non-interpretable policy. Experiments on three reinforcement learning
benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark,
demonstrate the superiority of our GPRL approach compared to the symbolic
regression method. GPRL is capable of producing well-performing interpretable
reinforcement learning policies from pre-existing default trajectory data.


## [Discovery Radiomics with CLEAR-DR: Interpretable Computer Aided  Diagnosis of Diabetic Retinopathy](https://arxiv.org/abs/1710.10675)
[(PDF)](https://arxiv.org/pdf/1710.10675)

`Authors:Devinder Kumar, Graham W. Taylor, Alexander Wong`


Subjects:

Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)


Cite as:

arXiv:1710.10675 [cs.AI]

(or arXiv:1710.10675v1 [cs.AI] for this version)


> Abstract: Objective: Radiomics-driven Computer Aided Diagnosis (CAD) has shown
considerable promise in recent years as a potential tool for improving clinical
decision support in medical oncology, particularly those based around the
concept of Discovery Radiomics, where radiomic sequencers are discovered
through the analysis of medical imaging data. One of the main limitations with
current CAD approaches is that it is very difficult to gain insight or
rationale as to how decisions are made, thus limiting their utility to
clinicians. Methods: In this study, we propose CLEAR-DR, a novel interpretable
CAD system based on the notion of CLass-Enhanced Attentive Response Discovery
Radiomics for the purpose of clinical decision support for diabetic
retinopathy. Results: In addition to disease grading via the discovered deep
radiomic sequencer, the CLEAR-DR system also produces a visual interpretation
of the decision-making process to provide better insight and understanding into
the decision-making process of the system. Conclusion: We demonstrate the
effectiveness and utility of the proposed CLEAR-DR system of enhancing the
interpretability of diagnostic grading results for the application of diabetic
retinopathy grading. Significance: CLEAR-DR can act as a potential powerful
tool to address the uninterpretability issue of current CAD systems, thus
improving their utility to clinicians.


## [Interpretation of Neural Networks is Fragile](https://arxiv.org/abs/1710.10547)
[(PDF)](https://arxiv.org/pdf/1710.10547)

`Authors:Amirata Ghorbani, Abubakar Abid, James Zou`


Comments:

Submitted for review at ICLR 2018

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1710.10547 [stat.ML]

(or arXiv:1710.10547v1 [stat.ML] for this version)


> Abstract: In order for machine learning to be deployed and trusted in many
applications, it is crucial to be able to reliably explain why the machine
learning algorithm makes certain predictions. For example, if an algorithm
classifies a given pathology image to be a malignant tumor, then the doctor may
need to know which parts of the image led the algorithm to this classification.
How to interpret black-box predictors is thus an important and active area of
research. A fundamental question is: how much can we trust the interpretation
itself? In this paper, we show that interpretation of deep learning predictions
is extremely fragile in the following sense: two perceptively indistinguishable
inputs with the same predicted label can be assigned very different
interpretations. We systematically characterize the fragility of several
widely-used feature-importance interpretation methods (saliency maps, relevance
propagation, and DeepLIFT) on ImageNet and CIFAR-10. Our experiments show that
even small random perturbation can change the feature importance and new
systematic perturbations can lead to dramatically different interpretations
without changing the label. We extend these results to show that
interpretations based on exemplars (e.g. influence functions) are similarly
fragile. Our analysis of the geometry of the Hessian matrix gives insight on
why fragility could be a fundamental challenge to the current interpretation
approaches.
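
The kind of measurement behind this fragility claim can be illustrated with a minimal sketch. It assumes a tiny two-layer ReLU network with random, hypothetical weights in place of the ImageNet/CIFAR-10 models, and uses plain input-gradient saliency, one of the feature-importance method families the abstract names. The helper names `forward`, `saliency` and `top_k_intersection`, as well as the perturbation size, are illustrative assumptions. The sketch checks whether a small input change keeps the label and how many of the top-k salient features survive.

```python
import random

def forward(x, W1, b1, w2):
    """Logit of a two-layer ReLU network, w2 . relu(W1 x + b1), plus pre-activations."""
    pre = [sum(wij * xj for wij, xj in zip(row, x)) + bj for row, bj in zip(W1, b1)]
    logit = sum(w * max(0.0, p) for w, p in zip(w2, pre))
    return logit, pre

def saliency(x, W1, b1, w2):
    """Simple gradient saliency: |d logit / d x_j| for every input feature j."""
    _, pre = forward(x, W1, b1, w2)
    grad = [0.0] * len(x)
    for row, p, w in zip(W1, pre, w2):
        if p > 0:  # only active ReLU units pass gradient back to the input
            for j, wij in enumerate(row):
                grad[j] += w * wij
    return [abs(g) for g in grad]

def top_k_intersection(s1, s2, k):
    """Fraction of the k most salient features that two saliency maps share."""
    top1 = set(sorted(range(len(s1)), key=lambda i: -s1[i])[:k])
    top2 = set(sorted(range(len(s2)), key=lambda i: -s2[i])[:k])
    return len(top1 & top2) / k

# Hypothetical random weights standing in for a trained image classifier.
rng = random.Random(0)
d, h = 8, 16
W1 = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(h)]
b1 = [rng.gauss(0, 1) for _ in range(h)]
w2 = [rng.gauss(0, 1) for _ in range(h)]

x = [rng.gauss(0, 1) for _ in range(d)]
x_pert = [xi + rng.gauss(0, 0.05) for xi in x]  # small random perturbation

z, _ = forward(x, W1, b1, w2)
z_pert, _ = forward(x_pert, W1, b1, w2)
same_label = (z > 0) == (z_pert > 0)
overlap = top_k_intersection(saliency(x, W1, b1, w2), saliency(x_pert, W1, b1, w2), k=3)
print(f"label unchanged: {same_label}, top-3 saliency overlap: {overlap:.2f}")
```

A random perturbation like this may or may not move the top features. The paper's stronger result concerns systematically constructed perturbations, which this sketch does not attempt to build.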


## [Artificial Intelligence as Structural Estimation: Economic  Interpretations of Deep Blue, Bonanza, and AlphaGo](https://arxiv.org/abs/1710.10967)
[(PDF)](https://arxiv.org/pdf/1710.10967)

`Authors:Mitsuru Igami`


Subjects:

Econometrics (econ.EM); Artificial Intelligence (cs.AI); Learning (cs.LG)


Cite as:

arXiv:1710.10967 [econ.EM]

(or arXiv:1710.10967v2 [econ.EM] for this version)


> Abstract: Artificial intelligence (AI) has achieved superhuman performance in a growing
number of tasks, including the classical games of chess, shogi, and Go, but
understanding and explaining AI remain challenging. This paper studies the
machine-learning algorithms for developing the game AIs, and provides their
structural interpretations. Specifically, chess-playing Deep Blue is a
calibrated value function, whereas shogi-playing Bonanza represents an
estimated value function via Rust's (1987) nested fixed-point method. AlphaGo's
"supervised-learning policy network" is a deep neural network (DNN) version of
Hotz and Miller's (1993) conditional choice probability estimates; its
"reinforcement-learning value network" is equivalent to Hotz, Miller, Sanders,
and Smith's (1994) simulation method for estimating the value function. Their
performances suggest DNNs are a useful functional form when the state space is
large and data are sparse. Explicitly incorporating strategic interactions and
unobserved heterogeneity in the data-generating process would further improve
AIs' explicability.


## [Contextual Regression: An Accurate and Conveniently Interpretable  Nonlinear Model for Mining Discovery from Scientific Data](https://arxiv.org/abs/1710.10728)
[(PDF)](https://arxiv.org/pdf/1710.10728)

`Authors:Chengyu Liu, Wei Wang`


Comments:

18 pages of Main Article, 30 pages of Supplementary Material

Subjects:

Quantitative Methods (q-bio.QM); Learning (cs.LG); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)


Cite as:

arXiv:1710.10728 [q-bio.QM]

(or arXiv:1710.10728v1 [q-bio.QM] for this version)


> Abstract: Machine learning algorithms such as linear regression, SVM and neural network
have played an increasingly important role in the process of scientific
discovery. However, none of them is both interpretable and accurate on
nonlinear datasets. Here we present contextual regression, a method that joins
these two desirable properties together using a hybrid architecture of neural
network embedding and dot product layer. We demonstrate its high prediction
accuracy and sensitivity through the task of predictive feature selection on a
simulated dataset and the application of predicting open chromatin sites in the
human genome. On the simulated data, our method achieved high fidelity recovery
of feature contributions under random noise levels up to 200%. On the open
chromatin dataset, the application of our method not only outperformed the
state of the art method in terms of accuracy, but also unveiled two previously
unfound open chromatin related histone marks. Our method can fill the blank of
accurate and interpretable nonlinear modeling in scientific data mining tasks.
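
The "neural network embedding plus dot product" idea can be sketched under the simplest reading of the abstract. In this reading, a small network g maps the input x to a context-dependent weight vector, and the prediction is the dot product g(x)·x. Each term g_j(x)·x_j can then be read as feature j's contribution. The single tanh layer, its untrained weights, and the names `context_weights` and `predict` are hypothetical. This is not the authors' architecture or training procedure.

```python
import math

def context_weights(x, V, c):
    """Hypothetical context network: one tanh layer mapping x to a weight per feature."""
    return [math.tanh(sum(vij * xj for vij, xj in zip(row, x)) + cj)
            for row, cj in zip(V, c)]

def predict(x, V, c):
    """Contextual-regression-style output: dot product of context weights and the input."""
    w = context_weights(x, V, c)
    contributions = [wj * xj for wj, xj in zip(w, x)]
    return sum(contributions), contributions

# Illustrative, untrained parameters for a 3-feature input.
V = [[0.5, -0.2, 0.1],
     [0.0, 0.3, -0.4],
     [0.2, 0.2, 0.2]]
c = [0.1, -0.1, 0.0]
x = [1.0, 2.0, -1.0]

y, contrib = predict(x, V, c)
for j, cj in enumerate(contrib):
    print(f"feature {j}: contribution {cj:+.3f}")
print(f"prediction: {y:.3f}")
```

The weights depend on x, so the model is nonlinear in its input. Every individual prediction still decomposes exactly into per-feature contributions, and that decomposition is the interpretability property the abstract emphasizes.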


## [Building Data-driven Models with Microstructural Images: Generalization  and Interpretability](https://arxiv.org/abs/1711.00404)
[(PDF)](https://arxiv.org/pdf/1711.00404)

`Authors:Julia Ling, Maxwell Hutchinson, Erin Antono, Brian DeCost, Elizabeth A. Holm, Bryce Meredig`


Subjects:

Artificial Intelligence (cs.AI); Materials Science (cond-mat.mtrl-sci)


Cite as:

arXiv:1711.00404 [cs.AI]

(or arXiv:1711.00404v1 [cs.AI] for this version)


> Abstract: As data-driven methods rise in popularity in materials science applications,
a key question is how these machine learning models can be used to understand
microstructure. Given the importance of process-structure-property relations
throughout materials science, it seems logical that models that can leverage
microstructural data would be more capable of predicting property information.
While there have been some recent attempts to use convolutional neural networks
to understand microstructural images, these early studies have focused only on
which featurizations yield the highest machine learning model accuracy for a
single data set. This paper explores the use of convolutional neural networks
for classifying microstructure with a more holistic set of objectives in mind:
generalization between data sets, number of features required, and
interpretability.


## [Interpretable Feature Recommendation for Signal Analytics](https://arxiv.org/abs/1711.01870)
[(PDF)](https://arxiv.org/pdf/1711.01870)

`Authors:Snehasis Banerjee, Tanushyam Chattopadhyay, Ayan Mukherjee`


Comments:

4 pages, Interpretable Data Mining Workshop, CIKM 2017

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1711.01870 [stat.ML]

(or arXiv:1711.01870v1 [stat.ML] for this version)


> Abstract: This paper presents an automated approach for interpretable feature
recommendation for solving signal data analytics problems. The method has been
tested by performing experiments on datasets in the domain of prognostics where
interpretation of features is considered very important. The proposed approach
is based on Wide Learning architecture and provides means for interpretation of
the recommended features. It is to be noted that such an interpretation is not
available with feature learning approaches like Deep Learning (such as
Convolutional Neural Network) or feature transformation approaches like
Principal Component Analysis. Results show that the feature recommendation and
interpretation techniques are quite effective for the problems at hand in terms
of performance and drastic reduction in time to develop a solution. It is
further shown by an example, how this human-in-loop interpretation system can
be used as a prescriptive system.


## [Semantic Structure and Interpretability of Word Embeddings](https://arxiv.org/abs/1711.00331)
[(PDF)](https://arxiv.org/pdf/1711.00331)

`Authors:Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, Tolga Cukur`


Comments:

10 Pages, 7 Figures

Subjects:

Computation and Language (cs.CL)


Cite as:

arXiv:1711.00331 [cs.CL]

(or arXiv:1711.00331v2 [cs.CL] for this version)


> Abstract: Dense word embeddings, which encode semantic meanings of words to low
dimensional vector spaces have become very popular in natural language
processing (NLP) research due to their state-of-the-art performances in many
NLP tasks. Word embeddings are substantially successful in capturing semantic
relations among words, so a meaningful semantic structure must be present in
the respective vector spaces. However, in many cases, this semantic structure
is broadly and heterogeneously distributed across the embedding dimensions,
which makes interpretation a big challenge. In this study, we propose a
statistical method to uncover the latent semantic structure in the dense word
embeddings. To perform our analysis we introduce a new dataset (SEMCAT) that
contains more than 6500 words semantically grouped under 110 categories. We
further propose a method to quantify the interpretability of the word
embeddings; the proposed method is a practical alternative to the classical
word intrusion test that requires human intervention.


## [Interpretable and Pedagogical Examples](https://arxiv.org/abs/1711.00694)
[(PDF)](https://arxiv.org/pdf/1711.00694)

`Authors:Smitha Milli, Pieter Abbeel, Igor Mordatch`


Subjects:

Artificial Intelligence (cs.AI)


Cite as:

arXiv:1711.00694 [cs.AI]

(or arXiv:1711.00694v1 [cs.AI] for this version)


> Abstract: Teachers intentionally pick the most informative examples to show their
students. However, if the teacher and student are neural networks, the examples
that the teacher network learns to give, although effective at teaching the
student, are typically uninterpretable. We show that training the student and
teacher iteratively, rather than jointly, can produce interpretable teaching
strategies. We evaluate interpretability by (1) measuring the similarity of the
teacher's emergent strategies to intuitive strategies in each domain and (2)
conducting human experiments to evaluate how effective the teacher's strategies
are at teaching humans. We show that the teacher network learns to select or
generate interpretable, pedagogical examples to teach rule-based,
probabilistic, boolean, and hierarchical concepts.


## [Unsupervised patient representations from clinical notes with  interpretable classification decisions](https://arxiv.org/abs/1711.05198)
[(PDF)](https://arxiv.org/pdf/1711.05198)

`Authors:Madhumita Sushil, Simon Šuster, Kim Luyckx, Walter Daelemans`


Comments:

Accepted poster at NIPS 2017 Workshop on Machine Learning for Health

Subjects:

Computation and Language (cs.CL)


Cite as:

arXiv:1711.05198 [cs.CL]

(or arXiv:1711.05198v1 [cs.CL] for this version)


> Abstract: We have two main contributions in this work: 1. We explore the usage of a
stacked denoising autoencoder, and a paragraph vector model to learn
task-independent dense patient representations directly from clinical notes. We
evaluate these representations by using them as features in multiple supervised
setups, and compare their performance with those of sparse representations. 2.
To understand and interpret the representations, we explore the best encoded
features within the patient representations obtained from the autoencoder
model. Further, we calculate the significance of the input features of the
trained classifiers when we use these pretrained representations as input.


## [Interpreting Convolutional Neural Networks Through Compression](https://arxiv.org/abs/1711.02329)
[(PDF)](https://arxiv.org/pdf/1711.02329)

`Authors:Reza Abbasi-Asl, Bin Yu`


Comments:

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)


Cite as:

arXiv:1711.02329 [stat.ML]

(or arXiv:1711.02329v1 [stat.ML] for this version)


> Abstract: Convolutional neural networks (CNNs) achieve state-of-the-art performance in
a wide variety of tasks in computer vision. However, interpreting CNNs still
remains a challenge. This is mainly due to the large number of parameters in
these networks. Here, we investigate the role of compression and particularly
pruning filters in the interpretation of CNNs. We exploit our recently-proposed
greedy structural compression scheme that prunes filters in a trained CNN. In
our compression, the filter importance index is defined as the classification
accuracy reduction (CAR) of the network after pruning that filter. The filters
are then iteratively pruned based on the CAR index. We demonstrate the
interpretability of CAR-compressed CNNs by showing that our algorithm prunes
filters with visually redundant pattern selectivity. Specifically, we show the
importance of shape-selective filters for object recognition, as opposed to
color-selective filters. Out of the top 20 CAR-pruned filters in AlexNet, 17 of
them in the first layer and 14 of them in the second layer are color-selective
filters. Finally, we introduce a variant of our CAR importance index that
quantifies the importance of each image class to each CNN filter. We show that
the most and the least important class labels present a meaningful
interpretation of each filter that is consistent with the visualized pattern
selectivity of that filter.
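
Below is a rough Python sketch of the greedy CAR loop described above. `evaluate` is a hypothetical callback. It stands in for running the network with only the given filters on a validation set and returning the accuracy. The sketch leaves out any retraining the paper may do between pruning steps.

```python
from typing import Callable, FrozenSet, List


def car_prune(filters: List[int],
              evaluate: Callable[[FrozenSet[int]], float],
              n_prune: int) -> List[int]:
    """Greedily prune the filter whose removal costs the least accuracy.

    `evaluate` (hypothetical) returns validation accuracy of the network
    when only the filters in its argument are kept active.
    """
    active = set(filters)
    pruned: List[int] = []
    for _ in range(n_prune):
        base = evaluate(frozenset(active))
        # CAR index: classification accuracy reduction caused by removing one filter.
        car = {f: base - evaluate(frozenset(active - {f})) for f in active}
        victim = min(car, key=car.get)  # the least important remaining filter
        active.remove(victim)
        pruned.append(victim)
    return pruned
```

As an example, take a made-up additive accuracy `weights = {0: 0.5, 1: 0.1, 2: 0.3, 3: 0.05}`. Then `car_prune([0, 1, 2, 3], lambda a: sum(weights[f] for f in a), 2)` removes filter 3 and then filter 1. These are the two filters whose removal costs the least accuracy.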


## [Interpretable probabilistic embeddings: bridging the gap between topic  models and neural networks](https://arxiv.org/abs/1711.04154)
[(PDF)](https://arxiv.org/pdf/1711.04154)

`Authors:Anna Potapenko, Artem Popov, Konstantin Vorontsov`


Comments:

Appeared in AINL-2017

Subjects:

Computation and Language (cs.CL)


Cite as:

arXiv:1711.04154 [cs.CL]

 
(or arXiv:1711.04154v1 [cs.CL] for this version)


> Abstract: We consider probabilistic topic models and more recent word embedding
techniques from a perspective of learning hidden semantic representations.
Inspired by a striking similarity of the two approaches, we merge them and
learn probabilistic embeddings with online EM-algorithm on word co-occurrence
data. The resulting embeddings perform on par with Skip-Gram Negative Sampling
(SGNS) on word similarity tasks and benefit in the interpretability of the
components. Next, we learn probabilistic document embeddings that outperform
paragraph2vec on a document similarity task and require less memory and time
for training. Finally, we employ multimodal Additive Regularization of Topic
Models (ARTM) to obtain a high sparsity and learn embeddings for other
modalities, such as timestamps and categories. We observe further improvement
of word similarity performance and meaningful inter-modality similarities.
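
The factorization behind probabilistic embeddings of this kind can be sketched as PLSA-style EM on a word/context co-occurrence matrix. The model is `p(w | u) = sum_t phi[w, t] * theta[t, u]`. This is a simplified batch version written for illustration only. The paper uses an online EM algorithm and ARTM regularizers, and neither is shown here. The name `plsa_em` is hypothetical.

```python
import numpy as np


def plsa_em(counts: np.ndarray, n_topics: int, iters: int = 50, seed: int = 0):
    """Batch EM for p(w | u) = sum_t phi[w, t] * theta[t, u].

    counts: (W, U) matrix n_wu of how often word w co-occurs with context u.
    Returns phi (W, T), theta (T, U) and the log-likelihood after each iteration.
    """
    rng = np.random.default_rng(seed)
    n_words, n_ctx = counts.shape
    phi = rng.random((n_words, n_topics))
    phi /= phi.sum(axis=0)                       # each topic is a distribution over words
    theta = rng.random((n_topics, n_ctx))
    theta /= theta.sum(axis=0)                   # each context is a distribution over topics
    history = []
    for _ in range(iters):
        p = phi @ theta                          # model probabilities p(w | u)
        ratio = counts / np.maximum(p, 1e-12)    # n_wu / p(w | u)
        # E- and M-steps fused: expected topic counts n_wt and n_tu.
        n_wt = phi * (ratio @ theta.T)
        n_tu = theta * (phi.T @ ratio)
        phi = n_wt / np.maximum(n_wt.sum(axis=0), 1e-12)
        theta = n_tu / np.maximum(n_tu.sum(axis=0), 1e-12)
        loglik = np.sum(counts * np.log(np.maximum(phi @ theta, 1e-12)))
        history.append(float(loglik))
    return phi, theta, history
```

Each iteration is an exact EM step, so the recorded log-likelihood should not decrease, apart from floating-point noise. Any sparsity would have to come from extra regularizers that this loop does not include.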


## [Arrhythmia Classification from the Abductive Interpretation of Short  Single-Lead ECG Records](https://arxiv.org/abs/1711.03892)
[(PDF)](https://arxiv.org/pdf/1711.03892)

`Authors:Tomás Teijeiro, Constantino A. García, Daniel Castro, Paulo Félix`


Comments:

4 pages, 3 figures. Presented in the Computing in Cardiology 2017 conference

Subjects:

Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)


MSC classes:

68T10


Cite as:

arXiv:1711.03892 [cs.AI]

 
(or arXiv:1711.03892v1 [cs.AI] for this version)


> Abstract: In this work we propose a new method for the rhythm classification of short
single-lead ECG records, using a set of high-level and clinically meaningful
features provided by the abductive interpretation of the records. These
features include morphological and rhythm-related features that are used to
build two classifiers: one that evaluates the record globally, using aggregated
values for each feature; and another one that evaluates the record as a
sequence, using a Recurrent Neural Network fed with the individual features for
each detected heartbeat. The two classifiers are finally combined using the
stacking technique, providing an answer by means of four target classes: Normal
sinus rhythm, Atrial fibrillation, Other anomaly, and Noisy. The approach has
been validated against the 2017 Physionet/CinC Challenge dataset, obtaining a
final score of 0.83 and ranking first in the competition.


## [Interpretable R-CNN](https://arxiv.org/abs/1711.05226)
[(PDF)](https://arxiv.org/pdf/1711.05226)

`Authors:Tianfu Wu, Xilai Li, Xi Song, Wei Sun, Liang Dong, Bo Li`


Comments:

13 pages

Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1711.05226 [cs.CV]

 
(or arXiv:1711.05226v1 [cs.CV] for this version)


> Abstract: This paper presents a method of learning qualitatively interpretable models
in object detection using popular two-stage region-based ConvNet detection
systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI
(Region-of-Interest) prediction network. By interpretable models, we focus on
weakly-supervised extractive rationale generation, that is learning to unfold
latent discriminative part configurations of object instances automatically and
simultaneously in detection without using any supervision for part
configurations. We utilize a top-down hierarchical and compositional grammar
model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold
the space of latent part configurations of RoIs. We propose an AOGParsing
operator to substitute the RoIPooling operator widely used in R-CNN, so the
proposed method is applicable to many state-of-the-art ConvNet based detection
systems. The AOGParsing operator aims to harness both the explainable rigor of
top-down hierarchical and compositional grammar models and the discriminative
power of bottom-up deep neural networks through end-to-end training. In
detection, a bounding box is interpreted by the best parse tree derived from
the AOG on-the-fly, which is treated as the extractive rationale generated for
interpreting detection. In learning, we propose a folding-unfolding method to
train the AOG and ConvNet end-to-end. In experiments, we build on top of the
R-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets
with performance comparable to state-of-the-art methods.


## [Interpreting Deep Visual Representations via Network Dissection](https://arxiv.org/abs/1711.05611)
[(PDF)](https://arxiv.org/pdf/1711.05611)

`Authors:Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba`


Comments:

*B. Zhou and D. Bau contributed equally to this work. 15 pages, 27 figures

Subjects:

Computer Vision and Pattern Recognition (cs.CV)


ACM classes:

I.2.10


Cite as:

arXiv:1711.05611 [cs.CV]

 
(or arXiv:1711.05611v1 [cs.CV] for this version)


> Abstract: The success of recent deep convolutional neural networks (CNNs) depends on
learning hidden representations that can summarize the important factors of
variation behind the data. However, CNNs are often criticized as being black boxes
that lack interpretability, since they have millions of unexplained model
parameters. In this work, we describe Network Dissection, a method that
interprets networks by providing labels for the units of their deep visual
representations. The proposed method quantifies the interpretability of CNN
representations by evaluating the alignment between individual hidden units and
a set of visual semantic concepts. By identifying the best alignments, units
are given human interpretable labels across a range of objects, parts, scenes,
textures, materials, and colors. The method reveals that deep representations
are more transparent and interpretable than expected: we find that
representations are significantly more interpretable than they would be under a
random equivalently powerful basis. We apply the method to interpret and
compare the latent representations of various network architectures trained to
solve different supervised and self-supervised training tasks. We then examine
factors affecting the network interpretability such as the number of the
training iterations, regularizations, different initializations, and the
network depth and width. Finally we show that the interpreted units can be used
to provide explicit explanations of a prediction given by a CNN for an image.
Our results highlight that interpretability is an important property of deep
neural networks that provides new insights into their hierarchical structure.
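
The alignment score between one unit and one concept can be sketched as an intersection-over-union. The unit's activation maps are thresholded at a high dataset-wide quantile and then compared with the concept's segmentation masks. This sketch assumes the activation maps are already upsampled to mask resolution. The default `top_frac=0.005` is an assumption, not a value taken from this listing.

```python
import numpy as np


def unit_concept_iou(acts: np.ndarray, masks: np.ndarray,
                     top_frac: float = 0.005) -> float:
    """IoU between a unit's thresholded activation maps and a concept's masks.

    acts:  (N, H, W) activation maps of one unit, upsampled to mask size.
    masks: (N, H, W) boolean segmentation masks of one concept.
    """
    # Dataset-wide threshold: only the top `top_frac` of activations exceed it.
    threshold = np.quantile(acts, 1.0 - top_frac)
    unit_mask = acts > threshold
    intersection = np.logical_and(unit_mask, masks).sum()
    union = np.logical_or(unit_mask, masks).sum()
    return float(intersection / union) if union else 0.0
```

A unit would then be labeled with whichever concept gives the highest IoU.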


## [Interpretable classifiers using rules and Bayesian analysis: Building a  better stroke prediction model](https://arxiv.org/abs/1511.01644)
[(PDF)](https://arxiv.org/pdf/1511.01644)

`Authors:Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, David Madigan`


Comments:

Published in the Annals of Applied Statistics by the Institute of Mathematical Statistics

Subjects:

Applications (stat.AP); Learning (cs.LG); Machine Learning (stat.ML)


Journal reference:

Annals of Applied Statistics 2015, Vol. 9, No. 3, 1350-1371


DOI:

10.1214/15-AOAS848


Report number:

IMS-AOAS-AOAS848


Cite as:

arXiv:1511.01644 [stat.AP]

 
(or arXiv:1511.01644v1 [stat.AP] for this version)


> Abstract: We aim to produce predictive models that are not only accurate, but are also
interpretable to human experts. Our models are decision lists, which consist of
a series of if...then... statements (e.g., if high blood pressure, then stroke)
that discretize a high-dimensional, multivariate feature space into a series of
simple, readily interpretable decision statements. We introduce a generative
model called Bayesian Rule Lists that yields a posterior distribution over
possible decision lists. It employs a novel prior structure to encourage
sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy
on par with the current top algorithms for prediction in machine learning. Our
method is motivated by recent developments in personalized medicine, and can be
used to produce highly accurate and interpretable medical scoring systems. We
demonstrate this by producing an alternative to the CHADS$_2$ score, actively
used in clinical practice for estimating the risk of stroke in patients that
have atrial fibrillation. Our model is as interpretable as CHADS$_2$, but more
accurate.
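
Using a decision list at prediction time is simple: walk the rules from top to bottom and answer with the first one that fires. The sketch below covers only that evaluation step. It does not cover Bayesian Rule Lists' posterior inference over lists. The rules and risk values are hypothetical, made up for illustration, and not fitted by BRL.

```python
from typing import Callable, Dict, List, Tuple

Patient = Dict[str, float]
# (human-readable condition, test on a patient, predicted stroke risk)
Rule = Tuple[str, Callable[[Patient], bool], float]


def predict_risk(rules: List[Rule], default_risk: float,
                 patient: Patient) -> Tuple[str, float]:
    """Return the first rule whose condition holds, plus its risk."""
    for name, condition, risk in rules:
        if condition(patient):
            return name, risk
    return "else", default_risk


# Hypothetical rules and risks, for illustration only.
rules: List[Rule] = [
    ("high blood pressure and age > 75",
     lambda p: p["high_bp"] == 1 and p["age"] > 75, 0.30),
    ("high blood pressure", lambda p: p["high_bp"] == 1, 0.15),
]
```

Returning the matched rule's text along with the risk is what keeps the prediction readable: a clinician sees exactly which if...then... statement produced it.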


## [Interpretable Deep Neural Networks for Single-Trial EEG Classification](https://arxiv.org/abs/1604.08201)
[(PDF)](https://arxiv.org/pdf/1604.08201)

`Authors:Irene Sturm, Sebastian Bach, Wojciech Samek, Klaus-Robert Müller`


Comments:

5 pages, 1 figure

Subjects:

Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)


Cite as:

arXiv:1604.08201 [cs.NE]

 
(or arXiv:1604.08201v1 [cs.NE] for this version)


> Abstract: Background: In cognitive neuroscience the potential of Deep Neural Networks
(DNNs) for solving complex classification tasks is yet to be fully exploited.
The most limiting factor is that DNNs as notorious 'black boxes' do not provide
insight into neurophysiological phenomena underlying a decision. Layer-wise
Relevance Propagation (LRP) has been introduced as a novel method to explain
individual network decisions. New Method: We propose the application of DNNs
with LRP for the first time for EEG data analysis. Through LRP the single-trial
DNN decisions are transformed into heatmaps indicating each data point's
relevance for the outcome of the decision. Results: DNN achieves classification
accuracies comparable to those of CSP-LDA. In subjects with low performance,
subject-to-subject transfer of trained DNNs can improve the results. The
single-trial LRP heatmaps reveal neurophysiologically plausible patterns,
resembling CSP-derived scalp maps. Critically, while CSP patterns represent
class-wise aggregated information, LRP heatmaps pinpoint neural patterns to
single time points in single trials. Comparison with Existing Method(s): We
compare the classification performance of DNNs to that of linear CSP-LDA on two
data sets related to motor-imagery BCI. Conclusion: We have demonstrated that
DNN is a powerful non-linear tool for EEG analysis. With LRP a new quality of
high-resolution assessment of neural activity can be reached. LRP is a
potential remedy for the lack of interpretability of DNNs that has limited
their utility in neuroscientific applications. The extreme specificity of the
LRP-derived heatmaps opens up new avenues for investigating neural activity
underlying complex perception or decision-related processes.
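
The sketch below shows how LRP redistributes an output score back onto the inputs. It applies the common epsilon rule to a stack of bias-free dense layers. This is a generic illustration, not the network or LRP variant used in the paper. With zero biases and a small `eps`, the input relevances sum approximately to the explained output score. This "conservation" property is what makes the heatmaps additive.

```python
import numpy as np


def lrp_epsilon(weights, activations, relevance_out, eps=1e-6):
    """Backpropagate relevance through dense layers with the LRP-epsilon rule.

    weights:       list of (n_in, n_out) matrices; biases are assumed to be zero.
    activations:   list of layer inputs, one per weight matrix.
    relevance_out: relevance at the output (e.g. only the target logit kept).
    """
    relevance = relevance_out
    for W, a in zip(reversed(weights), reversed(activations)):
        z = a @ W                                   # pre-activations of the upper layer
        z = z + eps * np.where(z >= 0, 1.0, -1.0)   # stabilizer avoids division by zero
        s = relevance / z
        relevance = a * (W @ s)                     # R_i = a_i * sum_j w_ij * R_j / z_j
    return relevance
```

Usage: run a forward pass `h = np.maximum(0, x @ W1)`, `y = h @ W2`. Keep only the target logit in `R_out`, then call `lrp_epsilon([W1, W2], [x, h], R_out)` to get one relevance value per input feature. For EEG, that would be one value per channel and time point.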


## [InfoGAN: Interpretable Representation Learning by Information Maximizing  Generative Adversarial Nets](https://arxiv.org/abs/1606.03657)
[(PDF)](https://arxiv.org/pdf/1606.03657)

`Authors:Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel`


Subjects:

Learning (cs.LG); Machine Learning (stat.ML)


Cite as:

arXiv:1606.03657 [cs.LG]

 
(or arXiv:1606.03657v1 [cs.LG] for this version)


> Abstract: This paper describes InfoGAN, an information-theoretic extension to the
Generative Adversarial Network that is able to learn disentangled
representations in a completely unsupervised manner. InfoGAN is a generative
adversarial network that also maximizes the mutual information between a small
subset of the latent variables and the observation. We derive a lower bound to
the mutual information objective that can be optimized efficiently, and show
that our training procedure can be interpreted as a variation of the Wake-Sleep
algorithm. Specifically, InfoGAN successfully disentangles writing styles from
digit shapes on the MNIST dataset, pose from lighting of 3D rendered images,
and background digits from the central digit on the SVHN dataset. It also
discovers visual concepts that include hair styles, presence/absence of
eyeglasses, and emotions on the CelebA face dataset. Experiments show that
InfoGAN learns interpretable representations that are competitive with
representations learned by existing fully supervised methods.
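
For a categorical latent code, the mutual-information term can be estimated by Monte Carlo. It takes the form `E[log Q(c | x)] + H(c)`, which lower-bounds `I(c; G(z, c))`. The sketch below assumes you already have the auxiliary network Q's class probabilities for generated samples. The function name and inputs are hypothetical, and the prior must be strictly positive.

```python
import numpy as np


def info_lower_bound(q_probs: np.ndarray, codes: np.ndarray,
                     prior: np.ndarray) -> float:
    """Monte Carlo estimate of E[log Q(c | x)] + H(c).

    q_probs: (N, K) Q's predicted class probabilities for N generated samples.
    codes:   (N,)   the categorical codes c fed to the generator for those samples.
    prior:   (K,)   the prior P(c) the codes were drawn from (all entries > 0).
    """
    log_q = np.log(q_probs[np.arange(len(codes)), codes] + 1e-12)
    entropy = -np.sum(prior * np.log(prior))
    return float(log_q.mean() + entropy)
```

The estimate is bounded above by `H(c)`, which is `log K` for a uniform prior. It reaches that value when Q recovers every code exactly. It falls to about zero when Q's guesses are no better than the prior.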


## [The Mythos of Model Interpretability](https://arxiv.org/abs/1606.03490)
[(PDF)](https://arxiv.org/pdf/1606.03490)

`Authors:Zachary C. Lipton`


Comments:

presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)


Cite as:

arXiv:1606.03490 [cs.LG]

 
(or arXiv:1606.03490v3 [cs.LG] for this version)


> Abstract: Supervised machine learning models boast remarkable predictive capabilities.
But can you trust your model? Will it work in deployment? What else can it tell
you about the world? We want models to be not only good, but interpretable. And
yet the task of interpretation appears underspecified. Papers provide diverse
and sometimes non-overlapping motivations for interpretability, and offer
myriad notions of what attributes render models interpretable. Despite this
ambiguity, many papers proclaim interpretability axiomatically, absent further
explanation. In this paper, we seek to refine the discourse on
interpretability. First, we examine the motivations underlying interest in
interpretability, finding them to be diverse and occasionally discordant. Then,
we address model properties and techniques thought to confer interpretability,
identifying transparency to humans and post-hoc explanations as competing
notions. Throughout, we discuss the feasibility and desirability of different
notions, and question the oft-made assertions that linear models are
interpretable and that deep neural networks are not.


## [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](https://arxiv.org/abs/1606.05320)
[(PDF)](https://arxiv.org/pdf/1606.05320)

`Authors:Viktoriya Krakovna, Finale Doshi-Velez`


Comments:

presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Computation and Language (cs.CL); Learning (cs.LG)


Cite as:

arXiv:1606.05320 [stat.ML]

 
(or arXiv:1606.05320v2 [stat.ML] for this version)


> Abstract: As deep neural networks continue to revolutionize various application
domains, there is increasing interest in making these powerful models more
understandable and interpretable, and narrowing down the causes of good and bad
predictions. We focus on recurrent neural networks (RNNs), state-of-the-art
models in speech recognition and translation. Our approach to increasing
interpretability is by combining an RNN with a hidden Markov model (HMM), a
simpler and more transparent model. We explore various combinations of RNNs and
HMMs: an HMM trained on LSTM states; a hybrid model where an HMM is trained
first, then a small LSTM is given HMM state distributions and trained to fill
in gaps in the HMM's performance; and a jointly trained hybrid model. We find
that the LSTM and HMM learn complementary information about the features in the
text.


## [Model-Agnostic Interpretability of Machine Learning](https://arxiv.org/abs/1606.05386)
[(PDF)](https://arxiv.org/pdf/1606.05386)

`Authors:Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin`


Comments:

presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1606.05386 [stat.ML]

 
(or arXiv:1606.05386v1 [stat.ML] for this version)


> Abstract: Understanding why machine learning models behave the way they do empowers
both system designers and end-users in many ways: in model selection, feature
engineering, in order to trust and act upon the predictions, and in more
intuitive user interfaces. Thus, interpretability has become a vital concern in
machine learning, and work in the area of interpretable models has found
renewed interest. In some applications, such models are as accurate as
non-interpretable ones, and thus are preferred for their transparency. Even
when they are not accurate, they may still be preferred when interpretability
is of paramount importance. However, restricting machine learning to
interpretable models is often a severe limitation. In this paper we argue for
explaining machine learning predictions using model-agnostic approaches. By
treating the machine learning models as black-box functions, these approaches
provide crucial flexibility in the choice of models, explanations, and
representations, improving debugging, comparison, and interfaces for a variety
of users and models. We also outline the main challenges for such methods, and
review a recently-introduced model-agnostic explanation approach (LIME) that
addresses these challenges.
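
The core model-agnostic idea can be sketched as follows. Treat the model as a black-box `predict` function, sample perturbations around one instance, and fit a proximity-weighted linear model whose coefficients serve as the local explanation. This is a simplified LIME-style sketch under stated assumptions: Gaussian perturbations and an exponential kernel of our choosing. It does not use the `lime` package's API.

```python
import numpy as np


def local_surrogate(predict, x, n_samples=500, scale=0.5, kernel_width=1.0, seed=0):
    """Fit a locally weighted linear model around x for a black-box `predict`.

    predict: function mapping an (n, d) array to n real-valued scores.
    Returns (intercept, coefficients) of the weighted least-squares fit.
    """
    rng = np.random.default_rng(seed)
    # Perturb the instance with Gaussian noise (a simplification of LIME's sampling).
    X = x + scale * rng.normal(size=(n_samples, x.size))
    y = predict(X)
    # Exponential kernel: perturbations close to x get more weight.
    dist = np.linalg.norm(X - x, axis=1)
    w = np.exp(-dist ** 2 / kernel_width ** 2)
    # Weighted least squares by scaling each row by sqrt(weight).
    A = np.hstack([np.ones((n_samples, 1)), X - x])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta[0], beta[1:]
```

For a black box that is already linear, the surrogate recovers its coefficients exactly. For a nonlinear model, the coefficients describe its behaviour near `x` only.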


## [Learning Interpretable Musical Compositional Rules and Traces](https://arxiv.org/abs/1606.05572)
[(PDF)](https://arxiv.org/pdf/1606.05572)

`Authors:Haizi Yu, Lav R. Varshney, Guy E. Garnett, Ranjitha Kumar`


Comments:

presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1606.05572 [stat.ML]

 
(or arXiv:1606.05572v1 [stat.ML] for this version)


> Abstract: Throughout music history, theorists have identified and documented
interpretable rules that capture the decisions of composers. This paper asks,
"Can a machine behave like a music theorist?" It presents MUS-ROVER, a
self-learning system for automatically discovering rules from symbolic music.
MUS-ROVER performs feature learning via $n$-gram models to extract
compositional rules --- statistical patterns over the resulting features. We
evaluate MUS-ROVER on Bach's (SATB) chorales, demonstrating that it can recover
known rules, as well as identify new, characteristic patterns for further
study. We discuss how the extracted rules can be used in both machine and human
composition.


## [Building an Interpretable Recommender via Loss-Preserving Transformation](https://arxiv.org/abs/1606.05819)
[(PDF)](https://arxiv.org/pdf/1606.05819)

`Authors:Amit Dhurandhar, Sechan Oh, Marek Petrik`


Comments:

Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1606.05819 [stat.ML]

 
(or arXiv:1606.05819v1 [stat.ML] for this version)


> Abstract: We propose a method for building an interpretable recommender system for
personalizing online content and promotions. Historical data available for the
system consists of customer features, provided content (promotions), and user
responses. Unlike in a standard multi-class classification setting,
misclassification costs depend on both recommended actions and customers. Our
method transforms such a data set to a new set which can be used with standard
interpretable multi-class classification algorithms. The transformation has the
desirable property that minimizing the standard misclassification penalty in
this new space is equivalent to minimizing the custom cost function.


## [Using Visual Analytics to Interpret Predictive Machine Learning Models](https://arxiv.org/abs/1606.05685)
[(PDF)](https://arxiv.org/pdf/1606.05685)

`Authors:Josua Krause, Adam Perer, Enrico Bertini`


Comments:

presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1606.05685 [stat.ML]

 
(or arXiv:1606.05685v2 [stat.ML] for this version)


> Abstract: It is commonly believed that increasing the interpretability of a machine
learning model may decrease its predictive power. However, inspecting
input-output relationships of those models using visual analytics, while
treating them as black-box, can help to understand the reasoning behind
outcomes without sacrificing predictive quality. We identify a space of
possible solutions and provide two examples of where such techniques have been
successfully used in practice.


## [Interpretable Machine Learning Models for the Digital Clock Drawing Test](https://arxiv.org/abs/1606.07163)
[(PDF)](https://arxiv.org/pdf/1606.07163)

`Authors:William Souillard-Mandar, Randall Davis, Cynthia Rudin, Rhoda Au, Dana Penney`


Comments:

Presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1606.07163 [stat.ML]

 
(or arXiv:1606.07163v1 [stat.ML] for this version)


> Abstract: The Clock Drawing Test (CDT) is a rapid, inexpensive, and popular
neuropsychological screening tool for cognitive conditions. The Digital Clock
Drawing Test (dCDT) uses novel software to analyze data from a digitizing
ballpoint pen that reports its position with considerable spatial and temporal
precision, making possible the analysis of both the drawing process and final
product. We developed methodology to analyze pen stroke data from these
drawings, and computed a large collection of features which were then analyzed
with a variety of machine learning techniques. The resulting scoring systems
were designed to be more accurate than the systems currently used by
clinicians, but just as interpretable and easy to use. The systems also allow
us to quantify the tradeoff between accuracy and interpretability. We created
automated versions of the CDT scoring systems currently used by clinicians,
allowing us to benchmark our models, which indicated that our machine learning
models substantially outperformed the existing scoring systems.


## [SnapToGrid: From Statistical to Interpretable Models for Biomedical  Information Extraction](https://arxiv.org/abs/1606.09604)
[(PDF)](https://arxiv.org/pdf/1606.09604)

`Authors:Marco A. Valenzuela-Escarcega, Gus Hahn-Powell, Dane Bell, Mihai Surdeanu`


Subjects:

Computation and Language (cs.CL)


Cite as:

arXiv:1606.09604 [cs.CL]

 
(or arXiv:1606.09604v1 [cs.CL] for this version)


> Abstract: We propose an approach for biomedical information extraction that marries the
advantages of machine learning models, e.g., learning directly from data, with
the benefits of rule-based approaches, e.g., interpretability. Our approach
starts by training a feature-based statistical model, then converts this model
to a rule-based variant by converting its features to rules, and "snapping to
grid" the feature weights to discrete votes. In doing so, our proposal takes
advantage of the large body of work in machine learning, but it produces an
interpretable model, which can be directly edited by experts. We evaluate our
approach on the BioNLP 2009 event extraction task. Our results show that there
is a small performance penalty when converting the statistical model to rules,
but the gain in interpretability compensates for that: with minimal effort,
human experts improve this model to have similar performance to the statistical
model that served as starting point.
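
One plausible reading of snapping weights "to grid" is to rescale the real-valued weights so the largest magnitude becomes a small integer, then round. Features that round to zero are dropped, and the remaining features become rules that cast votes. The sketch below is a hypothetical illustration of that idea, not the paper's exact procedure. The feature names and `max_votes` are made up.

```python
from typing import Dict, Iterable


def snap_to_votes(weights: Dict[str, float], max_votes: int = 3) -> Dict[str, int]:
    """Map real-valued feature weights to small integer votes.

    The largest |weight| becomes +/- max_votes; features that round to 0 are dropped.
    """
    top = max(abs(w) for w in weights.values())
    votes = {f: round(w / top * max_votes) for f, w in weights.items()}
    return {f: v for f, v in votes.items() if v != 0}


def vote(votes: Dict[str, int], active_features: Iterable[str]) -> int:
    """Sum the votes of the rules (features) that fire; > 0 means 'extract'."""
    return sum(votes.get(f, 0) for f in active_features)
```

Integer votes are what make the resulting model editable by hand: an expert can add, delete, or reweight a single rule without retraining.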


## [Meaningful Models: Utilizing Conceptual Structure to Improve Machine  Learning Interpretability](https://arxiv.org/abs/1607.00279)
[(PDF)](https://arxiv.org/pdf/1607.00279)

`Authors:Nick Condry`


Comments:

5 pages, 3 figures, presented at 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY

Subjects:

Machine Learning (stat.ML); Artificial Intelligence (cs.AI)


Cite as:

arXiv:1607.00279 [stat.ML]

 
(or arXiv:1607.00279v1 [stat.ML] for this version)


> Abstract: The last decade has seen huge progress in the development of advanced machine
learning models; however, those models are powerless unless human users can
interpret them. Here we show how the mind's construction of concepts and
meaning can be used to create more interpretable machine learning models. By
proposing a novel method of classifying concepts, in terms of 'form' and
'function', we elucidate the nature of meaning and offer proposals to improve
model understandability. As machine learning begins to permeate daily life,
interpretable models may serve as a bridge between domain-expert authors and
non-expert users.


## [RETAIN: An Interpretable Predictive Model for Healthcare using Reverse  Time Attention Mechanism](https://arxiv.org/abs/1608.05745)
[(PDF)](https://arxiv.org/pdf/1608.05745)

`Authors:Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. Stewart, Jimeng Sun`


Comments:

Accepted at Neural Information Processing Systems (NIPS) 2016

Subjects:

Learning (cs.LG); Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE)


Cite as:

arXiv:1608.05745 [cs.LG]

 
(or arXiv:1608.05745v4 [cs.LG] for this version)


> Abstract: Accuracy and interpretability are two dominant features of successful
predictive models. Typically, a choice must be made in favor of complex black
box models such as recurrent neural networks (RNN) for accuracy versus less
accurate but more interpretable traditional models such as logistic regression.
This tradeoff poses challenges in medicine where both accuracy and
interpretability are important. We addressed this challenge by developing the
REverse Time AttentIoN model (RETAIN) for application to Electronic Health
Records (EHR) data. RETAIN achieves high accuracy while remaining clinically
interpretable and is based on a two-level neural attention model that detects
influential past visits and significant clinical variables within those visits
(e.g. key diagnoses). RETAIN mimics physician practice by attending to the EHR
data in reverse time order so that recent clinical visits are likely to
receive higher attention. RETAIN was tested on a large health system EHR
dataset with 14 million visits completed by 263K patients over an 8-year period
and demonstrated predictive accuracy and computational scalability comparable
to state-of-the-art methods such as RNN, and ease of interpretability
comparable to traditional models.
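
The two-level attention read-out can be sketched with the two RNNs replaced by given arrays. Visit-level scores are softmaxed into weights `alpha`, and `beta` gates each embedding dimension. Because the final logit is then linear in the visit inputs, it splits exactly into one contribution per variable per visit. The argument names are hypothetical, and the RNNs that would run over visits in reverse time order are omitted.

```python
import numpy as np


def retain_contributions(X, W_emb, e, beta, w_out, b_out):
    """RETAIN-style two-level attention read-out, with the RNNs left out.

    X:      (T, V) visit-by-variable matrix (e.g. diagnosis codes per visit).
    W_emb:  (m, V) embedding matrix; visit i is embedded as W_emb @ X[i].
    e:      (T,)   visit-level attention scores (from a reverse-time RNN in RETAIN).
    beta:   (T, m) variable-level gates (e.g. tanh outputs of a second RNN).
    Returns (logit, contrib), where contrib[i, k] is the part of the logit
    attributed to variable k at visit i.
    """
    alpha = np.exp(e - e.max())
    alpha /= alpha.sum()                              # softmax over visits
    emb = X @ W_emb.T                                 # (T, m) visit embeddings
    context = (alpha[:, None] * beta * emb).sum(axis=0)
    logit = float(w_out @ context + b_out)
    # Linear in X given alpha and beta, so the logit decomposes exactly:
    # contrib[i, k] = alpha_i * w_out . (beta_i * W_emb[:, k]) * X[i, k]
    contrib = alpha[:, None] * ((beta * w_out) @ W_emb) * X
    return logit, contrib
```

Summing `contrib` and adding `b_out` reproduces the logit. This exact decomposition is what lets the attention weights be read as per-visit, per-variable explanations.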


## [Towards Transparent AI Systems: Interpreting Visual Question Answering  Models](https://arxiv.org/abs/1608.08974)
[(PDF)](https://arxiv.org/pdf/1608.08974)

`Authors:Yash Goyal, Akrit Mohapatra, Devi Parikh, Dhruv Batra`


Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)


Cite as:

arXiv:1608.08974 [cs.CV]

 
(or arXiv:1608.08974v2 [cs.CV] for this version)


> Abstract: Deep neural networks have shown striking progress and obtained
state-of-the-art results in many AI research fields in the recent years.
However, it is often unsatisfying to not know why they predict what they do. In
this paper, we address the problem of interpreting Visual Question Answering
(VQA) models. Specifically, we are interested in finding what part of the input
(pixels in images or words in questions) the VQA model focuses on while
answering the question. To tackle this problem, we use two visualization
techniques -- guided backpropagation and occlusion -- to find important words
in the question and important regions in the image. We then present qualitative
and quantitative analyses of these importance maps. We found that even without
explicit attention mechanisms, VQA models may sometimes be implicitly attending
to relevant regions in the image, and often to appropriate words in the
question.
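
Occlusion, one of the two techniques named above, can be sketched as follows. Slide a blank patch over the image and record how much the model's score drops. Larger drops mark regions the answer depends on. This sketch assumes a single-channel image and a `score` callback that wraps the VQA model with the question held fixed. Both are illustrative assumptions.

```python
import numpy as np


def occlusion_map(score, image, patch=2, fill=0.0):
    """Drop in `score` when each patch x patch window of `image` is blanked out.

    score: function mapping an (H, W) array to a scalar, e.g. the model's
           probability for its predicted answer with the question fixed.
    """
    H, W = image.shape
    base = score(image)
    heat = np.zeros((H, W))
    for r in range(0, H, patch):
        for c in range(0, W, patch):
            occluded = image.copy()
            occluded[r:r + patch, c:c + patch] = fill
            heat[r:r + patch, c:c + patch] = base - score(occluded)
    return heat
```

The same loop works for the question side: drop one word at a time instead of one image patch.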


## [Real Time Fine-Grained Categorization with Accuracy and Interpretability](https://arxiv.org/abs/1610.00824)
[(PDF)](https://arxiv.org/pdf/1610.00824)

`Authors:Shaoli Huang, Dacheng Tao`


Comments:

arXiv admin note: text overlap with arXiv:1512.08086

Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1610.00824 [cs.CV]

 
(or arXiv:1610.00824v1 [cs.CV] for this version)


> Abstract: A well-designed fine-grained categorization system usually has three
contradictory requirements: accuracy (the ability to identify objects among
subordinate categories); interpretability (the ability to provide
human-understandable explanation of recognition system behavior); and
efficiency (the speed of the system). To handle the trade-off between accuracy
and interpretability, we propose a novel "Deeper Part-Stacked CNN" architecture
armed with interpretability by modeling subtle differences between object
parts. The proposed architecture consists of a part localization network, a
two-stream classification network that simultaneously encodes object-level and
part-level cues, and a feature vectors fusion component. Specifically, the part
localization network is implemented by exploring a new paradigm for key point
localization that first samples a small number of representable pixels and then
determines their labels via a convolutional layer followed by a softmax layer.
We also use a cropping layer to extract part features and propose a scale
mean-max layer for feature fusion learning. Experimentally, our proposed method
outperforms state-of-the-art approaches in both the part localization task and the
classification task on Caltech-UCSD Birds-200-2011. Moreover, by adopting a set
of sharing strategies between the computation of multiple object parts, our
single model is fairly efficient, running at 32 frames/sec.


## [Interpreting Neural Networks to Improve Politeness Comprehension](https://arxiv.org/abs/1610.02683)
[(PDF)](https://arxiv.org/pdf/1610.02683)

`Authors:Malika Aubakirova, Mohit Bansal`


Comments:

To appear at EMNLP 2016

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI)


Cite as:

arXiv:1610.02683 [cs.CL]

 
(or arXiv:1610.02683v1 [cs.CL] for this version)


> Abstract: We present an interpretable neural network approach to predicting and
understanding politeness in natural language requests. Our models are based on
simple convolutional neural networks directly on raw text, avoiding any manual
identification of complex sentiment or syntactic features, while performing
better than such feature-based models from previous work. More importantly, we
use the challenging task of politeness prediction as a testbed to next present
a much-needed understanding of what these successful networks are actually
learning. For this, we present several network visualizations based on
activation clusters, first derivative saliency, and embedding space
transformations, helping us automatically identify several subtle linguistic
markers of politeness theories. Further, this analysis reveals multiple novel,
high-scoring politeness strategies which, when added back as new features,
reduce the accuracy gap between the original featurized system and the neural
model, thus providing a clear quantitative interpretation of the success of
these neural networks.


## [Particle Swarm Optimization for Generating Interpretable Fuzzy  Reinforcement Learning Policies](https://arxiv.org/abs/1610.05984)
[(PDF)](https://arxiv.org/pdf/1610.05984)

`Authors:Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft`


Subjects:

Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Learning (cs.LG); Systems and Control (cs.SY)


Journal reference:

Engineering Applications of Artificial Intelligence, Volume 65C,
  October 2017, Pages 87-98


DOI:

10.1016/j.engappai.2017.07.005


Cite as:

arXiv:1610.05984 [cs.NE]

 
(or arXiv:1610.05984v5 [cs.NE] for this version)


> Abstract: Fuzzy controllers are efficient and interpretable system controllers for
continuous state and action spaces. To date, such controllers have been
constructed manually or trained automatically either using expert-generated
problem-specific cost functions or incorporating detailed knowledge about the
optimal control strategy. Both requirements for automatic training processes
are not found in most real-world reinforcement learning (RL) problems. In such
applications, online learning is often prohibited for safety reasons because
online learning requires exploration of the problem's dynamics during policy
training. We introduce a fuzzy particle swarm reinforcement learning (FPSRL)
approach that can construct fuzzy RL policies solely by training parameters on
world models that simulate real system dynamics. These world models are created
by employing an autonomous machine learning technique that uses previously
generated transition samples of a real system. To the best of our knowledge,
this approach is the first to relate self-organizing fuzzy controllers to
model-based batch RL. Therefore, FPSRL is intended to solve problems in domains
where online learning is prohibited, system dynamics are relatively easy to
model from previously generated default policy transition samples, and it is
expected that a relatively easily interpretable control policy exists. The
efficiency of the proposed approach with problems from such domains is
demonstrated using three standard RL benchmarks, i.e., mountain car, cart-pole
balancing, and cart-pole swing-up. Our experimental results demonstrate
high-performing, interpretable fuzzy policies.

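FPSRL searches the parameter space of a fuzzy policy with particle swarm optimization, scoring candidates on world models rather than on the real system. To illustrate only the optimizer, here is a minimal global-best PSO sketch. The coefficient values, the box bounds and the quadratic stand-in cost are assumptions, not the paper's settings or its fuzzy-policy parameterization.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    cost: Callable[[Sequence[float]], float],
    dim: int,
    bounds: Tuple[float, float],
    n_particles: int = 30,
    iters: int = 200,
    w: float = 0.7298,   # inertia weight (common constriction-based default, assumed)
    c1: float = 1.4962,  # pull toward each particle's own best position
    c2: float = 1.4962,  # pull toward the swarm's best position
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO: minimize `cost` over the box [lo, hi]^dim."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clip back into the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c
    return gbest, gbest_cost


# Hypothetical stand-in for "cost of a policy parameter vector on a world model".
best, best_cost = pso_minimize(lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2,
                               dim=2, bounds=(-5.0, 5.0))
```

In FPSRL the cost would come from simulating the fuzzy policy on the learned world models. The optimizer only needs a callable, which is why it can run entirely offline.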

## [Embedding Projector: Interactive Visualization and Interpretation of  Embeddings](https://arxiv.org/abs/1611.05469)
[(PDF)](https://arxiv.org/pdf/1611.05469)

`Authors:Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda B. Viégas, Martin Wattenberg`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Human-Computer Interaction (cs.HC)


Cite as:

arXiv:1611.05469 [stat.ML]

 
(or arXiv:1611.05469v1 [stat.ML] for this version)


> Abstract: Embeddings are ubiquitous in machine learning, appearing in recommender
systems, NLP, and many other applications. Researchers and developers often
need to explore the properties of a specific embedding, and one way to analyze
embeddings is to visualize them. We present the Embedding Projector, a tool for
interactive visualization and interpretation of embeddings.

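When exploring an embedding interactively, a basic operation is finding an item's nearest neighbors. The following sketch ranks items by cosine similarity. It is not the Embedding Projector's code, and the toy 2-d vectors are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbors(emb: Dict[str, Sequence[float]], query: str,
                      k: int = 3) -> List[Tuple[str, float]]:
    """Return the k items most cosine-similar to `query`, excluding the query itself."""
    q = emb[query]
    scored = [(word, cosine(q, vec)) for word, vec in emb.items() if word != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy embedding with made-up values.
toy = {"king": [0.9, 0.1], "queen": [0.85, 0.2], "apple": [0.1, 0.9], "pear": [0.15, 0.85]}
neighbors = nearest_neighbors(toy, "king", k=2)
```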

## [Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning](https://arxiv.org/abs/1611.04246)
[(PDF)](https://arxiv.org/pdf/1611.04246)

`Authors:Quanshi Zhang, Ruiming Cao, Ying Nian Wu, Song-Chun Zhu`


Comments:

in the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1611.04246 [cs.CV]

 
(or arXiv:1611.04246v2 [cs.CV] for this version)


> Abstract: This paper proposes a learning strategy that extracts object-part concepts
from a pre-trained convolutional neural network (CNN), in an attempt to 1)
explore explicit semantics hidden in CNN units and 2) gradually grow a
semantically interpretable graphical model on the pre-trained CNN for
hierarchical object understanding. Given part annotations on very few (e.g.,
3-12) objects, our method mines certain latent patterns from the pre-trained
CNN and associates them with different semantic parts. We use a four-layer
And-Or graph to organize the mined latent patterns, so as to clarify their
internal semantic hierarchy. Our method is guided by a small number of part
annotations, and it achieves superior performance (about 13%-107% improvement)
in part center prediction on the PASCAL VOC and ImageNet datasets.


## [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](https://arxiv.org/abs/1611.05934)
[(PDF)](https://arxiv.org/pdf/1611.05934)

`Authors:Viktoriya Krakovna, Finale Doshi-Velez`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems. arXiv admin note: substantial text overlap with arXiv:1606.05320

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1611.05934 [stat.ML]

 
(or arXiv:1611.05934v1 [stat.ML] for this version)


> Abstract: As deep neural networks continue to revolutionize various application
domains, there is increasing interest in making these powerful models more
understandable and interpretable, and narrowing down the causes of good and bad
predictions. We focus on recurrent neural networks, state of the art models in
speech recognition and translation. Our approach to increasing interpretability
is by combining a long short-term memory (LSTM) model with a hidden Markov
model (HMM), a simpler and more transparent model. We add the HMM state
probabilities to the output layer of the LSTM, and then train the HMM and LSTM
either sequentially or jointly. The LSTM can make use of the information from
the HMM, and fill in the gaps when the HMM is not performing well. A small
hybrid model usually performs better than a standalone LSTM of the same size,
especially on smaller data sets. We test the algorithms on text data and
medical time series data, and find that the LSTM and HMM learn complementary
information about the features in the text.

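The hybrid model feeds per-step HMM state probabilities to the LSTM's output layer. One standard way to compute such probabilities online is the normalized forward (filtering) recursion, sketched here. The abstract does not say whether the paper uses filtered or smoothed posteriors, and the toy parameters are made up.

```python
from typing import List, Optional, Sequence


def hmm_filter(
    init: Sequence[float],             # init[i] = P(z_1 = i)
    trans: Sequence[Sequence[float]],  # trans[i][j] = P(z_{t+1} = j | z_t = i)
    emit: Sequence[Sequence[float]],   # emit[i][k] = P(x_t = k | z_t = i)
    obs: Sequence[int],
) -> List[List[float]]:
    """Normalized forward pass: returns P(z_t | x_1, ..., x_t) for every step t."""
    n = len(init)
    out: List[List[float]] = []
    prev: Optional[List[float]] = None
    for x in obs:
        if prev is None:
            alpha = [init[i] * emit[i][x] for i in range(n)]
        else:
            alpha = [sum(prev[i] * trans[i][j] for i in range(n)) * emit[j][x]
                     for j in range(n)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]  # renormalize each step to avoid underflow
        out.append(alpha)
        prev = alpha
    return out


# Two hidden states, binary observations (toy numbers).
probs = hmm_filter(init=[0.5, 0.5],
                   trans=[[0.9, 0.1], [0.1, 0.9]],
                   emit=[[0.9, 0.1], [0.2, 0.8]],
                   obs=[0, 0, 1])
```

Each row of `probs` could then be concatenated with the LSTM's hidden state before the output layer. That wiring is omitted here.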

## [GENESIM: genetic extraction of a single, interpretable model](https://arxiv.org/abs/1611.05722)
[(PDF)](https://arxiv.org/pdf/1611.05722)

`Authors:Gilles Vandewiele, Olivier Janssens, Femke Ongenae, Filip De Turck, Sofie Van Hoecke`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1611.05722 [stat.ML]

 
(or arXiv:1611.05722v1 [stat.ML] for this version)


> Abstract: Models obtained by decision tree induction techniques excel in being
interpretable. However, they can be prone to overfitting, which results in a low
predictive performance. Ensemble techniques are able to achieve a higher
accuracy. However, this comes at a cost of losing interpretability of the
resulting model. This makes ensemble techniques impractical in applications
where decision support, instead of decision making, is crucial.
To bridge this gap, we present the GENESIM algorithm that transforms an
ensemble of decision trees to a single decision tree with an enhanced
predictive performance by using a genetic algorithm. We compared GENESIM to
prevalent decision tree induction and ensemble techniques using twelve publicly
available data sets. The results show that GENESIM achieves a better predictive
performance on most of these data sets than decision tree induction techniques
and a predictive performance in the same order of magnitude as the ensemble
techniques. Moreover, the resulting model of GENESIM has a very low complexity,
making it very interpretable, in contrast to ensemble techniques.


## [Stratified Knowledge Bases as Interpretable Probabilistic Models  (Extended Abstract)](https://arxiv.org/abs/1611.06174)
[(PDF)](https://arxiv.org/pdf/1611.06174)

`Authors:Ondrej Kuzelka, Jesse Davis, Steven Schockaert`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Artificial Intelligence (cs.AI)


Cite as:

arXiv:1611.06174 [cs.AI]

 
(or arXiv:1611.06174v1 [cs.AI] for this version)


> Abstract: In this paper, we advocate the use of stratified logical theories for
representing probabilistic models. We argue that such encodings can be more
interpretable than those obtained in existing frameworks such as Markov logic
networks. Among others, this allows for the use of domain experts to improve
learned models by directly removing, adding, or modifying logical formulas.


## [Learning Interpretability for Visualizations using Adapted Cox Models  through a User Experiment](https://arxiv.org/abs/1611.06175)
[(PDF)](https://arxiv.org/pdf/1611.06175)

`Authors:Adrien Bibal, Benoit Frénay`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC); Learning (cs.LG)


Cite as:

arXiv:1611.06175 [stat.ML]

 
(or arXiv:1611.06175v1 [stat.ML] for this version)


> Abstract: In order to be useful, visualizations need to be interpretable. This paper
uses a user-based approach to combine and assess quality measures in order to
better model user preferences. Results show that cluster separability measures
are outperformed by a neighborhood conservation measure, even though the former
are usually considered as intuitively representative of user motives. Moreover,
combining measures, as opposed to using a single measure, further improves
prediction performances.


## [Tree Space Prototypes: Another Look at Making Tree Ensembles  Interpretable](https://arxiv.org/abs/1611.07115)
[(PDF)](https://arxiv.org/pdf/1611.07115)

`Authors:Hui Fen Tan, Giles Hooker, Martin T. Wells`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1611.07115 [stat.ML]

 
(or arXiv:1611.07115v1 [stat.ML] for this version)


> Abstract: Ensembles of decision trees have good prediction accuracy but suffer from a
lack of interpretability. We propose a new approach for interpreting tree
ensembles by finding prototypes in tree space, utilizing the naturally-learned
similarity measure from the tree ensemble. Demonstrating the method on random
forests, we show that the method benefits from unique aspects of tree ensembles
by leveraging tree structure to sequentially find prototypes. The method
provides good prediction accuracy when found prototypes are used in
nearest-prototype classifiers, while using fewer prototypes than competitor
methods. We are investigating the sensitivity of the method to different
prototype-finding procedures and demonstrating it on higher-dimensional data.

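A natural similarity measure learned by a tree ensemble is the random-forest proximity: the fraction of trees in which two samples fall into the same leaf. Assuming that is the measure meant here, this sketch computes it from a matrix of leaf indices (for example, the output of scikit-learn's `RandomForestClassifier.apply`). It then picks one prototype per class, the member most similar to its classmates. This single-prototype rule is a simplification, not the paper's sequential procedure.

```python
from typing import List, Sequence


def proximity_matrix(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """prox[a][b] = fraction of trees in which samples a and b share a leaf.

    leaves[s][t] is the leaf index of sample s in tree t."""
    n, n_trees = len(leaves), len(leaves[0])
    return [[sum(leaves[a][t] == leaves[b][t] for t in range(n_trees)) / n_trees
             for b in range(n)]
            for a in range(n)]


def class_prototype(prox: Sequence[Sequence[float]], labels: Sequence[int], cls: int) -> int:
    """Index of the class member with the highest total proximity to its classmates."""
    members = [i for i, label in enumerate(labels) if label == cls]
    return max(members, key=lambda i: sum(prox[i][j] for j in members if j != i))


# Toy leaf assignments: 4 samples, 3 trees (made-up values).
leaves = [[0, 0, 1], [0, 0, 1], [0, 1, 1], [1, 2, 0]]
labels = [0, 0, 0, 1]
prox = proximity_matrix(leaves)
```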

## [Interpreting Finite Automata for Sequential Data](https://arxiv.org/abs/1611.07100)
[(PDF)](https://arxiv.org/pdf/1611.07100)

`Authors:Christian Albert Hammerschmidt, Sicco Verwer, Qin Lin, Radu State`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Artificial Intelligence (cs.AI)


ACM classes:

I.2.6


Cite as:

arXiv:1611.07100 [stat.ML]

 
(or arXiv:1611.07100v2 [stat.ML] for this version)


> Abstract: Automaton models are often seen as interpretable models. Interpretability
itself is not well defined: it remains unclear what interpretability means
without first explicitly specifying objectives or desired attributes. In this
paper, we identify the key properties used to interpret automata and propose a
modification of a state-merging approach to learn variants of finite state
automata. We apply the approach to problems beyond typical grammar inference
tasks. Additionally, we cover several use-cases for prediction, classification,
and clustering on sequential data in both supervised and unsupervised scenarios
to show how the identified key properties are applicable in a wide range of
contexts.


## [Inducing Interpretable Representations with Variational Autoencoders](https://arxiv.org/abs/1611.07492)
[(PDF)](https://arxiv.org/pdf/1611.07492)

`Authors:N. Siddharth, Brooks Paige, Alban Desmaison, Jan-Willem Van de Meent, Frank Wood, Noah D. Goodman, Pushmeet Kohli, Philip H.S. Torr`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)


Cite as:

arXiv:1611.07492 [stat.ML]

 
(or arXiv:1611.07492v1 [stat.ML] for this version)


> Abstract: We develop a framework for incorporating structured graphical models in the
*encoders* of variational autoencoders (VAEs) that allows us to induce
interpretable representations through approximate variational inference. This
allows us to both perform reasoning (e.g. classification) under the structural
constraints of a given graphical model, and use deep generative models to deal
with messy, high-dimensional domains where it is often difficult to model all
the variation. Learning in this framework is carried out end-to-end with a
variational objective, applying to both unsupervised and semi-supervised
schemes.


## [Interpretation of Prediction Models Using the Input Gradient](https://arxiv.org/abs/1611.07634)
[(PDF)](https://arxiv.org/pdf/1611.07634)

`Authors:Yotam Hechtlinger`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1611.07634 [stat.ML]

 
(or arXiv:1611.07634v1 [stat.ML] for this version)


> Abstract: State of the art machine learning algorithms are highly optimized to provide
the optimal prediction possible, naturally resulting in complex models. While
these models often outperform simpler, more interpretable models by orders of
magnitude, in terms of understanding the way the model functions, we are often
facing a "black box".
In this paper we suggest a simple method to interpret the behavior of any
predictive model, both for regression and classification. Given a particular
model, the information required to interpret it can be obtained by studying the
partial derivatives of the model with respect to the input. We exemplify this
insight by interpreting convolutional and multi-layer neural networks in the
field of natural language processing.

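The core quantity is the vector of partial derivatives of the model output with respect to each input. For a neural network one would normally obtain it by automatic differentiation. This sketch instead uses central finite differences so that it works for any callable, which is an assumption of the illustration. The logistic toy model is hypothetical.

```python
import math
from typing import Callable, List, Sequence


def input_gradient(f: Callable[[Sequence[float]], float],
                   x: Sequence[float], eps: float = 1e-5) -> List[float]:
    """Central-difference estimate of the partial derivatives df/dx_i at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def toy_model(x: Sequence[float]) -> float:
    """Hypothetical logistic model: sigmoid(2*x0 - 3*x1 + 0.5)."""
    return 1.0 / (1.0 + math.exp(-(2.0 * x[0] - 3.0 * x[1] + 0.5)))


g = input_gradient(toy_model, [0.0, 0.0])
```

Large-magnitude entries of `g` mark the inputs to which the prediction is locally most sensitive. Here the second feature (weight -3) dominates.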

## [Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery](https://arxiv.org/abs/1611.07252)
[(PDF)](https://arxiv.org/pdf/1611.07252)

`Authors:Scott Wisdom, Thomas Powers, James Pitton, Les Atlas`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1611.07252 [stat.ML]

 
(or arXiv:1611.07252v1 [stat.ML] for this version)


> Abstract: Recurrent neural networks (RNNs) are powerful and effective for processing
sequential data. However, RNNs are usually considered "black box" models whose
internal structure and learned parameters are not interpretable. In this paper,
we propose an interpretable RNN based on the sequential iterative
soft-thresholding algorithm (SISTA) for solving the sequential sparse recovery
problem, which models a sequence of correlated observations with a sequence of
sparse latent vectors. The architecture of the resulting SISTA-RNN is
implicitly defined by the computational structure of SISTA, which results in a
novel stacked RNN architecture. Furthermore, the weights of the SISTA-RNN are
perfectly interpretable as the parameters of a principled statistical model,
which in this case include a sparsifying dictionary, iterative step size, and
regularization parameters. In addition, on a particular sequential compressive
sensing task, the SISTA-RNN trains faster and achieves better performance than
conventional state-of-the-art black box RNNs, including long short-term memory
(LSTM) RNNs.

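SISTA-RNN is built by unrolling an iterative soft-thresholding solver. This sketch shows plain, non-sequential ISTA for a single sparse code, not SISTA itself, which also uses the previous time step's estimate. The function names are illustrative.

```python
from typing import List, Sequence


def soft_threshold(v: float, thr: float) -> float:
    """Proximal operator of thr*|v|: shrink v toward zero by thr."""
    if v > thr:
        return v - thr
    if v < -thr:
        return v + thr
    return 0.0


def ista(A: Sequence[Sequence[float]], y: Sequence[float], lam: float,
         step: float, n_iter: int = 200) -> List[float]:
    """Minimize 0.5*||y - A h||^2 + lam*||h||_1 by iterative soft-thresholding.

    `step` should be at most 1 / (largest eigenvalue of A^T A) for convergence."""
    m, n = len(A), len(A[0])
    h = [0.0] * n
    for _ in range(n_iter):
        resid = [y[r] - sum(A[r][c] * h[c] for c in range(n)) for r in range(m)]
        # Gradient step on the quadratic term, then shrinkage for the L1 term.
        h = [soft_threshold(h[c] + step * sum(A[r][c] * resid[r] for r in range(m)),
                            step * lam)
             for c in range(n)]
    return h


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
h = ista(identity, [3.0, 0.5, -2.0], lam=1.0, step=1.0)
```

Each iteration is an affine map followed by a pointwise shrinkage. Unrolling a fixed number of iterations therefore gives a layered network whose weights are functions of `A`, `step` and `lam`, which is the sense in which the abstract calls the SISTA-RNN weights interpretable.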

## [An unexpected unity among methods for interpreting model predictions](https://arxiv.org/abs/1611.07478)
[(PDF)](https://arxiv.org/pdf/1611.07478)

`Authors:Scott Lundberg, Su-In Lee`


Comments:

Presented at NIPS 2016 Workshop on Interpretable Machine Learning in Complex Systems

Subjects:

Artificial Intelligence (cs.AI)


Cite as:

arXiv:1611.07478 [cs.AI]

 
(or arXiv:1611.07478v3 [cs.AI] for this version)


> Abstract: Understanding why a model made a certain prediction is crucial in many data
science fields. Interpretable predictions engender appropriate trust and
provide insight into how the model may be improved. However, with large modern
datasets the best accuracy is often achieved by complex models even experts
struggle to interpret, which creates a tension between accuracy and
interpretability. Recently, several methods have been proposed for interpreting
predictions from complex models by estimating the importance of input features.
Here, we present how a model-agnostic additive representation of the importance
of input features unifies current methods. This representation is optimal, in
the sense that it is the only set of additive values that satisfies important
properties. We show how we can leverage these properties to create novel visual
explanations of model predictions. The thread of unity that this representation
weaves through the literature indicates that there are common principles to be
learned about the interpretation of model predictions that apply in many
scenarios.

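In the authors' related SHAP work, the unique additive attribution satisfying these properties is the Shapley value from cooperative game theory. This brute-force sketch computes it exactly for a small model. Replacing "missing" features with a fixed baseline is one simplifying choice of value function among several, and the toy model is hypothetical.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of f at x; 'absent' features take baseline values.

    Enumerates all coalitions, so the cost grows as 2^n model evaluations."""
    n = len(x)

    def value(present: FrozenSet[int]) -> float:
        return f([x[i] if i in present else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1, not counting feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy(z: Sequence[float]) -> float:
    return 2.0 * z[0] + z[1] * z[2]  # hypothetical model with an interaction term


phi = shapley_values(toy, x=[1.0, 2.0, 3.0], baseline=[0.0, 0.0, 0.0])
```

The attributions sum to f(x) - f(baseline), which is 8 here (the additivity property). The interaction term z1*z2 = 6 is split evenly between features 1 and 2.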

## [Input Switched Affine Networks: An RNN Architecture Designed for  Interpretability](https://arxiv.org/abs/1611.09434)
[(PDF)](https://arxiv.org/pdf/1611.09434)

`Authors:Jakob N. Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo`


Comments:

ICLR 2017 submission

Subjects:

Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG); Neural and Evolutionary Computing (cs.NE)


Cite as:

arXiv:1611.09434 [cs.AI]

 
(or arXiv:1611.09434v2 [cs.AI] for this version)


> Abstract: There exist many problem domains where the interpretability of neural network
models is essential for deployment. Here we introduce a recurrent architecture
composed of input-switched affine transformations - in other words an RNN
without any explicit nonlinearities, but with input-dependent recurrent
weights. This simple form allows the RNN to be analyzed via straightforward
linear methods: we can exactly characterize the linear contribution of each
input to the model predictions; we can use a change-of-basis to disentangle
input, output, and computational hidden unit subspaces; we can fully
reverse-engineer the architecture's solution to a simple task. Despite this
ease of interpretation, the input switched affine network achieves reasonable
performance on a text modeling task, and allows greater computational
efficiency than networks with standard nonlinearities.

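Each step of an input-switched affine network is affine in the hidden state, so the final state splits exactly into one term per input token. This sketch makes that decomposition explicit with toy 2-d matrices. The weights, the vocabulary {"a", "b"} and the function names are made up, and the paper's readout layer is omitted.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(row[c] * v[c] for c in range(len(v))) for row in M]


def isan_run(W: Dict[str, Matrix], b: Dict[str, Vector], tokens: Sequence[str],
             h0: Vector) -> Tuple[Vector, List[Vector], Vector]:
    """Run h_t = W[x_t] h_{t-1} + b[x_t] and track each token's additive share.

    Returns (final state, per-token contributions, propagated initial state);
    the final state equals the initial part plus the sum of the contributions."""
    h = list(h0)
    init_part = list(h0)
    contribs: List[Vector] = []
    for tok in tokens:
        # Earlier contributions pass through this step's input-selected matrix.
        contribs = [matvec(W[tok], c) for c in contribs]
        init_part = matvec(W[tok], init_part)
        contribs.append(list(b[tok]))  # this token contributes its bias vector
        h = [a + bias for a, bias in zip(matvec(W[tok], h), b[tok])]
    return h, contribs, init_part


W = {"a": [[0.5, 0.0], [0.0, 1.0]], "b": [[0.0, 1.0], [1.0, 0.0]]}
b = {"a": [1.0, 0.0], "b": [0.0, 2.0]}
h, contribs, init_part = isan_run(W, b, "aba", [0.0, 0.0])
```

Tracking every contribution costs O(T^2) matrix-vector products for T tokens. That is acceptable for inspecting short sequences but unnecessary when only the final state is needed.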

## [Large scale modeling of antimicrobial resistance with interpretable  classifiers](https://arxiv.org/abs/1612.01030)
[(PDF)](https://arxiv.org/pdf/1612.01030)

`Authors:Alexandre Drouin, Frédéric Raymond, Gaël Letarte St-Pierre, Mario Marchand, Jacques Corbeil, François Laviolette`


Comments:

Peer-reviewed and accepted for presentation at the Machine Learning for Health Workshop, NIPS 2016, Barcelona, Spain

Subjects:

Genomics (q-bio.GN); Learning (cs.LG); Machine Learning (stat.ML)


Cite as:

arXiv:1612.01030 [q-bio.GN]

 
(or arXiv:1612.01030v1 [q-bio.GN] for this version)


> Abstract: Antimicrobial resistance is an important public health concern that has
implications in the practice of medicine worldwide. Accurately predicting
resistance phenotypes from genome sequences shows great promise in promoting
better use of antimicrobial agents, by determining which antibiotics are likely
to be effective in specific clinical cases. In healthcare, this would allow for
the design of treatment plans tailored for specific individuals, likely
resulting in better clinical outcomes for patients with bacterial infections.
In this work, we present the recent work of Drouin et al. (2016) on using Set
Covering Machines to learn highly interpretable models of antibiotic resistance
and complement it by providing a large scale application of their method to the
entire PATRIC database. We report prediction results for 36 new datasets and
present the Kover AMR platform, a new web-based tool allowing the visualization
and interpretation of the generated models.

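A Set Covering Machine learns a short conjunction (or disjunction) of boolean rules, which keeps the resulting resistance models readable. This sketch is a greedy conjunction learner in the spirit of Marchand and Shawe-Taylor's original SCM. The utility trade-off `p`, the toy presence/absence matrix and the function names are illustrative assumptions, and the Kover AMR implementation may differ in its details.

```python
from typing import List, Sequence


def scm_conjunction(X: Sequence[Sequence[bool]], y: Sequence[int],
                    p: float = 1.0, max_rules: int = 3) -> List[int]:
    """Greedy Set Covering Machine, conjunction form, over boolean features.

    X[s][f] is True when feature f (e.g. a k-mer) is present in sample s; y[s] is
    1 for the positive class. The model predicts 1 iff every chosen feature is present."""
    neg = {s for s, label in enumerate(y) if label == 0}
    pos = {s for s, label in enumerate(y) if label == 1}
    rules: List[int] = []

    def utility(f: int) -> float:
        covered = sum(1 for s in neg if not X[s][f])  # negatives this rule rejects
        errors = sum(1 for s in pos if not X[s][f])   # positives it wrongly rejects
        return covered - p * errors

    while neg and len(rules) < max_rules:
        candidates = [f for f in range(len(X[0])) if f not in rules]
        if not candidates:
            break
        best = max(candidates, key=utility)
        if utility(best) <= 0:
            break
        rules.append(best)
        neg = {s for s in neg if X[s][best]}  # negatives still not rejected
        pos = {s for s in pos if X[s][best]}  # drop positives already misclassified
    return rules


def scm_predict(rules: Sequence[int], sample: Sequence[bool]) -> int:
    return int(all(sample[f] for f in rules))


X = [[True, True, False], [True, True, True], [True, False, True], [False, True, False]]
y = [1, 1, 0, 0]
rules = scm_conjunction(X, y)
```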

## [Interpretable Semantic Textual Similarity: Finding and explaining  differences between sentences](https://arxiv.org/abs/1612.04868)
[(PDF)](https://arxiv.org/pdf/1612.04868)

`Authors:I. Lopez-Gazpio, M. Maritxalar, A. Gonzalez-Agirre, G. Rigau, L. Uria, E. Agirre`


Comments:

Preprint version, Knowledge-Based Systems (ISSN: 0950-7051). (2016)

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)


DOI:

10.1016/j.knosys.2016.12.013


Cite as:

arXiv:1612.04868 [cs.CL]

 
(or arXiv:1612.04868v1 [cs.CL] for this version)


> Abstract: User acceptance of artificial intelligence agents might depend on their
ability to explain their reasoning, which requires adding an interpretability
layer that facilitates users to understand their behavior. This paper focuses
on adding an interpretable layer on top of Semantic Textual Similarity (STS),
which measures the degree of semantic equivalence between two sentences. The
interpretability layer is formalized as the alignment between pairs of segments
across the two sentences, where the relation between the segments is labeled
with a relation type and a similarity score. We present a publicly available
dataset of sentence pairs annotated following the formalization. We then
develop a system trained on this dataset which, given a sentence pair, explains
what is similar and different, in the form of graded and typed segment
alignments. When evaluated on the dataset, the system performs better than an
informed baseline, showing that the dataset and task are well-defined and
feasible. Most importantly, two user studies show how the system output can be
used to automatically produce explanations in natural language. Users performed
better when having access to the explanations, providing preliminary evidence
that our dataset and method to automatically produce explanations are useful in
real applications.


## [Towards a New Interpretation of Separable Convolutions](https://arxiv.org/abs/1701.04489)
[(PDF)](https://arxiv.org/pdf/1701.04489)

`Authors:Tapabrata Ghosh`


Subjects:

Learning (cs.LG); Machine Learning (stat.ML)


Cite as:

arXiv:1701.04489 [cs.LG]

 
(or arXiv:1701.04489v1 [cs.LG] for this version)


> Abstract: In recent times, the use of separable convolutions in deep convolutional
neural network architectures has been explored. Several researchers, most
notably (Chollet, 2016) and (Ghosh, 2017) have used separable convolutions in
their deep architectures and have demonstrated state of the art or close to
state of the art performance. However, the underlying mechanism of action of
separable convolutions is still not fully understood. Although their
mathematical definition is well understood as a depthwise convolution followed
by a pointwise convolution, deeper interpretations such as the extreme
Inception hypothesis (Chollet, 2016) have failed to provide a thorough
explanation of their efficacy. In this paper, we propose a hybrid
interpretation that we believe is a better model for explaining the efficacy of
separable convolutions.

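The abstract's working definition, a depthwise convolution followed by a pointwise one, is easy to make concrete. This sketch counts weights for both layer types and runs a tiny 1-D depthwise-separable convolution. It illustrates only this standard definition, not the hybrid interpretation the paper proposes, and the function names are illustrative.

```python
from typing import List, Sequence


def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k convolution mapping c_in to c_out channels (no bias)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Depthwise k x k filter per input channel, then a 1 x 1 pointwise mix (no bias)."""
    return k * k * c_in + c_in * c_out


def separable_conv1d(x: Sequence[Sequence[float]],
                     depth_kernels: Sequence[Sequence[float]],
                     point_weights: Sequence[Sequence[float]]) -> List[List[float]]:
    """1-D depthwise-separable convolution ('valid' cross-correlation, no bias).

    x[c][t] is channel c at position t, depth_kernels[c] filters channel c only,
    and point_weights[o][c] mixes channels into output channel o."""
    k = len(depth_kernels[0])
    t_out = len(x[0]) - k + 1
    # Depthwise: each channel is filtered independently by its own kernel.
    depth = [[sum(depth_kernels[c][j] * x[c][t + j] for j in range(k)) for t in range(t_out)]
             for c in range(len(x))]
    # Pointwise: a 1 x 1 convolution mixes channels at every position.
    return [[sum(point_weights[o][c] * depth[c][t] for c in range(len(x))) for t in range(t_out)]
            for o in range(len(point_weights))]
```

For k = 3 with 32 input and 64 output channels, the standard layer has 3 * 3 * 32 * 64 = 18,432 weights. The separable layer has 3 * 3 * 32 + 32 * 64 = 2,336.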

## [Towards A Rigorous Science of Interpretable Machine Learning](https://arxiv.org/abs/1702.08608)
[(PDF)](https://arxiv.org/pdf/1702.08608)

`Authors:Finale Doshi-Velez, Been Kim`


Subjects:

Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Learning (cs.LG)


Cite as:

arXiv:1702.08608 [stat.ML]

 
(or arXiv:1702.08608v2 [stat.ML] for this version)


> Abstract: As machine learning systems become ubiquitous, there has been a surge of
interest in interpretable machine learning: systems that provide explanation
for their outputs. These explanations are often used to qualitatively assess
other criteria such as safety or non-discrimination. However, despite the
interest in interpretability, there is very little consensus on what
interpretable machine learning is and how it should be measured. In this
position paper, we first define interpretability and describe when
interpretability is needed (and when it is not). Next, we suggest a taxonomy
for rigorous evaluation and expose open questions towards a more rigorous
science of interpretable machine learning.


## [Streaming Weak Submodularity: Interpreting Neural Networks on the Fly](https://arxiv.org/abs/1703.02647)
[(PDF)](https://arxiv.org/pdf/1703.02647)

`Authors:Ethan R. Elenberg, Alexandros G. Dimakis, Moran Feldman, Amin Karbasi`


Comments:

To appear in NIPS 2017

Subjects:

Machine Learning (stat.ML); Information Theory (cs.IT); Learning (cs.LG)


Cite as:

arXiv:1703.02647 [stat.ML]

 
(or arXiv:1703.02647v3 [stat.ML] for this version)


> Abstract: In many machine learning applications, it is important to explain the
predictions of a black-box classifier. For example, why does a deep neural
network assign an image to a particular class? We cast interpretability of
black-box classifiers as a combinatorial maximization problem and propose an
efficient streaming algorithm to solve it subject to cardinality constraints.
By extending ideas from Badanidiyuru et al. [2014], we provide a constant
factor approximation guarantee for our algorithm in the case of random stream
order and a weakly submodular objective function. This is the first such
theoretical guarantee for this general class of functions, and we also show
that no such algorithm exists for a worst case stream order. Our algorithm
obtains similar explanations of Inception V3 predictions 10 times faster than
the state-of-the-art LIME framework of Ribeiro et al. [2016].

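One way to picture cardinality-constrained selection over a stream is a single thresholding pass. Badanidiyuru et al.'s sieve-streaming, which this work extends, maintains many such threshold instances at once, one per guess of the optimum on a geometric grid. This sketch keeps just one hypothetical threshold `tau` and uses set coverage as a stand-in objective. The paper's explanation objective is only weakly submodular.

```python
from typing import FrozenSet, Iterable, List, Set


def threshold_stream_select(stream: Iterable[FrozenSet[int]], k: int,
                            tau: float) -> List[FrozenSet[int]]:
    """Single pass: keep an item if there is room and its marginal coverage gain >= tau."""
    selected: List[FrozenSet[int]] = []
    covered: Set[int] = set()
    for item in stream:
        if len(selected) >= k:
            break
        gain = len(item - covered)  # marginal gain of the coverage objective
        if gain >= tau:
            selected.append(item)
            covered |= item
    return selected


stream = [frozenset({1, 2, 3}), frozenset({2, 3}), frozenset({4, 5}),
          frozenset({1, 6}), frozenset({7})]
chosen = threshold_stream_select(stream, k=3, tau=2)
```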

## [Interpretable Structure-Evolving LSTM](https://arxiv.org/abs/1703.03055)
[(PDF)](https://arxiv.org/pdf/1703.03055)

`Authors:Xiaodan Liang, Liang Lin, Xiaohui Shen, Jiashi Feng, Shuicheng Yan, Eric P. Xing`


Comments:

To appear in CVPR 2017 as a spotlight paper

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)


Cite as:

arXiv:1703.03055 [cs.CV]

 
(or arXiv:1703.03055v1 [cs.CV] for this version)


> Abstract: This paper develops a general framework for learning interpretable data
representation via Long Short-Term Memory (LSTM) recurrent neural networks over
hierarchical graph structures. Instead of learning LSTM models over the pre-fixed
structures, we propose to further learn the intermediate interpretable
multi-level graph structures in a progressive and stochastic way from data
during the LSTM network optimization. We thus call this model the
structure-evolving LSTM. In particular, starting with an initial element-level
graph representation where each node is a small data element, the
structure-evolving LSTM gradually evolves the multi-level graph representations
by stochastically merging the graph nodes with high compatibilities along the
stacked LSTM layers. In each LSTM layer, we estimate the compatibility of two
connected nodes from their corresponding LSTM gate outputs, which is used to
generate a merging probability. The candidate graph structures are accordingly
generated where the nodes are grouped into cliques with their merging
probabilities. We then produce the new graph structure with a
Metropolis-Hastings algorithm, which alleviates the risk of getting stuck in
local optima by stochastic sampling with an acceptance probability. Once a
graph structure is accepted, a higher-level graph is then constructed by taking
the partitioned cliques as its nodes. During the evolving process,
representation becomes more abstract at higher levels, where redundant
information is filtered out, allowing more efficient propagation of long-range
data dependencies. We evaluate the effectiveness of structure-evolving LSTM in
the application of semantic object parsing and demonstrate its advantage over
state-of-the-art LSTM models on standard benchmarks.

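The Metropolis-Hastings step decides whether to keep a proposed graph structure. The acceptance rule itself is generic and is sketched here. The abstract does not say how the paper scores and proposes graph structures, so `log_p` and `propose` are placeholders that the caller supplies.

```python
import math
import random
from typing import Callable, TypeVar

S = TypeVar("S")


def mh_accept_prob(log_p_new: float, log_p_old: float,
                   log_q_old_given_new: float = 0.0,
                   log_q_new_given_old: float = 0.0) -> float:
    """Metropolis-Hastings acceptance probability, computed in log space."""
    log_ratio = (log_p_new + log_q_old_given_new) - (log_p_old + log_q_new_given_old)
    return 1.0 if log_ratio >= 0.0 else math.exp(log_ratio)


def mh_step(state: S, log_p: Callable[[S], float],
            propose: Callable[[S, random.Random], S], rng: random.Random) -> S:
    """One MH step with a symmetric proposal (the q terms cancel)."""
    candidate = propose(state, rng)
    if rng.random() < mh_accept_prob(log_p(candidate), log_p(state)):
        return candidate
    return state
```

Occasionally accepting a lower-scoring candidate (with probability below 1) is what lets the search escape local optima, as the abstract notes.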

## [Improving Interpretability of Deep Neural Networks with Semantic  Information](https://arxiv.org/abs/1703.04096)
[(PDF)](https://arxiv.org/pdf/1703.04096)

`Authors:Yinpeng Dong, Hang Su, Jun Zhu, Bo Zhang`


Comments:

To appear in CVPR 2017

Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1703.04096 [cs.CV]

 
(or arXiv:1703.04096v2 [cs.CV] for this version)


> Abstract: Interpretability of deep neural networks (DNNs) is essential since it enables
users to understand the overall strengths and weaknesses of the models, conveys
an understanding of how the models will behave in the future, and how to
diagnose and correct potential problems. However, it is challenging to reason
about what a DNN actually does due to its opaque or black-box nature. To
address this issue, we propose a novel technique to improve the
interpretability of DNNs by leveraging the rich semantic information embedded
in human descriptions. By concentrating on the video captioning task, we first
extract a set of semantically meaningful topics from the human descriptions
that cover a wide range of visual concepts, and integrate them into the model
with an interpretive loss. We then propose a prediction difference maximization
algorithm to interpret the learned features of each neuron. Experimental
results demonstrate its effectiveness in video captioning using the
interpretable features, which can also be transferred to video action
recognition. By clearly understanding the learned features, users can easily
revise false predictions via a human-in-the-loop procedure.


## [InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations](https://arxiv.org/abs/1703.08840)
[(PDF)](https://arxiv.org/pdf/1703.08840)

`Authors:Yunzhu Li, Jiaming Song, Stefano Ermon`


Comments:

14 pages, NIPS 2017

Subjects:

Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1703.08840 [cs.LG]

 
(or arXiv:1703.08840v2 [cs.LG] for this version)


> Abstract: The goal of imitation learning is to mimic expert behavior without access to
an explicit reward signal. Expert demonstrations provided by humans, however,
often show significant variability due to latent factors that are typically not
explicitly modeled. In this paper, we propose a new algorithm that can infer
the latent structure of expert demonstrations in an unsupervised way. Our
method, built on top of Generative Adversarial Imitation Learning, can not only
imitate complex behaviors, but also learn interpretable and meaningful
representations of complex behavioral data, including visual demonstrations. In
the driving domain, we show that a model learned from human demonstrations is
able to both accurately reproduce a variety of behaviors and accurately
anticipate human actions using raw visual inputs. Compared with various
baselines, our method can better capture the latent structure underlying expert
demonstrations, often recovering semantically meaningful factors of variation
in the data.


## [Interpretable Learning for Self-Driving Cars by Visualizing Causal  Attention](https://arxiv.org/abs/1703.10631)
[(PDF)](https://arxiv.org/pdf/1703.10631)

`Authors:Jinkyu Kim, John Canny`


Subjects:

Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)


Cite as:

arXiv:1703.10631 [cs.CV]

 
(or arXiv:1703.10631v1 [cs.CV] for this version)


> Abstract: Deep neural perception and control networks are likely to be a key component
of self-driving vehicles. These models need to be explainable - they should
provide easy-to-interpret rationales for their behavior - so that passengers,
insurance companies, law enforcement, developers etc., can understand what
triggered a particular behavior. Here we explore the use of visual
explanations. These explanations take the form of real-time highlighted regions
of an image that causally influence the network's output (steering control).
Our approach is two-stage. In the first stage, we use a visual attention model
to train a convolution network end-to-end from images to steering angle. The
attention model highlights image regions that potentially influence the
network's output. Some of these are true influences, but some are spurious. We
then apply a causal filtering step to determine which input regions actually
influence the output. This produces more succinct visual explanations and more
accurately exposes the network's behavior. We demonstrate the effectiveness of
our model on three datasets totaling 16 hours of driving. We first show that
training with attention does not degrade the performance of the end-to-end
network. Then we show that the network causally cues on a variety of features
that are used by humans while driving.
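
The causal filtering step can be pictured as an occlusion test over the regions the attention model highlighted: blank out each region and keep it only if the steering output actually changes. This is a hedged illustration of that idea, not the authors' implementation; the toy `steering_model`, the list-of-pixels region representation and the fixed `threshold` are all assumptions.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]
Region = List[Tuple[int, int]]  # (row, col) pixels belonging to one attention blob


def mask_region(image: Image, region: Region, fill: float = 0.0) -> Image:
    """Return a copy of `image` with the pixels of `region` replaced by `fill`."""
    masked = [row[:] for row in image]
    for r, c in region:
        masked[r][c] = fill
    return masked


def causal_filter(model: Callable[[Image], float], image: Image,
                  regions: Sequence[Region], threshold: float) -> List[int]:
    """Keep the attended regions whose occlusion moves the output by more than `threshold`."""
    baseline = model(image)
    kept = []
    for i, region in enumerate(regions):
        effect = abs(model(mask_region(image, region)) - baseline)
        if effect > threshold:
            kept.append(i)
    return kept


# Hypothetical toy "steering model": its output depends only on the left half of the image.
def steering_model(image: Image) -> float:
    half = len(image[0]) // 2
    return sum(row[c] for row in image for c in range(half))


image = [[1.0] * 4 for _ in range(2)]
regions = [[(0, 0), (1, 0)], [(0, 3), (1, 3)]]  # one blob on each side
print(causal_filter(steering_model, image, regions, threshold=0.5))  # [0]
```

Region 1 is attended but spurious in this toy: masking it leaves the output unchanged, so it is filtered out.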


## [Interpretable 3D Human Action Analysis with Temporal Convolutional  Networks](https://arxiv.org/abs/1704.04516)
[(PDF)](https://arxiv.org/pdf/1704.04516)

`Authors:Tae Soo Kim, Austin Reiter`


Comments:

8 pages, 5 figures, BNMW CVPR 2017 Submission

Subjects:

Computer Vision and Pattern Recognition (cs.CV)


MSC classes:

68T45, 68T10 (Primary)


ACM classes:

I.2.10; I.5.4


Cite as:

arXiv:1704.04516 [cs.CV]

 
(or arXiv:1704.04516v1 [cs.CV] for this version)


> Abstract: The discriminative power of modern deep learning models for 3D human action
recognition is growing ever so potent. In conjunction with the recent
resurgence of 3D human action representation with 3D skeletons, the quality and
the pace of recent progress have been significant. However, the inner workings
of state-of-the-art learning based methods in 3D human action recognition still
remain mostly black-box. In this work, we propose to use a new class of models
known as Temporal Convolutional Neural Networks (TCN) for 3D human action
recognition. Compared to popular LSTM-based Recurrent Neural Network models,
given interpretable input such as 3D skeletons, TCN provides us a way to
explicitly learn readily interpretable spatio-temporal representations for 3D
human action recognition. We describe our strategy for re-designing the TCN with
interpretability in mind, and how such characteristics of the model are leveraged
to construct a powerful 3D activity recognition method. Through this work, we
wish to take a step towards a spatio-temporal model that is easier to
understand, explain and interpret. The resulting model, Res-TCN, achieves
state-of-the-art results on the largest 3D human action recognition dataset,
NTU-RGBD.


## [An Interpretable Knowledge Transfer Model for Knowledge Base Completion](https://arxiv.org/abs/1704.05908)
[(PDF)](https://arxiv.org/pdf/1704.05908)

`Authors:Qizhe Xie, Xuezhe Ma, Zihang Dai, Eduard Hovy`


Comments:

Accepted by ACL 2017. Minor update

Subjects:

Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Learning (cs.LG)


Cite as:

arXiv:1704.05908 [cs.CL]

 
(or arXiv:1704.05908v2 [cs.CL] for this version)


> Abstract: Knowledge bases are important resources for a variety of natural language
processing tasks but suffer from incompleteness. We propose a novel embedding
model, *ITransF*, to perform knowledge base completion. Equipped with a
sparse attention mechanism, ITransF discovers hidden concepts of relations and
transfers statistical strength through the sharing of concepts. Moreover, the
learned associations between relations and concepts, which are represented by
sparse attention vectors, can be interpreted easily. We evaluate ITransF on two
benchmark datasets for knowledge base completion, WN18 and FB15k, and obtain
improvements on both the mean rank and Hits@10 metrics over all baselines that
do not use additional information.
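
In ITransF each relation's projection is built from a shared pool of concept matrices, and the sparse attention vector records which concepts a relation borrows. A minimal sketch, assuming a TransE-style L1 score with separate head and tail attention vectors; the concept matrices and attention weights are fixed by hand here, whereas the paper learns them, so the exact scoring details should be checked against the paper.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mix_concepts(concepts: Sequence[Matrix], attention: Sequence[float]) -> Matrix:
    """Relation-specific projection: attention-weighted sum of shared concept matrices."""
    rows, cols = len(concepts[0]), len(concepts[0][0])
    return [[sum(w * c[i][j] for w, c in zip(attention, concepts)) for j in range(cols)]
            for i in range(rows)]


def triple_score(h: Vector, r: Vector, t: Vector, concepts: Sequence[Matrix],
                 att_head: Sequence[float], att_tail: Sequence[float]) -> float:
    """L1 distance ||M_head h + r - M_tail t||_1; lower means a more plausible triple."""
    ph = matvec(mix_concepts(concepts, att_head), h)
    pt = matvec(mix_concepts(concepts, att_tail), t)
    return sum(abs(a + b - c) for a, b, c in zip(ph, r, pt))


identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
concepts = [identity, swap]  # hypothetical shared "concept" matrices

# A sparse attention vector (one non-zero entry) reads directly as "this relation uses concept k".
print(triple_score([1.0, 2.0], [0.0, 0.0], [2.0, 1.0], concepts, [1.0, 0.0], [0.0, 1.0]))  # 0.0
```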


## [Network Dissection: Quantifying Interpretability of Deep Visual  Representations](https://arxiv.org/abs/1704.05796)
[(PDF)](https://arxiv.org/pdf/1704.05796)

`Authors:David Bau, Bolei Zhou, Aditya Khosla, Aude Oliva, Antonio Torralba`


Comments:

First two authors contributed equally. Oral presentation at CVPR 2017

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)


ACM classes:

I.2.10


Cite as:

arXiv:1704.05796 [cs.CV]

 
(or arXiv:1704.05796v1 [cs.CV] for this version)


> Abstract: We propose a general framework called Network Dissection for quantifying the
interpretability of latent representations of CNNs by evaluating the alignment
between individual hidden units and a set of semantic concepts. Given any CNN
model, the proposed method draws on a broad data set of visual concepts to
score the semantics of hidden units at each intermediate convolutional layer.
The units with semantics are given labels across a range of objects, parts,
scenes, textures, materials, and colors. We use the proposed method to test the
hypothesis that interpretability of units is equivalent to random linear
combinations of units, then we apply our method to compare the latent
representations of various networks when trained to solve different supervised
and self-supervised training tasks. We further analyze the effect of training
iterations, compare networks trained with different initializations, examine
the impact of network depth and width, and measure the effect of dropout and
batch normalization on the interpretability of deep visual representations. We
demonstrate that the proposed method can shed light on characteristics of CNN
models and training methods that go beyond measurements of their discriminative
power.
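
Network Dissection scores a unit against a concept by the intersection-over-union between the unit's thresholded activation map and the concept's segmentation mask, accumulated over the whole dataset. A minimal stdlib sketch; the defaults (keep the top 0.5% of activations, label a unit only if its IoU exceeds 0.04) follow our reading of the paper and should be treated as assumptions, and the toy data is made up.

```python
from typing import Dict, Optional, Sequence, Tuple


def top_quantile_threshold(activations: Sequence[float], q: float) -> float:
    """Smallest value among the top fraction q of activations (at least one value kept)."""
    ordered = sorted(activations, reverse=True)
    k = max(1, int(len(ordered) * q))
    return ordered[k - 1]


def dissect_unit(activations: Sequence[float], concept_masks: Dict[str, Sequence[bool]],
                 q: float = 0.005, iou_cutoff: float = 0.04) -> Tuple[Optional[str], float]:
    """Label a unit with the concept whose mask best overlaps the unit's thresholded activations.

    `activations` and every concept mask are flattened over all pixels of all images
    (activation maps assumed already upsampled to mask resolution), so the IoU is dataset-wide.
    """
    t = top_quantile_threshold(activations, q)
    unit_mask = [a >= t for a in activations]
    best, best_iou = None, 0.0
    for concept, mask in concept_masks.items():
        inter = sum(u and m for u, m in zip(unit_mask, mask))
        union = sum(u or m for u, m in zip(unit_mask, mask))
        iou = inter / union if union else 0.0
        if iou > best_iou:
            best, best_iou = concept, iou
    return (best, best_iou) if best_iou > iou_cutoff else (None, best_iou)


acts = [0.9, 0.8, 0.1, 0.0, 0.2, 0.1, 0.0, 0.3, 0.1, 0.0]
masks = {
    "dog": [True, True, True] + [False] * 7,
    "sky": [False] * 7 + [True, True, True],
}
print(dissect_unit(acts, masks, q=0.2))  # ('dog', 0.6666666666666666)
```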


## [Softmax Q-Distribution Estimation for Structured Prediction: A  Theoretical Interpretation for RAML](https://arxiv.org/abs/1705.07136)
[(PDF)](https://arxiv.org/pdf/1705.07136)

`Authors:Xuezhe Ma, Pengcheng Yin, Jingzhou Liu, Graham Neubig, Eduard Hovy`


Comments:

Under review at ICLR 2018

Subjects:

Learning (cs.LG); Computation and Language (cs.CL); Machine Learning (stat.ML)


Cite as:

arXiv:1705.07136 [cs.LG]

 
(or arXiv:1705.07136v3 [cs.LG] for this version)


> Abstract: Reward augmented maximum likelihood (RAML), a simple and effective learning
framework to directly optimize towards the reward function in structured
prediction tasks, has led to a number of impressive empirical successes. RAML
incorporates task-specific reward by performing maximum-likelihood updates on
candidate outputs sampled according to an exponentiated payoff distribution,
which gives higher probabilities to candidates that are close to the reference
output. While RAML is notable for its simplicity, efficiency, and its
impressive empirical successes, the theoretical properties of RAML, especially
the behavior of the exponentiated payoff distribution, have not been examined
thoroughly. In this work, we introduce softmax Q-distribution estimation, a
novel theoretical interpretation of RAML, which reveals the relation between
RAML and Bayesian decision theory. The softmax Q-distribution can be regarded
as a smooth approximation of the Bayes decision boundary, and the Bayes
decision rule is achieved by decoding with this Q-distribution. We further show
that RAML is equivalent to approximately estimating the softmax Q-distribution,
with the temperature $\tau$ controlling approximation error. We perform two
experiments, one on synthetic data of multi-class classification and one on
real data of image captioning, to demonstrate the relationship between RAML and
the proposed softmax Q-distribution estimation method, verifying our
theoretical analysis. Additional experiments on three structured prediction
tasks with rewards defined on sequential (named entity recognition), tree-based
(dependency parsing) and irregular (machine translation) structures show
notable improvements over maximum likelihood baselines.
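
RAML's exponentiated payoff distribution is $q(y \mid y^*; \tau) \propto \exp(r(y, y^*)/\tau)$. For a tiny output space it can be enumerated directly, which makes the role of the temperature $\tau$ easy to see. A minimal sketch, assuming negative Hamming distance as the reward $r$ and a brute-force candidate set; real structured output spaces are far too large to enumerate, which is why RAML samples targets from $q$ instead.

```python
import math
import random
from itertools import product
from typing import Dict, List, Sequence, Tuple

Output = Tuple[str, ...]


def hamming_reward(y: Output, y_ref: Output) -> float:
    """Negative Hamming distance: 0 for the reference, more negative the further away."""
    return -float(sum(a != b for a, b in zip(y, y_ref)))


def exponentiated_payoff(y_ref: Output, vocab: Sequence[str], tau: float) -> Dict[Output, float]:
    """q(y | y_ref; tau) proportional to exp(r(y, y_ref) / tau), over all sequences of len(y_ref)."""
    candidates = list(product(vocab, repeat=len(y_ref)))
    logits = [hamming_reward(y, y_ref) / tau for y in candidates]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in logits]
    z = sum(weights)
    return {y: w / z for y, w in zip(candidates, weights)}


def sample_targets(q: Dict[Output, float], k: int, rng: random.Random) -> List[Output]:
    """Draw k training targets from q; RAML performs maximum-likelihood updates on such samples."""
    return rng.choices(list(q), weights=list(q.values()), k=k)


ref = ("a", "b")
for tau in (0.1, 1.0, 10.0):
    q = exponentiated_payoff(ref, "ab", tau)
    print(tau, round(q[ref], 3))
```

As $\tau$ grows, the mass on the reference shrinks from nearly 1 toward the uniform value 0.25; as $\tau \to 0$, RAML reduces to ordinary maximum likelihood on the reference.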


## [Logic Tensor Networks for Semantic Image Interpretation](https://arxiv.org/abs/1705.08968)
[(PDF)](https://arxiv.org/pdf/1705.08968)

`Authors:Ivan Donadello, Luciano Serafini, Artur d'Avila Garcez`


Comments:

14 pages, 2 figures, IJCAI 2017

Subjects:

Artificial Intelligence (cs.AI)


Cite as:

arXiv:1705.08968 [cs.AI]

 
(or arXiv:1705.08968v1 [cs.AI] for this version)


> Abstract: Semantic Image Interpretation (SII) is the task of extracting structured
semantic descriptions from images. It is widely agreed that the combined use of
visual data and background knowledge is of great importance for SII. Recently,
Statistical Relational Learning (SRL) approaches have been developed for
reasoning under uncertainty and learning in the presence of data and rich
knowledge. Logic Tensor Networks (LTNs) are an SRL framework which integrates
neural networks with first-order fuzzy logic to allow (i) efficient learning
from noisy data in the presence of logical constraints, and (ii) reasoning with
logical formulas describing general properties of the data. In this paper, we
develop and apply LTNs to two of the main tasks of SII, namely, the
classification of an image's bounding boxes and the detection of the relevant
part-of relations between objects. To the best of our knowledge, this is the
first successful application of SRL to such SII tasks. The proposed approach is
evaluated on a standard image processing benchmark. Experiments show that the
use of background knowledge in the form of logical constraints can improve the
performance of purely data-driven approaches, including the state-of-the-art
Fast Region-based Convolutional Neural Networks (Fast R-CNN). Moreover, we show
that the use of logical background knowledge adds robustness to the learning
system when errors are present in the labels of the training data.


## [Patchnet: Interpretable Neural Networks for Image Classification](https://arxiv.org/abs/1705.08078)
[(PDF)](https://arxiv.org/pdf/1705.08078)

`Authors:Adityanarayanan Radhakrishnan, Charles Durham, Ali Soylemezoglu, Caroline Uhler`


Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1705.08078 [cs.CV]

 
(or arXiv:1705.08078v1 [cs.CV] for this version)


> Abstract: The ability to visually understand and interpret learned features from
complex predictive models is crucial for their acceptance in sensitive areas
such as health care. To move closer to this goal of truly interpretable complex
models, we present PatchNet, a network that restricts global context for image
classification tasks in order to easily provide visual representations of
learned texture features on a predetermined local scale. We demonstrate how
PatchNet provides visual heatmap representations of the learned features, and
we mathematically analyze the behavior of the network during convergence. We
also present a version of PatchNet that is particularly well suited for
lowering false positive rates in image classification tasks. We apply PatchNet
to the classification of textures from the Describable Textures Dataset and to
the ISBI-ISIC 2016 melanoma classification challenge.


## [A Unified Approach to Interpreting Model Predictions](https://arxiv.org/abs/1705.07874)
[(PDF)](https://arxiv.org/pdf/1705.07874)

`Authors:Scott Lundberg, Su-In Lee`


Comments:

To appear in NIPS 2017

Subjects:

Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)


Cite as:

arXiv:1705.07874 [cs.AI]

 
(or arXiv:1705.07874v2 [cs.AI] for this version)


> Abstract: Understanding why a model makes a certain prediction can be as crucial as the
prediction's accuracy in many applications. However, the highest accuracy for
large modern datasets is often achieved by complex models that even experts
struggle to interpret, such as ensemble or deep learning models, creating a
tension between accuracy and interpretability. In response, various methods
have recently been proposed to help users interpret the predictions of complex
models, but it is often unclear how these methods are related and when one
method is preferable over another. To address this problem, we present a
unified framework for interpreting predictions, SHAP (SHapley Additive
exPlanations). SHAP assigns each feature an importance value for a particular
prediction. Its novel components include: (1) the identification of a new class
of additive feature importance measures, and (2) theoretical results showing
there is a unique solution in this class with a set of desirable properties.
The new class unifies six existing methods, notable because several recent
methods in the class lack the proposed desirable properties. Based on insights
from this unification, we present new methods that show improved computational
performance and/or better consistency with human intuition than previous
approaches.
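
The Shapley values that SHAP builds on can be computed exactly for a handful of features by enumerating every coalition. A minimal sketch, assuming that "missing" features are simply replaced by a fixed baseline vector; this is a common simplification, not the paper's conditional-expectation definition, and the exponential enumeration is exactly what the paper's faster estimators are meant to avoid. The function names and toy model are illustrative.

```python
import math
from itertools import combinations
from typing import Callable, List, Sequence, Set


def shapley_values(f: Callable[[List[float]], float], x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values; features outside a coalition take their baseline value."""
    m = len(x)

    def value(coalition: Set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return f(z)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):  # coalitions drawn from the other m - 1 features
            weight = math.factorial(size) * math.factorial(m - size - 1) / math.factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def model(z: List[float]) -> float:
    """Hypothetical model with one interaction term between features 0 and 2."""
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x, base = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]
phi = shapley_values(model, x, base)
print([round(p, 6) for p in phi])  # [2.5, 3.0, 0.5]
```

The values sum to f(x) - f(baseline) = 6 (the local accuracy property), and the interaction term is split evenly between features 0 and 2.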


## [Interpreting Blackbox Models via Model Extraction](https://arxiv.org/abs/1705.08504)
[(PDF)](https://arxiv.org/pdf/1705.08504)

`Authors:Osbert Bastani, Carolyn Kim, Hamsa Bastani`


Subjects:

Learning (cs.LG)


Cite as:

arXiv:1705.08504 [cs.LG]

 
(or arXiv:1705.08504v1 [cs.LG] for this version)


> Abstract: Interpretability has become an important issue as machine learning is
increasingly used to inform consequential decisions. We propose an approach for
interpreting a blackbox model by extracting a decision tree that approximates
the model. Our model extraction algorithm avoids overfitting by leveraging
blackbox model access to actively sample new training points. We prove that as
the number of samples goes to infinity, the decision tree learned using our
algorithm converges to the exact greedy decision tree. In our evaluation, we
use our algorithm to interpret random forests and neural nets trained on
several datasets from the UCI Machine Learning Repository, as well as control
policies learned for three classical reinforcement learning problems. We show
that our algorithm improves over a baseline based on CART on every problem
instance. Furthermore, we show how an interpretation generated by our approach
can be used to understand and debug these models.
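
Extraction by active sampling can be sketched in four steps: fit a distribution to the training inputs, draw fresh points from it, label them by querying the black box, and fit an interpretable surrogate to those labels. Assumptions: the input distribution is a single axis-aligned Gaussian (the paper fits a richer distribution), the surrogate is a depth-1 stump rather than a full greedy tree, and the black box is a hypothetical one-line classifier.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Point = List[float]


def fit_gaussian(data: Sequence[Point]) -> List[Tuple[float, float]]:
    """Per-feature (mean, std): a simple stand-in for the input distribution."""
    return [(statistics.mean(col), statistics.pstdev(col) or 1e-6) for col in zip(*data)]


def gini(labels: Sequence[int]) -> float:
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(points: Sequence[Point], labels: Sequence[int]) -> Tuple[int, float]:
    """Greedy depth-1 tree: the (feature, threshold) split with the lowest weighted Gini."""
    best = None
    for f in range(len(points[0])):
        for t in sorted({p[f] for p in points}):
            left = [y for p, y in zip(points, labels) if p[f] <= t]
            right = [y for p, y in zip(points, labels) if p[f] > t]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                best = (impurity, f, t)
    return best[1], best[2]


def extract_stump(blackbox: Callable[[Point], int], train: Sequence[Point],
                  n_samples: int = 300, seed: int = 0) -> Tuple[int, float]:
    """Sample fresh inputs, label them by querying the black box, then fit the surrogate."""
    rng = random.Random(seed)
    params = fit_gaussian(train)
    points = [[rng.gauss(mu, sd) for mu, sd in params] for _ in range(n_samples)]
    labels = [blackbox(p) for p in points]
    return fit_stump(points, labels)


def blackbox(p: Point) -> int:
    """Hypothetical opaque classifier."""
    return int(p[0] > 0.3)


data_rng = random.Random(1)
train = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(50)]
print(extract_stump(blackbox, train))  # feature 0, threshold just below 0.3
```

Because new points are sampled rather than reused from the training set, the surrogate sees many more labeled inputs near the decision boundary, which is the overfitting-avoidance argument the abstract makes.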


## [Interpretable & Explorable Approximations of Black Box Models](https://arxiv.org/abs/1707.01154)
[(PDF)](https://arxiv.org/pdf/1707.01154)

`Authors:Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec`


Comments:

Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning

Subjects:

Artificial Intelligence (cs.AI)


Cite as:

arXiv:1707.01154 [cs.AI]

 
(or arXiv:1707.01154v1 [cs.AI] for this version)


> Abstract: We propose Black Box Explanations through Transparent Approximations (BETA),
a novel model agnostic framework for explaining the behavior of any black-box
classifier by simultaneously optimizing for fidelity to the original model and
interpretability of the explanation. To this end, we develop a novel objective
function that allows us to learn, with optimality guarantees, a small number
of compact decision sets each of which explains the behavior of the black box
model in unambiguous, well-defined regions of feature space. Furthermore, our
framework also is capable of accepting user input when generating these
approximations, thus allowing users to interactively explore how the black-box
model behaves in different subspaces that are of interest to the user. To the
best of our knowledge, this is the first approach which can produce global
explanations of the behavior of any given black box model through joint
optimization of unambiguity, fidelity, and interpretability, while also
allowing users to explore model behavior based on their preferences.
Experimental evaluation with real-world datasets and user studies demonstrates
that our approach can generate highly compact, easy-to-understand, yet accurate
approximations of various kinds of predictive models compared to
state-of-the-art baselines.


## [Interpretability via Model Extraction](https://arxiv.org/abs/1706.09773)
[(PDF)](https://arxiv.org/pdf/1706.09773)

`Authors:Osbert Bastani, Carolyn Kim, Hamsa Bastani`


Comments:

Presented as a poster at the 2017 Workshop on Fairness, Accountability, and Transparency in Machine Learning (FAT/ML 2017)

Subjects:

Learning (cs.LG); Computers and Society (cs.CY); Machine Learning (stat.ML)


Cite as:

arXiv:1706.09773 [cs.LG]

 
(or arXiv:1706.09773v2 [cs.LG] for this version)


> Abstract: The ability to interpret machine learning models has become increasingly
important now that machine learning is used to inform consequential decisions.
We propose an approach called model extraction for interpreting complex,
blackbox models. Our approach approximates the complex model using a much more
interpretable model; as long as the approximation quality is good,
statistical properties of the complex model are reflected in the interpretable
model. We show how model extraction can be used to understand and debug random
forests and neural nets trained on several datasets from the UCI Machine
Learning Repository, as well as control policies learned for several classical
reinforcement learning problems.


## [Methods for Interpreting and Understanding Deep Neural Networks](https://arxiv.org/abs/1706.07979)
[(PDF)](https://arxiv.org/pdf/1706.07979)

`Authors:Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller`


Comments:

14 pages, 10 figures

Subjects:

Learning (cs.LG); Machine Learning (stat.ML)


DOI:

10.1016/j.dsp.2017.10.011


Cite as:

arXiv:1706.07979 [cs.LG]

 
(or arXiv:1706.07979v1 [cs.LG] for this version)


> Abstract: This paper provides an entry point to the problem of interpreting a deep
neural network model and explaining its predictions. It is based on a tutorial
given at ICASSP 2017. It introduces some recently proposed techniques of
interpretation, along with theory, tricks and recommendations, to make the most
efficient use of these techniques on real data. It also discusses a number of
practical applications.


## [MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis  Network](https://arxiv.org/abs/1707.02485)
[(PDF)](https://arxiv.org/pdf/1707.02485)

`Authors:Zizhao Zhang, Yuanpu Xie, Fuyong Xing, Mason McGough, Lin Yang`


Comments:

CVPR 2017 Oral

Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1707.02485 [cs.CV]

 
(or arXiv:1707.02485v1 [cs.CV] for this version)


> Abstract: The inability to interpret the model prediction in semantically and visually
meaningful ways is a well-known shortcoming of most existing computer-aided
diagnosis methods. In this paper, we propose MDNet to establish a direct
multimodal mapping between medical images and diagnostic reports that can read
images, generate diagnostic reports, retrieve images by symptom descriptions,
and visualize attention, to provide justifications of the network diagnosis
process. MDNet includes an image model and a language model. The image model is
proposed to enhance multi-scale feature ensembles and utilization efficiency.
The language model, integrated with our improved attention mechanism, aims to
read and explore discriminative image feature descriptions from reports to
learn a direct mapping from sentence words to image pixels. The overall network
is trained end-to-end by using our developed optimization strategy. On a dataset of
pathology bladder cancer images and their diagnostic reports (BCIDR), we
conduct extensive experiments to demonstrate that MDNet outperforms
comparative baselines. The proposed image model obtains state-of-the-art
performance on two CIFAR datasets as well.


## [A Formal Framework to Characterize Interpretability of Procedures](https://arxiv.org/abs/1707.03886)
[(PDF)](https://arxiv.org/pdf/1707.03886)

`Authors:Amit Dhurandhar, Vijay Iyengar, Ronny Luss, Karthikeyan Shanmugam`


Comments:

presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

Subjects:

Artificial Intelligence (cs.AI)


Cite as:

arXiv:1707.03886 [cs.AI]

 
(or arXiv:1707.03886v1 [cs.AI] for this version)


> Abstract: We provide a novel notion of what it means to be interpretable, looking past
the usual association with human understanding. Our key insight is that
interpretability is not an absolute concept and so we define it relative to a
target model, which may or may not be a human. We define a framework that
allows for comparing interpretable procedures by linking them to important
practical aspects such as accuracy and robustness. We characterize many of the
current state-of-the-art interpretable methods in our framework, demonstrating its
general applicability.


## [Interpreting Classifiers through Attribute Interactions in Datasets](https://arxiv.org/abs/1707.07576)
[(PDF)](https://arxiv.org/pdf/1707.07576)

`Authors:Andreas Henelius, Kai Puolamäki, Antti Ukkonen`


Comments:

presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1707.07576 [stat.ML]

 
(or arXiv:1707.07576v1 [stat.ML] for this version)


> Abstract: In this work we present the novel ASTRID method for investigating which
attribute interactions classifiers exploit when making predictions. Attribute
interactions in classification tasks mean that two or more attributes together
provide stronger evidence for a particular class label. Knowledge of such
interactions makes models more interpretable by revealing associations between
attributes. This has applications, e.g., in pharmacovigilance to identify
interactions between drugs or in bioinformatics to investigate associations
between single nucleotide polymorphisms. We also show how the found attribute
partitioning is related to a factorisation of the data generating distribution
and empirically demonstrate the utility of the proposed method.
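
The idea can be sketched as a randomization test: given a grouping of the attributes, shuffle each group jointly but independently of the other groups within each class, then check whether the classifier's accuracy survives. If it does, the classifier does not rely on interactions across groups. A minimal sketch, assuming a fixed candidate grouping and a toy XOR classifier; the paper's search for the best grouping is not shown, and the function names are illustrative.

```python
import random
from typing import Callable, List, Sequence


def permute_within_class(rows: Sequence[Sequence[int]], labels: Sequence[int],
                         groups: Sequence[Sequence[int]], rng: random.Random) -> List[List[int]]:
    """Within each class, shuffle each attribute group jointly but independently of the others.

    Associations inside a group (and between a group and the class) survive;
    interactions across groups are broken.
    """
    out = [list(r) for r in rows]
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        for group in groups:
            perm = idx[:]
            rng.shuffle(perm)
            for dst, src in zip(idx, perm):
                for a in group:
                    out[dst][a] = rows[src][a]
    return out


def accuracy(classifier: Callable[[Sequence[int]], int], rows, labels) -> float:
    return sum(classifier(r) == y for r, y in zip(rows, labels)) / len(labels)


# XOR data: the label depends only on the interaction between attributes 0 and 1.
rows = [[a, b, c] for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 25
labels = [r[0] ^ r[1] for r in rows]


def xor_classifier(r: Sequence[int]) -> int:
    return r[0] ^ r[1]


rng = random.Random(0)
for groups in ([[0, 1], [2]], [[0], [1], [2]]):
    shuffled = permute_within_class(rows, labels, groups, rng)
    print(groups, accuracy(xor_classifier, shuffled, labels))
```

Keeping attributes 0 and 1 together preserves perfect accuracy, while separating them drops it to roughly chance, which reveals the interaction the classifier exploits.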


## [Interpretable Active Learning](https://arxiv.org/abs/1708.00049)
[(PDF)](https://arxiv.org/pdf/1708.00049)

`Authors:Richard L. Phillips, Kyu Hyun Chang, Sorelle A. Friedler`


Comments:

6 pages, 5 figures, presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1708.00049 [stat.ML]

 
(or arXiv:1708.00049v1 [stat.ML] for this version)


> Abstract: Active learning has long been a topic of study in machine learning. However,
as increasingly complex and opaque models have become standard practice, the
process of active learning, too, has become more opaque. There has been little
investigation into interpreting what specific trends and patterns an active
learning strategy may be exploring. This work expands on the Local
Interpretable Model-agnostic Explanations framework (LIME) to provide
explanations for active learning recommendations. We demonstrate how LIME can
be used to generate locally faithful explanations for an active learning
strategy, and how these explanations can be used to understand how different
models and datasets explore a problem space over time. In order to quantify the
per-subgroup differences in how an active learning strategy queries spatial
regions, we introduce a notion of uncertainty bias (based on disparate impact)
to measure the discrepancy in the confidence for a model's predictions between
one subgroup and another. Using the uncertainty bias measure, we show that our
query explanations accurately reflect the subgroup focus of the active learning
queries, allowing for an interpretable explanation of what is being learned as
points with similar sources of uncertainty have their uncertainty bias
resolved. We demonstrate that this technique can be applied to track
uncertainty bias over user-defined clusters or automatically generated clusters
based on the source of uncertainty.


## [Using Program Induction to Interpret Transition System Dynamics](https://arxiv.org/abs/1708.00376)
[(PDF)](https://arxiv.org/pdf/1708.00376)

`Authors:Svetlin Penkov, Subramanian Ramamoorthy`


Comments:

Presented at 2017 ICML Workshop on Human Interpretability in Machine Learning (WHI 2017), Sydney, NSW, Australia. arXiv admin note: substantial text overlap with arXiv:1705.08320

Subjects:

Artificial Intelligence (cs.AI)


Cite as:

arXiv:1708.00376 [cs.AI]

 
(or arXiv:1708.00376v1 [cs.AI] for this version)


> Abstract: Explaining and reasoning about processes which underlie observed black-box
phenomena enables the discovery of causal mechanisms, derivation of suitable
abstract representations and the formulation of more robust predictions. We
propose to learn high level functional programs in order to represent abstract
models which capture the invariant structure in the observed data. We introduce
the $\pi$-machine (program-induction machine) -- an architecture able to induce
interpretable LISP-like programs from observed data traces. We propose an
optimisation procedure for program learning based on backpropagation, gradient
descent and A* search. We apply the proposed method to two problems: system
identification of dynamical systems and explaining the behaviour of a DQN
agent. Our results show that the $\pi$-machine can efficiently induce
interpretable programs from individual data traces.


## [Warp: a method for neural network interpretability applied to gene  expression profiles](https://arxiv.org/abs/1708.04988)
[(PDF)](https://arxiv.org/pdf/1708.04988)

`Authors:Assya Trofimov, Sebastien Lemieux, Claude Perreault`


Comments:

5 pages, 3 figures, NIPS 2016 Machine Learning in Computational Biology workshop

Subjects:

Genomics (q-bio.GN); Artificial Intelligence (cs.AI)


Cite as:

arXiv:1708.04988 [q-bio.GN]

 
(or arXiv:1708.04988v1 [q-bio.GN] for this version)


> Abstract: We show a proof of principle for warping, a method to interpret the inner
working of neural networks in the context of gene expression analysis. Warping
is an efficient way to gain insight into the inner workings of neural nets and
make them more interpretable. We demonstrate the ability of warping to recover
meaningful information for a given class on a sample-specific individual basis.
We found warping works well in both linearly and nonlinearly separable
datasets. These encouraging results show that warping has a potential to be the
answer to neural networks interpretability in computational biology.


## [DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation  of Self-Reported Pain](https://arxiv.org/abs/1708.04670)
[(PDF)](https://arxiv.org/pdf/1708.04670)

`Authors:Dianbo Liu, Fengjiao Peng, Andrew Shea, Ognjen (Oggi) Rudovic, Rosalind Picard`


Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Learning (cs.LG)


Cite as:

arXiv:1708.04670 [cs.CV]

 
(or arXiv:1708.04670v1 [cs.CV] for this version)


> Abstract: Previous research on automatic pain estimation from facial expressions has
focused primarily on "one-size-fits-all" metrics (such as PSPI). In this work,
we focus on directly estimating each individual's self-reported visual-analog
scale (VAS) pain metric, as this is considered the gold standard for pain
measurement. The VAS pain score is highly subjective and context-dependent, and
its range can vary significantly among different persons. To tackle these
issues, we propose a novel two-stage personalized model, named DeepFaceLIFT,
for automatic estimation of VAS. This model is based on (1) Neural Network and
(2) Gaussian process regression models, and is used to personalize the
estimation of self-reported pain via a set of hand-crafted personal features
and multi-task learning. We show on the benchmark dataset for pain analysis
(The UNBC-McMaster Shoulder Pain Expression Archive) that the proposed
personalized model largely outperforms the traditional, unpersonalized models:
the intra-class correlation improves from a baseline performance of 19% to a
personalized performance of 35% while also providing confidence in the
model's estimates, in contrast to existing models for the
target task. Additionally, DeepFaceLIFT automatically discovers the
pain-relevant facial regions for each person, allowing for an easy
interpretation of the pain-related facial cues.


## [Towards Interpretable Deep Neural Networks by Leveraging Adversarial  Examples](https://arxiv.org/abs/1708.05493)
[(PDF)](https://arxiv.org/pdf/1708.05493)

`Authors:Yinpeng Dong, Hang Su, Jun Zhu, Fan Bao`


Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1708.05493 [cs.CV]

 
(or arXiv:1708.05493v1 [cs.CV] for this version)


> Abstract: Deep neural networks (DNNs) have demonstrated impressive performance on a
wide array of tasks, but they are usually considered opaque since internal
structure and learned parameters are not interpretable. In this paper, we
re-examine the internal representations of DNNs using adversarial images, which
are generated by an ensemble-optimization algorithm. We find that: (1) the
neurons in DNNs do not truly detect semantic objects/parts, but respond to
objects/parts only as recurrent discriminative patches; (2) deep visual
representations are not robust distributed codes of visual concepts because the
representations of adversarial images are largely not consistent with those of
real images, although they have similar visual appearance, both of which are
different from previous findings. To further improve the interpretability of
DNNs, we propose an adversarial training scheme with a consistent loss such
that the neurons are endowed with human-interpretable concepts. The induced
interpretable representations enable us to trace eventual outcomes back to
influential neurons. Therefore, human users can know how the models make
predictions, as well as when and why they make errors.


## [More cat than cute? Interpretable Prediction of Adjective-Noun Pairs](https://arxiv.org/abs/1708.06039)
[(PDF)](https://arxiv.org/pdf/1708.06039)

`Authors:Delia Fernandez, Alejandro Woodward, Victor Campos, Xavier Giro-i-Nieto, Brendan Jou, Shih-Fu Chang`


Comments:

Oral paper at ACM Multimedia 2017 Workshop on Multimodal Understanding of Social, Affective and Subjective Attributes (MUSA2)

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Multimedia (cs.MM)


DOI:

10.1145/3132515.3132520


Cite as:

arXiv:1708.06039 [cs.CV]

 
(or arXiv:1708.06039v1 [cs.CV] for this version)


> Abstract: The increasing availability of affect-rich multimedia resources has bolstered
interest in understanding sentiment and emotions in and from visual content.
Adjective-noun pairs (ANP) are a popular mid-level semantic construct for
capturing affect via visually detectable concepts such as "cute dog" or
"beautiful landscape". Current state-of-the-art methods approach ANP prediction
by considering each of these compound concepts as individual tokens, ignoring
the underlying relationships in ANPs. This work aims at disentangling the
contributions of the `adjectives' and `nouns' in the visual prediction of ANPs.
Two specialised classifiers, one trained for detecting adjectives and another
for nouns, are fused to predict 553 different ANPs. The resulting ANP
prediction model is more interpretable as it allows us to study contributions
of the adjective and noun components. Source code and models are available at
this https URL .


## [Interpretable Categorization of Heterogeneous Time Series Data](https://arxiv.org/abs/1708.09121)
[(PDF)](https://arxiv.org/pdf/1708.09121)

`Authors:Ritchie Lee, Mykel J. Kochenderfer, Ole J. Mengshoel, Joshua Silbermann`


Comments:

10 pages, 7 figures

Subjects:

Learning (cs.LG)


Cite as:

arXiv:1708.09121 [cs.LG]

 
(or arXiv:1708.09121v1 [cs.LG] for this version)


> Abstract: The explanation of heterogeneous multivariate time series data is a central
problem in many applications. The problem requires two major data mining
challenges to be addressed simultaneously: Learning models that are
human-interpretable and mining of heterogeneous multivariate time series data.
The intersection of these two areas is not adequately explored in the existing
literature. To address this gap, we propose grammar-based decision trees and an
algorithm for learning them. Grammar-based decision tree extends decision trees
with a grammar framework. Logical expressions, derived from context-free
grammar, are used for branching in place of simple thresholds on attributes.
The added expressivity enables support for a wide range of data types while
retaining the interpretability of decision trees. By choosing a grammar based
on temporal logic, we show that grammar-based decision trees can be used for
the interpretable classification of high-dimensional and heterogeneous time
series data. In addition to classification, we show how grammar-based decision
trees can also be used for categorization, which is a combination of clustering
and generating interpretable explanations for each cluster. We apply
grammar-based decision trees to analyze the classic Australian Sign Language
dataset as well as categorize and explain near mid-air collisions to support
the development of a prototype aircraft collision avoidance system.
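
A minimal sketch of the branching idea, using a toy encoding of our own rather than the authors' grammar or learning algorithm. Two temporal-logic operators, F ("eventually") and G ("globally"), are evaluated over a multivariate time series. The resulting Boolean then serves as a decision-tree split in place of a threshold on one attribute. The `Expr` class, the variable names `alt` and `vrate`, and the toy series are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A time series is a list of time steps; each step maps a variable name to a value.
Series = List[dict]


@dataclass
class Expr:
    """Hypothetical temporal-logic expression: op is 'F' (eventually) or 'G' (globally)."""
    op: str
    var: str
    cmp: str  # '<' or '>'
    threshold: float

    def holds_at(self, step: dict) -> bool:
        value = step[self.var]
        return value > self.threshold if self.cmp == ">" else value < self.threshold

    def evaluate(self, series: Series) -> bool:
        checks = (self.holds_at(step) for step in series)
        # F: the atomic predicate holds at some step; G: it holds at every step.
        return any(checks) if self.op == "F" else all(checks)


def split(dataset: Sequence[Series], expr: Expr) -> Tuple[list, list]:
    """Branch on a logical expression instead of a threshold on a single attribute."""
    left = [s for s in dataset if expr.evaluate(s)]
    right = [s for s in dataset if not expr.evaluate(s)]
    return left, right


if __name__ == "__main__":
    climbing = [{"alt": 100.0, "vrate": 5.0}, {"alt": 150.0, "vrate": 6.0}]
    level = [{"alt": 100.0, "vrate": 0.0}, {"alt": 100.0, "vrate": 0.5}]
    rule = Expr(op="F", var="vrate", cmp=">", threshold=3.0)
    left, right = split([climbing, level], rule)
    print(len(left), len(right))  # 1 1
```

In the paper, candidate expressions are derived from a context-free grammar and chosen during learning. Here a single hand-written expression stands in for that search.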


## [Explainable Artificial Intelligence: Understanding, Visualizing and  Interpreting Deep Learning Models](https://arxiv.org/abs/1708.08296)
[(PDF)](https://arxiv.org/pdf/1708.08296)

`Authors:Wojciech Samek, Thomas Wiegand, Klaus-Robert Müller`


Comments:

8 pages, 2 figures

Subjects:

Artificial Intelligence (cs.AI); Computers and Society (cs.CY); Neural and Evolutionary Computing (cs.NE); Machine Learning (stat.ML)


Cite as:

arXiv:1708.08296 [cs.AI]

 
(or arXiv:1708.08296v1 [cs.AI] for this version)


> Abstract: With the availability of large databases and recent improvements in deep
learning methodology, the performance of AI systems is reaching or even
exceeding the human level on an increasing number of complex tasks. Impressive
examples of this development can be found in domains such as image
classification, sentiment analysis, speech understanding or strategic game
playing. However, because of their nested non-linear structure, these highly
successful machine learning and artificial intelligence models are usually
applied in a black box manner, i.e., no information is provided about what
exactly makes them arrive at their predictions. Since this lack of transparency
can be a major drawback, e.g., in medical applications, the development of
methods for visualizing, explaining and interpreting deep learning models has
recently attracted increasing attention. This paper summarizes recent
developments in this field and makes a plea for more interpretability in
artificial intelligence. Furthermore, it presents two approaches to explaining
predictions of deep learning models, one method which computes the sensitivity
of the prediction with respect to changes in the input and one approach which
meaningfully decomposes the decision in terms of the input variables. These
methods are evaluated on three classification tasks.
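
Both kinds of explanation can be illustrated on a tiny model. This sketch assumes a hypothetical bias-free two-layer ReLU network with random weights. It computes sensitivity analysis as the squared partial derivative of the output with respect to each input. For the decomposition it uses the basic LRP-0 rule, which we assume as a representative of that approach; the paper's exact rule and stabilizers may differ. For a bias-free ReLU network the input relevances sum to the output, which is what distinguishes a decomposition from a sensitivity map.

```python
import numpy as np


def forward(W1, w2, x):
    a = W1 @ x              # hidden pre-activations
    h = np.maximum(a, 0.0)  # ReLU
    return a, h, float(w2 @ h)


def sensitivity(W1, w2, x):
    """Sensitivity analysis: squared partial derivative of the output w.r.t. each input."""
    a, _, _ = forward(W1, w2, x)
    grad = W1.T @ (w2 * (a > 0))  # chain rule through the ReLU mask
    return grad ** 2


def lrp0(W1, w2, x):
    """LRP-0 decomposition for a bias-free two-layer ReLU network."""
    a, h, y = forward(W1, w2, x)
    r_hidden = w2 * h                   # each hidden unit's share of the output
    safe_a = np.where(a == 0, 1.0, a)   # inactive units carry zero relevance anyway
    z = W1 * x[None, :]                 # z[j, i] = W1[j, i] * x[i]
    r_input = (z / safe_a[:, None] * r_hidden[:, None]).sum(axis=0)
    return r_input, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W1, w2, x = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=3)
    r, y = lrp0(W1, w2, x)
    print("relevance:", r, "sum:", r.sum(), "output:", y)
    print("sensitivity:", sensitivity(W1, w2, x))
```

Sensitivity answers "which input change would move the output most". The decomposition answers "how much did each input contribute to this output". The two can rank inputs differently.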


## [Interpreting Shared Deep Learning Models via Explicable Boundary Trees](https://arxiv.org/abs/1709.03730)
[(PDF)](https://arxiv.org/pdf/1709.03730)

`Authors:Huijun Wu, Chen Wang, Jie Yin, Kai Lu, Liming Zhu`


Comments:

9 pages, 10 figures

Subjects:

Learning (cs.LG); Human-Computer Interaction (cs.HC)


Cite as:

arXiv:1709.03730 [cs.LG]

 
(or arXiv:1709.03730v1 [cs.LG] for this version)


> Abstract: Despite outperforming the human in many tasks, deep neural network models are
also criticized for the lack of transparency and interpretability in decision
making. The opaqueness results in uncertainty and low confidence when deploying
such a model in model sharing scenarios, when the model is developed by a third
party. For a supervised machine learning model, sharing training process
including training data provides an effective way to gain trust and to better
understand model predictions. However, it is not always possible to share all
training data due to privacy and policy constraints. In this paper, we propose
a method to disclose a small set of training data that is just sufficient for
users to get the insight of a complicated model. The method constructs a
boundary tree using selected training data and the tree is able to approximate
the complicated model with high fidelity. We show that traversing data points
in the tree gives users significantly better understanding of the model and
paves the way for trustworthy model sharing.
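
To make the idea concrete, here is a sketch of a plain boundary tree built with the common online construction. A query greedily walks from the root toward the nearest stored point, and a training point is stored only when the tree's answer disagrees with its label. We assume the tree would be fed the shared model's predicted labels to mimic the model; under that assumption it keeps only points near the model's decision boundary. The class names, `max_children`, and the toy data are hypothetical, and the paper's data-selection procedure is not reproduced.

```python
import math
from typing import List, Optional, Sequence


class Node:
    def __init__(self, x: Sequence[float], label):
        self.x = tuple(x)
        self.label = label
        self.children: List["Node"] = []


class BoundaryTree:
    """Plain boundary tree: stores a point only when the current tree gets it wrong."""

    def __init__(self, max_children: int = 5):
        assert max_children >= 1
        self.root: Optional[Node] = None
        self.max_children = max_children
        self.size = 0

    def _closest(self, x: Sequence[float]) -> Node:
        node = self.root
        while True:
            candidates = list(node.children)
            if len(node.children) < self.max_children:
                candidates.append(node)  # a non-full node may answer the query itself
            best = min(candidates, key=lambda n: math.dist(n.x, x))
            if best is node:
                return node
            node = best  # greedy step toward the query

    def train(self, x: Sequence[float], label) -> None:
        if self.root is None:
            self.root, self.size = Node(x, label), 1
            return
        nearest = self._closest(x)
        if nearest.label != label:  # x lies across a decision boundary: keep it
            nearest.children.append(Node(x, label))
            self.size += 1

    def predict(self, x: Sequence[float]):
        return self._closest(x).label


if __name__ == "__main__":
    cluster_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
    cluster_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
    tree = BoundaryTree()
    # Labels would come from the shared model's predictions; here they are given.
    for point in cluster_a:
        tree.train(point, 0)
    for point in cluster_b:
        tree.train(point, 1)
    print(tree.size, tree.predict((0.05, 0.05)), tree.predict((5.05, 5.05)))  # 2 0 1
```

On two well-separated clusters the tree keeps just one point per cluster. This is the sense in which a small disclosed set can stand in for the model.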


## [Interpretable Graph-Based Semi-Supervised Learning via Flows](https://arxiv.org/abs/1709.04764)
[(PDF)](https://arxiv.org/pdf/1709.04764)

`Authors:Raif M. Rustamov, James T. Klosowski`


Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1709.04764 [stat.ML]

 
(or arXiv:1709.04764v1 [stat.ML] for this version)


> Abstract: In this paper, we consider the interpretability of the foundational
Laplacian-based semi-supervised learning approaches on graphs. We introduce a
novel flow-based learning framework that subsumes the foundational approaches
and additionally provides a detailed, transparent, and easily understood
expression of the learning process in terms of graph flows. As a result, one
can visualize and interactively explore the precise subgraph along which the
information from labeled nodes flows to an unlabeled node of interest.
Surprisingly, the proposed framework avoids trading accuracy for
interpretability, but in fact leads to improved prediction accuracy, which is
supported both by theoretical considerations and empirical results. The
flow-based framework guarantees the maximum principle by construction and can
handle directed graphs in an out-of-the-box manner.
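
The flow-based formulation itself is not reproduced here. As a reference point, this sketch computes the harmonic-function solution, which we assume as a representative of the foundational Laplacian-based approaches the abstract says the framework subsumes. It works on an undirected weighted graph and illustrates the maximum principle: every unlabeled value lies between the smallest and largest labels. The function name and the toy graph are hypothetical.

```python
import numpy as np


def harmonic_labels(W, labeled_idx, y_labeled):
    """Harmonic solution of Laplacian-based SSL: solve L_uu f_u = -L_ul y_l."""
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W  # combinatorial graph Laplacian D - W
    labeled_idx = np.asarray(labeled_idx)
    y_labeled = np.asarray(y_labeled, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    f = np.empty(n)
    f[labeled_idx] = y_labeled
    f[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ y_labeled)
    return f


if __name__ == "__main__":
    # Path graph 0-1-2-3-4 with unit weights; node 0 labeled 0, node 4 labeled 1.
    W = np.zeros((5, 5))
    for i in range(4):
        W[i, i + 1] = W[i + 1, i] = 1.0
    print(harmonic_labels(W, [0, 4], [0.0, 1.0]))  # approximately [0, 0.25, 0.5, 0.75, 1]
```

On a path graph the solution interpolates linearly between the two labeled ends. Each unlabeled value is the weighted average of its neighbours, which is why it can never leave the range of the labels.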


## [Unsupervised Learning of Disentangled and Interpretable Representations  from Sequential Data](https://arxiv.org/abs/1709.07902)
[(PDF)](https://arxiv.org/pdf/1709.07902)

`Authors:Wei-Ning Hsu, Yu Zhang, James Glass`


Comments:

Accepted to NIPS 2017

Subjects:

Learning (cs.LG); Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS); Machine Learning (stat.ML)


Cite as:

arXiv:1709.07902 [cs.LG]

 
(or arXiv:1709.07902v1 [cs.LG] for this version)


> Abstract: We present a factorized hierarchical variational autoencoder, which learns
disentangled and interpretable representations from sequential data without
supervision. Specifically, we exploit the multi-scale nature of information in
sequential data by formulating it explicitly within a factorized hierarchical
graphical model that imposes sequence-dependent priors and sequence-independent
priors to different sets of latent variables. The model is evaluated on two
speech corpora to demonstrate, qualitatively, its ability to transform speakers
or linguistic content by manipulating different sets of latent variables; and
quantitatively, its ability to outperform an i-vector baseline for speaker
verification and reduce the word error rate by as much as 35% in mismatched
train/test scenarios for automatic speech recognition tasks.


## [CTD: Fast, Accurate, and Interpretable Method for Static and Dynamic  Tensor Decompositions](https://arxiv.org/abs/1710.03608)
[(PDF)](https://arxiv.org/pdf/1710.03608)

`Authors:Jungwoo Lee, Dongjin Choi, Lee Sael`


Subjects:

Numerical Analysis (cs.NA); Learning (cs.LG); Machine Learning (stat.ML)


Cite as:

arXiv:1710.03608 [cs.NA]

 
(or arXiv:1710.03608v1 [cs.NA] for this version)


> Abstract: How can we find patterns and anomalies in a tensor, or multi-dimensional
array, in an efficient and directly interpretable way? How can we do this in an
online environment, where a new tensor arrives each time step? Finding patterns
and anomalies in a tensor is a crucial problem with many applications,
including building safety monitoring, patient health monitoring, cyber
security, terrorist detection, and fake user detection in social networks.
Standard PARAFAC and Tucker decomposition results are not directly
interpretable. Although a few sampling-based methods have previously been
proposed towards better interpretability, they need to be made faster, more
memory efficient, and more accurate.
In this paper, we propose CTD, a fast, accurate, and directly interpretable
tensor decomposition method based on sampling. CTD-S, the static version of
CTD, provably guarantees a high accuracy that is 17 ~ 83x more accurate than
that of the state-of-the-art method. Also, CTD-S is made 5 ~ 86x faster, and 7
~ 12x more memory-efficient than the state-of-the-art method by removing
redundancy. CTD-D, the dynamic version of CTD, is the first interpretable
dynamic tensor decomposition method ever proposed. Also, it is made 2 ~ 3x
faster than already fast CTD-S by exploiting factors at previous time step and
by reordering operations. With CTD, we demonstrate how the results can be
effectively interpreted in the online distributed denial of service (DDoS)
attack detection.


## [Interpretable Convolutional Neural Networks](https://arxiv.org/abs/1710.00935)
[(PDF)](https://arxiv.org/pdf/1710.00935)

`Authors:Quanshi Zhang, Ying Nian Wu, Song-Chun Zhu`


Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1710.00935 [cs.CV]

 
(or arXiv:1710.00935v3 [cs.CV] for this version)


> Abstract: This paper proposes a method to modify traditional convolutional neural
networks (CNNs) into interpretable CNNs, in order to clarify knowledge
representations in high conv-layers of CNNs. In an interpretable CNN, each
filter in a high conv-layer represents a certain object part. We do not need
any annotations of object parts or textures to supervise the learning process.
Instead, the interpretable CNN automatically assigns each filter in a high
conv-layer with an object part during the learning process. Our method can be
applied to different types of CNNs with different structures. The clear
knowledge representation in an interpretable CNN can help people understand the
logics inside a CNN, i.e., based on which patterns the CNN makes the decision.
Experiments showed that filters in an interpretable CNN were more semantically
meaningful than those in traditional CNNs.


## [Interpretable Machine Learning for Privacy-Preserving Pervasive Systems](https://arxiv.org/abs/1710.08464)
[(PDF)](https://arxiv.org/pdf/1710.08464)

`Authors:Benjamin Baron, Mirco Musolesi`


Subjects:

Machine Learning (stat.ML); Cryptography and Security (cs.CR); Learning (cs.LG)


Cite as:

arXiv:1710.08464 [stat.ML]

 
(or arXiv:1710.08464v3 [stat.ML] for this version)


> Abstract: The presence of pervasive systems in our everyday lives and the interaction
of users with connected devices such as smartphones or home appliances generate
increasing amounts of traces that reflect users' behavior. A plethora of
machine learning techniques enable service providers to process these traces to
extract latent information about the users. While most of the existing projects
have focused on the accuracy of these techniques, little work has been done on
the interpretation of the inference and identification algorithms based on
them. In this paper, we propose a machine learning interpretability framework
for inference algorithms based on data collected through pervasive systems and
we outline the open challenges in this research area. Our interpretability
framework enables users to understand how the traces they generate could expose
their privacy, while allowing for usable and personalized services at the same
time.


## [InterpNET: Neural Introspection for Interpretable Deep Learning](https://arxiv.org/abs/1710.09511)
[(PDF)](https://arxiv.org/pdf/1710.09511)

`Authors:Shane Barratt`


Comments:

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1710.09511 [stat.ML]

 
(or arXiv:1710.09511v2 [stat.ML] for this version)


> Abstract: Humans are able to explain their reasoning. On the contrary, deep neural
networks are not. This paper attempts to bridge this gap by introducing a new
way to design interpretable neural networks for classification, inspired by
physiological evidence of the human visual system's inner-workings. This paper
proposes a neural network design paradigm, termed InterpNET, which can be
combined with any existing classification architecture to generate natural
language explanations of the classifications. The success of the module relies
on the assumption that the network's computation and reasoning is represented
in its internal layer activations. While in principle InterpNET could be
applied to any existing classification architecture, it is evaluated via an
image classification and explanation task. Experiments on a CUB bird
classification and explanation dataset show qualitatively and quantitatively
that the model is able to generate high-quality explanations. While the current
state-of-the-art METEOR score on this dataset is 29.2, InterpNET achieves a
much higher METEOR score of 37.9.


## [MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural  Networks](https://arxiv.org/abs/1711.06788)
[(PDF)](https://arxiv.org/pdf/1711.06788)

`Authors:Minmin Chen`


Comments:

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1711.06788 [stat.ML]

 
(or arXiv:1711.06788v1 [stat.ML] for this version)


> Abstract: We introduce MinimalRNN, a new recurrent neural network architecture that
achieves comparable performance to the popular gated RNNs with a simplified
structure. It employs minimal updates within RNN, which not only leads to
efficient learning and testing but more importantly better interpretability and
trainability. We demonstrate that by endorsing the more restrictive update
rule, MinimalRNN learns disentangled RNN states. We further examine the
learning dynamics of different RNN structures using input-output Jacobians, and
show that MinimalRNN is able to capture longer range dependencies than existing
RNN architectures.
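
A sketch of the MinimalRNN recurrence, following our reading of the paper's update rule (verify against the paper before relying on it). The input is first mapped to a latent vector z_t = Phi(x_t). The update gate is u_t = sigmoid(U_h h_{t-1} + U_z z_t + b_u), and the new state is the gated convex combination h_t = u_t * h_{t-1} + (1 - u_t) * z_t. Phi is assumed to be a single tanh layer; the weight shapes, initialization, and class name are hypothetical.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


class MinimalRNNCell:
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(hidden_dim)
        self.W_x = rng.normal(0.0, scale, (hidden_dim, input_dim))  # Phi: input -> latent
        self.b_x = np.zeros(hidden_dim)
        self.U_h = rng.normal(0.0, scale, (hidden_dim, hidden_dim))
        self.U_z = rng.normal(0.0, scale, (hidden_dim, hidden_dim))
        self.b_u = np.zeros(hidden_dim)

    def step(self, x_t, h_prev):
        z_t = np.tanh(self.W_x @ x_t + self.b_x)                     # z_t = Phi(x_t)
        u_t = sigmoid(self.U_h @ h_prev + self.U_z @ z_t + self.b_u)  # update gate
        return u_t * h_prev + (1.0 - u_t) * z_t                       # convex mix

    def run(self, xs, h0=None):
        h = np.zeros(self.U_h.shape[0]) if h0 is None else h0
        states = []
        for x_t in xs:
            h = self.step(x_t, h)
            states.append(h)
        return np.stack(states)


if __name__ == "__main__":
    cell = MinimalRNNCell(input_dim=3, hidden_dim=4)
    states = cell.run(np.random.default_rng(1).normal(size=(6, 3)))
    print(states.shape, np.abs(states).max() < 1.0)  # (6, 4) True
```

Each state is a convex mix of the previous state and z_t. Starting from h_0 = 0, every state therefore stays inside the range of Phi, here (-1, 1). This restricted update is the property the abstract links to disentangled states.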


## [Beyond Sparsity: Tree Regularization of Deep Models for Interpretability](https://arxiv.org/abs/1711.06178)
[(PDF)](https://arxiv.org/pdf/1711.06178)

`Authors:Mike Wu, Michael C. Hughes, Sonali Parbhoo, Maurizio Zazzi, Volker Roth, Finale Doshi-Velez`


Comments:

To appear in AAAI 2018. Contains 9-page main paper and appendix with supplementary material

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1711.06178 [stat.ML]

 
(or arXiv:1711.06178v1 [stat.ML] for this version)


> Abstract: The lack of interpretability remains a key barrier to the adoption of deep
models in many applications. In this work, we explicitly regularize deep models
so human users might step through the process behind their predictions in
little time. Specifically, we train deep time-series models so their
class-probability predictions have high accuracy while being closely modeled by
decision trees with few nodes. Using intuitive toy examples as well as medical
tasks for treating sepsis and HIV, we demonstrate that this new tree
regularization yields models that are easier for humans to simulate than
simpler L1 or L2 penalties without sacrificing predictive power.


## [The Promise and Peril of Human Evaluation for Model Interpretability](https://arxiv.org/abs/1711.07414)
[(PDF)](https://arxiv.org/pdf/1711.07414)

`Authors:Bernease Herman`


Comments:

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Artificial Intelligence (cs.AI); Learning (cs.LG); Machine Learning (stat.ML)


Cite as:

arXiv:1711.07414 [cs.AI]

 
(or arXiv:1711.07414v1 [cs.AI] for this version)


> Abstract: Transparency, user trust, and human comprehension are popular ethical
motivations for interpretable machine learning. In support of these goals,
researchers evaluate model explanation performance using humans and real world
applications. This alone presents a challenge in many areas of artificial
intelligence. In this position paper, we propose a distinction between
descriptive and persuasive explanations. We discuss reasoning suggesting that
functional interpretability may be correlated with cognitive function and user
preferences. If this is indeed the case, evaluation and optimization using
functional metrics could perpetuate implicit cognitive bias in explanations
that threaten transparency. Finally, we propose two potential research
directions to disambiguate cognitive function and explanation models, retaining
control over the tradeoff between accuracy and interpretability.


## [Unleashing the Potential of CNNs for Interpretable Few-Shot Learning](https://arxiv.org/abs/1711.08277)
[(PDF)](https://arxiv.org/pdf/1711.08277)

`Authors:Boyang Deng, Qing Liu, Siyuan Qiao, Alan Yuille`


Comments:

Under review as a conference paper at ICLR 2018

Subjects:

Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG); Machine Learning (stat.ML)


Cite as:

arXiv:1711.08277 [cs.CV]

 
(or arXiv:1711.08277v1 [cs.CV] for this version)


> Abstract: Convolutional neural networks (CNNs) have been generally acknowledged as one
of the driving forces for the advancement of computer vision. Despite their
promising performances on many tasks, CNNs still face major obstacles on the
road to achieving ideal machine intelligence. One is the difficulty of
interpreting them and understanding their inner workings, which is important
for diagnosing their failures and correcting them. Another is that standard
CNNs require large amounts of annotated data, which is sometimes very hard to
obtain. Hence, it is desirable to enable them to learn from few examples. In
this work, we address these two limitations of CNNs by developing novel and
interpretable models for few-shot learning. Our models are based on the idea of
encoding objects in terms of visual concepts, which are interpretable visual
cues represented within CNNs. We first use qualitative visualizations and
quantitative statistics, to uncover several key properties of feature encoding
using visual concepts. Motivated by these properties, we present two intuitive
models for the problem of few-shot learning. Experiments show that our models
achieve competitive performances, while being much more flexible and
interpretable than previous state-of-the-art few-shot learning methods. We
conclude that visual concepts expose the natural capability of CNNs for
few-shot learning.


## [Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action  Recognition](https://arxiv.org/abs/1711.08502)
[(PDF)](https://arxiv.org/pdf/1711.08502)

`Authors:Jingxuan Hou, Tae Soo Kim, Austin Reiter`


Comments:

8 pages, 8 figures, CVPR18 submission

Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1711.08502 [cs.CV]

 
(or arXiv:1711.08502v1 [cs.CV] for this version)


> Abstract: Despite the growing discriminative capabilities of modern deep learning
methods for recognition tasks, the inner workings of the state-of-art models
still remain mostly black-boxes. In this paper, we propose a systematic
interpretation of model parameters and hidden representations of Residual
Temporal Convolutional Networks (Res-TCN) for action recognition in time-series
data. We also propose a Feature Map Decoder as part of the interpretation
analysis, which outputs a representation of model's hidden variables in the
same domain as the input. Such analysis empowers us to expose model's
characteristic learning patterns in an interpretable way. For example, through
the diagnosis analysis, we discovered that our model has learned to achieve
view-point invariance by implicitly learning to perform rotational
normalization of the input to a more discriminative view. Based on the findings
from the model interpretation analysis, we propose a targeted refinement
technique, which can generalize to various other recognition models. The
proposed work introduces a three-stage paradigm for model learning: training,
interpretable diagnosis and targeted refinement. We validate our approach on
skeleton based 3D human action recognition benchmark of NTU RGB+D. We show that
the proposed workflow is an effective model learning strategy and the resulting
Multi-stream Residual Temporal Convolutional Network (MS-Res-TCN) achieves the
state-of-the-art performance on NTU RGB+D.


## [SPINE: SParse Interpretable Neural Embeddings](https://arxiv.org/abs/1711.08792)
[(PDF)](https://arxiv.org/pdf/1711.08792)

`Authors:Anant Subramanian, Danish Pruthi, Harsh Jhamtani, Taylor Berg-Kirkpatrick, Eduard Hovy`


Comments:

AAAI 2018

Subjects:

Computation and Language (cs.CL)


Cite as:

arXiv:1711.08792 [cs.CL]

 
(or arXiv:1711.08792v1 [cs.CL] for this version)


> Abstract: Prediction without justification has limited utility. Much of the success of
neural models can be attributed to their ability to learn rich, dense and
expressive representations. While these representations capture the underlying
complexity and latent trends in the data, they are far from being
interpretable. We propose a novel variant of denoising k-sparse autoencoders
that generates highly efficient and interpretable distributed word
representations (word embeddings), beginning with existing word representations
from state-of-the-art methods like GloVe and word2vec. Through large scale
human evaluation, we report that our resulting word embeddings are much more
interpretable than the original GloVe and word2vec embeddings. Moreover, our
embeddings outperform existing popular word embeddings on a diverse suite of
benchmark downstream tasks.
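
The abstract describes SPINE as a variant of denoising k-sparse autoencoders. This sketch shows only that base mechanism as a forward pass with a reconstruction loss: corrupt the input, keep the k largest hidden activations, and reconstruct the clean input. SPINE's own activation and sparsity losses are not reproduced. The dimensions, noise level, and weights are hypothetical.

```python
import numpy as np


def k_sparse(h, k):
    """Keep the k largest activations in each row and zero out the rest."""
    out = np.zeros_like(h)
    top = np.argpartition(h, -k, axis=1)[:, -k:]
    rows = np.arange(h.shape[0])[:, None]
    out[rows, top] = h[rows, top]
    return out


def denoising_k_sparse_forward(X, W_enc, b_enc, W_dec, b_dec, k, noise_std=0.1, seed=0):
    rng = np.random.default_rng(seed)
    noisy = X + rng.normal(0.0, noise_std, size=X.shape)  # corrupt the input
    h = k_sparse(noisy @ W_enc + b_enc, k)                 # sparse hidden code
    X_hat = h @ W_dec + b_dec                              # reconstruct the clean input
    loss = float(np.mean((X_hat - X) ** 2))
    return h, X_hat, loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))  # stand-in for 5 pretrained 8-d word vectors
    W_enc, W_dec = rng.normal(size=(8, 20)), rng.normal(size=(20, 8))
    h, X_hat, loss = denoising_k_sparse_forward(X, W_enc, np.zeros(20), W_dec, np.zeros(8), k=3)
    print(np.count_nonzero(h, axis=1), loss)
```

The hidden code is wider than the input (20 vs. 8 here) but has only k active dimensions per word. This overcomplete, sparse layout is what makes individual dimensions candidates for human-readable meanings.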


## [Improving the Adversarial Robustness and Interpretability of Deep Neural  Networks by Regularizing their Input Gradients](https://arxiv.org/abs/1711.09404)
[(PDF)](https://arxiv.org/pdf/1711.09404)

`Authors:Andrew Slavin Ross, Finale Doshi-Velez`


Comments:

To appear in AAAI 2018

Subjects:

Learning (cs.LG); Cryptography and Security (cs.CR); Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1711.09404 [cs.LG]

 
(or arXiv:1711.09404v1 [cs.LG] for this version)


> Abstract: Deep neural networks have proven remarkably effective at solving many
classification problems, but have been criticized recently for two major
weaknesses: the reasons behind their predictions are uninterpretable, and the
predictions themselves can often be fooled by small adversarial perturbations.
These problems pose major obstacles for the adoption of neural networks in
domains that require security or transparency. In this work, we evaluate the
effectiveness of defenses that differentiably penalize the degree to which
small changes in inputs can alter model predictions. Across multiple attacks,
architectures, defenses, and datasets, we find that neural networks trained
with this input gradient regularization exhibit robustness to transferred
adversarial examples generated to fool all of the other models. We also find
that adversarial examples generated to fool gradient-regularized models fool
all other models equally well, and actually lead to more "legitimate,"
interpretable misclassifications as rated by people (which we confirm in a
human subject experiment). Finally, we demonstrate that regularizing input
gradients makes them more naturally interpretable as rationales for model
predictions. We conclude by discussing this relationship between
interpretability and robustness in deep neural networks.
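
The paper trains deep networks with automatic differentiation. To keep this sketch dependency-free, we assume binary logistic regression instead, whose input gradient of the cross-entropy has the closed form (p - y) w. The penalized objective adds lambda times the mean squared L2 norm of those input gradients. This is our assumed reading of "input gradient regularization", and the function names are hypothetical.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def input_gradients(w, b, X, y):
    """d(cross-entropy)/dx for binary logistic regression: (p - y) * w, one row per example."""
    p = sigmoid(X @ w + b)
    return (p - y)[:, None] * w[None, :]


def regularized_loss(w, b, X, y, lam):
    """Mean cross-entropy plus lam times the mean squared norm of the input gradients."""
    p = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)
    ce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    penalty = np.mean(np.sum(input_gradients(w, b, X, y) ** 2, axis=1))
    return float(ce + lam * penalty)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, w = rng.normal(size=(8, 3)), rng.normal(size=3)
    y = (X[:, 0] > 0).astype(float)
    for lam in (0.0, 1.0):
        print(lam, regularized_loss(w, 0.0, X, y, lam))
```

Minimizing this objective pushes the model toward flat loss surfaces around the training inputs. Small input perturbations then change the prediction less, and the remaining gradients tend to be smoother and easier to read as rationales.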


## [Interpretable Convolutional Neural Networks for Effective Translation  Initiation Site Prediction](https://arxiv.org/abs/1711.09558)
[(PDF)](https://arxiv.org/pdf/1711.09558)

`Authors:Jasper Zuallaert, Mijung Kim, Yvan Saeys, Wesley De Neve`


Comments:

Presented at International Workshop on Deep Learning in Bioinformatics, Biomedicine, and Healthcare Informatics (DLB2H 2017) --- in conjunction with the IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2017)

Subjects:

Genomics (q-bio.GN); Learning (cs.LG)


Cite as:

arXiv:1711.09558 [q-bio.GN]

 
(or arXiv:1711.09558v1 [q-bio.GN] for this version)


> Abstract: Thanks to rapidly evolving sequencing techniques, the amount of genomic data
at our disposal is growing increasingly large. Determining the gene structure
is a fundamental requirement to effectively interpret gene function and
regulation. An important part in that determination process is the
identification of translation initiation sites. In this paper, we propose a
novel approach for automatic prediction of translation initiation sites,
leveraging convolutional neural networks that allow for automatic feature
extraction. Our experimental results demonstrate that we are able to improve
the state-of-the-art approaches with a decrease of 75.2% in false positive rate
and with a decrease of 24.5% in error rate on chosen datasets. Furthermore, an
in-depth analysis of the decision-making process used by our predictive model
shows that our neural network implicitly learns biologically relevant features
from scratch, without any prior knowledge about the problem at hand, such as
the Kozak consensus sequence, the influence of stop and start codons in the
sequence and the presence of donor splice site patterns. In summary, our
findings yield a better understanding of the internal reasoning of a
convolutional neural network when applying such a neural network to genomic
data.


## [Interpretable Facial Relational Network Using Relational Importance](https://arxiv.org/abs/1711.10688)
[(PDF)](https://arxiv.org/pdf/1711.10688)

`Authors:Seong Tae Kim, Yong Man Ro`


Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1711.10688 [cs.CV]

 
(or arXiv:1711.10688v1 [cs.CV] for this version)


> Abstract: Human face analysis is an important task in computer vision. According to
cognitive-psychological studies, facial dynamics could provide crucial cues for
face analysis. In particular, the motion of facial local regions in facial
expression is related to the motion of other facial regions. In this paper, a
novel deep learning approach which exploits the relations of facial local
dynamics has been proposed to estimate facial traits from expression sequence.
In order to exploit the relations of facial dynamics in local regions, the
proposed network consists of a facial local dynamic feature encoding network
and a facial relational network. The facial relational network is designed to
be interpretable. Relational importance is automatically encoded and facial
traits are estimated by combining relational features based on the relational
importance. The relations of facial dynamics for facial trait estimation could
be interpreted by using the relational importance. By comparative experiments,
the effectiveness of the proposed method has been validated. Experimental
results show that the proposed method outperforms the state-of-the-art methods
in gender and age estimation.


## [An interpretable latent variable model for attribute applicability in  the Amazon catalogue](https://arxiv.org/abs/1712.00126)
[(PDF)](https://arxiv.org/pdf/1712.00126)

`Authors:Tammo Rukat, Dustin Lange, Cédric Archambeau`


Comments:

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1712.00126 [stat.ML]

 
(or arXiv:1712.00126v2 [stat.ML] for this version)


> Abstract: Learning attribute applicability of products in the Amazon catalog (e.g.,
predicting that a shoe should have a value for size, but not for battery-type)
at scale is a challenge. The need for an interpretable model is contingent on
(1) the lack of ground truth training data, (2) the need to utilise prior
information about the underlying latent space and (3) the ability to understand
the quality of predictions on new, unseen data. To this end, we develop the
MaxMachine, a probabilistic latent variable model that learns distributed
binary representations, associated to sets of features that are likely to
co-occur in the data. Layers of MaxMachines can be stacked such that higher
layers encode more abstract information. Any set of variables can be clamped to
encode prior information. We develop fast sampling based posterior inference.
Preliminary results show that the model improves over the baseline in 17 out of
19 product groups and provides qualitatively reasonable predictions.


## [Where Classification Fails, Interpretation Rises](https://arxiv.org/abs/1712.00558)
[(PDF)](https://arxiv.org/pdf/1712.00558)

`Authors:Chanh Nguyen, Georgi Georgiev, Yujie Ji, Ting Wang`


Comments:

6 pages, 6 figures

Subjects:

Learning (cs.LG); Machine Learning (stat.ML)


Cite as:

arXiv:1712.00558 [cs.LG]

 
(or arXiv:1712.00558v1 [cs.LG] for this version)


> Abstract: An intriguing property of deep neural networks is their inherent vulnerability to adversarial inputs, which significantly hinders their
application in security-critical domains. Most existing detection methods
attempt to use carefully engineered patterns to distinguish adversarial inputs
from their genuine counterparts, which however can often be circumvented by
adaptive adversaries. In this work, we take a completely different route by
leveraging the definition of adversarial inputs: while deceiving for deep
neural networks, they are barely discernible to human vision. Building upon
recent advances in interpretable models, we construct a new detection framework
that contrasts an input's interpretation against its classification. We
validate the efficacy of this framework through extensive experiments using
benchmark datasets and attacks. We believe that this work opens a new direction
for designing adversarial input detection methods.


## [SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for  Predicting Chemical Properties](https://arxiv.org/abs/1712.02034)
[(PDF)](https://arxiv.org/pdf/1712.02034)

`Authors:Garrett B. Goh, Nathan O. Hodas, Charles Siegel, Abhinav Vishnu`


Subjects:

Machine Learning (stat.ML); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Learning (cs.LG)


Cite as:

arXiv:1712.02034 [stat.ML]

 
(or arXiv:1712.02034v1 [stat.ML] for this version)


> Abstract: Chemical databases store information in text representations, and the SMILES
format is a universal standard used in many cheminformatics software. Encoded
in each SMILES string is structural information that can be used to predict
complex chemical properties. In this work, we develop SMILES2Vec, a deep RNN
that automatically learns features from SMILES strings to predict chemical
properties, without the need for additional explicit chemical information, or
the "grammar" of how SMILES encode structural data. Using Bayesian optimization
methods to tune the network architecture, we show that an optimized SMILES2Vec
model can serve as a general-purpose neural network for learning a range of
distinct chemical properties including toxicity, activity, solubility and
solvation energy, while outperforming contemporary MLP networks that use
engineered features. Furthermore, we demonstrate proof-of-concept of
interpretability by developing an explanation mask that localizes on the most
important characters used in making a prediction. When tested on the solubility
dataset, this localization identifies specific parts of a chemical that is
consistent with established first-principles knowledge of solubility with an
accuracy of 88%, demonstrating that neural networks can learn technically
accurate chemical concepts. The fact that SMILES2Vec validates established
chemical facts, while providing state-of-the-art accuracy, makes it a potential
tool for widespread adoption of interpretable deep learning by the chemistry
community.


================================================
FILE: arxiv.md
================================================
# Index
1. Application and interpret
    * [Interpretable Policies for Reinforcement Learning by Genetic Programming](#interpretable-policies-for-reinforcement-learning-by-genetic-programming)
    * [Discovery Radiomics with CLEAR-DR: Interpretable Computer Aided  Diagnosis of Diabetic Retinopathy](#discovery-radiomics-with-clear-dr-interpretable-computer-aided--diagnosis-of-diabetic-retinopathy)
    * [Building Data-driven Models with Microstructural Images: Generalization  and Interpretability](#building-data-driven-models-with-microstructural-images-generalization--and-interpretability)
    * [Interpretable Feature Recommendation for Signal Analytics](#interpretable-feature-recommendation-for-signal-analytics)
    * [Interpretable and Pedagogical Examples](#interpretable-and-pedagogical-examples)
    * [Unsupervised patient representations from clinical notes with  interpretable classification decisions](#unsupervised-patient-representations-from-clinical-notes-with--interpretable-classification-decisions)
    * [Interpretable probabilistic embeddings: bridging the gap between topic  models and neural networks](#interpretable-probabilistic-embeddings-bridging-the-gap-between-topic--models-and-neural-networks)
    * [Arrhythmia Classification from the Abductive Interpretation of Short  Single-Lead ECG Records](#arrhythmia-classification-from-the-abductive-interpretation-of-short--single-lead-ecg-records)
    * [Interpretable classifiers using rules and Bayesian analysis: Building a  better stroke prediction model](#interpretable-classifiers-using-rules-and-bayesian-analysis-building-a--better-stroke-prediction-model)
    * [Interpretable Deep Neural Networks for Single-Trial EEG Classification](#interpretable-deep-neural-networks-for-single-trial-eeg-classification)
    * [Learning Interpretable Musical Compositional Rules and Traces](#learning-interpretable-musical-compositional-rules-and-traces)
    * [Building an Interpretable Recommender via Loss-Preserving Transformation](#building-an-interpretable-recommender-via-loss-preserving-transformation)
    * [Interpretable Machine Learning Models for the Digital Clock Drawing Test](#interpretable-machine-learning-models-for-the-digital-clock-drawing-test)
    * [RETAIN: An Interpretable Predictive Model for Healthcare using Reverse  Time Attention Mechanism](#retain-an-interpretable-predictive-model-for-healthcare-using-reverse--time-attention-mechanism)
    * [Real Time Fine-Grained Categorization with Accuracy and Interpretability](#real-time-fine-grained-categorization-with-accuracy-and-interpretability)
    * [Interpreting Neural Networks to Improve Politeness Comprehension](#interpreting-neural-networks-to-improve-politeness-comprehension)
    * [Interpretable Semantic Textual Similarity: Finding and explaining  differences between sentences](#interpretable-semantic-textual-similarity-finding-and-explaining--differences-between-sentences)
    * [Streaming Weak Submodularity: Interpreting Neural Networks on the Fly](#streaming-weak-submodularity-interpreting-neural-networks-on-the-fly)
    * [Interpretable Learning for Self-Driving Cars by Visualizing Causal  Attention](#interpretable-learning-for-self-driving-cars-by-visualizing-causal--attention)
    * [Interpretable 3D Human Action Analysis with Temporal Convolutional  Networks](#interpretable-3d-human-action-analysis-with-temporal-convolutional--networks)
    * [An Interpretable Knowledge Transfer Model for Knowledge Base Completion](#an-interpretable-knowledge-transfer-model-for-knowledge-base-completion)
    * [MDNet: A Semantically and Visually Interpretable Medical Image Diagnosis  Network](#mdnet-a-semantically-and-visually-interpretable-medical-image-diagnosis--network)
    * [Interpretable Active Learning](#interpretable-active-learning)
    * [DeepFaceLIFT: Interpretable Personalized Models for Automatic Estimation  of Self-Reported Pain](#deepfacelift-interpretable-personalized-models-for-automatic-estimation--of-self-reported-pain)
    * [More cat than cute? Interpretable Prediction of Adjective-Noun Pairs](#more-cat-than-cute-interpretable-prediction-of-adjective-noun-pairs)
    * [Interpretable Categorization of Heterogeneous Time Series Data](#interpretable-categorization-of-heterogeneous-time-series-data)
    * [Interpretable Graph-Based Semi-Supervised Learning via Flows](#interpretable-graph-based-semi-supervised-learning-via-flows)
    * [CTD: Fast, Accurate, and Interpretable Method for Static and Dynamic  Tensor Decompositions](#ctd-fast-accurate-and-interpretable-method-for-static-and-dynamic--tensor-decompositions)
    * [Interpretable Machine Learning for Privacy-Preserving Pervasive Systems](#interpretable-machine-learning-for-privacy-preserving-pervasive-systems)
    * [Interpretable Convolutional Neural Networks for Effective Translation  Initiation Site Prediction](#interpretable-convolutional-neural-networks-for-effective-translation--initiation-site-prediction)
    * [Interpretable Facial Relational Network Using Relational Importance](#interpretable-facial-relational-network-using-relational-importance)
    * [SMILES2Vec: An Interpretable General-Purpose Deep Neural Network for  Predicting Chemical Properties](#smiles2vec-an-interpretable-general-purpose-deep-neural-network-for--predicting-chemical-properties)
    
1. Determine interpretability of CNN
    * [Network Dissection: Quantifying Interpretability of Deep Visual  Representations](#network-dissection-quantifying-interpretability-of-deep-visual--representations)
    * [A Formal Framework to Characterize Interpretability of Procedures](#a-formal-framework-to-characterize-interpretability-of-procedures)
    
1. Criticize
    * [Interpretation of Neural Networks is Fragile](#interpretation-of-neural-networks-is-fragile)
    * [The Promise and Peril of Human Evaluation for Model Interpretability](#the-promise-and-peril-of-human-evaluation-for-model-interpretability)
    
1. Interpret existing model
    * [Artificial Intelligence as Structural Estimation: Economic  Interpretations of Deep Blue, Bonanza, and AlphaGo](#artificial-intelligence-as-structural-estimation-economic--interpretations-of-deep-blue-bonanza-and-alphago)
    * [Semantic Structure and Interpretability of Word Embeddings](#semantic-structure-and-interpretability-of-word-embeddings)
    * [Interpreting Convolutional Neural Networks Through Compression](#interpreting-convolutional-neural-networks-through-compression)
    * [Interpreting Deep Visual Representations via Network Dissection](#interpreting-deep-visual-representations-via-network-dissection)
    * [The Mythos of Model Interpretability](#the-mythos-of-model-interpretability)
    * [Model-Agnostic Interpretability of Machine Learning](#model-agnostic-interpretability-of-machine-learning)
    * [Using Visual Analytics to Interpret Predictive Machine Learning Models](#using-visual-analytics-to-interpret-predictive-machine-learning-models)
    * [Towards Transparent AI Systems: Interpreting Visual Question Answering  Models](#towards-transparent-ai-systems-interpreting-visual-question-answering--models)
    * [Embedding Projector: Interactive Visualization and Interpretation of  Embeddings](#embedding-projector-interactive-visualization-and-interpretation-of--embeddings)
    * [Learning Interpretability for Visualizations using Adapted Cox Models  through a User Experiment](#learning-interpretability-for-visualizations-using-adapted-cox-models--through-a-user-experiment)
    * [Interpreting Finite Automata for Sequential Data](#interpreting-finite-automata-for-sequential-data)
    * [Interpretation of Prediction Models Using the Input Gradient](#interpretation-of-prediction-models-using-the-input-gradient)
    * [Interpretable Recurrent Neural Networks Using Sequential Sparse Recovery](#interpretable-recurrent-neural-networks-using-sequential-sparse-recovery)
    * [An unexpected unity among methods for interpreting model predictions](#an-unexpected-unity-among-methods-for-interpreting-model-predictions)
    * [Towards A Rigorous Science of Interpretable Machine Learning](#towards-a-rigorous-science-of-interpretable-machine-learning)
    * [Softmax Q-Distribution Estimation for Structured Prediction: A  Theoretical Interpretation for RAML](#softmax-q-distribution-estimation-for-structured-prediction-a--theoretical-interpretation-for-raml)
    * [A Unified Approach to Interpreting Model Predictions](#a-unified-approach-to-interpreting-model-predictions)
    * [Interpreting Blackbox Models via Model Extraction](#interpreting-blackbox-models-via-model-extraction)
    * [Interpretable &amp; Explorable Approximations of Black Box Models](#interpretable--explorable-approximations-of-black-box-models)
    * [Interpretability via Model Extraction](#interpretability-via-model-extraction)
    * [Methods for Interpreting and Understanding Deep Neural Networks](#methods-for-interpreting-and-understanding-deep-neural-networks)
    * [Interpreting Classifiers through Attribute Interactions in Datasets](#interpreting-classifiers-through-attribute-interactions-in-datasets)
    * [Using Program Induction to Interpret Transition System Dynamics](#using-program-induction-to-interpret-transition-system-dynamics)
    * [Warp: a method for neural network interpretability applied to gene  expression profiles](#warp-a-method-for-neural-network-interpretability-applied-to-gene--expression-profiles)
    * [Towards Interpretable Deep Neural Networks by Leveraging Adversarial  Examples](#towards-interpretable-deep-neural-networks-by-leveraging-adversarial--examples)
    * [Explainable Artificial Intelligence: Understanding, Visualizing and  Interpreting Deep Learning Models](#explainable-artificial-intelligence-understanding-visualizing-and--interpreting-deep-learning-models)
    * [Interpreting Shared Deep Learning Models via Explicable Boundary Trees](#interpreting-shared-deep-learning-models-via-explicable-boundary-trees)
    * [Unleashing the Potential of CNNs for Interpretable Few-Shot Learning](#unleashing-the-potential-of-cnns-for-interpretable-few-shot-learning)
    * [Train, Diagnose and Fix: Interpretable Approach for Fine-grained Action  Recognition](#train-diagnose-and-fix-interpretable-approach-for-fine-grained-action--recognition)
    * [An interpretable latent variable model for attribute applicability in  the Amazon catalogue](#an-interpretable-latent-variable-model-for-attribute-applicability-in--the-amazon-catalogue)
    * [Where Classification Fails, Interpretation Rises](#where-classification-fails-interpretation-rises)
    
1. Attempt to improve interpretability
    * [Contextual Regression: An Accurate and Conveniently Interpretable  Nonlinear Model for Mining Discovery from Scientific Data](#contextual-regression-an-accurate-and-conveniently-interpretable--nonlinear-model-for-mining-discovery-from-scientific-data)
    * [Interpretable R-CNN](#interpretable-r-cnn)
    * [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](#increasing-the-interpretability-of-recurrent-neural-networks-using--hidden-markov-models)
    * [SnapToGrid: From Statistical to Interpretable Models for Biomedical  Information Extraction](#snaptogrid-from-statistical-to-interpretable-models-for-biomedical--information-extraction)
    * [Meaningful Models: Utilizing Conceptual Structure to Improve Machine  Learning Interpretability](#meaningful-models-utilizing-conceptual-structure-to-improve-machine--learning-interpretability)
    * [Particle Swarm Optimization for Generating Interpretable Fuzzy  Reinforcement Learning Policies](#particle-swarm-optimization-for-generating-interpretable-fuzzy--reinforcement-learning-policies)
    * [Increasing the Interpretability of Recurrent Neural Networks Using  Hidden Markov Models](#increasing-the-interpretability-of-recurrent-neural-networks-using--hidden-markov-models-1)
    * [Growing Interpretable Part Graphs on ConvNets via Multi-Shot Learning](#growing-interpretable-part-graphs-on-convnets-via-multi-shot-learning)
    * [GENESIM: genetic extraction of a single, interpretable model](#genesim-genetic-extraction-of-a-single-interpretable-model)
    * [Stratified Knowledge Bases as Interpretable Probabilistic Models  (Extended Abstract)](#stratified-knowledge-bases-as-interpretable-probabilistic-models--extended-abstract)
    * [Tree Space Prototypes: Another Look at Making Tree Ensembles  Interpretable](#tree-space-prototypes-another-look-at-making-tree-ensembles--interpretable)
    * [Inducing Interpretable Representations with Variational Autoencoders](#inducing-interpretable-representations-with-variational-autoencoders)
    * [Input Switched Affine Networks: An RNN Architecture Designed for  Interpretability](#input-switched-affine-networks-an-rnn-architecture-designed-for--interpretability)
    * [Large scale modeling of antimicrobial resistance with interpretable  classifiers](#large-scale-modeling-of-antimicrobial-resistance-with-interpretable--classifiers)
    * [Towards a New Interpretation of Separable Convolutions](#towards-a-new-interpretation-of-separable-convolutions)
    * [Interpretable Structure-Evolving LSTM](#interpretable-structure-evolving-lstm)
    * [Improving Interpretability of Deep Neural Networks with Semantic  Information](#improving-interpretability-of-deep-neural-networks-with-semantic--information)
    * [InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations](#infogail-interpretable-imitation-learning-from-visual-demonstrations)
    * [Patchnet: Interpretable Neural Networks for Image Classification](#patchnet-interpretable-neural-networks-for-image-classification)
    * [Unsupervised Learning of Disentangled and Interpretable Representations  from Sequential Data](#unsupervised-learning-of-disentangled-and-interpretable-representations--from-sequential-data)
    * [Interpretable Convolutional Neural Networks](#interpretable-convolutional-neural-networks)
    * [InterpNET: Neural Introspection for Interpretable Deep Learning](#interpnet-neural-introspection-for-interpretable-deep-learning)
    * [MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural  Networks](#minimalrnn-toward-more-interpretable-and-trainable-recurrent-neural--networks)
    * [Beyond Sparsity: Tree Regularization of Deep Models for Interpretability](#beyond-sparsity-tree-regularization-of-deep-models-for-interpretability)
    * [SPINE: SParse Interpretable Neural Embeddings](#spine-sparse-interpretable-neural-embeddings)
    * [Improving the Adversarial Robustness and Interpretability of Deep Neural  Networks by Regularizing their Input Gradients](#improving-the-adversarial-robustness-and-interpretability-of-deep-neural--networks-by-regularizing-their-input-gradients)


# Papers

## [Interpretable Policies for Reinforcement Learning by Genetic Programming](https://arxiv.org/abs/1712.04170)
[(PDF)](https://arxiv.org/pdf/1712.04170)

`Authors:Daniel Hein, Steffen Udluft, Thomas A. Runkler`


Subjects:

Artificial Intelligence (cs.AI); Neural and Evolutionary Computing (cs.NE); Systems and Control (cs.SY)


Cite as:

arXiv:1712.04170 [cs.AI]

 
(or arXiv:1712.04170v1 [cs.AI] for this version)


> Abstract: The search for interpretable reinforcement learning policies is of high
academic and industrial interest. Especially for industrial systems, domain
experts are more likely to deploy autonomously learned controllers if they are
understandable and convenient to evaluate. Basic algebraic equations are
supposed to meet these requirements, as long as they are restricted to an
adequate complexity. Here we introduce the genetic programming for
reinforcement learning (GPRL) approach based on model-based batch reinforcement
learning and genetic programming, which autonomously learns policy equations
from pre-existing default state-action trajectory samples. GPRL is compared to
a straightforward method which utilizes genetic programming for symbolic
regression, yielding policies imitating an existing well-performing, but
non-interpretable policy. Experiments on three reinforcement learning
benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark,
demonstrate the superiority of our GPRL approach compared to the symbolic
regression method. GPRL is capable of producing well-performing interpretable
reinforcement learning policies from pre-existing default trajectory data.


## [Discovery Radiomics with CLEAR-DR: Interpretable Computer Aided  Diagnosis of Diabetic Retinopathy](https://arxiv.org/abs/1710.10675)
[(PDF)](https://arxiv.org/pdf/1710.10675)

`Authors:Devinder Kumar, Graham W. Taylor, Alexander Wong`


Subjects:

Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)


Cite as:

arXiv:1710.10675 [cs.AI]

 
(or arXiv:1710.10675v1 [cs.AI] for this version)


> Abstract: Objective: Radiomics-driven Computer Aided Diagnosis (CAD) has shown
considerable promise in recent years as a potential tool for improving clinical
decision support in medical oncology, particularly those based around the
concept of Discovery Radiomics, where radiomic sequencers are discovered
through the analysis of medical imaging data. One of the main limitations with
current CAD approaches is that it is very difficult to gain insight or
rationale as to how decisions are made, thus limiting their utility to
clinicians. Methods: In this study, we propose CLEAR-DR, a novel interpretable
CAD system based on the notion of CLass-Enhanced Attentive Response Discovery
Radiomics for the purpose of clinical decision support for diabetic
retinopathy. Results: In addition to disease grading via the discovered deep
radiomic sequencer, the CLEAR-DR system also produces a visual interpretation
of the decision-making process to provide better insight and understanding into
the decision-making process of the system. Conclusion: We demonstrate the
effectiveness and utility of the proposed CLEAR-DR system in enhancing the
interpretability of diagnostic grading results for the application of diabetic
retinopathy grading. Significance: CLEAR-DR can act as a potentially powerful
tool to address the uninterpretability issue of current CAD systems, thus
improving their utility to clinicians.


## [Interpretation of Neural Networks is Fragile](https://arxiv.org/abs/1710.10547)
[(PDF)](https://arxiv.org/pdf/1710.10547)

`Authors:Amirata Ghorbani, Abubakar Abid, James Zou`


Comments:

Submitted for review at ICLR 2018

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1710.10547 [stat.ML]

 
(or arXiv:1710.10547v1 [stat.ML] for this version)


> Abstract: In order for machine learning to be deployed and trusted in many
applications, it is crucial to be able to reliably explain why the machine
learning algorithm makes certain predictions. For example, if an algorithm
classifies a given pathology image to be a malignant tumor, then the doctor may
need to know which parts of the image led the algorithm to this classification.
How to interpret black-box predictors is thus an important and active area of
research. A fundamental question is: how much can we trust the interpretation
itself? In this paper, we show that interpretation of deep learning predictions
is extremely fragile in the following sense: two perceptually indistinguishable
inputs with the same predicted label can be assigned very different
interpretations. We systematically characterize the fragility of several
widely-used feature-importance interpretation methods (saliency maps, relevance
propagation, and DeepLIFT) on ImageNet and CIFAR-10. Our experiments show that
even small random perturbations can change the feature importance, and new
systematic perturbations can lead to dramatically different interpretations
without changing the label. We extend these results to show that
interpretations based on exemplars (e.g. influence functions) are similarly
fragile. Our analysis of the geometry of the Hessian matrix gives insight into
why fragility could be a fundamental challenge to the current interpretation
approaches.
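
A minimal sketch of the kind of fragility described above, assuming a hypothetical toy model in place of a deep network: it computes a gradient saliency map analytically and compares two maps with a top-k intersection score (the fraction of the k most salient features they share). The weights `W`, `V`, the inputs, and all function names are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical toy model, not from the paper:
# f(x) = sum_i W[i] * tanh(V[i] * x[i]); predicted label = f(x) > 0.
W = [1.0, -2.0, 0.5, 1.5, -1.0]
V = [2.0, 1.2, 3.0, 0.5, 1.4]


def model(x):
    return sum(w * math.tanh(v * xi) for w, v, xi in zip(W, V, x))


def saliency(x):
    # Gradient saliency |df/dx_i|, using d/dz tanh(z) = 1 - tanh(z)^2.
    return [abs(w * v * (1.0 - math.tanh(v * xi) ** 2)) for w, v, xi in zip(W, V, x)]


def top_k(scores, k):
    # Indices of the k largest saliency scores.
    return set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])


def top_k_intersection(s1, s2, k):
    # Fraction of the k most salient features shared by two saliency maps.
    return len(top_k(s1, k) & top_k(s2, k)) / k


x = [0.25, 0.0, 0.0, 0.0, 0.0]
x_pert = [0.30, 0.0, 0.0, 0.0, 0.0]  # small change to a single feature

print(model(x) > 0, model(x_pert) > 0)  # → True True
print(top_k_intersection(saliency(x), saliency(x_pert), k=2))  # → 0.5
```

Nudging the first feature by 0.05 keeps the predicted label but drops that feature out of the top-2 salient set, because its saliency sat just above that of feature 2. This toy only exploits a near-tie; the paper's own analysis relates fragility to the geometry of the Hessian.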


## [Artificial Intelligence as Structural Estimation: Economic  Interpretations of Deep Blue, Bonanza, and AlphaGo](https://arxiv.org/abs/1710.10967)
[(PDF)](https://arxiv.org/pdf/1710.10967)

`Authors:Mitsuru Igami`


Subjects:

Econometrics (econ.EM); Artificial Intelligence (cs.AI); Learning (cs.LG)


Cite as:

arXiv:1710.10967 [econ.EM]

 
(or arXiv:1710.10967v2 [econ.EM] for this version)


> Abstract: Artificial intelligence (AI) has achieved superhuman performance in a growing
number of tasks, including the classical games of chess, shogi, and Go, but
understanding and explaining AI remain challenging. This paper studies the
machine-learning algorithms for developing the game AIs, and provides their
structural interpretations. Specifically, chess-playing Deep Blue is a
calibrated value function, whereas shogi-playing Bonanza represents an
estimated value function via Rust's (1987) nested fixed-point method. AlphaGo's
"supervised-learning policy network" is a deep neural network (DNN) version of
Hotz and Miller's (1993) conditional choice probability estimates; its
"reinforcement-learning value network" is equivalent to Hotz, Miller, Sanders,
and Smith's (1994) simulation method for estimating the value function. Their
performances suggest DNNs are a useful functional form when the state space is
large and data are sparse. Explicitly incorporating strategic interactions and
unobserved heterogeneity in the data-generating process would further improve
AIs' explicability.


## [Contextual Regression: An Accurate and Conveniently Interpretable  Nonlinear Model for Mining Discovery from Scientific Data](https://arxiv.org/abs/1710.10728)
[(PDF)](https://arxiv.org/pdf/1710.10728)

`Authors:Chengyu Liu, Wei Wang`


Comments:

18 pages of Main Article, 30 pages of Supplementary Material

Subjects:

Quantitative Methods (q-bio.QM); Learning (cs.LG); Applications (stat.AP); Computation (stat.CO); Machine Learning (stat.ML)


Cite as:

arXiv:1710.10728 [q-bio.QM]

 
(or arXiv:1710.10728v1 [q-bio.QM] for this version)


> Abstract: Machine learning algorithms such as linear regression, SVM and neural networks
have played an increasingly important role in the process of scientific
discovery. However, none of them is both interpretable and accurate on
nonlinear datasets. Here we present contextual regression, a method that joins
these two desirable properties together using a hybrid architecture of neural
network embedding and dot product layer. We demonstrate its high prediction
accuracy and sensitivity through the task of predictive feature selection on a
simulated dataset and the application of predicting open chromatin sites in the
human genome. On the simulated data, our method achieved high fidelity recovery
of feature contributions under random noise levels up to 200%. On the open
chromatin dataset, the application of our method not only outperformed the
state of the art method in terms of accuracy, but also unveiled two previously
unknown open chromatin related histone marks. Our method can fill the gap in
accurate and interpretable nonlinear modeling in scientific data mining tasks.
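
One plausible reading of the hybrid architecture, sketched under assumptions: a small context network maps the input x to a weight vector w(x), a dot-product layer outputs y = w(x)·x, and each product w_i(x)·x_i is read as feature i's contribution for that sample. The single tanh layer and its parameters `A`, `b` are hypothetical stand-ins for a trained embedding network, not the authors' model.

```python
import math

# Hypothetical parameters of the context (embedding) network for 3 features;
# a real model would learn these. Context weights: w(x) = tanh(A x + b).
A = [[0.5, -0.2, 0.1],
     [0.3, 0.8, -0.4],
     [-0.6, 0.2, 0.9]]
b = [0.1, -0.1, 0.0]


def context_weights(x):
    # Input-dependent weight vector w(x), one weight per input feature.
    return [math.tanh(sum(a * xj for a, xj in zip(row, x)) + bi)
            for row, bi in zip(A, b)]


def predict(x):
    # Dot-product layer: y = w(x) . x
    w = context_weights(x)
    return sum(wi * xi for wi, xi in zip(w, x))


def contributions(x):
    # Per-feature contributions w_i(x) * x_i; they add up to predict(x).
    w = context_weights(x)
    return [wi * xi for wi, xi in zip(w, x)]


x = [1.0, 2.0, -1.0]
print([round(c, 3) for c in contributions(x)], round(predict(x), 3))
```

Because the contributions sum exactly to the prediction, each output decomposes into per-feature terms as in a linear model, while the weights themselves vary nonlinearly with the input.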


## [Building Data-driven Models with Microstructural Images: Generalization  and Interpretability](https://arxiv.org/abs/1711.00404)
[(PDF)](https://arxiv.org/pdf/1711.00404)

`Authors:Julia Ling, Maxwell Hutchinson, Erin Antono, Brian DeCost, Elizabeth A. Holm, Bryce Meredig`


Subjects:

Artificial Intelligence (cs.AI); Materials Science (cond-mat.mtrl-sci)


Cite as:

arXiv:1711.00404 [cs.AI]

 
(or arXiv:1711.00404v1 [cs.AI] for this version)


> Abstract: As data-driven methods rise in popularity in materials science applications,
a key question is how these machine learning models can be used to understand
microstructure. Given the importance of process-structure-property relations
throughout materials science, it seems logical that models that can leverage
microstructural data would be more capable of predicting property information.
While there have been some recent attempts to use convolutional neural networks
to understand microstructural images, these early studies have focused only on
which featurizations yield the highest machine learning model accuracy for a
single data set. This paper explores the use of convolutional neural networks
for classifying microstructure with a more holistic set of objectives in mind:
generalization between data sets, number of features required, and
interpretability.


## [Interpretable Feature Recommendation for Signal Analytics](https://arxiv.org/abs/1711.01870)
[(PDF)](https://arxiv.org/pdf/1711.01870)

`Authors:Snehasis Banerjee, Tanushyam Chattopadhyay, Ayan Mukherjee`


Comments:

4 pages, Interpretable Data Mining Workshop, CIKM 2017

Subjects:

Machine Learning (stat.ML); Learning (cs.LG)


Cite as:

arXiv:1711.01870 [stat.ML]

 
(or arXiv:1711.01870v1 [stat.ML] for this version)


> Abstract: This paper presents an automated approach for interpretable feature
recommendation for solving signal data analytics problems. The method has been
tested by performing experiments on datasets in the domain of prognostics where
interpretation of features is considered very important. The proposed approach
is based on Wide Learning architecture and provides means for interpretation of
the recommended features. It is to be noted that such an interpretation is not
available with feature learning approaches like Deep Learning (such as
Convolutional Neural Network) or feature transformation approaches like
Principal Component Analysis. Results show that the feature recommendation and
interpretation techniques are quite effective for the problems at hand in terms
of performance and drastic reduction in time to develop a solution. It is
further shown, by an example, how this human-in-the-loop interpretation system can
be used as a prescriptive system.


## [Semantic Structure and Interpretability of Word Embeddings](https://arxiv.org/abs/1711.00331)
[(PDF)](https://arxiv.org/pdf/1711.00331)

`Authors:Lutfi Kerem Senel, Ihsan Utlu, Veysel Yucesoy, Aykut Koc, Tolga Cukur`


Comments:

10 Pages, 7 Figures

Subjects:

Computation and Language (cs.CL)


Cite as:

arXiv:1711.00331 [cs.CL]

 
(or arXiv:1711.00331v2 [cs.CL] for this version)


> Abstract: Dense word embeddings, which encode semantic meanings of words into
low-dimensional vector spaces, have become very popular in natural language
processing (NLP) research due to their state-of-the-art performances in many
NLP tasks. Word embeddings are substantially successful in capturing semantic
relations among words, so a meaningful semantic structure must be present in
the respective vector spaces. However, in many cases, this semantic structure
is broadly and heterogeneously distributed across the embedding dimensions,
which makes interpretation a big challenge. In this study, we propose a
statistical method to uncover the latent semantic structure in the dense word
embeddings. To perform our analysis we introduce a new dataset (SEMCAT) that
contains more than 6500 words semantically grouped under 110 categories. We
further propose a method to quantify the interpretability of the word
embeddings; the proposed method is a practical alternative to the classical
word intrusion test that requires human intervention.


## [Interpretable and Pedagogical Examples](https://arxiv.org/abs/1711.00694)
[(PDF)](https://arxiv.org/pdf/1711.00694)

`Authors:Smitha Milli, Pieter Abbeel, Igor Mordatch`


Subjects:

Artificial Intelligence (cs.AI)


Cite as:

arXiv:1711.00694 [cs.AI]

 
(or arXiv:1711.00694v1 [cs.AI] for this version)


> Abstract: Teachers intentionally pick the most informative examples to show their
students. However, if the teacher and student are neural networks, the examples
that the teacher network learns to give, although effective at teaching the
student, are typically uninterpretable. We show that training the student and
teacher iteratively, rather than jointly, can produce interpretable teaching
strategies. We evaluate interpretability by (1) measuring the similarity of the
teacher's emergent strategies to intuitive strategies in each domain and (2)
conducting human experiments to evaluate how effective the teacher's strategies
are at teaching humans. We show that the teacher network learns to select or
generate interpretable, pedagogical examples to teach rule-based,
probabilistic, boolean, and hierarchical concepts.


## [Unsupervised patient representations from clinical notes with  interpretable classification decisions](https://arxiv.org/abs/1711.05198)
[(PDF)](https://arxiv.org/pdf/1711.05198)

`Authors:Madhumita Sushil, Simon Šuster, Kim Luyckx, Walter Daelemans`


Comments:

Accepted poster at NIPS 2017 Workshop on Machine Learning for Health

Subjects:

Computation and Language (cs.CL)


Cite as:

arXiv:1711.05198 [cs.CL]

 
(or arXiv:1711.05198v1 [cs.CL] for this version)


> Abstract: We have two main contributions in this work: 1. We explore the usage of a
stacked denoising autoencoder, and a paragraph vector model to learn
task-independent dense patient representations directly from clinical notes. We
evaluate these representations by using them as features in multiple supervised
setups, and compare their performance with that of sparse representations. 2.
To understand and interpret the representations, we explore the best encoded
features within the patient representations obtained from the autoencoder
model. Further, we calculate the significance of the input features of the
trained classifiers when we use these pretrained representations as input.


## [Interpreting Convolutional Neural Networks Through Compression](https://arxiv.org/abs/1711.02329)
[(PDF)](https://arxiv.org/pdf/1711.02329)

`Authors:Reza Abbasi-Asl, Bin Yu`


Comments:

Presented at NIPS 2017 Symposium on Interpretable Machine Learning

Subjects:

Machine Learning (stat.ML); Computer Vision and Pattern Recognition (cs.CV); Learning (cs.LG)


Cite as:

arXiv:1711.02329 [stat.ML]

 
(or arXiv:1711.02329v1 [stat.ML] for this version)


> Abstract: Convolutional neural networks (CNNs) achieve state-of-the-art performance in
a wide variety of tasks in computer vision. However, interpreting CNNs still
remains a challenge. This is mainly due to the large number of parameters in
these networks. Here, we investigate the role of compression and particularly
pruning filters in the interpretation of CNNs. We exploit our recently-proposed
greedy structural compression scheme that prunes filters in a trained CNN. In
our compression, the filter importance index is defined as the classification
accuracy reduction (CAR) of the network after pruning that filter. The filters
are then iteratively pruned based on the CAR index. We demonstrate the
interpretability of CAR-compressed CNNs by showing that our algorithm prunes
filters with visually redundant pattern selectivity. Specifically, we show the
importance of shape-selective filters for object recognition, as opposed to
color-selective filters. Out of the top 20 CAR-pruned filters in AlexNet, 17 of
them in the first layer and 14 of them in the second layer are color-selective
filters. Finally, we introduce a variant of our CAR importance index that
quantifies the importance of each image class to each CNN filter. We show that
the most and the least important class labels present a meaningful
interpretation of each filter that is consistent with the visualized pattern
selectivity of that filter.
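
A sketch of greedy pruning by the CAR index as the abstract defines it: score each remaining filter by the accuracy drop its removal causes, remove the lowest-scoring filter, and repeat. The five-filter "network", its pattern table, and the integer accuracy function are hypothetical; in practice `accuracy_fn` would evaluate the pruned CNN on held-out data.

```python
# Hypothetical toy network: each filter detects one pattern; some are redundant.
PATTERN = {0: "shape_a", 1: "shape_a", 2: "color_red", 3: "shape_b", 4: "color_red"}
# Hypothetical accuracy gain (percentage points) from detecting each pattern.
GAIN = {"shape_a": 30, "shape_b": 20, "color_red": 5}
BASE_ACCURACY = 40


def accuracy(kept):
    # Toy stand-in for evaluating the pruned network on a validation set.
    covered = {PATTERN[f] for f in kept}
    return BASE_ACCURACY + sum(GAIN[p] for p in covered)


def car_prune(filters, accuracy_fn, n_prune):
    # Greedily prune the filter whose removal reduces accuracy the least.
    kept = set(filters)
    pruned = []
    for _ in range(n_prune):
        base = accuracy_fn(kept)
        # CAR index of filter f: accuracy drop caused by pruning f alone.
        car = {f: base - accuracy_fn(kept - {f}) for f in kept}
        victim = min(sorted(kept), key=lambda f: car[f])  # ties -> lowest index
        kept.remove(victim)
        pruned.append(victim)
    return pruned, kept


pruned, kept = car_prune(range(5), accuracy, n_prune=3)
print(pruned, sorted(kept))  # → [0, 2, 4] [1, 3]
```

Filters whose pattern is duplicated by another filter have a CAR of zero and are pruned first, which is the sense in which CAR pruning removes filters with redundant selectivity.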


## [Interpretable probabilistic embeddings: bridging the gap between topic  models and neural networks](https://arxiv.org/abs/1711.04154)
[(PDF)](https://arxiv.org/pdf/1711.04154)

`Authors:Anna Potapenko, Artem Popov, Konstantin Vorontsov`


Comments:

Appeared in AINL-2017

Subjects:

Computation and Language (cs.CL)


Cite as:

arXiv:1711.04154 [cs.CL]

 
(or arXiv:1711.04154v1 [cs.CL] for this version)


> Abstract: We consider probabilistic topic models and more recent word embedding
techniques from the perspective of learning hidden semantic representations.
Inspired by the striking similarity of the two approaches, we merge them and
learn probabilistic embeddings with an online EM algorithm on word co-occurrence
data. The resulting embeddings perform on par with Skip-Gram Negative Sampling
(SGNS) on word similarity tasks and have more interpretable components. Next,
we learn probabilistic document embeddings that outperform paragraph2vec on a
document similarity task and require less memory and training time. Finally,
we employ multimodal Additive Regularization of Topic Models (ARTM) to obtain
high sparsity and to learn embeddings for other modalities, such as timestamps
and categories. We observe a further improvement
of word similarity performance and meaningful inter-modality similarities.
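
A simplified sketch of fitting topic-model-style embeddings to word co-occurrence counts. It runs plain batch EM for a PLSA-like factorization `p(w | u) = sum_t phi[w][t] * theta[t][u]`, where `counts[w][u]` is how often context word `w` appears near word `u`, and treats the column `theta[.][u]` as a probabilistic embedding of `u`. This is an assumption-laden stand-in: the paper uses an online EM algorithm and ARTM regularizers, neither of which is reproduced here, and `fit_embeddings` is a hypothetical name.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def _normalize_columns(m: Matrix) -> None:
    """Rescale every column of m in place so it sums to 1 (skip empty columns)."""
    for j in range(len(m[0])):
        total = sum(row[j] for row in m)
        if total > 0:
            for row in m:
                row[j] /= total


def fit_embeddings(counts: Matrix, n_topics: int, n_iter: int = 100,
                   seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Batch EM for counts[w][u] ~ sum_t phi[w][t] * theta[t][u].

    phi[w][t] approximates p(w | t); theta[t][u] approximates p(t | u).
    """
    rng = random.Random(seed)
    n_w, n_u = len(counts), len(counts[0])
    phi = [[rng.random() for _ in range(n_topics)] for _ in range(n_w)]
    theta = [[rng.random() for _ in range(n_u)] for _ in range(n_topics)]
    _normalize_columns(phi)
    _normalize_columns(theta)
    for _ in range(n_iter):
        n_wt = [[0.0] * n_topics for _ in range(n_w)]
        n_tu = [[0.0] * n_u for _ in range(n_topics)]
        for w in range(n_w):
            for u in range(n_u):
                if counts[w][u] == 0:
                    continue
                # E-step: posterior p(t | w, u) over topics for this pair.
                joint = [phi[w][t] * theta[t][u] for t in range(n_topics)]
                z = sum(joint)
                if z == 0:
                    continue
                for t in range(n_topics):
                    share = counts[w][u] * joint[t] / z
                    n_wt[w][t] += share
                    n_tu[t][u] += share
        # M-step: renormalize the expected counts into distributions.
        _normalize_columns(n_wt)
        _normalize_columns(n_tu)
        phi, theta = n_wt, n_tu
    return phi, theta


if __name__ == "__main__":
    # Two made-up word groups that only co-occur within their own group.
    counts = [[4, 3, 0, 0], [3, 4, 0, 0], [0, 0, 4, 3], [0, 0, 3, 4]]
    _, theta = fit_embeddings(counts, n_topics=2)
    for u in range(4):
        print(u, [round(theta[t][u], 3) for t in range(2)])
```

Because every embedding is a distribution over topics, each coordinate can be read off as "how much of topic `t` this word uses", which is the interpretability argument the abstract makes.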


## [Arrhythmia Classification from the Abductive Interpretation of Short  Single-Lead ECG Records](https://arxiv.org/abs/1711.03892)
[(PDF)](https://arxiv.org/pdf/1711.03892)

`Authors:Tomás Teijeiro, Constantino A. García, Daniel Castro, Paulo Félix`


Comments:

4 pages, 3 figures. Presented in the Computing in Cardiology 2017 conference

Subjects:

Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)


MSC classes:

68T10


Cite as:

arXiv:1711.03892 [cs.AI]

 
(or arXiv:1711.03892v1 [cs.AI] for this version)


> Abstract: In this work, we propose a new method for the rhythm classification of short
single-lead ECG records, using a set of high-level and clinically meaningful
features provided by the abductive interpretation of the records. These
features include morphological and rhythm-related features that are used to
build two classifiers: one that evaluates the record globally, using aggregated
values for each feature, and another that evaluates the record as a
sequence, using a Recurrent Neural Network fed with the individual features of
each detected heartbeat. The two classifiers are finally combined by
stacking, which assigns each record to one of four target classes: Normal
sinus rhythm, Atrial fibrillation, Other anomaly, and Noisy. The approach has
been validated against the 2017 PhysioNet/CinC Challenge dataset, obtaining a
final score of 0.83 and ranking first in the competition.


## [Interpretable R-CNN](https://arxiv.org/abs/1711.05226)
[(PDF)](https://arxiv.org/pdf/1711.05226)

`Authors:Tianfu Wu, Xilai Li, Xi Song, Wei Sun, Liang Dong, Bo Li`


Comments:

13 pages

Subjects:

Computer Vision and Pattern Recognition (cs.CV)


Cite as:

arXiv:1711.05226 [cs.CV]

 
(or arXiv:1711.05226v1 [cs.CV] for this version)


> Abstract: This paper presents a method of learning qualitatively interpretable models
in object detection using popular two-stage region-based ConvNet detection
systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI
(Region-of-Interest) prediction network. By interpretable models, we focus on
weakly-supervised extractive rationale generation, that is, learning to unfold
latent discriminative part configurations of object instances automatically and
simultaneously in detection without using any supervision for part
configurations. We utilize a top-down hierarchical and compositional grammar
model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold
the space of latent part configurations of RoIs. We propose an AOGParsing
operator to replace the RoIPooling operator widely used in R-CNN, so the
proposed method is applicable to many state-of-the-art ConvNet-based detection
systems. The AOGParsing operator aims to harness both the explainable rigor of
top-down hierarchical and compositional grammar models and the discriminative
power of bottom-up deep neural networks through end-to-end training. In
detection, a bounding box is interpreted by the best parse tree derived from
the AOG on-the-fly, which is treated as the extractive rationale generated for
interpreting detection. In learning, we propose a folding-unfolding method to
train the AOG and ConvNet end-to-end. In experiments, we build on top of the
R-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets
with performance comparable to state-of-the-art methods.
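
To make "the best parse tree derived from the AOG" concrete, here is a generic dynamic-programming sketch over a small, made-up AND-OR graph: terminal nodes carry scores, AND nodes add up the scores of all their children (composition), and OR nodes keep only their best child (a choice between alternative part configurations). The node names, scores, and `best_parse` are hypothetical; in the paper the scores come from ConvNet features of the RoI, and this sketch does not reproduce the AOGParsing operator or its folding-unfolding training.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    kind: str                                   # "terminal", "and", or "or"
    children: List[str] = field(default_factory=list)
    score: float = 0.0                          # only used by terminal nodes


def best_parse(graph: Dict[str, Node], root: str) -> Tuple[float, List[str]]:
    """Return (score, nodes of the best parse tree) for a DAG-shaped AOG."""
    memo: Dict[str, Tuple[float, List[str]]] = {}

    def visit(name: str) -> Tuple[float, List[str]]:
        if name in memo:  # shared sub-graphs are scored only once
            return memo[name]
        node = graph[name]
        if node.kind == "terminal":
            result = (node.score, [name])
        elif node.kind == "and":
            # AND: every child is part of the parse, so scores add up.
            total, nodes = 0.0, [name]
            for child in node.children:
                s, sub = visit(child)
                total += s
                nodes.extend(sub)
            result = (total, nodes)
        elif node.kind == "or":
            # OR: keep only the highest-scoring alternative.
            s, sub = max((visit(c) for c in node.children), key=lambda r: r[0])
            result = (s, [name] + sub)
        else:
            raise ValueError(f"unknown node kind: {node.kind}")
        memo[name] = result
        return result

    return visit(root)


# Hypothetical AOG for one RoI: keep it whole, or split it horizontally or vertically.
AOG = {
    "roi": Node("or", ["whole", "split_h", "split_v"]),
    "whole": Node("terminal", score=1.2),
    "split_h": Node("and", ["top", "bottom"]),
    "split_v": Node("and", ["left", "right"]),
    "top": Node("terminal", score=1.0),
    "bottom": Node("terminal", score=0.5),
    "left": Node("terminal", score=0.2),
    "right": Node("terminal", score=0.4),
}

if __name__ == "__main__":
    print(best_parse(AOG, "roi"))  # (1.5, ['roi', 'split_h', 'top', 'bottom'])
```

The returned node list is the "rationale": it says which part configuration was chosen to explain the box, not just how confident the detector is.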


## [Interpreting Deep Visual Representations via Network Dissection](https://arxiv.org/abs/1711.05611)
[(PDF)](https://arxiv.org/pdf/1711.05611)

`Authors:Bolei Zhou, David Bau, Aude Oliva, Antonio Torralba`


Comments:

*B. Zhou and D. Bau contributed equally to this work. 15 pages, 27 figures

Subjects:

Computer Vision and Pattern Recognition (cs.CV)


ACM classes:

I.2.10


Cite as:

arXiv:1711.05611 [cs.CV]

 
(or arXiv:1711.05611v1 [cs.CV] for this version)


> Abstract: The success of recent deep convolutional neural networks (CNNs) depends on
learning hidden representations that can summarize the important factors of
variation behind the data. However, CNNs are often criticized as being black boxes
that lack interpretability, since they have millions of unexplained model
parameters. In this work, we describe Network Dissection, a method that
interprets networks by providing labels for the units of their deep visual
representations. The proposed method quantifies the interpretability of CNN
representations by evaluating the alignment between individual hidden units and
a set of visual semantic concepts. By identifying the best alignments, units
are given human interpretable labels across a range of objects, parts, scenes,
textures, materials, and colors. The method reveals that deep representations
are more transparent and interpretable than expected: we find that
representations are significantly more interpretable than they would be under a
random equivalently powerful basis. We apply the method to interpret and
compare the latent representations of various network architectures trained to
solve different supervised and self-supervised training tasks. We then examine
factors affecting the network interpretability such as the number of the
training iterations, regularizations, different initializations, and the
network depth and width. Finally, we show that the interpreted units can be used
to provide explicit explanations of a prediction given by a CNN for an image.
Our results highlight that interpretability is an important property of deep
neural networks that provides new insights into their hierarchical structure.
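
A simplified sketch of the unit-to-concept alignment described above, scored as intersection-over-union (IoU) between a unit's thresholded activation mask and a concept's segmentation mask. It assumes activations have already been upsampled to the segmentation resolution and flattened across all images into one list per unit; the top-quantile threshold (`top_fraction`), the minimum IoU (`min_iou`), and the function names are assumptions for illustration, not values stated in the abstract.

```python
from typing import Dict, List, Optional, Tuple


def quantile_threshold(activations: List[float], top_fraction: float) -> float:
    """Activation value at or above which roughly the top `top_fraction` lie."""
    ranked = sorted(activations, reverse=True)
    k = max(1, int(len(ranked) * top_fraction))
    return ranked[k - 1]


def iou(mask_a: List[bool], mask_b: List[bool]) -> float:
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    union = sum(1 for a, b in zip(mask_a, mask_b) if a or b)
    return inter / union if union else 0.0


def label_unit(activations: List[float], concept_masks: Dict[str, List[bool]],
               top_fraction: float = 0.005,
               min_iou: float = 0.04) -> Optional[Tuple[str, float]]:
    """Return the best-aligned concept and its IoU, or None if none passes min_iou."""
    threshold = quantile_threshold(activations, top_fraction)
    unit_mask = [a >= threshold for a in activations]
    scores = {name: iou(unit_mask, mask) for name, mask in concept_masks.items()}
    best = max(scores, key=lambda name: scores[name])
    return (best, scores[best]) if scores[best] > min_iou else None


if __name__ == "__main__":
    # Ten made-up pixels pooled over a tiny dataset.
    acts = [0.1, 0.9, 0.8, 0.2, 0.0, 0.3, 0.1, 0.0, 0.2, 0.1]
    concepts = {
        "sky": [False, True, True, True] + [False] * 6,
        "grass": [False] * 7 + [True, True, True],
    }
    print(label_unit(acts, concepts, top_fraction=0.2))  # ('sky', 0.6666666666666666)
```

Pooling pixels over the whole dataset before computing IoU means a unit is labeled by where it fires most strongly across many images, not by any single example.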


## [Interpretable classifiers using rules and Bayesian analysis: Building a  better stroke prediction model](https://arxiv.org/abs/1511.01644)
[(PDF)](https://arxiv.org/pdf/1511.01644)

`Authors:Benjamin Letham, Cynthia Rudin, Tyler H. McCormick, David Madigan`


Comments:

Published in the Annals of Applied Statistics by the Institute of Mathematical Statistics

Subjects:

Applications (stat.AP); Learning (cs.LG); Machine Learning (stat.ML)


Journal reference:

Annals of Applied Statistics 2015, Vol. 9, No. 3, 1350-1371


DOI:

10.1214/15-AOAS848


Report number:

IMS-AOAS-AOAS848


Cite as:

arXiv:1511.01644 [stat.AP]

 
(or arXiv:1511.01644v1 [stat.AP] for this version)


> Abstract: We aim to produce predictive models that are not only accurate, but are also
interpretable to human experts. Our models are decision lists, which consist of
a series of if...then... statements (e.g., if high blood pressure, then stroke)
that discretize a high-dimensional, multivariate feature space into a series of
simple, readily interpretable decision statements. We introduce a generative
model called Bayesian Rule Lists that yields a posterior distribution over
possible decision lists. It employs a novel prior structure to encourage
sparsity. Our experiments show that Bayesian Rule Lists has predictive accuracy
on par with the current top algorithms for prediction in machine learning. Our
method is motivated by recent developments in personalized medicine, and can be
used to produce highly accurate and interpretable medical scoring systems. We
demonstrate this by producing an alternative to the CHADS$_2$ score, actively
used in clinical practice for estimating the risk of stroke in patients who
have atrial fibrillation. Our model is as interpretable as CHADS$_2$, but more
accurate.
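
A decision list of the kind described above is evaluated top to bottom: the first rule whose condition holds gives the prediction, and a final "else" supplies the default. The sketch below shows that evaluation step next to the standard CHADS$_2$ point score (one point each for congestive heart failure, hypertension, age 75 or older, and diabetes; two points for prior stroke or TIA). The rules, feature names, and risk labels are hypothetical placeholders, not the list learned by Bayesian Rule Lists, and the sketch does not perform the Bayesian inference over lists.

```python
from typing import Any, Callable, Dict, List, Tuple

Patient = Dict[str, Any]
# (human-readable condition, predicate, prediction)
Rule = Tuple[str, Callable[[Patient], bool], str]


def apply_decision_list(rules: List[Rule], default: str,
                        patient: Patient) -> Tuple[str, str]:
    """Return (prediction, rule that fired); the first matching rule wins."""
    for description, condition, prediction in rules:
        if condition(patient):
            return prediction, description
    return default, "else"


def chads2(patient: Patient) -> int:
    """CHADS2 point score for comparison with the decision list."""
    score = 0
    score += 1 if patient["chf"] else 0
    score += 1 if patient["hypertension"] else 0
    score += 1 if patient["age"] >= 75 else 0
    score += 1 if patient["diabetes"] else 0
    score += 2 if patient["prior_stroke_or_tia"] else 0
    return score


# Hypothetical "if ..., then ..." rules; not the model learned in the paper.
RULES: List[Rule] = [
    ("prior stroke or TIA", lambda p: p["prior_stroke_or_tia"], "high risk"),
    ("hypertension and age >= 75",
     lambda p: p["hypertension"] and p["age"] >= 75, "elevated risk"),
]

if __name__ == "__main__":
    patient = {"chf": False, "hypertension": True, "age": 80,
               "diabetes": False, "prior_stroke_or_tia": False}
    print(apply_decision_list(RULES, "low risk", patient))
    print(chads2(patient))
```

Returning the rule that fired alongside the prediction is what makes the model readable: a clinician sees one explicit condition rather than a sum of points.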


## [Interpretable Deep Neural Networks for Single-Trial EEG Classification](https://arxiv.org/abs/1604.08201)
[(PDF)](https://arxiv.org/pdf/1604.08201)

`Authors:Irene Sturm, Sebastian Bach, Wojciech Samek, Klaus-Robert Müller`


Comments:

5 pages, 1 figure

Subjects:

Neural and Evolutiona
