Methods of explainable-AI (ex-AI)

Explainable-AI Pythia - the high priestess of the temple of Apollo at Delphi

Welcome, students, to the course LV 706.315!

3 ECTS, summer term 2018

This page is valid as of June 17, 2018, 12:00 CEST

INTRODUCTION: 6 min YouTube video on explainable AI

MOTIVATION for this course:

This course “Methods of explainable AI” is a natural offspring of the interactive Machine Learning (iML) courses and the decision making courses held in previous years. A major motivation for continuing to study our iML concept [1] – with a human in the loop [2] (see our project page) – is that modern AI/machine learning models (see the difference AI-ML) are often considered to be “black boxes” [3]: it is difficult to re-enact a machine decision and to answer the question of why it has been reached. A serious general drawback is that such models have no explicit declarative knowledge representation and hence have difficulty in generating the required explanatory structures – the context – which considerably limits the achievement of their full potential [4]. Above all, a human expert decision maker (e.g. a medical doctor) is not interested in the internals of a model; she/he wants to retrace a result on demand in a human-understandable way. This calls not only for explainable models, but also for explanation interfaces (see AK HCI course). AI usability is experiencing a renaissance in engineering. Interestingly, early AI systems (rule-based systems) were explainable to a certain extent within a well-defined problem space. Therefore this course will also provide a background on decision support systems from the early 1970s (e.g. MYCIN, or GAMUTS in Radiology). Last but not least, a saying attributed to Richard FEYNMAN: “If you do not understand it – try to explain it” – and indeed “explainability” is the core of science.

GOAL of this course

This graduate course follows a research-based teaching (RBT) approach and provides an overview of selected current state-of-the-art methods for making AI transparent, re-traceable, re-enactable, understandable and, consequently, explainable. Note: We speak Python.


Explainability is motivated by the lack of transparency of so-called black-box approaches, which do not foster trust [6] in and acceptance of AI generally and ML specifically. Rising legal and privacy requirements, e.g. the new European General Data Protection Regulation (GDPR, in effect since May 2018), will make black-box approaches difficult to use in business, because they often are not able to explain why a machine decision has been made (see explainable AI).
Consequently, the field of explainable AI has recently been gaining international awareness and interest (see the news blog), because rising legal, ethical, and social requirements make it mandatory to enable a human – on request – to understand and to explain why a machine decision has been made [see Wikipedia on Explainable Artificial Intelligence]. Note: this does not mean that it is always necessary to explain everything – but it must be possible to explain a decision if necessary, e.g. for general understanding, for teaching, for learning, for research, in court, or even on demand by a citizen (right to explanation).


This course is aimed at research students of Computer Science who are interested in knowledge discovery/data mining following the idea of iML approaches, i.e. human-in-the-loop learning systems. This is a cross-disciplinary computer science topic and highly relevant for application in complex domains such as health, biomedicine, paleontology and biology, and in safety-critical domains, e.g. cyber defense.


If you need a statistics/probability refresher go to the Mini-Course MAKE-Decisions and review the statistics/probability primer:

Module 00 – Primer on Probability and Information Science (optional)

Keywords: probability, data, information, entropy measures

Topic 00: Mathematical Notations
Topic 01: Probability Distribution and Probability Density
Topic 02: Expectation and Expected Utility Theory
Topic 03: Joint Probability and Conditional Probability
Topic 04: Independent and Identically Distributed Data (IID)
Topic 05: Bayes and Laplace
Topic 06: Measuring Information: Kullback-Leibler Divergence and Entropy
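
As a small illustration of Topic 06, the following sketch computes the Shannon entropy and the Kullback-Leibler divergence for discrete distributions given as lists of probabilities. The function names and the two example distributions (a fair and a biased coin) are assumptions made for this page, not course material.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in bits of a discrete distribution p."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)


def kl_divergence(p, q):
    """Kullback-Leibler divergence D_KL(p || q) in bits.

    Returns infinity if q assigns zero probability to an outcome
    that has positive probability under p.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return math.inf
        total += pi * math.log2(pi / qi)
    return total


# Illustrative distributions (assumed): a fair coin and a biased coin.
fair = [0.5, 0.5]
biased = [0.9, 0.1]

print("H(fair)            =", entropy(fair))
print("H(biased)          =", entropy(biased))
print("KL(biased || fair) =", kl_divergence(biased, fair))
print("KL(fair || biased) =", kl_divergence(fair, biased))
```

The two divergences differ, which shows that the Kullback-Leibler divergence is not symmetric and therefore not a distance in the metric sense.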

Lecture slides 2×2 (10,300 kB): contact lecturer for slide set

Recommended reading for students:
David J.C. MacKay 2003. Information theory, inference and learning algorithms, Cambridge, Cambridge University Press.
Online available:
Slides online available:

Module 01 – Introduction

Keywords: HCI-KDD approach, integrative AI/ML, complexity, automatic ML, interactive ML

Topic 00: Reflection – follow up from Module 0 – dealing with probability and information
Topic 01: The HCI-KDD approach: Towards an integrative AI/ML ecosystem
Topic 02: The complexity of the application area health informatics
Topic 03: Probabilistic information
Topic 04: Automatic ML
Topic 05: Interactive ML
Topic 06: From interactive ML to explainable AI

Lecture slides 2×2 (26,755 kB): contact lecturer for slide set

Module 02 – Decision Making and Decision Support

Keywords: information, decision, action

Topic 00: Reflection – follow up from Module 1 – introduction
Topic 01: Medical action = Decision making
Topic 02: The underlying principles of intelligence and cognition
Topic 03: Human vs. Computer
Topic 04: Human Information Processing
Topic 05: Probabilistic decision theory
Topic 06: The problem of understanding context

Lecture slides 2×2 (31,120 kB): contact lecturer for slide set

Module 03 – From Expert Systems to Explainable AI

Topic 00: Reflection – follow up from Module 02
Topic 01: Decision Support Systems (DSS)
Topic 02: Do computers help to make better decisions?
Topic 03: History of DSS = History of AI
Topic 04: Example: Towards Precision Medicine
Topic 05: Example: Case based Reasoning (CBR)
Topic 06: A few principles of causality

Lecture slides 2×2 (27,177 kB): contact lecturer for slide set

Module 04 – Overview of Explanation Methods and Transparency Algorithms


Topic 00: Reflection – follow up from Module 3
Topic 01: Global vs. local explainability
Topic 02: Ante-hoc vs. Post-hoc interpretability
Topic 03: Ante-hoc: GAM, S-AOG, Hybrid models, iML
Topic 04: Post-hoc: LIME, BETA, LRP
Topic 05: Making neural networks transparent
Topic 06: Explanation Interfaces

Lecture slides 2×2 (33,887 kB): contact lecturer for slide set

Module 05 – Selected Methods of explainable-AI (incomplete!)


Topic 00: Reflection – follow up from Module 4
Topic 01: LIME (Local Interpretable Model-Agnostic Explanations) – Ribeiro et al. (2016)
Topic 02: BETA (Black Box Explanation through Transparent Approximation) – Lakkaraju et al. (2017)
Topic 03: LRP (Layer-wise Relevance Propagation) – Bach et al. (2015)
Topic 04: Deep Taylor Decomposition – Montavon et al. (2017)
Topic 05: Prediction Difference Analysis – Zintgraf et al. (2017)

Lecture slides 2×2 (15,521 kB): contact lecturer for slide set

Module 06 – More Selected Methods of explainable-AI (more incomplete!)

Topic 00: Reflection – follow up from Module 5
Topic 01: Visualizing Convolutional Neural Nets with Deconvolution – Zeiler & Fergus (2014)
Topic 02: Inverting Convolutional Neural Networks – Mahendran & Vedaldi (2015)
Topic 03: Guided Backpropagation – Springenberg et al. (2015)
Topic 04: Deep Generator Networks – Nguyen et al. (2016)
Topic 05: Testing with Concept Activation Vectors (TCAV)  – Kim et al. (2018)

Lecture slides 2×2 (11,944 kB): contact lecturer for slide set

Module 07 – Even More Selected Methods of explainable-AI (even more incomplete!)

Topic 00: Reflection – follow up from Module 6
Topic 01: Visualizing Convolutional Neural Nets with Deconvolution – Zeiler & Fergus (2014)
Topic 02: Inverting Convolutional Neural Networks – Mahendran & Vedaldi (2015)
Topic 03: Guided Backpropagation – Springenberg et al. (2015)
Topic 04:
Topic 05: Interactive Machine Learning with the human-in-the-loop – Holzinger et al. (2017)

Lecture slides 2×2 (16,111 kB): contact lecturer for slide set

Module TE – Testing and Evaluation of Machine Learning Algorithms

Keywords: performance, metrics, error, accuracy

Topic 01: Test data and training data quality
Topic 02: Performance measures (confusion matrix, ROC, AUC)
Topic 03: Hypothesis testing and estimating
Topic 04: Comparison of machine learning algorithms
Topic 05: “There-is-no-free-lunch” theorem
Topic 06: Measuring beyond accuracy (simplicity, scalability, interpretability, learnability, …)
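
The performance measures of Topic 02 can be sketched in a few lines of plain Python. The labels and scores below are made-up toy data, and the helper names are assumptions for this illustration; in practice a library such as scikit-learn would normally be used instead.

```python
def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_points(y_true, scores):
    """ROC curve as (FPR, TPR) points, one per distinct score threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        pred = [1 if s >= thr else 0 for s in scores]
        tp, fp, _, _ = confusion_matrix(y_true, pred)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy data (assumed): true classes and classifier scores for six cases.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

# Confusion matrix at a fixed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print("TP, FP, FN, TN:", tp, fp, fn, tn)
print("accuracy:", (tp + tn) / len(y_true))
print("AUC:", auc(roc_points(y_true, scores)))
```

The confusion matrix depends on the chosen threshold, whereas the area under the ROC curve summarises the ranking quality over all thresholds.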

Lecture slides 2×2 (12,756 kB): contact lecturer for slide set

Reading for students:

Module HC – Methods for getting Insight into Human Intelligence

Keywords: performance, metrics, error, accuracy

Topic 01: Fundamentals: What is a good explanation?
Topic 02: Fundamental biometric technologies: 2D/3D cameras, eye-tracking, heart sensors
Topic 03: Advanced biometric technologies: EMG/ECG/EOG/PPG/GSR
Topic 04: Thinking-Aloud Technique
Topic 05: Microphone/Infrared sensor arrays
Topic 06: Affective computing: towards explanation user interfaces

Lecture slides 2×2 (17,111 kB): contact lecturer for slide set

Reading for students:

José Hernández-Orallo 2017. The measure of all minds: evaluating natural and artificial intelligence, Cambridge University Press, doi:10.1017/9781316594179. Book Website:

Andrew T. Duchowski 2017. Eye tracking methodology: Theory and practice. Third Edition, Cham, Springer, doi:10.1007/978-3-319-57883-5.

Module  – Vision

Keywords: human vision, visual system, seeing, perceiving, visual cognition

Topic 01: Visual attention
Topic 02: Visual Psychophysics
Topic 03: Visual Search
Topic 04: Attentive User Interfaces and Usability
Topic 05: Visual Analytics

Lecture slides 2×2 (8,776 kB): contact lecturer for slide set

Reading for students:


The content includes but is not limited to:

  • Performance vs Interpretability
  • Motivation for transparent AI, re-traceability, understandability, interpretability, explainability
  • Motivation use cases from the medical domain
  • Early AI = interpretable: examples of early expert systems from the medical domain (MYCIN, GAMUTS in Radiology, …)
  • Medical Expert Systems > Intelligent Tutoring Systems
  • Physics > Physiology > Psychology (signal processing > perception > cognition)
  • What is interpretable by a human? What is not interpretable by a human?
  • Global vs. local explainability
  • Explainability – Interpretability – Understandability
  • Ante-hoc vs. Post-hoc interpretability
  • Model interpretability vs. Data interpretability (Focusing on data vs. focusing on model)
  • Compliance with new legislation (European General Data Protection Regulation – right to explanation;
    note: this is particularly important for young start-up companies in AI and machine learning)
  • Techniques of Explainability (related work)
  • Explainable Medicine: Ensemble view vs. patients view (stakeholder view)
  • Transparency and trust – (how) can we measure trust?
  • Making Deep Neural Networks transparent
  • Activation Maximization Method
  • POST-HOC (… selecting an appropriate model for the given problem and developing a special technique to interpret it)
    • LIME (Local Interpretable Model-Agnostic Explanations),
    • BETA (Black Box Explanation through Transparent Approximation),
    • LRP (Layer-wise Relevance Propagation)
  • ANTE-HOC (… selecting a model that is already explainable and training it on the problem)
    • iML (interactive Machine Learning with the human-in-the-loop),
    • GAM (Generalized Additive Models),
    • Stochastic-AOG (Stochastic AND-OR-Templates / Graphs)
    • Hybrid models
    • Deep Symbolic Networks (Deep learning transparency)
  • Example: interactive Machine Learning with the Human-in-the-Loop
  • Example: Interpretable Deep Learning Model (making neural networks transparent)
  • Examples for “explainable user interfaces”

References from own work (references to related work will be given within the course):

[1] Andreas Holzinger, Chris Biemann, Constantinos S. Pattichis & Douglas B. Kell (2017). What do we need to build explainable AI systems for the medical domain? arXiv:1712.09923.

[2] Andreas Holzinger, Bernd Malle, Peter Kieseberg, Peter M. Roth, Heimo Müller, Robert Reihs & Kurt Zatloukal (2017). Towards the Augmented Pathologist: Challenges of Explainable-AI in Digital Pathology. arXiv:1712.06657.

[3] Andreas Holzinger, Markus Plass, Katharina Holzinger, Gloria Cerasela Crisan, Camelia-M. Pintea & Vasile Palade (2017). A glass-box interactive machine learning approach for solving NP-hard problems with the human-in-the-loop. arXiv:1708.01104.

[4] Andreas Holzinger (2016). Interactive Machine Learning for Health Informatics: When do we need the human-in-the-loop? Brain Informatics, 3, (2), 119-131, doi:10.1007/s40708-016-0042-6.

[5] Andreas Holzinger (2018). Explainable AI (ex-AI). Informatik-Spektrum, 41, (2), 138-143, doi:10.1007/s00287-018-1102-5.

[6] Katharina Holzinger, Klaus Mak, Peter Kieseberg & Andreas Holzinger (2018). Can we trust Machine Learning Results? Artificial Intelligence in Safety-Critical Decision Support. ERCIM News, 112, (1), 42-43.

Mini Glossary:

Ante-hoc Explainability (AHE) := such models are interpretable by design (glass-box approaches); typical examples include linear regression, decision trees/lists, random forests, Naive Bayes and fuzzy inference systems, as well as GAMs, Stochastic AOGs, and deep symbolic networks. They have a long tradition, can be designed from expert knowledge or from data, and are useful as a framework for the interaction between human knowledge and hidden knowledge in the data.
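
To make "interpretable by design" concrete, here is a minimal sketch of a decision list, one of the model types named above: the rule that fires is at the same time the prediction and its explanation. The rules, feature names and labels are hypothetical toy values chosen for illustration only; they are not taken from the course and carry no clinical meaning.

```python
# Hypothetical toy rules, ordered by priority (assumed for illustration).
# Each entry: (human-readable description, condition, predicted label).
RULES = [
    ("temperature >= 38.0 and cough present",
     lambda x: x["temperature"] >= 38.0 and x["cough"],
     "suspect infection"),
    ("temperature >= 38.0",
     lambda x: x["temperature"] >= 38.0,
     "monitor"),
]
DEFAULT_LABEL = "no action"


def predict_with_explanation(x):
    """Return (label, explanation); the first rule that fires decides."""
    for description, condition, label in RULES:
        if condition(x):
            return label, "rule fired: " + description
    return DEFAULT_LABEL, "no rule fired: default"


for case in (
    {"temperature": 38.6, "cough": True},
    {"temperature": 38.2, "cough": False},
    {"temperature": 36.8, "cough": True},
):
    label, why = predict_with_explanation(case)
    print(case, "->", label, "|", why)
```

Because the model is nothing more than its ordered rules, no separate explanation step is needed; this is the contrast to the post-hoc methods described further down.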

BETA := Black Box Explanation through Transparent Approximation, developed by Lakkaraju, Bach & Leskovec (2016); it learns two-level decision sets, where each rule explains the model behaviour.

Explainability := motivated by the opaqueness of so-called “black-box” approaches, it is the ability to provide an explanation of why a machine decision has been reached (e.g. why the deep network recognized a cat). Finding an appropriate explanation is difficult, because it requires understanding the context and providing a description of the causality and consequences of a given fact. (German: Erklärbarkeit; see also: Verstehbarkeit, Nachvollziehbarkeit, Zurückverfolgbarkeit, Transparenz)

Explanation := a set of statements describing a given set of facts to clarify their causality, context and consequences; it is a core topic of knowledge discovery involving “why” questions (“Why is this a cat?”). (German: Erklärung, Begründung)

Explanatory power := the ability of a hypothesis or set of hypotheses to effectively explain the subject matter it pertains to (opposite: explanatory impotence).

European General Data Protection Regulation (EU GDPR) := Regulation (EU) 2016/679 (see EUR-Lex 32016R0679); it will make black-box approaches difficult to use, because they often are not able to explain why a decision has been made (see explainable AI).

Interactive Machine Learning (iML) := machine learning algorithms which can interact with – partly human – agents and can optimize their learning behaviour through this interaction. Holzinger, A. 2016. Interactive Machine Learning for Health Informatics: When do we need the human-in-the-loop? Brain Informatics (BRIN), 3, (2), 119-131.

Inverse Probability := an older term for the probability distribution of an unobserved variable; it was described by De Morgan in 1837, in reference to Laplace’s (1774) method of probability.
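
In modern terms, inverse probability is the posterior obtained with Bayes' rule: from the probability of an observation given a cause, one infers the probability of the cause given the observation. The following sketch works this through for a diagnostic test; the prevalence, sensitivity and specificity values are invented for illustration.

```python
def posterior(prior, sensitivity, specificity):
    """P(disease | positive test) via Bayes' rule.

    prior        -- P(disease), e.g. the prevalence in the population
    sensitivity  -- P(positive | disease)
    specificity  -- P(negative | no disease)
    """
    p_pos_given_disease = sensitivity
    p_pos_given_healthy = 1.0 - specificity
    # Total probability of a positive test (the "evidence").
    evidence = (p_pos_given_disease * prior
                + p_pos_given_healthy * (1.0 - prior))
    return p_pos_given_disease * prior / evidence


# Assumed numbers: 1 % prevalence, 90 % sensitivity, 95 % specificity.
p = posterior(prior=0.01, sensitivity=0.90, specificity=0.95)
print(f"P(disease | positive) = {p:.3f}")
```

With these assumed numbers the posterior stays far below the sensitivity of the test, because the low prior dominates; this is a typical case where a decision maker needs the reasoning made explicit.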

Post-hoc Explainability (PHE) := such methods are designed for interpreting black-box models; they provide local explanations for a specific decision and allow it to be re-enacted on request. Typical examples include LIME, BETA, LRP, Local Gradient Explanation Vectors, prediction decomposition, or simply feature selection.
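
The idea of a local explanation can be sketched as follows: sample points around the instance of interest, query the black box, weight the samples by proximity, and fit a simple linear model whose coefficients serve as the explanation. This is only a simplified sketch in the spirit of LIME, not the method as published and not the API of the `lime` package; the black-box function, the sampling scale and the kernel width are assumptions made for this example.

```python
import math
import random


def black_box(x):
    """Stand-in for an opaque model (assumed): nonlinear in two features."""
    return math.sin(x[0]) + x[1] ** 2


def solve(a, b):
    """Solve the linear system a w = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        rest = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - rest) / m[i][i]
    return w


def local_surrogate(model, x0, n_samples=500, scale=0.1,
                    kernel_width=0.25, seed=0):
    """Fit a proximity-weighted linear model around x0.

    Returns [intercept, w_1, ..., w_d], with features centred on x0.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x0]   # perturbed sample
        offset = [zi - xi for zi, xi in zip(z, x0)]
        dist2 = sum(o * o for o in offset)
        rows.append([1.0] + offset)
        targets.append(model(z))                        # query the black box
        weights.append(math.exp(-dist2 / kernel_width ** 2))
    # Weighted normal equations: (X^T W X) w = X^T W y
    k = len(x0) + 1
    xtwx = [[sum(wt * r[i] * r[j] for wt, r in zip(weights, rows))
             for j in range(k)] for i in range(k)]
    xtwy = [sum(wt * r[i] * y for wt, r, y in zip(weights, rows, targets))
            for i in range(k)]
    return solve(xtwx, xtwy)


coef = local_surrogate(black_box, x0=[0.0, 1.0])
print("intercept and local feature weights:", coef)
```

Near the chosen point the fitted weights approximate the local slopes of the black box (about 1 for the first feature and about 2 for the second in this toy case), so they indicate which feature drives this particular decision; a different instance would yield different weights.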

Preference learning (PL) := concerns problems in learning to rank, i.e. learning a predictive preference model from observed preference information, e.g. with label ranking, instance ranking, or object ranking.  Fürnkranz, J., Hüllermeier, E., Cheng, W. & Park, S.-H. 2012. Preference-based reinforcement learning: a formal framework and a policy iteration algorithm. Machine Learning, 89, (1-2), 123-156.

Multi-Agent Systems (MAS) := collections of several independent agents, which can also be a mixture of computer agents and human agents. An excellent pointer to the latter is: Jennings, N. R., Moreau, L., Nicholson, D., Ramchurn, S. D., Roberts, S., Rodden, T. & Rogers, A. 2014. On human-agent collectives. Communications of the ACM, 80-88.

Transfer Learning (TL) := The ability of an algorithm to recognize and apply knowledge and skills learned in previous tasks to novel tasks or new domains, which share some commonality. Central question: Given a target task, how do we identify the commonality between the task and previous tasks, and transfer the knowledge from the previous tasks to the target one? Pan, S. J. & Yang, Q. 2010. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22, (10), 1345-1359, doi:10.1109/tkde.2009.191.


Time Line of relevant events for interactive Machine Learning (iML):

According to John Launchbury from DARPA (watch his excellent video on YouTube), three waves of AI can be distinguished:

Wave 1: Handcrafted Knowledge – enables reasoning, explainability and re-traceability over narrowly defined problems; this is what we call classic AI, and it consists mostly of rule-based systems.
Wave 2: Statistical Learning – black-box models with no contextual capability and minimal reasoning ability, which need big data; their strength is that probabilistic learning models can cope with stochastic and non-deterministic problems.
Wave 3: Contextual Adaptation – systems may construct contextual explanatory models for classes of real-world phenomena, and glass-box models allow decisions to be re-enacted in order to answer the question of “why” a machine decision has been reached. This is what humans can do very well – and thus, if we want to excel in this area, we have to understand the underlying principles of intelligence.

Terms (incomplete)

Dimension = n attributes which jointly describe a property.

Features = any measurements, attributes or traits representing the data. Features are key for learning and understanding. Andrew Ng emphasizes that machine learning is mostly feature engineering.

Reals = numbers expressible as finite/infinite decimals.

Regression = predicting the value of a random variable y from a measurement x.

Reinforcement learning = adaptive control, i.e. to learn how to (re-)act in a given environment, given delayed/non-deterministic rewards. Human learning is mostly reinforcement learning.

Historic People (incomplete)

Bayes, Thomas (1702-1761) gave a straightforward definition of probability [Wikipedia]

Laplace, Pierre-Simon, Marquis de (1749-1827) developed the Bayesian interpretation of probability [Wikipedia]

Price, Richard (1723-1791) edited and commented on the work of Thomas Bayes in 1763 [Wikipedia]

Tukey, John Wilder (1915-2000) suggested in 1962, together with Frederick Mosteller, the name “data analysis” for the computational statistical sciences, which much later became known as data science [Wikipedia]

Antonyms (incomplete)

big data sets <> small data sets

certain <> uncertain

correlation <> causality

comprehensible <> incomprehensible

confident <> doubtful

discriminative <> generative

explainable <> obscure

Frequentist <> Bayesian

independent and identically distributed data (IID data) <> non-independent and identically distributed data (non-IID)

intelligible <> unintelligible

legitimate <> illegitimate

low dimensional <> high dimensional

underfitting <> overfitting

parametric <> non-parametric

reliable <> unreliable

supervised <> unsupervised

sure <> unsure

transparent <> opaque

trustworthy <> untrustworthy

truthful <> untruthful