Transparency & Trust in Machine Learning: Making AI interpretable and explainable

A major motivation for our continued study of interactive Machine Learning (iML) [1], with a human in the loop [2] (see our project page), is that modern deep learning models are often considered to be “black boxes” [3]. A further drawback is that such models have no explicit declarative knowledge representation and therefore have difficulty generating the required explanatory structures, which considerably limits the achievement of their full potential [4].

Even if we understand the mathematical theory behind a machine learning model, it is still difficult to gain insight into the internal workings of that model. Black-box models therefore lack transparency, which raises the question: “Can we trust our results?”

More precisely: “Can we explain how and why a result was achieved?” A classic example is the question “Which objects are similar?”; an even more interesting question is “Why are those objects similar?”
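The difference between the “which” and the “why” question can be illustrated with a minimal sketch. It assumes that objects are described by numeric feature vectors and that similarity is measured by cosine similarity; the feature names, the values and the function `cosine_similarity_with_explanation` are hypothetical and chosen only for illustration. Because the cosine similarity is a sum over features, each feature's share of the score can be reported alongside the score itself.

```python
import math

def cosine_similarity_with_explanation(a, b, feature_names):
    """Return the cosine similarity of a and b plus each feature's share of it."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # The similarity is a sum of per-feature products, so it decomposes exactly.
    contributions = {
        name: (x * y) / (norm_a * norm_b)
        for name, x, y in zip(feature_names, a, b)
    }
    return sum(contributions.values()), contributions

# Hypothetical, already normalised feature values for two objects.
features = ["age", "blood_pressure", "cholesterol"]
object_a = [0.9, 0.1, 0.8]
object_b = [0.7, 0.2, 0.9]

similarity, contributions = cosine_similarity_with_explanation(
    object_a, object_b, features
)
ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)

print(f"similarity: {similarity:.3f}")      # answers "which objects are similar?"
for name, share in ranked:                  # answers "why are they similar?"
    print(f"  {name}: {share:.3f}")
```

Such a decomposition is only a very simple form of explanation; it says which features drive the score, not whether the relationship is causal.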

We believe that there is a growing demand for machine learning approaches that not only perform well but are also transparent, interpretable and trustworthy. We are currently working on methods and models to reenact the machine decision-making process, and to reproduce and comprehend the learning and knowledge extraction process. This is important because decision support requires an understanding of the causality of learned representations [5], [6]. If human intelligence is complemented by machine learning, and in at least some cases even overruled, humans must still be able to understand and, most of all, to interactively influence the machine decision process. This requires context awareness and sensemaking to close the gap between human thinking and machine “thinking”.

A recent and very interesting discussion with Daniel S. WELD (Artificial Intelligence, Crowdsourcing, Information Extraction) on Explainable AI can be found here:

In essence, the interview brings out that most machine learning models are very complicated: deep neural networks operate incredibly quickly, considering thousands of possibilities in seconds before making decisions. Dan Weld points out: “The human brain simply can’t keep up”, and gives the example of AlphaGo making an unexpected decision: it is not possible to understand why the algorithm made exactly that choice. Of course, this may not be critical in a game, where no one gets hurt; however, deploying intelligent machines that we cannot understand could set a dangerous precedent, e.g. in our domain, health informatics. According to Dan Weld, understanding and trusting machines is “the key problem to solve” in AI safety, security, data protection and privacy, and it is urgently necessary. He further explains, “Since machine learning is nowadays at the core of pretty much every AI success story, it’s really important for us to be able to understand what is it that the machine learned.”

When a machine learning system is confronted with a “known unknown”, it may recognize its uncertainty about the situation in the given context. However, when it encounters an “unknown unknown”, it will not even recognize that the situation is uncertain: the system will have extremely high confidence that its result is correct, yet it will still be wrong. Dan Weld points to the example of classifiers “trained on data that had some regularity in it that’s not reflected in the real world”, which is a problem of having little or even no available training data (see [1]). The problem of “unknown unknowns” is definitely underestimated in the traditional AI community. Governments and businesses cannot afford to deploy highly intelligent AI systems that make unexpected, harmful decisions, especially if these systems operate in safety-critical environments.
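The overconfidence on “unknown unknowns” can be reproduced with a minimal sketch. It assumes a plain logistic classifier with fixed, hypothetical weights standing in for a model trained on inputs between 0 and 1; none of these numbers come from the interview. Near the decision boundary the model reports low confidence (a “known unknown”), whereas an input far outside the assumed training range receives a confidence of practically 1, although nothing in the training data supports it.

```python
import math

def predict_proba(x, weights, bias):
    """Logistic model: probability that x belongs to the positive class."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def confidence(p):
    """Confidence in the predicted class, whichever class that is."""
    return max(p, 1.0 - p)

# Hypothetical parameters; assume training inputs all lay within [0, 1].
weights, bias = [4.0, -4.0], 0.0

near_boundary = [0.5, 0.5]       # ambiguous input inside the training range
in_distribution = [0.9, 0.1]     # typical input inside the training range
far_outside = [10.0, -10.0]      # input unlike anything seen in training

for name, x in [("near boundary", near_boundary),
                ("in distribution", in_distribution),
                ("far outside", far_outside)]:
    p = predict_proba(x, weights, bias)
    print(f"{name}: confidence {confidence(p):.4f}")
```

The model has no notion of how far an input lies from its training data, so its confidence grows with the distance from the decision boundary rather than with the amount of supporting evidence.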

A major motivation for this approach is the rise of legal and privacy requirements, e.g. the new European General Data Protection Regulation (GDPR) and the information security standard ISO/IEC 27001. The GDPR, which becomes enforceable on 25 May 2018, will make black-box approaches difficult to use in business, because they are not able to explain why a decision has been made.

This will stimulate research in this area with the goal of making decisions interpretable, comprehensible and reproducible. In health informatics, for example, this is useful not only for machine learning research and clinical decision making, but is at the same time a big asset for the training of medical students.

The General Data Protection Regulation (GDPR) (Regulation (EU) 2016/679) is a regulation by which the European Parliament, the Council of the European Union and the European Commission intend to strengthen and unify data protection for all individuals within the European Union (EU). It also addresses the export of personal data outside of the European Union (this will affect data-centric projects between the EU and e.g. the US). The GDPR aims primarily to give control back to citizens and residents over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU. The GDPR replaces the Data Protection Directive 95/46/EC of 1995. The regulation was adopted on 27 April 2016 and becomes enforceable from 25 May 2018, after a two-year transition period. Unlike a directive, it does not require national governments to pass any enabling legislation and is thus directly binding, which affects practically all data-driven businesses and particularly machine learning and AI technology.

Note that the “right to be forgotten” [7] established by the European Court of Justice has been extended to become a “right of erasure”: it will no longer be sufficient to remove a person’s data from search results when requested to do so; data controllers must now erase that data. However, if the data is encrypted, it may be sufficient to destroy the encryption keys rather than go through the prolonged process of ensuring that the data has been fully erased [8].
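The idea of erasing data by destroying its encryption key can be sketched as follows. This is a toy illustration under loud assumptions: the `KeyStore` class, the subject identifier and the record are hypothetical, and the cipher is a simple one-time pad (XOR with a random key of equal length) chosen only because it needs nothing beyond the standard library. A real system would use a vetted cipher and would also have to ensure that no copies of the key survive.

```python
import secrets

class KeyStore:
    """Hypothetical in-memory store holding one key per data subject."""

    def __init__(self):
        self._keys = {}

    def create(self, subject_id: str, length: int) -> bytes:
        self._keys[subject_id] = secrets.token_bytes(length)
        return self._keys[subject_id]

    def get(self, subject_id: str) -> bytes:
        return self._keys[subject_id]   # raises KeyError once the key is destroyed

    def destroy(self, subject_id: str) -> None:
        del self._keys[subject_id]

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Toy one-time pad: the same call encrypts and decrypts."""
    if len(data) != len(key):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

keys = KeyStore()
record = "diagnosis: hypothetical example".encode("utf-8")

# Only the ciphertext is stored, possibly in many backups and replicas.
ciphertext = xor_bytes(record, keys.create("subject-1", len(record)))
recovered = xor_bytes(ciphertext, keys.get("subject-1"))

# Erasure request: destroy the key instead of hunting down every ciphertext copy.
keys.destroy("subject-1")
try:
    keys.get("subject-1")
    readable_after_erasure = True
except KeyError:
    readable_after_erasure = False

print(recovered == record, readable_after_erasure)
```

The design choice is that the key lives in exactly one controlled place, so a single deletion makes every stored copy of the ciphertext unreadable.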


[1] Holzinger, A. 2016. Interactive Machine Learning for Health Informatics: When do we need the human-in-the-loop? Brain Informatics, 3, (2), 119-131, doi:10.1007/s40708-016-0042-6.

[2] Holzinger, A., Plass, M., Holzinger, K., Crisan, G. C., Pintea, C.-M. & Palade, V. 2017. A glass-box interactive machine learning approach for solving NP-hard problems with the human-in-the-loop. arXiv:1708.01104.

[3] Lipton, Z. C. 2016. The mythos of model interpretability. arXiv:1606.03490.

[4] Bologna, G. & Hayashi, Y. 2017. Characterization of Symbolic Rules Embedded in Deep DIMLP Networks: A Challenge to Transparency of Deep Learning. Journal of Artificial Intelligence and Soft Computing Research, 7, (4), 265-286, doi:10.1515/jaiscr-2017-0019.

[5] Pearl, J. 2009. Causality: Models, Reasoning, and Inference (2nd Edition), Cambridge, Cambridge University Press.

[6] Gershman, S. J., Horvitz, E. J. & Tenenbaum, J. B. 2015. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349, (6245), 273-278, doi:10.1126/science.aac6076.

[7] Malle, B., Kieseberg, P., Schrittwieser, S. & Holzinger, A. 2016. Privacy Aware Machine Learning and the “Right to be Forgotten”. ERCIM News (special theme: machine learning), 107, (3), 22-23.

[8] Kingston, J. 2017. Using artificial intelligence to support compliance with the general data protection regulation. Artificial Intelligence and Law, doi:10.1007/s10506-017-9206-9.


2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), New York, NY


Interpretable Machine Learning Workshop

Andrew G Wilson, Jason Yosinski, Patrice Simard, Rich Caruana, William Herlands


Journal “Artificial Intelligence and Law”

ISSN: 0924-8463 (Print) 1572-8382 (Online)


AI = Artificial Intelligence (today often used interchangeably with Machine Learning (ML); the two are highly interrelated but not the same)

Causality = a concept that extends from Greek philosophy to today’s neuropsychology; assumptions about the nature of causality may be shown to be functions of a previous event preceding a later event.

Explainability = fundamental topic within AI

Etiology = in medicine (many) factors coming together to cause an illness (see causality)