This article will first define interpretability in the context of machine learning. We will then highlight why machine learning interpretability matters and why it has gained popularity recently. Next, we will provide a taxonomy of machine learning interpretability techniques, present a schema for evaluating interpretability, and finally discuss the properties of the explanations humans tend to prefer.
Although the literature on machine learning interpretability has been growing over the past few years, no consensus has yet been reached on its definition. Miller (2017) defines interpretability as the degree to which a human can understand the cause of a decision. Kim (2016), on the other hand, defines it as the degree to which a human can consistently predict a model's output.
In this article we will adopt the following definition: a model is interpretable if its decisions can be easily understood by humans. In other words, a model M1 is more interpretable than a model M2 if the decisions taken by M1 are easier to understand than those taken by M2.
High performance alone is not enough to trust a machine learning model's decisions. As Doshi-Velez and Kim (2017) point out, "the problem is that a single metric, such as classification accuracy, is an incomplete description of most real-world tasks." This is especially true when the model is applied in a critical domain, i.e., a domain where an error committed by the model could lead to severe consequences. If wrong decisions have severe impacts, then interpretability is a must.
Moreover, the need for interpretable machine learning solutions has now been translated into several Artificial Intelligence related regulations passed by countries around the world. The European Union's General Data Protection Regulation (GDPR), for example, requires among other things the transparency of algorithmic decisions (principle of transparency).
Nonetheless, there are still cases where interpretability is not a requirement: either the model is used in a low-risk environment, i.e., an error will not have serious consequences (e.g., a movie recommender system), or the technique has already been extensively studied and examined (e.g., optical character recognition). According to Doshi-Velez and Kim (2017), the need for interpretability stems from an incompleteness in problem formalization: for certain problems it is not enough to get the prediction (the what); the model must also explain why it came to that prediction (the why), because a correct prediction only partially solves the problem.
Finally, and according to the same authors, the interpretability of machine learning models makes it easier to evaluate properties such as fairness, privacy, reliability, robustness, causality, usability, and trust.
Machine learning interpretability methods and techniques can be grouped into four categories: model-agnostic, model-specific, global, and local. Model-agnostic techniques can be applied to any machine learning black-box model, whereas model-specific techniques can only be applied to a particular class of models. Global methods are concerned with the overall behavior of the model, while local methods are concerned with justifying a single prediction of the model.
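To make the distinction concrete, here is a minimal sketch that contrasts a model-specific global explanation with a model-agnostic global one. The choice of scikit-learn, the breast cancer dataset, and a random forest are assumptions made for the example, not prescriptions of the taxonomy.

```python
# Contrasting two of the categories above: a random forest's impurity-based
# feature_importances_ are model-specific (they exist only because of the tree
# structure), while permutation importance is model-agnostic and global
# (it treats the model purely as a black box).
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Model-specific, global: importances derived from the model's internal structure.
print("Impurity-based importances:", model.feature_importances_[:5])

# Model-agnostic, global: shuffle each feature and measure the drop in score;
# this works for any estimator, since only predictions are needed.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
print("Permutation importances:   ", result.importances_mean[:5])
```

Permutation importance is model-agnostic precisely because it never looks inside the model; the impurity-based importances, in contrast, are only defined for tree ensembles.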
A more detailed overview of the categories of machine learning interpretability methods is provided in the following taxonomy, taken from the paper "Explainable AI: A Review of Machine Learning Interpretability Methods":
Figure 1: Taxonomy of interpretability methods
No consensus exists on how to evaluate the interpretability of a machine learning model. Nonetheless, researchers have made initial attempts to formulate some approaches for evaluation. Doshi-Velez and Kim (2017) propose three main levels for the evaluation of interpretability: application-grounded evaluation, in which domain experts carry out real tasks with the model's explanations; human-grounded evaluation, in which lay humans carry out simplified tasks; and functionally-grounded evaluation, in which no human experiments are run and interpretability is measured through formal proxy metrics.
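As an illustration of the functionally-grounded level, the sketch below uses surrogate fidelity as a proxy metric. The use of scikit-learn, a gradient-boosting black box, and a shallow decision tree surrogate are choices made for this example; the paper does not prescribe a specific proxy.

```python
# A functionally-grounded proxy: no humans involved; interpretability is scored
# indirectly by how faithfully a small, readable surrogate reproduces the
# predictions of a black-box model (its "fidelity").
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

black_box = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

# The surrogate is trained on the black box's predictions, not the true labels.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))

# Fidelity: agreement between surrogate and black box on held-out data.
fidelity = accuracy_score(black_box.predict(X_test), surrogate.predict(X_test))
print(f"Surrogate fidelity to the black box: {fidelity:.2f}")
```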
Psychology and cognitive science can be of great help when it comes to discovering what humans consider a "good explanation". Indeed, Miller (2017) carried out an extensive survey of publications on explanations, and this subsection builds on his summary.
This article broadly introduced the concept of machine learning interpretability. We saw its definitions, importance, taxonomy, and evaluation methods. In the next article, we will attempt to do the same with fuzzy logic. And in the third article, we will see how fuzzy logic can benefit machine learning interpretability.
Molnar, Christoph. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.
Miller, Tim. “Explanation in artificial intelligence: Insights from the social sciences.” arXiv preprint arXiv:1706.07269 (2017)
Kim, Been, Rajiv Khanna, and Oluwasanmi O. Koyejo. “Examples are not enough, learn to criticize! Criticism for interpretability.” Advances in Neural Information Processing Systems (2016)
Doshi-Velez, Finale, and Been Kim. “Towards a rigorous science of interpretable machine learning.” arXiv preprint arXiv:1702.08608 (2017). http://arxiv.org/abs/1702.08608
GDPR, Article 12: “Transparent information, communication and modalities for the exercise of the rights of the data subject”
Lipton, Peter. “Contrastive explanation.” Royal Institute of Philosophy Supplements 27 (1990): 247–266
Nickerson, Raymond S. “Confirmation bias: A ubiquitous phenomenon in many guises.” Review of General Psychology 2 (2): 175–220 (1998)