With the GDPR’s implementation date looming, there has been much discussion about whether the regulation requires a “right to an explanation” from machine learning models.

Regardless of the regulation’s effects on machine learning, however, attempting to explain machine learning models presents significant practical difficulties. These difficulties will become an increasing focus for privacy professionals as machine learning is deployed more widely throughout organizations. The GDPR, in short, will not be the last law to address this issue.

This brief guide is meant to help privacy professionals tackle these complex issues and to understand what is and isn’t possible when seeking to open up the so-called “black box” of machine learning models.

What is machine learning?
Machine learning is a technique that allows algorithms to extract correlations from data with minimal supervision. The goals of machine learning can be quite varied, but they often involve trying to maximize the accuracy of an algorithm’s prediction. In machine learning parlance, a particular algorithm is often called a “model,” and these models take data as input and output a particular prediction. For example, the input data could be a customer’s shopping history and the output could be products that customer is likely to buy in the future. The model makes accurate predictions by attempting to change its internal parameters — the various ways it combines the input data — to maximize its predictive accuracy. These models may have relatively few parameters, or they may have millions that interact in complex, unanticipated ways. As computing power has increased over the last few decades, data scientists have discovered new ways to quickly train these models. As a result, the number — and power — of complex models with thousands or millions of parameters has vastly increased. These types of models are becoming easier to use, even for non-data scientists, and as a result, they might be coming to an organization near you. 
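To make the idea of a model adjusting its internal parameters concrete, here is a minimal sketch in Python. The data, the two-parameter model, and the training settings are all hypothetical, and the sketch reduces prediction error (a common stand-in for maximizing accuracy) using plain gradient descent rather than any particular library.

```python
# Hypothetical toy data: (items bought last year, items bought this year)
# for five customers. These numbers are invented for illustration only.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 9.8)]


def mean_squared_error(weight, bias):
    """Average squared gap between the model's predictions and the data."""
    return sum((weight * x + bias - y) ** 2 for x, y in data) / len(data)


def train(steps=5000, learning_rate=0.01):
    """Nudge the two parameters step by step in the direction that lowers the error."""
    weight, bias = 0.0, 0.0  # the model starts out knowing nothing
    for _ in range(steps):
        # Gradients say how the error changes as each parameter changes.
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (weight * x + bias - y) for x, y in data) / len(data)
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


weight, bias = train()
print(f"learned weight={weight:.2f}, bias={bias:.2f}")
print(f"error before training: {mean_squared_error(0.0, 0.0):.2f}")
print(f"error after training:  {mean_squared_error(weight, bias):.2f}")
```

This toy model has only two parameters, so a person can read them directly. The same adjust-and-check loop, repeated over millions of parameters, is what makes larger models both powerful and hard to inspect.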

The accuracy vs. interpretability tradeoff 
There is a fundamental tradeoff between the predictive accuracy of a model and how easy the model is to interpret. A simple linear regression model, one of the most common and simplest types of model, may have only a few parameters and be easy to interpret. For this reason, these types of models are frequently favored in financial institutions, where legal requirements mandate that certain decisions be explained (like adverse credit decisions). A linear regression model, however, may not have sufficient predictive power for a particular use case. By contrast, a large neural network with millions of parameters will often be much better at predicting outcomes. But this increased power comes at the cost of inscrutability. The input data is transformed in numerous ways as it makes its way through the neural network, raising the question of what actually causes the model to make one prediction or decision over another. The Department of Defense’s “Explainable AI” project captures this tradeoff in a graph that lists a variety of machine learning techniques along with their levels of explainability.
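As a rough illustration of why a linear model is considered easy to interpret, the sketch below breaks a single score into per-feature contributions. The feature names, weights, and applicant are hypothetical and are not drawn from any real lender or scoring system.

```python
# Hypothetical weights for a toy credit-scoring model (invented for illustration).
WEIGHTS = {
    "income_thousands": 0.5,
    "late_payments": -8.0,
    "years_of_history": 1.5,
}
INTERCEPT = 20.0


def score(applicant):
    """A linear model: the intercept plus each feature value times its weight."""
    return INTERCEPT + sum(WEIGHTS[name] * applicant[name] for name in WEIGHTS)


def explain(applicant):
    """List each feature's contribution to the score, most negative first."""
    contributions = {name: WEIGHTS[name] * applicant[name] for name in WEIGHTS}
    return sorted(contributions.items(), key=lambda item: item[1])


applicant = {"income_thousands": 40, "late_payments": 3, "years_of_history": 2}
print("score:", score(applicant))
for name, contribution in explain(applicant):
    print(f"{name}: {contribution:+.1f}")
```

Because the score is just a sum, the largest negative contribution can be reported directly as the main reason for an adverse decision. A neural network offers no equally direct breakdown, which is why the techniques discussed in this guide exist.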

Current explainability techniques
Despite these difficulties, all is not lost in trying to understand how powerful machine learning techniques work. There is a rapidly growing literature on techniques to interpret more complex machine learning models like neural networks. For example, Local Interpretable Model-Agnostic Explanations (LIME) is one approach that attempts to determine the most salient features of a model (the key factors driving any one decision) by feeding inputs similar to the original ones through the model and observing how the predictions change. This has the benefit of giving simple explanations, such as whether a particular word in a document or shape in a photo is driving the model’s predictions. Other methods of explaining complex models, such as DeepLIFT, peer into the inner workings of a neural network to extract the parameters that are driving the model’s output. SHAP, a recent addition to this set of methods, attempts to unify these prior attempts at interpreting model output.
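The perturbation idea described above can be sketched in a few lines of Python. This is a simplified illustration in the spirit of that approach, not the LIME algorithm itself: it removes one word at a time instead of sampling many random variations, and it does not fit a local surrogate model. The spam scorer is a hypothetical stand-in for an opaque model, and the sketch assumes each word appears only once in the message.

```python
def black_box_spam_score(words):
    """Hypothetical stand-in for an opaque model: returns a spam score from 0 to 1."""
    score = 0.1
    if "free" in words:
        score += 0.5
    if "winner" in words:
        score += 0.3
    return min(score, 1.0)


def word_importance(words, model):
    """Remove each word in turn and record how much the model's score drops."""
    baseline = model(words)
    importance = {}
    for i, word in enumerate(words):
        perturbed = words[:i] + words[i + 1:]  # the same message minus one word
        importance[word] = baseline - model(perturbed)
    # Words whose removal changes the score the most come first.
    return sorted(importance.items(), key=lambda item: item[1], reverse=True)


message = ["claim", "your", "free", "prize", "winner"]
ranking = word_importance(message, black_box_spam_score)
for word, drop in ranking:
    print(f"{word}: {drop:.2f}")
```

The function only ever calls the model and compares outputs, so it needs no access to the model's internals. That is what "model-agnostic" means in practice, and it is also why such explanations describe individual decisions rather than the model as a whole.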

Limitations of model interpretability
Despite their promise, these advances in interpretability all have significant limitations. In each case, these explanatory models can only explain decisions that have already been made, so they cannot help much in trying to understand the model as a whole or predict how it will act before deployment. In addition, some tasks lend themselves more easily than others to being interpreted using these methods. In image recognition, for example, pulling out the parts of an image that drive the model’s predictions makes sense because humans easily recognize whether an image is a frog or a car or a horse. But what if a complex model is routing internet traffic in a data center? Or what if a model is controlling an electrical grid or predicting the weather? The patterns in these cases are much less intuitive to humans than language or images. And a fundamental premise of many of these interpretability methods is that humans can understand the parts of the input that are driving the model’s output. If the data itself is not interpretable by humans, then these methods may only go so far.

Bias in the data
In some cases, models that seem to perform well may actually be picking up some sort of noise in the data. For example, in the paper from which LIME originated, the authors demonstrated that a neural network designed to differentiate between huskies and wolves was not learning any anatomical differences. Instead, it learned that wolves are more likely to be in pictures with snow than huskies are. This drove a significant share of the model’s predictions, but was not actually relevant to the task of telling two similar animals apart. This example should urge caution: in real-world applications, misguided correlations could be something that human users cannot readily understand or interpret. Here, the model was relying on the presence of snow; but in other examples, models could be looking at gender, race, or other sensitive features that could create a host of moral, ethical, and legal problems.

Legal barriers to interpreting models 
In some cases, there may be legal barriers to performing this type of interpretability analysis on models as well. For example, in the United States, the Supreme Court of Wisconsin ruled against a defendant who challenged the use of a proprietary algorithm that informed the sentencing judge about his potential future criminality (State v. Loomis). The court ruled that the defendant did not have a right to inspect the inner workings of the algorithm due to intellectual property constraints. While this may be a more extreme case than a typical business scenario, lawyers and engineers may need to be wary of third-party vendors that do not allow inspection of their models because of trade secrets or confidentiality clauses.

As companies of all sizes rely more heavily on complex models, and as the use of machine learning expands, the impetus to understand these models will only grow. Some of this impetus will be legal. Some of the impetus will be prudential, such as ensuring that models are actually capturing what they should be.

But in all cases, privacy pros will come under increasing pressure to understand or attempt to understand how these models are working. Even the least technical among us may be required to understand model explainability — and its limits — sooner than we think.