R / R News / Rstats

Companies Are Demanding Model Interpretability. Here’s How To Do It Right.

by Adrian Antico · April 11, 2019

This article is originally published at https://www.remixinstitute.com

There are plenty of methods out there for machine learning model interpretability with fancy names but what is lacking in all of them?

ICE and Partial Dependence Plots in LIME fail to tell me the accuracy surrounding the fitted relationship. Further, ICE doesn’t exactly tell me the probability of each of the lines occurring. Your model can overfit or underfit your data pretty easily, especially if you are using deep learning models. LIME (should be called LAME) fails to tell me how the model actually performs.

Imagine you are working on a price elasticity model that will guide pricing decisions. Currently you would show the relationship that the model was able to fit. Given that we will be using a model to guide pricing decisions, a sensible stakeholder might ask, “I see the relationship that your model fit, but how do I know that corresponds to the actual relationship?”

What do you do? Give the stakeholder some model accuracy metrics? Tell them that you used deep learning so they should just trust it because it is state-of-the-art technology?

Here is a simple solution to the shortfall of partial dependence plots: Use calibration on your predicted relationship. It’s that simple. Below is an example plot from the RemixAutoML package in R. The x-axis is the independent variable of interest. The spacing between ticks are based on percentiles of the distribution of the independent variable. What that means is that, across the x-axis, the data is uniformly distributed, so no need for the dashes as shown above in the ICE chart. Secondly, we can see the relationship of the independent variable as it relates to the target variable, as does the partial dependence plots, but we can also see how good a fit the model has across the range of the independent variable. This addresses the skepticism from your stakeholders about the accuracy of your predictions. If you want to see the variability of your predictions, use the boxplot version too. If you want to see the relationship for specific group, simply subset your data so only that group of interest is included, and rerun the function.

Partial Dependence Calibration Plots from RemixAutoML package in R

 #######################################################
# Create data to simulate validation data with predicted values
#######################################################
# Correl: This is the correlation used to determine how correlated the variables are to 
# the target variable. Switch it up (between 0 and 1) to see how the charts below change.
Correl <- 0.85
data <- data.table::data.table(Target = runif(1000))
#  Mock independent variables - they are correlated variables with 
# various transformations so you can see different kinds of relationships 
# in the charts below
# Helper columns for creating simulated variables 
data[, x1 := qnorm(Target)]
data[, x2 := runif(1000)]
# Create one variable at a time
data[, Independent_Variable1 := log(pnorm(Correl * x1 +
                        	sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable2 := (pnorm(Correl * x1 +
                                   	sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable3 := exp(pnorm(Correl * x1 +
                                      	sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable4 := exp(exp(pnorm(Correl * x1 +
                                          	sqrt(1-Correl^2) * qnorm(x2))))]
data[, Independent_Variable5 := sqrt(pnorm(Correl * x1 +
                                       	sqrt(1-Correl^2) * qnorm(x2)))]
data[, Independent_Variable6 := (pnorm(Correl * x1 +
                                   	sqrt(1-Correl^2) * qnorm(x2)))^0.10]
data[, Independent_Variable7 := (pnorm(Correl * x1 +
                                   	sqrt(1-Correl^2) * qnorm(x2)))^0.25]
data[, Independent_Variable8 := (pnorm(Correl * x1 +
                                   	sqrt(1-Correl^2) * qnorm(x2)))^0.75]
data[, Independent_Variable9 := (pnorm(Correl * x1 +
                                   	sqrt(1-Correl^2) * qnorm(x2)))^2]
data[, Independent_Variable10 := (pnorm(Correl * x1 +
                                    	sqrt(1-Correl^2) * qnorm(x2)))^4]
data[, Independent_Variable11 := ifelse(Independent_Variable2 < 0.20, "A",
                                  	ifelse(Independent_Variable2 < 0.40, "B",
                                         	ifelse(Independent_Variable2 < 0.6, "C",
                                                	ifelse(Independent_Variable2 < 0.8, "D", "E"))))]
# We’ll use this as a mock predicted value
data[, Predict := (pnorm(Correl * x1 +
                     	sqrt(1-Correl^2) * qnorm(x2)))]
# Remove the helper columns
data[, ':=' (x1 = NULL, x2 = NULL)]
# In the ParDepCalPlot() function below, note the Function argument - 
# we are using mean() to aggregate our values but you 
# can use quantile(x, probs = y) for quantile regression
# Partial Dependence Calibration Plot: 
p1 <- RemixAutoML::ParDepCalPlots(data,
                              	PredictionColName = "Predict",
                              	TargetColName = "Target",
                              	IndepVar = "Independent_Variable1",
                              	GraphType = "calibration",
                              	PercentileBucket = 0.05,
                              	FactLevels = 10,
                              	Function = function(x) mean(x, na.rm = TRUE))
# Partial Dependence Calibration BoxPlot:  note the GraphType argument
p2 <- RemixAutoML::ParDepCalPlots(data,
                              	PredictionColName = "Predict",
                              	TargetColName = "Target",
                              	IndepVar = "Independent_Variable1",
                              	GraphType = "boxplot",
                              	PercentileBucket = 0.05,
                              	FactLevels = 10,
                              	Function = function(x) mean(x, na.rm = TRUE))
# Partial Dependence Calibration Plot: 
p3 <- RemixAutoML::ParDepCalPlots(data,
                                   PredictionColName = "Predict",
                                   TargetColName = "Target",
                                   IndepVar = "Independent_Variable4",
                                   GraphType = "calibration",
                                   PercentileBucket = 0.05,
                                   FactLevels = 10,
                                   Function = function(x) mean(x, na.rm = TRUE))
# Partial Dependence Calibration BoxPlot for factor variables: 
p4 <- RemixAutoML::ParDepCalPlots(data,
                                   PredictionColName = "Predict",
                                   TargetColName  = "Target",
                                   IndepVar = "Independent_Variable11",
                                   GraphType = "calibration",
                                   PercentileBucket = 0.05,
                                   FactLevels = 10,
                                   Function = function(x) mean(x, na.rm = TRUE))
# Plot all the individual graphs in a single pane
RemixAutoML::multiplot(plotlist = list(p1,p2,p3,p4), cols = 2)

The post Companies Are Demanding Model Interpretability. Here’s How To Do It Right. first appeared on Remix Institute.

The post Companies Are Demanding Model Interpretability. Here’s How To Do It Right. appeared first on Remix Institute.

Thanks for visiting r-craft.org
This article is originally published at https://www.remixinstitute.com
Please visit source website for post related comments.

Cookie	Duration	Description
cookielawinfo-checkbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checkbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Companies Are Demanding Model Interpretability. Here’s How To Do It Right.

You may also like...

Categories

Companies Are Demanding Model Interpretability. Here’s How To Do It Right.

You may also like...

ggplot2 3.2.0

sparklyr 1.0: Apache Arrow, XGBoost, Broom and TFRecords

Why R? Webinar – rZYPAD – Development pipeline for R production

Categories