In one of my previous posts I discussed how random forests can be turned into a “white box”, such that each prediction is decomposed into a sum of contributions from each feature i.e. (prediction = bias + feature_1 contribution + … + feature_n contribution).

I’ve a had quite a few requests for code to do this. Unfortunately, most random libraries (including scikit-learn) don’t expose tree paths of predictions. The implementation for sklearn required a hacky patch for exposing the paths. Fortunately, since, scikit-learn has two additions in the API that make this relatively straightforward: obtaining leaf node_ids for predictions, and storing all intermediate values in all nodes in decision trees, not only leaf nodes. Combining these, it is possible to extract the prediction paths for each individual prediction and decompose the predictions via inspecting the paths.

Without further ado, the code is available at github, and also via pip install treeinterpreter

Note: this requires scikit-learn 0.17, which is still in development. You can check how to install it at

Decomposing random forest predictions with treeinterpreter

Let’s take a sample dataset, train a random forest model, predict some values on the test set and then decompose the predictions.

from treeinterpreter import treeinterpreter as ti
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
import numpy as np

from sklearn.datasets import load_boston
boston = load_boston()
rf = RandomForestRegressor()[:300],[:300])

Lets pick two arbitrary data points that yield different price estimates from the model.

instances =[[300, 309]]
print "Instance 0 prediction:", rf.predict(instances[0])
print "Instance 1 prediction:", rf.predict(instances[1])

Instance 0 prediction: [ 30.76]
Instance 1 prediction: [ 2.41]

Predictions that the random forest model made for the two data points are quite different. But why? We can now decompose the predictions into the bias term (which is just the trainset mean) and individual feature contributions, so we see which features contributed to the difference and by how much.

We can simply the treeinterpreter predict method with the model and the data.

prediction, bias, contributions = ti.predict(rf, instances)

Printint out the results:

for i in range(len(instances)):
    print "Instance", i
    print "Bias (trainset mean)", biases[i]
    print "Feature contributions:"
    for c, feature in sorted(zip(contributions[i], 
                             key=lambda x: -abs(x[0])):
        print feature, round(c, 2)
    print "-"*20  

Instance 0
Bias (trainset mean) 25.249333333
Feature contributions:
RM 2.73
LSTAT 1.71
ZN 1.04
DIS -0.7
B -0.39
TAX -0.19
CRIM -0.13
RAD 0.11
INDUS 0.06
AGE -0.02
NOX -0.01
CHAS 0.0
Instance 1
Bias (trainset mean) 25.2849333333
Feature contributions:
RM -4.88
LSTAT 2.38
DIS 0.32
AGE -0.28
TAX -0.23
CRIM 0.16
B -0.15
INDUS -0.14
CHAS -0.1
ZN -0.05
NOX -0.05
RAD -0.02

The feature contributions are sorted by their absolute impact. We can see that in the first instance where the prediction was high, most of the positive contributions came from RM, LSTAT and PTRATIO feaures. On the second instance the predicted value is much lower, since RM feature actually has a very large negative impact that is not offset by the positive impact of other features, thus taking the prediction below the dataset mean.

But is the decomposition actually correct? This is easy to check: bias and contributions need to sum up to the predicted value:

print prediction
print biases + np.sum(contributions, axis=1)

[ 30.76 22.41]
[ 30.76 22.41]

Note that when summing up the contributions, we are dealing with floating point numbers so the values can slightly different due to rounding errors

Comparing too datasets

One use case where this approach can be very useful is when comparing two datasets. For example

  • Understanding the exact reasons why estimated values are different on two datasets, for example what contributes to estimated house prices being different in two neighborhoods.
  • Debugging models and/or data, for example understanding why average predicted values on newer data do not match the results seen on older data.

For this example, let’s split the remaining housing price data into two test datasets and compute the average estimated prices for them.

ds1 =[300:400]
ds2 =[400:]

print np.mean(rf.predict(ds1))
print np.mean(rf.predict(ds2))


We can see that the average predicted prices for the houses in the two datasets are quite different. We can now trivially break down the contributors to this difference: which features contribute to this different and by how much.

prediction1, bias1, contributions1 = ti.predict(rf, ds1)
prediction2, bias2, contributions2 = ti.predict(rf, ds2)

We can now calculate the mean contribution of each feature to the difference.

totalc1 = np.mean(contributions1, axis=0) 
totalc2 = np.mean(contributions2, axis=0) 

Since biases are equal for both datasets (because the training data for the model was the same), the difference between the average predicted values has to come only from feature contributions. In other words, the sum of the feature contribution differences should be equal to the difference in average prediction, which we can trivially check to be the case

print np.sum(totalc1 - totalc2)
print np.mean(prediction1) - np.mean(prediction2)


Finally, we can just print out the differences of the contributions in the two datasets. The sum of these is exactly the difference between the average predictions.

for c, feature in sorted(zip(totalc1 - totalc2, 
                             boston.feature_names), reverse=True):
    print feature, round(c, 2)

CRIM 0.5
RM 0.5
AGE 0.
NOX 0.03
B 0.01
CHAS -0.01
ZN -0.02
RAD -0.03
INDUS -0.03
TAX -0.08
DIS -0.14

Classification trees and forests

Exactly the same method can be used for classification trees, where features contribute to the estimated probability of a given class.
We can see this on the sample iris dataset.

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
iris = load_iris()

rf = RandomForestClassifier(max_depth = 4)
idx = range(len(

Let’s predict now for a single instance.

instance =[idx][100:101]
print rf.predict_proba(instance)

Breakdown of feature contributions:

prediction, bias, contributions = ti.predict(rf, instance)
print "Prediction", prediction
print "Bias (trainset prior)", bias
print "Feature contributions:"
for c, feature in zip(contributions[0], 
    print feature, c

Prediction [[ 0. 0.9 0.1]]
Bias (trainset prior) [[ 0.36 0.262 0.378]]
Feature contributions:
sepal length (cm) [-0.1228614 0.07971035 0.04315104]
sepal width (cm) [ 0. -0.01352012 0.01352012]
petal length (cm) [-0.11716058 0.24709886 -0.12993828]
petal width (cm) [-0.11997802 0.32471091 -0.20473289]

We can see that the strongest contributors to predicting the second class were petal length and width, which had the larges impact on updating the prior.


Making random forest predictions interpretable is actually pretty straightforward, and leading to similar level of interpretability as linear models. With treeinterpreter (pip install treeinterpreter), this can be done with just a couple of lines of code.

Source link


Please enter your comment!
Please enter your name here