Interpreting Random Forest Classification Models Using a Feature Contribution Method

  • Anna Palczewska
  • Jan Palczewski
  • Richard Marchese Robinson
  • Daniel Neagu
Chapter
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 263)

Abstract

Model interpretation is one of the key aspects of the model evaluation process. The explanation of the relationship between model variables and outputs is relatively easy for statistical models, such as linear regressions, thanks to the availability of model parameters and their statistical significance. For “black box” models, such as random forest, this information is hidden inside the model structure. This work presents an approach for computing feature contributions for random forest classification models. It allows for the determination of the influence of each variable on the model prediction for an individual instance. By analysing feature contributions for a training dataset, the most significant variables can be determined and their typical contributions towards predictions made for individual classes, i.e., class-specific feature contribution “patterns”, can be discovered. These patterns represent a standard behaviour of the model and allow for an additional assessment of the model reliability for new data. Interpretation of feature contributions for two UCI benchmark datasets shows the potential of the proposed methodology. The robustness of results is demonstrated through an extensive analysis of feature contributions calculated for a large number of generated random forest models.
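
The abstract does not spell out the computation itself. As a rough illustration only, the sketch below shows one way per-instance feature contributions could be computed, under the assumption that each contribution is the change in class proportion between a parent node and the child an instance falls into, attributed to the splitting feature and averaged over all trees. The toy forest, the node layout (`p`, `feature`, `threshold`, `left`, `right`) and the function names are hypothetical and are not taken from the chapter.

```python
from collections import defaultdict

# Hypothetical node layout (an assumption for illustration):
#   leaf:  {"p": [...]}
#   split: {"p": [...], "feature": name, "threshold": t, "left": node, "right": node}
# "p" holds the class proportions of the training instances in that node.

def tree_contributions(node, x):
    """Follow instance x down one tree; return (root proportions, contributions)."""
    n_classes = len(node["p"])
    base = list(node["p"])
    contrib = defaultdict(lambda: [0.0] * n_classes)
    while "feature" in node:
        go_left = x[node["feature"]] <= node["threshold"]
        child = node["left"] if go_left else node["right"]
        # Credit the change in class proportions to the splitting feature.
        for c in range(n_classes):
            contrib[node["feature"]][c] += child["p"][c] - node["p"][c]
        node = child
    return base, dict(contrib)

def forest_contributions(trees, x):
    """Average root proportions and feature contributions over all trees."""
    n_classes = len(trees[0]["p"])
    base = [0.0] * n_classes
    total = defaultdict(lambda: [0.0] * n_classes)
    for tree in trees:
        tree_base, contrib = tree_contributions(tree, x)
        for c in range(n_classes):
            base[c] += tree_base[c] / len(trees)
        for feature, values in contrib.items():
            for c in range(n_classes):
                total[feature][c] += values[c] / len(trees)
    return base, dict(total)

# Toy two-class forest with made-up proportions.
tree_a = {
    "p": [0.5, 0.5], "feature": "x1", "threshold": 2.0,
    "left": {"p": [0.9, 0.1]},
    "right": {
        "p": [0.2, 0.8], "feature": "x2", "threshold": 1.0,
        "left": {"p": [0.0, 1.0]},
        "right": {"p": [0.5, 0.5]},
    },
}
tree_b = {
    "p": [0.4, 0.6], "feature": "x2", "threshold": 0.5,
    "left": {"p": [0.7, 0.3]},
    "right": {"p": [0.1, 0.9]},
}
forest = [tree_a, tree_b]
instance = {"x1": 3.0, "x2": 0.8}

base, contributions = forest_contributions(forest, instance)
# Root proportions plus all contributions recover the forest's averaged
# leaf proportions for this instance.
prediction = [
    base[c] + sum(values[c] for values in contributions.values())
    for c in range(len(base))
]
print(base, contributions, prediction)
```

Under this assumption the contributions are additive: the averaged root proportions plus the summed contributions of all features equal the forest's class proportions for the instance, which is what makes a per-feature reading of a single prediction possible.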

Keywords

Random forest; Classification; Variable importance; Feature contribution; Cluster analysis

Notes

Acknowledgments

This work is partially supported by BBSRC and Syngenta Ltd through the Industrial CASE Studentship Grant (No. BB/H530854/1).

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Anna Palczewska (1)
  • Jan Palczewski (2)
  • Richard Marchese Robinson (3)
  • Daniel Neagu (1)

  1. Department of Computing, University of Bradford, Bradford, UK
  2. School of Mathematics, University of Leeds, Leeds, UK
  3. School of Pharmacy and Biomolecular Sciences, Liverpool John Moores University, Liverpool, UK
