Abstract
In this paper, we present Forest GUMP (for Generalized, Unifying Merge Process), a tool for the verification and precise explanation of Random Forests. Besides pre/post-condition-based verification and equivalence checking, Forest GUMP supports three concepts of explanation: the well-known model explanation and outcome explanation, as well as class characterization, i.e., the precise characterization of all samples that are classified in the same way. The key technology behind these results is algebraic aggregation, i.e., the transformation of a Random Forest into a semantically equivalent, concise white-box representation in terms of Algebraic Decision Diagrams (ADDs). The paper sketches the method and demonstrates the use of Forest GUMP by means of illustrative examples. In this way, readers should gain an intuition for the tool and for how it can be used to deepen the understanding not only of the considered dataset, but also of the character of Random Forests and of the ADD technology, here enriched to comprise infeasible path elimination. As Forest GUMP is publicly available, all experiments can be reproduced, modified, and complemented using any dataset that is available in the ARFF format.
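The idea of algebraic aggregation can be illustrated with a small sketch. The Python below is not Forest GUMP's implementation. It assumes a hypothetical tuple encoding of trees and diagrams, predicates of the form `x[feature] <= threshold` ordered by `(feature, threshold)`, and one-hot vote vectors as leaves. It sums the per-tree diagrams with a pointwise `apply`, maps every vote vector to its majority class, and then drops tests whose outcome is already implied by the tests above them, which is a simple form of infeasible path elimination. All function names are illustrative only.

```python
from functools import reduce

# Hypothetical encoding (not Forest GUMP's API):
#   tree:    ("split", feature, threshold, left_if_true, right_if_false) or ("leaf", class_index)
#   diagram: ("leaf", value) or ("node", (feature, threshold), hi, lo)
# A node tests x[feature] <= threshold; predicates are ordered by (feature, threshold).

def leaf(value):
    return ("leaf", value)

def mk(pred, hi, lo):
    # Reduction rule: a test whose two branches coincide is redundant.
    return hi if hi == lo else ("node", pred, hi, lo)

def apply(op, a, b, memo=None):
    """Combine two ordered diagrams pointwise with a binary operation on leaf values."""
    memo = {} if memo is None else memo
    if (a, b) not in memo:
        if a[0] == "leaf" and b[0] == "leaf":
            memo[(a, b)] = leaf(op(a[1], b[1]))
        else:
            # Expand on the smallest predicate so that the result stays ordered.
            top = min(n[1] for n in (a, b) if n[0] == "node")
            a_hi, a_lo = (a[2], a[3]) if a[0] == "node" and a[1] == top else (a, a)
            b_hi, b_lo = (b[2], b[3]) if b[0] == "node" and b[1] == top else (b, b)
            memo[(a, b)] = mk(top, apply(op, a_hi, b_hi, memo), apply(op, a_lo, b_lo, memo))
    return memo[(a, b)]

def add_votes(u, v):
    return tuple(x + y for x, y in zip(u, v))

def argmax(votes):
    # Ties go to the lowest class index.
    return max(range(len(votes)), key=votes.__getitem__)

def tree_to_add(tree, n_classes):
    """Turn one decision tree into an ordered diagram with one-hot vote leaves."""
    if tree[0] == "leaf":
        return leaf(tuple(int(c == tree[1]) for c in range(n_classes)))
    zero = (0,) * n_classes
    _, feature, threshold, left, right = tree
    test = mk((feature, threshold), leaf(True), leaf(False))
    then_part = apply(lambda hit, v: v if hit else zero, test, tree_to_add(left, n_classes))
    else_part = apply(lambda hit, v: zero if hit else v, test, tree_to_add(right, n_classes))
    return apply(add_votes, then_part, else_part)

def forest_to_add(trees, n_classes):
    """Aggregate the forest by summing the vote diagrams of all trees."""
    diagrams = [tree_to_add(t, n_classes) for t in trees]
    return reduce(lambda acc, d: apply(add_votes, acc, d), diagrams)

def map_leaves(f, node, memo=None):
    memo = {} if memo is None else memo
    if node not in memo:
        if node[0] == "leaf":
            memo[node] = leaf(f(node[1]))
        else:
            memo[node] = mk(node[1], map_leaves(f, node[2], memo), map_leaves(f, node[3], memo))
    return memo[node]

def eliminate(node, bounds=None):
    """Drop tests whose outcome is implied by the tests above them (infeasible paths)."""
    bounds = {} if bounds is None else bounds
    if node[0] == "leaf":
        return node
    (feature, t), hi, lo = node[1], node[2], node[3]
    lower, upper = bounds.get(feature, (float("-inf"), float("inf")))
    if upper <= t:  # lower < x <= upper <= t: the test is always true here
        return eliminate(hi, bounds)
    if lower >= t:  # t <= lower < x: the test is always false here
        return eliminate(lo, bounds)
    return mk((feature, t),
              eliminate(hi, {**bounds, feature: (lower, t)}),
              eliminate(lo, {**bounds, feature: (t, upper)}))

def evaluate(node, x):
    while node[0] == "node":
        (feature, t), hi, lo = node[1], node[2], node[3]
        node = hi if x[feature] <= t else lo
    return node[1]

def predict_tree(tree, x):
    while tree[0] == "split":
        _, feature, t, left, right = tree
        tree = left if x[feature] <= t else right
    return tree[1]

# Three toy trees over features x[0] and x[1], two classes.
trees = [
    ("split", 0, 2.5, ("leaf", 0), ("leaf", 1)),
    ("split", 0, 5.0, ("split", 1, 1.0, ("leaf", 0), ("leaf", 1)), ("leaf", 1)),
    ("split", 1, 3.0, ("leaf", 0), ("split", 0, 2.0, ("leaf", 0), ("leaf", 1))),
]
votes = forest_to_add(trees, n_classes=2)        # vote-vector diagram
decision = eliminate(map_leaves(argmax, votes))  # majority class per region
```

For any input `x`, `evaluate(decision, x)` should agree with the majority vote of `predict_tree` over the three trees, which is the semantic equivalence the aggregation is meant to preserve. Here structural tuple equality in `mk` stands in for the unique table of a real decision-diagram library, which would hash-cons nodes so that the reduction check takes constant time.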
Data Availability
The artifact is available in the Zenodo repository [26].
References
Akers, S.B.: Binary decision diagrams. IEEE Trans. Comput. 27(6), 509–516 (1978)
Bahar, R., Frohm, E., Gaona, C., Hachtel, G., Macii, E., Pardo, A., Somenzi, F.: Algebraic decision diagrams and their applications. In: Proceedings of 1993 International Conference on Computer Aided Design (ICCAD), pp. 188–191 (1993). https://doi.org/10.1109/ICCAD.1993.580054
Breiman, L.: Random forests. Mach. Learn. 45(1), 5–32 (2001). https://doi.org/10.1023/A:1010933404324
Bryant, R.E.: Graph-based algorithms for Boolean function manipulation. IEEE Trans. Comput. 35(8), 677–691 (1986). https://doi.org/10.1109/TC.1986.1676819
Chen, H., Zhang, H., Si, S., Li, Y., Boning, D.S., Hsieh, C.: Robustness verification of tree-based models. In: Wallach, H.M., Larochelle, H., Beygelzimer, A., d’Alché-Buc, F., Fox, E.B., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada, pp. 12317–12328 (2019). https://papers.nips.cc/paper/2019/hash/cd9508fdaa5c1390e9cc329001cf1459-Abstract.html
Chipman, H.A., George, E.I., McCulloch, R.E.: Making sense of a forest of trees. In: Weisberg, S. (ed.) Proceedings of the 30th Symposium on the Interface, pp. 84–92. Interface Foundation of North America, Fairfax Station, VA (1998)
Deng, H.: Interpreting tree ensembles with inTrees. Int. J. Data Sci. Anal. 7(4), 277–287 (2019). https://doi.org/10.1007/s41060-018-0144-8
Domingos, P.M.: Knowledge discovery via multiple models. Intell. Data Anal. 2(1–4), 187–202 (1998). https://doi.org/10.1016/S1088-467X(98)00023-7
Einziger, G., Goldstein, M., Sa’ar, Y., Segall, I.: Verifying robustness of gradient boosted models. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pp. 2446–2453. AAAI Press, New York (2019). https://doi.org/10.1609/aaai.v33i01.33012446
Fernández-Delgado, M., Cernadas, E., Barro, S., Amorim, D.: Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res. 15(1), 3133–3181 (2014)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugen. 7(2), 179–188 (1936)
Gossen, F., Steffen, B.: Algebraic aggregation of random forests: towards explainability and rapid evaluation. Int. J. Softw. Tools Technol. Transf. (2021). https://doi.org/10.1007/s10009-021-00635-x
Gossen, F., Margaria, T., Murtovi, A., Naujokat, S., Steffen, B.: DSLs for decision services: a tutorial introduction to language-driven engineering. In: Margaria, T., Steffen, B. (eds.) Leveraging Applications of Formal Methods, Verification and Validation. Modeling - 8th International Symposium, Proceedings, Part I, ISoLA 2018, Limassol, Cyprus, November 5-9, 2018. Lecture Notes in Computer Science, vol. 11244, pp. 546–564. Springer, Berlin (2018). https://doi.org/10.1007/978-3-030-03418-4_33
Gossen, F., Margaria, T., Steffen, B.: Towards explainability in machine learning: the formal methods way. IT Prof. 22(4), 8–12 (2020). https://doi.org/10.1109/MITP.2020.3005640
Gossen, F., Margaria, T., Steffen, B.: Formal methods boost experimental performance for explainable AI. IT Prof. 23(6), 8–12 (2021). https://doi.org/10.1109/MITP.2021.3123495
Gossen, F., Murtovi, A., Linden, J., Steffen, B.: The Java library for algebraic decision diagrams. https://add-lib.scce.info. Accessed 2023-02-22
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 93 (2019). https://doi.org/10.1145/3236009
Hara, S., Hayashi, K.: Making tree ensembles interpretable: a Bayesian model selection approach. In: Storkey, A.J., Pérez-Cruz, F. (eds.) International Conference on Artificial Intelligence and Statistics, AISTATS 2018, 9-11 April 2018, Playa Blanca, Lanzarote, Canary Islands, Spain. PMLR Proceedings of Machine Learning Research, vol. 84, pp. 77–85 (2018). http://proceedings.mlr.press/v84/hara18a.html
Ho, T.K.: Random decision forests. In: Proceedings of 3rd International Conference on Document Analysis and Recognition, vol. 1, pp. 278–282 (1995). https://doi.org/10.1109/ICDAR.1995.598994
Hungar, H., Steffen, B., Margaria, T.: Methods for generating selection structures, for making selections according to selection structures and for creating selection descriptions. USPTO Patent number: 9141708 (Sep 2015). https://patents.justia.com/patent/9141708
Kantchelian, A., Tygar, J.D., Joseph, A.D.: Evasion and hardening of tree ensemble classifiers. In: Balcan, M., Weinberger, K.Q. (eds.) Proceedings of the 33rd International Conference on Machine Learning, ICML 2016, New York City, NY, USA, June 19-24, 2016. JMLR Workshop and Conference Proceedings, vol. 48, pp. 2387–2396 (2016). http://proceedings.mlr.press/v48/kantchelian16.html
Lee, C.Y.: Representation of switching circuits by binary-decision programs. Bell Syst. Tech. J. 38(4), 985–999 (1959)
Lou, Y., Caruana, R., Gehrke, J.: Intelligible models for classification and regression. In: Yang, Q., Agarwal, D., Pei, J. (eds.) The 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’12, Beijing, China, August 12-16, 2012, pp. 150–158. ACM, New York (2012). https://doi.org/10.1145/2339530.2339556
Mangla, P., Singh, V., Balasubramanian, V.N.: On saliency maps and adversarial robustness. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 272–288. Springer, Berlin (2020)
Mundhenk, T.N., Chen, B.Y., Friedland, G.: Efficient saliency maps for explainable AI. arXiv preprint (2019). arXiv:1911.11293
Murtovi, A., Bainczyk, A., Steffen, B.: Forest GUMP: a tool for explanation (TACAS 2022 artifact) (Nov 2021). https://doi.org/10.5281/zenodo.5733107
Murtovi, A., Bainczyk, A., Steffen, B.: Forest GUMP: a tool for explanation. In: Fisman, D., Rosu, G. (eds.) Tools and Algorithms for the Construction and Analysis of Systems - 28th International Conference, TACAS 2022, Held as Part of the European Joint Conferences on Theory and Practice of Software, Proceedings, Part II, ETAPS 2022, Munich, Germany, April 2-7, 2022. Lecture Notes in Computer Science, vol. 13244, pp. 314–331. Springer, Berlin (2022). https://doi.org/10.1007/978-3-030-99527-0_17
Nolte, G., Schlüter, M., Murtovi, A., Steffen, B.: The power of Typed Affine Decision Structures: a case study. Int. J. Softw. Tools Technol. Transf. (2023, in this issue). https://doi.org/10.1007/s10009-023-00701-6
Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)
Ranzato, F., Zanella, M.: Abstract interpretation of decision tree ensemble classifiers. In: The Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020, pp. 5478–5486. AAAI Press, New York (2020). https://ojs.aaai.org/index.php/AAAI/article/view/5998
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”: explaining the predictions of any classifier. In: Krishnapuram, B., Shah, M., Smola, A.J., Aggarwal, C.C., Shen, D., Rastogi, R. (eds.) Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, August 13-17, 2016, pp. 1135–1144. ACM, New York (2016). https://doi.org/10.1145/2939672.2939778
Sato, N., Kuruma, H., Nakagawa, Y., Ogawa, H.: Formal verification of decision-tree ensemble model and detection of its violating-input-value ranges. CoRR (2019). arXiv:1904.11753
Schlüter, M., Nolte, G., Murtovi, A., Steffen, B.: Towards rigorous understanding of Neural Networks via semantics-preserving transformations. Int. J. Softw. Tools Technol. Transf. (2023, in this issue). https://doi.org/10.1007/s10009-023-00700-7
Steffen, B., Gossen, F., Naujokat, S., Margaria, T.: Language-Driven Engineering: From General-Purpose to Purpose-Specific Languages, pp. 311–344. Springer, Cham (2019). https://doi.org/10.1007/978-3-319-91908-9_17
Törnblom, J., Nadjm-Tehrani, S.: Formal verification of random forests in safety-critical applications. In: Artho, C., Ölveczky, P.C. (eds.) Formal Techniques for Safety-Critical Systems - 6th International Workshop, FTSCS 2018, Gold Coast, Australia, November 16, 2018, Revised Selected Papers. Communications in Computer and Information Science, vol. 1008, pp. 55–71. Springer, New York (2018). https://doi.org/10.1007/978-3-030-12988-0_4
Van Assche, A., Blockeel, H.: Seeing the forest through the trees: learning a comprehensible model from an ensemble. In: Kok, J.N., Koronacki, J., Mantaras, R.L.D., Matwin, S., Mladenič, D., Skowron, A. (eds.) Machine Learning: ECML 2007, pp. 418–429. Springer, Berlin (2007)
Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools and Techniques, vol. 2 (2005)
Witten, I.H., Frank, E., Hall, M.A., Pal, C.J.: Data Mining: Practical Machine Learning Tools and Techniques, 4th edn. Morgan Kaufmann, San Francisco (2016)
Zhou, Y., Hooker, G.: Interpreting Models via Single Tree Approximation (2016)
Funding
Open Access funding enabled and organized by Projekt DEAL.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Murtovi, A., Bainczyk, A., Nolte, G. et al. Forest GUMP: a tool for verification and explanation. Int J Softw Tools Technol Transfer 25, 287–299 (2023). https://doi.org/10.1007/s10009-023-00702-5
DOI: https://doi.org/10.1007/s10009-023-00702-5