Forest GUMP: a tool for verification and explanation

In this paper, we present Forest GUMP (for Generalized, Unifying Merge Process), a tool for the verification and precise explanation of Random Forests. Besides pre/post-condition-based verification and equivalence checking, Forest GUMP also supports three concepts of explanation: the well-known model explanation and outcome explanation, as well as class characterization, i.e., the precise characterization of all samples that are equally classified. The key technology behind these results is algebraic aggregation, i.e., the transformation of a Random Forest into a semantically equivalent, concise white-box representation in terms of Algebraic Decision Diagrams (ADDs). The paper sketches the method and demonstrates the use of Forest GUMP along illustrative examples. This way, readers should acquire an intuition about the tool and how it should be used to increase their understanding not only of the considered dataset, but also of the character of Random Forests and of the ADD technology, here enriched to comprise infeasible path elimination. As Forest GUMP is publicly available, all experiments can be reproduced, modified, and complemented using any dataset that is available in the ARFF format.


Introduction
Random Forests are one of the most widely known classifiers in machine learning [3,19]. The method is easy to understand and implement, and at the same time achieves impressive classification accuracies in many applications [10]. Compared to other methods, Random Forests are fast to train and they are clearly more suitable for smaller datasets. In contrast to a single decision tree, Random Forests, a collection of many trees, do not overfit as easily on a dataset, and their variance decreases with their size. On the other hand, Random Forests are considered black-box models because of their highly parallel nature: following the execution of Random Forests means, in particular, following the execution in all the involved trees. Such black-box executions are hard to explain to a human user even for very small examples.
In contrast, individual decision trees are considered white-box models because of their sequential evaluation nature. Even if a tree is large in size, a human can easily follow its computation step by step by evaluating (simple) decisions at each node from the root to a leaf. Indeed, the set of decisions along such an execution path precisely explains why a certain choice has been taken.
Popular methods towards explainability try to establish some user intuition. For example, they may hint at the most influential input data, like highlighting or framing the area of a picture where a face has been identified [24,25]. Such information is very helpful, in particular to reveal some of the "popular" drastic mismatches incurred by neural networks: if the framed area of the image does not contain the "tagged" object, the identification is clearly questionable [31]. However, even in a correct classification, the tag by itself gives no reason why the identification is indeed correct.
More ambitious are methods that try to turn black-box models into white-box models, ideally preserving the semantics of the classification function. For Random Forests this has been achieved for the first time in [12,14] using the 'aggregating power' of Algebraic Decision Diagrams (ADDs) and Binary Decision Diagrams (BDDs). ADDs are essentially decision trees whose leaves are labeled with elements of some algebra, whereas BDDs are the special case for the algebra of Boolean values. Lifting the algebraic operations from the leaves to the entire ADDs/BDDs allows one to aggregate entire Random Forests into single semantically equivalent ADDs, the precondition for solving three explainability problems:
- The Model Explanation Problem [17], i.e. the problem of making the model as a whole interpretable, is solved in terms of an ADD that specifies precisely the same classification function as the original Random Forest (cf. Sect. 8.2).
- The Class Characterization Problem [12,14], i.e. the problem of characterizing, for a given class c, the set of all samples that are classified by the Random Forest as c. This problem is solved in terms of a BDD which precisely characterizes this set of samples (cf. Sect. 8.3).
- The Outcome Explanation Problem [17], i.e. the problem of explaining a concrete classification, is solved in terms of a minimal conjunction of (negated) decisions that are sufficient to guide the sample into the considered class (cf. Sect. 8.4).
This paper is an extended version of [27], where Forest GUMP (for Generalized, Unifying Merge Process) was presented as a tool for providing a tangible experience with the three described concepts of (precise) explanation. Novel concepts in this paper are the verification of pre/post-conditions and equivalence checking. Technically, both heavily rely on the complete elimination of infeasible paths, which is therefore treated in detail in this paper. We treat verification as a special case of model explanation:
- For pre/post-condition-based verification we simply project the model explanation of the considered ADD onto the part that is consistent with the precondition and check whether all remaining leaves satisfy the postcondition. The required projection can efficiently be implemented as part of infeasibility elimination.
- Equivalence checking is done by introducing an equality relation on ADDs that produces a BDD whose true leaves characterize all paths where the classifications agree, while the false leaves characterize all cases where the classifications disagree. The required equality relation is, in fact, simply the lifted version of the ordinary equality relation on the set of classes.
As Forest GUMP is publicly available, all experiments can be reproduced, modified, and complemented using any dataset that is available in the ARFF format [38].
Our implementation relies on the standard Random Forest implementation in Weka [39] and on the ADD implementation of the ADD-Lib [13,16,35]. Whereas the verification part in this paper is entirely new, the reader may find a more detailed description of the transformations, the three concepts of explanation, and a quantitative analysis in [12,14,15].

Related work
In recent years, several approaches for verifying decision tree ensembles, including Random Forests, have been developed. They mostly propose solutions to robustness queries, such as whether it is possible to change the output of a classifier for a given input x by slightly modifying the input, or finding the nearest instance x′ such that the classifier's prediction is changed. These methods work by, e.g., encoding the problem as a mixed integer linear program [21], as a max-clique search problem in k-partite graphs [5], or as an SMT formula [9,33]. There are also approaches based on abstract interpretation [30,36]. To our knowledge, we are the first to propose verification methods for Random Forests based on Algebraic Decision Diagrams.
There exist various methods for improving the understandability of Random Forests, such as extracting decision rules from the considered black-box model [7], methods that are agnostic to the black-box model under consideration [23,32], or deriving a single decision tree from the black-box model [6,8,18,37,40]. In this context, single decision trees are considered key to a solution of both the model explanation and the outcome explanation problem. State-of-the-art solutions that derive a single decision tree from a Random Forest are approximative [6,8,18,37,40]. Thus, their derived explanations are not fully faithful to the original semantics of the considered Random Forest. This is in contrast to our ADD-based aggregation, which precisely reflects the semantics of the original Random Forest.
After a short introduction to Random Forests in Sect. 2, we present our approach to their aggregation in Sect. 3, followed by a detailed discussion concerning the elimination of all redundant predicates (which is essential for verification) in Sect. 4. Subsequently, we present a powerful, but non-compositional abstraction in Sect. 5, before we describe both pre/post-condition-based verification and equivalence checking in Sect. 6, Forest GUMP in Sect. 7, and comprehensive use cases of Forest GUMP in Sect. 8. The paper closes with conclusions and directions for future work in Sect. 9.

Random Forests
Learning Random Forests is a quite popular and algorithmically relatively simple classification technique that yields good results for many real-world applications. Its decision model generalizes a training dataset that holds examples of input data labeled with the desired output, also called the class.
More concretely, a Random Forest is a collection of decision trees that are typically themselves classifiers, each learned from a random sample of a given training dataset. Figure 1 shows a Random Forest with three trees. In practice, this number is typically one or two orders of magnitude higher. Section 8, e.g., discusses an example with 20 decision trees.

Fig. 1 Random Forest learned from the Iris dataset [11] (39 nodes)
The point of Random Forests is to turn classification into a 'democratic' process which has much better statistical properties than approaches based on individual trees. This process proceeds in two steps:
- First, each decision tree of the forest is evaluated individually for the considered input data by tracing from its root down to one of the leaves, which yields one decision per tree, i.e. the predicted class.
- Second, the individual decisions are aggregated via a majority vote which selects the most frequently chosen class.
The key advantage of this approach, compared to single decision trees, is the reduced variance. A detailed introduction to Random Forests, decision trees, and their learning procedures can be found in [3,19,29].
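The two-step process can be sketched in a few lines of Python. The tree encoding and the threshold predicates below are illustrative assumptions, not the representation used by Forest GUMP or Weka:

```python
from collections import Counter

# A decision tree node is either a leaf (a class label) or a 4-tuple
# (feature, threshold, then_subtree, else_subtree), where the test
# reads as feature < threshold. This encoding is hypothetical.
def classify_tree(tree, sample):
    """Step 1: trace a single decision tree from root to leaf."""
    while isinstance(tree, tuple):
        feature, threshold, then_t, else_t = tree
        tree = then_t if sample[feature] < threshold else else_t
    return tree

def classify_forest(forest, sample):
    """Step 2: collect one vote per tree and take the majority."""
    votes = Counter(classify_tree(t, sample) for t in forest)
    return votes.most_common(1)[0][0]

# A toy forest of three trees over Iris-style predicates:
tree1 = ("petallength", 2.45, "setosa",
         ("petalwidth", 1.65, "versicolor", "virginica"))
tree2 = ("petalwidth", 0.75, "setosa",
         ("petallength", 4.85, "versicolor", "virginica"))
tree3 = ("petallength", 2.45, "setosa", "versicolor")

forest = [tree1, tree2, tree3]
sample = {"petallength": 5.0, "petalwidth": 1.8}
print(classify_forest(forest, sample))  # 'virginica' (two of three votes)
```

Note that every classification traverses all trees; the aggregation described in the following sections removes exactly this per-query overhead.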
In this paper, we use Weka [39] as our reference implementation of Random Forests.However, our approach does not depend on implementation details and can be easily adapted to other implementations.
Figure 1 shows a small Random Forest that was learned from the popular Iris dataset [11]. The dataset lists the dimensions of Iris flowers' sepals and petals for three different species. Using this forest to decide the species on the basis of given measurements requires first evaluating the three trees individually and subsequently determining the majority vote. This effort clearly grows linearly with the size of the forest. In the following we use this example to illustrate our approach of forest aggregation for explainability.
The key idea behind our approach is to partially evaluate Random Forests at construction time, which, in particular, eliminates redundancies between the individual trees of a Random Forest. E.g., in our accompanying Iris flower example (cf. Fig. 1) the predicate petalwidth < 1.65 is used in all three trees. This can easily lead to cases where the same predicate is evaluated many times in the classification process. The partial evaluation proposed in this paper transforms Random Forests into decision structures where such redundancies are totally eliminated.
An adequate data structure to achieve this goal for binary decisions is the Binary Decision Diagram (BDD) [1,4,22]: for a given predicate ordering, BDDs constitute a normal form where each predicate is evaluated at most once, and only if required to determine the final outcome.
Algebraic Decision Diagrams (ADDs) [2] generalize BDDs to capture functions of the type B^P → C, which is exactly what we need to specify the semantics of Random Forests for a classification domain C over a set of predicates P. Moreover, in analogy to BDDs, which inherit the algebraic structure of their co-domain B, ADDs also inherit the algebraic structure of their co-domains, if available.
We exploit this property during the partial evaluation of Random Forests by considering the class vector co-domain (cf. Sect. 3). The aggregation to achieve the corresponding optimized decision structures is then a straightforward consequence of the used ADD technology.

Class vector aggregation
Class vectors faithfully represent the information about how many trees of the original Random Forest voted for a certain outcome. Obviously, this information is sufficient to obtain the precise results of a corresponding majority vote. Formally, the domain of class vectors forms a monoid where addition + is defined component-wise and 0 is the neutral element. With the compositionality of the algebraic structure V and the corresponding ADDs D_V, we can transform any Random Forest incrementally into a semantically equivalent ADD. Starting with the empty Random Forest, i.e. the neutral element 0, we consider one tree after the other, aggregating a growing sequence of decision trees until the entire forest is entailed in the new decision diagram. The details of this transformation are described in [12]. Figure 2 shows the result of this transformation for our running example.
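At the leaf level, the class vector monoid can be sketched as follows. The class tuple and the three-tree vote are illustrative; in Forest GUMP the addition is lifted from the leaves to entire ADDs by the ADD library, which this sketch does not show:

```python
# Class vectors over a fixed class tuple C: each tree contributes a unit
# vector for its predicted class; aggregation is component-wise addition,
# with the zero vector as the neutral element of the monoid.
CLASSES = ("setosa", "versicolor", "virginica")
ZERO = (0,) * len(CLASSES)

def unit(cls):
    """The class vector of a single tree voting for cls."""
    return tuple(int(c == cls) for c in CLASSES)

def add(v, w):
    """Component-wise monoid addition of two class vectors."""
    return tuple(a + b for a, b in zip(v, w))

# Incremental aggregation, starting from the neutral element:
votes = [unit("virginica"), unit("virginica"), unit("versicolor")]
class_vector = ZERO
for v in votes:
    class_vector = add(class_vector, v)
print(class_vector)  # (0, 1, 2): one vote for versicolor, two for virginica
```

Because + is associative and 0 is neutral, the trees can be folded in one at a time in any order without changing the resulting class vectors.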

Infeasible path elimination
The trees of a Random Forest all use varying sets of predicates, which matters when aggregating them. In contrast to simple Boolean variables, predicates are not independent of one another, i.e. the evaluation of one predicate may yield some degree of knowledge about the outcome of other predicates. E.g., the predicate petallength < 2.45 induces knowledge about other predicates that reason about petallength: when the petal length is smaller than 2.45, it cannot possibly be greater than or equal to 2.7 at the same time. This is not taken care of by the symbolic treatment of predicates we have followed until now. In fact, predicates are typically considered independent in the ADD/BDD community.
Infeasible path elimination, as illustrated by the difference between Fig. 2 and Fig. 3 for our running example, leverages the potential of a semantic treatment of predicates with significant effect on the size of the resulting ADDs.In fact, the experiments with thousands of trees reported in [12] would not have been successful without infeasible path elimination.
While [12] already briefly describes the functionality of the infeasible path reduction, we will here specify the way the elimination works by means of a recursive function λ : D_A → D_A, where D_A is the set of ADDs for an algebra A. Let Π(P) be the set of all possible path conditions, i.e. the set of all conjunctions over, possibly negated, predicates of P. We first define a helper function λ′ : D_A × Π(P) → D_A. Here, the path condition pc is the conjunction of predicates seen along the path from the root to a node. Terminal nodes are returned unchanged. If we are at an inner node (p, t, u), there are three different cases:
1. If the path condition pc implies the predicate p, then the node (p, t, u) is redundant, as the predicate p is always true and we know that we will always follow the then-successor t in this case. Thus, λ′(t, pc) is returned.
2. The second case, where the path condition pc implies ¬p, can be handled analogously: λ′(u, pc) is returned.
3. If the path condition pc implies neither p nor ¬p, we do not know whether the predicate p holds and the node is not redundant. When we make the recursive calls to the then- and else-successors, we add p, respectively ¬p, to the path condition.
We can then define λ as λ(d) = λ′(d, true), i.e. the elimination starts at the root with the trivially true path condition. Please note that infeasible path elimination
- is only required after aggregation: the trees in the original Random Forest have no infeasible paths by construction. They are introduced in the course of our symbolic aggregation, which is insensitive to semantic properties.
- is compositional and can therefore be applied during the step-wise transformation, before the final most frequent label abstraction (cf. Sect. 5), and at the very end.
- does not support normal forms: whereas class vector abstraction is canonical for a given variable ordering, infeasible path elimination is not! Thus our approach may yield different decision diagrams depending on the order of tree aggregation. It is guaranteed, however, that the resulting decision diagrams are minimal.
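The three cases of λ′ can be sketched as follows, under the simplifying assumption that every predicate is a threshold test feature < bound, so that a path condition reduces to one interval per feature and the implication checks become interval comparisons. The paper's general setting uses an LP solver for these checks instead; the node encoding is hypothetical:

```python
import math

def eliminate(node, pc=None):
    """Sketch of the recursive elimination: remove tests that are
    redundant under the path condition pc, kept as {feature: (lo, hi)}.
    Inner nodes are ((feature, bound), then, else) tuples, with the
    predicate read as 'feature < bound'; anything else is a terminal."""
    pc = pc or {}
    if not isinstance(node, tuple):
        return node
    (feature, bound), then_t, else_t = node
    lo, hi = pc.get(feature, (-math.inf, math.inf))
    if hi <= bound:                  # case 1: pc implies the predicate
        return eliminate(then_t, pc)
    if lo >= bound:                  # case 2: pc implies its negation
        return eliminate(else_t, pc)
    # case 3: undetermined -- keep the node, strengthen pc per branch
    return ((feature, bound),
            eliminate(then_t, {**pc, feature: (lo, min(hi, bound))}),
            eliminate(else_t, {**pc, feature: (max(lo, bound), hi)}))

# On the then-branch of petallength < 2.45, the test petallength < 2.7
# is redundant and its else-branch is infeasible:
diagram = (("petallength", 2.45),
           (("petallength", 2.7), "setosa", "unreachable"),
           "virginica")
print(eliminate(diagram))  # (('petallength', 2.45), 'setosa', 'virginica')
```

Calling `eliminate(d)` with the default empty path condition corresponds to λ(d) = λ′(d, true).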
Infeasible path elimination is a hard problem in general (for the cases considered here it is polynomial, but there are of course theories for which it becomes exponentially hard or even undecidable). For the theory of linear arithmetic considered in this paper, however, LP-solvers are powerful enough to identify infeasible paths in polynomial time.
Class vector aggregation and infeasible path elimination are both compositional and can therefore be applied in arbitrary order without changing the semantics.The majority vote at compile time described in the next section is not compositional and must therefore be applied at the very end.

Majority vote at compile time
As mentioned above, maintaining the information about the result of the majority votes is not compositional. In fact, knowing the result of the majority votes for two Random Forests gives no guarantees about the majority vote of the combined forest. Thus the majority vote abstraction can only be applied at the very end, after the entire aggregation has been computed compositionally.
The result of the compositional aggregation process, including infeasible path elimination, is a decision diagram d ∈ D_V with class vectors in its terminal nodes. The majority vote abstraction Δ_C : D_V → D_C can now be defined as the lifted version of the majority vote abstraction δ_C on class vectors v ∈ N^|C| (cf. [12]), which maps each class vector to a most frequent class: δ_C(v) = arg max_{c ∈ C} v(c). Note that δ_C does not project into the same carrier set but rather from one algebraic structure V into another, C. However, these transformations can be applied to the corresponding decision diagrams in the very same way. Figure 4 shows the result of the most-frequent-class abstraction for our running example.
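At the level of a single terminal node, the abstraction δ_C can be sketched as an argmax over the class vector. The class tuple is illustrative, and the tie-breaking here simply picks the first maximal class, which need not match Weka's behavior:

```python
CLASSES = ("setosa", "versicolor", "virginica")

def delta_C(v):
    """Majority vote abstraction on one class vector v in N^|C|:
    return the class with the maximal vote count."""
    return CLASSES[max(range(len(v)), key=lambda i: v[i])]

print(delta_C((0, 1, 2)))  # 'virginica'
print(delta_C((3, 0, 0)))  # 'setosa'
```

Lifted to Δ_C, this function is applied leaf-wise to the diagram d; the ADD reduction then merges leaves that have become equal, which is what shrinks the diagram.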

Random Forest verification
In this section, we present solutions to two verification problems, pre/post-condition-based verification and equivalence checking, as direct applications of model explanation [12].

Pre/post-condition-based verification
Given a precondition φ, a postcondition ψ, and an ADD, we can either verify that for all inputs satisfying φ the ADD's output satisfies ψ, or we can provide a counterexample showing that the specification does not hold.
We can perform this procedure using the infeasible path elimination described in Sect. 4. One can easily incorporate the precondition into the infeasible path reduction by calling λ′ with φ instead of true, i.e. λ_φ(d) = λ′(d, φ). By applying the infeasible path elimination with φ as the precondition we obtain the ADD in Fig. 13. If we apply the function Δ_post(C) to this ADD, the resulting ADD consists only of the true node, as all classes in Fig. 13 are contained in the postcondition C. Therefore one can conclude that, given the precondition φ, the ADD satisfies the postcondition C.
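Under the same interval-based simplification of path conditions as in Sect. 4 (threshold predicates only; the encoding below is a hypothetical illustration, not Forest GUMP's API), the verification procedure can be sketched as follows:

```python
import math

def eliminate(node, pc):
    """Project a diagram onto the path condition pc (feature -> (lo, hi)),
    assuming threshold predicates (feature, bound) read as feature < bound."""
    if not isinstance(node, tuple):
        return node
    (feature, bound), then_t, else_t = node
    lo, hi = pc.get(feature, (-math.inf, math.inf))
    if hi <= bound:                       # pc implies the predicate
        return eliminate(then_t, pc)
    if lo >= bound:                       # pc implies its negation
        return eliminate(else_t, pc)
    return ((feature, bound),
            eliminate(then_t, {**pc, feature: (lo, min(hi, bound))}),
            eliminate(else_t, {**pc, feature: (max(lo, bound), hi)}))

def leaves(node):
    """Enumerate all terminal nodes of a diagram."""
    if not isinstance(node, tuple):
        yield node
    else:
        yield from leaves(node[1])
        yield from leaves(node[2])

def verify(diagram, phi, psi):
    """Does every input satisfying phi get classified into psi?
    phi: interval precondition per feature; psi: set of allowed classes."""
    return all(leaf in psi for leaf in leaves(eliminate(diagram, dict(phi))))

diagram = (("petallength", 2.45), "setosa", "virginica")
phi = {"petallength": (-math.inf, 2.0)}   # precondition: petallength < 2.0
print(verify(diagram, phi, {"setosa"}))   # True: only 'setosa' remains
```

If `verify` returns False, any path of the projected diagram ending in a leaf outside ψ yields a concrete counterexample region.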

Equivalence checking
In this section we provide a method for checking the equivalence of ADDs. As noted in [12], applying the infeasible path reduction does not preserve canonicity. Thus, we cannot check the semantic equivalence of two ADDs by checking whether they are structurally identical. But, similar to the class characterization, we can create an ADD that characterizes all paths on which two ADDs agree respectively disagree. We define the function δ_eq : C² → B, which maps a pair of classes to true if and only if they are equal. The function δ_eq can be lifted to operate on ADDs, yielding Δ_eq : D_C² → D_B. The application of Δ_eq produces a BDD which precisely characterizes the differences between two ADDs.

Fig. 5 The model explanation for a Random Forest with two decision trees
If we remove all infeasible paths of this BDD using the infeasible path elimination from Sect. 4 and the BDD consists of a single leaf, the true node, the ADDs are equivalent. Otherwise, we can follow a path from the root to the false leaf to construct a concrete input for which the two ADDs disagree.
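The lift Δ_eq can be sketched as a standard apply-style recursion over two diagrams with a shared predicate order. Node sharing and reduction, which a real ADD library performs, are omitted, and the tuple encoding of nodes and predicates is an illustrative assumption:

```python
def apply_eq(a, b):
    """Sketch of Delta_eq: combine two diagrams into a Boolean diagram
    whose leaves record whether the classifications agree. Inner nodes
    are (pred, then, else); predicates compare by their natural order."""
    a_leaf, b_leaf = not isinstance(a, tuple), not isinstance(b, tuple)
    if a_leaf and b_leaf:
        return a == b                              # delta_eq on the leaves
    if a_leaf or (not b_leaf and b[0] < a[0]):     # expand b's predicate
        return (b[0], apply_eq(a, b[1]), apply_eq(a, b[2]))
    if b_leaf or a[0] < b[0]:                      # expand a's predicate
        return (a[0], apply_eq(a[1], b), apply_eq(a[2], b))
    return (a[0], apply_eq(a[1], b[1]), apply_eq(a[2], b[2]))

d1 = (("petallength", 2.45), "setosa", "virginica")
d2 = (("petallength", 2.45), "setosa", "versicolor")
print(apply_eq(d1, d2))  # (('petallength', 2.45), True, False)
```

Every false leaf of the result marks a region of the input space on which the two classifiers disagree; after infeasibility elimination, a result consisting of the single leaf true witnesses equivalence.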
For a concrete example, consider Fig. 5, showing the model explanation for a Random Forest consisting of two decision trees, and Fig. 6, which results from extending this ADD by adding the effect of a third decision tree. The infeasible path elimination for the ADD in Fig. 6 has been applied after each aggregation step. More concretely, after the aggregation of the first two ADDs, corresponding to the first two decision trees, the infeasible path elimination is applied, and the result is then aggregated with the third ADD, corresponding to the third decision tree, where the infeasible paths are again eliminated. After infeasibility elimination, the equivalence ADD of these two ADDs precisely identifies the impact of the third decision tree (cf. Fig. 7): all paths that end in the true leaf are paths that the two Random Forests agree on and, correspondingly, all paths that end in the false leaf are paths the Random Forests disagree on.

Fig. 6 The model explanation for a Random Forest with three decision trees

Fig. 7 The equivalence ADD for the ADDs of Figs. 5 and 6 after infeasibility elimination

Forest GUMP: structural overview
Forest GUMP (Generalized Unifying Merge Process) is a tool we developed to illustrate the power of algebraic aggregation for the optimization, verification, and explanation of Random Forests. It is designed to allow everyone, in particular people without IT or machine learning knowledge, to experience the nature of Random Forests. To avoid unnecessary entry hurdles, we decided to implement Forest GUMP as a simple-to-use web application. It allows the user to experience the methods described in the previous sections and the proposed solutions to verification and explainability problems, which will be illustrated in the following section.
Forest GUMP's user interface (see Fig. 8) is essentially divided into two parts. On the left side, the user can input the necessary data to learn a Random Forest and subsequently visualize it, while the currently chosen representation is visualized on the right side. First, the user has to upload a dataset, or choose one of six datasets that we provide, on which the Random Forest will be learned (cf. (1) in Fig. 8). Next, the hyperparameters necessary for the learning procedure have to be selected, such as the number of trees to be learned (cf. (2) in Fig. 8). Then, one can choose different aggregation methods, i.e. the ones mentioned in the previous sections and further ones which will be explained in the following sections (cf. (3) in Fig. 8). Further, there is a checkbox to enable the elimination of infeasible paths for the selected aggregation method and another checkbox to specify when elimination takes place, i.e. stepwise or after merging all ADDs. It is also possible to input a sample, classify it with the ADD, and highlight the path from the root to the leaf (satisfied predicates are highlighted in green, unsatisfied predicates in red). Moreover, the area provides two further buttons that allow users to check the equivalence between two ADDs and offer them pre/post-condition-based verification. Finally, the currently visualized ADD can be exported as an executable classifier, since Forest GUMP provides code generators for Java, C++, Python, and GraphViz's dot format (cf. (4) in Fig. 8). Additionally, the currently visualized ADD can be exported as an SVG to be viewed locally (cf. (4) in Fig. 8).
The grey rectangle (cf. (5) in Fig. 8) points to the root of the currently visualized ADD. One can zoom in and out, which can be helpful when the ADDs are rather large (cf. (6) in Fig. 8). On the top left, the number of nodes and the length of the currently highlighted path are displayed (cf. (7) in Fig. 8). On the bottom right, one can open a history of all the representations one chose to visualize (cf. (8) in Fig. 8). Figure 9 shows the expanded execution history. For each visualized ADD, the execution history lists the aggregation variant, the hyperparameters used to learn the Random Forest, the size (i.e. the number of nodes), and the maximum depth, which is the longest path from root to leaf. The execution history also allows one to replay an experiment by clicking on the button on the right side of a row, which makes it possible to compare different ADD variants. One can also delete individual entries or the whole history and export the history to a CSV.

Fig. 9 The execution history in Forest GUMP: the user can re-execute previous setups and export the history as CSV

Forest GUMP in action
In the following, we will see how complicated it is to understand how a Random Forest comes to its decision, and how the aggregated representations provided by Forest GUMP make this task manageable.

Learning a Random Forest
To begin, we need a Random Forest, which requires a dataset on which it will be learned. In Forest GUMP, the user can upload their own dataset in the Attribute-Relation File Format (ARFF) [38]. Alternatively, we provide six exemplary datasets from which a user can select one to directly start using the tool. Figure 10 illustrates what this looks like in Forest GUMP.
Having chosen a dataset, the hyperparameters necessary for the learning procedure of the Random Forest have to be specified next (see Fig. 11). The inputs are the following:
- the number of trees to be learned,
- the bagging size, i.e. the fraction of samples to be used to learn each tree, and
- a seed to be able to reproduce the setting.
Additionally, the user can decide to eliminate the infeasible paths, as this can strongly reduce the size of the ADDs (see Sect. 4). While the predicate order is fixed by default, the user can decide to let Forest GUMP optimize the predicate order, as the order can also greatly impact the size of the ADDs. A more in-depth discussion of the interplay between the infeasible path elimination and the predicate order will follow.
Figure 11 shows a Random Forest that was learned on the Iris dataset, consisting of 20 trees, with a bagging size of 100% and 58 as the seed.
If we now want to classify a given input, we would have to traverse each tree from its root to a leaf and receive one predicted class per tree. The class that is predicted most often is the final result. Trying to understand why the Random Forest predicted this specific class is seemingly impossible. In the following we will show how we can do better.

Model explanation
A concise white-box model corresponding to the Random Forest of Fig. 11 can be constructed through the most frequent label abstraction (see Sect. 5) of the aggregated Random Forest (see Sect. 3), whose infeasible paths are eliminated (see Sect. 4). This solves the Model Explanation Problem.
Figure 12 sketches the result of this construction: a white-box model with 310 nodes. Admittedly, this model is still frightening, but given a sample, it allows one to easily follow the corresponding classification process, which in this case requires at most 19 individual decisions based on the petal and sepal characteristics. This decision set is our set of predicates. The conjunction of these predicates is a solution to the Outcome Explanation Problem. However, more concise explanations are derived from the class characterization BDD discussed in the following section.
Given the sample petallength = 2.4, petalwidth = 1.8, sepallength = 5.9, sepalwidth = 2.5, the outcome explanation given by the model explanation consists of 9 predicates (in Fig. 12, satisfied predicates are highlighted in green, unsatisfied predicates in red). While this is already an improvement compared to the Random Forest, where one would have to traverse all 20 decision trees, we will see how we can improve even more in the following.

Fig. 12 An extract of the model explanation. The ADD is constructed from the most frequent label abstraction of the aggregated Random Forest following an elimination of all infeasible paths (310 nodes, the longest path with length 19, the highlighted path has a length of 9) (Color figure online)

Class characterization
The class characterization problem is particularly interesting because it allows one to 'reverse' the classification process.The BDD shown in Fig. 8 is a minimal characterization of the set of all the samples that are guaranteed to be classified as Iris-Setosa.
Remark: being able to reverse a learned classification function is of major practical importance. Think, e.g., of a marketing research scenario where data have been collected with the aim to propose best-fitting product offers to customers according to their user profiles. This scenario can be considered a classification problem where the offered product plays the role of the class. Now, being able to reverse the customer → product classification function provides the marketing team with a tailored product → customer promotion process: for a given product, it addresses all customers considered to favor this very product, as in the corresponding patent [20].
The path highlighted in Fig. 8 is the path from the root to the leaf for the same sample petallength = 2.4, petalwidth = 1.8, sepallength = 5.9, sepalwidth = 2.5.

Outcome explanation problem
The previous classification formula expresses the collection of 'conditions' that this sample satisfies, and it therefore provides a precise justification of why it is classified in this class. Despite the fact that class characterization BDDs are concise in a global context, it is easy to see that there are some redundancies in the formula of this specific local path. For example, a petallength smaller than 2.45 is also inherently smaller than 2.6, 4.85, and 4.95; therefore, for this specific sample those three predicates are redundant. This is the result of the imposed predicate ordering in BDDs: all the BDD predicates are listed, and they are listed in a fixed order. After eliminating these redundancies, we are left with the following precise minimal outcome explanation: this sample is recognized as belonging to the class Iris-Setosa because it has the properties ¬(petalwidth < 0.75) ∧ (petallength < 2.45). In Forest GUMP we make these redundant predicates explicit by highlighting them in blue (see Fig. 8). From 9 predicates in the model explanation to 5 predicates in the class characterization, we have now arrived at an explanation that consists of only 2 predicates.
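The redundancy elimination on a single path can be sketched as keeping, per feature, only the tightest bound. The literal encoding (feature, bound, satisfied) for a test feature < bound is an illustrative assumption, not the representation used in the tool:

```python
import math

def minimize(literals):
    """Drop literals implied by a stronger literal on the same feature.
    Each literal (feature, bound, satisfied) means feature < bound holds
    (satisfied=True) or feature >= bound holds (satisfied=False)."""
    tightest = {}   # feature -> (best lower bound, best upper bound)
    for feature, bound, satisfied in literals:
        lo, hi = tightest.get(feature, (-math.inf, math.inf))
        if satisfied:                 # feature < bound: tighten upper bound
            hi = min(hi, bound)
        else:                         # feature >= bound: tighten lower bound
            lo = max(lo, bound)
        tightest[feature] = (lo, hi)
    # Keep only the literals that realize the tightest bound per feature.
    return [(f, b, s) for f, b, s in literals
            if (s and b == tightest[f][1]) or (not s and b == tightest[f][0])]

# The Iris-Setosa path: petallength < 2.45 subsumes < 2.6, < 4.85, < 4.95.
path = [("petalwidth", 0.75, False), ("petallength", 2.45, True),
        ("petallength", 2.6, True), ("petallength", 4.85, True),
        ("petallength", 4.95, True)]
print(minimize(path))
# [('petalwidth', 0.75, False), ('petallength', 2.45, True)]
```

The result corresponds exactly to the minimal explanation ¬(petalwidth < 0.75) ∧ (petallength < 2.45) derived above.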

Verification
When looking at a certain classification result, one question is central: how stable is this result, i.e., would the classification be the same if the input values were marginally changed? Pre/post-condition-based verification is an ideal tool to answer this question, and our model explanation-based approach is particularly well-suited: the model explanation projected onto a precondition that expresses the desired stability interval directly specifies all adversarial examples. This is the optimal information for judging whether the classification is critical. In the example ADD shown in Fig. 13 one immediately sees, e.g., that Iris-setosa is the right class unless the petallength is greater than 2.45.
Considering equivalence checking: in Sect. 6.2, we saw a model explanation characterizing the semantic difference between two given ADDs. In order to have an example of a successful equivalence check, let us consider the ADD in Fig. 14. This ADD corresponds to the same Random Forest as the one shown in Fig. 6. The only difference is that, this time, the infeasible path elimination has been applied after the aggregation of all three decision trees. This is in contrast to the ADD in Fig. 6, where the infeasibility elimination has been applied iteratively, i.e. after merging the first two ADDs corresponding to the first two decision trees and again after merging the result with the third one.

Fig. 14 The model explanation for the same Random Forest as in Fig. 6, where infeasible paths are eliminated only after the aggregation of all three decision trees
As the infeasible path elimination is semantics-preserving, the resulting ADDs of Figs. 6 and 14 are semantically equivalent. On the other hand, we can easily see that, structurally, they are not equal. This illustrates our observation that infeasible path elimination is not canonical (cf. Sect. 4). We can, however, easily confirm their semantic equivalence by first constructing the equivalence ADD and subsequently eliminating all infeasible paths (cf. Sect. 6.2), which, as desired, results in an ADD consisting only of a single terminal node, true.

Conclusion and perspectives
In this paper, which is an extended version of [27], we have presented Forest GUMP, a tool for verification and precise explanation of Random Forests. Novel contributions are the verification of pre/post-conditions and equivalence checking. Both heavily rely on the complete elimination of infeasible paths, which is therefore treated in detail in this paper. Technically, verification is treated as a special case of model explanation:
- Pre/post-condition-based verification is reduced to model explanation projected onto the part that is consistent with the precondition, combined with a check whether all remaining leaves satisfy the postcondition.
- Equivalence checking is done by introducing an equality relation on ADDs that produces a BDD whose true leaves characterize all paths where the classifications agree, while the false leaves characterize all cases where the classifications disagree.
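The first reduction can be sketched as follows (a hypothetical illustration with threshold predicates and made-up feature names, not the tool's actual code): branches that are infeasible under the precondition are skipped, and verification succeeds iff every remaining leaf satisfies the postcondition.

```python
# Illustrative sketch: nodes are (feature, threshold, low, high), read as
# "feature <= threshold ? low : high"; leaves carry class labels. The
# precondition is a box of feature intervals, the postcondition a set of
# admissible classes. Feature names and thresholds below are invented.
INF = float('inf')

def verify(node, box, post):
    """True iff every leaf reachable under the precondition 'box'
    carries a class contained in 'post'."""
    if not isinstance(node, tuple):
        return node in post                 # leaf: check the postcondition
    f, thr, low, high = node
    lo_b, hi_b = box.get(f, (-INF, INF))
    ok = True
    if lo_b <= thr:                         # branch 'f <= thr' is feasible
        ok = ok and verify(low, {**box, f: (lo_b, min(hi_b, thr))}, post)
    if hi_b > thr:                          # branch 'f > thr' is feasible
        ok = ok and verify(high, {**box, f: (max(lo_b, thr), hi_b)}, post)
    return ok

tree = ('petallength', 2.45, 'Iris-setosa',
        ('petalwidth', 1.75, 'Iris-versicolor', 'Iris-virginica'))
# Under the precondition petallength <= 2.0 every sample is Iris-setosa:
print(verify(tree, {'petallength': (-INF, 2.0)}, {'Iris-setosa'}))  # -> True
# Without a precondition the postcondition is violated:
print(verify(tree, {}, {'Iris-setosa'}))                            # -> False
```

Note how the interval bookkeeping doubles as infeasible path elimination relative to the precondition: the branch for petallength > 2.45 is never visited, so its leaves need not satisfy the postcondition.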
Besides pre/post-condition-based verification and equivalence checking, Forest GUMP also supports three concepts of explanation, the well-known model explanation and outcome explanation, as well as class characterization, i.e., the precise characterization of all samples that are equally classified.
Forest GUMP is designed to provide even non-technical people with a tangible experience of the validation of machine-learned models and its limitations. As Forest GUMP is publicly available, all experiments can be reproduced, modified, and complemented using any dataset that is available in the ARFF format [38].
Playing with Forest GUMP led to interesting observations about Random Forest learning: changing the random seed for the learning process had a significant impact on the size of the explanation models and the class characterizations. The observed sizes of the explanation models ranged from 138 to 519. Interestingly, the larger sizes did not necessarily imply a better prediction quality. The same also applied to the class characterizations: in fact, we observed a 100% prediction quality for a class characterization of only 3 nodes, while a class characterization for the same species with 40 nodes only scored 33%.
The impact of infeasible path elimination is enormous. As reported in [12], forests with 10,000 trees can be handled when unsatisfiable paths are eliminated; otherwise, treating 100 trees is already quite problematic. Our newest experience with a similar approach for Neural Networks [28,34] confirms the importance of infeasible path elimination.
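The core of infeasible path elimination can be sketched in the same toy encoding as above (again a hypothetical illustration, not the tool's implementation): tests whose outcome is already fixed by the thresholds accumulated along the path are collapsed, which is what shrinks aggregated diagrams so drastically.

```python
# Illustrative only: nodes are (feature, threshold, low, high), read as
# "feature <= threshold ? low : high"; leaves carry class labels.
INF = float('inf')

def eliminate_infeasible(node, box=None):
    """Collapse tests whose outcome is implied by the path taken so far."""
    if box is None:
        box = {}
    if not isinstance(node, tuple):
        return node
    f, thr, low, high = node
    lo_b, hi_b = box.get(f, (-INF, INF))
    if hi_b <= thr:                  # 'f <= thr' already holds on this path
        return eliminate_infeasible(low, box)
    if lo_b >= thr:                  # 'f > thr' already holds on this path
        return eliminate_infeasible(high, box)
    new_low = eliminate_infeasible(low, {**box, f: (lo_b, min(hi_b, thr))})
    new_high = eliminate_infeasible(high, {**box, f: (max(lo_b, thr), hi_b)})
    return new_low if new_low == new_high else (f, thr, new_low, new_high)

# The inner test on x in the high branch is infeasible: x > 5.0 implies x > 3.0.
d = ('x', 5.0, ('x', 3.0, 'A', 'B'), ('x', 3.0, 'C', 'D'))
print(eliminate_infeasible(d))   # -> ('x', 5.0, ('x', 3.0, 'A', 'B'), 'D')
```

Aggregating many trees over the same features produces exactly such repeated, mutually constraining tests, which explains why the effect grows with the size of the forest.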
Of course, these are first steps in a very ambitious new direction, and it has to be seen how far the approach carries. Scalability will probably require decomposition methods, perhaps in a similar fashion as illustrated by the difference between model explanation and the considerably smaller class characterization. More work is also needed on techniques that aim at dealing with large numbers of predicates.
Funding Note Open Access funding enabled and organized by Projekt DEAL.

Data Availability
The artifact is available in the Zenodo repository [26].
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 2 Class vector aggregation of the Random Forest (83 nodes)

As we are concerned with classification and the output of a Random Forest is a class c ∈ C, postconditions can be specified as subsets C′ ⊆ C, e.g., C′ can be a subset of "good" classes. We can then define δ_post(C′) : C → B as δ_post(C′)(c) := 1 if c ∈ C′, and 0 otherwise. The function δ_post(C′) can be lifted to operate on ADDs, yielding Δ_post(C′) : D_C → D_B. If we apply the function Δ_post(C′) to an ADD to which the infeasible path elimination with the precondition φ has been applied and the resulting

Fig. 4 Most frequent label abstraction of the aggregated Random Forest (majority vote) without semantically redundant nodes (18 nodes)

Fig. 8 Overview of Forest GUMP. The visualized ADD is our solution to the class characterization problem (cf. Sect. 8.3) for the class Iris-setosa (10 nodes, highlighted path of length 5) (Color figure online)

Fig. 10 Users can choose to upload their own dataset or select one of six exemplary datasets

Fig. 11 A Random Forest consisting of 20 individual decision trees (191 nodes; the longest path consists of 9 nodes). Note that each decision tree is represented as an ADD and that all ADDs share

While the direct problem is 'given a sample, provide its classification', the reverse problem reads 'given a class, what are the characteristics of all the samples belonging to this class?' BDD-based class characterization can be defined via the following simple transformation function: given a class c ∈ C, we define a corresponding projection function δ_B(c) : C → B on the co-domain as δ_B(c)(c′) := 1 if c′ = c, and 0 otherwise, for c′ ∈ C. Again, the function δ_B(c) can be lifted to operate on ADDs, yielding Δ_B(c) : D_C → D_B.
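In the same toy tuple encoding used above (a hypothetical sketch, not the tool's code; the feature names and thresholds are invented), lifting δ_B(c) amounts to mapping each leaf to a Boolean and then applying the usual reduction:

```python
def characterize(node, cls):
    """Lift delta_B(cls) leaf-wise over a class-valued diagram and reduce:
    the True-paths of the result describe exactly the samples of class 'cls'.
    Nodes are (feature, threshold, low, high), read as
    'feature <= threshold ? low : high'; leaves are class labels."""
    if not isinstance(node, tuple):
        return node == cls               # delta_B(cls) applied to a leaf
    f, thr, low, high = node
    lo, hi = characterize(low, cls), characterize(high, cls)
    return lo if lo == hi else (f, thr, lo, hi)

tree = ('petallength', 2.45, 'Iris-setosa',
        ('petalwidth', 1.75, 'Iris-versicolor', 'Iris-virginica'))
print(characterize(tree, 'Iris-setosa'))
# -> ('petallength', 2.45, True, False): setosa iff petallength <= 2.45
```

The reduction step is what makes class characterizations so much smaller than the full model explanation: all subdiagrams that agree on membership in the chosen class collapse.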