Algebraically explainable controllers: decision trees and support vector machines join forces

Recently, decision trees (DT) have been used as an explainable representation of controllers (a.k.a. strategies, policies, schedulers). Although they are often very efficient and produce small and understandable controllers for discrete systems, complex continuous dynamics still pose a challenge. In particular, when the relationships between variables take more complex forms, such as polynomials, they cannot be obtained using the available DT learning procedures. In contrast, support vector machines provide a more powerful representation, capable of discovering many such relationships, but not in an explainable form. Therefore, we propose combining the two frameworks to obtain an understandable representation over richer, domain-relevant algebraic predicates. We demonstrate and evaluate the proposed method experimentally on established benchmarks.


Introduction
Safe and efficient controllers for cyber-physical systems are hard to obtain manually, in particular in the presence of both discrete behaviour and continuous aspects such as complex dynamics in space and/or time. To that end, various model checking tools also offer an automatic controller synthesis option, for instance UPPAAL Stratego [DJL+15], PRISM [KNP11], SCOTS [RZ16], or STORM [DJKV17]. In their most basic form, they use discretization to represent the continuous input space with a finite set of states. For each of those states, the synthesized controller describes which actions are allowed. So, the controller can be expressed explicitly as a lookup table, often with millions of rows.
There are two main issues with this representation. First, storing such a large table can require several hundred megabytes of storage. However, the devices on which the controller should run are often embedded chips with very limited storage capacity. This makes it infeasible to store the entire lookup table on the device. Second, the sheer size makes it impossible to understand the behavior of the controller. The safety guarantees of the controller rely on the assumption that the formal model is correct and behaves as expected. To validate this and certify the quality, understanding the controller is crucial. For example, a non-permissive controller for the emergency braking system might try to immediately stop the car. This fulfils the safety requirement, as no crash can occur; however, it is not useful in a real application. Such flaws in the model can be detected if we can represent the safe controller in a succinct and explainable way.
Running Example. To demonstrate our approach to these issues, we take a closer look at the adaptive cruise control model (in short, cruise) from [LMT15], which models a simplified emergency braking system for a car. Synthesizing a safe controller with UPPAAL Stratego gives us a file with more than six million lines, although previous work [Akm19] has shown that the safe behavior can be formulated with a handful of sentences or equations. The goal of this paper is to find such a succinct and explainable representation automatically, utilizing techniques from machine learning.

Controller Representation with Decision Trees
Recently, significant progress has been made [BCC+15,ABC+19,AKL+19,AJJ+20,AJK+21] in representing controllers using decision trees (a.k.a. nested if-then-else code). Decision trees, e.g. [Mit97], are simple in structure, making them easy to understand, but still expressive enough to represent complex controllers. The open-source tool dtControl [AJK+21] takes advantage of that and offers an automated way of generating succinct decision trees. It can read controllers from many commonly used model checkers and implements various heuristics to minimize the size of the decision tree.
Traditionally, in a decision tree, the predicate in the decision node has the form $v_i \le c$ with some variable $v_i$ and some constant $c \in \mathbb{R}$. Such a split divides the feature space with a hyperplane orthogonal to the feature axis $v_i$, hence the name axis-aligned split [Mit97]. Those splits are easy to understand and efficient to find, forming the basis of efficient decision-tree learning.
However, axis-aligned predicates are incapable of capturing more complex relationships, as seen in Figure 1a. In this toy example, 5 predicates are needed to separate the red from the blue labels. For a real-world dataset with thousands of data points, this behavior can be even more extreme. For that reason, dtControl also supports linear predicates as proposed by [MKS94]. These splits are still hyperplanes but can now have arbitrary orientations (see Figure 1b). This makes the predicates harder to find and more difficult to comprehend, as more variables are involved, but can ultimately give significantly smaller decision trees.
Extending the notion of using more complex decision predicates, the newest version of dtControl allows the use of algebraic predicates [Wei20,AJK+21] (see Figure 1c). A domain expert can provide arbitrary closed-form mathematical expressions that are then used in the decision tree construction. It is even possible to leave some constants unspecified, and dtControl will find suitable values for those.
Limitations. While dtControl has already greatly reduced the size of the controllers in many benchmarks, it still has room for improvement. Most of the implemented heuristics for continuous systems rely on clever ways of determinizing a non-deterministic controller. This means we start with a (possibly most) permissive controller, i.e. a controller that can permit several safe actions per state. Then, for every state, we dynamically select one action, making the choice in each state deterministic. This allows the resulting strategy to be represented more succinctly. However, in some instances such as the cruise model, we want to keep the permissiveness of the controller, for example to give a human driver the maximum amount of freedom.
For accurately representing the most-permissive controller for the cruise model without any determinization heuristics, dtControl needs several hundred decision nodes, and the resulting decision tree is hardly explainable. When provided with the right domain knowledge, significantly smaller decision trees can be found by using algebraic predicates [Akm19,Wei20]. However, so far, the supplied domain knowledge had to be tailored to the problem by hand.

Our Contributions
We address these limitations by proposing, demonstrating, and critically evaluating two approaches:
- First, we explore how to automate the generation of relevant algebraic predicates for decision trees (DT) using domain knowledge.
- Second, we use support vector machines (SVM) to learn the predicates directly from data, resulting in more accurate but less understandable predicates. Subsequently, we show how to improve their explainability.
- We experimentally evaluate this combination of DT and SVM learning on 28 case studies and analyze the results. In particular, we obtain an explainable DT with only 5 decision nodes for the cruise example.

Related Work
This work extends the open-source tool dtControl that was first presented in [AJJ+20], covered in detail in [Jac20], and has since been extended significantly [AJK+21]. Adapting techniques from machine learning and formal verification, we combine the insights we gain from the controller data with the domain knowledge to construct smaller and more explainable decision trees.
There have been several approaches using non-linear predicates in decision trees. [IS96] explicitly constructs new features by combining existing ones (for example, taking their product or ratio), while [BB98] explicitly uses SVMs inside the decision nodes. Both sources focus on the decision tree's ability to generalize but not on the explainability. Specifically, they do not explicitly reconstruct algebraic predicates from the SVM.
In previous versions of dtControl [Wei20,AJK+21], curve fitting [Arl94] has been used to find undetermined coefficients in algebraic splitting predicates. This approach is based on regression analysis and uses least-squares fitting [Lev44,Mar63]. In our work, however, we use the predicates to separate the data rather than to fit it. For a more detailed comparison, see Subsection C.1.
Typically, binary decision diagrams (BDDs) [Bry86] are used to represent controllers in a compressed way [RZ16,ZVJ18]. As BDDs can only represent a binary function $\{0, 1\}^n \to \{0, 1\}$, this approach requires us to encode the list of state-action pairs of the controller in binary variables. As a result, the BDDs are hardly explainable. Additionally, the size of the BDD heavily depends on the variable ordering. Finding an optimal ordering is NP-complete [BW96], and currently known heuristics struggle with high-dimensional inputs.
Algebraic decision diagrams [BFG+97] extend BDDs to support representing a function $\{0, 1\}^n \to S$ with $S \subset \mathbb{N}$. They have been used for representing controllers in, e.g., [SHB00]. However, they suffer from the same issues we discussed for BDDs.

Preliminaries
The paper is concerned with representing controllers, which we thus now define. While the specific type of dynamics of the system is irrelevant, we assume the states of the system are given by values of state variables:
Definition 1 (Controller). For a model $M$ with states $S$ and actions $A$, a controller $C : S \to 2^A$ selects for every state $s \in S$ a set of (so-called safe) actions $C(s) \subseteq A$. Moreover, we assume the set of states is $S \subseteq \prod_{i=1}^{M} \mathrm{Dom}(v_i)$, where, for $1 \le i \le M$ with $M \in \mathbb{N}$, $v_i$ is a state variable with domain $\mathrm{Dom}(v_i)$ and $\prod$ denotes the cartesian product.
Note that this definition allows for permissive controllers that can provide multiple possible actions for a state. Further, we call a controller deterministic when $|C(s)| = 1$ for all $s \in S$, meaning it chooses exactly one possible action in every state.¹

Representing Controllers by Decision Trees. Definition 2 (Decision Trees). A decision tree $T$ is defined as follows:
- $T$ is a rooted full binary tree, meaning every node either is an inner node and has exactly two children, or is a leaf node and has no children.
- Every inner node $v$ is associated with a decision predicate $\alpha_v$. A decision predicate (or just predicate) is a boolean function $S \to \{0, 1\}$ over the input $S$.
- Every leaf is associated with an output label $a \in A$.

¹ Also note that according to our definition, the controller's decision is solely based on its current state, not its past states. In practice, this limitation can often be circumvented by encoding additional information about the past into the state. For example, the decision of whether to water the plants may depend on the precipitation of the last three days. Then we can model our current state as a tuple $(p_1, p_2, p_3)$ where $p_i$ describes the precipitation $i$ days ago.
For learning a decision tree, numerous methods exist, such as CART [BFOS84], ID3 [Qui86], and C4.5 [Qui93]. In principle, they all evaluate different predicates (see Subsection 2.3) by calculating some impurity measure (see Subsection 2.2) and then greedily pick the most promising one before splitting the dataset on that predicate and recursively continuing with the two children.
A decision tree represents a function as follows: every input vector $x \in S$ is evaluated by starting at the root of $T$ and traversing the tree until we reach a leaf node. Then, the label $a$ of the leaf is our prediction for the input $x$. When traversing the tree, at each inner node $v$ we decide at which child to continue by evaluating the decision predicate $\alpha_v$ on $x$. If the predicate evaluates to true, we pick the left child; otherwise, we pick the right one.
When we represent a controller with a decision tree, our input data is the set of states and the output labels describe subsets of safe actions. Figure 2 shows a decision-tree representation of a deterministic and a permissive controller of a battery-powered temperature control system.
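The traversal just described can be sketched in a few lines of Python. This is only an illustration: the class names `Leaf` and `Inner`, the function `evaluate`, and the toy temperature tree are hypothetical and do not reflect dtControl's internal data structures.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Sequence, Union

State = Sequence[float]


@dataclass
class Leaf:
    actions: FrozenSet[str]  # output label: the set of allowed actions


@dataclass
class Inner:
    predicate: Callable[[State], bool]  # decision predicate alpha_v
    left: "Node"   # child taken when the predicate evaluates to true
    right: "Node"  # child taken when the predicate evaluates to false


Node = Union[Leaf, Inner]


def evaluate(tree: Node, x: State) -> FrozenSet[str]:
    """Walk from the root to a leaf and return the label of that leaf."""
    node = tree
    while isinstance(node, Inner):
        node = node.left if node.predicate(x) else node.right
    return node.actions


# Hypothetical toy controller over a single state variable (a temperature).
toy_tree = Inner(
    predicate=lambda x: x[0] <= 20.0,          # axis-aligned predicate
    left=Leaf(frozenset({"heat"})),            # deterministic leaf
    right=Leaf(frozenset({"heat", "off"})),    # permissive leaf
)
```

A leaf holding more than one action corresponds to a permissive controller; a tree in which every leaf holds exactly one action represents a deterministic one.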

Impurity Measures
To evaluate how promising a predicate $\alpha$ is, dtControl implements different impurity measures. The most classic impurity measure is entropy, originating from information theory. It measures how much uncertainty is left in a dataset.
If the dataset is dominated by one specific label, the entropy is low, whereas a heterogeneous dataset has a higher entropy. For a dataset $X$ with $N$ data points and the label set $B$, let $n(X, y)$ be the number of data points in $X$ with the label $y \in B$. Then the entropy $H(X)$ is defined as:
$$H(X) = -\sum_{y \in B} \frac{n(X, y)}{N} \log_2 \frac{n(X, y)}{N}$$
To evaluate a predicate $\alpha$, we calculate the remaining entropy after the split. If $\alpha$ is a binary split and partitions $X$ into $X = X_l \uplus X_r$, we define
$$H(X, \alpha) = \frac{|X_l|}{|X|} H(X_l) + \frac{|X_r|}{|X|} H(X_r)$$
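As a small illustration of the entropy computations described above, the following Python sketch assumes the standard Shannon entropy with base-2 logarithm and, for a split, the average of the two partition entropies weighted by partition size. The function names are hypothetical and not taken from dtControl.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of labels; 0 for an empty list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_entropy(labels_left, labels_right):
    """Remaining entropy after a binary split, weighted by partition size.

    Assumes that at least one of the two partitions is non-empty.
    """
    n = len(labels_left) + len(labels_right)
    return (len(labels_left) / n) * entropy(labels_left) \
        + (len(labels_right) / n) * entropy(labels_right)
```

A dataset with two equally frequent labels has entropy 1, and a split that separates them perfectly leaves a remaining entropy of 0.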

Predicates in Decision Trees
In every decision node of our tree, a predicate function $S \to \{0, 1\}$ is used to decide at which child we continue. We distinguish three categories of predicates according to their complexity:
Axis-Aligned Predicates are the simplest and by far the most commonly used type of predicates. They have the form $v_i \le c$ for a constant $c \in \mathbb{R}$. Geometrically speaking, the equation $v_i = c$ describes a hyperplane orthogonal to the $v_i$ axis, intersecting it at $v_i = c$. That is why they are called axis-aligned predicates [Mit97].
For finding the best axis-aligned predicate, we make the simple observation that for a feature $v_i$ with $k$ different values, there are only $k - 1$ different relevant values for $c$. So we can simply evaluate all possible predicates for all features $v_i$ and select the most promising one.
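The exhaustive search over axis-aligned predicates can be sketched as follows. The choice of midpoints between consecutive distinct values as the $k - 1$ candidate thresholds is an assumption made for this illustration (any value between two neighbours induces the same partition), and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def candidate_thresholds(values):
    """Return k - 1 thresholds for a feature with k distinct values."""
    distinct = sorted(set(values))
    # Assumption: use the midpoint between each pair of neighbouring values.
    return [(lo + hi) / 2 for lo, hi in zip(distinct, distinct[1:])]


def best_axis_aligned_split(points, labels):
    """Try every predicate v_i <= c and return (impurity, i, c) of the best."""
    n = len(points)
    best = None
    for i in range(len(points[0])):
        for c in candidate_thresholds([p[i] for p in points]):
            left = [y for p, y in zip(points, labels) if p[i] <= c]
            right = [y for p, y in zip(points, labels) if p[i] > c]
            score = len(left) / n * entropy(left) + len(right) / n * entropy(right)
            if best is None or score < best[0]:
                best = (score, i, c)
    return best  # None if no feature has two distinct values
```

With $M$ features and $N$ data points, this evaluates at most $M(N - 1)$ predicates, which is what makes axis-aligned splits cheap to find.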
Linear Predicates [MKS94], sometimes called oblique predicates, have the form $\sum_i a_i v_i \le c$ with $a_i, c \in \mathbb{R}$. This linear combination of different features describes a hyperplane with arbitrary orientation. Hence, they are more expressive, but it is also harder to find optimal predicates. Algorithms used to find suitable predicates include the OC1 algorithm [MKS94], logistic regression [HTF09, Chapter 4.4], and support vector machines (SVM) [HTF09, Chapter 12], a machine-learning technique using a hyperplane to split a dataset into two partitions. A hyperplane can be formally defined by its normal vector $w$ (the vector through the origin that is orthogonal to the hyperplane) and its distance $b$ from the origin. Then it can be used as a classifier in the following way:
$$x \mapsto \operatorname{sgn}(w^\top x - b)$$
where sgn is the sign function.
Algebraic Predicates, as outlined in [Akm19] and implemented in [Wei20,AJK+21], are even more powerful predicates, which can reduce the size of the decision tree and improve explainability. Algebraic predicates allow any closed-form expressions and hence they are the most expressive. For the same reason, automatically generating good algebraic splitting predicates is difficult and typically requires human guidance.
Figure 1 shows how the three types of predicates work on a toy dataset. As expected, the more expressive the predicate is, the fewer predicates are needed to build a perfect classifier.

Running Example
In the cruise control model of [LMT15], we want to control a car so that it will not crash into another vehicle in front. As a secondary objective, the car should drive as fast as possible, thereby minimizing the distance between both cars. The model is illustrated in Figure 3. We only consider two vehicles, our vehicle called ego and the next vehicle in front of us called front. We drive on a single lane without cars entering or leaving, therefore this constellation does not change. The state of the system is modelled by the velocities $v_e, v_f$ of the cars and their relative distance $d_r$; the safety criterion $d_r \ge d_{\text{safe}}$ should hold at every state. In the model, both cars choose a constant acceleration $a_e, a_f$ for the duration of one time step $t_1$. Then, the new state $(d_r', v_e', v_f')$ is given by
$$v_e' = v_e + a_e t_1 \tag{4}$$
$$v_f' = v_f + a_f t_1 \tag{5}$$
$$d_r' = d_r + (v_f - v_e)\, t_1 + \tfrac{1}{2} (a_f - a_e)\, t_1^2 \tag{6}$$
The model restricts the domains of the accelerations to $a_e, a_f \in \{-2, 0, 2\}$, describing the three actions deceleration, neutral, and acceleration. Similarly, the cars have a bounded minimum and maximum velocity $v_{\min}, v_{\max}$, and the distance sensor has a limited reach of $d_{\max}$. Depending on the values of these parameters, the size of the generated controller changes considerably. For technical details, see Appendix A.
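To make the dynamics concrete, here is a minimal Python sketch of one time step. It assumes the standard constant-acceleration update for both cars and deliberately ignores the velocity bounds and the sensor range; the function name `step` and the default `t_1 = 1.0` are assumptions of this illustration, not values taken from the model.

```python
def step(d_r, v_e, v_f, a_e, a_f, t_1=1.0):
    """Advance the state (d_r, v_e, v_f) by one time step of length t_1.

    Both cars keep a constant acceleration during the step. Clamping to
    [v_min, v_max] and to the sensor reach d_max is omitted on purpose.
    """
    new_v_e = v_e + a_e * t_1
    new_v_f = v_f + a_f * t_1
    # The relative distance changes by the relative velocity and the
    # relative acceleration of the two cars.
    new_d_r = d_r + (v_f - v_e) * t_1 + 0.5 * (a_f - a_e) * t_1 ** 2
    return new_d_r, new_v_e, new_v_f


def is_safe(d_r, d_safe):
    """Safety criterion of the model: the distance never drops below d_safe."""
    return d_r >= d_safe
```

For example, with `d_r = 10`, `v_e = 4`, `v_f = 6`, an accelerating ego car (`a_e = 2`) and a braking front car (`a_f = -2`), the gained distance from the velocity difference is exactly cancelled by the acceleration difference within one step.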

Insufficiency of the Current Solutions
To illustrate the issues with the current solutions, we consider the dataset cruise_250 (see Appendix A.2). Here, the model checker UPPAAL Stratego [DJL+15] generates a controller file of over 400 MiB comprising 320,523 states and 961,569 state-action pairs. Representing it with a binary decision diagram [Bry86] still uses over 1,800 nodes. With dtControl and axis-aligned or linear splits, we can get a decision tree with 869 or 369 nodes, respectively, which is still far too large to be understandable. Using the determinization heuristics discussed in [AJJ+20], we find a decision tree with only 3 nodes. Unfortunately, this determinized controller is of little use: it simply lets the car decelerate until it reaches minimal velocity. Of course, this behavior satisfies the safety criterion but is not helpful in the real world. To also fulfill the secondary objective of minimizing the relative distance, we have two options. We can pre-determinize the controller by always picking the largest safe acceleration, or we can keep the maximal permissiveness. In the latter case, the cruise controller acts as an emergency braking system by letting the human driver choose any action as long as it is a safe one.
Handwritten Strategy. As shown in [Akm19], there is a decision tree representing the most-permissive controller for the cruise example with just 11 nodes. Yet, there has not been a way of automatically generating it with dtControl so far. The natural predicate for the decision to be made can be derived from the basic kinematics of the model, and hence turns out to be the quadratic polynomial given in Equation 7. See Appendix A.3 for the derivation.

Predicates From Domain Knowledge
As discussed in [Akm19,AJK+21], we depend on domain knowledge provided by human experts to generate helpful splitting predicates. Providing very specific equations like the handcrafted predicate in Equation 7 is inherently tedious and error-prone. Thus, we aim to synthesize the specific predicates from general domain knowledge, in our example the velocity and distance relations:
$$v = a t, \qquad d = \tfrac{1}{2} a t^2 + v t \tag{8}$$
We proceed as follows. Let $P$ be the set of physical quantities appearing in the general domain knowledge equations. In our concrete case, we have $P = \{d, v, a, t\}$, where these letters denote distance, velocity, acceleration, and time. Then, let $S = \{v_e, v_f, d_r\}$ be the set of state variables and $C = \{d_{\text{safe}}, v_{\min}, v_{\max}, a_{\text{acc}}, a_{\text{neu}}, a_{\text{dec}}, t_1\}$ be a set of constants describing the minimum safety distance, the minimum and maximum velocities of the cars, the acceleration values corresponding to the actions "accelerate", "neutral", "decelerate", and the duration of one time step. We observe that every constant and state variable describes exactly one physical quantity. We define the function $\rho_p(X)$ that returns the subset of entities of the set $X$ that are associated with the physical quantity $p \in P$. For example, $\rho_d(S \cup C) = \{d_r, d_{\text{safe}}\}$. Our approach can be described as follows:
1. Initialize $V_p := \rho_p(S \cup C)$ for all $p \in P$. Each $V_p$ contains the relevant values for the physical quantity $p$.
2. For every equation from the domain knowledge (8), solve it for every physical quantity. This gives a set of equations of the form $p = f(P \setminus \{p\})$ for $p \in P$ that we call base identities. For example, the base identities for the acceleration are $a = \frac{v}{t}$ and $a = \frac{2(d - vt)}{t^2}$. A complete list of all 8 base identities is in Appendix B.1.
3. For every physical quantity $p$ and every pair of values $x_1, x_2 \in V_p$, add $x_1 + x_2$ and $x_1 - x_2$ to $V_p$.
4. For every base identity $\alpha$ associated with the quantity $p_\alpha$, and for every possible substitution function $\sigma$ that maps physical quantities $p \in P$ to values $x \in V_p$, add $\sigma(\alpha)$ to $V_{p_\alpha}$.
Steps 3 and 4 can be repeated, thereby creating increasingly complex expressions. As an example, let us see how we can generate the value $d_{\text{one}}$ describing the difference in distance after one time step if the ego vehicle accelerates and the front vehicle decelerates. In Step 3 we add $a_{\min} - a_{\max}$ to $V_a$, as well as $v_f - v_e$ to $V_v$. In Step 4 we use the base identity $\alpha : d = \tfrac{1}{2} a t^2 + v t$ with the substitutions $\sigma(a) = a_{\min} - a_{\max}$, $\sigma(v) = v_f - v_e$, and $\sigma(t) = t_1$, and we obtain the expression for $d_{\text{one}}$:
$$d_{\text{one}} = \tfrac{1}{2} (a_{\min} - a_{\max})\, t_1^2 + (v_f - v_e)\, t_1$$
In contrast to the grammar approach from [Akm19], every predicate we generate can be used directly as a splitting predicate in a decision tree; we do not have any non-terminals we need to replace later. For example, we could try to use $d_{\text{one}} \le c$ for some constant $c \in \mathbb{R}$ in our decision tree, where we would replace the constants $a_{\min}$, $a_{\max}$, and $t_1$ in $d_{\text{one}}$ with their respective numerical values, leaving us with a function of the state variables $v_f$ and $v_e$. Alternatively, we could use $d_{\text{one}}$ in the next substitution to create more sophisticated expressions.
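One iteration of Steps 3 and 4 can be mimicked with plain strings, as in the following Python sketch. It is a deliberately naive illustration under several assumptions: expressions are uninterpreted strings (so $x_1 + x_2$ and $x_2 + x_1$ are not identified), pairs with $x_1 = x_2$ are skipped, only the single base identity $d = \tfrac{1}{2} a t^2 + v t$ is applied, and all function names are hypothetical.

```python
from itertools import product


def close_under_sum_and_difference(values):
    """Step 3: add x1 + x2 and x1 - x2 for every ordered pair of distinct values."""
    result = set(values)
    for x1, x2 in product(values, repeat=2):
        if x1 != x2:
            result.add(f"({x1} + {x2})")
            result.add(f"({x1} - {x2})")
    return result


def apply_distance_identity(V):
    """Step 4 for the base identity d = 1/2 a t^2 + v t, over all substitutions."""
    return {
        f"1/2*{a}*{t}^2 + {v}*{t}"
        for a, v, t in product(V["a"], V["v"], V["t"])
    }


# Reduced starting sets, restricted to the symbols used in the example.
V = {
    "a": {"a_min", "a_max"},
    "v": {"v_e", "v_f"},
    "t": {"t_1"},
    "d": {"d_r", "d_safe"},
}
V["a"] = close_under_sum_and_difference(V["a"])   # Step 3 for accelerations
V["v"] = close_under_sum_and_difference(V["v"])   # Step 3 for velocities
V["d"] |= apply_distance_identity(V)              # Step 4 for distances

d_one = "1/2*(a_min - a_max)*t_1^2 + (v_f - v_e)*t_1"
```

Even in this reduced setting, one identity turns 2 distance values into 38 after a single iteration, which already hints at how quickly the full procedure grows.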

Handcrafted Predicate Derivation
We have seen a technique for generating compound predicates, but how far away is the handcrafted predicate we want to synthesize? To answer this, we illustrate how to derive the handcrafted predicate from Equation 7 using our approach. For details, see Appendix B.2. In Figure 4 we show the resulting sequence of combinations that is necessary to arrive at the handcrafted predicate.
Performance. Unfortunately, the proposed approach is infeasible in practice. From 8 base identities and 9 starting values from $C \cup S$ (we leave out $a_{\text{neu}}$), after one iteration, we already have 3,604 predicates. After the second iteration, we estimate the number of predicates to be on the order of $10^{18}$.
Fig. 4: A derivation of the "can-accelerate" predicate from Equation 7.
An important contributor to the growth is Step 3. Without Step 3, we generate 66 predicates in the first and 10,568 in the second iteration. Unfortunately, Figure 4 shows that these sums and differences are crucial throughout all iterations of the algorithm.
The difficulties are detailed further in Appendix B.3 and lead us to approach the problem from a different perspective in the next section.

Predicates From Controller Data
Instead of using domain knowledge, we now analyze the controller data directly. In Figure 5, we have visualized a part of the controller data from the cruise model together with a handcrafted splitting predicate. The coordinate axes describe our three state variables $v_e$, $v_f$, $d_r$, and the color of the data points describes which actions are allowed. We see that the handcrafted strategy perfectly separates the red and blue labels.
In this section, we use SVM to perform this task. (For completeness, Appendix C.1 explores why the curve-fitting functionality available in [Wei20,AJK+21] is not sufficient for our goal.) We apply the standard SVM learning algorithms. These either yield linear classifiers or can use kernel methods, where the resulting classifiers can in principle use, for instance, arbitrary polynomials; see Appendix C.2. To keep the complexity of the polynomials under control, we can restrict their degree. For our running example, we know that the kinematics equations are of order two, so we can restrict ourselves to quadratic polynomials. Concretely, we change from the linear three-dimensional space $(v_e, v_f, d_r)^T$ to the quadratic space with the following 9 dimensions:
$$(v_e,\; v_f,\; d_r,\; v_e^2,\; v_f^2,\; d_r^2,\; v_e v_f,\; v_e d_r,\; v_f d_r)^T$$
The moderate increase in dimensions is clearly outweighed by the much better performance of the linear SVM algorithms we can now use. Note that in this case, we used domain knowledge to estimate that no polynomials of degree more than two are necessary. Still, this requires far less manual effort than designing predicates by hand. Moreover, such domain knowledge is not required in general. When evaluating how the approach generalizes in Section 6.2, we obtain good results by always using quadratic polynomials, independent of the case study.
The polynomial predicates we obtain for the cruise example consist of up to 25 terms (see Appendix C.3 for an example). To make the decision trees even smaller and more explainable, we introduce four additional techniques besides the standard SVM usage. First, we simplify the individual predicates by removing unimportant terms in Subsection 5.1 and rounding the coefficients to nice numbers in Subsection 5.2. Then, we optimize which predicates are selected when building the decision tree by proposing a new impurity measure in Subsection 5.3 and changing the predicates' priorities in Subsection 5.4.
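The change of feature space can be illustrated with a short Python sketch. The ordering of the nine quadratic features and the predicate form $w^\top \phi(x) + b \le 0$ are assumptions of this illustration; a linear SVM trained on the expanded features would supply `weights` and `intercept`, which are hypothetical names here.

```python
def quadratic_features(v_e, v_f, d_r):
    """Map the 3-dimensional state to the 9-dimensional quadratic feature space."""
    return (
        v_e, v_f, d_r,                   # linear terms
        v_e ** 2, v_f ** 2, d_r ** 2,    # squares
        v_e * v_f, v_e * d_r, v_f * d_r  # mixed products
    )


def polynomial_predicate(weights, intercept, state):
    """Evaluate a quadratic predicate as a linear one in the expanded space."""
    phi = quadratic_features(*state)
    return sum(w * f for w, f in zip(weights, phi)) + intercept <= 0
```

A hyperplane in the expanded space thus corresponds to a quadratic surface in the original $(v_e, v_f, d_r)$ space, which is why linear SVM algorithms suffice after the expansion.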

Feature Importance
As we have discussed in Section 3, the state of the cruise model is defined by $v_e$, $v_f$, and $d_r$. However, the model checker UPPAAL Stratego also exposes four additional state variables. These comprise the current acceleration values $a_e$, $a_f$, which do not impact the acceleration the cars choose in the next time step, and the variables $f_{\text{choose}}$ and $e_{\text{choose}}$, which are an artifact from the internal model and have constant values for all relevant states.
To recognize such unimportant variables, we introduce a basic version of a feature importance measure. Consider the two-dimensional dataset shown in Figure 6a with features $x_1$ and $x_2$. To classify a data point, feature $x_2$ is not needed. We verify this by removing feature $x_2$ and grouping the data points with the same $x_1$ value. We can now measure how many "collisions" occur. If zero collisions occur, the feature is not needed. Otherwise, we can give a rough estimate of the importance of that feature by calculating the ratio of data points where a collision happened.
Fig. 6: Examples with redundant features. In (a) $x_2$ is not needed. In (b) $x_1$ and $x_2$ individually look redundant, but only one may be removed. Also, removing $x_1$ is preferred over removing $x_2$.
Note that for a dataset like Figure 6b, this approach would judge both features as irrelevant. Viewed individually, that is correct, but we can only remove one of them without causing collisions. This is why we calculate the feature importance incrementally. When we find an irrelevant feature, we remove it directly before calculating the importance of the next feature. As a result, the outcome may depend on the order of features we choose. For example, in Figure 6b, removing $x_1$ would result in a linearly separable dataset while removing $x_2$ would not. In general, there might even be a case where we can either remove a single feature $x_i$ or all three features $x_{i+1}$, $x_{i+2}$, $x_{i+3}$. However, we have not observed any behavior like this so far, so we leave this issue for future work.
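The collision-based importance measure and its incremental use can be sketched in Python as follows. The function names are hypothetical, features are tried in index order (one of several possible orders, as discussed above), and a "collision" is taken to mean that data points with identical remaining features carry different labels.

```python
from collections import defaultdict


def collision_ratio(points, labels, removed):
    """Fraction of data points that collide once the features in `removed` are dropped."""
    labels_seen = defaultdict(set)
    counts = defaultdict(int)
    for p, y in zip(points, labels):
        key = tuple(v for i, v in enumerate(p) if i not in removed)
        labels_seen[key].add(y)
        counts[key] += 1
    colliding = sum(counts[k] for k, ys in labels_seen.items() if len(ys) > 1)
    return colliding / len(points)


def removable_features(points, labels):
    """Incrementally drop every feature whose removal causes no collision."""
    removed = set()
    for i in range(len(points[0])):
        if collision_ratio(points, labels, removed | {i}) == 0.0:
            removed.add(i)  # drop it before judging the next feature
    return removed
```

On a dataset where both features are copies of each other, the incremental scheme removes only the first one, because removing the second afterwards would merge all points into one colliding group.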

Rounding Coefficients
With the feature importance, we remove variables that are clearly useless and reduce the number of terms in the cruise predicates from 25 to 9. Still, we generate predicates that contain unnecessary terms. For example, we know from our handcrafted predicate (see Equation 7) that we do not need a $d_r^2$ term for the cruise predicates. But in the predicate we generate (see Appendix C.4), the respective coefficient has a small positive value. To understand why that is the case, recall that the only objective the SVM has is to maximize the margin between the data points. For that, a small coefficient for $d_r^2$ seems to be beneficial. If we loosen the maximum-margin objective, we can generate a predicate with equivalent accuracy but a simpler algebraic expression. Again, as we are not interested in the classifier's ability to generalize, we do not care about how large the margins are, as long as the accuracy for our controller data stays the same. So, to prettify our predicate, we proceed in three steps:
1. Setting coefficients to zero.
2. Scaling the entire predicate.
3. Rounding coefficients to integers or "nice" numbers.
Rounding to Zero. If we can set a coefficient to zero, the predicate becomes significantly shorter and easier to understand. So this is our primary goal. A natural approach is to try setting a coefficient with a small absolute value to zero and checking if the classification for all samples stays the same. While this suffices for some coefficients, sometimes we need to change the remaining ones to counterbalance the change. So, what we do instead is to remove the feature temporarily and try re-training the SVM. If successful, we permanently remove the feature for this split and try the next feature. Similar to the feature importance approach (Subsection 5.1), the result may again depend on the order of coefficients we try to remove. Here, we use the heuristic of trying to remove the coefficient with the smallest absolute value first.
Compared to the feature importance approach, three key differences make this approach more powerful:
- We only consider the subset of the entire dataset available in the current subtree.
- We only focus on separating one specific label (we only have the two labels $+1$ and $-1$).
- We directly consider the features in the higher-dimensional space such as $d_r^2$.
Scaling the Predicate. An additional step to improve readability is to scale the generated predicate. In principle, a predicate $\alpha : 0.5x + 0.1y \le 0.3$ is equivalent to a scaled predicate $10\alpha : 5x + y \le 3$, but the second one might be easier to read. The SVM uses an internal scaling constraint, but for us, this is not relevant. We can again lift this constraint and scale all coefficients as well as the intercept value $b$ arbitrarily. One could think of various heuristics for how to scale the predicate. We decided to use a simple one: we search for the coefficient with the value closest to 1 and scale the predicate so that it becomes exactly 1. This way, we have at least one term with a simple coefficient.
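The scaling heuristic can be sketched as follows for a predicate of the form $\sum_i c_i x_i \le b$. The sketch makes one assumption beyond the text: it compares absolute values and always scales by a positive factor, so that the direction of the inequality is preserved even when the selected coefficient is negative (in which case it becomes $-1$ rather than $1$). The function name is hypothetical.

```python
def scale_predicate(coeffs, intercept):
    """Scale sum(c_i * x_i) <= intercept so the coefficient closest to 1 becomes 1."""
    nonzero = [c for c in coeffs if c != 0]
    if not nonzero:
        return list(coeffs), intercept  # nothing to scale
    # Coefficient whose absolute value is closest to 1.
    pivot = min(nonzero, key=lambda c: abs(abs(c) - 1.0))
    factor = 1.0 / abs(pivot)  # positive, so "<=" keeps its direction
    return [c * factor for c in coeffs], intercept * factor
```

Applied to $0.5x + 0.1y \le 0.3$, this heuristic selects the coefficient $0.5$ and yields $x + 0.2y \le 0.6$, an equivalent predicate that differs from the factor-10 scaling shown above only by a constant factor.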

General Rounding
As the last step, we generalize the "rounding to zero" approach and use it on the coefficients we could not set to zero. This way, instead of having a predicate like $8.165839\, d_r^2 - 2.935846\, v_r \le 0$, we can use a nicer-looking one like $8 d_r^2 - 3 v_r \le 0$. For that, we try the approach from above with increasing relative precision. For example, for the coefficient of $d_r^2$, we first try the value 10, then 8, then 8.2, and so on, until we find a value that does not change the classification for any sample. Note that we do not re-train the SVM in this step but simply change the coefficient and check if the classification stays the same.
With these techniques, we can finally generate pretty predicates. For example, one of the predicates we find for the cruise_250 dataset exactly corresponds to the handcrafted polynomial from Equation 7 after substituting all constants. The only difference is the constant offset.
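The search with increasing relative precision can be sketched in Python as below. The callback `keeps_classification` is a hypothetical stand-in for the check that no sample changes its class; skipping candidates that round to zero (that case is handled by the earlier step) and the limit `max_steps` are assumptions of this sketch.

```python
import math


def rounding_candidates(value, max_steps=8):
    """Yield roundings of a nonzero value with increasing relative precision."""
    magnitude = math.floor(math.log10(abs(value)))
    for k in range(max_steps):
        # k = 0 rounds to the next higher power of ten, k = 1 keeps one
        # significant digit, k = 2 keeps two, and so on.
        yield round(value, k - magnitude - 1)


def round_coefficient(value, keeps_classification, max_steps=8):
    """Return the coarsest rounding of `value` that the check accepts."""
    for candidate in rounding_candidates(value, max_steps):
        if candidate != 0 and keeps_classification(candidate):
            return candidate
    return value  # no acceptable rounding found; keep the original value
```

For the coefficient 8.165839, the candidates are tried in the order 10, 8, 8.2, and so on, matching the sequence described above.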
These rounding procedures raise a number of technical numerical issues, but we describe ways to mitigate these in Appendix C.5.

Min-label Entropy
Now that we have pretty predicates, we shift our focus to the decision tree construction for the next two sections. For the cruise dataset, we can now construct a decision tree with 37 nodes, only 10% of the size when using linear predicates. Moreover, we have seen that the approach generates the exact predicates we derived by hand in Appendix A.3. Still, the decision tree is not as compact as the 11-node tree from [Akm19], as we do not directly use those predicates. To understand why this is the case, we have a look at Figure 7. We see that split A perfectly separates the blue label from the rest, while B separates the red and orange labels but distributes the blue one among both children. Considering only a single split, we would prefer split B because the dataset is nicely separated except for the small number of blue samples. The entropy impurity measure comes to the same conclusion and assigns split B a better entropy score. However, when building a perfect classifier for representing the most-permissive controller, we have a different perspective than in machine learning. At some point, we need to separate the blue labels from the rest. If we do not separate them now and select split B, we have to add additional splits on both sides of split B. If we instead start with split A, we can select split B as the next split in the left child and thus obtain a smaller decision tree.

Fig. 7: Two different splits with their respective entropy values (split A: entropy 1; split B: entropy 0.88). While split B has a better entropy value and is preferred in machine learning, we want to use split A first when building a perfect classifier.
This effect is especially prevalent if the number of samples per label differs significantly. In the cruise example, we observe exactly that: the label "all actions are allowed" has 20 times more data points than any of the other labels. Hence, we introduce a new impurity measure that we call min-label entropy:

Definition 3. For a dataset $X = X_l \cup X_r$ with the label set $B$, let $n(X, y)$ describe the number of data points in $X$ with label $y \in B$. For a predicate $\alpha$ that splits the dataset into $X_l$ and $X_r$, we define the min-label entropy $H^*$ as

Intuitively, the min-label entropy measure estimates for every label $y$ how difficult it will be to separate the label $y$ in both partitions after this split. Then it returns the value of the best label. The strategy we want to encourage with this impurity measure is to first fully separate one label and then continue with the next one. Specifically, if we can completely separate one label like in the example in Figure 7, the impurity for this split is 0 and we definitely select such a split.
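To make the intuition concrete, here is a small sketch of one plausible reading of such a measure. It is an assumption, not the formula of Definition 3: for each label it computes the binary "this label versus the rest" entropy in both children, weights the two values by child size, and returns the minimum over all labels. The function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy of a two-class distribution with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def label_vs_rest_entropy(labels, y):
    """How mixed label y is with all other labels inside one child."""
    if not labels:
        return 0.0
    return binary_entropy(labels.count(y) / len(labels))

def min_label_entropy(left, right):
    """Size-weighted one-vs-rest entropy of the easiest label to separate."""
    total = len(left) + len(right)
    all_labels = set(left) | set(right)
    return min(
        (len(left) * label_vs_rest_entropy(left, y)
         + len(right) * label_vs_rest_entropy(right, y)) / total
        for y in all_labels
    )
```

Under this reading, a split that isolates one label completely, like split A in Figure 7, scores 0, whereas a split that scatters every label's boundary across both children scores strictly above 0.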

Predicate Priority
With the min-label entropy, we reduce the decision tree size of the cruise example to 25. As a last optimization heuristic, we also adjust the priorities of the predicates. When deciding between an axis-aligned and a polynomial predicate that both have similar impurity values, we want to choose the axis-aligned one as it is considerably simpler to understand. For that reason, dtControl has implemented a priority function for predicate generators. For example, when we give the polynomial predicates the priority 0.5 and the axis-aligned ones the priority 1, we only choose a polynomial predicate if it is at least twice as good in terms of the impurity measure. In fact, we want to choose an even lower value as a priority for another reason. In the cruise example, we know that we can find a polynomial that distinguishes cases where we can accelerate from those where we cannot. In our handpicked strategy, however, we did not consider the edge cases when we are already driving at minimal or maximal velocity. If we do not exclude those, the data is not perfectly separable, meaning we will find a polynomial split that classifies almost everything correctly but misses a few data points. While this is not a huge problem, it turns out that it is more effective to first exclude the edge cases with axis-aligned predicates and then perfectly split the data with a complex predicate later. We can achieve this with a low priority value ≤ 0.2 for the polynomial splits in combination with our min-label entropy. This way, we will only choose the complicated splits if they are at least 5 times better. Note that the impurity is 0 if we can perfectly separate one label, so in this case, we are infinitely better than any non-perfect solution.
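The effect of the priority value can be illustrated with a short sketch. It assumes that the priority acts as a divisor on the impurity, which reproduces the "twice as good" and "5 times better" statements above; how dtControl combines the two values internally is not specified here, and `select_predicate` is a hypothetical helper.

```python
def select_predicate(candidates):
    """Pick the candidate with the lowest priority-adjusted impurity.

    candidates: list of (name, impurity, priority) with priority in (0, 1].
    Dividing by the priority means that a predicate with priority 0.5 must
    have at most half the impurity of a priority-1 predicate to be chosen.
    """
    return min(candidates, key=lambda c: c[1] / c[2])

# A polynomial split that is better, but not twice as good, loses:
print(select_predicate([("axis", 0.4, 1.0), ("poly", 0.25, 0.5)])[0])
# A polynomial split with impurity 0 wins regardless of its priority:
print(select_predicate([("axis", 0.1, 1.0), ("poly", 0.0, 0.2)])[0])
```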

Experimental Evaluation
In this section, we will evaluate our approach experimentally. As we did not succeed in creating truly explainable decision trees for the cruise example using the method proposed in Section 4, we focus on the method of Section 5. For completeness, in Appendix D.1 we provide the evaluation of the approach described in Section 4. Thus we evaluate how well generating quadratic polynomials with SVMs performs in practice. While developing the various techniques and heuristics, we mainly focused on the cruise dataset. In this section, we first analyze the results for this dataset but then investigate how well the approach generalizes to other case studies. For that, we compare our results to the existing approaches and to the minimum decision tree achievable in theory. Afterwards, we look at our new impurity measure and its performance independently and finally comment on explainability.
Artifacts All resources such as generated domain knowledge predicates, model files, and synthesized controllers used in this paper are available to download at [Jü21]. The repository also contains scripts to reproduce the benchmark tables presented in this paper.

Cruise Control
Using all the strategies discussed in Section 5, we achieve great results for the cruise model. For the cruise_250 dataset, we find a succinct decision tree with only 11 nodes (see Figure 8a). This is exactly the number of nodes [Akm19] found with the handcrafted strategy. In fact, we precisely found the "must break" and "can accelerate" predicates from the handcrafted strategy, with only a small difference in the constant offset.
For the slightly larger cruise_300 dataset, we generate a very similar but slightly larger decision tree with 13 nodes (Figure 8b). The quadratic predicates change in line with the change of the constant $v_{\min}$ (see Table 3 in Appendix A.2) and one complex splitting predicate is exchanged for two simpler predicates.

Fig. 8: The decision trees for the cruise example generated by our data-driven approach.
In both cases, the generated decision trees are almost 80 times smaller than the ones we receive with axis-aligned predicates and 30 times smaller than the ones with linear predicates.

Generalizing to Other Benchmarks
To see how our approach and the individual heuristics generalize, we evaluate them on the case studies of cyber-physical systems from [AJJ+20] as well as on the case studies from the quantitative verification benchmark set [HKP+19] that were used in [AJK+21]. We avoid using any determinization heuristics so that we generate the most-permissive controllers. We ran all experiments on a server with the operating system Ubuntu 20.04, a 2.2 GHz Intel(R) Xeon(R) CPU E5-2630 v4 and 250 GB RAM. Table 1 contains a selection of the results, with the case studies of cyber-physical systems at the top and quantitative verification at the bottom. In every row, we compare the number of nodes in the generated decision tree for
- the axis-aligned splitting strategy (Ax.Al.),
- the smallest decision tree we could generate with axis-aligned and linear predicates (Linear),
- axis-aligned predicates and the quadratic polynomials generated by support vector machines with a priority value of 0.1 (Poly),
- and with the default priority value of 1.0 (PolyPrio1).

Table 1: The number of nodes of the generated decision trees using axis-aligned splits, linear splits, and the proposed quadratic polynomial splits with priority 0.1 and 1.0. Each row displays the result using the entropy impurity measure at the top and using min-label entropy at the bottom. TO means time out after 3 hours. As a comparison, we show the number of states of the underlying controller and the theoretical minimum size a decision tree needs to have (see Appendix D.2). The full table is in Appendix D.3.

In every cell, the top number describes the result using the entropy impurity measure and the bottom number refers to the result using min-label entropy. TO indicates that we were not able to generate a decision tree within three hours. As a comparison, we list the number of states of the controller as well as the theoretical minimum size of the decision tree (see Appendix D.2 for a description of how to calculate this).
A complete table with all 28 case studies, a comparison with BDDs, and results for different linear strategies can be found in Appendix D.3.
Scatter Plot To complement the table, Figure 9 visualizes the results in a logarithmic scatter plot. As a reference, we take the smallest tree we could generate with linear predicates and the entropy impurity. Then we compare it to the size of the tree with axis-aligned predicates and our quadratic polynomials. For example, the two blue points near the location (370, 10) are the two cruise datasets. The x-coordinate is the size of the tree with linear predicates and the y-coordinate shows the size of the polynomial or axis-aligned results.
Analysis Our new approach gives smaller decision trees for almost all case studies, except for helicopter and cdrive.10, where the linear solution is smaller by 6%, and traffic_30m, where we run into a timeout (see the full table of results in Appendix D.3). In Table 2 we show the cumulated statistics. Most notably, the number of cases where we find a tree of minimum size has increased from 2 to 10 out of 28.

DT is smaller or equal: 25 (89%)
DT has less than half the size: 8 (29%)

Min-Label Entropy and Predicate Priority
We applied two significant changes to arrive at the small decision trees in the cruise example: the min-label entropy impurity measure and the modified predicate priority value. We now analyze how useful they are on their own for the other case studies.
In Figure 10 we again make use of a logarithmic scatter plot to visualize the data from our tables.As a baseline, we take the size of the decision tree generated with our proposed approach using the entropy impurity measure and the default priority 1.0.We compare it to the size when using the proposed min-label entropy (blue) and when using the reduced priority value 0.1 (red).

Min-Label Entropy
The min-label entropy reduces the tree size in 14 out of 17 cases (82%) where we are not already at the minimum size and do not run into a timeout. Interestingly, this behavior is different when using the min-label entropy with axis-aligned splits or linear splits. There, the min-label entropy can only improve the result in 30 out of 106 cases (28%). Also, we observe 5 cases where our approach only times out when using the min-label entropy but not when using the standard entropy. A reason for this might be that the min-label entropy encourages the formation of decision trees shaped like a line. For all case studies where we generate minimum-sized trees, like the 10rooms case study, every leaf has a unique label. With the min-label entropy impurity, every splitting predicate separates out one of those labels. So the tree looks like a line. As a consequence, the runtime for finding predicates does not decrease as fast while constructing the tree. When we construct a perfectly balanced tree, the size of the dataset left at the subtree at depth $d$ is only a small fraction ($2^{-d}$) of the original size. In the case of a line, however, the dataset size only decreases slowly.
Low Priority Heuristic While the low priority value helps in the cruise example in combination with the min-label entropy, the only other cases where this heuristic brings an improvement are the dcdc and eajs.2.100.5.ExpUtil case studies (see the full table of results in Appendix D.3). We conclude that our motivating idea of first separating the "outliers" and then using the more sophisticated splits later does not generalize well. Apparently, it is beneficial to just take the best available split right away in complex models.

Explainability
We have seen that we can significantly reduce the number of decision tree nodes with our proposed approach. But how explainable are the trees we generate?
Of course, reducing the number of decision nodes already helps create an explainable decision tree. Still, we have to consider that the complexity of the individual splitting predicate increases, thereby potentially reducing explainability. As an example, we consider the 10rooms case study. Here, we find a decision tree with 49 nodes, which is the minimum size for a most-permissive decision tree. Unfortunately, the decision tree is not particularly explainable as some predicates comprise up to 35 terms, even after trying to round coefficients to zero. The reason is the large number of state variables (10). A quadratic polynomial with ten variables can already have 65 terms.
Regardless of the complexity of individual predicates, for some case studies, the minimum decision tree size is already too large to be easily […] decision […] for the case studies helicopter and […] will have more than 400 and 1,800 nodes respectively. So, in these cases, we might need […]

We have seen that automatically generating predicates from domain knowledge is not yet feasible with the current method. Hence we proposed learning quadratic polynomials with support vector machines directly from the controller data. Additionally, we introduced a new impurity measure called min-label entropy that focuses on separating one specific label first. We integrated both ideas into the decision-tree learning algorithm and implemented it in the open-source tool dtControl. We were able to generate significantly smaller decision trees in cases where the determinization heuristics could not be applied. For the cruise model, we generated a tree with the same size as the one created with the help of a human expert, and in 10 out of 28 case studies, we even found a decision tree of minimum size.
On the one hand, we showed that more expressive quadratic polynomials can help to generate succinct trees for permissive controllers. On the other hand, a key aspect still to be improved upon is the explainability. Of course, succinct decision trees are already easier to understand by nature, but more complex predicates again reduce explainability. Ideally, we would want to automatically generate a justification explaining the coefficients of each complex predicate.

Table 3 contains an overview of the parameters used and the resulting size of the controller measured in number of states and number of state-action pairs. The generated controllers are included in [Jü21].

A.3 Handwritten Controller
To better understand the model, we will briefly explain how the handcrafted strategy works. In the worst case, the front vehicle will start decelerating in the next time step and will continue until it has reached its minimal velocity. For our car, we have to decide what action to take for the next time step $t_1$: accelerate, stay neutral or decelerate. To see if it is safe to accelerate, we calculate the relative distance after accelerating for one time step $t_1$ and then decelerating until the ego vehicle has reached the minimal velocity.
In Figure 11 we have plotted the velocity-time diagram describing the kinematics of both cars in case the ego vehicle accelerates in the next time step. The front vehicle (red) instantly decelerates with the rate $a_{\min}$ and then continues with minimal velocity. The ego vehicle (blue) starts with a higher velocity, accelerates for one step, and then decelerates with the same rate. The distance traveled is the time integral of the velocity, so the area between the curves describes the relative distance change. We can partition the area into four sections and calculate the respective areas:

With these values, we can write the predicate deciding whether it is safe to accelerate in the next time step as a quadratic polynomial of our state variables

B.2 Handcrafted
how to derive Equation 7 (described in detail in Appendix A.3) using the automated approach of Section 4. We d one d f e : the front car travels with minimal velocity until the ego car also reaches its minimal velocity.
Figure 4 shows how we can arrive at the handcrafted predicate. At the top, we have a subset of the base values $S \cup C$ comprising constants like $a_{\max}$ and state variables like $d_r$. The color encodes to which physical quantity a value belongs. For example, all velocities are drawn in blue. Then the diagram shows the iterations of our algorithm. Every iteration consists of two phases: Step 3, where values of the same type can be added or subtracted to form new values, and Step 4, where we use our domain knowledge base identities to calculate new values. We leave out the irrelevant values as the total number of generated values would be far too large, as we will see in the next section. We observe that the first meaningful distance predicates emerge after four iterations. Then, summing the distance expressions together takes another 3 iterations. The non-simplified version of the predicate that our algorithm would output is

By reformulating, we see that this is the same predicate as given in Equation 7 and derived in Appendix A.3. There is only a slight difference in the constant offset. The easiest way to validate this predicate is by using the understanding given in Figure 11. For now, we assume $v_e \ge v_f$ and we accelerate while the front car brakes, as this is the worst case. Then, the kinematics can be described in three phases. First, the behavior in the next time step […] ($d_{\text{one}}$). Second, the phase when both cars brake until the front car has reached minimum velocity ($d_{f1} - d_e$). […]

[…] search space and the missing uniqueness of the derivation.
First, as the search space is so large, we need a heuristic telling us which expression will be useful at a later stage. When introducing the approach, we stressed that we can use any intermediate predicate in the decision tree. So a natural choice for a heuristic would be some kind of impurity measure on the controller data. Unfortunately, expressions like the time until we reach minimal velocity are not great splitting predicates. And even expressions close to the final predicate like $d_{\text{one}}$ or $d_e$ are of little use on their own. In Figure 12, we plot the handcrafted predicate $d^*$ together with its individual components in a two-dimensional plot for the value $v_f = 2$. While each predicate individually is a bad classifier, their sum can perfectly classify the data. Given only the impurities of the predicates, there does not seem to be a way to conclude which predicates are useful. Thus, developing a better heuristic is an important step for future work.
Second, the handcrafted predicate is so complex that there are alternative expressions that evaluate to the same predicate after substituting the constants with their values. For example, when calculating $d_{\text{one}}$, we set the acceleration to $a_{\min} - a_{\max}$, which evaluates to −4. Our approach would also try setting the acceleration to $a_{\min} + a_{\min}$, which also evaluates to −4 but lacks any relation to the situation we want to describe. So the general domain knowledge and the controller data alone might not be enough to unambiguously derive an explainable predicate. Either a human has to steer the process and select the most sensible predicates, or we have to somehow incorporate additional information about the model.

C Predicates From Controller Data
By looking at a two-dimensional plot for a fixed value of $v_f$ in Figure 12, we see that the predicate is well defined by the data points: we should be able to reconstruct the splitting function by solely looking at the data and fitting a function to it.

C.1 Problems with Curve Fitting
The recent extensions of dtControl [Wei20, AJK+21] enable us to use curve fitting [Arl94] for finding unspecified coefficients. We know from Equation 7 that the handpicked strategy is a quadratic polynomial, so we can try to fit the coefficients of a general quadratic polynomial with curve fitting. Unfortunately, this approach fails to find the correct predicate. To understand why, we need to investigate how the curve fitting is implemented.
For now, we always consider a one-versus-the-rest split. This means we pick a label $y$ that we want to separate from the rest and set

Consider the two-dimensional data from Figure 13a. What the current version of curve fitting does is the following. First, we map our two-dimensional data $x_i \in \mathbb{R}^2$ with label $y_i \in \{-1, 1\}$ to the three-dimensional space where $y_i$ is used as the third coordinate. Then we use regression analysis to fit a function to the data with least-squares fitting [Lev44, Mar63] (see Figure 13b). What we propose in this thesis is to use a classification approach rather than a regression approach. So in Figure 13b we are interested in the gray function separating the data points instead of fitting them. This way, we put most emphasis on the sample points close to the split rather than weighting every sample equally.

Fig. 13: In the current curve-fitting implementation, a two-dimensional dataset (a) is mapped to a three-dimensional space where $z \in \{-1, 1\}$ is determined by the label. The old approach then fits a function to the new dataset. Our approach instead tries to separate the data like the gray surface does in (b).
Coming back to the two-dimensional space (Figure 13a), we want to find a function that smoothly separates the labels, ideally maximizing the distance to any specific sample.This is where support vector machines come into play.

C.2 Using Support Vector Machines
Support vector machines (SVMs) do exactly what we want here: find a function that separates the data and maximizes the margins. The main idea offered in this work is that we can reconstruct the algebraic decision function from the internal coefficients of the SVM. This is not feasible for tools like neural networks [HTF09, Chapter 11], but we will see how and under what conditions it is possible for SVMs in the next sections.
The dtControl tool already supports finding linear splitting predicates with SVMs. However, for the cruise example, a linear predicate is not enough to perfectly split the data. So we are tempted to use a polynomial kernel to increase the expressiveness of our SVM. However, the runtime of common training algorithms for SVMs is at least quadratic in the number of samples. And in fact, the algorithms implemented in the open-source tool scikit-learn [PVG+11] do not terminate within an hour for the cruise example with a few hundred thousand sample points.
We can circumvent this issue by taking advantage of our specific use case. Usually, SVMs are used with high-dimensional datasets like images [DS02] or language models [PWH+04], where the number of features has the same order of magnitude as the number of samples. For the purpose of controller synthesis, the number of state variables is usually small as the number of states usually grows exponentially with the number of state variables. So while the kernel trick is useful for high-dimensional data, we can forgo the kernel trick in our case and explicitly construct the higher-dimensional space. A similar idea is also described in [CHC+10].
For example, from the linear three-dimensional space $(v_e, v_f, d_r)^T$ to the quadratic space, we will have the following 9 dimensions

$$(v_e,\; v_f,\; d_r,\; v_e^2,\; v_e v_f,\; v_e d_r,\; v_f^2,\; v_f d_r,\; d_r^2)^T$$

The moderate increase in dimensions is clearly outweighed by the much better performance of the linear SVM algorithms we can now use.
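The explicit construction of the quadratic space can be sketched in a few lines. This is an illustrative sketch with a hypothetical helper `expand`; the ordering of the monomials is an arbitrary choice, and the constant offset is left to the bias term of the linear SVM.

```python
from itertools import combinations_with_replacement

def expand(sample):
    """Map one sample to all monomials of degree 1 and 2 (no constant term)."""
    linear = list(sample)
    # Every unordered pair, including squares: x_i * x_j with i <= j.
    quadratic = [a * b for a, b in combinations_with_replacement(sample, 2)]
    return linear + quadratic

# A state (v_e, v_f, d_r) becomes a point in a 9-dimensional space.
features = expand([1.0, 2.0, 3.0])
print(len(features))
```

A linear classifier trained on the expanded samples then yields one coefficient per monomial, which is what allows the predicate to be read off as a quadratic polynomial.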
Problems With Higher Dimensions At the moment, we only support mapping to the quadratic space, which means our predicates are quadratic polynomials. For higher-degree polynomials, we have not seen that the gained expressiveness justifies the significantly increased complexity of the predicates. For example, a cubic predicate with 5 variables already has 55 terms. Even with the methods we will discuss in Subsection 5.1 and Subsection 5.2, this predicate will not fulfil our goal of being explainable. Mapping to a space with features like $e^x$ or $\sin(x)$ poses the challenge that we can only fit the coefficient, but not scale the function in x-direction like $e^{cx}$ or $\sin(cx)$, and is therefore left for future work.
Reconstructing the Algebraic Decision Function Assuming that our SVM finds a separating hyperplane that we want to use as a splitting predicate in our decision tree, how do we reconstruct the algebraic representation? The SVM algorithm finds a hyperplane $(w^*, b^*)$ with $w^* \cdot x - b^* = 0$, where $x$ corresponds to a transformed set of state variables in the form of Equation 9. This means the $w_i$ are the coefficients of the quadratic polynomial of our state variables. When implementing it in practice, there is a small intermediate step we want to mention for completeness. In order for the quadratic optimization algorithm to work properly, the input data needs to be normalized to have a mean of 0 and a standard deviation of 1. This standardization of course has to be taken into account when exporting the coefficients.
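The term counts quoted in this paper follow from a standard counting argument: the number of monomials of degree at most k in n variables is the binomial coefficient C(n + k, k), and subtracting 1 leaves out the constant offset. The helper `num_terms` below is a hypothetical name used only for this check.

```python
from math import comb

def num_terms(n_vars, degree):
    """Monomials of degree 1..degree in n_vars variables (constant not counted)."""
    return comb(n_vars + degree, degree) - 1

print(num_terms(3, 2))   # quadratic space of (v_e, v_f, d_r)
print(num_terms(5, 3))   # cubic predicate with 5 variables
print(num_terms(10, 2))  # quadratic predicate with 10 variables
```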
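Undoing the standardization when exporting coefficients amounts to a short algebraic rewrite. The sketch below assumes that each expanded feature is standardized as z_i = (x_i - mean_i) / std_i and that the hyperplane is written as w · z - b = 0, as above; the helper name `unscale` is hypothetical and this is not necessarily how dtControl implements the step.

```python
def unscale(w, b, mean, std):
    """Rewrite w . z - b = 0 with z_i = (x_i - mean_i) / std_i in terms of x.

    Substituting z gives sum_i (w_i / std_i) * x_i - (b + sum_i w_i * mean_i / std_i) = 0.
    """
    w_raw = [wi / si for wi, si in zip(w, std)]
    b_raw = b + sum(wi * mi / si for wi, mi, si in zip(w, mean, std))
    return w_raw, b_raw

# Both forms give the same decision value on an unscaled sample.
w, b, mean, std = [2.0, -1.0], 0.5, [1.0, 3.0], [2.0, 0.5]
x = [4.0, 2.0]
z = [(xi - mi) / si for xi, mi, si in zip(x, mean, std)]
scaled_value = sum(wi * zi for wi, zi in zip(w, z)) - b
w_raw, b_raw = unscale(w, b, mean, std)
raw_value = sum(wi * xi for wi, xi in zip(w_raw, x)) - b_raw
print(scaled_value, raw_value)
```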

C.3 Predicate Without Prettifying
Before applying the methods described in Subsection 5.1 and Subsection 5.2, a predicate for the cruise example looks like this (rounded to 6 decimal places): − 1.004058e choose d r + 0.000121d 2 r + 4.011296e choose v e − 0.002316d r v e + 0.51353v 2 e + 8.5 • 10 −5 e choose a e − 0.000276d r a e + 0.002239v e a e − 6.4

C.4 Predicate Without Rounding
After leaving out features as described in Subsection 5.1 but without rounding coefficients as described in Subsection 5.2, a predicate for the cruise example looks like this (rounded to 6 decimal places):

$$-0.000463\,d_r^2 + 0.008656\,d_r v_e - 0.549255\,v_e^2 - 0.005078\,d_r v_f + 0.046916\,v_e v_f + 0.496888\,v_f^2 + 2.043519\,d_r - 10.25286\,v_e + 6.138132\,v_f - 39.685041 \le 0$$

C.5 Numerical Errors
One problem we need to handle concerns floating-point precision errors. When testing our classifier, we use the internal coefficients of the SVM. The coefficients we output are different though, as we need to undo the normalization we applied. We must ensure that possible precision errors from these transformations do not change the classification. In the original predicate generated by the SVM, the classifier maximizes the margin between the label sets, so we can be quite confident that small precision errors will not change the classification. When trying out rounded coefficients, however, we lose this property. A rounded coefficient might classify everything correctly but the slightly different transformed coefficient might lead to other results.
As heuristic against we change a our is try rounded we the and try 3.00001 instead.that 3 not problems.Additionally, precision if our uses large discussed in C.6.finished, sample is correctly or error rate.In this we use transformed coefficients of the polynomial we output so we can be sure that the decision tree is as accurate as the tool tells the user.

C.6 Advanced Numerical Precision Problems
Sometimes, when training the SVM, very large coefficients occur. As long as every data point is located on the right side of the hyperplane, the loss function does not penalize these large coefficients. However, when evaluating a predicate like $2x_1 - 3x_2 + 10^{18}x_3 - 10^{18} \le 0$, the floating-point precision reaches its limits. So, in the rare case that we observe such a behavior, we apply the following countermeasure.
For every feature $i$, we add the control samples $(e_i, 1)$ and $(e_i, -1)$ to the dataset, where $e_i$ is the unit vector in the direction of feature $i$. Then we re-train the SVM. The loss function penalizes control samples that are far away from the separating hyperplane because either the positive or the negative control sample is located on the wrong side. Hence, the SVM uses small coefficients to keep the control samples reasonably close to the decision function. The liblinear tool [FCH+08] that we use in this work also supports different sample weights. Thus, we give the control samples weights that are orders of magnitude smaller than the weights for the regular samples so we do not disturb the regular training too much.
If, for some reason, we still have coefficients with an absolute value larger than $10^7$ or smaller than $10^{-7}$ but not 0, we change them to a value inside of this interval. This might change the classification but ensures that we do not run into precision errors after exporting the decision tree and evaluating the predicate on a device with a slightly different floating-point engine.
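Constructing the control samples is straightforward and can be sketched as follows. The helper name `with_control_samples` and the default `control_weight=1e-3` are hypothetical; the actual weight ratio is not stated here. The function only prepares the data and leaves the weighted training to whichever SVM library is used.

```python
def with_control_samples(samples, labels, control_weight=1e-3):
    """Append (e_i, +1) and (e_i, -1) for every feature i, with small weights."""
    dim = len(samples[0])
    out_samples = [list(s) for s in samples]
    out_labels = list(labels)
    out_weights = [1.0] * len(samples)  # regular samples keep full weight
    for i in range(dim):
        unit = [1.0 if j == i else 0.0 for j in range(dim)]
        for label in (1, -1):
            # One of the two copies is always on the wrong side of the
            # hyperplane, so large coefficients incur a (small) loss.
            out_samples.append(list(unit))
            out_labels.append(label)
            out_weights.append(control_weight)
    return out_samples, out_labels, out_weights
```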
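The final safeguard can be sketched as a simple clamp on the magnitude of each coefficient. Moving an out-of-range coefficient to the nearest interval boundary is an assumption of this sketch; the text only requires a value inside the interval. The helper name `clamp_coefficient` is hypothetical.

```python
import math

def clamp_coefficient(c, lo=1e-7, hi=1e7):
    """Keep |c| within [lo, hi]; zero coefficients and signs are preserved."""
    if c == 0:
        return 0.0
    magnitude = min(max(abs(c), lo), hi)
    return math.copysign(magnitude, c)

print([clamp_coefficient(c) for c in (2.0, -3.0, 1e18, -1e-12, 0.0)])
```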

D.1 Domain Knowledge Approach
As we have seen in Section 4, our approach was unable to generate the handcrafted predicate for the cruise example. Still, we generated a lot of predicates that might be useful when building the decision tree. We evaluate three sets of predicates that we generated with our approach from Section 4. As the number of predicates increases so fast and we cannot even complete two iterations, we also try skipping Step 3 in the approach, meaning we do not add sums and differences of our values. The number of nodes of the resulting decision trees for the cruise_250 dataset are shown in Table 4. In addition to the number of generated predicates, we also list the number of unique predicates as multiple predicates can evaluate to the same expression after substituting the constants with their values (as we have seen in the example with $a_{\min} = -a_{\max}$). To build the decision tree, we use the generated predicates in addition to the axis-aligned predicates and choose the best splitting predicates using the entropy impurity measure.
As a comparison, we have included the decision trees we receive without domain knowledge using only axis-aligned splits and using linear predicates generated with the OC1 heuristic [MKS94].
We see that the generated predicates help find succinct decision trees. For the largest predicate set, we even find a smaller tree than we do with linear predicates. Still, it is not clear whether this is because the predicates describe the dynamics of the system well or whether this improvement is simply due to the large number of predicates we try. In fact, we try so many splitting predicates that the runtime increases from 1 minute when using linear predicates to over twelve hours for the large predicate set, even after implementing use-case-specific optimizations.
The BDD sizes for the cyber-physical […] number of nodes from 10 tries. For the quantitative verification case studies, we show the BDD sizes from [AJK+21], which correspond to the minimum across 20 tries.
The benchmarks are split into three tables. Table 5 contains the cyber-physical system case studies; Table 6 and Table 7 contain the case studies from the quantitative verification benchmark set.

Fig. 1: Example showing how different types of predicates can separate a dataset.

Fig. 2: An example of how a decision tree can represent a controller. (a) shows a determinized controller, (b) a permissive one with multiple safe actions at some states.

Fig. 5: Visualization of a handcrafted predicate perfectly separating the data.

Fig. 9: Performance comparison of different predicate types. Based on the decision tree size using linear predicates, we compare how many nodes the decision trees with axis-aligned splits and quadratic polynomials have. Every sample corresponds to a case study of cyber-physical systems (CPS) or originates from the quantitative verification benchmark set (QV).
Fig. 10: Performance comparison of the min-label entropy (MLE) and the low priority heuristic. Based on the decision tree size using quadratic polynomials as predicates with entropy and priority 1.0, we compare how the heuristics change the tree size. Every sample corresponds to a case study of cyber-physical systems (CPS) or originates from the quantitative verification benchmark set (QV).

Table 3: The parameters used for generating the controllers of the cruise model and the resulting sizes measured in number of states and number of state-action pairs.

Fig. 12: The handcrafted predicate $d^*$ and the terms used to derive it, plotted for a fixed value of $v_f = 2$. While the sum of the terms is a perfect classifier, individually, they are not helpful for the classification.

Table 2: Cumulated statistics over all 28 benchmarks. We compare the best linear strategy with entropy impurity with the best of our heuristics.

Table 5: Benchmark results for the cyber-physical system case studies.

Table 6: Benchmark results for case studies from the quantitative verification benchmark set (part 1).

Table 7: Benchmark results for case studies from the quantitative verification benchmark set (part 2).